It seems to me that AI is going to make it easier to write kernels that are super efficient for new chips. This removes the Nvidia/CUDA moat.
I used ~2 million DeepSeek (DS) output tokens in May on 600 million input tokens, and it cost me $14 out of my $50 deposit. I don't need you to tell me how much I'm spending lol
It's one thing to make up a custom benchmark to benchmax Western models; it's another to make up costs that can easily be disproven.
There is no "if it keep on trying to find a solution". I use Kimi as my main model and, since last month, DS as my secondary, on a daily basis, for actual enterprise-level engineering work involving large, mission-critical codebases. There have been rare instances where DS failed to find a solution, but I have not had a single instance where Kimi failed.
Do you know what "long-horizon" coding means? It just means one-shots. Enterprises don't one-shot. Nobody doing production coding worth anything one-shots, because you can't guarantee a 100% success rate and the cost of falling short of 100% is catastrophic.
I can't speak to how well Western models compare against Kimi/DS on one-shot tasks, because I don't use them for that purpose and neither does anyone in real businesses. I can only tell you that for creating features, debugging, testing, refactoring, coming up with ideas, operating tools agentically, and reading and writing documentation (the things that are actually done in industry), there has not been a single instance where Kimi failed.
I mean, why do you think US companies are all starting to limit Claude use and remove token-usage targets? Because in real enterprise environments, one-shot benchmarks mean absolutely nothing, while the productivity gained from using AI for real work is entirely a function of how much you have to spend per token. Enterprise management is notorious for having no clue what engineers actually do; that's why they set those targets. Now they know, but unless they discover Chinese models, they're going to be lapped by companies that use them.
American frontier models are significantly smarter in raw one-shot intelligence. But when given a long horizon to perform agentic work, Chinese models have learned to excel at using arbitrary tools to dramatically improve their original responses.
Tool use was traditionally Anthropic's edge (and still is, to a degree), but Chinese labs have surpassed Anthropic there. The fact that this is true of almost all major Chinese releases suggests that an innovation is being shared at the state level.