Artificial Intelligence thread

meedicx

Junior Member
Registered Member
I had ~2 million DS output tokens in May on 600 million input tokens and it cost me $14 out of my $50 deposit. I don't need you to tell me how much I'm spending lol
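As a rough check on those figures, here is a minimal sketch of the arithmetic using only the numbers quoted above. The variable names are illustrative, and the result is a blended rate only: the split between input and output pricing (and any cache discounts) is not stated, so nothing here should be read as an actual price list.

```python
# Figures as quoted in the post (approximate, per the poster).
input_tokens = 600_000_000
output_tokens = 2_000_000
total_cost_usd = 14.0

total_tokens = input_tokens + output_tokens

# Blended cost per million tokens. Assumption: input and output tokens are
# lumped together, since the post does not give separate rates.
blended_per_million = total_cost_usd / (total_tokens / 1_000_000)

# How input-heavy the workload is.
input_to_output_ratio = input_tokens / output_tokens

print(f"Blended cost: ${blended_per_million:.4f} per million tokens")
print(f"Input-to-output ratio: {input_to_output_ratio:.0f}:1")
```

That works out to a little over two cents per million tokens blended, on a workload that is about 300:1 input to output.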

It's one thing to make up a custom benchmark to benchmax Western models; it's another to make up costs that can be easily disproven.

There is no "if it keeps on trying to find a solution". I use Kimi as my main and, since last month, DS as my secondary on a daily basis for actual enterprise-level engineering work involving large mission-critical codebases. There have been rare instances where DS failed to find a solution, but I have not had a single instance where Kimi did.

Do you know what "long horizon" coding means? It just means one-shots. Enterprises don't one-shot. Nobody doing any production coding worth anything one-shots, because you can't guarantee a 100% success rate and the cost of not being 100% is catastrophic.

I can't speak for how well Western models compare against Kimi / DS in one-shot tasks, because I don't use them for that purpose and neither does anyone in real businesses. I can only tell you that for creating features, debugging, testing, refactoring, coming up with ideas, operating tools agentically, and reading and writing documentation, things that are actually done in industry, there has not been a single instance where Kimi failed.

I mean, why do you think US companies are all starting to limit Claude use and removing token use targets? Because in real enterprise environments one-shot benchmarks mean absolutely nothing, while the productivity of using AI to do real work is entirely a function of how much you have to spend per token. Enterprise management is notorious for having no clue what engineers actually do. That's why they set those targets. Now they know, but unless they discover Chinese models, they're going to be lapped by companies that use Chinese models.

Your point about the one-shot tasks that benchmarks focus on versus the iterative tasks more representative of real work is well made.

Here is an interesting comment from an LLM benchmark organization that supports your observation:
American frontier models are significantly smarter in raw one-shot intelligence. But when given a long horizon to perform agentic work, Chinese models have learned to excel in using arbitrary tools to dramatically improve their original response.

Tool use ability was traditionally Anthropic's edge (and still is, to a degree), but Chinese labs have surpassed them. The fact that this is true of almost all major Chinese releases suggests an innovation is being shared at the state level.

Interestingly, this also explains why Xiaomi MiMo / MiniMax are so polarizing, with some benchmarks and users saying they're terrible while others swear by them. Different usage patterns explain a lot.

 

Wrought

Captain
Registered Member
Lots of claims floating around about DeepSeek funding. The latest numbers are in the ballpark of 50 billion yuan, with a big chunk coming from Liang Wenfeng himself.

Chinese AI startup DeepSeek is set to raise about 50 billion yuan ($7.4 billion) in its first funding round from investors including [link hidden] and [link hidden], people with knowledge of the matter said. The fundraising could value the company after the investment at between 350 billion yuan and 400 billion yuan, or between $52 billion and $59 billion, the people said, declining to be identified because the information is confidential.

The startup’s founder, Liang Wenfeng, has committed 20 billion yuan of his own money, the people said, adding that tech conglomerate Tencent is considering 10 billion yuan and battery giant CATL is looking at 5 billion yuan, which would make them the largest external investors in the round. DeepSeek is also in final talks with China’s national artificial intelligence fund, gaming developer [link hidden] and e-commerce giant [link hidden], they said, noting that the planned number of investors was fewer than 10.
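To put the reported numbers side by side, here is a small sketch of the arithmetic, using only figures from the quoted article. The variable names are illustrative, the Tencent and CATL amounts are reported as under consideration rather than committed, and the exchange rate is simply the one implied by "50 billion yuan ($7.4 billion)", not an official rate.

```python
# Figures as reported in the quoted article, in billions of yuan.
round_total = 50
reported_amounts = {
    "Liang Wenfeng (committed)": 20,
    "Tencent (considering)": 10,
    "CATL (considering)": 5,
}

named_total = sum(reported_amounts.values())
remainder = round_total - named_total  # left for the other investors
founder_share = reported_amounts["Liang Wenfeng (committed)"] / round_total

# Exchange rate implied by the article's own conversion (assumption: the
# same rate was used for the valuation figures).
implied_cny_per_usd = 50 / 7.4
valuation_usd = [v / implied_cny_per_usd for v in (350, 400)]

print(f"Named amounts: {named_total}B of {round_total}B yuan, {remainder}B remaining")
print(f"Founder share of round: {founder_share:.0%}")
print(f"Implied valuation: ${valuation_usd[0]:.1f}B to ${valuation_usd[1]:.1f}B")
```

On those numbers the founder would account for 40% of the round, the three named amounts for 35 billion of the 50 billion yuan, and the implied dollar valuation lines up with the article's $52 billion to $59 billion range.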
