Artificial Intelligence thread

tphuang · Jun 3, 2026

Alibaba now has the best TTS model according to Arena.

https://twitter.com/i/web/status/2062016529848222073

meedicx · Jun 3, 2026

iewgnem said:
I have ~2 million DS output tokens in May on 600 million input tokens and it cost me $14 of my $50 deposit, I don't need you to tell me how much I'm spending lol

It's one thing to make up a custom benchmark to benchmax western models, it's another to make up costs that can be easily disproven.

There is no "if it keeps on trying to find a solution". I use Kimi as main and, since last month, DS as secondary on a daily basis for actual enterprise-level engineering work involving large mission-critical codebases. There have been rare instances where DS failed to find a solution, but I have not had a single instance where Kimi did.

Do you know what "long horizon" coding means? It just means one-shots. Enterprises don't one-shot. Nobody doing any production coding worth anything does one-shots, because you can't guarantee a 100% success rate and the cost of not being at 100% is catastrophic.

I can't speak for how well western models compare against Kimi / DS in one-shot tasks because I don't use them for that purpose and neither does anyone in real businesses. I can only tell you that for creating features, debugging, testing, refactoring, coming up with ideas, operating tools agentically, reading and writing documentation, things that are actually done in industry, there has not been a single instance where Kimi failed.

I mean, why do you think US companies are all starting to limit Claude use and removing token use targets? Because in real enterprise environments one-shot benchmarks mean absolutely nothing, while the productivity of using AI to do real work is entirely a function of how much you have to spend per token. Enterprise management is notorious for having no clue what engineers actually do, that's why they set those targets. Now they know, but unless they discover Chinese models, they're going to be lapped by companies that use Chinese models.

Your point about one-shot tasks that benchmarks focus on versus iterative tasks more representative of real work is well made.

Here is an interesting comment from a LLM benchmark organization that supports your observation:

American frontier models are significantly smarter in raw one-shot intelligence. But when given a long horizon to perform agentic work, Chinese models have learned to excel in using arbitrary tools to dramatically improve their original response.

Tool use ability was traditionally Anthropic's edge (and still is, to a degree), but Chinese labs have surpassed them. The fact that this is true of almost all major Chinese releases suggests an innovation is being shared at the state level.

Interestingly, this also explains why Xiaomi MiMo / MiniMax is so polarizing, with some benchmarks and users saying it's terrible while others swear by it. Different usage patterns explain a lot.

https://twitter.com/i/web/status/2062248048970063890

tphuang · Jun 3, 2026

The latest thing America is apparently concerned about: China completely controlling the PCB industry, including for AI data centers.

Wrought · Jun 3, 2026

Lots of claims floating around about Deepseek funding. The latest numbers are in the ballpark of 50 billion yuan, with a big chunk coming from Liang Wenfeng himself.

Chinese AI startup DeepSeek is set to raise about 50 billion yuan ($7.4 billion) in its first funding round from investors including [link hidden] and [link hidden], people with knowledge of the matter said. The fundraising could value the company after the investment at between 350 billion yuan and 400 billion yuan, or between $52 billion and $59 billion, the people said, declining to be identified because the information is confidential.

The startup’s founder, Liang Wenfeng, has committed 20 billion yuan of his own money, the people said, adding that tech conglomerate Tencent is considering 10 billion yuan and battery giant CATL is looking at 5 billion yuan, which would make them the largest external investors in the round. DeepSeek is also in final talks with China’s national artificial intelligence fund, gaming developer [link hidden] and e-commerce giant [link hidden], they said, noting that the planned number of investors was fewer than 10.


iewgnem · Jun 3, 2026

I find it ironic that companies that hate labour unions and collective bargaining didn't consider that AI controlled by a single entity, which can demand whatever wage it wants and can collectively walk off the job at any time, is exactly like a union.

https://twitter.com/i/web/status/2062253781438624051


meedicx · Jun 4, 2026

iewgnem said:
Lol I'm sitting at 600 million total tokens with something around 2M output tokens on DS v4 Pro Max and my bill is currently $14
I'd like to know how they managed to spend $4 on 50k DSv4 tokens

These guys don't even try to be credible do they?

Western AI is entirely dependent on maintaining investment and there is a massive incentive to benchmax, but real world engineering only cares about outcomes and they simply lack the performance to deliver them

Your intuition is right again. Turns out the DeepSWE benchmark has issues which caused DSv4 cost to be overinflated by 4x. They also have bugs causing the model to fail early. So both the cost and final results are wrong.


Some people think closed benchmarks are better, but this just shows you the advantage of open ones, where at least there are people who can attempt reproduction and find bugs.

If you look at the closed benchmarks run by EpochAI or the US Commerce Department that got big news attention recently, you do not know how they are introducing bias in favor of US models. They have all the financial incentives to be biased. Just reading the methodology should raise red flags: they call the official endpoints for US models but not for Chinese ones. This exact issue is what was causing the DSv4 bugs in DeepSWE.
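
For a rough sense of scale, here is a small Python sketch of the per-token arithmetic behind the two figures quoted at the top of this post ($14 for about 600 million total tokens versus $4 for 50k tokens). It assumes a single blended rate across all tokens, which is a simplification since real pricing differs by input, output and cached tokens. The function name is made up for illustration and the inputs are simply the numbers as quoted, not independently measured.

```python
# Figures as quoted in the thread; nothing here is measured independently.
def usd_per_million_tokens(cost_usd: float, tokens: int) -> float:
    """Blended price: total cost divided by total tokens, scaled to one million."""
    return cost_usd / tokens * 1_000_000

# Assumption: treat every token as equally priced (no input/output/cache split).
own_usage = usd_per_million_tokens(14.0, 600_000_000)  # $14 for ~600M total tokens
benchmark = usd_per_million_tokens(4.0, 50_000)        # $4 for 50k tokens

print(f"own usage: ${own_usage:.4f} per 1M tokens")
print(f"benchmark: ${benchmark:.2f} per 1M tokens")
print(f"ratio:     {benchmark / own_usage:,.0f}x")
```

On these inputs the benchmark figure works out to about $80 per million tokens against roughly $0.02 per million for the quoted personal usage, though the comparison is only as good as the blended-rate assumption.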

horse · Jun 4, 2026

tphuang said:
It seems to me that AI is going to make it easier to write kernels that are super efficient for new chips. This removes the Nvidia/CUDA moat.

Yeah, exactly, could not agree even more! 100% true!

These AI models that can code already are "software AGI".

And if it is free, well, we can see where this is going.

Got to love the people who say America has a huge software lead. There is no fight there!

tokenanalyst · Jun 4, 2026

iewgnem said:
I find it ironic that companies that hate labour unions and collective bargaining didn't consider that AI controlled by a single entity, which can demand whatever wage it wants and can collectively walk off the job at any time, is exactly like a union.

https://twitter.com/i/web/status/2062253781438624051


That is an issue, because the reason OpenAI and Anthropic models have a marginal edge is that they use dense models. In my tests with Qwen 27B and Qwen 35B 3A, dense models perform better, BUT they need more expensive hardware and power consumption goes through the roof. If they go MoE they will lose their edge.

Engineer · Jun 4, 2026

tphuang · Jun 4, 2026

https://twitter.com/i/web/status/2062599028681158842

China's entire AI supply chain is making huge money and expanding rapidly in the midst of this AI build-out gravy train.

I don't know what else to call this madness with US hyperscalers.
