Artificial Intelligence thread

Eventine · Jun 8, 2026

iewgnem said:
This again? The bench that claims DS cost $4 for 50k tokens? Yeah we know western models are desperate to keep their scam up, but they could use more effort when faking their benches.

Or maybe they used their own models to fake the bench, which would make it even more ironic.


You should read the update; he made a mistake in his evaluation of the benchmark. DeepSeek is unlikely to be much better than 10%. It is a fact that Western frontier models are superior on difficult tasks from a raw capability perspective. The question is whether the costs of running them are worth the value being delivered, as the vast majority of productivity gains aren't from the hardest problems, and productivity gains don't necessarily translate to higher $$$ in the software industry either.

There is an impending crisis in the agentic AI industry, but it is not because Chinese frontier models are challenging Western frontier models on raw capability. It is because Western hyperscalers and their investors are being absolutely taken for a ride by hardware makers, and the return on raw capability is rapidly diminishing. Expect to see a shift, soon, to more "efficient" and "cost effective" models on the Western side as well; then we will see if China can truly squeeze them out of the game.

iewgnem · Jun 8, 2026

Eventine said:
You should read the update; he made a mistake in his evaluation of the benchmark. DeepSeek is unlikely to be much better than 10%. It is a fact that Western frontier models are superior on difficult tasks from a raw capability perspective. The question is whether the costs of running them are worth the value being delivered, as the vast majority of productivity gains aren't from the hardest problems, and productivity gains don't necessarily translate to higher $$$ in the software industry either.

There is an impending crisis in the agentic AI industry, but it is not because Chinese frontier models are challenging Western frontier models on raw capability. It is because Western hyperscalers and their investors are being absolutely taken for a ride by hardware makers, and the return on raw capability is rapidly diminishing. Expect to see a shift, soon, to more "efficient" and "cost effective" models on the Western side as well; then we will see if China can truly squeeze them out of the game.

The update does not change the fact that the bench set the thinking level to null and faked the pricing metric, which discredits the entire exercise.

I won't argue that DS is the most capable model; I use Kimi when that's needed.

But the idea that Western models are more capable on "difficult tasks" is an entirely made-up scenario. The nature of "difficult tasks" (which are mostly just one-shot tasks) is that, being difficult, they are also difficult to manually check or fix. That means anything short of a 100% success rate becomes practically useless, and they're not at 100% success.

Actual ability to solve difficult tasks requires the ability to do things step by step, iterate, test, use tools and fix problems, all of which Western models are inferior at on a constant budget.

I know this directly because I do use them to solve difficult tasks involving large systems and I've been able to do so far better than if I were still capped by Claude usage.

The inability to actually solve difficult tasks outside of a bench, where you can immediately evaluate the outcome and all you care about is a % success, is why Western models are failing in enterprise use.

HighGround · Jun 8, 2026

Eventine said:
There is an impending crisis in the agentic AI industry, but it is not because Chinese frontier models are challenging Western frontier models on raw capability. It is because Western hyperscalers and their investors are being absolutely taken for a ride by hardware makers, and the return on raw capability is rapidly diminishing. Expect to see a shift, soon, to more "efficient" and "cost effective" models on the Western side as well; then we will see if China can truly squeeze them out of the game.

Nobody is forcing them to pay these exorbitant prices. Fact of the matter is, Western AI labs wanted more compute, as opposed to learning to do more with less. Which is fine, but that's what begets a bidding war. The underlying evil is that it's not even their money. Anthropic, OpenAI, Meta, and xAI all decided to pour gasoline on the pile of cash gifted to them by investors, all in the hopes of reaching the mythical AGI point that will, somehow, deliver them to the money printer.

At least Meta uses its own money, but the others all spend other people's money. Note that Google, Amazon, and Microsoft were all much more conservative and were actively pursuing cost reduction.

Anyway. Yeah people need to give it a rest. More compute and more tokens will produce better AI outputs. However, the Chinese models are actually viable as businesses. They will generate returns on CAPEX far faster than Anthropic or OpenAI. I have no idea how or why investors justified signing over money to a company that wants to spend a trillion dollars in CAPEX over the next few years. The only viable business strategy here is to simply sell your shares to some other sucker before the bubble collapses into itself.

AI uses all the same tactics that the Crypto/NFT bubble did a few years ago. It's a reality warping bubble over there.

gpt · Jun 9, 2026

Michael90 said:
I'm even surprised US closed paid models were so dominant compared to Chinese open-source, mostly free/cheap models. I don't get why many people favored US models so much. After all, who wouldn't prefer something that's almost free compared to something you have to pay a lot for?

Palantir realized early on that selling raw software platforms or data infrastructure was a race to the bottom. Instead, their strategy was to embed themselves so deeply into the core operating workflows of governments and enterprise that extracting them becomes nearly impossible. They sell 'mission-critical integration' (Maven, Gotham) rather than just software. Anthropic is pulling from that exact playbook to survive the commoditization of AI.

tokenanalyst · Jun 9, 2026

gpt said:
Palantir realized early on that selling raw software platforms or data infrastructure was a race to the bottom. Instead, their strategy was to embed themselves so deeply into the core operating workflows of governments and enterprise that extracting them becomes nearly impossible.

That's creepy as hell.

Eventine · Jun 10, 2026

So, Anthropic launched Mythos / Fable 5 today. The benchmarks put it at ~65 in Artificial Intelligence, which is about a +5 lead on the previous state of the art (Opus 4.8) and nearest competitor (GPT 5.5), but what I found more interesting was this:

Looks like Anthropic is actively trying to stifle competitors from using their models to accelerate their own model development by making the model quietly and intentionally direct you in the wrong direction. This is an interesting (and rather ruthless) move against competing labs and further secures their reputation as a closed lab with no intention of sharing with the world, and every intention of monopolizing intelligence capabilities for their own purposes. As always, Western labs can't seem to avoid mask off moments.

It is also a reminder of the importance of AI sovereignty and independence, as the alternative of relying on the US - as the likes of Europe, Japan, South Korea, etc. are doing - means giving them the right to keep you permanently behind via quietly sabotaging the capabilities of models available to you. The sooner the world is rid of the West's dominance of frontier coding AI, the better.

tamsen_ikard · Jun 10, 2026

Eventine said:
So, Anthropic launched Mythos / Fable 5 today. The benchmarks put it at ~65 in Artificial Intelligence, which is about a +5 lead on the previous state of the art (Opus 4.8) and nearest competitor (GPT 5.5), but what I found more interesting was this:

View attachment 176424

Looks like Anthropic is actively trying to stifle competitors from using their models to accelerate their own model development by making the model quietly and intentionally direct you in the wrong direction. This is an interesting (and rather ruthless) move against competing labs and further secures their reputation as a closed lab with no intention of sharing with the world, and every intention of monopolizing intelligence capabilities for their own purposes. As always, Western labs can't seem to avoid mask off moments.

It is also a reminder of the importance of AI sovereignty and independence, as the alternative of relying on the US - as the likes of Europe, Japan, South Korea, etc. are doing - means giving them the right to keep you permanently behind via quietly sabotaging the capabilities of models available to you. The sooner the world is rid of the West's dominance of frontier coding AI, the better.

All the distillers need to do is create hundreds of accounts and ask questions that break the patterns, switching up the way they ask those questions.

If Anthropic tries too hard to give bad answers, it will likely also give bad answers to legitimate users, which will make the model less useful and lose them business.

Stopping distillation is impossible.

9dashline · Jun 10, 2026

Tested Fable 5 (Mythos with guardrails for the public) extensively... not impressed. While it is better than Opus 4.8 and probably the top model right now, its actual capabilities in real-world use are still very, very far from any sort of "AGI" or "ASI", let's put it that way. I tested both the web and Claude Code versions, on MAX intelligence with the 1 million context, etc. It still couldn't do some simple bug fixes and I still had to hand-hold it. Basically, it's nowhere near the level of intelligence that these hyperscalers would need in order to justify their current IPO valuations. On top of the capabilities falling short of the hype, every other innocent thing was blocked by an overaggressive filter. In one instance I started a new chat, pointed it to Anthropic's own public news announcement and gave it Anthropic's own URL, and the chat immediately got paused/halted. Even in Claude Code, things like academic "Pi calculator" stuff that worked under Opus got falsely flagged as malicious when continued under Fable. Basically, they added gating that in its current form makes it less than useless.

They also don't plan to let subscribers use it from their subscription plan after June 22. Instead, everyone will have to pay for extra credits at a rate of $50 per million output tokens. Since the whole strength of Fable is long-horizon agentic/autonomous operation, that kind of defeats the point: most people won't be able to afford to use Fable for its intended use cases.
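
To put rough numbers on that, here is a minimal sketch of the arithmetic. The only figure taken from the above is the $50 per million output tokens; the workload (`hypothetical_tokens_per_run`, `hypothetical_runs_per_day`) is made up purely for illustration, and input-token costs are ignored, so treat the result as a ballpark and not as actual pricing.

```python
# Rate quoted above: $50 per million output tokens.
PRICE_PER_MILLION_OUTPUT_USD = 50.0


def output_cost_usd(output_tokens: int,
                    price_per_million: float = PRICE_PER_MILLION_OUTPUT_USD) -> float:
    """Cost in USD of generating `output_tokens` output tokens at the given rate."""
    return output_tokens / 1_000_000 * price_per_million


# Hypothetical long-horizon agentic workload (assumed, not measured):
hypothetical_tokens_per_run = 2_000_000  # output tokens per agent run
hypothetical_runs_per_day = 5            # agent runs per day

daily_cost = output_cost_usd(hypothetical_tokens_per_run * hypothetical_runs_per_day)
monthly_cost = daily_cost * 30

print(f"daily: ${daily_cost:,.2f}, monthly: ${monthly_cost:,.2f}")
```

Under those assumed numbers the output tokens alone come to $500 a day, or $15,000 over 30 days. Swap in your own token counts to see where your usage lands.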

Frankly, I've been running the numbers some more and I think all the US hyperscalers are fucked.

https://www.reddit.com/r/LocalLLaMA/comments/1u1s2oz

bsdnf · Jun 10, 2026

9dashline said:
Tested Fable 5 (Mythos with guardrails for the public) extensively... not impressed. While it is better than Opus 4.8 and probably the top model right now, its actual capabilities in real-world use are still very, very far from any sort of "AGI" or "ASI", let's put it that way. I tested both the web and Claude Code versions, on MAX intelligence with the 1 million context, etc. It still couldn't do some simple bug fixes and I still had to hand-hold it. Basically, it's nowhere near the level of intelligence that these hyperscalers would need in order to justify their current IPO valuations. On top of the capabilities falling short of the hype, every other innocent thing was blocked by an overaggressive filter. In one instance I started a new chat, pointed it to Anthropic's own public news announcement and gave it Anthropic's own URL, and the chat immediately got paused/halted. Even in Claude Code, things like academic "Pi calculator" stuff that worked under Opus got falsely flagged as malicious when continued under Fable. Basically, they added gating that in its current form makes it less than useless.

They also don't plan to let subscribers use it from their subscription plan after June 22. Instead, everyone will have to pay for extra credits at a rate of $50 per million output tokens. Since the whole strength of Fable is long-horizon agentic/autonomous operation, that kind of defeats the point: most people won't be able to afford to use Fable for its intended use cases.

Frankly, I've been running the numbers some more and I think all the US hyperscalers are fucked.

Fable is bait to lure large corporations and the DoWs into begging Ant for the right to use Mythos; they never intended for consumers or even small and medium-sized enterprises to use it. Aside from gambling and scam companies, no one can afford it.

lockedemosthenes1 · Jun 10, 2026

HighGround said:
However, the Chinese models are actually viable as businesses. They will generate returns on CAPEX far faster than Anthropic or OpenAI.

For DeepSeek and ByteDance's video generation model, the claim is true, while for Kimi, Qwen, Minimax and GLM, it's not the case.
Also, I doubt the claim that Claude's best models are unnecessary and too expensive. The argument goes that easy, daily, personal tasks can also be done by open-source models, so Opus 4.8, GPT 5.5 Pro and DeepSeek V4 Pro "make no difference to me", even though Anthropic's monthly revenue is around $3-4 billion, which is comparable to the combined yearly revenue of all the LLM providers in China. That argument is not very convincing when we look at Seedance 2.0, which is also criticized for its ￥1 per second price but is widely used, or at the evolution of LLMs, which initially could only complete a few lines of code but can now complete a full script and do some agentic work automatically.

