Artificial Intelligence thread

tphuang · Feb 11, 2026

https://twitter.com/i/web/status/2021667336319832265

So I just tested GLM-5.0 and compared it to Kimi-2.5 and Gemini-3 free. GLM had the best answer on the Python coding request I gave it. It's actually shockingly good.

And according to Artificial Analysis, it has already been rated better than Gemini-3 Pro and on the same level as Claude Opus 4.5.

mossen · Feb 12, 2026

The low hallucination rate of GLM 5 is indeed the most impressive aspect of the model. I hope more people will focus on hallucination rates instead of just benchmaxxing. Some top Western models are really poor at that (e.g. Gemini).

It's also nice to see their model take a huge jump in size. I've long advocated leading labs should be pushing models into the trillions. Scale remains a key improvement metric if RL is done properly. It's unavoidable.

I've also tried the newest DeepSeek model. It's very good. The speed difference is shocking. It's just way faster now.

Of the Western-facing Chinese frontier labs, DeepSeek and Moonshot remain great, but GLM is now on par. Qwen has slipped further behind. I hope their rumored Qwen 3.5 release will improve things. I also hope ByteDance will start to focus on Western audiences in 2026. So far, they've mostly concentrated on the domestic market.

bsdnf · Feb 12, 2026

GLM Coding Pro users can already use GLM-5, while Lite users will have to wait a while.

What's particularly intriguing is the official announcement's mention of "strong support from domestic chip partners" for achieving the computing power expansion. I don't know if this is just political correctness or if they genuinely use domestically produced chips for inference.

If this is true, then at least the approach of training SOTA models on NVIDIA hardware and running inference on domestically produced chips to cope with traffic surges has once again proven feasible.

I hope Kimi can learn from this. Users are literally putting money in your pocket and you can't even take it. How is that acceptable?

tphuang · Feb 12, 2026

https://twitter.com/i/web/status/2022004800053739717

Huawei/Ascend here documents its support for GLM-5. Quite interesting stuff actually. Flash attention supposedly makes inference run a lot faster (although I never personally got it to work before). vLLM is a lot faster than plain PyTorch, and I think SGLang is the one all the Chinese AI shops use. Being able to shrink it down to 400-500GB using the W4A8 format (4-bit weights, 8-bit activations) is a big deal since much smaller boxes can run it now.
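
To put the 400-500GB figure in context, here is a rough back-of-the-envelope sketch of how weight storage scales with bits per weight. The parameter count below is a round placeholder chosen for illustration, not a figure from the announcement, and the estimate ignores quantization scales, KV cache and activation memory.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 10**9 bytes).

    Ignores quantization scales/zero-points, KV cache and activations,
    so real deployments need somewhat more than this.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical round number used only for illustration (an assumption).
PLACEHOLDER_PARAMS = 1e12

for bits in (16, 8, 4):
    gb = weight_memory_gb(PLACEHOLDER_PARAMS, bits)
    print(f"{bits}-bit weights: {gb:.0f} GB")
```

Going from 16-bit to 4-bit weights cuts storage by 4x, which is why a quantized checkpoint can fit on a much smaller box.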

tphuang · Feb 12, 2026

https://twitter.com/i/web/status/2021980761210134808

Looks like MiniMax 2.5 has caught up to Anthropic here, at least on coding.

bsdnf · Feb 12, 2026

MiniMax M2.5 is still a model with 10B activated parameters, priced at $0.30/M input tokens and $1.20/M output tokens?

absolute minimax
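
For a sense of scale, a quick sketch of what a single request would cost at the prices quoted above. The token counts in the example are made up for illustration, and the function name is hypothetical.

```python
# Prices quoted in the post, in USD per million tokens.
INPUT_PRICE_PER_M = 0.30
OUTPUT_PRICE_PER_M = 1.20


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the quoted per-million-token prices."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )


# Hypothetical coding request: 100k tokens of context in, 20k tokens out.
print(f"${request_cost_usd(100_000, 20_000):.3f}")
```

At these rates the example request comes to a few cents, with input and output contributing roughly equally despite the 5x difference in token count.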

bsdnf · Feb 12, 2026

And they're training an actual Codex competitor: MiniMax-Mx

They just casually announced that.

https://twitter.com/i/web/status/2021996347734536701

bsdnf · Feb 12, 2026

Boss Zhipin (a major job-seeking app in China) has also launched their mini-model.

https://twitter.com/i/web/status/2021916472381931895

bsdnf · Feb 12, 2026

And then there's Ant. Not many people actually use their models; 1T is just too big, but at least they're honest about their benchmark scores.

https://twitter.com/i/web/status/2021974501660274924

meedicx · Feb 12, 2026

bsdnf said:
And then there's Ant. Not many people actually use their models; 1T is just too big, but at least they're honest about their benchmark scores.

https://twitter.com/i/web/status/2021974501660274924

Kimi, GLM and MiniMax should all include a test-time scaling "heavy thinking" mode to benchmaxx

The top benchmarks you see for ChatGPT and Opus are usually from their most expensive mode, where they use test-time scaling to run the query multiple times in parallel. OpenAI always introduces a new model w/ xhigh mode when they are losing the benchmark war.

Kimi-K2.5 introduced a swarm feature as preview. In theory, they could eventually use this to launch 100 swarm agents and crush OpenAI / Anthropic on these benchmarks.
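
As a rough illustration of the "run the query in parallel multiple times" idea, here is a minimal sketch of one common form of test-time scaling: sample several answers concurrently and keep the most frequent one (majority voting). This is an assumption about the general technique, not how any vendor's "heavy" or xhigh mode actually works, and `sample_answer` is a hypothetical stand-in for a real model call.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def majority_vote(
    sample_answer: Callable[[str], str], query: str, n: int = 8
) -> str:
    """Sample n answers to the same query in parallel, return the most common.

    sample_answer is a hypothetical callable wrapping one model call;
    it is assumed to be non-deterministic (e.g. temperature > 0).
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(sample_answer, [query] * n))
    # Ties are broken in favor of the answer that appears first.
    return Counter(answers).most_common(1)[0][0]


# Stand-in sampler for demonstration: replays a fixed list of answers.
_fake_answers = iter(["42", "41", "42", "42", "7"])
print(majority_vote(lambda q: next(_fake_answers), "What is 6 * 7?", n=5))
```

The cost grows linearly with `n`, which is why these modes tend to be the most expensive tier.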
