Artificial Intelligence thread

tphuang

General
Staff member
Super Moderator
VIP Professional
Registered Member

So I just tested GLM-5.0 and compared it to Kimi-2.5 and Gemini-3 free. GLM had the best answer on the Python coding request I gave it. It's actually shockingly good.

And according to Artificial Analysis, it's already rated better than Gemini-3 Pro and on the same level as Claude Opus 4.5.
 

mossen

Senior Member
Registered Member
The low hallucination rate of GLM 5 is indeed the most impressive aspect of the model. I hope more people will focus on hallucination rates instead of just benchmaxxing. Some top Western models are really poor at that (e.g. Gemini).

It's also nice to see their model take a huge jump in size. I've long advocated leading labs should be pushing models into the trillions. Scale remains a key improvement metric if RL is done properly. It's unavoidable.

I've also tried the newest DeepSeek model. It's very good. The speed difference is shocking. It's just way faster now.

Of the Western-facing Chinese frontier labs, DeepSeek and Moonshot remain great, but GLM is now on par. Qwen has slipped further behind. I hope their rumored Qwen 3.5 release will improve things. I also hope ByteDance will start to focus on Western audiences in 2026. So far, they've mostly concentrated on the domestic market.
 

bsdnf

Senior Member
Registered Member
GLM Coding Pro users can already use GLM-5, while Lite users will have to wait a while.

What's particularly intriguing is the official announcement's mention of "strong support from domestic chip partners" for achieving the computing power expansion. I don't know if this is just political correctness or if they genuinely use domestically produced chips for inference.

If this is true, then at least the approach of training SOTA models on NVIDIA hardware and serving inference on domestically produced chips to cope with traffic surges has once again proven feasible.

I hope Kimi can learn from this lesson. Users are literally putting money in your pocket and you can't even take it. How is that acceptable?
 

tphuang

General
Staff member
Super Moderator
VIP Professional
Registered Member

Huawei/Ascend here documents its support for GLM-5. Quite interesting stuff actually. Flash attention supposedly makes inference run a lot faster (although I never personally got it to work before). vLLM is a lot faster than plain PyTorch, and I think SGLang is the one all the Chinese AI shops use. Being able to shrink it down to 400-500GB using the W4A8 format (4-bit weights, 8-bit activations) is a big deal since much smaller boxes can run it now.
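
For anyone who wants the rough arithmetic behind why 4-bit weights shrink the box so much, here is a minimal sketch. The parameter counts are placeholders I picked for illustration, not official GLM-5 figures, and the function only counts raw weight storage. Real deployments also need room for the KV cache, activations, and quantization scales, so actual numbers will be somewhat higher.

```python
def weight_footprint_gb(num_params: float, weight_bits: int) -> float:
    """Approximate raw weight storage in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and per-group quantization scales.
    """
    return num_params * weight_bits / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical parameter counts, NOT official GLM-5 numbers.
    for params in (750e9, 1000e9):
        for bits in (16, 8, 4):
            gb = weight_footprint_gb(params, bits)
            print(f"{params / 1e9:.0f}B params at {bits}-bit weights: ~{gb:.0f} GB")
```

Going from 16-bit to 4-bit weights cuts raw weight storage by 4x, which is the kind of reduction that decides whether a model fits on one node or needs several.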
 

meedicx

Junior Member
Registered Member
And then there's Ant. Not many people actually use their models; 1T is just too big, but at least they're honest about their benchmark scores.

Kimi, GLM and MiniMax should all include a test-time scaling "heavy thinking" mode to benchmaxx.

The top benchmarks you see for ChatGPT and Opus are usually from their most expensive mode, where they use test-time scaling (TTS) to run the query multiple times in parallel. OpenAI always introduces a new model with an xhigh mode when they are losing the benchmark war.
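
To make the "run the query in parallel multiple times" idea concrete, here is a minimal sketch of one common form of it: sample N answers and take a majority vote. I don't know what the labs actually run internally, so treat this as an illustration only. `noisy_model` is a made-up stand-in for a real model call, and its 60% accuracy is an arbitrary assumption.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def majority_vote(sample_fn: Callable[[str], str], query: str, n: int = 8) -> str:
    """Call sample_fn on the same query n times in parallel; return the most common answer."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(sample_fn, [query] * n))
    return Counter(answers).most_common(1)[0][0]


def noisy_model(query: str) -> str:
    """Hypothetical stand-in for a model: returns "42" about 60% of the time."""
    return "42" if random.random() < 0.6 else random.choice(["41", "43"])


if __name__ == "__main__":
    print("single sample:", noisy_model("what is 6 * 7?"))
    print("vote over 32 :", majority_vote(noisy_model, "what is 6 * 7?", n=32))
```

The vote is right far more often than a single sample, but it costs N times the compute, which is why these modes are the most expensive tier.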

Kimi-K2.5 introduced a swarm feature as a preview. In theory, they could eventually use this to launch 100 swarm agents and crush OpenAI / Anthropic on these benchmarks.
 