The biggest regression for DeepSeek has been an increase in hallucination rates. They were already bad for V3.2, and they somehow got worse. By contrast, Xiaomi's recently released MiMo 2.5 Pro is now near the frontier. Zhipu and Kimi also do well. So among the Chinese start-ups, DeepSeek seems to have the most problems with hallucinations.

GPT 5.5 also does badly compared to Opus or Gemini. Hallucination rate is one of the most overlooked metrics in AI, yet one of the most important. Can you trust the output or not?


