Artificial Intelligence thread

siegecrossbow

Field Marshall
Staff member
Super Moderator
They should have waited a few more months then. Now US companies will take their algorithmic innovations and apply them to achieve better results, and DeepSeek will again lag behind even if they get better with more training.
What makes you think this is the best algorithmic innovation from DeepSeek or China, for that matter?
 

Michael90

Senior Member
Registered Member
The biggest regression for DeepSeek has been an increase in hallucination rates. They were already bad for V3.2 and they somehow got worse. By contrast, Xiaomi's MiMo 2.5 Pro has recently been released and they are now near the frontier. Zhipu and Kimi also do well. So among the Chinese start-ups, DeepSeek seem to have the most problems with hallucinations.

View attachment 173946

GPT 5.5 also does badly compared to Opus or Gemini. Hallucination rate is one of the most overlooked metrics in AI, yet one of the most important. Can you trust the output or not?
I’m impressed by Xiaomi. On their first try they outperform many companies that focus only on that sector. They did the same with EVs, outcompeting most Chinese EV makers who have been in the car business for years. Impressive company overall.
 

Michael90

Senior Member
Registered Member
My latest on DeepSeek V4. My sense is that they were really resource constrained here and that things won’t change until Atlas 950 goes into service. The base model was probably all trained on Nvidia, and the further pre-training of the lite and pro models will be done on CANN in the future.

It seems like a very undertrained and under-thinking model right now. Kimi runs so much slower and delivers better results.

If so, maybe it would have been better to wait a few more months and release a more advanced, ready-made version with Atlas 950?
 

dingyibvs

Senior Member
AI can mimic human appearances and voices, and replicate the artistic styles of painters, composers, and writers—practices that were once considered unacceptable in the world of original art, and one of the main reasons AI has drawn criticism in the past. While current technological advancements have exacerbated inequality for some, we must look to the future; the train of progress will not stop to show mercy to those still living in the old world.

I'm sure painters didn't view photographers very kindly in their early days either.
 

playmaker1478

New Member
Registered Member
The AI industry is moving so fast that there is no scope to "save" your breakthroughs. It's a use-it-or-someone-else-will type of situation.
I think the implication of holding back launches to save algorithmic innovation is answered in your comment. If DS doesn’t launch its model, then someone will eventually use their innovation, like mHC, to innovate upon. There is no use in saving your breakthrough for a better launch; it is best to publish the result and work on the next frontier.

The DS founder did claim to be pushing for self-sufficiency of China’s own AI stack, and in doing so, they trained on Huawei hardware to understand all the quirks of training on an independent hardware stack. In all honesty, DS V4 is less about algorithmic innovation and more about the hardware front. They did just that and are probably in the best position to pull off training on their own hardware stack, similar to Google (not sure about design, but they were originally a quant firm, so they can't be that bad at designing hardware accelerators).

Also, this has massive implications for other Chinese AI teams if the political situation calls for a migration to Huawei infrastructure. In that case, Huawei will have this wealth of experience working with a frontier lab on how to provide, roll out, and optimize infrastructure for an AI launch. This can help if other teams decide to also move away or be less reliant on Nvidia's hardware in the future.
 

Michael90

Senior Member
Registered Member
The stranglehold Nvidia has over AI training and the stack is crazy. It's so hard to break without falling out of the race altogether, so it’s a tough one for China. It requires careful policies that don't cut China off from Nvidia's stack entirely, even despite the sanctions, although under normal circumstances China should not even be using Nvidia because of the security risks.
 

tphuang

General
Staff member
Super Moderator
VIP Professional
Registered Member
They should have waited a few more months then. Now US companies will take their algorithmic innovations and apply them to achieve better results, and DeepSeek will again lag behind even if they get better with more training.
There are always things that US labs can learn from Chinese AI labs, but they tend to train larger models. There are definitely things that Chinese AI labs can learn from each other's papers. DeepSeek, Kimi and GLM are all learning from each other. ByteDance and Tencent also benefit from all the stuff open labs are sharing with the rest of the world.
 

Tomboy

Captain
Registered Member
Ascend engineers held a presentation with technical detail on how they optimized for DeepSeek v4. Lots of detail about inference optimization, but still unclear if pre-training uses Ascend. They will hold additional presentations Apr 27-29 that will go into more detail on training optimizations for DeepSeek v4, which hints that Ascend was used in training


AFAIK, from what is being rumored around on the Chinese internet, V4 still used Nvidia cards for training and only used Ascend cards for inference.
It's the largest open-source model at 1.6 trillion parameters and the first to be trained entirely on non-Western GPUs...

Also, I heard a rumor that Kimi K3 will be near Mythos level later this June.
It's very likely still trained on Western hardware; it's just that inference now uses Chinese cards, which is a big step forward but not as big as what most people are trying to make it out to be.
 