Artificial Intelligence thread

tphuang · May 29, 2025

Eventine said:
Yes, I realize Kling is not an LLM, which will be problematic for Kuaishou as the new breed of multi-modal models will be capable of substantially better detail control and instruction following. Seed is more promising in this respect, and I hope ByteDance can scale up on the LLM side to the level of challenging Veo 3, which is the current favorite to "win" the closed AI video race.

For video and image creation, Kling does the job quite well. There are others that do the job of watching video and getting audio out and such. But if you are just looking at creating ads, short dramas and things like that, Kling serves the Chinese market quite well.

Remember, there are many Chinese models that are simply not being benchmarked by these 3rd party lists.

Eventine said:
Multi-modal models are useful outside of edge devices.

I work in a content development industry. Multi-modal models are extremely disruptive for content generation. A model cross-trained between videos & text is not just useful for video recognition, it's also useful for generating videos. That's typically done on compute clusters.

The advantage over a model pipeline is that the video generator has a deeper & richer understanding of the association between words, objects, and motion, and can follow instructions much, much better than a typical video model that's only been trained on tags & descriptions. Thus, for instance, you can tell a multi-modal model to modify small details of a picture it generated, which is impossible for a tags-only image model like Stable Diffusion, where you'd have to do manual inpainting.

There are also synergies when doing cross-training - e.g. LLMs' visuo-spatial knowledge is improved by reasoning in a latent space that's been trained on images & videos. Even Claude, which has been hyper-optimized for logic & coding, has native vision capabilities because of this.

Anyway, I'm not saying the future is necessarily native multi-modal models, but that it is a weakness in the Chinese AI ecosystem. Yes, ByteDance has Seed and Alibaba has Qwen, but neither is built off of state-of-the-art LLMs. Consequently, when folks in my industry are looking at options for enterprise-quality content production, they're gravitating towards Google's and OpenAI's solutions, even those that formerly preferred Kling.

Reasoning, frankly, is overrated.

In most AI applications, you don't have the time to generate a bunch of reasoning tokens.

Think about actual multi-modal applications like ADAS, autonomous delivery robots, drones, humanoid robots and things like that. America is quite far behind in these areas. None of that can run on compute clusters.

Most people in the AI community in America have zero clue what it's like to put AI in physical objects that do stuff. I guarantee you very few of the talking heads who follow LLMs understand this stuff.

Michael90 · May 30, 2025

Eventine said:
Hence the need for more strong Chinese competitors in this space, as well as replacing Baidu with a more competitive search engine.

I don't think there's any other viable competitor from China in search. Baidu was China's best chance, but they messed up.

tphuang · May 30, 2025

https://twitter.com/i/web/status/1928565016816464377

CNPC's Kunlun large models unveiled. Ran purely on a domestic compute stack, basically Huawei/Ascend.

9dashline · Jun 1, 2025


tanino · Jun 2, 2025

Hello everyone. I read the (excellent) analysis here that clearly explained the advantage of Mandarin language semantics over languages without idioms (e.g., Latin languages) in AI. I would like to create some infographics and publish them here for the entire forum to use and enjoy. However, I would need a summary outline. Could you help me? Thank you all.

StraightEdge · Jun 2, 2025

Beijing's policy push is amplifying demand. The "High-Quality AI Compute Infrastructure Plan" sets a 2025 target of 300 EFLOPS, with at least 35% allocated to intelligent computing. Cities like Shanghai are also mandating that new AI centers use over 50% domestic chips.

China's AI infrastructure buildout is accelerating. Yicai reports compute server shipments surged 97.3% in 2024 and will grow another 52.9% in 2025. As of May 26, 2025, 123 AI center contracts had been awarded — 2.2 times more than the same period last year, with full-year tenders expected to hit 213, up from 53.
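A quick sanity check of the figures above, as a minimal sketch: it only recombines the numbers quoted in this post and adds no new data. Reading "2.2 times more than" as "2.2 times the prior-year figure" is an assumption, and the variable names are purely illustrative.

```python
# Figures as quoted in the post (attributed there to Yicai).
growth_2024 = 0.973   # compute server shipment growth in 2024
growth_2025 = 0.529   # projected shipment growth in 2025

# Compounded shipment multiple for 2025 relative to the 2023 baseline.
two_year_multiple = (1 + growth_2024) * (1 + growth_2025)

contracts_ytd_2025 = 123   # AI center contracts awarded as of May 26, 2025
ytd_ratio = 2.2            # assumption: 2025 count = 2.2 x prior-year count
implied_contracts_ytd_2024 = contracts_ytd_2025 / ytd_ratio

tenders_2025 = 213         # expected full-year tenders
tenders_2024 = 53          # prior full-year figure quoted in the post
full_year_ratio = tenders_2025 / tenders_2024

print(f"2025 shipments vs 2023: {two_year_multiple:.2f}x")
print(f"Implied contracts in the same period last year: {implied_contracts_ytd_2024:.0f}")
print(f"Full-year tenders, 2025 vs prior year: {full_year_ratio:.2f}x")
```

If those growth rates hold, 2025 shipments would be roughly three times the 2023 level, and the full-year tender forecast implies about a fourfold increase, which is a steeper pace than the 2.2x seen so far this year.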

Domestic chips could exceed 40% market share by mid-year, up from about 30% in 2023, a surge that would have been unimaginable just two years ago.

https://twitter.com/i/web/status/1929322084154634290

tokenanalyst · Jun 2, 2025


tokenanalyst · Jun 3, 2025


huemens · Jun 5, 2025

Huawei claims better AI training method than DeepSeek using own Ascend chips


Researchers working on Huawei Technologies’ large language model (LLM) Pangu claimed they have improved on DeepSeek’s original approach to training artificial intelligence (AI) by leveraging the US-sanctioned company’s proprietary hardware.

A paper – published last week by Huawei’s Pangu team, which comprises 22 core contributors and 56 additional researchers – introduced the concept of Mixture of Grouped Experts (MoGE). It is an upgraded version of the Mixture of Experts (MoE) technique that has been instrumental in DeepSeek’s cost-effective AI models.

Researchers at Huawei tested the new architecture on its Ascend neural processing unit (NPU) designed to accelerate AI tasks, and found that MoGE “leads to better expert load balancing and more efficient execution for both model training and inference”.
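The article does not spell out how grouping improves load balancing, so the following is only a toy sketch of one plausible reading, not Huawei's implementation. It assumes experts are split into equal-size groups and that every token activates the same number of experts in each group, so each group (for example, one device) receives identical load by construction. All names and sizes are hypothetical, and real routers also weight the selected experts, which is omitted here.

```python
import random


def grouped_top_k(scores, num_groups, k_per_group):
    """Pick the k_per_group highest-scoring experts inside each group.

    scores holds one router score per expert for a single token.
    Experts are split into num_groups contiguous, equal-size groups.
    """
    num_experts = len(scores)
    if num_experts % num_groups != 0:
        raise ValueError("experts must divide evenly into groups")
    group_size = num_experts // num_groups
    chosen = []
    for g in range(num_groups):
        members = range(g * group_size, (g + 1) * group_size)
        best = sorted(members, key=lambda e: scores[e], reverse=True)
        chosen.extend(best[:k_per_group])
    return chosen


def global_top_k(scores, k):
    """Plain MoE-style routing: the k best experts regardless of group."""
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]


def group_load(assignments, num_experts, num_groups):
    """Count how many expert activations land in each group."""
    group_size = num_experts // num_groups
    load = [0] * num_groups
    for experts in assignments:
        for e in experts:
            load[e // group_size] += 1
    return load


# Hypothetical sizes chosen only for illustration.
NUM_EXPERTS, NUM_GROUPS, K_PER_GROUP, NUM_TOKENS = 16, 4, 2, 1000
rng = random.Random(0)
tokens = [[rng.gauss(0, 1) for _ in range(NUM_EXPERTS)] for _ in range(NUM_TOKENS)]

grouped = [grouped_top_k(s, NUM_GROUPS, K_PER_GROUP) for s in tokens]
plain = [global_top_k(s, NUM_GROUPS * K_PER_GROUP) for s in tokens]

grouped_load = group_load(grouped, NUM_EXPERTS, NUM_GROUPS)
plain_load = group_load(plain, NUM_EXPERTS, NUM_GROUPS)

print("per-group load, grouped routing:", grouped_load)
print("per-group load, global top-k:  ", plain_load)
```

With grouped routing every group handles exactly NUM_TOKENS * K_PER_GROUP activations, while global top-k leaves the per-group load to chance. That is the kind of imbalance the quoted "better expert load balancing" claim would address under this reading.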

Compared to models like DeepSeek-V3, Alibaba Group Holding’s Qwen2.5-72B and Meta Platforms’ Llama-405B, Pangu achieved state-of-the-art performance on most general English benchmarks and all Chinese benchmarks, and showed higher efficiency in long-context training, according to the paper.

Pangu Ultra, an LLM with 135 billion parameters that is optimised for NPUs, highlights the effectiveness of Huawei’s architectural and systemic optimisations while showcasing the capabilities of its NPUs.

According to Huawei, the training process includes three main stages: pre-training, long context extension and post-training. This involves pre-training on 13.2 trillion tokens and long context extension using 8,192 Ascend chips.

Researchers said the model and system would soon be available to Huawei’s commercial customers.

Eventine · Jun 5, 2025

Huawei is probably the closest thing to a "Chinese Google" in terms of general competence and the marriage of hardware, software, and data center expertise in the same company. Long-term, they may be China's best bet in the frontier models space, or at least its greatest enabling factor, since we are moving into a world where co-evolution of software and hardware appears key to the realization of superior results in AI. See the Google TPU advantage, which has been transformed into a price advantage for its AI products, allowing Google to offer Gemini 2.5 Pro at a much lower cost than OpenAI.
