Artificial Intelligence thread

bsdnf · May 5, 2026

From what I've observed, China's LLM industry is short on computing power, but not to that extent, and this isn't something money alone can solve.

As DeepSeek pointed out, what they truly lack is data. LLM companies repeatedly claim that internet data has been exhausted and that they need to rely on synthetic data. However, a significant portion of new data does not simply disappear; it flows into their closed databases through chatbots and agents. Leading companies in the US first provide services, acquire data, then train more powerful models, gain more users and data, and simultaneously use these models to build more powerful development pipelines, creating a self-iterating flywheel.

This is likely one of the reasons DeepSeek initiated its major price cut. On the one hand, its extremely powerful KV-caching mechanism allows it to do so: while Anthropic only provides a 5-minute KV cache, DeepSeek can retain it for several days without charging any additional fees. On the other hand, they hope to absorb enough data to re-accelerate their flywheel. Yes, 99% of the data is useless, but given enough data, some will eventually become useful. No company dares to claim that it can rely solely on synthetic data and human experts.

bsdnf · May 5, 2026

To be honest, I've been a bit spoiled by DeepSeek.

Every time I switch to Kimi, I get annoyed by how quickly its context approaches the automatic compression limit in Claude Code. Z.ai and Kimi really need to step up their efforts to replicate the V4 architecture; it would be a lifesaver for their coding plans.

tphuang · May 5, 2026

bsdnf said:
From what I've observed, China's LLM industry is short on computing power, but not to that extent, and this isn't something money alone can solve.

As DeepSeek pointed out, what they truly lack is data. LLM companies repeatedly claim that internet data has been exhausted and that they need to rely on synthetic data. However, a significant portion of new data does not simply disappear; it flows into their closed databases through chatbots and agents. Leading companies in the US first provide services, acquire data, then train more powerful models, gain more users and data, and simultaneously use these models to build more powerful development pipelines, creating a self-iterating flywheel.

This is likely one of the reasons DeepSeek initiated its major price cut. On the one hand, its extremely powerful KV-caching mechanism allows it to do so: while Anthropic only provides a 5-minute KV cache, DeepSeek can retain it for several days without charging any additional fees. On the other hand, they hope to absorb enough data to re-accelerate their flywheel. Yes, 99% of the data is useless, but given enough data, some will eventually become useful. No company dares to claim that it can rely solely on synthetic data and human experts.

keep in mind that ByteDance generates as much data as OpenAI and Google. DeepSeek is probably 1/2 to 1/3 of that. Even Z.ai is apparently doing 5.5T tokens/day.

but yes, DeepSeek V4 is quite good and super fast. I do find Kimi 2.6 to still be better in coding, at least one-shooting. Although I often get into "server busy" with Kimi and that has never happened with V4, which is always super fast.

GulfLander · May 5, 2026

new AI company focused on Chinese equities?

https://twitter.com/i/web/status/2051737895875035414

bsdnf · May 6, 2026

Yep, acquiring data is one thing, but efficiently cleaning data is another challenge.

https://twitter.com/i/web/status/2051873677998956851

BlackWindMnt · May 6, 2026

jli88 said:
Anthropic is growing more than 10x every year in annualized revenue. Most developers I know use it extensively and are impressed with it, so the growth can continue for quite a while.

This is like SpaceX all over again, Chinese analysts were in denial until very recently about the feasibility of the tech.

China needs to improve its innovation ecosystem where high risk bets with large capital can be placed.

It's also extremely astroturfed on social media; every damn tech influencer went all in on "AI model this, AI model that". It's like tech's own Labubu and Dubai chocolate psyops campaign.

Then you have big tech paying influencers up to a million dollars to promote AI usage. Let's see how useful AI becomes when OpenAI and Anthropic IPO and the bubble somewhat deflates (I don't think it will pop), and VC money moves out of AI and stops subsidizing token usage (you already see this happening).

shiftenter · May 6, 2026

DeepSeek nears $45bn valuation as China’s ‘Big Fund’ leads investment talks
Value soars in ongoing fundraising discussions as investors including Tencent seek slice of AI lab

Backing from China’s most strategic government fund in semiconductors would reinforce DeepSeek’s leading position in the country

China’s biggest state-backed semiconductor investment vehicle is in talks to lead the financing of DeepSeek’s first fundraising that could value the AI group at about $45bn.

The China Integrated Circuit Industry Investment Fund, typically referred to as the “Big Fund”, is seeking to lead the investment into DeepSeek, according to four people with knowledge of the discussions.

Other investors still in talks for a stake include Chinese tech giant Tencent, although the final line-up has not yet been finalised.

DeepSeek shot to prominence in January 2025 following the release of R1, a powerful open-source large language model, which it said was trained on a fraction of the computing power of models developed by American rivals such as OpenAI.

The valuation of DeepSeek has increased significantly from $20bn when it started the fundraising talks only weeks ago, as investors strive to bet on the lab’s potential despite its lack of focus on commercialisation.

Liang Wenfeng, the billionaire founder of the Hangzhou-based start-up, could also invest personally in this round, two of the people said. He controls 89.5 per cent of DeepSeek through personal holdings and affiliated groups, according to company filings.

Backing from China’s most strategic government fund in semiconductors would reinforce DeepSeek’s position as a leader in the country’s frontier AI model development, as well as promote a Chinese ecosystem comprising domestic models, software and chips.

China has launched three phases of the state-backed “Big Fund” to aid President Xi Jinping’s self-sufficiency drive in the face of US efforts to restrict the country’s access to technology such as advanced semiconductor production equipment.

The fund assembled $47bn from the finance ministry, local government and state-owned banks in its third round of funding in 2024, with a mandate to invest in semiconductor equipment and materials. It has not publicly backed any of China’s other LLM players.

The Big Fund has bankrolled key companies in China’s semiconductor industry including Semiconductor Manufacturing International Corporation, the country’s largest and most advanced foundry, as well as Yangtze Memory Technologies Corp., China’s leading memory chipmaker.

DeepSeek said in its latest V4 model launch that it had been optimised to run inference — the computation that LLMs use to generate responses — on Huawei’s Ascend 950PR chips.

Huawei’s AI chip sales have surged this year as it overtook Nvidia in China, the FT reported last week. Nvidia is the world’s largest AI chip supplier, but its advanced products are still banned from entering the country.

Still, the overall amount of AI chips that China produces is only a fraction of that from the US and these processors are at least two generations behind.

To catch up, Beijing is counting on its tech companies — from chipmakers to model builders — to work closely together. The aim is to develop an ecosystem that could sustain China’s competitiveness in AI despite US export controls tightening.

Such an ecosystem could pose a danger to US dominance globally, according to Nvidia chief Jensen Huang.

“The day that DeepSeek comes out on Huawei first, that is a horrible outcome for our nation,” he said in a recent interview with podcaster Dwarkesh Patel. It could lead to a scenario where “AI models around the world are developed and they run best on non-American hardware”, he added.

Since it came to global attention last year DeepSeek has focused on training frontier AI models, rather than developing a commercial business selling AI to companies or growing its consumer AI chatbot.

DeepSeek’s coding capability is among the best in China, where peers such as Zhipu and Moonshot expect revenues to keep surging, according to one of the people considering an investment. Hong Kong-listed Zhipu has a market value of $52bn.

Liang initially wanted to raise a nominal sum to assign a value to DeepSeek’s options in a bid to stop his researchers being poached by competitors offering huge packages, the FT reported last month. Stock options typically make up most of an AI researcher’s remuneration.

Now that DeepSeek’s valuation has risen significantly, Liang might consider raising more money to boost a war chest for future investment in computing capacity, the person said.

DeepSeek, the Big Fund and Tencent did not immediately respond to requests for comment.

Additional reporting by Arjun Neil Alim in Hong Kong. Data visualisation by Haohsiang Ko in Hong Kong


bsdnf · May 6, 2026

Seed2.0-lite checkpoint update

tphuang · May 6, 2026

Looks like ByteDance is getting this out in time to start charging people on Doubao. The regular paid plan will probably get this lite version, which isn't too bad.

more on ByteDance capex
230B RMB this yr on GPU and 120B on CPU, IDC and network gear


so Cambricon gets 25-30B, Huawei gets 22-23B and SeedChip 17-19B

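Structurally, overseas procurement is about 120B RMB, of which Nvidia accounts for 65%, AMD 30%, and Broadcom and others 5%; the domestic procurement budget is about 110B RMB, with Cambricon and Huawei as the core suppliers.

Of this, Cambricon's budget is about 25-30B RMB and Huawei's about 22-23B RMB; in addition, ByteDance's in-house chips (taped out through design-service firms such as VeriSilicon) have a budget of about 17-19B RMB.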
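
For anyone who wants the quoted breakdown in absolute numbers, here is a rough sketch of the arithmetic. It assumes the quoted figures are all in RMB (1200亿 = 120B, 1100亿 = 110B) and that the percentages apply to the overseas total; the variable names are just illustrative.

```python
# All amounts in billions of RMB, taken from the figures quoted in the post.
OVERSEAS_TOTAL = 120  # quoted as "about 1200亿"
DOMESTIC_TOTAL = 110  # quoted as "about 1100亿"

# Overseas vendor shares in percent, as quoted.
overseas_pct = {"Nvidia": 65, "AMD": 30, "Broadcom and others": 5}

# Domestic budgets as (low, high) ranges, as quoted.
domestic_ranges = {
    "Cambricon": (25, 30),
    "Huawei": (22, 23),
    "ByteDance in-house": (17, 19),
}

# Convert overseas shares into RMB amounts.
overseas_rmb = {vendor: OVERSEAS_TOTAL * pct / 100 for vendor, pct in overseas_pct.items()}

# Sum the named domestic lines and see what part of the budget is not attributed.
named_low = sum(low for low, _ in domestic_ranges.values())
named_high = sum(high for _, high in domestic_ranges.values())
unattributed = (DOMESTIC_TOTAL - named_high, DOMESTIC_TOTAL - named_low)

print(overseas_rmb)
print(named_low, named_high)
print(unattributed)
print(OVERSEAS_TOTAL + DOMESTIC_TOTAL)
```

Under these assumptions, Nvidia comes to about 78B RMB, AMD 36B, and Broadcom and others 6B. The three named domestic lines add up to 64-72B, leaving roughly 38-46B of the 110B domestic budget not attributed in the quote. The two totals sum to 230B, matching the GPU figure given earlier in the post.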

expecting at least 50B through 2027 with Cambricon in 590/690 series.
Ascend has a lot of internal demand.
Kunlunxin order with Baidu
ByteDance's internally designed chip looks to have been done with help from VeriSilicon, which gets $300-400 profit per chip.

tphuang · May 7, 2026

https://twitter.com/i/web/status/2052360834311987582

Another huge expansion of ByteDance's data center footprint in Southeast Asia. Here is what seems to be a 1GW data center in Thailand. It already has a few others there. They should have plenty of compute with Nvidia chips from these DCs for both training and inference.
