Let’s forget about US AI models, which are still ahead; Kimi’s main issue for now is Chinese competitors who offer open-source, free models and have no intention of charging a subscription anytime soon. Plus, I don’t think Kimi is or will be the best among the top Chinese models like DeepSeek or Qwen, which are still ahead of Kimi and offer their products without any subscription. Why should consumers adopt Kimi’s models instead of those two? I don’t see any major advantage Kimi has over DeepSeek and Qwen to justify a consumer paying for its models over those two, which are even better and free.

It’s only a matter of time before most Chinese companies surpass U.S. companies in frontier reasoning LLMs (Kimi already surpasses all U.S. companies in non-reasoning LLMs by a large margin), imo. They’re likely still in the experimentation phase, which is why they’re currently offering API credits equivalent to the value of their subscription. I believe their subscription plans will be highly competitive by 2026.
They’re currently offering higher quotas for the Ok Computer agent and Deep Research features with their subscriptions, along with API credits, though Kimi itself remains free to use on the website and app. For the price of a subscription, getting both API credits and higher Ok Computer + Deep Research quotas is not a bad deal. Also, Kimi K2 is currently the best open-source non-reasoning model on the Artificial Analysis leaderboard, since Qwen3-Max is not open source.
Across models of different sizes, SINQ cuts memory usage by 60–70%, depending on architecture and bit-width.
This allows models that previously required more than 60 GB of memory to run on ~20 GB setups, a critical enabler for running large models on a single high-end GPU or even on multi-GPU consumer-grade setups.
This makes it possible to run models that previously needed high-end enterprise GPUs like NVIDIA’s A100 80GB or H100 on significantly more affordable hardware, such as a single NVIDIA GeForce RTX 4090.
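As a quick sanity check on those numbers, here is a back-of-the-envelope sketch. The 34B parameter count and the 5-bit effective width (4-bit weights plus per-group scale metadata) are illustrative assumptions, not SINQ’s exact figures:

```python
def weight_memory_gb(n_params: float, effective_bits: float) -> float:
    """Approximate weight memory in GB at a given effective bit-width."""
    return n_params * effective_bits / 8 / 1e9

n_params = 34e9                          # hypothetical model size, for illustration
fp16 = weight_memory_gb(n_params, 16.0)  # ~68 GB -> A100/H100 territory
q4 = weight_memory_gb(n_params, 5.0)     # ~21 GB -> fits in 24 GB (e.g. an RTX 4090)

print(f"FP16: {fp16:.0f} GB, ~4-bit: {q4:.0f} GB ({1 - q4 / fp16:.0%} smaller)")
# FP16: 68 GB, ~4-bit: 21 GB (69% smaller) -- within the 60-70% range quoted above
```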
For teams using cloud infrastructure, the savings are similarly tangible. A100-based instances often cost $3–4.50 per hour, while 24 GB GPUs like the RTX 4090 are available on many platforms for $1–1.50 per hour.
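Plugging in the midpoints of those ranges, the gap compounds quickly for an always-on instance (the 730-hours-per-month usage pattern is my own assumption for illustration):

```python
HOURS_PER_MONTH = 730        # roughly 24/7 for one month

a100_rate = 3.75             # midpoint of the $3-4.50/hr A100 range above
rtx4090_rate = 1.25          # midpoint of the $1-1.50/hr 24 GB GPU range

a100_monthly = a100_rate * HOURS_PER_MONTH        # ~$2,740
rtx4090_monthly = rtx4090_rate * HOURS_PER_MONTH  # ~$910

print(f"A100: ${a100_monthly:,.0f}/mo vs RTX 4090: ${rtx4090_monthly:,.0f}/mo, "
      f"a ~{1 - rtx4090_monthly / a100_monthly:.0%} saving")
# A100: $2,738/mo vs RTX 4090: $912/mo, a ~67% saving
```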
The lack of higher-level “meta” thinking is a well-known problem with deep neural network models in general. The underlying architecture is, after all, an approximation function for higher-order principles built from a bunch of primitive attention layers; it is never going to exactly represent something like F = ma.

A note on LLMs that follow a “Kepler-esque” approach: they can successfully predict the next position in a planet’s orbit, but fail to find the underlying explanation, Newton’s law of gravity. Instead, they resort to incorrect fitting rules that let them successfully predict the planet’s next orbital position, but they fail to recover the force vector and generalize to other physics.
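As a toy illustration of this failure mode (my own construction, not taken from any particular paper): fit a linear next-position predictor on one circular orbit, then apply it to an orbit at a different radius. The fit is essentially perfect in-distribution, but it rotates by the wrong angle on the new orbit, because it never learned that angular velocity scales as r^(-3/2):

```python
import numpy as np

G_M = 1.0   # gravitational parameter, in units where GM = 1
DT = 0.1    # time step between observed positions

def orbit(radius: float, n_steps: int) -> np.ndarray:
    """Positions along a circular Keplerian orbit: omega = sqrt(GM / r^3)."""
    omega = np.sqrt(G_M / radius**3)
    t = np.arange(n_steps) * DT
    return np.stack([radius * np.cos(omega * t), radius * np.sin(omega * t)], axis=1)

# "Training": fit x_{t+1} = x_t @ W on one orbit at radius 1.
train = orbit(radius=1.0, n_steps=200)
X, Y = train[:-1], train[1:]
W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # learns the rotation for THIS orbit only

print("train error:", np.abs(X @ W - Y).max())   # ~1e-16: a perfect "Kepler" fit

# "Test": the same rule applied at radius 2 rotates by the wrong angle,
# because the fit captured one orbit's rotation, not the inverse-square law.
test = orbit(radius=2.0, n_steps=200)
print("test error: ", np.abs(test[:-1] @ W - test[1:]).max())   # ~0.13: systematic miss
```

The point is that perfect next-step accuracy on the training orbit coexists with a systematic failure to generalize, which is exactly the Kepler-versus-Newton distinction above.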