Artificial Intelligence thread

Eventine

Junior Member
Registered Member
I would argue that what the U.S. is truly exceptional at, and what shows no signs of slowing down, is AI hype. Gemini 2.5 Pro is an incremental technical upgrade over DeepSeek-R1 in many areas, but it's closed-source, leaving us in the dark about its details. It likely doesn't surpass what we'll soon see open-sourced with R2. Can this really be called AI progress when it's a black box, accessible only through Google's heavily censored API, with no accompanying research paper? Google can't achieve true AI progress like DeepSeek because its core business model holds it back.

Meanwhile, high-quality, practical AI advancements are flourishing in China, going far beyond chatbots into manufacturing, healthcare, education, electronics, EVs, and many other fields. AI is being integrated everywhere thanks to DeepSeek's open-source release, which allows anyone to host the model and modify it to their needs.

In my view, meaningful U.S. AI progress has undoubtedly slowed over the past six months (since the reasoning paradigm was proven by o1), while the hype around U.S. AI only grows louder. In contrast, China's meaningful AI progress has accelerated significantly since the release of Qwen-2.5 six months ago, followed by DeepSeek V3/R1 and the imminent R2. Add to that BYD's integration of AI into all its cars, the rapid advancements by countless humanoid robot companies, and the profound economic ripple effect created by R1 across China's entire economy. This is real AI progress. Not Gemini's censored, closed black box, which will become irrelevant the moment R2 arrives.
I think you're underestimating Google.

As I've said before, Google is incredibly competitive for a number of reasons, the biggest of which is that it dominates search.

Having used these AI models in a production setting for a while, the knowledge cutoff is absolutely killer for practical applications. Not being able to pull the latest research, news, numbers, coding practices, social media trends, entertainment content, etc. is a huge detriment to the model's value. I can't ask the model to summarize new research in a certain area, for example, because it doesn't know about it.
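The workaround everyone uses is search grounding: retrieve fresh snippets, prepend them to the prompt, and let the model answer from those instead of its frozen training data. Here's a minimal sketch of that pattern; `web_search` and `llm` are hypothetical stand-ins, not any particular provider's API.

```python
# Minimal sketch of search grounding: retrieved snippets are prepended to
# the prompt so the model can answer about events after its knowledge
# cutoff. Both functions below are hypothetical stand-ins for real calls.

def web_search(query: str) -> list[str]:
    """Hypothetical search call returning text snippets."""
    return [f"[snippet about: {query}]"]

def llm(prompt: str) -> str:
    """Hypothetical model call; here it just reports what it was given."""
    n = prompt.count("[snippet")
    return f"(answer grounded in {n} snippet(s))"

def answer_with_search(question: str) -> str:
    """Retrieve, build a grounded prompt, then query the model."""
    snippets = web_search(question)
    context = "\n".join(snippets)
    prompt = f"Use these sources:\n{context}\n\nQuestion: {question}"
    return llm(prompt)

print(answer_with_search("latest research on reward modeling"))
```

This is exactly why a broken search function is such a dealbreaker: without the retrieval step, the model is stuck at its cutoff no matter how good it is.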

It's the reason DeepSeek's web service is no longer really being used in my workflows, because its search function is broken in the West. It works in China, where DeepSeek continues to top the charts, but in the West it's largely been forgotten by users even without a ban.

Google dominates search across most of the world. This means that as soon as they have a model that is anywhere close to state of the art, they "win" just by default, because people are going to be using the Google service that's built into their search engine, and not jump through the hoops for OpenAI, Anthropic, DeepSeek, etc. Even if Google allows your model to use their search engine, you're going to be paying a premium to them and they'll just profit off of every user you have.

For these reasons, Google's achievements with Gemini 2.5 Pro are huge and are being undersold by the idea that it's just "hype." It's not just hype: Google is well positioned to win the AI adoption race, and should be taken seriously.
 
Last edited:

AI Scholar

New Member
Registered Member
It's the reason DeepSeek's web service is no longer really being used in my workflows, because its search function is broken in the West.
Yeah, I think that’s also what’s limiting their popularity for now. However, Qwen Chat somehow seems to be doing fine with web searches using what appears to be a Chinese search engine, though I don’t know more details about it.

We are finally getting open-source omni models. Qwen Chat already has video/image generation and search, and with this, the platform is steadily becoming more and more complete. They also released a Qwen2.5-Omni technical report, available via the Twitter link.
 

ZeEa5KPul

Colonel
Registered Member
I'm going to eat my words a bit here. I have my own set of "benchmarks" I put LLMs through, and I will say that the new Gemini is the first model where I'm getting the impression that it's more than parroting. It picks up nuances and details in the writing I give it, the chains of thought are the most human-like I've read. I'm more impressed with it than I was with R1, I'm sorry to say. More than anything, it's the speed in a "reasoning" model that puts it above the rest.

So far so good, but it's still early going and other models have held up for a while before breaking down into nonsense. Let's see how long Gemini can hold up for. I expect big things from R2 if Google can put something like this out.

Having said all that, I'm still not convinced that this is the correct road to AGI, but I can definitely see very usable and quite reliable chatbots emerging.

Edit: It seems I spoke too soon; the usual errors have started creeping in. Still, it's "progress" that it held up the illusion this long.
 
Last edited:

Hyper

Junior Member
Registered Member
I'm going to eat my words a bit here. I have my own set of "benchmarks" I put LLMs through, and I will say that the new Gemini is the first model where I'm getting the impression that it's more than parroting. It picks up nuances and details in the writing I give it, the chains of thought are the most human-like I've read. I'm more impressed with it than I was with R1, I'm sorry to say. More than anything, it's the speed in a "reasoning" model that puts it above the rest.

So far so good, but it's still early going and other models have held up for a while before breaking down into nonsense. Let's see how long Gemini can hold up for. I expect big things from R2 if Google can put something like this out.

Having said all that, I'm still not convinced that this is the correct road to AGI, but I can definitely see very usable and quite reliable chatbots emerging.

Edit: It seems I spoke too soon; the usual errors have started creeping in. Still, it's "progress" that it held up the illusion this long.
How do errors creep in? Does the model degrade over time? Why?
 

OptimusLion

Junior Member
Registered Member
A new DeepSeek paper is here! In a study published jointly with Tsinghua researchers, they present a new method for scaling reward models for reasoning at inference time.

Researchers from DeepSeek and Tsinghua found that pointwise generative reward modeling (GRM) improves the model's flexibility across different input types and has the potential to scale at inference time.

To this end, they proposed a learning method called Self-Principled Critique Tuning (SPCT).

Through online RL training, the GRM learns to adaptively generate judgment principles and accurate critiques, yielding the DeepSeek-GRM models.

Their main model, DeepSeek-GRM-27B, is based on Gemma-2-27B trained with SPCT.

SPCT significantly improves the quality and scalability of GRMs, outperforming existing methods and models on multiple comprehensive RM benchmarks.

The researchers also compared the inference-time scalability of DeepSeek-GRM-27B against larger models of up to 671B parameters, and found that scaling at inference time can outperform scaling up model size at training time.

In addition, they introduced a meta reward model (meta RM) to guide the voting process and further improve scalability.
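The sampling-and-voting idea can be sketched in a few lines: draw several critiques from the reward model, optionally weight each by a meta-RM, and average the scores. This is an illustrative sketch in the spirit of the paper, not DeepSeek-GRM's actual code; `generate_critique` and `meta_rm_weight` are hypothetical stand-ins.

```python
import random

def generate_critique(query: str, response: str, seed: int) -> int:
    """Stand-in for one sampled GRM critique that ends in a 1-10 score.
    A real GRM would generate principles + a written critique first."""
    rng = random.Random(seed)
    base = (len(query) + len(response)) % 9 + 1  # toy judgment in 1..9
    return max(1, min(10, base + rng.choice([-1, 0, 1])))

def meta_rm_weight(query: str, response: str, score: int, seed: int) -> float:
    """Stand-in meta-RM: how much to trust this sampled critique.
    Uniform weights reduce the scheme to plain voting."""
    return 1.0

def scaled_reward(query: str, response: str, k: int = 8) -> float:
    """Inference-time scaling: sample k critiques, combine by weighted vote.
    Spending more compute (larger k) gives a more reliable reward signal."""
    total = weight = 0.0
    for seed in range(k):
        score = generate_critique(query, response, seed)
        w = meta_rm_weight(query, response, score, seed)
        total += w * score
        weight += w
    return total / weight

candidates = ["Paris", "Paris, the capital of France since the 10th century"]
ranked = sorted(candidates,
                key=lambda r: scaled_reward("Capital of France?", r),
                reverse=True)
```

The point is that the quality knob is `k`, the number of sampled critiques, rather than the parameter count of the reward model itself.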

Paper address: arxiv.org/abs/2504.02495

 

european_guy

Junior Member
Registered Member
Llama 4 is out.

By their own admission, they took "inspiration" from DeepSeek, for instance adopting a mixture-of-experts (MoE) model instead of their classic "dense" architecture, but they also introduced some novelties of their own.

Their model is also natively multimodal (text, images, video), which may explain the huge pretraining dataset of 30T tokens (about 2× Qwen's and DeepSeek's). They pretrained on 32K GPUs.

The Llama approach has always been simple architecture plus brute force... not a wrong idea when you have unlimited hardware. I hope Huawei and others will soon fill the GPU gap, so Chinese labs can compete on an almost equal footing.
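The appeal of MoE over dense, back-of-the-envelope: a router activates only a few experts per token, so stored capacity grows much faster than per-token compute. The sizes below are illustrative round numbers, not Llama 4's real configuration.

```python
# Back-of-the-envelope comparison of a dense FFN layer vs. an MoE layer
# with top-k routing. All sizes are illustrative, not Llama 4's config.

def ffn_params(d_model: int, d_ff: int) -> int:
    """Parameters in one up-projection + down-projection FFN block."""
    return 2 * d_model * d_ff

d_model = 4096
dense_active = ffn_params(d_model, 16384)  # every token uses all of it

n_experts, top_k, expert_d_ff = 16, 2, 4096      # router picks top_k experts
moe_total = n_experts * ffn_params(d_model, expert_d_ff)  # stored parameters
moe_active = top_k * ffn_params(d_model, expert_d_ff)     # compute per token

print(f"dense : {dense_active:,} params, all active per token")
print(f"MoE   : {moe_total:,} params total, {moe_active:,} active per token")
# This MoE layer stores 4x the dense layer's parameters yet spends only
# half its per-token compute: capacity grows without proportional FLOPs.
```

That tradeoff is why "simple dense + brute force" stops being the obvious choice once hardware is the bottleneck rather than engineering effort.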

Anyhow, kudos to them for open-sourcing the models. They are definitely the most open among US companies: Google open-sources only its tier-3 models, and OpenAI... well, they are a joke as far as "open" goes. At least Anthropic, the most closed of them, doesn't pretend otherwise.
 