Artificial Intelligence thread

GulfLander

Brigadier
Registered Member
OpenAI’s Pivot to Porn Is Problematic — But Lucrative
When you’re building artificial intelligence to benefit humanity, you might have to compromise.

AI is expensive, so you raise billions of dollars from investors like Microsoft, Nvidia and the United Arab Emirates. As you strive to build super-intelligent computers that will cure cancer, you also need to make money for your backers. So, after pitching your powerful chatbot technology to businesses, which struggle to make it useful, your next option may be monetising your enormous user base of 800 million weekly visitors – with a sex bot.
 

9dashline

Captain
Registered Member
DeepSeek has just unveiled research that could very well redefine the landscape of artificial intelligence, yet they’ve masked this monumental achievement under the unassuming title of "DeepSeek-OCR."
Do not be misled. While the model exhibits state-of-the-art OCR capabilities, its true significance lies not in reading text, but in a radical reimagining of how Large Language Models (LLMs) encode and process information. This is a seismic shift that challenges the fundamental constraints of modern AI architecture.
The Tokenization Barrier Shattered
Historically, the integration of vision into LLMs has been notoriously inefficient. In traditional multimodal systems, visual tokens were treated as bulky, secondary additions to the text-based paradigm. Converting a lengthy document into pixels required vastly more tokens than the symbolic text itself. This inefficiency relegated the visual modality to handling only data that couldn't be expressed in words.
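To make that inefficiency concrete, here is a minimal back-of-the-envelope sketch of the naive patch-per-token approach. The page resolution, patch size, and per-page text-token count are illustrative assumptions, not figures from the DeepSeek-OCR paper.
```python
# Sketch of why naive visual tokenization was considered wasteful.
# All numbers are assumptions for illustration (typical ViT-style setup),
# not figures from the DeepSeek-OCR paper.

PAGE_RESOLUTION = (1024, 1024)  # assumed rendered page size in pixels
PATCH_SIZE = 16                 # assumed ViT patch size

def naive_vision_tokens(width: int, height: int, patch: int = PATCH_SIZE) -> int:
    """One token per non-overlapping patch: the classic ViT recipe."""
    return (width // patch) * (height // patch)

vision_tokens = naive_vision_tokens(*PAGE_RESOLUTION)  # 64 * 64 = 4096
text_tokens_per_page = 600                              # assumed for a dense text page
print(f"~{vision_tokens} naive vision tokens vs ~{text_tokens_per_page} text tokens per page")
```
Under these assumptions a single rendered page costs several times more tokens as pixels than as text, which is the relationship described above.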
DeepSeek has spectacularly inverted this relationship.
They haven't just optimized the process; they have achieved a stunning 10x compression advantage using their novel visual tokens compared to standard text tokens. To put this into perspective: a 10,000-word dissertation that might consume 15,000 text tokens can now be visually encoded and compressed into just 1,500 visual tokens, with near-lossless fidelity.
This is a profound breakthrough. By treating information visually, we can now achieve an information density far beyond what symbolic language representation allows.
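As a back-of-the-envelope check on that arithmetic, here is a minimal sketch; the ~1.5-tokens-per-word figure is an assumed BPE average, and the 10x factor is simply the compression advantage quoted above.
```python
# Back-of-the-envelope check of the quoted compression example.
# Assumptions: ~1.5 text tokens per word (typical BPE average) and the
# ~10x visual-over-text compression factor cited above.

TOKENS_PER_WORD = 1.5
VISUAL_COMPRESSION = 10

def estimate_tokens(word_count: int) -> tuple[int, int]:
    """Return (text_tokens, visual_tokens) for a document of word_count words."""
    text_tokens = round(word_count * TOKENS_PER_WORD)
    visual_tokens = round(text_tokens / VISUAL_COMPRESSION)
    return text_tokens, visual_tokens

text_tok, vis_tok = estimate_tokens(10_000)
print(f"10,000 words: ~{text_tok:,} text tokens -> ~{vis_tok:,} visual tokens")
# 10,000 words: ~15,000 text tokens -> ~1,500 visual tokens
```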
The Dawn of the Mega-Context Era
This compression advantage is the key to unlocking capabilities that were previously impossible. We are no longer talking about incremental increases in context windows; we are facing the immediate potential for effective context windows scaling into the tens of millions of tokens.
The implications for real-world applications are staggering and signal a qualitative leap in AI utility:
1. The Potential Obsolescence of RAG:
Complex Retrieval-Augmented Generation (RAG) systems exist primarily to circumvent limited memory. This new paradigm renders much of RAG obsolete. Instead of forcing the AI to constantly interrupt its reasoning to search external databases, we can now preload an organization's entire knowledge base—every document, every contract, every technical archive—directly into the model's working memory (see the sketch after this list).
2. Holistic Code Understanding:
Developers will be able to ingest entire, massive code repositories into the context and cache them. The AI will maintain a complete, real-time understanding of the entire architecture, allowing for profound debugging, refactoring, and feature generation that understands every dependency simultaneously.
3. Supercharged Research and Synthesis:
Imagine loading every relevant paper on a specific scientific topic published in the last decade—or the entirety of a complex legal case history—into a single prompt, allowing the model to synthesize breakthroughs and identify novel connections instantaneously.
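To make the contrast in item 1 concrete, below is a minimal, self-contained sketch comparing the two prompt-building strategies. The toy word-overlap scorer and the example documents are placeholders for illustration, not any real retrieval library or vendor API.
```python
# Toy contrast between RAG-style retrieval and full-context preloading.
# The relevance scorer and example documents are illustrative placeholders.

def relevance(question: str, chunk: str) -> int:
    """Toy relevance score: number of words shared with the question."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))

def build_rag_prompt(question: str, knowledge_base: list[str], k: int = 2) -> str:
    """RAG: keep only the top-k chunks that look relevant, then ask."""
    top = sorted(knowledge_base, key=lambda c: relevance(question, c), reverse=True)[:k]
    return "\n\n".join(top) + f"\n\nQuestion: {question}"

def build_full_context_prompt(question: str, knowledge_base: list[str]) -> str:
    """Long-context: preload every document; viable only when the (compressed)
    corpus fits inside the model's context window."""
    return "\n\n".join(knowledge_base) + f"\n\nQuestion: {question}"

if __name__ == "__main__":
    kb = [
        "Contract A requires customer data to be retained for five years.",
        "Contract B sets the on-call support response time at four hours.",
        "The 2023 technical archive documents the billing pipeline.",
    ]
    q = "How long must customer data be retained under Contract A?"
    print(len(build_rag_prompt(q, kb)), "characters in the RAG prompt")
    print(len(build_full_context_prompt(q, kb)), "characters in the full-context prompt")
```
The point of the contrast: the retrieval step disappears entirely in the second function, at the cost of a prompt that grows with the whole corpus rather than with the top-k chunks.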
A Cognitive Leap Forward
This approach intuitively makes sense. It mirrors how high-level human expertise often functions. We frequently recall information spatially and visually—the location of a passage in a book or the shape of a diagram.
By expanding the AI's effective memory by an order of magnitude, we are approaching the cognitive fluency seen in legendary thinkers like the physicist Hans Bethe. Bethe was known for internalizing vast quantities of physical data, allowing him to compute and reason seamlessly without consulting references. This visual compression technique promises to gift AI with a similarly supercharged, uninterrupted cognitive capacity.
The Path Ahead
Crucial questions certainly remain. Can an LLM reason with the same precision and articulation over these hyper-compressed visual tokens as it does with symbolic text? What are the exact tradeoffs between compression levels and cognitive fidelity?
It is highly plausible that proprietary models like Google's Gemini already employ similar techniques to achieve their massive context capabilities. But the revolutionary aspect here is DeepSeek’s commitment to transparency. By open-sourcing the methodology and the weights, they have democratized this breakthrough, igniting a firestorm of innovation across the global research community.
This is not merely an academic curiosity. It is a potential inflection point in the history of AI. By redefining the limits of information density, DeepSeek has opened the door to a new generation of systems capable of holistic reasoning over entire domains of human knowledge. The game has fundamentally changed.
 

tamsen_ikard

Captain
Registered Member
I don't like that DeepSeek is giving out all of these advanced research achievements for free. They are then used in American closed AI models, which have far bigger bubble-inflated budgets and access to all the chips, and so those models stay on top in the benchmarks.
 

Sinofan

Just Hatched
Registered Member
Below is a response I got from a friend who runs the AI program at one of the leading universities in Southeast Asia.
A layman like me really can't keep up with the progress of AI...
--------

It’s good of DeepSeek to tell the world their achievements and how they did it.
"It is highly plausible that proprietary models like Google's Gemini already employ similar techniques to achieve their massive context capabilities."
This is indeed the case. If you are interested in the astonishing increase in context window capacity, see the following excerpt:

The Context Revolution: From Scarcity to Abundance
The context window explosion made Claude Code possible:

2022-2025 Context-Poor Era:
• GPT-4: 8K tokens (~12 pages)
• GPT-4-32k: 32K tokens (~50 pages)

2025 and beyond Context Revolution:
• Claude Sonnet 4: 200K tokens (~700 pages)
• Gemini 2.5: 1M tokens (~3,000 pages)
• Grok 4-fast: 2M tokens (~6,000 pages)

At 2M tokens, you can fit an entire year of SEC filings for most companies.

The trajectory is even more dramatic: we’re likely heading toward 10M+ context windows by 2027, with Sam Altman hinting at billions of context tokens on the horizon. This represents a fundamental shift in how AI systems process information. Equally important, attention mechanisms are rapidly improving—LLMs are becoming far better at maintaining coherence and focus across massive context windows without getting “lost” in the noise.
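As a rough sanity check on the page equivalents above, here is a tiny sketch; the ~300-tokens-per-page figure is an assumed rule of thumb (real documents vary widely with formatting), not a number from the quoted source.
```python
# Rough token-to-page conversion for the context windows listed above.
# The 300 tokens/page figure is an assumed rule of thumb, not a quoted value.

TOKENS_PER_PAGE = 300  # assumption: roughly 225 words/page at ~1.3 tokens/word

def pages(tokens: int) -> int:
    return round(tokens / TOKENS_PER_PAGE)

for model, window in [("Claude Sonnet 4", 200_000),
                      ("Gemini 2.5", 1_000_000),
                      ("Grok 4-fast", 2_000_000)]:
    print(f"{model}: {window:,} tokens ≈ {pages(window):,} pages")
# Claude Sonnet 4: 200,000 tokens ≈ 667 pages
# Gemini 2.5: 1,000,000 tokens ≈ 3,333 pages
# Grok 4-fast: 2,000,000 tokens ≈ 6,667 pages
```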
 