Zhipu Upgrades Its MaaS: Best-in-Class 10B-Scale VLM Goes Open Source, Agent "Application Space" Launches
The Zhipu Open Platform Industry Ecosystem Conference was held at the Zhangjiang Science Hall in Pudong, Shanghai. The conference brought together government leaders, top developers, and enterprise users to showcase Zhipu's core achievements in multimodal intelligence and its MaaS ecosystem. At the conference, Pudong Venture Capital Group and Zhangjiang Group jointly announced a strategic investment in Zhipu, giving strong momentum to Zhipu's effort to build trusted artificial intelligence infrastructure.
In his keynote speech, Zhang Peng, CEO of Zhipu, announced two new milestones from Zhipu and its ecosystem partners on the road to AGI: first, the open-source release of the new-generation general visual language model GLM-4.1V-Thinking, which makes reasoning ability its core breakthrough and sets a new performance ceiling for 10B-scale multimodal models; second, the launch of the new MaaS Agent aggregation platform "Application Space", which activates AI capabilities in industry scenarios and, together with Z Fund, introduces a dedicated support program for Agent pioneers.
Also in the keynote session, Wu Weijie, Senior Vice President of Zhipu, Li Chengjie, Vice President and Chief Digital Intelligence Officer of Mengniu Group, and Lv Xufeng, Deputy Director of the China UnionPay FinTech Research Institute, each shared their perspectives on putting large models into production. In the panel session, Hu Xiuhan, founder of NieTA, Wang Zhentong, co-founder of AiPPT, Guaizi, CMO of Flowith, and Shen Ling, General Manager of Zhangjiang Zhihui, exchanged views on large-model-native entrepreneurship.
The model performs particularly well in the following tasks, demonstrating strong versatility and robustness:
General image understanding (Image General): accurately identifies and comprehensively analyzes visual and textual information in images;
Mathematics and science (Math & Science): supports complex problem solving, multi-step deduction, and formula understanding;
Video understanding (Video): performs temporal analysis and event-logic modeling;
GUI and web agent tasks (UI2Code, Agent): understands interface structure and assists with automated operations;
Visual grounding and entity localization (Grounding): precisely aligns language with image regions, improving the controllability of human-computer interaction.
GLM-4.1V-9B-Thinking has now been open-sourced simultaneously on Hugging Face and the ModelScope (MoDa) community. The release contains two models:
GLM-4.1V-9B-Base, a base model intended to help more researchers explore the capabilities of visual language models; and
GLM-4.1V-9B-Thinking, a model with deep thinking and reasoning capabilities, intended for everyday use and hands-on experience.
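For readers who want to try the open-source weights, the sketch below shows one way the Thinking model might be loaded with the Hugging Face transformers library. It is a minimal sketch only: the repository ID, the AutoModelForImageTextToText class, and the chat-template call are assumptions based on the standard transformers interface for vision-language models; consult the official model card for the exact recommended usage.

```python
# Minimal sketch (assumptions noted): loading GLM-4.1V-9B-Thinking via transformers.
# The repo ID and model class below are assumed, not confirmed by this article.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "THUDM/GLM-4.1V-9B-Thinking"  # assumed repository name

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Chat-style multimodal prompt: one image plus a question about it.
# The image URL is a placeholder.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},
            {"type": "text", "text": "Describe the trend shown in this chart."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Since the model emphasizes deep reasoning, the generated output may include an explicit thinking trace before the final answer; how that trace is delimited is defined by the model's chat template rather than by this sketch.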