Tongyi Qianwen (Qwen) has open-sourced Qwen2.5-VL, its new visual understanding model and the flagship vision-language model of the Qwen model family. It is available in three sizes: 3B, 7B, and 72B.
The main features of Qwen2.5-VL:
◆Visual understanding: Qwen2.5-VL is good not only at recognizing common objects such as flowers, birds, fish, and insects, but also at analyzing text, charts, icons, graphics, and layouts within images.
◆Agent: Qwen2.5-VL can act directly as a visual agent that reasons and dynamically directs tools, and it shows an initial ability to operate computers and mobile phones.
◆Long-video understanding and event capture: Qwen2.5-VL can understand videos more than an hour long and, new in this release, can capture events by accurately locating the relevant video segments.
◆Visual localization: Qwen2.5-VL can accurately locate objects in an image by generating bounding boxes or points, and can return their coordinates and attributes as stable JSON output.
◆Structured output: For data such as invoices, forms, and tables, Qwen2.5-VL can output the contents in structured form, which benefits applications in finance, commerce, and similar fields.
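As an illustration of how an application might consume the JSON localization output described above, here is a minimal Python sketch. It assumes the model replies with a JSON list of objects, each carrying a `bbox_2d` key ([x1, y1, x2, y2] in pixels) and a `label` key, possibly wrapped in a markdown code fence. These key names follow examples in Qwen's public cookbooks, but treat the exact schema as an assumption and check it against the official documentation; `parse_grounding_output` and `Detection` are hypothetical names used only for this sketch.

```python
import json
import re
from dataclasses import dataclass
from typing import List, Tuple

# Matches an optional markdown fence (three backticks, optional "json" tag)
# around the payload; written with `{3} so no literal fence appears here.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


@dataclass
class Detection:
    label: str
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def parse_grounding_output(text: str) -> List[Detection]:
    """Parse a grounding reply into Detection objects.

    Assumed (hypothetical) schema: a JSON list of objects with
    "bbox_2d": [x1, y1, x2, y2] and "label", optionally fenced.
    """
    match = FENCE_RE.search(text)
    payload = match.group(1) if match else text
    detections = []
    for item in json.loads(payload):
        x1, y1, x2, y2 = (int(v) for v in item["bbox_2d"])
        # Skip degenerate boxes whose corners are not properly ordered.
        if x2 <= x1 or y2 <= y1:
            continue
        detections.append(Detection(label=item.get("label", ""), box=(x1, y1, x2, y2)))
    return detections
```

Dropping malformed boxes instead of raising keeps downstream drawing or cropping code simple; a stricter application might log or reject them instead.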