InternVL

InternVL is an open-source family of multimodal models that scales vision-language integration to 26B parameters and beyond, with the aim of matching proprietary leaders.

InternVL pairs the InternViT-6B vision encoder with large language models such as InternLM2 and Qwen2. The 2.5 release is evaluated on 40+ benchmarks, where it is reported to match GPT-4o on MMMU and to perform strongly on document-centric tasks such as DocVQA. A dynamic resolution strategy lets the model process high-resolution inputs by splitting them into tiles instead of downscaling them, which makes it well suited to complex visual reasoning and long-form video analysis.
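The dynamic resolution strategy mentioned above can be illustrated with a minimal sketch. It assumes the image is resized to a grid of fixed-size square tiles whose aspect ratio is closest to the original, with a cap on the tile count. The 448-pixel tile size, the cap of 12 tiles, the tie-breaking rule, and the helper names `choose_grid` and `tile_boxes` are illustrative assumptions, not the project's actual API.

```python
from itertools import product

TILE = 448       # assumed tile edge length in pixels
MAX_TILES = 12   # assumed cap on tiles per image


def choose_grid(width, height, max_tiles=MAX_TILES):
    """Return (cols, rows) of the tile grid whose aspect ratio is closest
    to the image's. Ties go to the grid with fewer tiles (a simplification
    chosen for this sketch)."""
    target = width / height
    candidates = [
        (c, r)
        for c, r in product(range(1, max_tiles + 1), repeat=2)
        if c * r <= max_tiles
    ]
    return min(
        candidates,
        key=lambda g: (abs(target - g[0] / g[1]), g[0] * g[1]),
    )


def tile_boxes(cols, rows, tile=TILE):
    """Return (left, top, right, bottom) crop boxes, row by row, for an
    image that has been resized to (cols * tile) x (rows * tile)."""
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    cols, rows = choose_grid(896, 448)   # a 2:1 landscape image
    print(cols, rows)                    # grid chosen for this image
    print(tile_boxes(cols, rows))        # one crop box per tile
```

Each box could then be cropped from the resized image and encoded as a separate tile, so wide or tall inputs keep their detail instead of being squashed into a single square.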

https://github.com/OpenGVLab/InternVL
