DINOv3
Meta AI's 7B-parameter self-supervised Vision Transformer (ViT) serves as a universal, frozen backbone, delivering state-of-the-art performance across 60+ computer vision tasks without task-specific fine-tuning.
DINOv3 (Distillation with No Labels v3) is Meta AI's third-generation self-supervised vision foundation model. It scales the Vision Transformer architecture to 7 billion parameters, trained on 1.7 billion unlabeled images. The core innovation is DINOv3's role as a universal "frozen backbone": it produces rich, high-quality features that can be reused for diverse downstream tasks, such as object detection (66.1 mAP on COCO) and semantic segmentation, simply by attaching a lightweight adapter, with no retraining of the core model. This efficiency, combined with technical advances such as Gram Anchoring (which stabilizes dense features during long training runs), makes DINOv3 a robust, general-purpose vision encoder for commercial and research applications.
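The frozen-backbone pattern described above can be sketched in a few lines. This is a minimal, pure-Python illustration, not DINOv3's actual API: `frozen_backbone` is a stand-in for the 7B-parameter encoder (whose weights never change), and only a lightweight logistic-regression "adapter" head is trained on the extracted features.

```python
import math

def frozen_backbone(image):
    # Stand-in for the frozen encoder: maps an "image" (here just a list
    # of floats) to a fixed feature vector. Nothing here is ever updated
    # during training, mirroring how DINOv3 stays frozen downstream.
    return [math.tanh(sum(image) + i) for i in range(4)]

def train_head(data, lr=0.5, epochs=200):
    # Lightweight adapter: a logistic-regression head on frozen features.
    w, b = [0.0] * 4, 0.0
    feats = [(frozen_backbone(x), y) for x, y in data]  # extract once
    for _ in range(epochs):
        for f, y in feats:
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, image):
    # Inference reuses the same frozen features; only (w, b) are learned.
    f = frozen_backbone(image)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
```

Because feature extraction happens once and only the tiny head is optimized, adapting to a new task is cheap; this is the efficiency argument for a universal frozen backbone.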