DINOv3
Meta AI's 7B-parameter self-supervised Vision Transformer (ViT) serves as a universal, frozen backbone, delivering state-of-the-art performance across 60+ computer vision tasks without task-specific fine-tuning.
DINOv3 (Distillation with No Labels v3) is Meta AI's third-generation self-supervised vision foundation model. It scales the Vision Transformer architecture to 7 billion parameters, trained on 1.7 billion unlabeled images. The core innovation is DINOv3's role as a universal "frozen backbone": it produces rich, high-quality features that can be reused for diverse downstream tasks, such as object detection (66.1 mAP on COCO) and semantic segmentation, simply by attaching a lightweight adapter, with no retraining of the core model. This efficiency, combined with technical advances such as Gram Anchoring (which stabilizes dense features during long training runs), makes DINOv3 a robust, general-purpose vision encoder for commercial and research applications.
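The frozen-backbone pattern described above can be sketched in a few lines. This is a minimal, pure-Python illustration, not DINOv3's actual API: `frozen_backbone` is a stand-in for the 7B-parameter encoder (whose weights never change), and only a lightweight logistic-regression "adapter" head is trained on the extracted features.

```python
import math

def frozen_backbone(image):
    # Stand-in for the frozen encoder: maps an "image" (here just a list
    # of floats) to a fixed feature vector. Nothing here is ever updated
    # during training, mirroring how DINOv3 stays frozen downstream.
    return [math.tanh(sum(image) + i) for i in range(4)]

def train_head(data, lr=0.5, epochs=200):
    # Lightweight adapter: a logistic-regression head on frozen features.
    w, b = [0.0] * 4, 0.0
    feats = [(frozen_backbone(x), y) for x, y in data]  # extract once
    for _ in range(epochs):
        for f, y in feats:
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, image):
    # Inference reuses the same frozen features; only (w, b) are learned.
    f = frozen_backbone(image)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
```

Because feature extraction happens once and only the tiny head is optimized, adapting to a new task is cheap; this is the efficiency argument for a universal frozen backbone.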