Technology

Fuyu-8B

Fuyu-8B is an 8-billion-parameter multimodal transformer (text and image input) from Adept AI, built for fast, fine-grained visual reasoning in digital agents.

Fuyu-8B is a decoder-only transformer, openly released by Adept and designed for digital agent workflows rather than only general vision tasks. Its architecture is simplified: there is no dedicated image encoder, and image patches are instead linearly projected directly into the first transformer layer, which lets the model handle arbitrary image resolutions natively. This design underpins its performance on agent-specific tasks such as UI-based reasoning, fine-grained localization (bbox_to_text and text_to_bbox, that is, reading the text inside a bounding box and locating the box for a given text), and visual question answering. Fuyu-8B is also fast: Adept reports responses for large images in under 100 milliseconds, which matters for real-time agent deployment.
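To make the "arbitrary image resolutions" point concrete, here is a minimal sketch of how an image of any size could be laid out as one flat token sequence when there is no separate image encoder: the image is cut into square patches in raster order, and a row-break token follows each row of patches. The 30-pixel patch size, the `<image-newline>` placeholder string, and the helper names `patch_grid` and `image_token_layout` are assumptions made for illustration, not Adept's implementation.

```python
import math

PATCH = 30                   # assumed patch edge in pixels (illustrative value)
NEWLINE = "<image-newline>"  # hypothetical placeholder for the row-break token


def patch_grid(height, width, patch=PATCH):
    """Return (rows, cols) of patches needed to cover an image.

    Partial patches at the right and bottom edges count as full patches,
    as if the image were padded up to a multiple of the patch size.
    """
    if height <= 0 or width <= 0 or patch <= 0:
        raise ValueError("height, width and patch must be positive")
    return math.ceil(height / patch), math.ceil(width / patch)


def image_token_layout(height, width, patch=PATCH):
    """Lay the patches out in raster order with a row-break token per row."""
    rows, cols = patch_grid(height, width, patch)
    sequence = []
    for r in range(rows):
        # One symbolic token per patch in this row, left to right.
        sequence.extend(f"patch[{r},{c}]" for c in range(cols))
        # Mark the end of the row so the layout survives flattening.
        sequence.append(NEWLINE)
    return sequence


if __name__ == "__main__":
    for h, w in [(300, 300), (1080, 1920)]:
        rows, cols = patch_grid(h, w)
        total = len(image_token_layout(h, w))
        print(f"{w}x{h}: {rows} rows x {cols} cols, {total} tokens")
```

Under these assumptions the sequence length grows with the image area (rows × (cols + 1) tokens), so no fixed input resolution is needed; a real model would replace each symbolic patch token with a linear projection of that patch's pixel values.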

https://www.adept.ai/blog/fuyu-8b

1 project · 1 city

Related technologies

Computer Vision (22), OpenAI API (507), OpenAI Function Calling API (1), RAG (137)

Recent Talks & Demos

—headful

Berlin Nov 24

Fuyu-8B, OpenAI API