TinyLlama
A compact 1.1B-parameter Llama model pre-trained on 3 trillion tokens for high-performance edge computing.
TinyLlama packs the Llama 2 architecture and tokenizer into a lean 1.1 billion parameters. Developed at the Singapore University of Technology and Design, it was pre-trained on 3 trillion tokens using 16 NVIDIA A100-40G GPUs over roughly 90 days. Its small footprint lets it run on devices with limited VRAM (under 3 GB in half precision) while sustaining a throughput of around 50 tokens per second on consumer hardware. This makes it a go-to choice for developers who need local, low-latency text generation without the overhead of massive server-grade clusters.
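Because TinyLlama reuses the Llama 2 architecture and tokenizer, it slots into standard Llama tooling. Below is a minimal sketch of running it locally with Hugging Face transformers; the chat checkpoint name, prompt, and generation settings are illustrative assumptions, not details from the text above.

```python
# Minimal sketch: run TinyLlama locally with Hugging Face transformers.
# Assumes the chat checkpoint "TinyLlama/TinyLlama-1.1B-Chat-v1.0";
# swap in whichever TinyLlama variant you actually use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# float16 keeps the weights to roughly 2 GB, small enough for low-VRAM GPUs;
# fall back to CPU if no GPU is available (slower, but still runs).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to(device)

# Build a chat-style prompt with the tokenizer's built-in chat template.
messages = [
    {"role": "user", "content": "Summarize why small language models matter for edge devices."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

with torch.no_grad():
    output = model.generate(
        input_ids, max_new_tokens=128, do_sample=True, temperature=0.7
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

On typical consumer hardware this loop runs entirely locally, with latency dominated by the per-token decode speed of the device rather than any network round trip.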