TinyLlama
A compact 1.1B-parameter Llama model pre-trained on 3 trillion tokens for high-performance edge computing.
TinyLlama packs the Llama 2 architecture and tokenizer into a lean 1.1 billion parameters. Developed at the Singapore University of Technology and Design, it was pre-trained on 3 trillion tokens using 16 NVIDIA A100-40G GPUs over roughly 90 days. Its small footprint lets it run on devices with limited VRAM (under 3 GB in half precision) while sustaining a throughput of around 50 tokens per second on consumer hardware. This makes it a go-to choice for developers who need local, low-latency text generation without the overhead of massive server-grade clusters.
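Because TinyLlama reuses the Llama 2 architecture and tokenizer, it slots into standard Llama tooling. Below is a minimal sketch of running it locally with Hugging Face transformers; the chat checkpoint name, prompt, and generation settings are illustrative assumptions, not details from the text above.

```python
# Minimal sketch: run TinyLlama locally with Hugging Face transformers.
# Assumes the chat checkpoint "TinyLlama/TinyLlama-1.1B-Chat-v1.0";
# swap in whichever TinyLlama variant you actually use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# float16 keeps the weights to roughly 2 GB, small enough for low-VRAM GPUs;
# fall back to CPU if no GPU is available (slower, but still runs).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to(device)

# Build a chat-style prompt with the tokenizer's built-in chat template.
messages = [
    {"role": "user", "content": "Summarize why small language models matter for edge devices."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

with torch.no_grad():
    output = model.generate(
        input_ids, max_new_tokens=128, do_sample=True, temperature=0.7
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

On typical consumer hardware this loop runs entirely locally, with latency dominated by the per-token decode speed of the device rather than any network round trip.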