Technology and runtime for on-device FunctionGemma 270M inference
FunctionGemma 270M packs on-device function calling into a sub-300MB model via the LiteRT-LM runtime, clocking 126 tokens per second on mobile.
FunctionGemma 270M is a specialized variant of Google's Gemma 3 architecture optimized for low-latency function calling on edge hardware. Using the LiteRT-LM runtime, built on LiteRT (formerly TensorFlow Lite), developers can deploy 8-bit quantized models that execute entirely offline: the model ships as a 271MB package, runs in roughly 550MB of RAM, and handles a 32k context window, making it ideal for 'Mobile Actions' such as managing calendars or controlling device hardware without cloud round-trips. It clocks roughly 126 tokens per second on modern mobile chipsets (such as the Samsung Galaxy S25 Ultra) and supports cross-platform deployment across Android, iOS, and Web via the MediaPipe LLM Inference API.
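To make the 'Mobile Actions' flow concrete, here is a minimal, runtime-agnostic sketch of the app-side half of function calling: the model emits a structured call, and the app maps it to a local handler. The action names, the JSON shape, and the `dispatch` helper are all illustrative assumptions, not FunctionGemma's actual wire format or the MediaPipe API.

```python
import json

# Hypothetical registry of on-device actions the model is allowed to call.
# In a real app these would invoke platform APIs (alarms, flashlight, calendar).
ACTIONS = {
    "set_timer": lambda minutes: f"Timer set for {minutes} min",
    "toggle_flashlight": lambda on: f"Flashlight {'on' if on else 'off'}",
}

def dispatch(model_output: str) -> str:
    """Parse a function-call message emitted by the model and run it locally.

    Assumes the model emits JSON like:
      {"name": "set_timer", "args": {"minutes": 5}}
    (an illustrative format, not FunctionGemma's actual output schema).
    """
    call = json.loads(model_output)
    handler = ACTIONS[call["name"]]   # KeyError here means an unknown tool
    return handler(**call["args"])

# Example: the model asked to set a 5-minute timer.
print(dispatch('{"name": "set_timer", "args": {"minutes": 5}}'))
```

Because the model and the dispatcher both live on-device, the entire request-to-action loop completes without a network call, which is what makes the sub-second latencies quoted above usable for interactive actions.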