Technology and runtime for on-device FunctionGemma 270M inference
FunctionGemma 270M packs on-device function calling into a sub-300MB model via the LiteRT-LM runtime, clocking 126 tokens per second on mobile.
FunctionGemma 270M is a specialized variant of Google's Gemma 3 architecture optimized for low-latency function calling on edge hardware. Using the LiteRT-LM runtime, built on LiteRT (formerly TensorFlow Lite), developers can deploy 8-bit quantized models that execute entirely offline: the model ships as a 271MB package, runs in roughly 550MB of RAM, and handles a 32k context window, making it ideal for 'Mobile Actions' such as managing calendars or controlling device hardware without cloud round-trips. It clocks roughly 126 tokens per second on modern mobile chipsets (such as the Samsung Galaxy S25 Ultra) and supports cross-platform deployment across Android, iOS, and Web via the MediaPipe LLM Inference API.
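To make the 'Mobile Actions' flow concrete, here is a minimal, runtime-agnostic sketch of the app-side half of function calling: the model emits a structured call, and the app maps it to a local handler. The action names, the JSON shape, and the `dispatch` helper are all illustrative assumptions, not FunctionGemma's actual wire format or the MediaPipe API.

```python
import json

# Hypothetical registry of on-device actions the model is allowed to call.
# In a real app these would invoke platform APIs (alarms, flashlight, calendar).
ACTIONS = {
    "set_timer": lambda minutes: f"Timer set for {minutes} min",
    "toggle_flashlight": lambda on: f"Flashlight {'on' if on else 'off'}",
}

def dispatch(model_output: str) -> str:
    """Parse a function-call message emitted by the model and run it locally.

    Assumes the model emits JSON like:
      {"name": "set_timer", "args": {"minutes": 5}}
    (an illustrative format, not FunctionGemma's actual output schema).
    """
    call = json.loads(model_output)
    handler = ACTIONS[call["name"]]   # KeyError here means an unknown tool
    return handler(**call["args"])

# Example: the model asked to set a 5-minute timer.
print(dispatch('{"name": "set_timer", "args": {"minutes": 5}}'))
```

Because the model and the dispatcher both live on-device, the entire request-to-action loop completes without a network call, which is what makes the sub-second latencies quoted above usable for interactive actions.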