Gemini prefix caching

Gemini context caching cuts input-token costs by up to 75% and slashes latency by reusing frequently accessed prefixes such as massive document sets or video files.

Gemini context caching (also known as prefix caching) lets developers store and reuse large amounts of input data across multiple requests. By caching the KV state of a common prefix (such as a 100,000-token technical manual or a one-hour video file), the model skips the redundant prefill computation on every subsequent prompt. Google offers two paths: implicit caching, enabled by default for Gemini 2.5 and newer models, which applies an automatic 75% discount on cached tokens when a hit occurs; and explicit caching, which gives developers manual control over the cache's Time to Live (TTL) and guaranteed cost savings. This technology is essential for high-frequency agentic workflows and RAG pipelines where the system instruction or reference corpus stays static while user queries change.

https://ai.google.dev/gemini-api/docs/caching
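The explicit-caching workflow described above (pay once to process a shared prefix, then reuse it until its TTL expires) can be sketched with a toy in-memory cache. This is an illustrative model only, not the Gemini SDK: the names `PrefixCache`, `create`, and `lookup` are hypothetical, and the real service caches the model's KV state server-side rather than a transformed string.

```python
import hashlib
import time


class PrefixCache:
    """Toy model of explicit prefix caching with a TTL.

    Illustrative only: stands in for the Gemini API's server-side
    cache, where the expensive step being skipped is the prefill
    (KV-cache) computation over the shared prefix.
    """

    def __init__(self):
        self._store = {}  # key -> (cached_value, expires_at)

    def create(self, prefix: str, ttl_s: float) -> str:
        """'Process' the prefix once and store the result under a key."""
        key = hashlib.sha256(prefix.encode()).hexdigest()[:16]
        value = prefix.upper()  # stand-in for the expensive prefill step
        self._store[key] = (value, time.monotonic() + ttl_s)
        return key

    def lookup(self, key: str):
        """Return the cached result, or None on a miss or expired TTL."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # TTL elapsed: entry is evicted
            return None
        return value


cache = PrefixCache()
key = cache.create("system instruction + reference manual", ttl_s=3600.0)
hit = cache.lookup(key)        # hit: prefix reused, no recompute
miss = cache.lookup("missing") # miss: would pay full prefill cost
```

In the real API the same shape appears as creating a cached-content resource with a TTL and then referencing it by name in subsequent `generate_content` calls, so only the changing user query is billed at the full input rate.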