Groundedness metric
Quantifies the factual consistency of an LLM's output: it measures the degree to which a generated response is supported by the provided source context, making it a direct counter-measure to hallucination.
Groundedness is a critical evaluation metric for Large Language Model (LLM) applications, particularly in Retrieval-Augmented Generation (RAG) pipelines. It assesses whether the LLM's answer is substantiated *only* by the retrieved documents, ensuring reliability for enterprise use cases (e.g., knowledge-based agents). The score operates on a 0.0 to 1.0 scale (1.0 being fully faithful) and is often calculated by verifying each sentence in the response against the source text. For instance, a low score such as 0.64 indicates a poorly grounded RAG pipeline, signaling a need to optimize the document retriever or the generation step to improve factual accuracy.
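The sentence-level scoring idea above can be sketched in a few lines. The example below is a deliberately simple, illustrative implementation that treats a response sentence as "supported" when enough of its content words appear in the source context; the function name, stopword list, and 0.6 overlap threshold are assumptions for this sketch. Production groundedness metrics typically use an LLM judge or a natural-language-inference model rather than lexical overlap, which can be fooled by paraphrase or by hallucinations that reuse context vocabulary.

```python
import re

# Minimal stopword list used only for this sketch.
_STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "of", "to",
              "in", "on", "and", "or", "that", "this", "it", "for"}

def _content_words(text: str) -> set:
    """Lowercased alphanumeric tokens minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - _STOPWORDS

def groundedness(response: str, context: str, threshold: float = 0.6) -> float:
    """Return a 0.0-1.0 score: the fraction of response sentences whose
    content-word overlap with the context meets the threshold."""
    ctx_words = _content_words(context)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", response.strip()) if s]
    if not sentences:
        return 0.0
    supported = 0
    for sent in sentences:
        words = _content_words(sent)
        if not words:
            supported += 1  # no content words: trivially supported
            continue
        if len(words & ctx_words) / len(words) >= threshold:
            supported += 1
    return supported / len(sentences)

context = "The Eiffel Tower is in Paris. It was completed in 1889."
faithful = "The Eiffel Tower is located in Paris. It was completed in 1889."
hallucinated = "The Eiffel Tower is in Paris. It was painted gold in 2001."

print(groundedness(faithful, context))      # fully supported -> 1.0
print(groundedness(hallucinated, context))  # one unsupported sentence -> 0.5
```

A score of 0.5 here flags that half the response's sentences lack support in the context, which would point the developer at either the retriever (missing evidence) or the generator (fabricated claims).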