M-MAD

M-MAD is an LLM-based framework using multi-agent debate for robust, multidimensional machine translation evaluation.

M-MAD (Multidimensional Multi-Agent Debate) is a framework for LLM-as-a-judge machine translation evaluation. It operates in three stages: (1) decoupling the MQM (Multidimensional Quality Metrics) criteria into distinct evaluation dimensions, (2) running a multi-agent debate within each dimension, and (3) synthesizing the dimension-specific results into a final judgment. The authors report improved segment-level performance that often exceeds existing LLM-as-a-judge methods and is competitive with state-of-the-art automatic metrics. The design relies on collaborative reasoning among agents and fine-grained, per-dimension assessment to produce robust evaluations that align with human judgments.
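The three stages can be illustrated with a minimal Python sketch. This is not the authors' implementation: the dimension names, the severity penalties, the consensus rule, the no-consensus fallback, and the two rule-based stand-in agents (`strict_agent`, `lenient_agent`) are all assumptions made so the example runs without an LLM. In practice each agent would be an LLM call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical dimension names and severity penalties, for illustration only.
DIMENSIONS = ["accuracy", "fluency", "style", "terminology"]
SEVERITY_PENALTY = {"none": 0, "minor": 1, "major": 5}

# An agent maps (dimension, source, translation, transcript so far) to a severity label.
Agent = Callable[[str, str, str, List[str]], str]


def debate(dimension: str, source: str, translation: str,
           agents: List[Agent], max_rounds: int = 3) -> str:
    """Stage 2: agents vote within one dimension until they agree or rounds run out."""
    transcript: List[str] = []
    votes: List[str] = []
    for _ in range(max_rounds):
        votes = [agent(dimension, source, translation, transcript) for agent in agents]
        transcript.extend(votes)  # later rounds can see earlier votes
        if len(set(votes)) == 1:  # consensus reached
            return votes[0]
    # Assumed fallback when no consensus is reached: keep the least severe label.
    return min(votes, key=SEVERITY_PENALTY.__getitem__, default="none")


def evaluate(source: str, translation: str,
             agents: List[Agent]) -> Tuple[Dict[str, str], int]:
    # Stage 1: decouple the evaluation into one independent pass per dimension.
    per_dimension = {d: debate(d, source, translation, agents) for d in DIMENSIONS}
    # Stage 3: synthesize the per-dimension labels into a single (negative) score.
    score = -sum(SEVERITY_PENALTY[label] for label in per_dimension.values())
    return per_dimension, score


def strict_agent(dimension: str, source: str, translation: str,
                 transcript: List[str]) -> str:
    # Stand-in heuristic: a translation under half the source length is flagged
    # as a major accuracy error (likely omission).
    if dimension == "accuracy" and len(translation.split()) < len(source.split()) / 2:
        return "major"
    return "none"


def lenient_agent(dimension: str, source: str, translation: str,
                  transcript: List[str]) -> str:
    # Stand-in heuristic: starts with "none" but concedes once a major error
    # has been raised earlier in the debate.
    return "major" if "major" in transcript else "none"


labels, score = evaluate("the cat sat on the mat today", "le chat",
                         [strict_agent, lenient_agent])
print(labels, score)  # accuracy is labeled "major"; score is -5
```

In this toy run the two agents disagree on accuracy in the first round and converge in the second, which is the behavior the debate stage is meant to capture. Keeping the dimensions independent means a disagreement in one dimension never influences the verdict in another.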

https://arxiv.org/abs/2412.20127
