M-MAD
M-MAD is an LLM-based framework using multi-agent debate for robust, multidimensional machine translation evaluation.
M-MAD, or Multidimensional Multi-Agent Debate, is a systematic framework designed to improve LLM-as-a-judge evaluation of machine translation. It operates in three stages: first, it decouples the MQM (Multidimensional Quality Metrics) criteria into distinct evaluation dimensions; second, it runs a multi-agent debate within each dimension; third, it synthesizes the per-dimension outcomes into a final judgment. This approach substantially improves segment-level evaluation, often outperforming existing LLM-as-a-judge methods and remaining competitive with state-of-the-art automatic metrics. The design relies on collaborative reasoning among agents and fine-grained, per-dimension assessment to produce robust, reliable evaluations that align with human judgments.
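The three stages can be illustrated with a small Python sketch. It is a hypothetical illustration, not M-MAD's actual implementation. The following are all assumptions made for the example:

- the dimension list (`accuracy`, `fluency`, `style`, `terminology`)
- the `SEVERITY_WEIGHTS` table
- the `Agent` callable interface
- the majority-vote debate rule
- the summed score used for synthesis

Real agents would wrap LLM calls. Here two deterministic stub agents stand in so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical MQM-style severity weights (assumption, not taken from M-MAD).
SEVERITY_WEIGHTS: Dict[str, float] = {"none": 0.0, "minor": 1.0, "major": 5.0, "critical": 10.0}

# Stage 1 (assumed dimension set): MQM criteria decoupled into separate dimensions.
DIMENSIONS: List[str] = ["accuracy", "fluency", "style", "terminology"]

# Hypothetical agent interface: (dimension, source, translation, transcript) -> severity.
# In practice each agent would wrap an LLM call.
Agent = Callable[[str, str, str, List[str]], str]


@dataclass
class DimensionVerdict:
    dimension: str
    severity: str
    transcript: List[str]


def debate(dimension: str, source: str, translation: str,
           agents: List[Agent], rounds: int = 2) -> DimensionVerdict:
    """Stage 2: agents take turns stating a severity while reading the shared transcript."""
    transcript: List[str] = []
    latest: Dict[int, str] = {}
    for r in range(rounds):
        for i, agent in enumerate(agents):
            verdict = agent(dimension, source, translation, transcript)
            latest[i] = verdict
            transcript.append(f"round {r} agent {i}: {verdict}")
        if len(set(latest.values())) == 1:
            break  # all agents agree, so stop debating early
    # Majority vote over each agent's latest position; ties go to the more severe label.
    votes = list(latest.values())
    severity = max(set(votes), key=lambda s: (votes.count(s), SEVERITY_WEIGHTS[s]))
    return DimensionVerdict(dimension, severity, transcript)


def synthesize(verdicts: List[DimensionVerdict]) -> float:
    """Stage 3 (rule-based stand-in): sum severity weights; higher means worse."""
    return sum(SEVERITY_WEIGHTS[v.severity] for v in verdicts)


def evaluate(source: str, translation: str, agents: List[Agent],
             rounds: int = 2) -> Tuple[float, List[DimensionVerdict]]:
    verdicts = [debate(d, source, translation, agents, rounds) for d in DIMENSIONS]
    return synthesize(verdicts), verdicts


# Deterministic stub agents so the sketch runs without an LLM.
def strict_agent(dimension, source, translation, transcript):
    return "major" if dimension == "accuracy" else "none"


def lenient_agent(dimension, source, translation, transcript):
    # Flags an accuracy error only after another agent has argued for one.
    if dimension == "accuracy" and any(line.endswith("major") for line in transcript):
        return "major"
    return "none"


score, verdicts = evaluate("Der Hund bellt.", "The cat barks.", [lenient_agent, strict_agent])
print(score)  # 5.0
for v in verdicts:
    print(v.dimension, v.severity, len(v.transcript))
```

In this run the two stubs disagree on accuracy in round 0 and converge in round 1. As a result, that dimension contributes one major error (weight 5.0) and the other dimensions contribute nothing. The summed score in `synthesize` was chosen only because it is deterministic. Replacing it with an LLM judge that reads each dimension's transcript is one possible alternative.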