DeBERTa
Microsoft's transformer model that surpassed the human baseline on the SuperGLUE benchmark using disentangled attention and an enhanced mask decoder.
DeBERTa (Decoding-enhanced BERT with disentangled Attention) improves on BERT and RoBERTa by representing each word with two separate vectors, one for its content and one for its relative position. This disentangled attention mechanism, sketched below, lets the model capture word dependencies more effectively than standard transformers. By integrating Scale-invariant Fine-Tuning (SiFT), a virtual adversarial training method, the 1.5-billion-parameter DeBERTa model reached 89.9 on the SuperGLUE leaderboard, surpassing the human baseline of 89.8 (an ensemble scored 90.3). Thanks to its parameter efficiency, DeBERTa remains a top-tier choice for NLU tasks such as named entity recognition and question answering.
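The sketch below illustrates the core idea in PyTorch: attention scores are the sum of content-to-content, content-to-position, and position-to-content terms, scaled by sqrt(3d) because three terms are summed. This is a minimal single-head illustration, not the library implementation; all class and variable names are hypothetical, and real DeBERTa adds multi-head attention, masking, dropout, and the enhanced mask decoder.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledSelfAttention(nn.Module):
    """Simplified single-head sketch of DeBERTa-style disentangled attention.

    Each token carries two representations: a content embedding (h) and a
    relative-position embedding (drawn from a table shared across layers in
    the paper). The attention score for a token pair sums three terms:
    content-to-content + content-to-position + position-to-content.
    """

    def __init__(self, dim: int, max_rel_pos: int = 512):
        super().__init__()
        self.k = max_rel_pos                    # relative-distance clip "k"
        # Content projections (standard Q/K/V).
        self.q_c = nn.Linear(dim, dim)
        self.k_c = nn.Linear(dim, dim)
        self.v_c = nn.Linear(dim, dim)
        # Separate projections applied to the relative-position embeddings.
        self.q_r = nn.Linear(dim, dim)
        self.k_r = nn.Linear(dim, dim)
        # Relative-position table over clipped distances in [0, 2k).
        self.rel_emb = nn.Embedding(2 * max_rel_pos, dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, dim) content representations.
        b, n, d = h.shape
        qc, kc, vc = self.q_c(h), self.k_c(h), self.v_c(h)

        # delta[i, j] = clipped relative distance delta(i, j) in [0, 2k).
        pos = torch.arange(n, device=h.device)
        delta = (pos[:, None] - pos[None, :]).clamp(-self.k, self.k - 1) + self.k

        p = self.rel_emb.weight                 # (2k, dim) shared table
        kr, qr = self.k_r(p), self.q_r(p)

        # (1) Content-to-content: ordinary dot-product attention.
        c2c = qc @ kc.transpose(-1, -2)
        # (2) Content-to-position: query content vs. key's relative position.
        c2p = (qc @ kr.transpose(-1, -2)).gather(-1, delta.expand(b, n, n))
        # (3) Position-to-content: key content vs. query's relative position,
        #     indexed with delta(j, i), i.e. the transposed distance.
        p2c = (kc @ qr.transpose(-1, -2)).gather(
            -1, delta.transpose(0, 1).expand(b, n, n)).transpose(-1, -2)

        # Scale by sqrt(3d) since three score terms are summed.
        scores = (c2c + c2p + p2c) / math.sqrt(3 * d)
        return F.softmax(scores, dim=-1) @ vc
```

For using a pretrained checkpoint on an NLU task such as question answering, the sketch below loads DeBERTa-v3 through Hugging Face Transformers. It assumes the transformers and sentencepiece packages are installed and that the public microsoft/deberta-v3-base checkpoint is available on the Hub; note the question-answering head is randomly initialized until fine-tuned on a QA dataset, so the decoded span is only meaningful after fine-tuning.

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Base checkpoint; the QA head on top is untrained until fine-tuned.
name = "microsoft/deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

question = "What benchmark did DeBERTa surpass the human baseline on?"
context = "DeBERTa surpassed the human baseline on the SuperGLUE benchmark."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Decode the highest-scoring start/end token span as the answer.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
print(tokenizer.decode(inputs["input_ids"][0, start : end + 1]))
```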