Tacotron 2 Projects


Tacotron 2

Google's neural network architecture for synthesizing speech directly from text, using a recurrent sequence-to-sequence model and a WaveNet vocoder.

Tacotron 2 simplifies the text-to-speech pipeline by mapping character sequences directly to mel-scale spectrograms. A recurrent sequence-to-sequence model with location-sensitive attention aligns the input characters with the output spectrogram frames, and a modified WaveNet vocoder then converts the predicted spectrograms into 24 kHz audio. Because it needs no complex hand-engineered features such as phoneme alignments or linguistic prosody models, the system is far simpler to build, yet it achieved a Mean Opinion Score (MOS) of 4.53, close to the 4.58 given to professionally recorded human speech in the same evaluation. It remains a foundational architecture for producing high-fidelity, natural-sounding synthetic voices.
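The alignment step can be illustrated with a small, plain-Python sketch. Tacotron 2's attention is additive attention extended with location features computed from the cumulative attention weights of previous decoder steps. The code below is a toy version under stated assumptions: the decoder query and encoder keys are taken as already projected to the attention dimension, the location convolution uses a single filter instead of a learned filter bank, and all weights (`kernel`, `u`, `v`) and function names are hypothetical, hand-picked for illustration rather than taken from any Tacotron codebase or trained model.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation; output has the same length as signal."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def location_sensitive_step(query, keys, values, cum_alignment, params):
    """One decoder step of a toy location-sensitive attention (hypothetical helper).

    query:         projected decoder state, length d (assumed already projected)
    keys:          T projected encoder outputs, each length d
    values:        T encoder outputs used to build the context vector
    cum_alignment: sum of the alignments from all previous decoder steps, length T
    params:        hypothetical weights: 'kernel' (odd-length location filter),
                   'u' (length d, lifts the scalar location feature to d dims),
                   'v' (length d, scores each hidden vector)
    Returns (alignment, context, updated cumulative alignment).
    """
    # Location features: convolve the cumulative alignment so each input
    # position reflects how much attention it and its neighbors already received.
    loc = conv1d_same(cum_alignment, params["kernel"])
    energies = []
    for key, f in zip(keys, loc):
        # Additive score with an extra location term.
        hidden = [math.tanh(q + k + u * f) for q, k, u in zip(query, key, params["u"])]
        energies.append(sum(v * h for v, h in zip(params["v"], hidden)))
    alignment = softmax(energies)
    # Context vector: attention-weighted average of the encoder outputs.
    dim = len(values[0])
    context = [sum(a * val[i] for a, val in zip(alignment, values)) for i in range(dim)]
    new_cum = [c + a for c, a in zip(cum_alignment, alignment)]
    return alignment, context, new_cum


if __name__ == "__main__":
    # Four encoder positions (think: four input characters), attention dim 2.
    keys = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2], [0.0, -0.1]]
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    params = {"kernel": [0.2, 0.5, 0.2], "u": [1.0, -1.0], "v": [0.7, 0.3]}
    query = [0.1, -0.1]  # held fixed here; a real decoder updates it each step
    cum = [0.0] * len(keys)
    for step in range(3):
        alignment, context, cum = location_sensitive_step(query, keys, values, cum, params)
        print(step, [round(a, 3) for a in alignment], [round(c, 3) for c in context])
```

Feeding the updated cumulative alignment back into every step is intended to encourage the attention to keep moving forward through the input instead of repeating or skipping characters, which is the motivation the Tacotron 2 authors give for adding location features.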

https://github.com/google/tacotron
1 project · 1 city


