Technology

Tacotron 2

Google's neural network architecture for direct speech synthesis from text using recurrent sequence-to-sequence models and WaveNet vocoders.

Tacotron 2 streamlines the text-to-speech pipeline by mapping character sequences directly to mel-scale spectrograms. This architecture combines a recurrent sequence-to-sequence model with an attention mechanism to handle alignment, followed by a modified WaveNet vocoder to generate the final 24 kHz audio. By eliminating complex hand-engineered features like phoneme alignments or linguistic prosody models, the system achieves a Mean Opinion Score (MOS) of 4.53, nearly matching the 4.58 score of natural human speech. It remains a foundational framework for producing high-fidelity, natural-sounding synthetic voices in production environments.

https://github.com/google/tacotron

1 project · 1 city

Related technologies

Amazon Polly 1 Amazon Transcribe 2 Azure Speech to Text 1 BERT 179 CMU Sphinx 2 eSpeak 1 Festival 1 Google Cloud Speech-to-Text 2 Google Cloud Text-to-Speech 1 GPT-3 191 GPT-4 528 IBM Watson Speech to Text 2 IBM Watson Text to Speech 1 Kaldi 2 Keras 74 Microsoft Azure Text-to-Speech 1 Mozilla DeepSpeech 1 ONNX 82

Recent Talks & Demos

Showing 1-1 of 1

Members-Only

UzbekVoice

Tashkent Oct 31

Google Cloud Speech-to-Text Google Cloud Text-to-Speech