Demo page for "Neural network-based speech waveform generative models"
Last update: 11 June 2022 (samples synthesized by JETS added)
More synthesized speech samples and models will be added as they become available.
Review article
T. Okamoto, "Neural network-based speech waveform generative models," J. Acoust. Soc. Jpn., vol. 78, no. 6, pp. 328–337, June 2022. (in Japanese)
Fully end-to-end text-to-speech: JETS (FastSpeech 2 + HiFi-GAN)
Not cited in the review
slt (trainable with little data!!)
bdl (trainable with little data!!)
LJSpeech
JSUT (24 kHz)
JSUT (48 kHz, trainable on full-band speech)
Update history
11 June 2022: Samples synthesized by JETS added
27 May 2022: Demo speech samples uploaded
Acknowledgement
The synthesized samples of LPCNet (all) and Parallel WaveGAN (JSUT only) were produced while Keisuke Matsubara of Kobe University (graduated in March 2022) was interning at NICT.