Mikołaj Binkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande,
Luis C Cobo, and Karen Simonyan. High fidelity speech synthesis with adversarial networks.
arXiv preprint arXiv:1909.11646, 2019.
Chris Donahue, Julian McAuley, and Miller Puckette. Adversarial audio synthesis. arXiv preprint
arXiv:1802.04208, 2018.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair,
Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural informa-
tion processing systems, pages 2672–2680, 2014.
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with
conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and
pattern recognition, pages 1125–1134, 2017.
Durk P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In
Advances in Neural Information Processing Systems, pages 10215–10224, 2018.
Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo,
Alexandre de Brébisson, Yoshua Bengio, and Aaron C Courville. Melgan: Generative adversarial
networks for conditional waveform synthesis. In Advances in Neural Information Processing
Systems 32, pages 14910–14921, 2019.
Rithesh Kumar.
descriptinc/melgan-neurips.
Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. Autoen-
coding beyond pixels using a learned similarity metric. In International conference on machine
learning, pages 1558–1566. PMLR, 2016.
Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, and Ming Liu. Neural speech synthesis with
transformer network. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33,
pages 6706–6713, 2019.
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint
arXiv:1711.05101, 2017.
Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least
squares generative adversarial networks. In Proceedings of the IEEE International Conference on
Computer Vision, pages 2794–2802, 2017.
Michael Mathieu, Camille Couprie, and Yann LeCun. Deep multi-scale video prediction beyond
mean square error. arXiv preprint arXiv:1511.05440, 2015.
Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for
generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
Aaron Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu,
George Driessche, Edward Lockhart, Luis Cobo, Florian Stimberg, et al. Parallel wavenet: Fast
high-fidelity speech synthesis. In International conference on machine learning, pages 3918–3926.
PMLR, 2018.
Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves,
Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw
audio. arXiv preprint arXiv:1609.03499, 2016.
Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O Arik, Ajay Kannan, Sharan Narang, Jonathan
Raiman, and John Miller. Deep voice 3: Scaling text-to-speech with convolutional sequence
learning. arXiv preprint arXiv:1710.07654, 2017.
11