Baoueb, Ms Teysir (2023) SpecDiff-GAN: A spectrally-shaped noise diffusion GAN for speech and music synthesis PFE - Project Graduation, ENSTA.

Abstract

Generative Adversarial Network (GAN) models can synthesise high-quality audio signals while ensuring fast sample generation. However, they are difficult to train and are prone to several issues, including mode collapse and divergence. In this work, we introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN, which was initially devised for speech synthesis from mel spectrograms. In our model, training stability is enhanced by means of a forward diffusion process, which consists of injecting Gaussian noise into both real and fake samples before feeding them to the discriminator. We further improve the model by exploiting a spectrally-shaped noise distribution, with the aim of making the discriminator’s task more challenging. We then show the merits of our proposed model for speech and music synthesis on several datasets. Our experiments confirm that our model compares favorably with several baselines in audio quality and efficiency.
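The noise-injection idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the variance-preserving mixing rule, the linear `betas` schedule, the all-pole filter used to colour the noise, and all function names (`alpha_bar`, `shape_noise`, `diffuse`) are hypothetical. The sketch only shows the general pattern of perturbing real and fake signals with the same noise level before a discriminator would see them.

```python
import math
import random


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t (assumed schedule form)."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def white_noise(n, rng):
    """Draw n samples of zero-mean, unit-variance Gaussian noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def shape_noise(noise, ar_coeffs):
    """Colour white noise with an all-pole filter.

    Hypothetical shaping method: y[n] = w[n] - sum_k a[k] * y[n - k].
    The thesis may shape the noise differently.
    """
    out = []
    for n, w in enumerate(noise):
        y = w
        for k, a in enumerate(ar_coeffs, start=1):
            if n - k >= 0:
                y -= a * out[n - k]
        out.append(y)
    return out


def diffuse(signal, noise, betas, t):
    """Mix a signal with noise at diffusion step t (variance-preserving form, assumed)."""
    ab = alpha_bar(betas, t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e
            for x, e in zip(signal, noise)]


# Toy usage: a sine wave stands in for real audio, a scaled copy for generator output.
rng = random.Random(0)
betas = [0.01 * (i + 1) for i in range(10)]
real = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
fake = [0.9 * x for x in real]

t = rng.randrange(len(betas))   # the same step is used for both inputs
ar = [-0.5]                     # hypothetical filter coefficient (low-pass colouring)
noisy_real = diffuse(real, shape_noise(white_noise(len(real), rng), ar), betas, t)
noisy_fake = diffuse(fake, shape_noise(white_noise(len(fake), rng), ar), betas, t)
```

In a training loop, `noisy_real` and `noisy_fake` would take the place of the clean signals at the discriminator input; the generator itself is left unchanged in this sketch.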

Item Type: Thesis (PFE - Project Graduation)
Uncontrolled Keywords: Generative Adversarial Networks (GANs), diffusion process, deep audio synthesis, spectral envelope, mel spectrogram inversion
Subjects: Information and Communication Sciences and Technologies; Mathematics and Applications
ID Code: 9751
Deposited By: Teysir BAOUEB
Deposited On: 02 Oct 2023 17:27
Last Modified: 02 Oct 2023 17:27