Gabetni, M. Firas (2024) Compression of a Text-to-Speech model Styletts2 PFE - Project Graduation, ENSTA.

Abstract

The rapid growth of generative AI models, in both size and computational requirements, poses significant challenges for their deployment in hardware-constrained environments such as automotive systems. During this internship, we set out to compress StyleTTS2, a state-of-the-art text-to-speech (TTS) model, to enable its efficient deployment in vehicles. By employing advanced model compression techniques, including quantization and tensor networks, we reduced the model size by 6.8x without compromising audio quality. The compressed model retained the ability to produce high-quality, natural-sounding speech and achieved superior results compared to existing solutions in both model size and audio fidelity. This work shows how generative models can be effectively integrated into resource-limited hardware, pushing the boundaries of what is achievable in edge AI deployment.

Item Type: Thesis (PFE - Project Graduation)
Uncontrolled Keywords: Generative AI, Model Compression, Quantization, Tensor Networks, StyleTTS2, Text-to-Speech
Subjects: Information and Communication Sciences and Technologies
ID Code: 10399
Deposited By: Firas GABETNI
Deposited On: 04 Oct. 2024 17:18
Last Modified: 04 Oct. 2024 17:18