Gabetni, M. Firas (2024) Compression of a Text-to-Speech model StyleTTS2. PFE - Project Graduation, ENSTA.
PDF (1714 KB)
Abstract
The rapid growth of generative AI models, both in size and in computational requirements, poses significant challenges for their deployment in hardware-constrained environments such as automotive systems. During this internship, we compressed StyleTTS2, a state-of-the-art text-to-speech (TTS) model, to enable its efficient deployment in vehicles. Using advanced model compression techniques, including quantization and tensor networks, we reduced the model size by 6.8x without compromising audio quality: the compressed model still produces high-quality, natural-sounding speech, and it outperforms existing solutions in both model size and audio fidelity. This work shows how generative models can be effectively integrated into hardware with limited resources, extending what is achievable in edge AI deployment.
| Field | Value |
|---|---|
| Item Type | Thesis (PFE - Project Graduation) |
| Uncontrolled Keywords | Generative AI, Model Compression, Quantization, Tensor Networks, StyleTTS2, Text-to-Speech |
| Subjects | Information and Communication Sciences and Technologies |
| ID Code | 10399 |
| Deposited By | Firas GABETNI |
| Deposited On | 04 Oct. 2024 17:18 |
| Last Modified | 04 Oct. 2024 17:18 |