Gabetni, M. Firas (2024) Compression of a Text-to-Speech model Styletts2. PFE - Projet de fin d'études (final-year project), ENSTA.
File(s) associated with this document:
| PDF 1714Kb |
Abstract
The rapid growth of generative AI models, in both size and computational requirements, poses significant challenges for their deployment in hardware-constrained environments such as automotive systems. During this internship, we set out to compress StyleTTS2, a state-of-the-art text-to-speech (TTS) model, to enable its efficient deployment in vehicles. By employing model compression techniques, including quantization and tensor networks, we reduced the model size by a factor of 6.8 without compromising audio quality. The compressed model retained its ability to produce high-quality, natural-sounding speech and outperformed existing solutions in both model size and audio fidelity. This work shows how generative models can be integrated effectively into resource-limited hardware, pushing the boundaries of what is achievable in edge AI deployment.
Document type: | Report or thesis (PFE - Projet de fin d'études, final-year project) |
---|---|
Free keywords: | Generative AI, Model Compression, Quantization, Tensor Networks, StyleTTS2, Text-to-Speech |
Subjects: | Information and communication sciences and technologies |
ID code: | 10399 |
Deposited by: | Firas GABETNI |
Deposited on: | 04 Oct. 2024 17:18 |
Last modified: | 04 Oct. 2024 17:18 |