Gabetni, M. Firas (2024) Compression of a Text-to-Speech model Styletts2. PFE - Final-year project, ENSTA.

File(s) associated with this document:

PDF (1714Kb)

Abstract

The rapid growth of generative AI models, in both size and computational requirements, poses significant challenges for their deployment in hardware-constrained environments such as automotive systems. During this internship, we compressed StyleTTS2, a state-of-the-art text-to-speech (TTS) model, to enable its efficient deployment in vehicles. Using model compression techniques, including quantization and tensor networks, we reduced the model size by 6.8x without compromising audio quality. The compressed model retains the ability to produce high-quality, natural-sounding speech and achieves superior results compared to existing solutions in both model size and audio fidelity. This work shows how generative models can be effectively integrated into hardware with limited resources, pushing the boundaries of what is achievable in edge AI deployment.

Document type: Report or thesis (PFE - Final-year project)
Free keywords: Generative AI, Model Compression, Quantization, Tensor Networks, StyleTTS2, Text-to-Speech
Subjects: Information and communication science and technology
ID code: 10399
Deposited by: Firas GABETNI
Deposited on: 04 Oct. 2024 17:18
Last modified: 04 Oct. 2024 17:18
