FIZYCKI, M. Tom (2025) Studying plasticity in language learning: pretraining on formal languages for sample efficiency. PFE (final-year project report), ENSTA.
Associated file(s): PDF (2592 KB)
Abstract
This research investigates whether pre-pretraining on formal languages can induce transferable inductive biases that improve sample efficiency in natural language learning. While large language models achieve remarkable performance through massive data scaling, this approach is computationally expensive and does not explain how linguistic structure is acquired efficiently—a capability that human learners demonstrate from limited exposure. We conducted four main experiments using formal languages (particularly Dyck languages) as controlled pre-pretraining substrates before adaptation to natural language corpora. Our experiments examined: (1) the transfer benefits of pre-pretraining across different formal and natural language combinations, (2) the effects of active forgetting through periodic embedding resets on model plasticity, (3) cross-lingual transfer patterns across English, French, German, and Italian Wikipedia, and (4) embedding space stability and dynamics during training. Working in resource-constrained regimes (10M-100M tokens), we evaluated models using both perplexity and the BLiMP syntactic benchmark. Results show that formal language pre-pretraining yields modest improvements in low-data settings, with gains of approximately 1-4 percentage points on BLiMP scores. However, active forgetting did not consistently improve downstream performance, and cross-lingual experiments revealed substantial catastrophic forgetting when adapting across languages. Embedding analysis demonstrated that models converge to distinct but stable representational equilibria after each reset, with early training steps largely determining final configurations. These findings suggest that formal languages provide interpretable testbeds for studying inductive biases, though their practical benefits remain limited in magnitude. The work contributes to understanding how structural priors emerge and transfer in neural language models, offering both empirical evidence and methodological frameworks for future research on sample-efficient pretraining.
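To make the pre-pretraining substrate concrete, the sketch below shows one way to sample strings from a Dyck language (well-nested brackets), the kind of formal corpus the abstract describes as a controlled pre-pretraining signal. This is a minimal, hypothetical illustration, not code from the report: the Dyck-3 bracket vocabulary, `max_len`, and the closing probability `p_close` are assumed values chosen for readability.

```python
import random

# Illustrative sketch (not the report's actual pipeline): sample well-nested
# bracket strings from a Dyck-3 language to use as a formal pre-pretraining
# corpus before adapting a model to natural language text.

BRACKETS = [("(", ")"), ("[", "]"), ("{", "}")]  # assumed Dyck-3 vocabulary


def sample_dyck(max_len=64, p_close=0.4, rng=random):
    """Sample one balanced bracket string of at most `max_len` symbols."""
    stack, out = [], []
    while len(out) + len(stack) < max_len:
        can_open = len(out) + len(stack) + 2 <= max_len
        if stack and (not can_open or rng.random() < p_close):
            out.append(stack.pop())              # close the most recent open bracket
        elif can_open:
            open_b, close_b = rng.choice(BRACKETS)
            stack.append(close_b)                # remember the matching closer
            out.append(open_b)
        else:
            break                                # no room to open, nothing to close
    out.extend(reversed(stack))                  # close anything still open
    return " ".join(out)


if __name__ == "__main__":
    for _ in range(3):
        print(sample_dyck())
```

In the setting the abstract describes, sequences like these would form the pre-pretraining stream for a small language model, after which the model would be adapted to natural-language corpora (e.g., Wikipedia text) and evaluated with perplexity and BLiMP; the periodic embedding resets used for active forgetting would be applied during that training loop, which this sketch does not cover.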
| Document type: | Report or thesis (PFE, final-year project) |
|---|---|
| Free keywords: | inductive biases, formal languages, pre-pretraining, sample efficiency, embedding dynamics, active forgetting, language model plasticity |
| Subjects: | Information and communication sciences and technologies |
| ID code: | 10905 |
| Deposited by: | Tom FIZYCKI |
| Deposited on: | 11 Feb 2026, 12:05 |
| Last modified: | 11 Feb 2026, 12:05 |