CHORNA, Ms Sofiia (2025) Mapping atomistic datasets for machine learning potentials. PFE (final-year project), ENSTA.

File(s) associated with this document:

PDF (7 MB)

Abstract

The increasing complexity of machine learning interatomic potentials (MLIPs) calls for robust datasets and interpretable models that predict atomic-scale properties efficiently. This work presents a comprehensive analysis and visualization of the Massive Atomic Diversity (MAD) dataset, developed at the Laboratory of Computational Science and Modeling (COSMO, EPFL) for training universal MLIPs such as PET-MAD. We introduce a generalizable approach that maps atomistic datasets into intuitive, low-dimensional representations by leveraging the last-layer features of ML models. This method enables a direct comparison of MAD’s chemical and structural diversity against other established benchmarks. Furthermore, we investigate the latent spaces learned by prominent MLIPs to understand their underlying atomic representations. The study demonstrates a systematic framework for the characterization, visualization, and integration of large-scale atomistic datasets, thereby advancing the development of more efficient and interpretable machine learning models in materials modeling.
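
As an illustration of the mapping idea described in the abstract, the following is a minimal sketch of projecting per-structure feature vectors from two datasets into a shared two-dimensional map. The abstract does not state which projection the work uses; plain PCA is an assumption made here for simplicity. The feature arrays are synthetic stand-ins for last-layer features, and the names `pca_project`, `dataset_a`, and `dataset_b` are hypothetical.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project the rows of `features` onto their leading principal components."""
    centered = features - features.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal directions,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)

# Synthetic placeholders for last-layer features (one 64-dimensional row per structure).
# In practice these would be extracted from a trained model; that step is not shown.
dataset_a = rng.normal(loc=0.0, scale=1.0, size=(200, 64))
dataset_b = rng.normal(loc=3.0, scale=1.0, size=(150, 64))

# Fit a single projection on the stacked features so both datasets share one map.
stacked = np.vstack([dataset_a, dataset_b])
embedding = pca_project(stacked, n_components=2)

map_a = embedding[: len(dataset_a)]
map_b = embedding[len(dataset_a):]
```

Fitting the projection on the stacked features, rather than on each dataset separately, is what makes the two point clouds comparable in the same coordinates; `map_a` and `map_b` can then be drawn as a scatter plot with one color per dataset.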

Document type: Report or thesis (PFE, final-year project)
Free keywords: Interatomic potentials, atomic representations, data visualization, latent spaces, materials modeling
Subjects: Information and communication sciences and technologies
Materials science, mechanics, mechanical engineering
ID code: 10835
Deposited by: Sofiia CHORNA
Deposited on: 08 Oct 2025 10:34
Last modified: 08 Oct 2025 10:34
