CHORNA, Ms Sofiia (2025). Mapping atomistic datasets for machine learning potentials. PFE - Project Graduation, ENSTA.

PDF, 7 MB

Abstract

The increasing complexity of machine learning interatomic potentials (MLIPs) necessitates robust datasets and interpretable models to predict atomic-scale properties efficiently. This work presents a comprehensive analysis and visualization of the Massive Atomic Diversity (MAD) dataset, developed at the Laboratory of Computational Science and Modeling (COSMO, EPFL) for training universal MLIPs, such as PET-MAD. We introduce a generalizable approach to map atomistic datasets into intuitive, low-dimensional representations by leveraging the last-layer features of ML models. This method allows MAD's chemical and structural diversity to be compared directly against other established benchmarks. Furthermore, we investigate the latent spaces learned by prominent MLIPs to gain an understanding of their underlying atomic representations. The study demonstrates a systematic framework for the characterization, visualization, and integration of large-scale atomistic datasets, thereby advancing the development of more efficient and interpretable machine learning models in materials modeling.
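The mapping idea in the abstract (projecting last-layer features into a shared low-dimensional space so that datasets can be compared) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say which dimensionality-reduction method the thesis uses, so plain PCA is swapped in here, and the feature matrices are random stand-ins rather than features extracted from PET-MAD or any other model. The names `pca_project`, `features_a`, and `features_b` are hypothetical.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project row-wise feature vectors onto their top principal components."""
    # Center the features so the principal directions describe variance
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)

# Hypothetical stand-ins for last-layer features of two datasets
# (one row per structure, 64 feature dimensions); not real model output.
features_a = rng.normal(0.0, 1.0, size=(200, 64))
features_b = rng.normal(3.0, 1.0, size=(150, 64))

# Fit one projection on the stacked features so both datasets share the same axes
stacked = np.vstack([features_a, features_b])
embedding = pca_project(stacked, n_components=2)

# Split the shared 2D map back into the two datasets for comparison or plotting
map_a, map_b = embedding[:200], embedding[200:]
```

Fitting a single projection on the stacked features, rather than one per dataset, is what makes the two maps comparable: both sets of points then live in the same coordinate system.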

Item Type: Thesis (PFE - Project Graduation)
Uncontrolled Keywords: Interatomic potentials, atomic representations, data visualization, latent spaces, material modeling
Subjects: Information and Communication Sciences and Technologies
Materials Science, Mechanics and Mechanical Engineering
ID Code: 10835
Deposited By: Sofiia CHORNA
Deposited On: 08 Oct. 2025 10:34
Last Modified: 08 Oct. 2025 10:34
