COSTA WERNECK, M. André (2024) Leveraging Machine Learning for Healthcare: A Study on Medical Code Embeddings PFE - Project Graduation, ENSTA.
| PDF 4074Kb |
Abstract
In this work, we set out to enhance the existing local approaches used at Qantev for analyzing ICD code consistency by developing a general embedding model based on contrastive learning. The model aimed to capture the relationships between ICD codes and to apply them across use cases, primarily fraud detection but also patient journey analysis, thereby improving upon the client-specific methods currently in place. We framed the problem as a classification task, using precision, recall, and F1 scores to evaluate performance, and also analyzed similarity distributions on validation data to assess the impact of training. While the model effectively learned structural and semantic relationships between ICD codes, it struggled to generalize across varied clinical and client datasets because of significant differences between the training data and the clinical data. This disparity limited the model's ability to replace local analysis. We concluded that, while the model shows promise, it cannot currently outperform the local client-specific methods. However, with better and more representative data, it holds potential for improved fraud detection and patient care optimization. Future efforts will focus on refining the model and exploring new strategies, such as enhancing cross-relations between diagnosis and procedure codes.
Item Type: | Thesis (PFE - Project Graduation) |
---|---|
Uncontrolled Keywords: | ICD codes, General Embedding model, Fraud detection, Consistency analysis, Classification task, Enhance local approaches, Contrastive Learning |
Subjects: | Information and Communication Sciences and Technologies |
ID Code: | 10398 |
Deposited By: | André COSTA WERNECK |
Deposited On: | 30 Oct. 2024 11:56 |
Last Modified: | 30 Oct. 2024 11:56 |