Explainable AI for Plasma Amino Acid Diagnosis

A SHAP-integrated XGBoost ensemble framework that delivers both high diagnostic accuracy and transparent, biochemically interpretable reasoning for each clinical prediction.

Suresh Kaidyala

3/6/2023 · 2 min read

Explainable Machine Learning for Plasma Amino Acid Classification

Inherited metabolic disorders can be serious and sometimes life-threatening if they are not identified early. One important way clinicians investigate these disorders is through plasma amino acid profiling, which helps detect abnormal patterns in amino acids. However, interpreting these profiles is not always simple because the diagnosis often depends on multiple amino acids changing together, not just one abnormal value. This research focuses on using machine learning to support faster and more accurate interpretation of plasma amino acid results.

The study introduces an explainable AI approach using XGBoost, SMOTE, and SHAP to classify plasma amino acid profiles. The main goal is not only to predict whether a profile is normal or abnormal, but also to explain which amino acids influenced the prediction. This is important because clinicians need to understand the reasoning behind an AI result before trusting it in a clinical diagnostic workflow.


Why Plasma Amino Acid Classification Matters

Plasma amino acid testing plays an important role in diagnosing inherited metabolic disorders such as phenylketonuria, maple syrup urine disease, urea cycle disorders, and other amino acid pathway conditions. These diseases can cause severe complications if diagnosis is delayed. The challenge is that many of these disorders show complex biochemical patterns, where several amino acids must be interpreted together. Because of this, machine learning can be useful as a decision-support tool for clinical laboratories.

How Machine Learning Was Used in the Study

In this research, the dataset included thousands of plasma amino acid profiles collected from multiple clinical laboratories. The study compared different machine learning models, including Random Forest, Gradient Boosted Trees, and XGBoost. Among these, XGBoost performed the best, especially when combined with SMOTE, which helped balance rare disease classes in the dataset. This made the model better at identifying abnormal and rare metabolic conditions instead of only predicting the most common normal class.
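The study itself pairs XGBoost with the SMOTE oversampling algorithm; as a self-contained sketch of that idea, the snippet below implements a minimal SMOTE-style oversampler by hand (interpolating between minority-class neighbours) and trains scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost. The data, feature counts, and class balance are illustrative, not the study's dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def smote_like_oversample(X, y, minority_label, k=5, seed=0):
    """Minimal SMOTE-style oversampling: synthesize minority samples by
    interpolating between a minority point and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum() - len(X_min))
    synthetic = []
    for _ in range(n_needed):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)   # distances to sample i
        neighbours = np.argsort(d)[1:k + 1]            # skip self at index 0
        j = rng.choice(neighbours)
        lam = rng.random()                             # interpolation factor
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    X_bal = np.vstack([X, np.array(synthetic)])
    y_bal = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_bal, y_bal

# Imbalanced toy data standing in for amino acid profiles
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_bal, y_bal = smote_like_oversample(X_tr, y_tr, minority_label=1)
clf = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)
print("balanced training classes:", np.bincount(y_bal))
print(f"test accuracy: {clf.score(X_te, y_te):.2f}")
```

Oversampling only the training split, as above, keeps the held-out test set representative of the real class imbalance, which is the usual precaution when reporting performance on rare classes.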

Role of SHAP in Making AI Explainable

One of the strongest parts of this research is the use of SHAP explainability. Instead of giving only a final diagnosis or prediction, SHAP helps show which amino acids contributed most to the model’s decision. For example, phenylalanine, citrulline, tyrosine, and branched-chain amino acids were identified as important contributors. This makes the AI system more clinically meaningful because its reasoning matches known biochemical patterns used by experts.
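In practice SHAP values for tree ensembles are computed with the optimized `shap` library; to show what the numbers mean, the sketch below computes exact Shapley values by brute-force subset enumeration on a tiny model, replacing "absent" features with background means. The feature names and data are illustrative placeholders, not the study's analytes or results.

```python
import itertools
import math
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative analyte names for a 4-feature toy model
FEATURES = ["phenylalanine", "tyrosine", "citrulline", "leucine"]
X, y = make_classification(n_samples=400, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)
background = X.mean(axis=0)  # reference values for "absent" features

def predict_with_subset(x, subset):
    """Model output when only features in `subset` take their observed
    values; the rest are set to the background mean."""
    z = background.copy()
    z[list(subset)] = x[list(subset)]
    return model.predict_proba(z.reshape(1, -1))[0, 1]

def exact_shapley(x):
    """Exact Shapley values: average marginal contribution of each
    feature over all subsets of the remaining features."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for r in range(n):
            for S in itertools.combinations(rest, r):
                w = (math.factorial(len(S)) * math.factorial(n - len(S) - 1)
                     / math.factorial(n))
                phi[i] += w * (predict_with_subset(x, S + (i,))
                               - predict_with_subset(x, S))
    return phi

phi = exact_shapley(X[0])
for name, v in sorted(zip(FEATURES, phi), key=lambda t: -abs(t[1])):
    print(f"{name:>14}: {v:+.3f}")
```

The defining property, visible in the code, is that the per-feature values sum exactly to the difference between the model's prediction for this profile and its prediction for the background reference; that additivity is what lets a clinician read each amino acid's contribution off a single prediction.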

Key Findings and Clinical Importance

The XGBoost model with SMOTE and SHAP showed strong performance in both binary and multiclass classification. The model achieved high accuracy while still providing clear explanations for each prediction. Expert reviewers confirmed that the SHAP explanations matched standard clinical reasoning in most abnormal cases. This shows that explainable AI can support clinical scientists by improving efficiency, reducing interpretation burden, and increasing confidence in AI-assisted diagnostic decisions.
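The dual binary/multiclass framing above can be evaluated from a single multiclass model: predict the disorder group, then collapse all abnormal classes into one positive class for the binary view. The sketch below shows that setup on synthetic data with assumed class labels (0 = normal, 1-3 = illustrative disorder groups); none of the numbers correspond to the study's reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Toy multiclass data: class 0 = "normal", 1-3 = illustrative disorder groups
X, y = make_classification(n_samples=800, n_features=12, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# Multiclass view: which disorder group (macro F1 weighs rare classes equally)
mc_acc = accuracy_score(y_te, y_pred)
print(f"multiclass accuracy: {mc_acc:.2f}")
print(f"macro F1:            {f1_score(y_te, y_pred, average='macro'):.2f}")

# Binary view: collapse every abnormal class into one positive label
bin_true = (y_te != 0).astype(int)
bin_pred = (y_pred != 0).astype(int)
print(f"binary accuracy:     {accuracy_score(bin_true, bin_pred):.2f}")
```

Macro-averaged F1 is a natural headline metric here because plain accuracy can look strong while the rare abnormal classes, the clinically critical ones, are being missed.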

Conclusion

Overall, this research shows that explainable machine learning can be a valuable tool in metabolic diagnostics. By combining strong predictive performance with transparent feature-level explanations, the proposed framework can help bridge the gap between AI research and real clinical laboratory use. With further validation, this type of system could support faster diagnosis, better clinical decision-making, and improved patient outcomes in inherited metabolic disorders.
