Credit Card Fraud Detection

A real-time fraud detection framework using stacking ensemble learning, explainable AI, and adaptive retraining to detect credit card fraud under evolving fraud patterns.

Suresh Kaidyala

10/22/2024 · 2 min read

Overview

Credit card fraud continues to grow as digital payments, e-commerce, mobile banking, and card-not-present transactions become more common. Traditional rule-based fraud systems are often limited because fraudsters continuously adapt their behavior to avoid fixed thresholds and known detection patterns. This research proposes an Adaptive Stacking Ensemble framework that combines XGBoost, LightGBM, and CatBoost with explainable AI techniques such as SHAP and LIME to detect fraudulent transactions in real time. The framework also includes concept drift monitoring, allowing the model to retrain when fraud patterns change over time.

Why Real-Time Fraud Detection Needs Adaptive AI

Fraud detection is challenging because fraudulent transactions usually represent only a very small percentage of total transactions. This creates a class imbalance problem where models can easily become biased toward legitimate transactions and miss actual fraud.
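A quick worked example shows why raw accuracy hides the problem. The sketch below assumes a hypothetical batch in which 0.2% of transactions are fraudulent (an illustrative rate, not one reported by the study) and scores a degenerate model that labels every transaction as legitimate.

```python
def accuracy_and_recall(labels, predictions):
    """Return (accuracy, recall) for binary labels where 1 = fraud."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    actual_pos = sum(labels)
    accuracy = correct / len(labels)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall

# Hypothetical batch: 100,000 transactions, 200 of them fraudulent (0.2%).
labels = [1] * 200 + [0] * 99_800
# A degenerate "model" that always predicts legitimate.
always_legit = [0] * len(labels)

acc, rec = accuracy_and_recall(labels, always_legit)
print(f"accuracy={acc:.3f} recall={rec:.3f}")  # accuracy=0.998 recall=0.000
```

The near-perfect accuracy comes entirely from the majority class; recall, precision, and PR-AUC expose the failure.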

Another major challenge is concept drift. Fraud patterns change continuously as attackers adjust their strategies, test new transaction behaviors, and exploit new payment channels. A model that performs well at the time of deployment can lose accuracy within months if it is not monitored and updated.

This study addresses these challenges by using SMOTE-Tomek to manage class imbalance and a Kolmogorov–Smirnov drift detector to monitor changes in transaction behavior. When drift is detected, the system can trigger partial or full retraining depending on the severity of the shift.
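In practice SMOTE-Tomek is usually taken from a library such as imbalanced-learn (`SMOTETomek` in `imblearn.combine`). To show what the combination does, here is a simplified, dependency-free sketch under stated assumptions: fraud is label 1, distances are Euclidean, SMOTE oversamples fraud until the two classes are equal in size, and each Tomek link (a pair of mutual nearest neighbors with different labels) is removed in full. The article does not specify these settings, and the toy data is random.

```python
import math
import random
from typing import List, Set, Tuple

Row = List[float]

def smote(minority: List[Row], n_new: int, k: int = 5, seed: int = 0) -> List[Row]:
    """Create n_new synthetic minority rows by interpolating between near neighbors."""
    rng = random.Random(seed)
    synthetic: List[Row] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest other minority rows by Euclidean distance
        others = [r for r in minority if r is not base]
        neighbors = sorted(others, key=lambda r: math.dist(base, r))[:k]
        partner = rng.choice(neighbors)
        gap = rng.random()  # random point on the segment from base to partner
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic

def tomek_links(X: List[Row], y: List[int]) -> Set[int]:
    """Indices of rows in a Tomek link: mutual nearest neighbors with different labels."""
    nearest = [
        min((j for j in range(len(X)) if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in range(len(X))
    ]
    linked: Set[int] = set()
    for i, j in enumerate(nearest):
        if nearest[j] == i and y[i] != y[j]:
            linked.update((i, j))
    return linked

def smote_tomek(X: List[Row], y: List[int], k: int = 5, seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Oversample fraud (label 1) to parity with SMOTE, then drop Tomek-link pairs."""
    minority = [x for x, label in zip(X, y) if label == 1]
    n_new = (len(y) - len(minority)) - len(minority)
    synthetic = smote(minority, max(n_new, 0), k=k, seed=seed)
    X_all, y_all = X + synthetic, y + [1] * len(synthetic)
    drop = tomek_links(X_all, y_all)
    kept = [i for i in range(len(X_all)) if i not in drop]
    return [X_all[i] for i in kept], [y_all[i] for i in kept]

# Toy data: 60 legitimate rows around (0, 0), 6 fraud rows around (3, 3).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(6)]
y = [0] * 60 + [1] * 6
X_res, y_res = smote_tomek(X, y, k=3)
print(len(y), sum(y), "->", len(y_res), sum(y_res))
```

Removing both members of a link keeps the classes balanced, because each link contains exactly one fraud row and one legitimate row; removing only the majority-class member is another common variant.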

Machine Learning Approach and Key Results

The proposed Adaptive Stacking Ensemble uses three gradient boosting models as base learners: XGBoost, LightGBM, and CatBoost. Their outputs are then passed into a LightGBM meta-learner, which learns how to combine the strengths of each model for better fraud detection performance.
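The stacking step can be sketched without the three boosting libraries. In this illustration, two toy base learners (a nearest-centroid scorer and a single-feature soft threshold, both hypothetical) stand in for XGBoost, LightGBM, and CatBoost, and a small logistic regression stands in for the LightGBM meta-learner. The idea shown is the usual out-of-fold scheme: the meta-learner is trained on base-learner scores for rows those base learners never saw, which keeps it from trusting overfit scores. The article does not describe its fold scheme, so treat this as an assumption.

```python
import math
import random
from typing import Callable, List

Row = List[float]
Scorer = Callable[[Row], float]                     # row -> fraud probability
Learner = Callable[[List[Row], List[int]], Scorer]  # (X, y) -> fitted scorer

def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def centroid_learner(X: List[Row], y: List[int]) -> Scorer:
    """Toy base learner: closer to the fraud centroid than the legit one -> higher score."""
    def centroid(label: int) -> Row:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c_fraud, c_legit = centroid(1), centroid(0)
    def score(x: Row) -> float:
        d_f, d_l = math.dist(x, c_fraud), math.dist(x, c_legit)
        return d_l / (d_f + d_l) if d_f + d_l else 0.5
    return score

def stump_learner(feature: int) -> Learner:
    """Toy base learner factory: soft threshold on one feature at the midpoint of class means."""
    def fit(X: List[Row], y: List[int]) -> Scorer:
        def mean(label: int) -> float:
            vals = [x[feature] for x, t in zip(X, y) if t == label]
            return sum(vals) / len(vals)
        m_f, m_l = mean(1), mean(0)
        cut, sign = (m_f + m_l) / 2, (1.0 if m_f > m_l else -1.0)
        return lambda x: sigmoid(sign * (x[feature] - cut))
    return fit

def fit_logistic(F: List[Row], y: List[int], lr: float = 0.5, epochs: int = 500) -> Scorer:
    """Meta-learner stand-in: logistic regression trained by batch gradient descent."""
    w, b, n = [0.0] * len(F[0]), 0.0, len(F)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for f, t in zip(F, y):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - t
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda f: sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)

def stratified_folds(y: List[int], k: int, seed: int) -> List[List[int]]:
    """Split row indices into k folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def fit_stack(X: List[Row], y: List[int], learners: List[Learner], k: int = 5, seed: int = 0) -> Scorer:
    # 1) Out-of-fold meta-features: each row is scored only by models that never saw it.
    meta = [[0.0] * len(learners) for _ in X]
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        X_tr = [x for i, x in enumerate(X) if i not in held_out]
        y_tr = [t for i, t in enumerate(y) if i not in held_out]
        for j, learn in enumerate(learners):
            scorer = learn(X_tr, y_tr)
            for i in fold:
                meta[i][j] = scorer(X[i])
    # 2) The meta-learner learns how to weight the base scores.
    meta_model = fit_logistic(meta, y)
    # 3) Refit every base learner on all rows for inference.
    base = [learn(X, y) for learn in learners]
    return lambda x: meta_model([s(x) for s in base])

rng = random.Random(7)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
X += [[rng.gauss(2.5, 1), rng.gauss(2.5, 1)] for _ in range(20)]
y = [0] * 80 + [1] * 20
model = fit_stack(X, y, [centroid_learner, stump_learner(0), stump_learner(1)])
print(round(model([2.5, 2.5]), 3), round(model([0.0, 0.0]), 3))
```

Swapping in real models only changes the `Learner` callables; the out-of-fold loop and the final refit on all rows stay the same.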

The framework was evaluated on more than 1.24 million credit card transactions over a 12-month period. The final model achieved strong results, including a PR-AUC of 0.981, ROC-AUC of 0.994, precision of 0.917, recall of 0.931, and F1 score of 0.924.
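The reported metrics follow standard definitions, and the F1 score can be checked directly from the reported precision and recall. The helpers below are minimal sketches: average precision is one common step-wise estimate of PR-AUC and assumes no tied scores, and the four-transaction ranking is a toy example, not study data.

```python
from typing import List

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def average_precision(scores: List[float], labels: List[int]) -> float:
    """Mean of precision@k over the ranks k where a fraud appears (assumes no ties)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Probability that a random fraud outranks a random legit transaction (ties count half)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Consistency check on the reported figures: F1 from precision 0.917 and recall 0.931.
print(round(f1_score(0.917, 0.931), 3))  # 0.924

scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(round(average_precision(scores, labels), 3), roc_auc(scores, labels))  # 0.833 0.75
```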

The study also used SHAP-based feature selection to identify the most important fraud predictors. Key drivers included transaction amount, behavioral PCA components, time since last transaction, card velocity within 24 hours, merchant category, and geographic distance from the cardholder’s registered address.
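SHAP-based selection ranks features by their mean absolute SHAP value across transactions. The `shap` library computes these efficiently for tree ensembles; the sketch below instead enumerates every feature subset exactly, replacing "absent" features with a single baseline row, which is only practical for a handful of features. The scoring function, baseline, feature names, and rows are hypothetical stand-ins loosely named after the drivers listed above, not the study's model.

```python
import itertools
import math
from typing import Callable, List, Tuple

Row = List[float]

def shapley_values(f: Callable[[Row], float], x: Row, baseline: Row) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

def rank_features(f, rows: List[Row], baseline: Row, names: List[str]) -> List[Tuple[str, float]]:
    """Global importance = mean absolute Shapley value per feature, sorted descending."""
    totals = [0.0] * len(names)
    for x in rows:
        for j, v in enumerate(shapley_values(f, x, baseline)):
            totals[j] += abs(v)
    importance = [(name, t / len(rows)) for name, t in zip(names, totals)]
    return sorted(importance, key=lambda kv: kv[1], reverse=True)

# Hypothetical fraud score with an interaction between amount and velocity.
names = ["amount", "velocity_24h", "time_since_last", "geo_distance"]

def fraud_score(x: Row) -> float:
    amount, velocity, gap_hours, distance_km = x
    return (0.002 * amount + 0.3 * velocity - 0.01 * gap_hours
            + 0.001 * distance_km + 0.0005 * amount * velocity)

baseline = [50.0, 1.0, 24.0, 5.0]  # hypothetical "typical" transaction
rows = [[900.0, 6.0, 0.5, 800.0], [20.0, 1.0, 30.0, 2.0], [300.0, 3.0, 2.0, 150.0]]
ranking = rank_features(fraud_score, rows, baseline, names)
for name, score in ranking:
    print(f"{name:16s} {score:.3f}")
top_k = [name for name, _ in ranking[:2]]  # keep the two strongest features
```

A useful sanity check is the efficiency property: for each transaction, the Shapley values sum to the difference between its score and the baseline score.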

Explainability, Deployment Value, and Future Scope

A key strength of this framework is that it not only predicts fraud but also explains why a transaction was flagged. SHAP provides both global and transaction-level explanations, while LIME offers an additional, model-agnostic explanation for individual fraud alerts.
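LIME explains one alert by fitting a simple linear model to the black-box scores of perturbed copies of that transaction, weighting each copy by its closeness to the original. The `lime` package adds discretization and feature selection on top of this; the dependency-free sketch below keeps only the core idea. The Gaussian noise scales, the exponential kernel and its width, the sample count, and the example scoring function are all hypothetical choices.

```python
import math
import random
from typing import Callable, List, Tuple

Row = List[float]

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def lime_explain(f: Callable[[Row], float], x: Row, scale: Row, n_samples: int = 500,
                 kernel_width: float = 2.0, seed: int = 0) -> Tuple[float, List[float]]:
    """Return (intercept, coefficients) of a locally weighted linear fit of f around x."""
    rng = random.Random(seed)
    d = len(x)
    # Accumulate the normal equations of weighted least squares with an intercept.
    XtWX = [[0.0] * (d + 1) for _ in range(d + 1)]
    XtWy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, s) for xi, s in zip(x, scale)]
        # Squared distance in units of each feature's noise scale -> exponential kernel weight.
        dist2 = sum(((zi - xi) / s) ** 2 for zi, xi, s in zip(z, x, scale))
        weight = math.exp(-dist2 / kernel_width ** 2)
        # Center on x so the intercept estimates the local score f(x).
        row = [1.0] + [zi - xi for zi, xi in zip(z, x)]
        target = f(z)
        for a in range(d + 1):
            XtWy[a] += weight * row[a] * target
            for c in range(d + 1):
                XtWX[a][c] += weight * row[a] * row[c]
    coef = solve(XtWX, XtWy)
    return coef[0], coef[1:]

def black_box(x: Row) -> float:  # hypothetical fraud model
    amount, velocity, distance = x
    return 1 / (1 + math.exp(-(0.004 * amount + 0.5 * velocity + 0.002 * distance - 4)))

intercept, weights = lime_explain(black_box, [600.0, 4.0, 300.0], scale=[100.0, 1.0, 50.0])
for name, w in zip(["amount", "velocity_24h", "geo_distance"], weights):
    print(f"{name:14s} {w:+.5f}")
```

Each coefficient approximates how the fraud score changes per unit of that feature near this particular transaction, which is the local, per-alert view described above.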

This matters in financial environments where fraud analysts, compliance teams, and regulators need clear reasoning behind automated decisions. In the analyst review exercise, SHAP explanations received an average usefulness rating of 4.3 out of 5.0, suggesting that analysts found them practical and easy to interpret.

The adaptive retraining pipeline also makes the framework suitable for real-world deployment. When moderate drift was detected, the system retrained the meta-learner. When major drift was detected, it retrained the full model using a rolling six-month transaction window. This helped restore performance close to the original baseline after fraud patterns changed.
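The two-tier policy can be sketched as a per-feature two-sample Kolmogorov–Smirnov comparison of recent versus reference data, with the worst statistic mapped to an action. `scipy.stats.ks_2samp` provides this statistic along with a p-value; the version below computes it directly to stay dependency-free. The thresholds (0.10 and 0.30), the action names, the transaction schema with a `timestamp` key, and the 183-day approximation of six months are hypothetical, not values stated in the article.

```python
import bisect
from datetime import datetime, timedelta
from typing import Dict, List, Sequence

def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for v in sorted(set(ref) | set(cur)):
        f_ref = bisect.bisect_right(ref, v) / len(ref)
        f_cur = bisect.bisect_right(cur, v) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap

# Hypothetical severity thresholds on the KS statistic.
MODERATE, MAJOR = 0.10, 0.30

def retraining_action(reference: Dict[str, List[float]], current: Dict[str, List[float]]) -> str:
    """Map the worst per-feature drift to an action, mirroring the two-tier policy."""
    worst = max(ks_statistic(reference[name], current[name]) for name in reference)
    if worst >= MAJOR:
        return "retrain_full"          # full model on the rolling window
    if worst >= MODERATE:
        return "retrain_meta_learner"  # cheaper partial update
    return "no_action"

def six_month_window(transactions: List[dict], now: datetime) -> List[dict]:
    """Keep transactions from roughly the last six months (approximated as 183 days)."""
    cutoff = now - timedelta(days=183)
    return [t for t in transactions if t["timestamp"] >= cutoff]

baseline = {"amount": list(range(100))}
print(retraining_action(baseline, {"amount": list(range(100))}))       # no_action
print(retraining_action(baseline, {"amount": list(range(15, 115))}))   # retrain_meta_learner
print(retraining_action(baseline, {"amount": list(range(100, 200))}))  # retrain_full

now = datetime(2024, 10, 22)
txns = [{"timestamp": now - timedelta(days=d)} for d in (10, 100, 200, 400)]
print(len(six_month_window(txns, now)))  # 2
```

Taking the maximum over features means a large shift in any single feature is enough to escalate; a production detector might instead weight features or require several to drift.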

Future work can focus on transfer learning across financial institutions, improving interpretability of behavioral features, detecting specific fraud typology shifts, and exploring federated learning so institutions can collaborate without sharing sensitive transaction data.
