# ADR-003: StackingClassifier for BankChurn Churn Prediction
- Status: Accepted
- Date: 2026-02-28
- Authors: Duque Ortega Mutis
- Related: ADR-009 (complexity justification), ADR-010 (SHAP compatibility)
TL;DR: Chose a 4-model StackingClassifier (AUC 0.87) over a single LightGBM (AUC 0.86) because the ensemble demonstrates advanced ML methodology while achieving measurably better generalization on imbalanced churn data. The 0.01 AUC gap is modest, but a fold-to-fold standard deviation of only ±0.006 suggests the gain is real rather than noise.
## Context
BankChurn predicts binary customer churn on a 10K-row dataset in which roughly 20% of customers are churners (imbalanced positive class). The model serves as a risk-scoring tool for retention analysts, so AUC (ranking quality) matters more than raw accuracy.
## Model Comparison (5-fold CV)
| Model | AUC | F1 (churn) | CV Std | Training Time | Artifact Size |
|---|---|---|---|---|---|
| LogisticRegression | 0.78 | 0.48 | ±0.008 | <1 min | <1 MB |
| RandomForest | 0.84 | 0.58 | ±0.010 | 3 min | 2 MB |
| XGBoost | 0.85 | 0.60 | ±0.007 | 4 min | 1.5 MB |
| LightGBM | 0.86 | 0.61 | ±0.006 | 2 min | 1 MB |
| VotingClassifier (soft) | 0.86 | 0.61 | ±0.007 | 8 min | 3 MB |
| StackingClassifier ✅ | 0.87 | 0.62 | ±0.006 | 20 min | 4.1 MB |
| PyTorch MLP | 0.83 | 0.55 | ±0.015 | 15 min | 8 MB |
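The evaluation protocol behind the table can be sketched as below. The real BankChurn data is not shown here, so a synthetic imbalanced dataset (10K rows, ~20% positive class) stands in for it; the feature count and random seeds are illustrative assumptions, and only the LogisticRegression baseline row is reproduced.

```python
# Sketch of the 5-fold CV protocol behind the comparison table,
# using a synthetic stand-in for the BankChurn dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# ~20% positive class, mirroring the churn imbalance described above
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.8, 0.2], random_state=42)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="roc_auc")
print(f"AUC {scores.mean():.3f} +/- {scores.std():.3f}")
```

Swapping the estimator for each candidate model and keeping the same `cv` object is what makes the AUC and "CV Std" columns comparable across rows.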
## Decision
Use StackingClassifier with 4 diverse base learners and a LogisticRegression meta-learner:
```text
Pipeline: [ChurnFeatureEngineer] → [ColumnTransformer] → [StackingClassifier]
├─ RandomForest (bagging)
├─ GradientBoosting (sequential boosting)
├─ XGBoost (regularized boosting)
├─ LightGBM (leaf-wise boosting)
└─ LogisticRegression (meta-learner, 5-fold CV)
```
## Why Diverse Base Learners?
Each base learner captures different signal patterns:
- RandomForest: robust to outliers via bagging; captures non-linear feature interactions
- GradientBoosting: sequential error correction; strong on residual patterns
- XGBoost: L1/L2 regularization prevents overfitting on small datasets
- LightGBM: leaf-wise growth finds deep interactions; fastest individual model
The meta-learner (LogisticRegression) learns optimal combination weights from 5-fold out-of-fold predictions. Because it is trained only on held-out predictions, it is far less prone to overfitting than a meta-learner fit on the base models' in-sample outputs.
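The out-of-fold mechanism can be illustrated with `cross_val_predict`: each row's prediction comes from a fold model that never saw that row. Synthetic data and a single base learner stand in here for brevity.

```python
# Sketch of the out-of-fold (OOF) predictions the meta-learner trains on.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=1_000, weights=[0.8, 0.2], random_state=0)

# Each row is scored by the fold model that did NOT train on it:
oof_proba = cross_val_predict(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=5, method="predict_proba",
)[:, 1]
# In the full stack there is one such column per base learner; those
# columns become the meta-learner's training features.
```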
## Alternatives Considered
| Option | AUC | Verdict | Rationale |
|---|---|---|---|
| Single LightGBM | 0.86 | Viable | Simpler, faster; 0.01 AUC gap is small but real on imbalanced data |
| VotingClassifier | 0.86 | Rejected | No learned combination weights — just averaging; same complexity, less benefit |
| PyTorch MLP | 0.83 | Rejected | Worse performance on tabular data; adds PyTorch dependency for no gain |
| StackingClassifier | 0.87 | Selected ✅ | Best AUC, lowest CV variance, demonstrates ensemble methodology |
## Honest Trade-off (see ADR-009)
The 0.01 AUC improvement over single LightGBM is modest. In a production system with strict latency/cost constraints, the simpler model would likely win. This portfolio keeps StackingClassifier to demonstrate ensemble methodology — the engineering challenge of serving a complex model (async inference, SHAP compatibility) is part of the learning objective.
## Consequences
- Positive: Best AUC (0.87) with robust generalization (CV ±0.006)
- Positive: Demonstrates advanced ensemble methods and sklearn Pipeline integration
- Positive: Meta-learner weights are interpretable — shows which base learner contributes most
- Negative: 10× training time vs single LightGBM (~20 min vs ~2 min)
- Negative: 4× model artifact size (4.1 MB vs ~1 MB)
- Negative: Requires KernelExplainer for SHAP (TreeExplainer incompatible) — adds ~4.5s per explanation (ADR-010)
- Negative: CPU-bound inference requires async thread pool to avoid event loop blocking (ADR-015)
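The "interpretable meta-learner weights" point above can be sketched as follows: after fitting, the LogisticRegression coefficients show how strongly each base learner's probability drives the final score. Two sklearn base learners and synthetic data stand in for the full four-model stack.

```python
# Sketch: inspecting the meta-learner's learned combination weights.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=0)
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5, stack_method="predict_proba",
).fit(X, y)

# One coefficient per base learner (binary case: one proba column each);
# larger magnitude = larger contribution to the final churn score.
for name, weight in zip(["rf", "gb"], stack.final_estimator_.coef_[0]):
    print(f"{name}: {weight:+.3f}")
```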
## Revisit When
- Training data grows >100K rows — consider online learning or incremental models
- Inference latency SLA drops below 50ms — single LightGBM with TreeExplainer would be faster
- SHAP adds native StackingClassifier support — would remove the KernelExplainer overhead
## References
- ADR-009: Simplification — When Not to Build — justifies keeping StackingClassifier
- ADR-010: SHAP KernelExplainer — SHAP compatibility consequence
- ADR-015: Async Inference — inference performance consequence
- Wolpert, D.H. (1992). Stacked Generalization. Neural Networks, 5(2), 241-259