# ADR-010: SHAP KernelExplainer for BankChurn StackingClassifier
- Status: Accepted
- Date: 2026-03-10
- Authors: Duque Ortega Mutis
- Related: ADR-003 (model choice), ADR-015 (async inference)
TL;DR:
`TreeExplainer` does not support `StackingClassifier`. Implemented `KernelExplainer` as a model-agnostic fallback that computes SHAP values in the original 10-feature space (interpretable by business stakeholders). The ~4.5s latency is acceptable because explanations are opt-in (`?explain=true`) and only triggered for high-risk predictions.
## Context
BankChurn v3.0.0 uses a StackingClassifier (RF + GradientBoosting + XGBoost + LightGBM → LogisticRegression meta-learner) inside a sklearn Pipeline with three steps: ChurnFeatureEngineer → ColumnTransformer → StackingClassifier.
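For reference, the stack can be sketched with sklearn alone (the XGBoost and LightGBM base learners are omitted so the snippet is self-contained, estimator settings are illustrative, and a pass-through transformer stands in for the project's ChurnFeatureEngineer):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, FunctionTransformer
from sklearn.ensemble import (StackingClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.linear_model import LogisticRegression

# Placeholder for the project's custom ChurnFeatureEngineer transformer
feature_engineer = FunctionTransformer()

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=10, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=10, random_state=0)),
        # ("xgb", XGBClassifier(...)), ("lgb", LGBMClassifier(...)) in the real model
    ],
    final_estimator=LogisticRegression(),  # meta-learner over base predictions
    cv=2,
)

pipeline = Pipeline([
    ("features", feature_engineer),          # ChurnFeatureEngineer
    ("preprocessor", ColumnTransformer(      # illustrative column split
        [("num", StandardScaler(), [0, 1, 2])], remainder="passthrough")),
    ("classifier", stack),                   # StackingClassifier
])
```

The classifier sits at the end of the pipeline, which is why an explainer must either unwrap it (TreeExplainer) or call the whole pipeline as a black box (KernelExplainer).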
The /predict?explain=true endpoint is designed to return per-feature SHAP values for individual predictions, enabling business stakeholders to understand why a customer is flagged as high-risk.
Problem: The initial implementation returned all-zero feature contributions in production because:
1. shap was missing from requirements-prod.txt (only in dev requirements)
2. When SHAP was added, shap.TreeExplainer raised: "Model type not yet supported: StackingClassifier"
TreeExplainer works with single tree-based models (RandomForest, XGBoost, LightGBM, GradientBoosting) but does not support ensemble wrappers like StackingClassifier because it cannot trace prediction paths through the meta-learner combination logic.
## Decision
Use shap.KernelExplainer as the SHAP backend for BankChurn, with a TreeExplainer attempt first (for forward-compatibility if the model changes to a single tree model).
Implementation:
```python
# _initialize_explainer() (simplified)
try:
    inner = self._unwrap_classifier(classifier)  # extract estimator from pipeline
    self.explainer = shap.TreeExplainer(inner)   # fast, but fails for StackingClassifier
    self._uses_kernel_explainer = False
    return
except Exception:
    pass  # fall through to KernelExplainer

# KernelExplainer: model-agnostic, works with any black-box
def predict_proba_wrapper(X_array):
    X_df = pd.DataFrame(X_array, columns=self.feature_names)  # restore column names
    return self.model.predict_proba(X_df)[:, 1]  # full pipeline (features + preprocessor + classifier)

self.explainer = shap.KernelExplainer(predict_proba_wrapper, X_background.values[:50])
self._uses_kernel_explainer = True
```
Key design choices:
- The predict_proba_wrapper receives raw features (10 columns) and calls self.model.predict_proba, which internally applies the full pipeline (ChurnFeatureEngineer → ColumnTransformer → StackingClassifier)
- SHAP values are therefore computed in the original feature space (10 interpretable business features), not in the expanded transformed space (38+ encoded columns)
- Background data: 50 samples from Churn.csv raw features, representative of the training distribution
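KernelExplainer approximates Shapley values by perturbing feature coalitions against the background set. The core idea can be sketched independently of the shap library as a Monte Carlo permutation estimate (a simplified stand-in, not the library's actual weighted-regression algorithm):

```python
import numpy as np

def sampling_shap(predict, x, background, n_perm=200, seed=0):
    """Monte Carlo Shapley estimate for one instance.

    predict    : callable mapping an (n, d) array to (n,) probabilities
    x          : (d,) instance to explain
    background : (m, d) baseline samples (analogous to KernelExplainer's background set)
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    phi = np.zeros(d)
    for _ in range(n_perm):
        perm = rng.permutation(d)                       # random feature ordering
        current = background[rng.integers(len(background))].copy()
        prev = predict(current[None, :])[0]
        for j in perm:
            current[j] = x[j]                           # reveal feature j
            new = predict(current[None, :])[0]
            phi[j] += new - prev                        # marginal contribution of j
            prev = new
    return phi / n_perm
```

For a linear model this recovers the exact attributions `w_j * (x_j - baseline_j)`; the real KernelExplainer instead fits a weighted linear regression over sampled coalitions, which is more sample-efficient but follows the same "hide vs. reveal features against a background" logic.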
## Alternatives Considered
### 1. TreeExplainer on individual base learners
Apply SHAP separately to each of RF, GB, XGB, LGB, then aggregate.
- Rejected: Produces 4 separate SHAP explanations that are not directly comparable. The meta-learner (LogisticRegression) applies learned weights to base learner outputs — not captured by individual explanations. Would require a custom aggregation strategy with no standard interpretation.
### 2. LinearExplainer on the meta-learner
Explain only the LogisticRegression meta-learner using base learner out-of-fold predictions as features.
- Rejected: Explains the combination weights between models (e.g., "XGBoost contributed +0.3"), not the business features (e.g., "Age contributed +0.05"). Useless for non-technical stakeholders who need feature-level attribution.
### 3. Remove SHAP from production
Remove explainability entirely or only provide it in development.
- Rejected: Explainability is a core feature for portfolio differentiation and represents responsible AI practice. The `?explain=true` opt-in pattern already ensures zero overhead for standard predictions.
### 4. Replace StackingClassifier with a single tree model
Downgrade from StackingClassifier (AUC 0.87) to a single LightGBM/XGBoost (AUC 0.84–0.86) to enable TreeExplainer.
- Rejected: Losing 1–3% AUC for the sole purpose of SHAP compatibility is not a sound trade-off. KernelExplainer is the correct solution.
## Trade-offs
| Dimension | TreeExplainer | KernelExplainer (chosen) |
|---|---|---|
| Compatibility | Tree models only | Any black-box model |
| Latency | ~5–50ms per request | ~4.5s per request (measured in GKE) |
| Accuracy | Exact Shapley values | Approximate (sampling-based) |
| Background samples | Not required | 50 samples (used as baseline distribution) |
| Interpretability space | Transformed features (38 cols) | Raw features (10 cols) ✅ |
The 4.5s latency for ?explain=true is an accepted trade-off because:
- It is an opt-in endpoint — standard /predict requests run at 200ms p50 (GCP) / 110ms p50 (AWS)
- In production, SHAP explanations would be triggered only for high-risk predictions (risk_level=HIGH), typically representing ~20% of requests
- At portfolio scale (demo load test: 6.58 RPS), this does not create backpressure
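The opt-in plus high-risk gating could look like the following sketch (names such as `RISK_THRESHOLD`, `model_predict`, and `explain_prediction` are illustrative stubs, not the actual BankChurn handlers):

```python
# Sketch of the opt-in + high-risk gating described above.
RISK_THRESHOLD = 0.5  # assumed cutoff for risk_level=HIGH

def model_predict(features: dict) -> float:
    return features.get("churn_score", 0.0)  # stub for the ~200ms pipeline call

def explain_prediction(features: dict) -> dict:
    return {"NumOfProducts": 0.1}            # stub for the ~4.5s KernelExplainer call

def handle_predict(features: dict, explain: bool = False) -> dict:
    proba = model_predict(features)          # fast path, always runs
    risk_level = "HIGH" if proba >= RISK_THRESHOLD else "LOW"
    response = {"churn_probability": proba, "risk_level": risk_level}
    if explain and risk_level == "HIGH":
        # slow path: only opted-in, high-risk requests pay the SHAP cost
        response["feature_contributions"] = explain_prediction(features)
    return response
```

Under this gating, standard requests and low-risk opted-in requests never touch the explainer, which is what keeps the 4.5s cost off the hot path.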
## Production Patterns (How This Would Scale)
At larger scale, this pattern would be extended as follows:
| Scale | Pattern |
|---|---|
| <100 RPS | Synchronous on-demand (current implementation) |
| 100–1000 RPS | Async queue: predict sync, explain via Celery/Redis async, webhook callback |
| >1000 RPS | Dedicated explainability microservice with model replica + explanation cache (Redis TTL 1h for repeated inputs) |
| Batch | Nightly SHAP job on high-risk segment, stored in feature store |
The current synchronous implementation is appropriate for a portfolio demo and for low-traffic enterprise use cases (e.g., a retention analyst requesting explanations for flagged customers).
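The explanation cache from the >1000 RPS row could be prototyped in-process before introducing Redis. This sketch (a hypothetical class, not part of the current codebase) keys on a canonical hash of the raw features with a TTL, mirroring what `redis.setex(key, 3600, value)` would do:

```python
import hashlib
import json
import time

class ExplanationCache:
    """In-process stand-in for a Redis TTL cache of SHAP explanations."""

    def __init__(self, ttl_seconds: float = 3600.0):  # 1h TTL as in the table
        self.ttl = ttl_seconds
        self._store = {}

    @staticmethod
    def key(features: dict) -> str:
        # Canonical JSON so identical inputs always hash to the same key
        blob = json.dumps(features, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def get(self, features: dict):
        k = self.key(features)
        entry = self._store.get(k)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            del self._store[k]  # lazy eviction on read
            return None
        return value

    def put(self, features: dict, contributions: dict) -> None:
        self._store[self.key(features)] = (contributions, time.monotonic() + self.ttl)
```

Because KernelExplainer is deterministic for a fixed background set, repeated inputs can safely be served from the cache instead of re-running the 4.5s computation.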
## Verification
SHAP initialization (pod startup log):
```text
INFO: Loaded 100 background samples from /app/data/raw/Churn.csv (10 features)
INFO: Feature engineering: 10 → 25 features (+15)
INFO: Initialized KernelExplainer
INFO: Model loaded successfully
```
Production output (measured 2026-03-10):
```json
{
  "churn_probability": 0.4067,
  "feature_contributions": {
    "NumOfProducts": 0.1059,
    "Age": 0.0500,
    "Balance": 0.0267,
    "Gender": 0.0289,
    "CreditScore": 0.0241,
    "Tenure": 0.0168,
    "IsActiveMember": -0.0510,
    "Geography": -0.0130,
    "EstimatedSalary": -0.0137,
    "HasCrCard": -0.0167
  }
}
```
Interpretation: Single product (NumOfProducts=1, +0.106) and age 42 (+0.05) are the strongest churn risk factors for this customer, while active membership (-0.051) is the strongest protective factor.
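As a sanity check, SHAP's local-accuracy property says the explainer's expected value (the mean prediction over the background samples) plus the sum of the contributions should reproduce the predicted probability. The base value is not included in the response above, but it can be inferred from the doc's own numbers:

```python
# Contributions copied from the production output above
contributions = {
    "NumOfProducts": 0.1059, "Age": 0.0500, "Balance": 0.0267,
    "Gender": 0.0289, "CreditScore": 0.0241, "Tenure": 0.0168,
    "IsActiveMember": -0.0510, "Geography": -0.0130,
    "EstimatedSalary": -0.0137, "HasCrCard": -0.0167,
}
total = sum(contributions.values())   # net SHAP contribution
implied_base = 0.4067 - total         # inferred expected value over the background
print(round(total, 4), round(implied_base, 4))  # 0.158 0.2487
```

A non-zero total that reconciles with the predicted probability is itself a regression check against the original all-zero-contributions bug.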
## Consequences

- Positive: Real, non-zero feature contributions for all predictions with `?explain=true`
- Positive: SHAP values in original feature space, directly interpretable by business stakeholders
- Positive: Demonstrates responsible AI practice (explainable ML) in portfolio
- Negative: 4.5s latency for `?explain=true` (synchronous KernelExplainer)
- Negative: `shap~=0.46.0` adds ~150MB to the Docker image (342MB → ~490MB)
- Negative: +30s pod startup time (KernelExplainer initialization with 100 background samples)
## Revisit When

- Model changes to a single tree ensemble (XGBoost/LightGBM) → switch to TreeExplainer (<50ms)
- Throughput requirement for explanations exceeds 10 RPS → move to async architecture
- `shap` 0.47+ adds native StackingClassifier support → re-evaluate TreeExplainer