Architectural Decision Records

17 documented decisions covering every non-trivial choice in this portfolio. These are not explanations of what was built — they are records of what was evaluated, what was rejected, and why.

Last Updated: April 2026 | Portfolio Version: 3.6.0

Full ADR Library: See docs/decisions/ for detailed per-decision records with context, alternatives considered, consequences, and verification evidence.

Why ADRs Matter

Most portfolios show tools used. This one shows decisions made — the trade-offs, the alternatives rejected, the incidents that forced a rethink. ADRs are the engineering artifact that separates "I deployed X" from "I chose X over Y because of Z, and here's the measured result."

Decision Summary

Infrastructure & Scaling

ADR	Decision	Choice	Key Rationale	Revisit When
001	HPA Metric Selection	CPU-only HPA	Memory-based HPA mathematically cannot scale down ML pods (fixed model RAM footprint). Proved with `ceil(replicas × usage/target)` formula	Custom metrics HPA (latency-based) needed
002	Model Storage	emptyDir + Init Container	Models downloaded from GCS/S3 at pod startup. Decouples model versioning from Docker images. Zero persistent storage cost	Models >1GB or startup time >30s
013	Multi-Cloud Parity	Functional parity, not identical infra	ML workloads identical across GKE/EKS. Node counts, instance types intentionally differ (autoscaler-managed)	Regulatory requirement for identical infra
014	Uvicorn Workers	Single-worker pod pattern	`uvicorn --workers N` under K8s is an anti-pattern: shared CPU budget → thrashing, diluted HPA signal. 1 worker + HPA = correct scaling	N/A — this is the K8s-native pattern
016	GCP vs AWS Performance	Accept 2-3× GCP latency under load	`e2-medium` shared CPU ($24/mo) vs `t3.medium` burstable (~$145/mo). Both meet the <500ms SLA with 0% errors. FinOps decision, not a performance bug	SLA requires <200ms p95 on GCP
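
The scale-down argument in ADR-001 can be checked by plugging numbers into the HPA replica formula. The following is a minimal sketch, not the verification recorded in the ADR: the utilization and target percentages are hypothetical, and the sketch ignores the HPA tolerance band and min/max replica bounds.

```python
import math


def desired_replicas(current_replicas: int, usage_pct: float, target_pct: float) -> int:
    """Core HPA formula: ceil(replicas * usage / target)."""
    return math.ceil(current_replicas * usage_pct / target_pct)


# Hypothetical idle state after a load test: 3 replicas still running.
# Memory: the loaded model keeps each pod near its request even with no traffic.
memory_based = desired_replicas(3, usage_pct=75, target_pct=80)
# CPU: idle pods drop to a small fraction of their request.
cpu_based = desired_replicas(3, usage_pct=10, target_pct=70)

print("memory-based HPA wants", memory_based, "replicas")
print("cpu-based HPA wants", cpu_based, "replicas")
```

With these assumed numbers the memory signal never falls far enough below target for the ceiling to drop a replica, while the CPU signal does.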

ML Model & Inference

ADR	Decision	Choice	Key Rationale	Revisit When
003	BankChurn Model	StackingClassifier (4 base + LR meta)	AUC 0.87 vs 0.85 single model. Documented that single LightGBM might be better in production (simpler, faster). Kept for engineering depth	Latency budget <50ms
005	Dependency Pinning	Compatible release (`~=`)	numpy 2.x silently corrupted joblib-serialized models: the worst category of bug (no error, wrong predictions). A three-part pin (`~=X.Y.Z`) blocks major and minor upgrades but still receives patch releases	Adopt lockfile-based approach
010	SHAP Explainability	KernelExplainer fallback	TreeExplainer incompatible with StackingClassifier → all-zero SHAP in production. KernelExplainer computes in original 10-feature space for business interpretability. ~4.5s latency accepted (opt-in)	Pre-computed SHAP at training time
015	Async Inference	ThreadPoolExecutor (4 threads)	sklearn/XGBoost/LightGBM release the GIL inside their C extensions → real threading parallelism. Resolved the 81% error rate under load. CPU halved (2000m → 1000m)	GPU inference (different concurrency model)
017	ML Serving Platform	Custom FastAPI + K8s primary, SageMaker complement	Custom = full control (SHAP middleware, Prometheus metrics, multi-cloud). SageMaker = managed complement for versatility demonstration	Team >5 engineers or >10 models
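
The ADR-015 pattern (one event loop, blocking inference offloaded to a small thread pool) can be sketched as follows. This is an illustration under assumptions, not the service's actual code: `predict_blocking`, `predict_async`, and `handle_batch` are hypothetical names, and the stand-in function does trivial arithmetic where a real model call would run in native code.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# 4 threads, matching the pool size named in ADR-015.
EXECUTOR = ThreadPoolExecutor(max_workers=4)


def predict_blocking(features: list[float]) -> float:
    """Hypothetical stand-in for a blocking model.predict() call."""
    return sum(features) / len(features)


async def predict_async(features: list[float]) -> float:
    loop = asyncio.get_running_loop()
    # Run the blocking call in the pool so the event loop stays responsive.
    return await loop.run_in_executor(EXECUTOR, predict_blocking, features)


async def handle_batch(batch: list[list[float]]) -> list[float]:
    # Concurrent requests share the same pool; gather preserves input order.
    return await asyncio.gather(*(predict_async(row) for row in batch))


if __name__ == "__main__":
    print(asyncio.run(handle_batch([[1.0, 2.0, 3.0], [4.0, 6.0]])))
```

In a FastAPI handler the same `run_in_executor` call would sit inside an `async def` endpoint. The benefit depends on the model library releasing the GIL during prediction, as the ADR states.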

MLOps Pipeline

ADR	Decision	Choice	Key Rationale	Revisit When
004	Observability	OpenTelemetry with no-op fallback	Production pods emit traces; dev/test/CI environments incur zero overhead. Activated via env var, no Docker image bloat	Vendor-specific APM required
006	Retraining Trigger	K8s CronJob → GitHub Actions webhook	No new infra (Airflow = 3-5 pods + DB). Reuses CI pipeline. PSI-based drift detection daily	>5 models or real-time drift detection
007	Feature Store	Deferred — not needed	All features in request payload. Training-serving skew prevented structurally by serialized `model.joblib` pipeline. Full Feast architecture designed for when it IS needed	Time-window aggregations or shared features across models
008	Deployment Strategy	Argo Rollouts canary (20→50→100%)	Prometheus-based analysis templates with auto-rollback on error rate or latency. Replaces K8s all-or-nothing RollingUpdate for ML services	A/B testing with business metrics
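
ADR-006 relies on PSI (Population Stability Index) for daily drift detection. The sketch below shows one common way to compute it for a single numeric feature; it is an assumption about the general technique, not the portfolio's implementation. The bin count, the `eps` floor for empty bins, and the 0.2 rule-of-thumb threshold in the usage lines are all illustrative choices.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """PSI = sum((a_i - e_i) * ln(a_i / e_i)) over equal-width bins.

    Bin edges come from the expected (training) sample; actual values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [float(v) for v in range(100)]
shifted = [v + 50.0 for v in baseline]
print("no drift:", psi(baseline, baseline))
print("drift flagged:", psi(baseline, shifted) > 0.2)
```

A scheduled job could run a check like this per feature and call the retraining webhook only when the score crosses the chosen threshold.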

Engineering Discipline

ADR	Decision	Choice	Key Rationale	Revisit When
009	Simplification Audit	Removed CarVision, fixed data leakage	MAPE 32.9% isn't defensible for a pricing model. ChicagoTaxi had same-period aggregates (data leakage). Knowing when NOT to build is harder than building	N/A — ongoing discipline
011	Gradio in Production	Local dev tool only	Deploying to K8s = architectural inconsistency (1/3 services with UI), duplicates Swagger, wastes cluster resources	User-facing demo requirement
012	Security Scanner Policy	Tiered: acknowledge in staging, remediate in production	Demonstrates security awareness without over-engineering for demo data. All findings have inline justifications	Production with real customer data

Production Incidents That Drove Decisions

These weren't planned features — they were problems discovered in production that forced architectural changes.

Incident	Symptom	Root Cause	Resolution	ADR
81% failure rate	BankChurn errors under 100 users	`uvicorn --workers N` shares CPU budget → thrashing	Single-worker + ThreadPoolExecutor (GIL release)	014, 015
HPA stuck at 3 replicas	Pods never scale down after load drops	Memory-based HPA + fixed ML model footprint	CPU-only HPA, verified 3→1 in 8 min	001
SHAP all-zero values	`?explain=true` returns zero contributions	(1) `shap` missing from production requirements; (2) TreeExplainer incompatible with StackingClassifier	KernelExplainer in 10-feature space	010
Silent wrong predictions	numpy 2.x + joblib deserialization	dtype layout change in numpy 2.0, no error raised	Compatible release pinning (`~=`)	005
Data leakage	ChicagoTaxi R² suspiciously high	Same-period aggregates as features	Lag features + temporal split	009
GCP 2-3× slower	Higher latency under load vs AWS	`e2-medium` shared CPU vs `t3.medium` burstable	Documented as FinOps trade-off	016
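
The data leakage fix (lag features plus a temporal split) can be illustrated with a small sketch. It is not the ChicagoTaxi pipeline: the function names, the daily trip counts, and the cutoff date are hypothetical. The point is only that each row's feature comes from an earlier period and that the test set lies strictly after the training set in time.

```python
def make_lagged_rows(series: list[tuple[str, float]], lag: int = 1) -> list[dict]:
    """Build rows whose feature is the value `lag` periods earlier.

    `series` holds (period, value) pairs sorted oldest first. The target's
    own period never contributes to its features.
    """
    rows = []
    for i in range(lag, len(series)):
        period, target = series[i]
        rows.append({"period": period, "lag_value": series[i - lag][1], "target": target})
    return rows


def temporal_split(rows: list[dict], cutoff: str) -> tuple[list[dict], list[dict]]:
    """Train on periods before `cutoff`, test on `cutoff` and later.

    ISO-8601 date strings compare correctly as plain strings.
    """
    train = [r for r in rows if r["period"] < cutoff]
    test = [r for r in rows if r["period"] >= cutoff]
    return train, test


# Hypothetical daily trip counts.
daily_trips = [
    ("2024-01-01", 100.0),
    ("2024-01-02", 120.0),
    ("2024-01-03", 90.0),
    ("2024-01-04", 110.0),
    ("2024-01-05", 130.0),
]
rows = make_lagged_rows(daily_trips, lag=1)
train, test = temporal_split(rows, cutoff="2024-01-04")
print(len(train), "training rows,", len(test), "test rows")
```

A random shuffle split on the same rows would let the model train on days that come after some test days, which is the kind of optimistic evaluation the incident describes.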

Decision Flow

New Problem
    │
    ├─ Is there a managed service? → Evaluate cost/control trade-off (ADR-017)
    │
    ├─ Does it add >1 component? → Justify complexity or defer (ADR-006, 007, 009)
    │
    ├─ Can we prove it works with data? → Measure before/after (ADR-001, 014, 015)
    │
    └─ Document the decision → ADR with context, alternatives, consequences

How to Read These ADRs

Each ADR follows a consistent structure:

Context — What problem or opportunity triggered the decision
Decision — What was chosen and why
Alternatives Considered — What was rejected and why (often the most valuable section)
Consequences — Positive, negative, and neutral outcomes
Verification — Measured evidence that the decision was correct

For technical reviewers: Start with ADR-014/015 (async inference incident) and ADR-001 (HPA scaling) — they best demonstrate production debugging methodology.

This page is part of the ML-MLOps Portfolio. See also: Engineering Highlights for a quick reference of incidents and trade-offs.