ADR-016: GCP vs AWS Performance Parity — Cost vs Performance Trade-off¶
- Status: Accepted
- Date: 2026-03-18
- Authors: Duque Ortega Mutis
- Related: ADR-015 (async inference), ADR-013 (parity policy)
TL;DR: BankChurn shows 2-3× higher latency on GCP under load due to
e2-mediumshared CPU vs AWSt3.mediumburstable credits. Accepted the difference as a documented cost vs performance trade-off ($24/mo vs $145/mo for parity). Both clouds meet the <500ms idle SLA with 0% errors under 100 concurrent users.
Context¶
After implementing async inference (ADR-015), load testing revealed significant performance differences between GCP and AWS for BankChurn:
| Environment | Idle p50 | Load p50 (50 users) | Stress p50 (100 users) |
|---|---|---|---|
| GCP (GKE) | 200ms | 3100ms | 8200ms |
| AWS (EKS) | 110ms | 120ms | 130ms |
NLPInsight and ChicagoTaxi show comparable performance on both clouds (78-100ms).
Root Cause Analysis¶
BankChurn is CPU-bound — StackingClassifier (5 base learners + meta-learner) consumes ~100ms of pure CPU time per prediction.
Infrastructure differences:
| Factor | GCP (current) | AWS (current) | Impact on BankChurn |
|---|---|---|---|
| Instance type | e2-medium |
t3.medium |
High |
| vCPU allocation | 2 shared (burstable) | 2 burstable (better credits) | High |
| CPU frequency | AMD EPYC Rome 2.2 GHz | Intel Xeon Platinum 2.5-3.1 GHz | Critical |
| Network | NGINX Ingress (L7) | NLB (L4) | Moderate |
| Node count | 1 (all 6 pods) | 1-2 (auto-scaled) | Moderate |
Why NLPInsight/ChicagoTaxi don't improve on AWS: - NLPInsight: I/O-bound (tokenizer), not CPU-saturated - ChicagoTaxi: Lightweight RandomForest (~5ms CPU), not CPU-saturated
Decision¶
Accept the performance difference as a documented cost vs performance trade-off.
Do NOT upgrade GCP instance types to match AWS performance.
Rationale¶
- BankChurn 200ms idle latency is acceptable for a bank churn prediction tool (analyst-facing, not real-time payments)
- 0% error rate under 100 concurrent users on both clouds — production-ready
- Cost efficiency:
e2-mediumat $24/mo vsn2-highcpu-2at $51/mo (+$27/mo for marginal latency improvement) - Portfolio value: Demonstrating multi-cloud trade-off analysis is more valuable than spending $100+/mo to hide infrastructure differences
Performance Targets Met¶
| Service | SLA | GCP p50 | AWS p50 | Status |
|---|---|---|---|---|
| BankChurn | < 500ms idle | 200ms | 110ms | ✅ Both pass |
| NLPInsight | < 200ms idle | 78ms | 100ms | ✅ Both pass |
| ChicagoTaxi | < 200ms idle | 100ms | 120ms | ✅ Both pass |
| All services | < 1% errors under load | 0-0.02% | 0% | ✅ Both pass |
Consequences¶
Positive¶
- Cost-effective: $24/mo GCP vs potential $145/mo for
c2-standard-4 - Educational: Portfolio demonstrates understanding of cloud trade-offs, not just "make everything fast"
- Realistic: Production systems balance cost vs performance, not maximize performance at any cost
- Documented: ADR explains the difference instead of hiding it
Negative¶
- GCP BankChurn latency is 2-3x higher than AWS under concurrent load
- May raise questions from recruiters unfamiliar with cloud instance differences
Mitigation¶
- Document the difference in
docs/MULTI_CLOUD_COMPARISON.mdwith root cause analysis - Include this ADR in portfolio navigation
- Prepare interview talking points: "I chose to optimize for cost efficiency while meeting SLAs rather than over-provisioning for vanity metrics"
Alternatives Considered¶
1. Upgrade GCP to n2-highcpu-2 (2 dedicated vCPU, 2.8 GHz)¶
- Cost: +$27/mo
- Expected improvement: 200ms → ~120-150ms
- Rejected: Marginal improvement, doubles infrastructure cost
2. Upgrade GCP to c2-standard-4 (4 dedicated vCPU, 3.8 GHz)¶
- Cost: +$121/mo
- Expected improvement: 200ms → <100ms (would beat AWS)
- Rejected: 6x cost increase for a portfolio project, unsustainable
3. Increase min_node_count to 2 (distribute load)¶
- Cost: +$24/mo
- Expected improvement: 200ms → ~150ms (moderate)
- Rejected: Still doesn't achieve parity, adds complexity
4. Node affinity for BankChurn on dedicated c2 node pool¶
- Cost: +$145/mo for dedicated node
- Expected improvement: Best performance, but highest cost
- Rejected: Over-engineering for a 3-service portfolio
Implementation¶
No code changes required. Document the performance difference in:
- docs/MULTI_CLOUD_COMPARISON.md — add root cause analysis
- docs/load-test-results.md — already includes GCP vs AWS comparison
- Portfolio README — mention cost-optimized infrastructure
Interview Talking Points¶
"BankChurn shows 2-3x higher latency on GCP under load because I'm using cost-optimized
e2-mediuminstances ($24/mo) versus AWSt3.medium. The difference is CPU frequency and allocation — BankChurn's StackingClassifier is CPU-bound, while NLPInsight and ChicagoTaxi aren't, so they perform identically on both clouds.I could upgrade to
c2-standard-4for <100ms latency, but that's $145/mo — 6x the cost — for a portfolio project. Instead, I documented the trade-off in ADR-016. In production, I'd make this decision based on actual SLAs and business requirements, not vanity metrics. Both clouds meet the <500ms SLA with 0% errors under 100 concurrent users."