Duque Ortega Mutis · Mexico City · open to junior ML / MLOps roles

Building ML systems
that survive production.

Three services deployed on GKE and EKS. Three production incidents measured, root-caused and fixed. One open-source template that packaged the lessons. Built on fourteen years of running operations before the first model.


Portfolio

one monorepo · three production services

Fourteen years of operations experience behind three services: churn prediction, financial NLP and demand forecasting, each tested, deployed and operated on GKE and EKS.

3 ML services · one monorepo

0+ automated tests

0 architecture decision records

81% → 0% error rate under load after the fix

The work

Projects, problems, solutions

Each service, the production problem it hit, and the fix: measured, root-caused and documented.

incident // serving

errors under load: 81% → 0%
cpu request: 2000m → 1000m
model: AUC 0.87 · test coverage: 90%

BankChurn Predictor

81% of requests failing. The model was fine.

A load test exposed an 81% error rate on the BankChurn API. Root cause: running uvicorn --workers inside Kubernetes, where multiple worker processes compete for one shared CPU budget and produce thrashing, not parallelism. Redesigning the inference path around asyncio plus a ThreadPoolExecutor (GIL analysis documented) dropped errors to zero and halved the CPU request.
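A minimal sketch of that serving pattern, assuming one uvicorn process per pod and a model call that spends most of its time in native code that releases the GIL. `DummyModel`, `predict` and the pool size of 2 are hypothetical stand-ins, not the BankChurn service's actual code.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class DummyModel:
    """Hypothetical stand-in for a trained classifier (e.g. one loaded from MLflow)."""

    def predict_proba(self, rows):
        # Toy score clipped to [0, 1]; a real model would run vectorized native code here.
        return [min(1.0, max(0.0, 0.1 * sum(r))) for r in rows]


MODEL = DummyModel()
# One process, one small bounded pool sized to the pod's CPU request
# (assumption: 2 threads), instead of several competing uvicorn workers.
EXECUTOR = ThreadPoolExecutor(max_workers=2)


async def predict(rows):
    loop = asyncio.get_running_loop()
    # Offload the blocking model call so the event loop keeps accepting requests.
    return await loop.run_in_executor(EXECUTOR, MODEL.predict_proba, rows)


async def main():
    # Three concurrent "requests", each with a single feature row.
    batches = [[[1.0, 2.0]], [[3.0, 4.0]], [[20.0, 0.0]]]
    return await asyncio.gather(*(predict(b) for b in batches))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a FastAPI app, an `async def` endpoint would simply `await predict(rows)`; the event loop stays responsive while the pool bounds how much CPU the model can claim at once.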

See the BankChurn Predictor service · Read the debugging deep dive

trade-off // nlp serving

accuracy: 80.6%
test coverage: 98%
inference: CPU-only · low cost

NLPInsight Analyzer

The heavier model we chose not to ship

Financial sentiment classification where the production question mattered more than the leaderboard: a transformer would score higher and cost more to operate, explain and debug. NLPInsight ships a lightweight, explainable path — and documents the rejected alternative as an engineering decision, not an omission.

See the NLPInsight Analyzer service

leakage // forecasting

r²: 0.96 — honest
volume: 6.3M trips
etl: PySpark · temporal CV

ChicagoTaxi Pipeline

A score too good to be true — until it was

Demand forecasting over 6.3M Chicago taxi trips. The first model looked perfect because a feature was leaking the future into training. Removed the leak and rebuilt validation as strictly temporal; the R² of 0.96 that survived honest re-evaluation is the one published.
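A plain-Python sketch of the guards this story implies: a trailing-window feature that only looks backward, an expanding-window temporal split, and a leakage gate that fails unless every training timestamp is strictly earlier than every test timestamp. The function names are hypothetical; the ChicagoTaxi pipeline does this work in PySpark, and its exact feature set is not described here.

```python
from typing import Iterator, List, Optional, Sequence, Tuple


def trailing_mean(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Mean of the previous `window` observations, excluding the current one,
    so the feature only uses information available at prediction time."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        past = values[max(0, i - window):i]
        out.append(sum(past) / len(past) if past else None)
    return out


def expanding_window_splits(
    timestamps: Sequence, n_splits: int = 3
) -> Iterator[Tuple[List[int], List[int]]]:
    """Expanding-window temporal CV: each fold trains on an earlier prefix
    (in time order) and tests on the block that immediately follows it."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    fold = len(order) // (n_splits + 1)
    for k in range(1, n_splits + 1):
        yield order[: k * fold], order[k * fold : (k + 1) * fold]


def assert_no_temporal_leak(
    timestamps: Sequence, train_idx: Sequence[int], test_idx: Sequence[int]
) -> None:
    """Leakage gate: raise if any training row is not strictly earlier than
    every test row (this also catches ties at a fold boundary)."""
    latest_train = max(timestamps[i] for i in train_idx)
    earliest_test = min(timestamps[i] for i in test_idx)
    if latest_train >= earliest_test:
        raise ValueError(
            f"temporal leak: train ends at {latest_train}, test starts at {earliest_test}"
        )


# Usage: hourly timestamps, three folds, gate checked on every split.
hours = list(range(40))
for train_idx, test_idx in expanding_window_splits(hours, n_splits=3):
    assert_no_temporal_leak(hours, train_idx, test_idx)
```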

See the ChicagoTaxi Pipeline service

Template

every lesson, packaged as defaults

The ML-MLOps Production Template turns the portfolio's production lessons into a reusable open-source system — the second project, and the strongest proof of product thinking.

32 anti-patterns: D-01..D-32, each with its corrective action
28 ADRs: decisions with rejected alternatives
SLSA L2 supply chain: signed images + SBOM attestation
AUTO · CONSULT · STOP: agent behavior protocol
6 env×cloud overlays: dev / staging / prod on GCP + AWS
Governed AI-assisted dev: reviewable, bounded agentic coding
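One way to picture the AUTO · CONSULT · STOP protocol is as a gate every proposed agent action passes through before it runs. Only the three tiers come from the template; the action names and policy table below are hypothetical, for illustration. Unknown actions fall back to CONSULT so the default leans toward human review, and each decision is appended to an audit log.

```python
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    AUTO = "auto"        # agent may proceed without asking
    CONSULT = "consult"  # agent proposes; a human approves before it runs
    STOP = "stop"        # agent must not proceed


# Hypothetical policy table for illustration; not the template's real rules.
POLICY = {
    "edit_tests": Mode.AUTO,
    "format_code": Mode.AUTO,
    "add_dependency": Mode.CONSULT,
    "modify_ci_pipeline": Mode.CONSULT,
    "deploy_prod": Mode.STOP,
    "read_secrets": Mode.STOP,
}

AUDIT_LOG: List[Tuple[str, str]] = []  # (action, decided mode) pairs


def gate(action: str) -> Mode:
    """Classify a proposed agent action and record the decision."""
    # Unknown actions default to CONSULT: fail toward human review, not autonomy.
    mode = POLICY.get(action, Mode.CONSULT)
    AUDIT_LOG.append((action, mode.value))
    return mode
```

For example, `gate("deploy_prod")` returns `Mode.STOP`, while an action missing from the table returns `Mode.CONSULT`.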

Production Template Repository

The toolbox

Capability map

ml engineering

Python
scikit-learn
XGBoost / LightGBM
SHAP explainability
Pandera validation

mlops · serving

FastAPI
Docker · Kubernetes
MLflow
GitHub Actions CI/CD
Prometheus · Grafana

cloud · infra

GCP — GKE · Workload Identity
AWS — EKS · IRSA
Terraform
Kustomize overlays
Cosign · SBOM

data

PySpark
Pandas
Temporal validation
Drift — PSI · Evidently
Leakage gates
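For reference, the Population Stability Index compares a feature's binned distribution in current traffic against a reference window: `PSI = Σ (a_i − e_i) · ln(a_i / e_i)`, where `e_i` and `a_i` are the reference and current share of rows in bin `i`. A minimal plain-Python sketch, assuming quantile bins cut from the reference data and a small floor (`eps`) on empty bins; Evidently ships its own drift implementations, and this is not one of them.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` against `reference`."""
    ref = sorted(reference)
    # Interior cut points at reference quantiles -> n_bins bins.
    edges = [ref[int(len(ref) * k / n_bins)] for k in range(1, n_bins)]

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        total = len(values)
        # Floor empty bins so the log term stays finite.
        return [max(c / total, eps) for c in counts]

    e = proportions(reference)
    a = proportions(current)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

A commonly quoted rule of thumb reads PSI below 0.1 as stable and above 0.25 as significant drift; thresholds are better set per feature.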

governed ai-assisted development

Claude Code · Cursor · Windsurf: behavior rules, skills, audit trail, eval gates. Engineered, not hidden.

Next system

Let's talk about the systems
you need to survive production.

DuqueOrtegaMutis@gmail.com

available in 2 weeks · cdmx hybrid · remote latam / us / eu · spanish native · english professional