Duque Ortega Mutis · Mexico City · open to junior ML / MLOps roles

Building ML systems
that survive production.

Three services deployed on GKE and EKS. Three production incidents measured, root-caused and fixed. One open-source template that packaged the lessons. Built on fourteen years of running operations before the first model.


Portfolio

one monorepo · three production services

Fourteen years of operations flow into the build: churn, financial NLP and demand forecasting — tested, deployed and operated on GKE and EKS.

3 ML services · one monorepo
0+ automated tests
0 architecture decision records
0% error rate under load after the fix (down from 81%)

The work

Projects, problems, solutions

Each service, the production problem it hit, and the fix: measured, root-caused and documented.

incident // serving

errors under load: 81% → 0%
cpu request: 2000m → 1000m
model: AUC 0.87 · 90% cov

BankChurn Predictor

81% of requests failing. The model was fine.

A load test exposed an 81% error rate on the BankChurn API. Root cause: running uvicorn --workers inside Kubernetes. Multiple workers competing for one shared CPU budget produce thrashing, not parallelism. Redesigned the inference path with asyncio plus a ThreadPoolExecutor (GIL analysis documented); errors dropped to zero and the CPU request was halved.

client → fastapi · 1 worker → threadpool → model
event loop stays free; probes stay alive under load
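A minimal sketch of the pattern described above, using only the standard library. It is not the BankChurn code: `predict_blocking`, `health`, the pool size and the simulated latency are all hypothetical stand-ins. The idea it illustrates is that blocking inference runs in a small thread pool inside a single process, so the event loop stays free to answer cheap requests such as liveness probes.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a blocking model call. time.sleep releases the
# GIL; whether a real model does so depends on the library (assumption).
def predict_blocking(features):
    time.sleep(0.05)  # simulate inference that blocks the calling thread
    return sum(features) > 1.0

# One small pool in one process, instead of several uvicorn workers
# competing for the same container CPU budget. Pool size is illustrative.
EXECUTOR = ThreadPoolExecutor(max_workers=2)

async def predict(features):
    # Hand the blocking call to the pool; the event loop keeps running.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(EXECUTOR, predict_blocking, features)

async def health():
    # Cheap handler of the kind a liveness or readiness probe would hit.
    return "ok"

async def main():
    # Start several predictions, then time a probe while they are in flight.
    tasks = [asyncio.create_task(predict([0.4, 0.9])) for _ in range(4)]
    await asyncio.sleep(0)  # give the tasks a chance to submit their work
    start = time.perf_counter()
    status = await health()
    probe_latency = time.perf_counter() - start
    results = await asyncio.gather(*tasks)
    return status, probe_latency, results

if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a FastAPI service the same `run_in_executor` call would sit inside an `async def` route handler; that wiring is left out here to keep the sketch dependency-free.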

trade-off // nlp serving

accuracy: 80.6%
coverage: 98%
inference: CPU-only · low cost

NLPInsight Analyzer

The heavier model we chose not to ship

Financial sentiment classification where the production question mattered more than the leaderboard: a transformer would score higher and cost more to operate, explain and debug. NLPInsight ships a lightweight, explainable path — and documents the rejected alternative as an engineering decision, not an omission.

text → tf-idf + linear → api
transformer: documented, not shipped; operability won
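To show why a TF-IDF plus linear path is easy to explain, here is a small standard-library sketch. It is not the NLPInsight implementation: the corpus, the hand-set `weights` and the helper names are hypothetical, and a real service would learn the weights from labeled data. The point is only that each token's contribution to the score is its TF-IDF value times one coefficient, so a prediction can be read term by term.

```python
import math
from collections import Counter

def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    # Term counts weighted by idf, then L2-normalized; unknown terms are dropped.
    counts = Counter(t for t in doc.lower().split() if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def explain(doc, idf, weights, bias=0.0):
    # Linear score plus per-term contributions, largest magnitude first.
    vec = tfidf(doc, idf)
    contributions = {t: v * weights.get(t, 0.0) for t, v in vec.items()}
    score = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked

# Hypothetical toy corpus and hand-set coefficients, for illustration only.
corpus = ["profits rose sharply", "profits fell", "shares fell sharply"]
weights = {"rose": 1.5, "fell": -1.5}
idf = fit_idf(corpus)
score, ranked = explain("profits fell", idf, weights)
```

Here `score` is negative and `ranked` lists `fell` first, which is the kind of per-term trace a transformer does not give for free.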

leakage // forecasting

R²: 0.96 (honest)
volume: 6.3M trips
etl: PySpark · temporal CV

ChicagoTaxi Pipeline

A score too good to be true — until it was

Demand forecasting over 6.3M Chicago taxi trips. The first model looked perfect because a feature was leaking the future into training. Removed the leak and rebuilt validation as strictly temporal. The R² of 0.96 that survived honest re-evaluation is the one published.

6.3M trips → pyspark etl → temporal cv → forecast
leaky feature → removed before metrics
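A minimal sketch of what "strictly temporal" validation means, in plain Python. It is not the ChicagoTaxi pipeline code: `temporal_folds` and `assert_no_time_leak` are hypothetical helpers, and the sketch assumes one row per unique time bucket. Each fold trains only on rows that come before the rows it is tested on, and a small gate fails loudly if that ordering is ever violated.

```python
from datetime import date, timedelta

def temporal_folds(timestamps, n_folds=3):
    """Yield (train_idx, test_idx) pairs using an expanding training window.

    Rows are ordered by time; each test block sits right after its training
    block. Rows left over after the last full block are not used.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    fold_size = len(order) // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train = order[: k * fold_size]
        test = order[k * fold_size : (k + 1) * fold_size]
        yield train, test

def assert_no_time_leak(timestamps, train, test):
    # Gate: every training row must be strictly earlier than every test row.
    latest_train = max(timestamps[i] for i in train)
    earliest_test = min(timestamps[i] for i in test)
    assert latest_train < earliest_test, "training data overlaps the test period"

# Hypothetical daily buckets, for illustration only.
days = [date(2023, 1, 1) + timedelta(days=i) for i in range(8)]
folds = list(temporal_folds(days, n_folds=3))
for train, test in folds:
    assert_no_time_leak(days, train, test)
```

A random shuffle split would fail this gate on the first fold, which is the failure mode a leakage check is meant to surface before any metric is reported.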

Template

every lesson, packaged as defaults

The ML-MLOps Production Template turns the portfolio's production lessons into a reusable open-source system — the second project, and the strongest proof of product thinking.

  • 32 anti-patterns: D-01..D-32, each with its corrective action
  • 28 ADRs: decisions with rejected alternatives
  • SLSA L2 supply chain: signed images + SBOM attestation
  • AUTO · CONSULT · STOP: agent behavior protocol
  • 6 env×cloud overlays: dev / staging / prod on GCP + AWS
  • Governed AI-assisted dev: reviewable, bounded agentic coding

The toolbox

Capability map

ml engineering

  • Python
  • scikit-learn
  • XGBoost / LightGBM
  • SHAP explainability
  • Pandera validation

mlops · serving

  • FastAPI
  • Docker · Kubernetes
  • MLflow
  • GitHub Actions CI/CD
  • Prometheus · Grafana

cloud · infra

  • GCP — GKE · Workload Identity
  • AWS — EKS · IRSA
  • Terraform
  • Kustomize overlays
  • Cosign · SBOM

data

  • PySpark
  • Pandas
  • Temporal validation
  • Drift — PSI · Evidently
  • Leakage gates

governed ai-assisted development

  • Claude Code · Cursor · Windsurf
  • Behavior rules, skills, audit trail, eval gates
  • Engineered, not hidden

Next system

Let's talk about the systems
you need to survive production.

DuqueOrtegaMutis@gmail.com

available in 2 weeks · cdmx hybrid · remote latam / us / eu · spanish native · english professional