Portfolio Status¶
Operating status
Production-oriented evidence, currently in showcase mode¶
This page separates what is active today from what was proven during the live cloud deployment period. It is designed for recruiters, hiring managers, and technical reviewers who need the status in minutes, not a wall of operational detail.
Executive Readout¶
What this is
Reference MLOps portfolio¶
Three ML services, multi-cloud Kubernetes artifacts, Terraform, CI/CD, observability, drift detection and ADR-backed design decisions.
What is real
Implementation, not slideware¶
The code, manifests, Terraform and workflows were used against live clusters during development. Evidence from that period is preserved in the docs.
What is off
Cost-controlled infrastructure¶
GKE, EKS, MLflow, Prometheus and Grafana are not running continuously because the cloud control-plane cost is not justified for a permanent showcase.
Active vs Paused Surfaces¶
The fastest way to review the portfolio is to separate active engineering assets from intentionally paused cloud runtime surfaces.
Active Now¶
Active
Unit and integration CI¶
ci-mlops.yml runs on push and PR with 395+ tests and 90-96% coverage.
Active
Terraform validation¶
ci-infra.yml validates infrastructure changes without requiring live clusters.
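As a rough illustration of how validation can run without live clusters, the sketch below initializes Terraform with the backend disabled so that no cloud credentials or remote state are needed. It is not the repository's actual ci-infra.yml: the directory names `infra/gcp` and `infra/aws`, the path filters and the job layout are all assumptions.

```yaml
# Hypothetical sketch of an infrastructure validation workflow.
# Directory names and path filters are assumptions, not the real layout.
name: ci-infra
on:
  pull_request:
    paths:
      - "infra/**"
  push:
    paths:
      - "infra/**"
jobs:
  validate:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        dir: [infra/gcp, infra/aws]   # assumed: one directory per cloud
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # Formatting check needs no providers or credentials.
      - run: terraform fmt -check -recursive
        working-directory: ${{ matrix.dir }}
      # -backend=false skips remote state, so no cloud access is required.
      - run: terraform init -backend=false
        working-directory: ${{ matrix.dir }}
      # Checks syntax and internal consistency of the configuration only.
      - run: terraform validate
        working-directory: ${{ matrix.dir }}
```

`terraform validate` does not contact provider APIs, which is why this kind of check can stay active while the clusters are torn down.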
Active
GitHub Pages docs¶
This site is the current public review surface for architecture, evidence and operations.
Active
Docker build path¶
Images are built as CI artifacts and the Dockerfiles remain production-oriented.
Paused or Inactive¶
Inactive
MLflow and observability stack¶
MLflow, Prometheus and Grafana ran on the clusters and were torn down along with them.
Paused
Promotion workflows¶
Artifact Registry and ECR promotion are disabled until a live demo is requested.
Paused
Daily drift detection¶
The scheduled trigger is disabled; workflow_dispatch is still available.
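A paused schedule with a manual trigger still available might look like the following trigger section. This is a sketch only: the cron expression is an assumption, since the source says no more than that the run was daily.

```yaml
# Trigger section only; the cron time is an assumed placeholder.
on:
  # schedule:
  #   - cron: "0 6 * * *"   # daily run, commented out while paused
  workflow_dispatch:          # manual runs from the Actions tab remain available
```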
Why Infrastructure Is Off¶
Running GKE + EKS + managed Postgres + container registries continuously costs roughly $180-$220/month combined. That spend was justified during active development, load testing and incident-style validation; it is not economical as an always-on showcase.
This is the same operating logic a team would use with any paid infrastructure: keep the evidence, automation and reactivation path available, but do not pay for idle runtime when nobody is reviewing it live.
The important distinction is that the portfolio is not claiming a fictional live environment. It keeps the evidence that matters: manifests, Terraform, runbooks, ADRs, screenshots, load-test results and incident notes from the real deployment period.
Decision record
The full rationale lives in ADR-018: Portfolio Maintenance Mode.
How Repository Noise Is Controlled¶
Drift issues
Workflow dispatch only¶
The daily schedule is disabled, so the workflow runs only through workflow_dispatch. The previous condition treated script failures as drift events; the workflow now reports drift only when the detection script succeeds and sets an explicit drift flag.
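One way to express that kind of gate in GitHub Actions is sketched below. The script path `scripts/detect_drift.py`, the output name `drift_detected` and the issue-creation step are hypothetical; the sketch assumes the script writes `drift_detected=true` to `$GITHUB_OUTPUT` when it finds drift.

```yaml
# Hypothetical job: script path, output name and issue text are assumptions.
jobs:
  detect:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write
    steps:
      - uses: actions/checkout@v4
      - name: Run drift detection
        id: drift
        # Assumed to append "drift_detected=true" to $GITHUB_OUTPUT on drift.
        run: python scripts/detect_drift.py
      - name: Open drift issue
        # Fires only when the script succeeded AND explicitly flagged drift,
        # so a crashed script fails the job instead of raising a drift issue.
        if: steps.drift.outcome == 'success' && steps.drift.outputs.drift_detected == 'true'
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh issue create --title "Drift detected" --body "Run ${{ github.run_id }} reported drift."
```

Checking an explicit output rather than the exit status keeps "the script broke" and "the data drifted" as two separate signals.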
Security alerts
Trivy signal cleanup¶
ignore-unfixed: true keeps unfixable base-image CVEs from becoming permanent noise while preserving actionable scanner findings.
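For context, a scan step using this setting with the `aquasecurity/trivy-action` action could look roughly like this. The image reference, severity filter and output file are assumptions for illustration; only `ignore-unfixed: true` comes from the text above.

```yaml
# Step fragment; image name, severity list and output path are assumed.
      - name: Scan image with Trivy
        uses: aquasecurity/trivy-action@master   # pin to a release tag in practice
        with:
          image-ref: "example-service:ci"        # hypothetical image name
          format: "sarif"
          output: "trivy-results.sarif"
          severity: "CRITICAL,HIGH"
          ignore-unfixed: true                   # skip CVEs with no available fix
```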
Dependencies
Dependabot with limits¶
GitHub Actions updates run weekly and are capped at three open PRs, keeping maintenance visible without drowning the repo.
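A Dependabot configuration matching that description would be along these lines. It is a minimal sketch, assuming workflows live in the default location and that no other ecosystems are configured.

```yaml
# .github/dependabot.yml (sketch; other ecosystems may exist in the real file)
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"                 # workflows under .github/workflows
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 3    # cap on simultaneously open update PRs
```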
Reactivation Playbook¶
A live end-to-end demo can be restored from the existing Terraform, workflows and runbooks. Budget approval is the main prerequisite.
1
Provision infrastructure, about 30 min¶
2
Push images to cloud registries, about 15 min¶
3
Deploy to clusters, about 20 min¶
4
Re-enable scheduled drift detection¶
Uncomment the schedule: block in .github/workflows/drift-detection.yml.
5
Run smoke tests¶
6
Teardown after the demo¶
Run terraform destroy in both cloud directories to avoid turning a demo into recurring cost.
Maintenance Pass Summary¶
Issue hygiene
168 stale drift alerts closed¶
Each closure points reviewers back to this status page and the maintenance mode decision.
Dependency hygiene
3 Dependabot PRs merged¶
GitHub Actions bumps were merged while heavier Docker-image changes were deferred to the next active sprint.
Security hygiene
210 legacy Trivy alerts handled¶
Legacy unfixable alerts were dismissed with documented "won't fix" reasoning.