Managed ML Platforms — SageMaker (AWS) & Vertex AI (GCP)

Multi-paradigm portfolio: Custom FastAPI + Kubernetes as the primary serving architecture, AND managed endpoints (SageMaker + Vertex AI) to demonstrate managed-platform skills. See ADR-017 for the full comparison.

Architecture: Three Paradigms, One Portfolio

┌───────────────────────────────────────────────────────────────────────────────────┐
│                           ML-MLOps-Portfolio                                      │
│                                                                                   │
│  ┌──────────────────────┐  ┌───────────────────────┐  ┌────────────────────────┐  │
│  │  1. Custom FastAPI   │  │  2. AWS SageMaker     │  │  3. GCP Vertex AI      │  │
│  │  on K8s (GKE + EKS)  │  │  Managed Endpoint     │  │  Managed Endpoint      │  │
│  │  ── PRIMARY ──       │  │  ── COMPLEMENT ──     │  │  ── COMPLEMENT ──      │  │
│  │                      │  │                       │  │                        │  │
│  │  ✅ SHAP middleware  │  │  ✅ Built-in scaling │  │  ✅ Built-in scaling   │  │
│  │  ✅ Prometheus       │  │  ✅ CloudWatch       │  │  ✅ Cloud Monitoring   │  │
│  │  ✅ Multi-cloud      │  │  ✅ Model Monitor    │  │  ✅ Model Monitoring   │  │
│  │  ✅ Full control     │  │  ✅ 5-min deploy     │  │  ✅ 5-min deploy       │  │
│  │  ❌ More maintenance │  │  ❌ AWS-only         │  │  ❌ GCP-only           │  │
│  └──────────────────────┘  └───────────────────────┘  └────────────────────────┘  │
│                                                                                   │
│  "I can build custom infrastructure AND use managed platforms —                   │
│   the right choice depends on team context, not personal preference."             │
└───────────────────────────────────────────────────────────────────────────────────┘

Part 1: AWS SageMaker Endpoint

Prerequisites (AWS)

# AWS CLI configured
export AWS_PROFILE=ml-portfolio
aws sts get-caller-identity  # Verify access

# Python packages
pip install sagemaker boto3

# SageMaker execution role (one-time setup)
bash scripts/sagemaker/setup-role.sh

Quick Start (SageMaker)

# 1. Deploy endpoint (~5 minutes)
python scripts/sagemaker/deploy_endpoint.py

# 2. Test endpoint
python scripts/sagemaker/deploy_endpoint.py test

# 3. Check status
python scripts/sagemaker/deploy_endpoint.py status

# 4. DELETE when done (stops charges!)
python scripts/sagemaker/deploy_endpoint.py delete

How SageMaker Works

Model Packaging

SageMaker requires a model.tar.gz containing:

model.tar.gz
├── model.joblib      ← trained BankChurn StackingClassifier
└── inference.py      ← SageMaker inference handler

The inference.py implements 4 functions that SageMaker calls automatically:

| Function | Purpose | Called When |
| --- | --- | --- |
| `model_fn(model_dir)` | Load model from disk | Container startup |
| `input_fn(body, content_type)` | Deserialize JSON → DataFrame | Each request |
| `predict_fn(data, model)` | Run `predict_proba()` | Each request |
| `output_fn(prediction, accept)` | Serialize dict → JSON | Each request |
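
The four-function contract in the table can be sketched as follows. This is a minimal illustration, not the portfolio's actual `inference.py`: the payload shape (a single record or a list of records), the `predictions` response key, and the content-type checks are assumptions. To keep the sketch dependency-free, `input_fn` returns a list of row dicts where the real handler builds a DataFrame, and `joblib` is imported lazily inside `model_fn`.

```python
import json


def model_fn(model_dir):
    """Load the model once at container startup."""
    import os
    import joblib  # lazy import so the other functions run without joblib installed

    return joblib.load(os.path.join(model_dir, "model.joblib"))


def input_fn(body, content_type):
    """Deserialize the request body into a list of feature rows."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    payload = json.loads(body)
    # Assumed payload shape: one record (dict) or a list of records.
    return payload if isinstance(payload, list) else [payload]


def predict_fn(data, model):
    """Run predict_proba and keep the positive-class probability."""
    proba = model.predict_proba(data)
    return [{"churn_probability": round(float(p[1]), 4)} for p in proba]


def output_fn(prediction, accept):
    """Serialize the prediction list to a JSON string."""
    if accept not in ("application/json", "*/*"):
        raise ValueError(f"Unsupported accept type: {accept}")
    return json.dumps({"predictions": prediction})
```

Because each function is a plain module-level callable, the request path (`input_fn` → `predict_fn` → `output_fn`) can be unit-tested locally with a stub model before anything is deployed.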

SageMaker Deployment Flow

model.joblib + inference.py
        │
        ▼
    model.tar.gz ──▶ S3 bucket
        │
        ▼
    SageMaker Model (references S3 + sklearn container)
        │
        ▼
    Endpoint Config (instance type, count)
        │
        ▼
    Endpoint (InService) ──▶ HTTPS endpoint ready for invocations
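
The first step of this flow, building `model.tar.gz`, can be sketched with the standard library alone. The function names `package_model` and `list_archive` are hypothetical and are not taken from `deploy_endpoint.py`; the flat archive layout follows the listing under Model Packaging.

```python
import tarfile
from pathlib import Path


def package_model(model_path, inference_path, out_path="model.tar.gz"):
    """Bundle the model and the handler at the root of a gzip-compressed tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        for src in (Path(model_path), Path(inference_path)):
            # arcname drops parent directories so both files sit at the archive root
            tar.add(src, arcname=src.name)
    return Path(out_path)


def list_archive(path):
    """Return the sorted member names, useful as a sanity check before upload."""
    with tarfile.open(path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Listing the archive before uploading to S3 is a cheap way to catch a nested directory, which would stop the container from finding `model.joblib` in `model_dir`.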

SageMaker Auto-Scaling

# Example: Configure SageMaker auto-scaling (not in portfolio scripts — for reference)
import boto3

client = boto3.client("application-autoscaling")

# Register scalable target
client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/bankchurn-endpoint/variant/primary",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=5,
)

# Target tracking policy — scale on invocations per instance
client.put_scaling_policy(
    PolicyName="bankchurn-scaling",
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/bankchurn-endpoint/variant/primary",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # average of 100 invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)

Compare to the custom K8s HPA approach:

# k8s/bankchurn-hpa.yaml (portfolio's custom approach)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Key difference: SageMaker scales on invocations per instance per minute (a traffic metric), while the K8s HPA scales on CPU utilization (an infrastructure metric). SageMaker's approach tracks request load directly; K8s gives more infrastructure control.
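
To make the contrast concrete, here is a small sketch of how each signal translates into a replica count. `hpa_replicas` follows the formula in the Kubernetes HPA documentation, `ceil(currentReplicas * currentMetric / targetMetric)`, ignoring its tolerance band and stabilization window. `target_tracking_instances` is a simplified proportional model of target tracking, not AWS's actual algorithm, which also applies cooldowns and CloudWatch alarm evaluation. Both function names are hypothetical.

```python
import math


def target_tracking_instances(invocations_per_minute, target_per_instance,
                              min_capacity, max_capacity):
    """Instances needed so each handles roughly target_per_instance invocations/min."""
    needed = math.ceil(invocations_per_minute / target_per_instance)
    return max(min_capacity, min(max_capacity, needed))


def hpa_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                 min_replicas, max_replicas):
    """Kubernetes HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    desired = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, desired))
```

With the settings shown above (target 100, capacity 1 to 5), 350 invocations per minute across the endpoint would call for 4 instances. With the HPA settings (target 70%, 1 to 3 replicas), 2 pods averaging 90% CPU would scale to 3.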

SageMaker Model Monitor

# Example: Set up data quality monitoring (reference, not in portfolio scripts)
# Assumes role_arn (IAM execution role ARN) and endpoint_name are already defined.
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role=role_arn,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
)

# Create baseline from training data
monitor.suggest_baseline(
    baseline_dataset="s3://bucket/training-data.csv",
    dataset_format=DatasetFormat.csv(header=True),
)

# Schedule hourly monitoring
monitor.create_monitoring_schedule(
    monitor_schedule_name="bankchurn-monitor",
    endpoint_input=endpoint_name,
    output_s3_uri="s3://bucket/monitoring-output",
    schedule_cron_expression="cron(0 * ? * * *)",
)

Compare to the portfolio's custom drift detection:

# k8s/drift-detection-cronjob.yaml (portfolio's custom approach)
# Runs daily at 06:00 UTC — KS test + PSI + Evidently report
kubectl get cronjobs -n ml-portfolio

Key difference: SageMaker Model Monitor gives you out-of-the-box data quality checks with CloudWatch alerts. Custom drift detection (KS + PSI + Evidently) gives full control over statistical tests and thresholds.
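
As an illustration of the kind of control the custom approach gives, the sketch below computes the Population Stability Index (PSI) for one feature. It is not the portfolio's CronJob code: the function name `psi`, the shared bin edges, and the `eps` floor for empty bins are all assumptions made for the example.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over shared, sorted bin edges."""

    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Number of edges at or below v gives the bin index (edges must be sorted).
            counts[sum(v >= e for e in edges)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples give a PSI of 0, and the value grows as the serving distribution moves away from the baseline. A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, but the threshold is a choice the team makes, which is the point of the custom approach.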

SageMaker Infrastructure Created

| Resource | Name | Cost |
| --- | --- | --- |
| SageMaker Model | `bankchurn-model` | Free |
| Endpoint Config | `bankchurn-endpoint-config` | Free |
| Endpoint | `bankchurn-endpoint` | ~$0.065/hr (~$47/mo if 24/7) |
| S3 artifact | `s3://ml-portfolio-ml-models-production/sagemaker/bankchurn/model.tar.gz` | ~$0.001/mo |

⚠️ Cost control: Only keep the endpoint alive during demos. Deploy → test → delete. Recreating takes ~5 minutes.

SageMaker Troubleshooting

| Issue | Cause | Solution |
| --- | --- | --- |
| `NoSuchEntityException` on role | IAM role not created | Run `bash scripts/sagemaker/setup-role.sh` |
| Endpoint stuck in `Creating` | Large model or cold region | Wait up to 10 minutes; check CloudWatch logs |
| `ModelError` on invoke | `inference.py` error | Check `aws logs` for `/aws/sagemaker/Endpoints/bankchurn-endpoint` |
| `AccessDeniedException` | Role missing S3 or SageMaker perms | Re-run `setup-role.sh` |
| High latency on first call | Cold start | Second call will be ~150ms |
| `ValidationException` | Wrong sklearn framework version | Verify `1.2-1` matches your model's sklearn version |

Part 2: GCP Vertex AI Endpoint

Prerequisites (GCP)

# GCP CLI configured
export GCP_PROJECT=ml-portfolio-duque-om-202602
export GCP_REGION=us-central1
gcloud auth application-default login

# Python packages
pip install google-cloud-aiplatform google-cloud-storage

# Vertex AI service account (one-time setup)
bash scripts/vertex_ai/setup-service-account.sh

# Enable Vertex AI API (if not already)
gcloud services enable aiplatform.googleapis.com --project=$GCP_PROJECT

Quick Start (Vertex AI)

# 1. Deploy endpoint (~5-10 minutes)
python scripts/vertex_ai/deploy_endpoint.py

# 2. Test endpoint
python scripts/vertex_ai/deploy_endpoint.py test

# 3. Check status
python scripts/vertex_ai/deploy_endpoint.py status

# 4. DELETE when done (stops charges!)
python scripts/vertex_ai/deploy_endpoint.py delete

How Vertex AI Works

Model Upload

Vertex AI does NOT require model.tar.gz — it reads directly from a GCS directory:

gs://bucket/vertex-ai/bankchurn/
└── model.joblib      ← trained BankChurn StackingClassifier

For custom preprocessing, Vertex AI uses a Predictor class (vs SageMaker's 4 functions):

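# scripts/vertex_ai/predictor.py: Vertex AI Custom Prediction Routine
import os
from typing import Dict, List

import joblib
import pandas as pd
from google.cloud import storage


class BankChurnPredictor:
    def __init__(self):
        """Load model (called once at container startup)."""
        # AIP_STORAGE_URI is a gs:// URI, which joblib cannot open directly,
        # so copy the artifact to local disk first.
        bucket, _, prefix = os.environ["AIP_STORAGE_URI"].removeprefix("gs://").partition("/")
        blob_name = f"{prefix.rstrip('/')}/model.joblib".lstrip("/")
        storage.Client().bucket(bucket).blob(blob_name).download_to_filename("model.joblib")
        self._model = joblib.load("model.joblib")

    def predict(self, instances: List[Dict]) -> List[Dict]:
        """Run prediction (called for each request)."""
        df = pd.DataFrame(instances)
        proba = self._model.predict_proba(df)
        return [{"churn_probability": round(float(p[1]), 4)} for p in proba]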

Compare the inference contracts:

| Aspect | SageMaker (`inference.py`) | Vertex AI (`predictor.py`) |
| --- | --- | --- |
| Pattern | 4 standalone functions | 1 class with methods |
| Model loading | `model_fn(model_dir)` | `__init__(self)` |
| Input parsing | `input_fn(body, content_type)` | SDK handles JSON automatically |
| Prediction | `predict_fn(data, model)` | `predict(self, instances)` |
| Output | `output_fn(prediction, accept)` | `postprocess(self, prediction)` (optional) |
| Batch support | Must handle in `predict_fn` | `instances` is always a list |

Vertex AI Deployment Flow

model.joblib
      │
      ▼
  GCS bucket (gs://bucket/vertex-ai/bankchurn/)
      │
      ▼
  Vertex AI Model (references GCS + sklearn container)
      │
      ▼
  Vertex AI Endpoint (display name, region)
      │
      ▼
  Deploy Model to Endpoint (machine type, replicas)
      │
      ▼
  Endpoint (ready) ──▶ HTTPS endpoint for predictions

Vertex AI Auto-Scaling

# Vertex AI auto-scaling is configured during deploy
# Assumes model (aiplatform.Model) and endpoint (aiplatform.Endpoint) already exist.
model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-2",
    min_replica_count=1,      # Minimum instances
    max_replica_count=5,      # Maximum instances
    traffic_percentage=100,
    # Auto-scaling is automatic based on CPU utilization
    # No separate scaling policy needed (unlike SageMaker)
)

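Key difference: Vertex AI auto-scaling is simpler because it is built into the deploy() call, whereas SageMaker requires a separate Application Auto Scaling policy. Vertex AI scales on CPU utilization by default (and on GPU duty cycle for accelerator machines); the SDK's default CPU target is 60% and can be changed with the autoscaling_target_cpu_utilization argument.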

Vertex AI Model Monitoring

# Example: Set up model monitoring on Vertex AI (reference)
# Assumes endpoint (aiplatform.Endpoint) and deployed_model_id already exist.
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

# Skew detection (training vs serving data)
skew_config = model_monitoring.SkewDetectionConfig(
    data_source="gs://bucket/training-data.csv",
    data_format="csv",
    target_field="Exited",
    skew_thresholds={"Age": 0.3, "Balance": 0.3},
)

# Create monitoring job. The SDK's create() takes helper objects
# rather than the raw API field names.
job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="bankchurn-monitoring",
    endpoint=endpoint,
    deployed_model_ids=[deployed_model_id],
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.8),
    objective_configs=model_monitoring.ObjectiveConfig(skew_detection_config=skew_config),
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=1),  # Hourly
    bigquery_tables_log_ttl=90,  # Days to keep logged requests
)

Vertex AI Infrastructure Created

| Resource | Name | Cost |
| --- | --- | --- |
| Vertex AI Model | `bankchurn-model` | Free |
| Vertex AI Endpoint | `bankchurn-endpoint` | ~$0.095/hr (~$68/mo if 24/7) |
| GCS artifact | `gs://...ml-models-production/vertex-ai/bankchurn/model.joblib` | ~$0.001/mo |

⚠️ Cost control: Same as SageMaker — deploy only during demos. Delete immediately after.
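
The arithmetic behind the deploy-test-delete advice can be sketched in a few lines. The hourly rates are the ones quoted in this guide. The monthly figures appear to assume a 720-hour month (30 days × 24 h), which is why `HOURS_PER_MONTH` is set to that value; the two-hour demo length and the function name `endpoint_cost` are illustrative assumptions.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 h; reproduces the ~$47 and ~$68 monthly figures

RATES_PER_HOUR = {"sagemaker": 0.065, "vertex_ai": 0.095}  # rates quoted in this guide


def endpoint_cost(hourly_rate, hours):
    """Cost in dollars of keeping one endpoint alive for the given number of hours."""
    return round(hourly_rate * hours, 2)


always_on = {name: endpoint_cost(rate, HOURS_PER_MONTH) for name, rate in RATES_PER_HOUR.items()}
two_hour_demo = {name: endpoint_cost(rate, 2) for name, rate in RATES_PER_HOUR.items()}
```

Under these assumptions, an always-on month comes to about $46.80 for SageMaker and $68.40 for Vertex AI, while a two-hour demo costs $0.13 and $0.19 respectively.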

Vertex AI Troubleshooting

| Issue | Cause | Solution |
| --- | --- | --- |
| `PermissionDenied` | Missing Vertex AI roles | Run `bash scripts/vertex_ai/setup-service-account.sh` |
| `RESOURCE_EXHAUSTED` | Region quota exceeded | Try `us-central1` or request quota increase |
| Model upload fails | GCS permissions | Verify `storage.objectViewer` role on service account |
| Endpoint stuck deploying | Container pull or health check failure | Check Vertex AI console logs |
| `InvalidArgument` on predict | Wrong input format | Vertex AI expects `{"instances": [...]}` |
| Slow first request | Container cold start | ~5-10s on first request, then ~150ms |
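
The `InvalidArgument` row is the easiest one to prevent on the client side. Below is a minimal sketch of wrapping feature rows in the `{"instances": [...]}` envelope and reading the `predictions` list from the response. The helper names are hypothetical, and the feature names used in the usage note (`Age`, `Balance`) are simply the two that appear in the monitoring example.

```python
import json


def build_predict_body(rows):
    """Wrap feature rows in the {"instances": [...]} envelope Vertex AI expects."""
    if isinstance(rows, dict):
        rows = [rows]  # a single record still has to be sent as a one-element list
    if not rows or not all(isinstance(r, dict) for r in rows):
        raise ValueError("expected a non-empty list of feature dicts")
    return json.dumps({"instances": rows})


def parse_predict_response(body):
    """Extract the predictions list from a JSON response body."""
    return json.loads(body)["predictions"]
```

For example, `build_predict_body({"Age": 42, "Balance": 1000.0})` yields a body with one instance, so a caller that forgets the list wrapper still sends a valid request.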

Part 3: Full Comparison — Custom vs SageMaker vs Vertex AI

Latency Comparison

| Metric | Custom FastAPI (K8s) | SageMaker (AWS) | Vertex AI (GCP) |
| --- | --- | --- | --- |
| p50 latency | ~103ms | ~150-200ms | ~150-200ms |
| p95 latency | ~150ms | ~300ms | ~300ms |
| Cold start | ~3s (pod) | ~5s (container) | ~5-10s (container) |
| Deploy time | ~15 min (CI/CD) | ~5 min | ~5-10 min |
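
For readers who want to reproduce p50/p95 figures like these against their own endpoints, here is a minimal sketch of computing them from collected latency samples. It uses the nearest-rank percentile definition, which is an assumption; the guide does not state how its figures were measured, and load-testing tools may interpolate instead.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the two latency percentiles used in the comparison table."""
    return {"p50": percentile(latencies_ms, 50), "p95": percentile(latencies_ms, 95)}
```

When measuring, discard the first request to each endpoint so that the cold start, reported separately in the table, does not inflate p95.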

Feature Matrix

| Feature | Custom FastAPI | SageMaker | Vertex AI |
| --- | --- | --- | --- |
| SHAP explainability (`?explain=true`) | ✅ | | |
| Custom Prometheus metrics | ✅ | | |
| Multi-cloud portable | ✅ (GKE + EKS) | ❌ (AWS) | ❌ (GCP) |
| Auto-scaling | HPA (CPU-based) | Built-in (invocations) | Built-in (CPU) |
| A/B testing | Argo Rollouts | Built-in variants | Built-in traffic split |
| Model monitoring | Custom (Prometheus + Evidently) | Model Monitor | Model Monitoring |
| Data quality checks | Custom (Pandera + CronJob) | Baseline + scheduled | Skew detection |
| Canary deployments | Argo Rollouts manifests | Endpoint variants | Traffic split % |
| Cost (idle, portfolio) | Included in K8s cluster | ~$47/mo per endpoint | ~$68/mo per endpoint |
| Inference script pattern | FastAPI route + Pydantic | 4 functions (model/input/predict/output) | Predictor class |
| SDK lock-in | None (HTTP + OpenAPI) | `sagemaker` SDK | `google-cloud-aiplatform` SDK |

When to Use Each (Decision Guide)

| Scenario | Best Choice | Why |
| --- | --- | --- |
| Small team, no K8s | SageMaker / Vertex AI | Zero infra management |
| Existing K8s platform | Custom FastAPI | Leverage existing infra |
| Need SHAP/custom middleware | Custom FastAPI | Full control over request pipeline |
| Rapid prototyping | SageMaker / Vertex AI | 5-min deploys |
| Regulated industry | Custom FastAPI | SHAP middleware + audit trail |
| Multi-cloud required | Custom + K8s | Kustomize overlays portable |
| AWS-only shop | SageMaker | Native integration + Model Monitor |
| GCP-only shop | Vertex AI | Native integration + BigQuery ML |
| Portfolio / interviews | All three | Demonstrates versatility |
| Cost-sensitive startup | Custom on K8s | No per-endpoint charges |
| Larger team with ML ownership | SageMaker / Vertex AI | Focus on models, not infra |

Files

SageMaker (AWS)

| File | Purpose |
| --- | --- |
| `scripts/sagemaker/inference.py` | SageMaker inference handler (4 functions) |
| `scripts/sagemaker/deploy_endpoint.py` | Deploy, test, delete, status, package |
| `scripts/sagemaker/setup-role.sh` | Create IAM execution role |

Vertex AI (GCP)

| File | Purpose |
| --- | --- |
| `scripts/vertex_ai/predictor.py` | Vertex AI Custom Prediction Routine (Predictor class) |
| `scripts/vertex_ai/deploy_endpoint.py` | Deploy, test, delete, upload, status |
| `scripts/vertex_ai/setup-service-account.sh` | Create GCP service account + IAM roles |

Documentation

| File | Purpose |
| --- | --- |
| `docs/decisions/017-custom-vs-managed-ml-platforms.md` | ADR with full comparison |
| `docs/MANAGED_ML_GUIDE.md` | This guide |
| `docs/MULTI_CLOUD_COMPARISON.md` | Cross-cloud comparison including managed platforms |