Day 144: Cloud Deployment
Cloud deployment matters because once a model service leaves a single machine, reliability depends on platform contracts such as replicas, health checks, rollout policy, and externalized state.
Today's "Aha!" Moment
After containerizing an ML service, the next temptation is to think deployment is just "push the image somewhere in the cloud." That sounds reasonable, but it misses the real shift.
On one machine, the service owns its runtime almost directly. In the cloud, a platform now sits between your container and reality. That platform decides where the service runs, when it restarts, how many replicas exist, which instances receive traffic, and what happens during rollout or scale-up. In other words, the unit you deploy is still your model service, but the environment that keeps it alive is now a control system.
That changes what "ready for production" means. The service must survive being rescheduled. It must tolerate instances disappearing. It must put durable state somewhere outside the container. It must expose health in a way the platform can trust. And it must start fast enough, or at least predictably enough, that scaling and recovery do not turn into user-visible incidents.
That is the aha. Cloud deployment is not "running the same server on someone else's computer." It is handing the service to a platform that can replace, scale, route, and roll it out continuously. Your job is to make the service behave correctly inside that contract.
Why This Matters
Suppose the warehouse defect-classification API now needs to serve multiple fulfillment centers, survive instance failures, and absorb periodic traffic spikes when a new scanning workflow goes live. A single VM and a manual restart procedure are no longer enough.
The team wants several things at once:
- new versions without long downtime
- automatic recovery when a node dies
- enough replicas to handle bursts
- the option to move closer to users or other systems
Cloud deployment is the layer that makes those operational goals realistic. But it only works if the service is designed for that environment. If the container stores important state on local disk, takes too long to become healthy, or assumes one stable host forever, the platform will do exactly what it was built to do and still break your assumptions.
So this lesson is really about a contract between your ML service and the platform running it. Once that contract is clear, cloud deployment stops sounding vague and starts looking like a concrete engineering discipline.
Learning Objectives
By the end of this session, you will be able to:
- Explain what changes when an ML service moves from one machine to a cloud platform: the platform's role as scheduler, router, and recovery loop.
- Design an inference service that behaves correctly under cloud deployment: externalize state, expose meaningful health, and prepare for replacement and scaling.
- Compare common deployment substrates and rollout choices: recognize when managed containers, serverless, or more explicit orchestration make sense.
Core Concepts Explained
Concept 1: Cloud Deployment Adds a Control Plane Around the Container
Docker gave us a portable runtime artifact. Cloud deployment adds the system that decides how that artifact is actually run.
For an ML inference service, the deployed shape usually looks more like this:
container image
-> image registry
-> deployment/service definition
-> platform scheduler
-> running replicas
-> load balancer / ingress
-> client traffic
Or, in a slightly richer view:
                      +----------------------+
       request -----> | ingress / load bal.  |
                      +----------+-----------+
                                 |
                 +---------------+---------------+
                 |                               |
          +------+-------+                +------+-------+
          |  replica A   |                |  replica B   |
          |  model API   |                |  model API   |
          +------+-------+                +------+-------+
                 |                               |
                 +---------------+---------------+
                                 |
                      external state/services
                 (db, object store, cache, logs)
The platform is doing several jobs simultaneously:
- placing replicas on available compute
- checking health and replacing unhealthy instances
- routing traffic to ready instances
- scaling replica count up or down
- rolling new versions out gradually or all at once
This matters because the "deployed system" is no longer just your process. It is your process plus a control loop around it. That is why health checks, readiness, startup time, and graceful shutdown suddenly become first-class design concerns instead of operational afterthoughts.
The trade-off is convenience versus surrendering some direct control. Managed platforms remove a lot of infrastructure work, but they also require your service to fit the platform's operating model.
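The "control loop" above can be sketched in a few lines. This is a deliberately tiny toy reconciler, not any real platform's code; names like `Replica`, `healthy`, and `desired_count` are illustrative assumptions:

```python
import itertools

# Toy model of a platform's reconciliation loop: compare desired state to
# observed state, then replace unhealthy replicas and scale to match.
# All names here are illustrative; real schedulers are far more involved.

_ids = itertools.count(1)

class Replica:
    def __init__(self):
        self.id = next(_ids)
        self.healthy = True   # flips to False when a health check fails

def reconcile(replicas, desired_count):
    """One pass of the control loop: prune unhealthy instances, then scale."""
    # Replace unhealthy instances (the platform, not the operator, does this).
    replicas = [r for r in replicas if r.healthy]
    # Scale up or down toward the desired replica count.
    while len(replicas) < desired_count:
        replicas.append(Replica())
    while len(replicas) > desired_count:
        replicas.pop()
    return replicas

# One "instance dies" event: the loop converges back to 3 healthy replicas.
fleet = reconcile([], desired_count=3)
fleet[0].healthy = False
fleet = reconcile(fleet, desired_count=3)
print(len(fleet))  # 3
```

The point of the sketch is the shape, not the code: the platform runs this kind of loop continuously, which is exactly why your service must tolerate being one of the replicas that gets pruned and recreated.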
Concept 2: A Cloud-Friendly Service Treats Instances as Disposable and State as External
The safest assumption in cloud deployment is that any instance can disappear and be recreated somewhere else.
That means the service should treat its own container filesystem and process memory as temporary unless there is a very deliberate reason not to. For an ML inference API, the durable pieces usually live elsewhere:
- request history or user state in a database
- shared cache in Redis or another external cache
- model artifacts in the image itself or in object storage
- logs and metrics in external observability systems
- secrets and config injected by the platform
The practical question is simple: if the platform kills one replica and starts another, what must still be true?
Usually the answer is:
- the API contract should remain the same
- the model version should still be known and reproducible
- no important business state should vanish with the old instance
- the new instance should be able to become ready without manual repair
For ML services, model loading becomes a particularly important design choice. Baking the model into the image improves reproducibility and reduces runtime dependencies, but increases image size and rollout cost. Downloading the model at startup reduces image size and can simplify updates, but increases cold-start time and adds a dependency on object storage or a model registry.
So the deeper principle is not "stateless at all costs." It is "keep irreplaceable state out of disposable compute." That principle is what makes autoscaling, replacement, and regional movement possible.
Concept 3: Deployment Is Also a Policy Decision About Platform and Rollout
Once the service is cloud-friendly, the next question is not just where to run it, but what operating model you want.
A few common options look like this:
- Managed container service: good when you want container-based deployment without operating a cluster yourself.
- Serverless/container-on-demand: good when traffic is bursty and operational simplicity matters more than fine-grained control, though cold starts can be painful for large models.
- Kubernetes-style orchestration: useful when you need more control over networking, rollout, GPU scheduling, sidecars, or mixed workloads.
For ML serving, the decision often turns on a few concrete pressures:
- model size and startup time
- whether GPUs are needed
- scaling pattern: steady load vs sharp bursts
- how much rollout control and observability the team needs
- how much infrastructure complexity the team can absorb
Rollout policy matters just as much as platform choice. If a new model version is slightly wrong, the damage can be logical, not just technical. That makes progressive rollout extremely valuable:
new image/model
-> deploy to small slice
-> watch latency + error rate + prediction quality signals
-> expand traffic gradually
-> rollback quickly if behavior degrades
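The rollout policy above can be sketched as a loop. The traffic steps, the error-rate budget, and the `get_canary_error_rate` callback are all illustrative assumptions; a real system would drive ingress weights and read metrics from its observability stack:

```python
# Sketch of the progressive-rollout loop: expand the canary's traffic share
# step by step, rolling back on the first bad signal. All thresholds and
# names here are illustrative assumptions.

TRAFFIC_STEPS = [5, 25, 50, 100]   # percent of traffic sent to the new version
ERROR_RATE_BUDGET = 0.02           # abort if the canary exceeds 2% errors

def progressive_rollout(get_canary_error_rate):
    """Expand traffic step by step; roll back on the first bad signal."""
    for pct in TRAFFIC_STEPS:
        # In a real platform this would update routing weights, then wait
        # for enough traffic to make the metrics meaningful.
        error_rate = get_canary_error_rate(pct)
        if error_rate > ERROR_RATE_BUDGET:
            return ("rollback", pct)   # shift all traffic back to the old version
    return ("promoted", 100)

# A canary that degrades once it sees half the traffic gets rolled back:
outcome = progressive_rollout(lambda pct: 0.001 if pct < 50 else 0.05)
print(outcome)  # ('rollback', 50)
```

For ML services the callback would combine operational signals with prediction-quality signals, since a rollout can be healthy by uptime and still wrong by behavior.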
This is one reason cloud deployment fits naturally with the previous lessons on serving, containerization, and production optimization. Deployment is where all of those concerns become operational policy. Latency budgets affect autoscaling. Image size affects cold starts. API contract quality affects rollout safety. Observability determines whether rollback happens quickly enough.
The trade-off is clear: more automation and elasticity in exchange for more careful thinking about platform fit, startup behavior, and safe rollout.
Troubleshooting
Issue: The service works in one container locally but fails or behaves oddly after rescheduling in the cloud.
Why it happens / is confusing: The implementation was relying on local disk, stable host identity, or some other property that disappears once instances become replaceable.
Clarification / Fix: Treat each instance as disposable. Move durable state to external systems and make startup deterministic.
Issue: Autoscaling technically works, but latency spikes badly during bursts.
Why it happens / is confusing: New replicas may need to pull a large image, download model artifacts, warm caches, or initialize frameworks before they can serve traffic.
Clarification / Fix: Reduce image size where possible, control startup work, use readiness correctly, and choose a platform whose cold-start behavior matches the workload.
Issue: Health checks say the service is healthy, but requests still fail.
Why it happens / is confusing: A liveness endpoint may only prove the process is running, not that the model is loaded or that critical dependencies are reachable enough for real traffic.
Clarification / Fix: Separate liveness from readiness. Readiness should answer, "Can this instance safely receive production requests right now?"
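This separation can be sketched as two distinct checks backed by different state. The checks are illustrative stand-ins; a real service would probe its actual model object and dependencies:

```python
# Sketch of the liveness/readiness split. The checks are illustrative
# stand-ins, not a real probe implementation.

class HealthModel:
    def __init__(self):
        self.model_loaded = False
        self.dependencies_ok = lambda: True  # e.g. ping cache/db (hypothetical)

    def liveness(self) -> bool:
        # "Is the process worth keeping alive?" Failing this triggers a restart.
        return True

    def readiness(self) -> bool:
        # "Can this instance safely receive production traffic right now?"
        # Failing this removes the instance from routing without killing it.
        return self.model_loaded and self.dependencies_ok()

svc = HealthModel()
assert svc.liveness() and not svc.readiness()  # alive, but not routable yet
svc.model_loaded = True                        # e.g. weights finished loading
assert svc.readiness()
```

The design point is that the two probes have different consequences: a failed liveness check restarts the container, while a failed readiness check only pulls it out of the traffic pool, which is the right response to a slow model load or a flaky dependency.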
Issue: A new deployment is technically successful but prediction quality regresses.
Why it happens / is confusing: Traditional rollout checks often focus on uptime and latency, while model-serving rollouts also need behavioral checks such as output distribution, business metrics, or shadow comparisons.
Clarification / Fix: Treat rollout as both a systems and ML validation step. Watch operational signals and prediction-quality signals together.
Advanced Connections
Connection 1: Cloud Deployment ↔ Containerization
The parallel: Containerization creates a reproducible artifact; cloud deployment creates the platform contract that runs, scales, and replaces that artifact safely.
Real-world case: An image that works perfectly in Docker can still fail in the cloud if readiness, state placement, or startup behavior do not match the platform's expectations.
Connection 2: Cloud Deployment ↔ SRE and Progressive Delivery
The parallel: Once a model service is deployed on a managed platform, reliability goals, rollout policy, autoscaling behavior, and observability all become part of the model's effective production design.
Real-world case: A canary rollout for a new model version is only useful if traces, metrics, and business indicators can reveal both system regressions and prediction regressions quickly.
Resources
Optional Deepening Resources
- [DOCS] Kubernetes Concepts Overview
- Link: https://kubernetes.io/docs/concepts/overview/
- Focus: Review the control-plane model behind scheduling, services, and reconciliation.
- [DOCS] Google Cloud Run Overview
- Link: https://cloud.google.com/run/docs/overview/
- Focus: See a managed container platform that makes the container-to-cloud contract very explicit.
- [DOCS] Amazon ECS Developer Guide
- Link: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html
- Focus: Compare another managed-container model and the operational pieces it handles for you.
- [DOCS] KServe Documentation
- Link: https://kserve.github.io/website/
- Focus: See how a Kubernetes-native serving layer handles model deployment, scaling, and inference-specific concerns.
Key Insights
- Cloud deployment adds a control loop around the service - The platform now schedules, routes, scales, and replaces your container.
- Disposable compute requires externalized state - The service must survive instance loss without losing important truth.
- Deployment choice is also policy choice - Platform, startup behavior, and rollout strategy shape both reliability and model-serving quality.
Knowledge Check (Test Questions)
1. What changes most fundamentally when an ML service moves from one machine to a cloud platform?
- A) The Python syntax of the model code.
- B) A platform control loop now manages placement, health, scaling, and rollout around the service.
- C) The need for observability disappears.
2. Why is local container state dangerous in cloud deployment?
- A) Because cloud platforms never allow filesystems.
- B) Because instances are disposable, so important state tied to one instance can vanish during rescheduling or scaling.
- C) Because databases cannot be used together with containers.
3. Why are progressive rollouts especially useful for ML services?
- A) Because they let you test both system health and model behavior before sending all traffic to the new version.
- B) Because they permanently remove the need for monitoring.
- C) Because they make startup time irrelevant.
Answers
1. B: The biggest change is the presence of a platform that continuously manages execution around the service.
2. B: Disposable instances mean durable truth should not live only on one container's local state.
3. A: Model deployments can regress operational metrics or prediction behavior, so gradual rollout lowers risk on both fronts.