Zero-downtime deploy without Kubernetes: a practical tutorial in 2026
You don't need Kubernetes for zero-downtime deploys. Full tutorial with 2 servers, Caddy/Traefik in front, and rolling update via script or lightweight orchestrator.
There's a persistent myth that zero-downtime deploys are exclusive to teams running Kubernetes in production. They aren't. The technique existed before the colossus had a name — any team that ran a pair of physical servers behind a load balancer last decade was already doing it, with fifty-line scripts and zero CRDs in their lives. What changed was the marketing around the practice, not the practice itself.
This post is a step-by-step tutorial to set up zero-downtime deploy from scratch, on two Linux machines, with no heavyweight orchestrator, no magic panel. At the end you'll have a bash script that swaps one instance at a time, waits for the new one to be healthy, and rolls to the next — exactly the algorithm large orchestrators implement, just without the boilerplate.
TL;DR
Zero-downtime deploy depends on three ingredients, not on a specific tool. First: two or more application instances running in parallel, behind a basic proxy. Second: a reliable health check endpoint that validates real dependencies (database, cache, queue), doesn't just return 200 instantly. Third: a script or orchestrator that replaces one container at a time, waiting for the new one to be healthy before moving on to the next.
This tutorial builds that setup on two Linux VPS with Docker, Caddy in front as proxy + load balancer, and a fifty-line bash script that does a rolling update with an active health check, minimum healthy time, and automatic rollback on failure. Result: deploys with no 5xx visible to users, in under a minute, with no maintenance window.
Prerequisites: two Linux VPS with Docker (Hetzner CPX11 at R$30 each), domain with controllable DNS, app with a decent health check. Setup time: two to three hours. Monthly cost: R$60 (R$75 if you want a third VPS dedicated to the proxy). At the end we show the "robust" version via HeroCtl for those who want to stop scripting.
The three ingredients (without these, it's not zero-downtime)
Before any command, it's worth nailing down the theory — because every more elaborate configuration you'll see on the internet is a variation on these three pieces.
- Multiple instances of the app running in parallel. Minimum two. If you only have one, any restart is an error window. There's no working around it with a configuration trick.
- A proxy/load balancer in front, doing health checks. The proxy decides which instance to send traffic to. If one falls (or was deliberately taken out for the deploy), the proxy only sends to the remaining ones.
- A script that swaps instances one at a time. Never all together. Wait for the new one to be healthy before touching the next. If the new one fails, halt the deploy and keep the old ones serving.
That's it. The rest — Kubernetes, modern panels, lightweight orchestrators — is wrapping around these three points.
Why single-server is NEVER zero-downtime (even if it's fast)
I see this question every week in the community Discord: "can I do zero-downtime with a single server, if the deploy is fast enough?". Short answer: no.
On a single machine, the deploy cycle is: stop the old container, bring up the new one. Even if everything happens in three seconds, those three seconds exist. In-flight TCP connections are cut. Requests arriving in that interval get connection refused or 502. If you have five requests per second, that's fifteen users seeing errors per deploy.
There are clever variations — bring the new one up on a different port, switch the local proxy, drop the old one. That improves things but doesn't eliminate them. If the app takes time to close in-flight connections, the cutover still produces errors. If the health check is weak, the proxy points traffic at an app that hasn't finished coming up. There's always a window.
The only reliable way to eliminate the window is to have at least one instance always available throughout the deploy. That requires two machines. Period.
The minimum setup (two VPS + a proxy)
The cheapest topology that delivers real zero-downtime:
| Component | Size | Cost | Function |
|---|---|---|---|
| VPS A | 2 vCPU / 2 GB RAM | R$30/month | App instance 1 |
| VPS B | 2 vCPU / 2 GB RAM | R$30/month | App instance 2 |
| Proxy | running on VPS A or third VPS | R$0 (shared) or R$15/month | Caddy/nginx load balancing |
| Database | managed Postgres or third VPS | varies | Shared state between A and B |
Keeping the proxy shared on one of the VPS itself saves money but has a trade-off: if the VPS hosting the proxy falls entirely, the site falls with it (even with the other VPS running). For a small team this is acceptable. When you grow, the proxy migrates to a dedicated VPS or becomes a redundant pair.
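When the proxy does graduate to a redundant pair, the classic building block is a floating IP moved by VRRP. A sketch of what that looks like with keepalived — the address 203.0.113.5 is a hypothetical floating IP, and note that many cloud providers block VRRP on their public networks and offer API-driven floating IPs as the equivalent:

```
# /etc/keepalived/keepalived.conf on the primary proxy (sketch only)
vrrp_instance VI_1 {
    state MASTER            # the backup proxy uses state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100            # backup uses a lower priority, e.g. 90
    advert_int 1
    virtual_ipaddress {
        203.0.113.5/24      # DNS points here instead of at a single VPS
    }
}
```

If the MASTER stops advertising, the BACKUP claims the IP within a few seconds and traffic follows it — same idea as the health check below, one layer down.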
Your domain's DNS A record points to the proxy IP. Apps on A and B connect to the same database — without that shared part, the two instances diverge and the user sees different results depending on which one answered.
Step 1 — Provision two VPS (15 min)
I use Hetzner CPX11 (€4.75 ≈ R$30) as a reference. DigitalOcean Droplet at US$6, Vultr Cloud Compute at US$6, or Linode Nanode at US$5 deliver something similar. What matters is modern Linux (Ubuntu 24.04 LTS or Debian 12) with Docker.
Provision both machines with the same SSH key:
# from your laptop
ssh-keygen -t ed25519 -f ~/.ssh/deploy_key -C "deploy@meudominio.com"
# add ~/.ssh/deploy_key.pub on the provider console before creating the VPS
Create each VPS, note the IPs. I'll use 203.0.113.10 (VPS A) and 203.0.113.20 (VPS B) as placeholders for the rest of the post.
Install Docker on each:
ssh root@203.0.113.10 "curl -fsSL https://get.docker.com | sh"
ssh root@203.0.113.20 "curl -fsSL https://get.docker.com | sh"
Configure firewall to allow only 22 (SSH) and 8080 (internal port where the app will listen). HTTP/HTTPS traffic only arrives at the proxy:
ssh root@203.0.113.10 "ufw allow 22 && ufw allow 8080/tcp && ufw --force enable"
ssh root@203.0.113.20 "ufw allow 22 && ufw allow 8080/tcp && ufw --force enable"
Validation: docker run --rm hello-world on each machine should complete without errors.
Step 2 — App with a decent health check (30 min)
The /healthz endpoint is the heart of the scheme. If it returns 200 when the app isn't actually ready, the proxy sends traffic to a broken instance and the user sees an error. If it returns 500 when the app is healthy, the proxy takes the good instance out of balancing. Meaning: the health check is the source of truth for the entire system.
Golden rule: /healthz validates real dependencies the app needs to respond. Minimum: connection to the database. If you have a cache (Redis), include it. If you have a queue (SQS, RabbitMQ), include it. DON'T return 200 right at boot — wait for assets to compile, cache to warm, connections to open.
Node.js (Express)
import express from "express"
import { Pool } from "pg"

const app = express()
const pool = new Pool({ connectionString: process.env.DATABASE_URL })

let ready = false

// async warm-up — only flips to ready once dependencies check out
;(async () => {
  await pool.query("SELECT 1")
  // other initialization: cache prime, etc.
  ready = true
})().catch((e) => console.error("warm-up failed:", e))

app.get("/healthz", async (_req, res) => {
  if (!ready) return res.status(503).send("warming up")
  try {
    await pool.query("SELECT 1")
    res.status(200).send("ok")
  } catch (e) {
    res.status(503).send("db down")
  }
})

app.get("/", (_req, res) => res.send("Hello v1"))

const server = app.listen(8080, () => console.log("listening 8080"))

// graceful shutdown — drain connections before dying
process.on("SIGTERM", () => {
  ready = false // health check starts failing immediately
  setTimeout(() => {
    server.close(() => process.exit(0))
  }, 5000) // 5s for the proxy to notice and stop sending new traffic
})
Python (Django + gunicorn)
# health/views.py
import os

import redis
from django.db import connection
from django.http import HttpResponse

_r = redis.from_url(os.environ["REDIS_URL"])

def healthz(request):
    try:
        with connection.cursor() as c:
            c.execute("SELECT 1")
        _r.ping()
        return HttpResponse("ok", status=200)
    except Exception as e:
        return HttpResponse(f"unhealthy: {e}", status=503)
Ruby (Rails)
# config/routes.rb
get "/healthz", to: "health#show"

# app/controllers/health_controller.rb
class HealthController < ApplicationController
  def show
    ActiveRecord::Base.connection.execute("SELECT 1")
    Rails.cache.read("__healthcheck__")
    head :ok
  rescue => e
    Rails.logger.warn("healthcheck failed: #{e.message}")
    head :service_unavailable
  end
end
The detail that separates an amateur from a professional health check is graceful shutdown: on receiving SIGTERM, the app starts returning 503 on /healthz immediately, but keeps accepting in-flight connections for a few more seconds. The proxy notices the 503, stops sending new traffic, and when the app finally closes there's nobody waiting for a response.
Without this, the cutover always leaks some errors even with everything else right.
Step 3 — Bring up two Docker instances (15 min)
Build your app into a Docker image. For the tutorial I'll use a generic image you replace:
# on your laptop, push to a registry (Docker Hub, ECR, GHCR)
docker build -t meuusuario/myapp:v1 .
docker push meuusuario/myapp:v1
Bring up the instance on VPS A:
ssh root@203.0.113.10 "
  docker pull meuusuario/myapp:v1 &&
  docker run -d --name app --restart=unless-stopped \
    -p 8080:8080 \
    -e DATABASE_URL='postgres://user:pass@db.example.com:5432/app' \
    --health-cmd='curl -f http://localhost:8080/healthz || exit 1' \
    --health-interval=5s --health-timeout=2s --health-retries=3 \
    meuusuario/myapp:v1
"
Repeat for VPS B swapping the IP. Validate:
curl http://203.0.113.10:8080/healthz # should return "ok"
curl http://203.0.113.20:8080/healthz # should return "ok"
If both return 200, the base is ready.
Step 4 — Caddy as reverse proxy + load balancer (30 min)
Caddy is easier to start with than nginx because of built-in automatic TLS — Let's Encrypt works out of the box, no external certbot to configure. nginx is more flexible and has a larger ecosystem; Caddy is simpler for this case. For the tutorial I'll use Caddy.
I'll run Caddy on VPS A, sharing the machine with one of the app instances. If you prefer a dedicated third VPS, swap the IP where relevant.
First, open ports 80 and 443 on VPS A:
ssh root@203.0.113.10 "ufw allow 80 && ufw allow 443"
Create the Caddyfile locally (we'll copy it to /etc/caddy/Caddyfile on VPS A in a moment):
meudominio.com {
    reverse_proxy 203.0.113.10:8080 203.0.113.20:8080 {
        lb_policy round_robin
        health_uri /healthz
        health_interval 5s
        health_timeout 2s
        health_status 200
        fail_duration 30s
        max_fails 2
        unhealthy_status 5xx
        transport http {
            dial_timeout 2s
        }
    }
}
Fifteen lines. Everything that matters is there: round-robin between the two IPs, active health check every five seconds on /healthz, marks as unhealthy after two consecutive failures in 30s, two-second timeout to open a connection.
Bring up Caddy:
ssh root@203.0.113.10 "mkdir -p /etc/caddy"
scp Caddyfile root@203.0.113.10:/etc/caddy/Caddyfile
ssh root@203.0.113.10 "
  docker run -d --name caddy --restart=unless-stopped \
    --network host \
    -v /etc/caddy/Caddyfile:/etc/caddy/Caddyfile \
    -v caddy_data:/data \
    -v caddy_config:/config \
    caddy:2-alpine
"
Point your domain's DNS A to 203.0.113.10. In a few minutes:
curl https://meudominio.com/
# should return "Hello v1" (alternating between the two instances)
Caddy issued a Let's Encrypt certificate automatically. This works because the domain resolves to the IP where Caddy is listening on port 80 (HTTP-01 challenge).
Step 5 — Bash deploy script (60 min)
This is the heart of the tutorial. A script that orchestrates rolling update between the two VPS:
#!/usr/bin/env bash
# deploy.sh — zero-downtime rolling deploy across two VPS
set -euo pipefail

IMAGE="${1:?Usage: ./deploy.sh meuusuario/myapp:v2}"
HOSTS=("203.0.113.10" "203.0.113.20")
HEALTH_DEADLINE=300   # max seconds to wait for the health check
MIN_HEALTHY_TIME=10   # seconds of sustained health before moving on
SSH_OPTS="-o StrictHostKeyChecking=no -o ConnectTimeout=5"

deploy_host() {
  local host=$1
  local image=$2

  echo "==> [${host}] pulling ${image}"
  ssh ${SSH_OPTS} "root@${host}" "docker pull ${image}"

  # remember the old image in case we need to roll back
  local old_image
  old_image=$(ssh ${SSH_OPTS} "root@${host}" "docker inspect app --format '{{.Config.Image}}' 2>/dev/null || echo none")
  echo "==> [${host}] current version: ${old_image}"

  echo "==> [${host}] replacing container"
  ssh ${SSH_OPTS} "root@${host}" "
    docker stop app 2>/dev/null || true
    docker rm app 2>/dev/null || true
    docker run -d --name app --restart=unless-stopped \
      -p 8080:8080 \
      -e DATABASE_URL='${DATABASE_URL}' \
      --health-cmd='curl -f http://localhost:8080/healthz || exit 1' \
      --health-interval=5s --health-timeout=2s --health-retries=3 \
      ${image}
  "

  echo "==> [${host}] waiting for health check (max ${HEALTH_DEADLINE}s)"
  local start
  start=$(date +%s)
  local healthy_since=0
  while true; do
    local now
    now=$(date +%s)
    if (( now - start > HEALTH_DEADLINE )); then
      echo "!! [${host}] healthy deadline exceeded — rolling back to ${old_image}"
      if [[ "${old_image}" == "none" ]]; then
        echo "!! [${host}] no previous image to roll back to"
        return 1
      fi
      ssh ${SSH_OPTS} "root@${host}" "
        docker stop app && docker rm app &&
        docker run -d --name app --restart=unless-stopped \
          -p 8080:8080 -e DATABASE_URL='${DATABASE_URL}' \
          ${old_image}
      "
      return 1
    fi
    if curl -sf --max-time 2 "http://${host}:8080/healthz" > /dev/null; then
      if (( healthy_since == 0 )); then
        healthy_since=${now}
        echo "    [${host}] healthy — confirming for ${MIN_HEALTHY_TIME}s"
      elif (( now - healthy_since >= MIN_HEALTHY_TIME )); then
        echo "==> [${host}] sustained healthy — promoting"
        return 0
      fi
    else
      healthy_since=0
    fi
    sleep 2
  done
}

echo "### Deploying ${IMAGE} to ${#HOSTS[@]} hosts (rolling, max_parallel=1)"
for host in "${HOSTS[@]}"; do
  if ! deploy_host "${host}" "${IMAGE}"; then
    echo "### Deploy aborted on ${host}. Earlier hosts keep what they already received."
    exit 1
  fi
done
echo "### Deploy complete: all hosts on ${IMAGE}"
Save as deploy.sh, chmod +x, and:
export DATABASE_URL='postgres://user:pass@db.example.com:5432/app'
./deploy.sh meuusuario/myapp:v2
The algorithm is literally what large orchestrators do internally:
- For each host, sequentially (max_parallel = 1)
- Pull the new image before touching the container — that way the gap between `docker stop` and `docker run` is minimal
- Save a reference to the old image for rollback if something goes wrong
- Replace the container
- Loop waiting for the health check with a five-minute deadline
- Min healthy time of ten seconds: only advance when `/healthz` has returned 200 continuously for ten seconds (if it fails midway, the count restarts)
- Automatic rollback if the deadline is exceeded
The numbers (max_parallel: 1, min_healthy_time: 10s, healthy_deadline: 300s) are exactly the defaults we use in HeroCtl. That's no coincidence — these are the values that survived years of trial and error. Too short a min healthy time mistakes a transient good moment for "healthy" and breaks; too long makes the deploy slow with no gain. Ten seconds is the point where the noise disappears and the deploy still finishes quickly.
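The sustained-health wait is the subtlest piece of the script. Here is the same logic isolated into a pure-bash function — the name `wait_sustained_healthy` is ours, for illustration — with the probe injected as a command, so it can be exercised without a live server:

```shell
#!/usr/bin/env bash
# wait_sustained_healthy PROBE MIN_HEALTHY DEADLINE
# Returns 0 once PROBE has succeeded continuously for MIN_HEALTHY seconds;
# returns 1 if DEADLINE seconds elapse first. Any failure resets the count.
wait_sustained_healthy() {
  local probe=$1 min_healthy=$2 deadline=$3
  local start now healthy_since=0
  start=$(date +%s)
  while true; do
    now=$(date +%s)
    if (( now - start > deadline )); then
      return 1
    fi
    if "$probe"; then
      if (( healthy_since == 0 )); then
        healthy_since=${now}
      elif (( now - healthy_since >= min_healthy )); then
        return 0
      fi
    else
      healthy_since=0  # a single failure restarts the sustained count
    fi
    sleep 1
  done
}
```

In deploy.sh the probe is the `curl -sf` against `http://${host}:8080/healthz`; here it can be any command, which makes the reset-on-failure behavior easy to verify in isolation.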
Step 6 — Validate with a load test during deploy (15 min)
This is the fire test: run sustained load and deploy at the same time. If any 5xx appears, some part of the scheme is broken.
On an external machine (your laptop or another VPS):
# install hey
go install github.com/rakyll/hey@latest

# 60s of sustained load, 5 concurrent connections
hey -z 60s -c 5 https://meudominio.com/
In another window, simultaneously:
./deploy.sh meuusuario/myapp:v2
At the end of hey:
Status code distribution:
  [200] 1847 responses
Only 200. If a 502 or 503 shows up, one of the three pieces is weak: health check returning 200 too early, missing graceful shutdown, or short min healthy time. Investigate and fix.
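If you run this check in CI, it can become a gate. A sketch — the helper name is ours — that fails whenever hey's status distribution contains any 4xx or 5xx line:

```shell
# Reads hey output on stdin; succeeds only if no [4xx] or [5xx] status appears.
# hey prints lines like "  [200] 1847 responses" under "Status code distribution".
no_errors_in_hey_output() {
  ! grep -Eq '\[[45][0-9]{2}\]'
}
```

Usage: `hey -z 60s -c 5 https://meudominio.com/ | tee hey.log` during the deploy, then `no_errors_in_hey_output < hey.log || exit 1` to fail the pipeline if anything leaked.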
The six details that separate real zero-downtime from approximation
We covered most of these throughout the tutorial, but they're worth consolidating — because a single missing one turns the whole scheme into "mostly zero-downtime", which is a different thing.
- Connection draining on SIGTERM. When the container receives the stop signal, the app marks `/healthz` as failing immediately, but keeps serving in-flight connections for a few more seconds. Without it, connections open at the moment of the stop are cut.
- Pre-stop hook if you have an async worker. Queues that process background jobs need an explicit pause before the process is killed, or the running job is orphaned. In Sidekiq, that's `:quiet` before `:term`. In Celery, a warm shutdown (SIGTERM) lets running tasks finish first.
- Health check BEFORE promoting, not "container running". `docker ps` shows "running" milliseconds after `docker run`. It means nothing. Promote only after `/healthz` has returned 200 continuously.
- Min healthy time of ten sustained seconds. Don't trust a single 200 and move on — apps with irregular warm-up pass for a moment and fail again.
- Previous version pre-pulled for fast rollback. If you trusted "keep the old image in Docker's cache", at some point it's cleared by garbage collection and rollback gets slow. Keep the last three images explicitly.
- Auto-revert when the healthy deadline is exceeded. Without it, the deploy gets stuck in a partial state — half the hosts on v2, half on v1, with nobody to decide what to do.
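The "keep the last three images" detail can be scripted. A sketch — the helper name `images_to_prune` is ours — that, given `docker images` output (which lists newest first), prints everything past the N most recent as candidates for removal:

```shell
# Given a newest-first list of image tags on stdin, print everything past
# the first N — i.e. the images safe to remove.
images_to_prune() {
  local keep=$1
  tail -n +"$((keep + 1))"
}
```

Usage against a real host would look like `docker images meuusuario/myapp --format '{{.Repository}}:{{.Tag}}' | images_to_prune 3 | xargs -r docker rmi` (on GNU xargs, `-r` skips the run when the list is empty).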
Database migrations + zero-downtime (the part that breaks experienced people's deploys)
This is the topic I see senior developers get wrong most often. Rolling deploy assumes that both versions of the app run simultaneously in production for some period. If v2 expects a schema incompatible with what v1 understands, one of the two breaks during the transition window.
Non-negotiable golden rule: migrations are always backward-compatible.
Classic case: you want to rename column email to email_address. Wrong solution: do the migration that renames directly before the deploy. Result: during the rolling, v1 instances still write to email (which no longer exists) and break. Right solution, in three deploys:
| Deploy | Migration | App code |
|---|---|---|
| 1 | Add email_address (nullable). No removal. | App writes to email AND to email_address; reads from email. |
| 2 | Backfill: UPDATE users SET email_address = email WHERE email_address IS NULL. NOT NULL constraint. | App reads from email_address; still writes to both. |
| 3 | Drop email. | App only uses email_address. |
Three deploys, weeks apart. It's tedious, but it's the way. A direct column drop always breaks. A direct type change always breaks. Adding NOT NULL without a default directly always breaks.
Tools that help: pg-osc and pgroll (Postgres) and gh-ost (MySQL) do online schema changes without long locks. For light migrations, the manual three-step approach is enough.
Patterns beyond rolling
Rolling is the default and most economical pattern. Others worth knowing:
- Blue-green. Two complete parallel environments — "blue" running v1, "green" provisioned with v2 empty. You bring up v2 entirely on green, validate, switch DNS (or load balancer cutover). Advantage: instant rollback (return DNS to blue). Disadvantage: costs double the resources during the deploy window.
- Canary. Send 5% of traffic to v2, observe metrics (errors, latency, conversion rate), decide whether to promote to 100% or abort. Detects subtle bugs that health check doesn't catch — like regression in checkout conversion. Requires a proxy with weighted routing and decent observability.
- Rainbow / N+1. Generalization of blue-green with N coexisting versions. Useful when you want long-running A/B tests between entire versions.
For the tutorial, rolling is what makes sense. The others are worth it when the traffic size justifies the extra investment.
"Easy" version — Coolify or Dokploy
If you don't want to script, two modern panels do rolling deploy automatically:
- Coolify in multi-server mode does rolling with a configurable health check. Multi-server was added in more recent versions — before that it was single-server only. Worth checking your version.
- Dokploy on top of Docker Swarm does rolling via `--update-parallelism 1 --update-delay`, leveraging what Swarm already offers.
Trade-off: you swap the fifty-line script (where you understand everything happening) for a panel (which is faster to set up, but becomes a black box when something goes wrong). For a small team where one person handles operations partially, the panel wins. For a team where you need to understand exactly what happened at 3 a.m., the script wins.
"Robust" version — HeroCtl
For those who want to stop scripting but don't want a black box, HeroCtl combines automatic rolling deploy with a replicated control plane. You describe the service in a configuration file and the orchestrator does the rest:
job "minhaapp" {
  group "web" {
    count = 2

    task "app" {
      driver = "docker"

      config {
        image = "meuusuario/myapp:v2"
        ports = ["http"]
      }

      service {
        port = "http"

        check {
          type     = "http"
          path     = "/healthz"
          interval = "5s"
          timeout  = "2s"
        }
      }
    }

    update {
      max_parallel     = 1
      min_healthy_time = "10s"
      healthy_deadline = "5m"
      auto_revert      = true
    }
  }
}
The same parameters as the bash script, declarative. The difference is that the orchestrator coordinates the rolling across N servers (not just two), does automatic leader election in around seven seconds if the current node falls, and keeps the control plane distributed across the first three servers. The cluster survives the loss of any single server without human intervention.
Installation:
curl -sSL https://get.heroctl.com/install.sh | sh
Community plan is permanently free — no server or job limit, with all the orchestration features described in the tutorial. Business plan adds SSO/SAML, granular RBAC, detailed audit, and SLA-backed support, for teams that have formal platform requirements. Enterprise plan adds source-code escrow, continuity contract, and 24×7 support. Business and Enterprise prices are published on the plans page — no mandatory "talk to sales".
Comparison: five paths side by side
| Criterion | Bash script (2 servers) | Coolify multi-server | Dokploy + Swarm | HeroCtl | Kamal | Kubernetes |
|---|---|---|---|---|---|---|
| Setup time | 2-3h | 30 min | 1h | 5 min | 1h | 4h-4 days |
| Lines of config | ~50 (script) | UI | ~20 | ~50 | ~40 | 300+ |
| HA of the control plane | N/A | No | Limited | Yes | N/A | Yes (5+ components) |
| Declarative health check | Manual | Yes | Yes | Yes | Yes | Yes |
| Automatic rollback | Manual in script | Yes | Yes | Yes | Yes | Yes |
| Target scale | 1-3 servers | 1-10 servers | 1-20 servers | 1-500 servers | 1-10 servers | 50+ servers |
| Black box? | No (you wrote it) | Yes | Partial | No (short declarative) | No | Yes |
| Learning curve | Low | Low | Medium | Low | Low | High |
Each column has its niche. Bash script is unbeatable when you want to understand each line. Coolify wins when you just want a panel. HeroCtl wins when you need real HA without setting up an external control plane. Kubernetes wins at planetary scale — where the complexity pays off.
The five most common errors
- Health check on `/` returning 200 without validating dependencies. The app returns 200 before connecting to the database, the proxy promotes it, and the user sees a 500 on the first requests. Solution: `/healthz` validates database, cache, queue — anything the app needs to actually respond.
- Min healthy time of 1 second. Apps with irregular warm-up may return 200 at one moment and 503 right after (cache populating, classes being lazy-loaded). The orchestrator promotes on the first good window, and the next request hits a bad state. Ten sustained seconds eliminate ninety percent of these cases.
- No max_parallel (or max_parallel = N). If you swap all instances at once, during the cutover window nobody healthy is serving. It's single-server downtime in disguise. Always `max_parallel = 1` to start.
- Mix of versions in production without schema compat. v1 writes to `email`, v2 reads from `email_address`, and during the five-minute rolling window the two coexist — users hitting v2 don't see data v1 just wrote. A backward-compatible migration in three steps solves it.
- Stale cache on the client (CDN, browser, service worker). The backend is already v2 but the user has the v1 JS in cache, and the old JS calls an API that no longer exists. Solution: keep old endpoints for a window; API versioning; strong cache-busting on critical assets.
FAQ
Can I do zero-downtime with a single server?
No. Every variation that promises this has a measurable error window when you measure with hey -c 20. The only way to have real zero-downtime is to keep at least one instance always healthy throughout the deploy — which requires two machines minimum.
Does DNS round-robin work as a load balancer?
It works as a basic load balancer, but not as a health check mechanism. DNS doesn't quickly remove a dead IP from rotation — TTLs caching at ISPs and clients keep the wrong IP in use for minutes or hours. For zero-downtime you need a real proxy (Caddy, nginx, HAProxy) that takes unhealthy instances out of balancing in seconds.
Caddy or Traefik — which is better for this setup?
For two servers and a static setup, Caddy is simpler — fifteen-line Caddyfile solves it. Traefik shines when you have dynamic service discovery (like Docker labels or Consul) and many backends changing all the time. nginx sits in the middle: more flexible, no built-in automatic TLS (needs external certbot). For this tutorial, Caddy.
Do WebSocket connections survive during rolling?
Connections open on an instance that's being torn down are cut. The client has to reconnect. A good WebSocket library (Socket.IO, Phoenix Channels) reconnects automatically — the user sees a half-second blink in state. Connection draining helps: the instance marks /healthz failing, the proxy stops sending new connections, but existing ones continue until the pre-stop timer. Thirty seconds of drain are usually enough for long-running connections to drain naturally.
Database migrations — what's the golden rule?
Every migration must be backward-compatible. Drop a column never directly. Rename never directly. Type change never directly. Instead, three deploys: add new structure, backfill, remove the old. Slow, yes. But rolling deploy depends on this not to break.
Automatic rollback — how to implement?
Two pieces: deadline (max time waiting for health check) and reference to the previous image pre-pulled. If the deadline passes without becoming healthy, the script reinstalls the previous version. The example in Step 5 does exactly that. In declarative orchestrators, it becomes auto_revert = true.
Do sticky sessions complicate zero-downtime?
Yes. If the app stores session state in process memory, taking down the instance takes down the sessions of users connected to it. Solution: take session out of memory — Redis, Postgres, or signed JWT. Then any instance serves any user, and rolling cuts no session.
How long does a complete deploy take?
Two servers, app that comes up in fifteen seconds: about a minute. Breakdown: image pull (5-15s, depends on network and size), container replacement (1s), warm-up + health check (10-30s), 10s min healthy time, total around 30-50s per host, multiplied by two hosts in sequence = 1-2 min. Four servers around 2-4 min. With fifty servers, deploy starts taking ten or fifteen minutes — time to raise max_parallel to two or three (keeping rigorous health check).
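That back-of-envelope arithmetic fits in one function — the name is ours, all inputs in seconds, and the per-phase numbers are the midpoints quoted above:

```shell
# hosts * (image pull + container replacement + warm-up + min healthy time)
estimate_deploy_seconds() {
  local hosts=$1 pull=$2 replace=$3 warmup=$4 min_healthy=$5
  echo $(( hosts * (pull + replace + warmup + min_healthy) ))
}

estimate_deploy_seconds 2 15 1 20 10  # the two-server tutorial setup → 92
```

Plug in fifty hosts and the same per-host numbers and you land in the twenty-minute range — exactly the point where raising max_parallel starts to pay off.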
Closing
Zero-downtime deploy is architecture, not tool. The three ingredients — multiple instances, proxy with health check, controlled rolling — work with bash and Caddy as well as with a large orchestrator. The difference is in how much of the operation you want to write by hand and how much to delegate.
For a small SaaS, two or three VPS and a fifty-line script solve it indefinitely. When the cluster grows to dozens of servers or the team needs real HA on the control plane, it's worth stepping up to the declarative orchestrator:
curl -sSL https://get.heroctl.com/install.sh | sh
More on the rolling algorithm in Safe rolling deploy: why yours might not be. For those leaving Compose for a multi-server setup, Docker deploy in production: from compose to a cluster covers the intermediate path.
Container orchestration, without ceremony.