Rolling, canary, blue-green, and rainbow
Four deploy strategies. When to use each, with complete examples and honest trade-offs.
Every update swaps code in production. The difference between the four strategies is how much traffic sees the new code, for how long, and how quickly you can roll back if something breaks.
Quick comparison
| Strategy | Risk | Cost | Speed | When to use |
|---|---|---|---|---|
| Rolling | medium | 1x | medium | 90% of cases |
| Canary | low | 1x | slow | risky change with clear metrics |
| Blue-green | low | 2x | fast | rollback must be instant |
| Rainbow | low | Nx | medium | multiple coexisting versions (B2B) |
There is no "best strategy". There is the right strategy for each change.
Rolling
The default strategy. Replaces replicas one at a time (or in small batches), waiting for each new one to become healthy before touching the next.
How it works
Imagine 4 replicas running v1. A rolling update with max_parallel: 1 proceeds like this:
- Kill 1 v1 replica, bring up 1 v2. Wait healthy.
- Kill another v1, bring up 1 v2. Wait healthy.
- Repeat until all 4 are v2.
At any moment, a mix of v1 and v2 is serving traffic. For backward-compatible changes, that is fine. For schema or API contract changes, it is dangerous.
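The steps above can be sketched as plain code. This is illustrative, not a real scheduler API: is_healthy is a caller-supplied health probe, and the parameter names mirror the spec below.

```python
import time

def rolling_update(replicas, new_version, is_healthy, max_parallel=1,
                   min_healthy_time=30, healthy_deadline=300):
    """Sketch of a rolling update over a list of version strings."""
    for i in range(0, len(replicas), max_parallel):
        batch = range(i, min(i + max_parallel, len(replicas)))
        for j in batch:
            replicas[j] = new_version  # kill old replica, start new one in slot j
        deadline = time.time() + healthy_deadline
        # Every replica in the batch must pass its health check before advancing.
        for j in batch:
            while not is_healthy(j):
                if time.time() > deadline:
                    raise RuntimeError(f"replica {j} missed healthy_deadline")
                time.sleep(1)
        # min_healthy_time guards against late crashes (warmup bugs, slow OOMs).
        time.sleep(min_healthy_time)
    return replicas
```

Note that while the loop runs, the list holds both versions at once, which is exactly the version-mix window discussed above.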
YAML spec
```yaml
job: api-vendas
tasks:
  - name: web
    image: minhaempresa/api-vendas:1.5.0
    count: 4
    update:
      strategy: rolling
      max_parallel: 1
      min_healthy_time: 30s
      healthy_deadline: 300s
      auto_revert: true
```
Important parameters:
- max_parallel — how many replicas to swap at the same time. 1 is slower and safer.
- min_healthy_time — how long each new replica must stay healthy before the update advances. 30s catches most late crashes.
- healthy_deadline — if a replica is not healthy by this deadline, the update is considered failed.
- auto_revert — automatically rolls back to the previous version on failure.
Trade-offs
Zero extra cost. The version-mix window is the weak point. If v2 has a bug that only appears with real traffic, some users suffer before rollback.
Note: rolling update is the right choice when your changes are small, backward-compatible, and the app supports different active versions at the same time.
Canary
Instead of swapping replicas, canary adds a small number of new ones and routes a slice of traffic to them. You watch metrics. If all is good, you increase the slice. If metrics degrade, you revert.
How it works
With count: 10 and canary configured for 5% / 25% / 50% / 100%:
- Bring up 1 v2 replica alongside the 10 v1 replicas. Route 5% of traffic to it.
- Wait 5 min collecting metrics (latency, errors, CPU).
- If metrics within baseline, bring up more v2 replicas and route 25%.
- Repeat at 50% and 100%.
- If at any step metrics worsen, revert automatically.
YAML spec
```yaml
job: api-vendas
tasks:
  - name: web
    image: minhaempresa/api-vendas:1.5.0
    count: 10
    update:
      strategy: canary
      stages:
        - percent: 5
          duration: 5m
        - percent: 25
          duration: 10m
        - percent: 50
          duration: 10m
        - percent: 100
      analysis:
        success_rate_min: 99.5
        latency_p95_max: 250ms
        error_rate_max: 0.5
        baseline: previous_version
      auto_revert: true
```
The analysis block defines what counts as "ok". If the new version has latency_p95 of 300ms while the previous one had 200ms, the system reverts on its own before reaching 100%.
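A minimal sketch of what that analysis check could look like, assuming metrics arrive as plain dicts. The thresholds mirror the spec above; the 10% jitter tolerance over the baseline is an assumption, not something the spec defines.

```python
def within_baseline(candidate, previous,
                    success_rate_min=99.5, latency_p95_max=250.0,
                    error_rate_max=0.5):
    """Both checks must pass: absolute limits from the analysis block,
    and comparison against the previous version (baseline: previous_version)."""
    ok_absolute = (candidate["success_rate"] >= success_rate_min
                   and candidate["latency_p95_ms"] <= latency_p95_max
                   and candidate["error_rate"] <= error_rate_max)
    # A small tolerance over the live baseline keeps normal jitter from
    # aborting the rollout (the 10% margin here is an assumption).
    ok_relative = candidate["latency_p95_ms"] <= previous["latency_p95_ms"] * 1.10
    return ok_absolute and ok_relative
```

In the 300ms-vs-200ms example from the text, the relative check fails even though 300ms might pass a lax absolute limit, which is why a baseline matters.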
Trade-offs
Slower (40 min to fully promote vs. 5 min for rolling). Requires reliable metrics and a clear baseline. If your application does not have well-defined business metrics, canary becomes theater.
Note: use canary when the cost of a bad version in production is high and you have instrumentation to detect regression without depending on user tickets.
Blue-green
Two parallel environments. "Blue" receives 100% of traffic with the current version. "Green" comes up with the new version, empty. When healthy, instant traffic switch from blue to green.
How it works
- Initial state: blue (v1) receives 100%. Green does not exist.
- Bring up green with v2, same number of replicas.
- Wait for green to become healthy (no traffic, but with health check passing).
- Switch: ingress now points to green. Blue stays alive, no traffic.
- Observation period (15 min, for example).
- If ok, discard blue. If something breaks, switch back in seconds.
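The state machine behind those steps is small enough to sketch. The class and method names below are illustrative, assuming the caller supplies a blocking health-check hook; a real ingress switch would of course be an atomic config change, not a field assignment.

```python
class BlueGreen:
    """Sketch of blue-green deploy state."""
    def __init__(self, version):
        self.blue, self.green = version, None
        self.live = "blue"                   # which environment ingress points at

    def deploy(self, new_version, wait_healthy):
        self.green = new_version             # bring up green, no traffic yet
        wait_healthy(new_version)            # health checks must pass before the switch
        self.live = "green"                  # instant ingress switch

    def abort(self):
        self.live = "blue"                   # rollback: repoint ingress, seconds

    def promote(self):
        self.blue, self.green = self.green, None  # discard blue; green becomes the new blue
        self.live = "blue"
```

Notice that between deploy and promote, both environments exist at full size, which is where the 2x cost in the trade-offs comes from.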
YAML spec
```yaml
job: api-vendas
tasks:
  - name: web
    image: minhaempresa/api-vendas:1.5.0
    count: 4
    update:
      strategy: blue-green
      promote_after: 15m
      auto_promote: false
      auto_revert: true
```
With auto_promote: false, the switch needs a manual command:
```shell
heroctl deploy promote dep-2026-04-26-005
```
To revert:
```shell
heroctl deploy abort dep-2026-04-26-005
# traffic returns to blue in 1-2 seconds
```
Trade-offs
Cost doubles during the validation window (4 + 4 replicas instead of 4). For apps with lots of memory or GPU, that hurts. In return, rollback is the fastest of the 4 strategies and there is no version mix in production at any moment.
Warning: blue-green does not solve database schema changes. If the new version needs a new column, it is still the application's responsibility to keep the migration compatible with both versions during the window.
Rainbow
Multiple versions coexisting permanently, each serving a specific set of users. It is not an update strategy but an operating model.
When it makes sense
Only for B2B with clients that need a fixed version by contract. Examples:
- ERP where client A asked to be locked at v3.2 until they audit.
- API that charges per SLA and the premium client has the right to change versions on demand.
- Multi-tenant platform with heavy per-client customization.
In SaaS B2C or mass-market products, rainbow is waste.
How it works
Several versions of the same job running at the same time, each with a distinct tag. Routing by header, subdomain, or token claim decides which version answers each request.
YAML spec
```yaml
job: api-vendas
versions:
  - tag: v3.2
    image: minhaempresa/api-vendas:3.2.7
    count: 2
    routing:
      tenants: [acme, contoso]
  - tag: v4.0
    image: minhaempresa/api-vendas:4.0.1
    count: 4
    routing:
      tenants: [default]
  - tag: v4.1-beta
    image: minhaempresa/api-vendas:4.1.0-rc3
    count: 1
    routing:
      tenants: [internal-test]
```
The routing.tenants rule is evaluated on each request. Ingress routes by the token's tenant_id claim.
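The lookup the ingress performs on each request can be sketched like this. The table is derived from the spec above and the tenant_id claim comes from the text; the function name and the fallback behavior are assumptions for illustration.

```python
# Tenant -> version table, derived from the routing blocks in the spec.
ROUTING = {
    "acme": "v3.2",
    "contoso": "v3.2",
    "internal-test": "v4.1-beta",
}

def resolve_version(token_claims, default="v4.0"):
    """Pick the job version for a request based on the token's tenant_id claim.
    Unknown tenants fall through to the `default` routing entry."""
    return ROUTING.get(token_claims.get("tenant_id"), default)
```

Because the decision happens per request, moving a tenant between versions is a routing change, not a deploy.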
Trade-offs
Cost proportional to the number of live versions. Operations get complex: each bug fix needs to be ported to all supported versions. Go rainbow only with contracts or regulation that justify it.
Warning: rainbow is easy to start and hard to leave. Before adopting it, ask whether two live versions and a defined migration window would not already solve the case.
How to choose
Ask these questions in order:
- Is the change backward-compatible? If yes, rolling solves it.
- Is there a clear metric to detect regression in 5 min? If yes, canary.
- Does rollback need to be instant? If yes, blue-green.
- Do several clients pay to stay on a fixed version? Then, rainbow.
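The checklist reads naturally as a function evaluated in question order. This is a sketch of the decision, not a tool; the argument names are made up for readability.

```python
def pick_strategy(backward_compatible, has_regression_metrics,
                  needs_instant_rollback, clients_pay_for_pinned_version):
    """Walk the four questions in order; the first 'yes' wins."""
    if backward_compatible:
        return "rolling"
    if has_regression_metrics:
        return "canary"
    if needs_instant_rollback:
        return "blue-green"
    if clients_pay_for_pinned_version:
        return "rainbow"
    return "rolling"  # default when nothing answers yes: cheapest, well-understood
```

The ordering is the point: a backward-compatible change takes rolling even if you also have great metrics and spare capacity.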
When in doubt between canary and blue-green, choose the one that matches your observability maturity. Canary without metrics turns into bureaucracy. Blue-green without doubled capacity breaks at the wrong time.
Next step: complete CLI reference with the deploy promote, deploy abort, and deploy pause commands used here.