Rolling, canary, blue-green, and rainbow
Four deploy strategies. When to use each, with complete examples and honest trade-offs.
Every update swaps code in production. The difference between the four strategies is how much traffic sees the new code, for how long, and how quickly you can roll back if something breaks.
Quick comparison
| Strategy | Risk | Cost | Speed | When to use |
|---|---|---|---|---|
| Rolling | medium | 1x | medium | 90% of cases |
| Canary | low | 1x | slow | risky change with clear metrics |
| Blue-green | low | 2x | fast | rollback must be instant |
| Rainbow | low | Nx | medium | multiple coexisting versions (B2B) |
There is no "best strategy". There is the right strategy for each change.
Rolling
The default strategy. Replaces replicas one at a time (or in small batches), waiting for each new one to become healthy before touching the next.
How it works
Imagine 4 replicas running v1. A rolling update with max_parallel: 1 proceeds like this:
- Kill 1 v1 replica, bring up 1 v2. Wait healthy.
- Kill another v1, bring up 1 v2. Wait healthy.
- Repeat until all 4 are v2.
At any moment, a mix of v1 and v2 is serving traffic. For backward-compatible changes, that is fine. For schema or API contract changes, it is dangerous.
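The steps above can be sketched as plain code. This is illustrative, not a real scheduler API: is_healthy is a caller-supplied health probe, and the parameter names mirror the spec below.

```python
import time

def rolling_update(replicas, new_version, is_healthy, max_parallel=1,
                   min_healthy_time=30, healthy_deadline=300):
    """Sketch of a rolling update over a list of version strings."""
    for i in range(0, len(replicas), max_parallel):
        batch = range(i, min(i + max_parallel, len(replicas)))
        for j in batch:
            replicas[j] = new_version  # kill old replica, start new one in slot j
        deadline = time.time() + healthy_deadline
        # Every replica in the batch must pass its health check before advancing.
        for j in batch:
            while not is_healthy(j):
                if time.time() > deadline:
                    raise RuntimeError(f"replica {j} missed healthy_deadline")
                time.sleep(1)
        # min_healthy_time guards against late crashes (warmup bugs, slow OOMs).
        time.sleep(min_healthy_time)
    return replicas
```

Note that while the loop runs, the list holds both versions at once, which is exactly the version-mix window discussed above.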
YAML spec
```yaml
job: api-vendas
tasks:
  - name: web
    image: minhaempresa/api-vendas:1.5.0
    count: 4
    update:
      strategy: rolling
      max_parallel: 1
      min_healthy_time: 30s
      healthy_deadline: 300s
      auto_revert: true
```
Important parameters:
- max_parallel — how many replicas to swap at the same time. 1 is slower and safer.
- min_healthy_time — how long each new replica must stay healthy before the update advances. 30s catches most late crashes.
- healthy_deadline — if a replica is not healthy by this deadline, the update is considered failed.
- auto_revert — automatically rolls back to the previous version on failure.
Trade-offs
Zero extra cost. The version-mix window is the weak point. If v2 has a bug that only appears with real traffic, some users suffer before rollback.
Note: rolling update is the right choice when your changes are small, backward-compatible, and the app supports different active versions at the same time.
Canary
Instead of swapping replicas, canary adds a small number of new ones and routes a slice of traffic to them. You watch metrics. If all is good, you increase the slice. If metrics degrade, you revert.
How it works
With count: 10 and canary configured for 5% / 25% / 50% / 100%:
- Bring up 1 v2 replica alongside the 10 v1 replicas. Route 5% of traffic to it.
- Wait 5 min collecting metrics (latency, errors, CPU).
- If metrics within baseline, bring up more v2 replicas and route 25%.
- Repeat at 50% and 100%.
- If at any step metrics worsen, revert automatically.
YAML spec
```yaml
job: api-vendas
tasks:
  - name: web
    image: minhaempresa/api-vendas:1.5.0
    count: 10
    update:
      strategy: canary
      stages:
        - percent: 5
          duration: 5m
        - percent: 25
          duration: 10m
        - percent: 50
          duration: 10m
        - percent: 100
      analysis:
        success_rate_min: 99.5
        latency_p95_max: 250ms
        error_rate_max: 0.5
        baseline: previous_version
      auto_revert: true
```
The analysis block defines what counts as "ok". If the new version has latency_p95 of 300ms while the previous one had 200ms, the system reverts on its own before reaching 100%.
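A minimal sketch of what that analysis check could look like, assuming metrics arrive as plain dicts. The thresholds mirror the spec above; the 10% jitter tolerance over the baseline is an assumption, not something the spec defines.

```python
def within_baseline(candidate, previous,
                    success_rate_min=99.5, latency_p95_max=250.0,
                    error_rate_max=0.5):
    """Both checks must pass: absolute limits from the analysis block,
    and comparison against the previous version (baseline: previous_version)."""
    ok_absolute = (candidate["success_rate"] >= success_rate_min
                   and candidate["latency_p95_ms"] <= latency_p95_max
                   and candidate["error_rate"] <= error_rate_max)
    # A small tolerance over the live baseline keeps normal jitter from
    # aborting the rollout (the 10% margin here is an assumption).
    ok_relative = candidate["latency_p95_ms"] <= previous["latency_p95_ms"] * 1.10
    return ok_absolute and ok_relative
```

In the 300ms-vs-200ms example from the text, the relative check fails even though 300ms might pass a lax absolute limit, which is why a baseline matters.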
Trade-offs
Slower (40 min to fully promote vs. 5 min for rolling). Requires reliable metrics and a clear baseline. If your application does not have well-defined business metrics, canary becomes theater.
Note: use canary when the cost of a bad version in production is high and you have instrumentation to detect regression without depending on user tickets.
Blue-green
Two parallel environments. "Blue" receives 100% of traffic with the current version. "Green" comes up with the new version, empty. When healthy, instant traffic switch from blue to green.
How it works
- Initial state: blue (v1) receives 100%. Green does not exist.
- Bring up green with v2, same number of replicas.
- Wait for green to become healthy (no traffic, but with health check passing).
- Switch: ingress now points to green. Blue stays alive, no traffic.
- Observation period (15 min, for example).
- If ok, discard blue. If something breaks, switch back in seconds.
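The state machine behind those steps is small enough to sketch. The class and method names below are illustrative, assuming the caller supplies a blocking health-check hook; a real ingress switch would of course be an atomic config change, not a field assignment.

```python
class BlueGreen:
    """Sketch of blue-green deploy state."""
    def __init__(self, version):
        self.blue, self.green = version, None
        self.live = "blue"                   # which environment ingress points at

    def deploy(self, new_version, wait_healthy):
        self.green = new_version             # bring up green, no traffic yet
        wait_healthy(new_version)            # health checks must pass before the switch
        self.live = "green"                  # instant ingress switch

    def abort(self):
        self.live = "blue"                   # rollback: repoint ingress, seconds

    def promote(self):
        self.blue, self.green = self.green, None  # discard blue; green becomes the new blue
        self.live = "blue"
```

Notice that between deploy and promote, both environments exist at full size, which is where the 2x cost in the trade-offs comes from.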
YAML spec
```yaml
job: api-vendas
tasks:
  - name: web
    image: minhaempresa/api-vendas:1.5.0
    count: 4
    update:
      strategy: blue-green
      promote_after: 15m
      auto_promote: false
      auto_revert: true
```
With auto_promote: false, the switch needs a manual command:
```shell
heroctl deploy promote dep-2026-04-26-005
```
To revert:
```shell
heroctl deploy abort dep-2026-04-26-005
# traffic returns to blue in 1-2 seconds
```
Trade-offs
Cost doubles during the validation window (4 + 4 replicas instead of 4). For apps with lots of memory or GPU, that hurts. In return, rollback is the fastest of the 4 strategies and there is no version mix in production at any moment.
Warning: blue-green does not solve database schema changes. If the new version needs a new column, it is still the application's responsibility to keep the migration compatible with both versions during the window.
Rainbow
Multiple versions coexisting permanently, each serving a specific set of users. It is not an update strategy but an operating model.
When it makes sense
Only for B2B with clients that need a fixed version by contract. Examples:
- ERP where client A asked to be locked at v3.2 until they audit.
- API that charges per SLA and the premium client has the right to change versions on demand.
- Multi-tenant platform with heavy per-client customization.
In SaaS B2C or mass-market products, rainbow is waste.
How it works
Several versions of the same job running at the same time, each with a distinct tag. Routing by header, subdomain, or token claim decides which version answers each request.
YAML spec
```yaml
job: api-vendas
versions:
  - tag: v3.2
    image: minhaempresa/api-vendas:3.2.7
    count: 2
    routing:
      tenants: [acme, contoso]
  - tag: v4.0
    image: minhaempresa/api-vendas:4.0.1
    count: 4
    routing:
      tenants: [default]
  - tag: v4.1-beta
    image: minhaempresa/api-vendas:4.1.0-rc3
    count: 1
    routing:
      tenants: [internal-test]
```
The routing.tenants rule is evaluated on each request. Ingress routes by the token's tenant_id claim.
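The lookup the ingress performs on each request can be sketched like this. The table is derived from the spec above and the tenant_id claim comes from the text; the function name and the fallback behavior are assumptions for illustration.

```python
# Tenant -> version table, derived from the routing blocks in the spec.
ROUTING = {
    "acme": "v3.2",
    "contoso": "v3.2",
    "internal-test": "v4.1-beta",
}

def resolve_version(token_claims, default="v4.0"):
    """Pick the job version for a request based on the token's tenant_id claim.
    Unknown tenants fall through to the `default` routing entry."""
    return ROUTING.get(token_claims.get("tenant_id"), default)
```

Because the decision happens per request, moving a tenant between versions is a routing change, not a deploy.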
Trade-offs
Cost proportional to the number of live versions. Operations get complex: each bug fix needs to be ported to all supported versions. Go rainbow only with contracts or regulation that justify it.
Warning: rainbow is easy to start and hard to leave. Before adopting it, ask whether two live versions and a defined migration window would not already solve the case.
How to choose
Ask these questions in order:
- Is the change backward-compatible? If yes, rolling solves it.
- Is there a clear metric to detect regression in 5 min? If yes, canary.
- Does rollback need to be instant? If yes, blue-green.
- Do several clients pay to stay on a fixed version? Then, rainbow.
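The checklist reads naturally as a function evaluated in question order. This is a sketch of the decision, not a tool; the argument names are made up for readability.

```python
def pick_strategy(backward_compatible, has_regression_metrics,
                  needs_instant_rollback, clients_pay_for_pinned_version):
    """Walk the four questions in order; the first 'yes' wins."""
    if backward_compatible:
        return "rolling"
    if has_regression_metrics:
        return "canary"
    if needs_instant_rollback:
        return "blue-green"
    if clients_pay_for_pinned_version:
        return "rainbow"
    return "rolling"  # default when nothing answers yes: cheapest, well-understood
```

The ordering is the point: a backward-compatible change takes rolling even if you also have great metrics and spare capacity.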
When in doubt between canary and blue-green, choose the one that matches your observability maturity. Canary without metrics turns into bureaucracy. Blue-green without doubled capacity breaks at the wrong time.
Next step: complete CLI reference with the deploy promote, deploy abort, and deploy pause commands used here.