[{"data":1,"prerenderedAt":1011},["ShallowReactive",2],{"blog-en-\u002Fen\u002Fblog\u002Fdocker-deploy-production-compose-to-cluster":3,"blog-en-surround-\u002Fen\u002Fblog\u002Fdocker-deploy-production-compose-to-cluster":997},{"id":4,"title":5,"author":6,"body":7,"category":978,"cover":979,"date":980,"description":981,"draft":982,"extension":983,"lastReviewed":979,"meta":984,"navigation":985,"path":986,"readingTime":987,"seo":988,"sitemap":989,"stem":990,"tags":991,"__hash__":996},"blog_en\u002Fen\u002Fblog\u002Fdocker-deploy-production-compose-to-cluster.md","Docker deploy in production: from compose to a high-availability cluster","HeroCtl team",{"type":8,"value":9,"toc":964},"minimark",[10,23,26,29,34,37,40,59,62,79,82,88,92,102,108,137,142,145,150,171,176,179,184,279,282,286,289,294,311,315,318,323,330,335,338,342,345,349,352,356,388,392,395,400,403,406,410,413,418,421,425,428,432,449,453,456,461,464,469,472,476,479,483,671,674,678,681,687,693,699,705,709,712,721,727,733,737,740,757,767,781,787,797,801,804,830,837,841,853,859,865,871,892,898,908,912,915,918,940,943,957,960],[11,12,13,14,18,19,22],"p",{},"The flow is familiar to any dev who came up in the last five years. You write a ",[15,16,17],"code",{},"docker-compose.yml"," with three services, run ",[15,20,21],{},"docker compose up"," on your machine, it works. You bring it up on a staging VPS, it works. You move it to production by pointing DNS at it, and it works — until the first Friday at five p.m. when it stops working.",[11,24,25],{},"The exact point at which \"works in production\" stops being true depends much less on which tool you picked, and much more on which maturity stage your product reached. This post maps the four stages almost every team passes through, shows the practical signs that you need to step up, and makes it explicit when you don't yet.",[11,27,28],{},"It isn't an anti-compose post, nor a pro-cluster post. 
It's a post about when each thing fits.",[30,31,33],"h2",{"id":32},"why-docker-compose-did-so-well-in-development","Why Docker Compose did so well in development",[11,35,36],{},"Before talking about the stages, it's worth understanding what Compose was designed to be. It solves a very specific problem and solves it well: orchestrating multiple containers on the same machine, declaring dependencies, networks, volumes, and environment variables in a single readable file.",[11,38,39],{},"The premises baked into that design are the premises of someone developing:",[41,42,43,47,50,53,56],"ul",{},[44,45,46],"li",{},"A single machine. Yours.",[44,48,49],{},"A single user. You.",[44,51,52],{},"Manual restart. When something breaks, you open the terminal and type.",[44,54,55],{},"Ephemeral data. If the database resets, you re-run the seed.",[44,57,58],{},"Nobody depends on it staying up. If it falls at three a.m., the world goes on.",[11,60,61],{},"In production, all five premises invert:",[41,63,64,67,70,73,76],{},[44,65,66],{},"N machines. You're not alone anymore.",[44,68,69],{},"N users. They don't know you.",[44,71,72],{},"Automatic restart is the bare minimum. Nobody is going to wake you up at four a.m. Ideally, nobody wakes anyone.",[44,74,75],{},"Data matters. Losing the database becomes a reportable incident.",[44,77,78],{},"Someone depends on it staying up. Possibly a customer, possibly a contract with penalties.",[11,80,81],{},"Docker Compose still works outside its original premises. It makes things work — it just makes them work badly. 
The shortcuts that look innocent in development (shared network between services, volume mounted directly on the host, logs going to the terminal) become traps when the environment changes from \"one machine where I know everything that's running\" to \"three machines where something needs to be running twenty-four hours a day\".",[11,83,84,85,87],{},"The four stages below show the natural curve of someone taking the same ",[15,86,17],{}," from a hobby through to a SaaS with a contractual SLA.",[30,89,91],{"id":90},"stage-1-compose-on-a-single-vps","Stage 1: Compose on a single VPS",[11,93,94,95,97,98,101],{},"The most honest entry point for Docker in production. A cheap VPS, a ",[15,96,17],{}," file, and ",[15,99,100],{},"docker compose up -d"," doing the rest.",[11,103,104],{},[105,106,107],"strong",{},"Minimum viable setup:",[41,109,110,113,116,123,130],{},[44,111,112],{},"1 VPS with 1–2 vCPUs and 2 GB of RAM (about R$30 per month at a decent Brazilian provider).",[44,114,115],{},"Docker and Docker Compose installed via official script.",[44,117,118,119,122],{},"All services with ",[15,120,121],{},"restart: always"," in compose.",[44,124,125,126,129],{},"Named volumes for data (not bind mounts pointing to ",[15,127,128],{},"\u002Fopt\u002Fapp\u002Fdata",").",[44,131,132,133,136],{},"A daily cron running ",[15,134,135],{},"pg_dump"," and shipping to S3 or Backblaze B2.",[11,138,139],{},[105,140,141],{},"Who this works for:",[11,143,144],{},"Hobby projects, MVPs validating product-market fit, internal tools for the team, private admin dashboards, personal blogs. Any application where the phrase \"if it's down for five minutes, nobody dies\" is literally true.",[11,146,147],{},[105,148,149],{},"Risks you're explicitly accepting:",[41,151,152,155,158,168],{},[44,153,154],{},"The VPS goes down (provider maintenance, noisy-neighbor spike, hardware) and your service goes with it. 
There's no failover.",[44,156,157],{},"The disk dies, and if you weren't backing up, you lost the data. Cloud-provider SSDs fail less than the old datacenter ones, but they fail. It happens.",[44,159,160,161,164,165,167],{},"Each deploy has a window of about 30 seconds between ",[15,162,163],{},"docker compose down"," and ",[15,166,21],{}," during which the service is down.",[44,169,170],{},"You are the sysadmin. Kernel patch, Docker update, log rotation, disk monitoring — all on you.",[11,172,173],{},[105,174,175],{},"Practical limits:",[11,177,178],{},"Comfortably handles 1 to 3 small applications, traffic on the order of 100 requests per second, and tolerance of 5 to 30 minutes of downtime per month. If any of those numbers gets pushed up, you're abusing the stage.",[11,180,181],{},[105,182,183],{},"The backup nobody does but should:",[185,186,191],"pre",{"className":187,"code":188,"language":189,"meta":190,"style":190},"language-bash shiki shiki-themes github-dark-default","# \u002Fetc\u002Fcron.daily\u002Fdb-backup\ndocker exec postgres pg_dump -U app app \\\n  | gzip \\\n  | aws s3 cp - s3:\u002F\u002Fmeu-bucket\u002Fbackups\u002F$(date +%F).sql.gz\n","bash","",[15,192,193,202,232,243],{"__ignoreMap":190},[194,195,198],"span",{"class":196,"line":197},"line",1,[194,199,201],{"class":200},"sH3jZ","# \u002Fetc\u002Fcron.daily\u002Fdb-backup\n",[194,203,205,209,213,216,219,223,226,228],{"class":196,"line":204},2,[194,206,208],{"class":207},"sQhOw","docker",[194,210,212],{"class":211},"s9uIt"," exec",[194,214,215],{"class":211}," postgres",[194,217,218],{"class":211}," pg_dump",[194,220,222],{"class":221},"sFSAA"," -U",[194,224,225],{"class":211}," app",[194,227,225],{"class":211},[194,229,231],{"class":230},"suJrU"," \\\n",[194,233,235,238,241],{"class":196,"line":234},3,[194,236,237],{"class":230},"  |",[194,239,240],{"class":207}," 
gzip",[194,242,231],{"class":230},[194,244,246,248,251,254,257,260,263,267,270,273,276],{"class":196,"line":245},4,[194,247,237],{"class":230},[194,249,250],{"class":207}," aws",[194,252,253],{"class":211}," s3",[194,255,256],{"class":211}," cp",[194,258,259],{"class":211}," -",[194,261,262],{"class":211}," s3:\u002F\u002Fmeu-bucket\u002Fbackups\u002F",[194,264,266],{"class":265},"sZEs4","$(",[194,268,269],{"class":207},"date",[194,271,272],{"class":211}," +%F",[194,274,275],{"class":265},")",[194,277,278],{"class":211},".sql.gz\n",[11,280,281],{},"Without this, you're not in production. You're in \"development exposed on the internet\". The difference between one and the other is exactly this cron.",[30,283,285],{"id":284},"stage-2-compose-with-auto-update-and-a-router-in-front","Stage 2: Compose with auto-update and a router in front",[11,287,288],{},"The first natural evolution. You still have a VPS, but now it has two floors: a router that terminates TLS and distributes requests, and the application's containers behind it.",[11,290,291],{},[105,292,293],{},"Setup:",[41,295,296,299,302,305,308],{},[44,297,298],{},"1 slightly beefier VPS (2–4 vCPUs, 4–8 GB), running around R$50 to R$80 per month.",[44,300,301],{},"Same stack as the previous stage, plus a reverse proxy (Caddy or a standalone Traefik) terminating TLS automatically via Let's Encrypt.",[44,303,304],{},"Watchtower (or equivalent) pulling new images from the registry periodically.",[44,306,307],{},"Simple pipeline on GitHub Actions or GitLab CI that builds the image, ships it to the registry, and lets Watchtower discover it.",[44,309,310],{},"Automated backup like stage 1, now with longer retention.",[11,312,313],{},[105,314,141],{},[11,316,317],{},"Indie hackers with 2 to 5 small apps, first paying customer on a side SaaS, an agency hosting sites for clients with no contractual SLA, internal tools that grew past the \"three people use it\" phase.",[11,319,320],{},[105,321,322],{},"What improved over 
stage 1:",[11,324,325,326,329],{},"Deploy became seamless from the developer's point of view. You ",[15,327,328],{},"git push",", the CI builds and publishes, and two minutes later the new version is live without you having SSHed into any server. Automatic TLS solves a pain that used to consume an afternoon per quarter. Multiple apps share the same wildcard certificate via the router.",[11,331,332],{},[105,333,334],{},"What still hurts:",[11,336,337],{},"Watchtower pulls any new image without a second thought. There's no rolling deploy — during the swap, the application is unavailable for somewhere between 10 and 30 seconds. There's no real health check before promoting the new version; if you published a broken image, the service is down until you notice and revert manually. And the single point of failure remains: the VPS that goes down takes everything with it.",[11,339,340],{},[105,341,175],{},[11,343,344],{},"5 to 10 apps on the same server, traffic up to 500 requests per second (very dependent on the load shape), tolerance of 5 to 15 minutes of downtime per month. If you started losing sleep because Watchtower updated something at three a.m. and broke it, you've moved past the stage.",[30,346,348],{"id":347},"stage-3-multi-server-with-docker-swarm","Stage 3: Multi-server with Docker Swarm",[11,350,351],{},"Here the conversation changes. You go from \"one beefy machine\" to \"three machines getting along together\". Docker Swarm is the natural step for someone already comfortable with Compose: the file is practically the same, the vocabulary is the same, and the conceptual jump is smaller than going straight to Kubernetes.",[11,353,354],{},[105,355,293],{},[41,357,358,361,371,382,385],{},[44,359,360],{},"3 or more medium-sized VPS. 
Three is the practical minimum so the cluster survives losing a machine.",[44,362,363,366,367,370],{},[15,364,365],{},"docker swarm init"," on the first node, ",[15,368,369],{},"docker swarm join"," on the other two.",[44,372,373,374,377,378,381],{},"Stack file (",[15,375,376],{},"docker stack deploy -c stack.yml meuapp",") instead of ",[15,379,380],{},"docker-compose up",".",[44,383,384],{},"Router integrated with the cluster (Traefik has a native Swarm mode) listening to daemon events and rebalancing automatically.",[44,386,387],{},"Centralized logs and metrics? You bolt them on. Not in the box.",[11,389,390],{},[105,391,141],{},[11,393,394],{},"B2B SaaS with a first contract requiring \"best-effort 99%\", team has a dev comfortable with the terminal and willing to learn, application grew past what fits on a single VPS without pain.",[11,396,397],{},[105,398,399],{},"The elephant in the room:",[11,401,402],{},"Docker Swarm has been in maintenance mode since 2019. Docker Inc. doesn't actively invest in new features. Critical bugs are still fixed, but the plugin ecosystem stagnated, and scheduler evolution practically stopped. It works — thousands of companies run Swarm in production without problems today. But you're betting on a technology whose investment trajectory was cut more than five years ago.",[11,404,405],{},"The honest version: if you adopt Swarm in 2026, you're adopting it expecting to eventually migrate to something else. It's not an immediate problem — it's a problem that shows up when you need something Swarm will never gain.",[11,407,408],{},[105,409,175],{},[11,411,412],{},"3 to 30 servers, traffic on the order of 5 thousand aggregated requests per second, tolerance of 5 to 30 seconds of failover when a machine drops. 
Above that, either you complement with external pieces (observability stack, manual autoscaler, GSLB DNS), or you step up to the next stage.",[11,414,415],{},[105,416,417],{},"Where else it hurts day-to-day:",[11,419,420],{},"The overlay network under high load has known edge cases, mainly on cloud-provider networks with non-standard MTU. Recovery after split-brain in some scenarios needs manual intervention — the cluster doesn't recompose itself in 100% of cases. And anything involving detailed observability (persisted metrics, structured logs, distributed tracing) you assemble separately, maintaining two or three more products.",[30,422,424],{"id":423},"stage-4-cluster-with-replicated-control-plane","Stage 4: Cluster with replicated control plane",[11,426,427],{},"The step where \"production\" starts to mean the same thing it means at mature platform companies. You're no longer running a legacy orchestrator in maintenance, nor depending on a single server for continuity.",[11,429,430],{},[105,431,293],{},[41,433,434,437,440,443,446],{},[44,435,436],{},"3 to 5 servers in the minimum configuration, with the control plane replicated across the first three. The rest join as agents.",[44,438,439],{},"Automatic leader election. If the current one falls, in a few seconds another takes over and the cluster keeps accepting deploys and serving traffic.",[44,441,442],{},"Integrated router, automatic certificates, health check before promoting a new version, rolling deploy with configurable windows.",[44,444,445],{},"Metrics and logs as internal services of the cluster itself — you don't bolt on five products to get basic observability.",[44,447,448],{},"Job submission via CLI, API, or web panel. 
The cluster decides which server each replica runs on.",[11,450,451],{},[105,452,141],{},[11,454,455],{},"SaaS with a 99.9% contractual SLA, multi-tenant with formal isolation requirements, platform team of 1 to 3 people, B2B contracts where uptime is part of the SOW, any company where \"being out twenty minutes\" generates a refund.",[11,457,458],{},[105,459,460],{},"Risks that change in nature:",[11,462,463],{},"Complexity doesn't go away — it changes shape. Instead of you operating three VPS by hand, you operate a cluster that solves most problems on its own but has more pieces. The learning curve for someone who has never operated a cluster before exists and is real. And it is genuinely overkill if you'll never go beyond stage 2: three servers running a hobby app is waste.",[11,465,466],{},[105,467,468],{},"Concrete numbers to calibrate:",[11,470,471],{},"A small, well-configured cluster runs comfortably on 4 servers totaling 5 vCPUs and 10 GB of RAM, with the control plane occupying between 200 and 400 MB per server. Leader election, when the current one falls, takes about 7 seconds until the cluster is back to accepting deploys. By comparison, the equivalent configuration on Kubernetes starts at hundreds of lines of manifest for a \"hello world\" app — and HeroCtl solves the same thing in about 50.",[11,473,474],{},[105,475,175],{},[11,477,478],{},"3 to 500 servers. Above that, the managed Kubernetes ecosystem has tools that a small cluster doesn't need: multi-region federation, advanced scheduler for heterogeneous workloads, deep library of specialized operators for stateful databases. 
It's not that small clusters don't scale — it's that above 500 nodes you're in a market where other tools have a five-year head start.",[30,480,482],{"id":481},"the-four-stages-side-by-side","The four stages side by side",[484,485,486,508],"table",{},[487,488,489],"thead",{},[490,491,492,496,499,502,505],"tr",{},[493,494,495],"th",{},"Criterion",[493,497,498],{},"Stage 1 (compose 1 VPS)",[493,500,501],{},"Stage 2 (compose + auto-update)",[493,503,504],{},"Stage 3 (Docker Swarm)",[493,506,507],{},"Stage 4 (replicated cluster)",[509,510,511,529,546,562,578,595,609,625,638,654],"tbody",{},[490,512,513,517,520,523,526],{},[514,515,516],"td",{},"Minimum monthly cost (BR, 2026)",[514,518,519],{},"R$30",[514,521,522],{},"R$50",[514,524,525],{},"R$150 (3 VPS)",[514,527,528],{},"R$200 (4 VPS)",[490,530,531,534,537,540,543],{},[514,532,533],{},"Operational complexity",[514,535,536],{},"Minimal",[514,538,539],{},"Low",[514,541,542],{},"Medium",[514,544,545],{},"Medium-high",[490,547,548,551,554,557,560],{},[514,549,550],{},"Time to first deploy",[514,552,553],{},"15 minutes",[514,555,556],{},"1 hour",[514,558,559],{},"1 day",[514,561,559],{},[490,563,564,567,570,572,575],{},[514,565,566],{},"Real high availability",[514,568,569],{},"No",[514,571,569],{},[514,573,574],{},"Yes, with caveats",[514,576,577],{},"Yes",[490,579,580,583,586,589,592],{},[514,581,582],{},"Realistic max scale",[514,584,585],{},"1–3 apps",[514,587,588],{},"5–10 apps",[514,590,591],{},"30 servers",[514,593,594],{},"500 servers",[490,596,597,600,602,605,607],{},[514,598,599],{},"Deploys without downtime",[514,601,569],{},[514,603,604],{},"Almost",[514,606,577],{},[514,608,577],{},[490,610,611,614,617,620,623],{},[514,612,613],{},"Automatic TLS",[514,615,616],{},"Manual or plugin",[514,618,619],{},"Yes, built-in",[514,621,622],{},"Yes, via router",[514,624,619],{},[490,626,627,630,632,634,636],{},[514,628,629],{},"Observability in the 
box",[514,631,569],{},[514,633,569],{},[514,635,569],{},[514,637,577],{},[490,639,640,643,646,648,651],{},[514,641,642],{},"Minimum team to operate",[514,644,645],{},"1 dev (partial)",[514,647,645],{},[514,649,650],{},"1 dev (dedicated)",[514,652,653],{},"1 dev (partial) or 2 (partial)",[490,655,656,659,662,665,668],{},[514,657,658],{},"Ideal application range",[514,660,661],{},"Hobby, MVP",[514,663,664],{},"Indie hacker, first customer",[514,666,667],{},"Early-stage B2B SaaS",[514,669,670],{},"SaaS with SLA, multi-tenant",[11,672,673],{},"The column that usually surprises is \"minimum team to operate\". Stage 4 with the right tool doesn't require more people than stage 3 — it requires people who think differently. The cognitive jump is bigger than the operational jump.",[30,675,677],{"id":676},"the-signs-its-time-to-step-up","The signs it's time to step up",[11,679,680],{},"Stepping up before you need to is waste; staying below what's needed is pain. The practical signs of each transition:",[11,682,683,686],{},[105,684,685],{},"Stage 1 → Stage 2."," You discovered you need to run more than one application on the same VPS, manual deploys started getting tense (fear of taking down production at nine p.m. on a Friday), the first paying customer showed up and they have expectations — even if not written down — that you won't disappear for thirty minutes mid-business-day.",[11,688,689,692],{},[105,690,691],{},"Stage 2 → Stage 3."," A customer asked for an SLA for the first time, even informally (\"how long max can this be down?\"). Or the single VPS went down once and you learned the hard way you needed redundancy. Or the team grew to three or more people and you don't want to be the only one who knows how to deploy. 
Or you're paying R$300 per month on a giant VPS when three medium VPSes would do the same job with failover.",[11,694,695,698],{},[105,696,697],{},"Stage 3 → Stage 4."," B2B contract requires measurable, auditable uptime (words like \"99.9%\" and \"maintenance window\" started showing up in commercial proposals). Compliance asked for a detailed audit and you need to show a trail of who did what. Or — the most common signal today — you're tired of Swarm patches and want a tool with a clear roadmap for the next five years.",[11,700,701,704],{},[105,702,703],{},"The universal \"stepped up too early\" signal."," You're spending more time configuring infrastructure than writing product features. Step back one. Seriously. Infra exists to support the product, not the other way around, and most startups that die early die because they built platform without a customer instead of customer without a platform.",[30,706,708],{"id":707},"the-trajectory-that-doesnt-work","The trajectory that doesn't work",[11,710,711],{},"Three common traps teams fall into trying to accelerate the jump:",[11,713,714,717,718,720],{},[105,715,716],{},"Jumping from compose straight to Kubernetes."," The temptation is genuine: \"if I'm going to migrate once, better migrate to the market-leading tool and never again\". Reality is harsher. Six months in you're still fighting 300-line manifests, specialized operators, operators of operators, and spending half your engineering time on problems that didn't even exist when you ran ",[15,719,21],{},". Meanwhile, the simpler competitor shipped twelve features. K8s is worth it at a very specific moment — when you already know you're going to scale to 50+ servers, you have a team to operate it, and the problems it solves are problems you actually have. Before that, it's burned capital.",[11,722,723,726],{},[105,724,725],{},"Staying on compose out of pride."," The other extreme. 
\"Complicated DevOps is overkill, I don't need it, I've always run everything on a VPS and never had a problem\". Reality arrives the first Friday at five p.m. when the VPS disk dies, last month's backup is three weeks old, and you discover simultaneously that (a) you needed HA and (b) you needed a tested recovery procedure. Both lessons in a single weekend are expensive.",[11,728,729,732],{},[105,730,731],{},"Buying the stack because it's hype."," Service mesh, complete observability stack with five products, GitOps with two repositories and three pipelines, autoscaler with sophisticated policies — for a three-container app serving 200 active users. You're building platform without users for the platform. The same energy invested in product features would have generated ten times the return. If you're at an earlier stage, it doesn't matter how pretty the next stage's tool is.",[30,734,736],{"id":735},"technical-details-that-hold-at-any-stage","Technical details that hold at any stage",[11,738,739],{},"Some decisions hold from stage 1 and keep holding at stage 4. Worth spending three paragraphs on them because each has already caused production pain for a lot of people.",[11,741,742,745,746,748,749,752,753,756],{},[105,743,744],{},"Restart policies."," In Compose, ",[15,747,121],{}," is the right path for someone who wants the container to come back on its own after any failure. ",[15,750,751],{},"on-failure"," is more economical but will bite you when the process exits 0 by mistake. In Swarm, ",[15,754,755],{},"restart_policy.condition: any"," plays a similar role. At any stage, thinking about restart policy is part of thinking about application design — it's not a detail.",[11,758,759,762,763,766],{},[105,760,761],{},"Health checks."," Every application that accepts HTTP needs to expose a ",[15,764,765],{},"\u002Fhealthz"," endpoint returning 200 when healthy. 
Without it, no orchestrator above stage 1 can distinguish \"container started\" from \"container started and is actually serving traffic\". Reasonable timeout: 5 seconds. Reasonable retry: 3 times before marking unhealthy. Skip this and you're going to enter restart loops and take hours to understand what's happening.",[11,768,769,772,773,776,777,780],{},[105,770,771],{},"Named volumes versus bind mounts."," Named volumes (",[15,774,775],{},"volumes: [meudata:\u002Fvar\u002Flib\u002Fpostgresql\u002Fdata]",") survive container recreation, are managed by Docker, and work consistently across stages. Bind mounts (",[15,778,779],{},".\u002Fdata:\u002Fvar\u002Flib\u002Fpostgresql\u002Fdata",") depend on the host filesystem, behave strangely with SELinux and AppArmor, and break when the container changes machines (stages 3 and 4). Use bind mounts only for development and for read-only configuration files.",[11,782,783,786],{},[105,784,785],{},"Logs."," Stdout and stderr are the right path, always. An application that writes logs to a file inside the container is an application that will give you a headache. The orchestrator captures stdout, routes it where it needs to go (syslog driver, external aggregator, internal service), and you never need to exec inside the container to see what happened.",[11,788,789,792,793,796],{},[105,790,791],{},"Secrets."," Environment variables in a ",[15,794,795],{},".env"," file are comfortable and dangerous — they leak in logs, in backups, in snapshots. For stages 1 and 2, you can live with them if you're careful. For stage 3 and beyond, use the orchestrator's native secrets mechanism. 
In newer tools (HeroCtl included), the vault is part of the cluster — you don't bolt on a separate product just to store a password.",[30,798,800],{"id":799},"concrete-cost-per-stage-brazil-2026","Concrete cost per stage (Brazil, 2026)",[11,802,803],{},"The raw math, no flourishes:",[41,805,806,812,818,824],{},[44,807,808,811],{},[105,809,810],{},"Stage 1."," 1 VPS at R$30 per month = R$360 per year. Initial setup time: an afternoon. Continuous operation time: about 1 hour per month.",[44,813,814,817],{},[105,815,816],{},"Stage 2."," 1 VPS at R$50 per month = R$600 per year. Setup time: a day. Operation time: about 2 hours per month, mostly dealing with Watchtower updating something it shouldn't have.",[44,819,820,823],{},[105,821,822],{},"Stage 3."," 3 VPS at R$50 = R$150 per month = R$1,800 per year. Setup time: about 3 days until comfortable. Operation time: 4 to 8 hours per month, depending on how many jobs run.",[44,825,826,829],{},[105,827,828],{},"Stage 4 (HeroCtl Community)."," 4 VPS at R$50 = R$200 per month = R$2,400 per year. Setup time: 1 to 2 days until comfortable. Operation time: comparable to stage 3, but without the manual patches and with observability in the box.",[11,831,832,833,836],{},"And to calibrate the comparison many people make too early: ",[105,834,835],{},"managed Kubernetes for the same scale"," costs between R$700 and R$1,500 per month for control plane and load balancers, so R$8.4k to R$18k per year just on infrastructure — not counting the 1 to 2 SREs (R$25k to R$35k per month each) that this stage starts to require. 
The difference between stage 4 and managed Kubernetes in total cost is usually a full order of magnitude.",[30,838,840],{"id":839},"questions-we-get","Questions we get",[11,842,843,849,850,852],{},[105,844,845,846,848],{},"Isn't compose with ",[15,847,121],{}," enough?","\nIt's enough until the first thing ",[15,851,121],{}," doesn't cover: the entire VPS unavailable, the disk corrupted, the provider's network failure, or a deploy that ships a broken image and enters a loop with nobody to notice. For a hobby project, enough; for a paying customer, it's the starting point, not the finish line.",[11,854,855,858],{},[105,856,857],{},"Is Docker Swarm really deprecated?","\nNot in the official sense — Docker Inc. hasn't announced discontinuation. But it's in maintenance, with no relevant new features since 2019, and the plugin ecosystem stopped growing. It works in production today. A defensible choice for someone already on it. A questionable choice for someone adopting now in 2026.",[11,860,861,864],{},[105,862,863],{},"When is it worth stepping up to managed Kubernetes?","\nWhen you have more than 50 servers, a platform team with 3+ dedicated people, and specific problems the K8s ecosystem solves better (multi-region federation, sophisticated autoscaling, deep library of stateful operators). Before that, you're paying the cost without using the benefit.",[11,866,867,870],{},[105,868,869],{},"Is Watchtower safe?","\nReasonably, with caveats. It pulls any new image published on the tag you're pointing at, without distinguishing between \"an update you published\" and \"a compromised image someone pushed via supply chain\". For stage 2, the trade-off is worth it: the operational gain outweighs the risk. 
For larger stages, prefer mechanisms that validate the image before promoting.",[11,872,873,876,877,880,881,883,884,887,888,891],{},[105,874,875],{},"How do I back up a Docker volume at stage 1?","\nA daily cron running ",[15,878,879],{},"docker exec"," on the database container with the native dump utility (",[15,882,135],{},", ",[15,885,886],{},"mysqldump",", etc.), piping to ",[15,889,890],{},"gzip",", and uploading to object storage outside the same provider. The golden rule: the backup must live far from the primary data. If the datacenter goes down, the backup needs to be in another datacenter.",[11,893,894,897],{},[105,895,896],{},"Can I jump from stage 1 straight to 4?","\nTechnically yes, especially with tools that make the jump smooth (HeroCtl is one of them, installs in minutes and runs comfortably on 3 servers). Recommended only if you already know you'll need stage 4 within the next six months. Otherwise, stage 2 teaches you things (automated deploy, TLS, image registries) you'll use later anyway.",[11,899,900,903,904,907],{},[105,901,902],{},"What if I don't know Linux deeply?","\nStages 1 and 2 are very accessible with basic knowledge. Stage 3 starts to require network understanding (overlay networks, MTU, occasional iptables). Stage 4, with the right tool, abstracts most of the complexity — but incident debugging still requires reading systemd logs, understanding what ",[15,905,906],{},"dmesg"," is saying, and diagnosing a full disk. There's no magic that replaces fundamentals when something goes wrong at three a.m.",[30,909,911],{"id":910},"closing","Closing",[11,913,914],{},"Maturity isn't a moral virtue. It isn't \"better\" to be at stage 4 than at stage 1 — it's better to be at the stage that matches the size of the problem you're solving. 
A hobby project at stage 4 is wasted capital and attention; a SaaS with fifty paying customers at stage 1 is operational negligence.",[11,916,917],{},"HeroCtl exists to make stage 4 accessible to people who used to have to choose between the discomfort of Swarm and the cost of Kubernetes. If you feel you've moved past stage 2 and are weighing options:",[185,919,921],{"className":187,"code":920,"language":189,"meta":190,"style":190},"curl -sSL https:\u002F\u002Fget.heroctl.com\u002Finstall.sh | sh\n",[15,922,923],{"__ignoreMap":190},[194,924,925,928,931,934,937],{"class":196,"line":197},[194,926,927],{"class":207},"curl",[194,929,930],{"class":221}," -sSL",[194,932,933],{"class":211}," https:\u002F\u002Fget.heroctl.com\u002Finstall.sh",[194,935,936],{"class":230}," |",[194,938,939],{"class":207}," sh\n",[11,941,942],{},"Installs in minutes, runs comfortably on 3 to 5 servers, and has a permanently free Community plan — no artificial feature gate, no server limit, no contract lock-in. Business and Enterprise plans exist for companies with formal SSO, detailed audit, and SLA-backed support requirements, and prices are published without a mandatory \"talk to sales\".",[11,944,945,946,951,952,956],{},"For people comparing tools in the same niche, two complementary posts: ",[947,948,950],"a",{"href":949},"\u002Fen\u002Fblog\u002Fheroctl-vs-coolify","HeroCtl vs Coolify"," covers the trade-off of adopting a tool with real HA versus an elegant single-server panel; ",[947,953,955],{"href":954},"\u002Fen\u002Fblog\u002Fheroctl-vs-dokploy","HeroCtl vs Dokploy"," covers the difference between adopting a cluster with a replicated control plane versus a panel that internally runs Swarm.",[11,958,959],{},"And if the question is \"which stage matches me right now?\", the honest answer almost always is: the one before the one you think you need. 
Step back one, stay until it hurts, step up when it really hurts.",[961,962,963],"style",{},"html pre.shiki code .sH3jZ, html code.shiki .sH3jZ{--shiki-default:#8B949E}html pre.shiki code .sQhOw, html code.shiki .sQhOw{--shiki-default:#FFA657}html pre.shiki code .s9uIt, html code.shiki .s9uIt{--shiki-default:#A5D6FF}html pre.shiki code .sFSAA, html code.shiki .sFSAA{--shiki-default:#79C0FF}html pre.shiki code .suJrU, html code.shiki .suJrU{--shiki-default:#FF7B72}html pre.shiki code .sZEs4, html code.shiki .sZEs4{--shiki-default:#E6EDF3}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}",{"title":190,"searchDepth":204,"depth":204,"links":965},[966,967,968,969,970,971,972,973,974,975,976,977],{"id":32,"depth":204,"text":33},{"id":90,"depth":204,"text":91},{"id":284,"depth":204,"text":285},{"id":347,"depth":204,"text":348},{"id":423,"depth":204,"text":424},{"id":481,"depth":204,"text":482},{"id":676,"depth":204,"text":677},{"id":707,"depth":204,"text":708},{"id":735,"depth":204,"text":736},{"id":799,"depth":204,"text":800},{"id":839,"depth":204,"text":840},{"id":910,"depth":204,"text":911},"engineering",null,"2026-04-21","Docker Compose solves dev. In production, even a single server with no SLA can do. Beyond that, you need a real cluster. 
An honest trajectory through the four maturity stages.",false,"md",{},true,"\u002Fen\u002Fblog\u002Fdocker-deploy-production-compose-to-cluster","15 min",{"title":5,"description":981},{"loc":986},"en\u002Fblog\u002Fdocker-deploy-production-compose-to-cluster",[208,992,993,994,995,978],"deploy","production","cluster","ha","mfqfPpSR1uQ_80pgl39j1VgOwGutdVoxL1ymY7vjuyw",[998,1004],{"title":999,"path":1000,"stem":1001,"description":1002,"date":1003,"category":978,"children":-1},"Database backup in a cluster: strategies that survive 3 a.m.","\u002Fen\u002Fblog\u002Fdatabase-backup-strategies-cluster","en\u002Fblog\u002Fdatabase-backup-strategies-cluster","A backup that has never been restored is placebo. Five strategies with real recovery time (RTO) and honest acceptable data loss (RPO), for each Brazilian SaaS stage.","2025-12-11",{"title":1005,"path":1006,"stem":1007,"description":1008,"date":1009,"category":1010,"children":-1},"GitHub Actions vs GitLab CI vs Drone: which CI\u002FCD to pick for a Brazilian startup","\u002Fen\u002Fblog\u002Fgithub-actions-vs-gitlab-ci-vs-drone","en\u002Fblog\u002Fgithub-actions-vs-gitlab-ci-vs-drone","GitHub Actions won mindshare but has minute costs. GitLab CI is more complete but heavier. Drone (and Woodpecker) self-hosted runs on a small VPS. Practical comparison.","2026-05-15","comparison",1777362214069]