Why we built HeroCtl
Kubernetes demands an SRE team. Simple panels lack real high availability. The closest technical competitor changed its license and was acquired. An alternative was missing — so we built one.
Every team running containers in production today has to choose between three paths, and none of them is good enough for a five-person team trying to ship a SaaS.
The painful path: Kubernetes
You open a "hello world" manifest and it has 300 lines. You add a templating manager to organize it — now it's 300 lines plus 200 lines of templates. You decide to use a managed cloud version to avoid maintaining the control plane — you pay US$73/month per cluster, plus NAT, plus Application Load Balancer. Need automatic TLS? Install a specialized operator. Metrics? Another operator. Routing between services with encryption? Two more operators and two days studying service mesh. Centralized logs? Yet another stack.
The complexity isn't accidental. The system is a platform for building platforms — built by a team that needed to orchestrate 100,000 machines. When a startup with four servers adopts the same tool, it's using an excavator to plant a sapling. We covered the general thesis in Kubernetes is overkill: when you don't need it.
"Most development teams find Kubernetes overkill for dev environments." — Top 13 Kubernetes Alternatives 2026
The real cost isn't infra, it's the team. Engineers who can seriously operate this system command six-figure salaries. You need at least one on staff, preferably two for on-call. They're your first hire after the CTO. Before the designer, before the second product dev, before anything that delivers value to the user.
The easy path: modern self-hosted panels
A single install command on one server, open the panel, deploy in five minutes. It works. The two leaders of this segment have 80,000 stars in public repositories between them. The community exploded over the last two years because these panels solve the right problem: most teams don't need the colossus, they need a self-hosted Heroku.
The problem only shows up later. You grow, a customer asks about SLA, the single server becomes a single point of failure. You try to replicate it across two or three servers — these panels have no distributed consensus, no leader election. They're web applications on top of Docker. Elegant for one server; fragile for three.
When your first serious customer asks "what's the SLA?", you'll have to answer "best-effort" or start migrating — probably to the colossus. Starting from scratch in the company's second year.
The technical path that existed
There's an orchestrator that's technically what you want. Single binary, real distributed consensus, multi-tenant, scales to thousands of nodes. The vendor spent eight years polishing it, and those who ran it in production have few complaints about the core.
But in August 2023 the vendor changed the license from a genuinely open-source one to a "source available" license that restricts commercial use. In February 2025, the company was acquired by a conglomerate historically known for five-year contracts and platform lock-in. Today that orchestrator is part of the conglomerate's portfolio, and the license prevents you from offering the technology as a service or embedding it in a product without commercial licensing.
For companies that already had it in production, it's a manageable problem. For you, adopting today in 2026, it's a big asterisk: the next critical feature might ship only in the paid version, or the license might change again in the next reorganization.
The lesson we drew from this isn't "open source or nothing" — it's "publish the commercial contract from day one, no retroactive change". Honest commercial software is better than open software that turns commercial halfway through. The technical orchestrator's problem wasn't going paid; it was changing the rules for those who had already bet on it.
The gap
None of the three paths combines:
- Single binary (operationally simple)
- Real high availability (consensus across multiple servers, leader election, durability)
- Heroku-like experience (no extensive orchestration files, web panel, automatic certificates)
- Explicit commercial contract from day one (permanent free plan, published paid plans — no retroactive change of terms)
- Batteries included (routing, service mesh, metrics — without assembling five products)
Modern panels have the experience and the free contract, but lose on high availability. The technical orchestrator has HA, but it changed the contract on early adopters and never prioritized experience. The colossus has all of this only if you assemble it manually, and "manually" costs a team.
Side by side, no frills
The table below is the honest version of the decision. There's no column without a caveat — every orchestrator is a set of tradeoffs, and ours is too.
| Criterion | Colossus (K8s) | Self-hosted panel | Ex-OSS orchestrator | HeroCtl |
|---|---|---|---|---|
| Install time | 4 hours to 4 days | 5 minutes | 1 hour | 5 minutes |
| Config lines for app+TLS+ingress | 300+ | 30 (UI) | 80–120 | ~50 |
| Real high availability | Yes, with 5+ components | No (single-server) | Yes | Yes |
| Router + automatic TLS | External operator | Built-in | External operator | Built-in |
| Encryption between services | Specialized operator | No | Specialized operator | Built-in |
| Persistent metrics | External stack (3+ products) | Plugin | External stack | Internal job |
| Centralized logs | External stack (2+ products) | Plugin | External stack | Built-in single writer |
| Commercial model | Free + high operational cost | Free (single-server) | Restricted commercial (was free until 2023) | Permanent free plan + paid Business/Enterprise |
| Minimum team to operate | 1–2 dedicated SREs | 1 part-time dev | 1 dedicated SRE | 1 part-time dev |
| Ideal fleet size | 50+ machines | 1 machine | 5–500 machines | 1–500 machines |
The row that matters is the second-to-last: minimum team to operate. That's where the real cost lives. The other criteria explain why.
What we're building
HeroCtl is a single executable file that you install on N Linux servers with Docker. The first three become the quorum for the replicated control plane. You submit jobs via CLI, API, or built-in web panel — the cluster decides where to run, performs health checks, manages rolling deployments, automatically issues Let's Encrypt certificates via the integrated router.
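For intuition, the quorum of three is plain majority arithmetic: a replicated control plane commits a change only when a strict majority of voters acknowledges it. A minimal sketch of that arithmetic (illustrative only; it assumes a Raft-style majority quorum, not HeroCtl's actual code):

```go
package main

import "fmt"

// quorum is the number of voters that must agree before the
// replicated control plane can commit a change: a strict majority.
func quorum(voters int) int { return voters/2 + 1 }

// tolerableFailures is how many voters can be lost while the
// cluster keeps accepting writes.
func tolerableFailures(voters int) int { return voters - quorum(voters) }

func main() {
	for _, n := range []int{1, 3, 5} {
		fmt.Printf("%d voters: quorum=%d, survives %d failure(s)\n",
			n, quorum(n), tolerableFailures(n))
	}
	// 1 voter:  quorum=1, survives 0 failures
	// 3 voters: quorum=2, survives 1 failure
	// 5 voters: quorum=3, survives 2 failures
}
```

Three is the smallest voter count that survives a failure, which is why the first three servers, not the first one or two, form the control plane.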
No CRDs, no specialized operators, no charts. The job spec is a simple configuration file (50 lines for app+ingress+secrets, not 300). Encryption between services and automatic PKI come built-in. Persistent metrics run as a job of the system itself. Logs flow through a single-writer architecture (no assembling Fluentd, no assembling Loki).
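To make "50 lines, not 300" concrete, here is roughly the shape such a spec covers in one document: the app, its public routing, and its secrets. The field names below are invented for illustration; they are not HeroCtl's actual schema:

```go
package main

import "fmt"

// JobSpec models the hypothetical shape of a compact job spec:
// one document for the app, its ingress, and its secrets.
// All field names here are illustrative, not HeroCtl's real schema.
type JobSpec struct {
	Name     string            // service name, e.g. "api"
	Image    string            // container image and tag
	Replicas int               // desired instance count
	Env      map[string]string // plain configuration
	Secrets  []string          // names of secrets stored encrypted in the cluster
	Ingress  *Ingress          // optional public routing
}

type Ingress struct {
	Host string // public hostname, e.g. "api.example.com"
	TLS  bool   // true = issue a Let's Encrypt certificate automatically
}

func main() {
	spec := JobSpec{
		Name:     "api",
		Image:    "registry.example.com/api:v1",
		Replicas: 3,
		Env:      map[string]string{"LOG_LEVEL": "info"},
		Secrets:  []string{"DATABASE_URL"},
		Ingress:  &Ingress{Host: "api.example.com", TLS: true},
	}
	fmt.Printf("%s -> %s (%d replicas, tls=%v)\n",
		spec.Name, spec.Ingress.Host, spec.Replicas, spec.Ingress.TLS)
}
```

The point isn't the syntax; it's that everything a 300-line manifest scatters across Deployment, Service, Ingress, and Secret objects fits in one small document.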
Today the public stack runs in production: four nodes on a cloud provider, five sites with automatic TLS, sixteen containers, zero downtime on rolling deployments. The cluster survived a complete chaos battery: kill -9 on the coordinating server (re-election in seven seconds), a 30-second network partition, quorum loss, a disk wipe, a forced drain. Each of those scenarios will get its own post.
The practical result is a radically shorter operational model. Bringing up a new application is three steps: you describe the service in a fifty-line config file, submit via CLI, and the cluster decides where to run, opens a port, registers with the router, issues a Let's Encrypt certificate, and starts serving traffic. Updating is a fourth step: change the image version in the file, submit again, and the cluster orchestrates the rolling replacement — no maintenance window, no feature flag, no manual traffic migration.
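For readers who want the mechanics, the rolling replacement described above reduces to a start-new, verify-healthy, stop-old loop that never drops below the desired replica count. A toy sketch of that general pattern (our illustration, not HeroCtl's actual scheduler):

```go
package main

import "fmt"

// instance stands in for one running container.
type instance struct{ id, image string }

// startHealthy is a stand-in for the real runtime call: in a real
// orchestrator it would start the container and block until its
// health check passes (or fail after a timeout).
func startHealthy(image string, n int) (instance, error) {
	return instance{id: fmt.Sprintf("%s-%d", image, n), image: image}, nil
}

func stop(inst instance) { fmt.Println("stopped", inst.id) }

// rollingUpdate replaces instances one at a time. An old instance is
// stopped only after its replacement is healthy, so capacity never
// dips and there's no maintenance window.
func rollingUpdate(old []instance, newImage string) ([]instance, error) {
	updated := make([]instance, 0, len(old))
	for i, prev := range old {
		next, err := startHealthy(newImage, i)
		if err != nil {
			// Halt the rollout; the remaining old instances keep serving.
			return append(updated, old[i:]...), fmt.Errorf("rollout halted: %w", err)
		}
		stop(prev)
		updated = append(updated, next)
	}
	return updated, nil
}

func main() {
	fleet := []instance{{"api:v1-0", "api:v1"}, {"api:v1-1", "api:v1"}}
	fleet, _ = rollingUpdate(fleet, "api:v2")
	fmt.Println("now running:", fleet)
}
```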
Debugging is the real test of any orchestrator. When something goes wrong at three in the morning, you need a short path between "the site is down" and "I know exactly what happened". In HeroCtl that path is a single pane: the panel shows which container failed, which server it was running on, its last log lines before dying, metrics from the last few minutes, and the version history. No grepping through three different products, no reconstituting context from five dashboards, no switching between tools from different vendors just to understand a failure.
When HeroCtl isn't for you
Honesty is a new tool's defense mechanism: saying where it doesn't fit is what keeps the product focused. Here are four profiles where we recommend a different path.
You operate at the scale of tens of thousands of machines. Companies that run ten thousand nodes or more chose the colossus for a real reason: it was designed for that size. HeroCtl is honest about its ceiling: we've tested up to hundreds of nodes in the lab, validated several dozen in customer production, and the roadmap targets the "1 to 500 servers" range. Above that, the colossus ecosystem gives you tools we don't yet have, and building them just to serve 0.1% of cases isn't a priority.
You have compliance requirements that list tools by name. Some audit frameworks (FedRAMP, ITAR, certain government contracts) require the stack to run on specific pre-approved components, and HeroCtl is too young to be on those lists. If your compliance officer needs to point to an existing certification, today the right answer is the colossus or the ex-OSS orchestrator; it isn't HeroCtl yet.
You need a deep library of specialized operators. The colossus ecosystem has hundreds of off-the-shelf operators: Postgres with automatic replication, Kafka with balancing, Cassandra with bootstrap. If your architecture depends on four of these operators running in production from day one, HeroCtl doesn't replace them. Our proposal is different: you run your Postgres as a regular job and handle backup and replication yourself, rather than delegating to an operator that took three years to stabilize.
You want multi-cloud with workloads moving between providers in real time. HeroCtl runs on any Linux server with Docker, so technically you can mix providers. But for the primitives that move encrypted storage between regions, replicate databases to another provider with automatic failover, or orchestrate virtual networks between clouds, the colossus ecosystem is the better answer today. That's on our roadmap, not in the current version.
Questions we get
Is HeroCtl just another Docker wrapper? No. Docker wrappers don't do consensus between servers, don't elect a coordinator, don't survive node loss with automatic work redistribution. HeroCtl is a replicated control plane that coordinates agents on each server. Docker stays as the container runtime — an implementation choice, not the product's substance.
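To make that distinction concrete, here is the kind of decision a wrapper never takes: a node dies and its work has to move somewhere healthy. A toy sketch of that redistribution loop (assumed behavior for illustration, not HeroCtl's scheduler):

```go
package main

import "fmt"

type node struct {
	name    string
	healthy bool
	slots   int // free task capacity
}

type task struct{ name, node string }

// reschedule moves tasks off dead nodes onto healthy nodes with spare
// capacity. A plain Docker wrapper has no component running this loop;
// a replicated control plane triggers it on every node-down event.
func reschedule(tasks []task, nodes []node) {
	alive := map[string]*node{}
	for i := range nodes {
		if nodes[i].healthy {
			alive[nodes[i].name] = &nodes[i]
		}
	}
	for i, t := range tasks {
		if _, ok := alive[t.node]; ok {
			continue // node still healthy, leave the task where it is
		}
		for name, n := range alive {
			if n.slots > 0 {
				n.slots--
				tasks[i].node = name
				fmt.Printf("moved %s: %s -> %s\n", t.name, t.node, name)
				break
			}
		}
	}
}

func main() {
	nodes := []node{{"n1", false, 0}, {"n2", true, 2}, {"n3", true, 1}}
	tasks := []task{{"web", "n1"}, {"api", "n2"}}
	reschedule(tasks, nodes)
}
```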
What if the company behind HeroCtl goes under? Three layers of protection. First, the binary has no mandatory phone-home: once installed, your cluster keeps working without ever talking to our server. There's no remote kill-switch and no periodic activation that expires. Second, Enterprise contracts include source code escrow: if the company ceases operations, the code is delivered to paying customers via a third-party custodian, with a license for internal continuity. Third, the pricing contract is frozen for those signing today; there's no clause allowing retroactive change of terms. What happened to the technical orchestrator in 2023 and 2025 is structurally prevented here.
How much RAM and CPU does it consume on a small cluster? The public demo cluster runs on four servers totaling five vCPUs and ten gigabytes of RAM, with sixteen active containers serving five sites. The control plane occupies between 200 and 400 MB per server, leaving plenty of room for real workloads. For comparison, the control plane of a managed version of the colossus starts at about 700 MB per master node before any application comes up.
Can I migrate from the ex-OSS orchestrator to HeroCtl? Yes. The primitives are similar (job, group, task; cluster with replicated control plane; agents on each server). The big difference is in the configuration file — ours is shorter and has fewer abstractions. For teams with a few dozen jobs, migration is manual and takes an afternoon. Above that we have an experimental converter that covers the common cases. Write to us if that's your case.
How does payment work? Three plans with a clear line between them. Community is free forever, no server limit, no job limit, no artificial feature gates — runs the entire stack described above, including HA, router, automatic certificates, metrics, and logs. Individuals and small teams never need to leave it. Business adds SSO/SAML, granular RBAC, detailed auditing, managed backup, and SLA support — for teams with formal platform requirements. Enterprise adds source code escrow, continuity contract, 24×7 support, and dedicated development.
Business and Enterprise prices are published on the plans page; there's no mandatory "talk to sales". The cutoff line is drawn so you only pay once the company is large enough that SSO and auditing are real requirements, not preferences.
Is it production-ready? It's been running the public stack for six months, survived a documented battery of chaos scenarios, and serves the blog you're reading now. "Ready" depends on your risk appetite and the size of your team. For an indie hacker with three servers and a US$10k MRR SaaS, it's more than ready. For a bank regulated by three agencies, wait a few more quarters and talk to us about the Business Edition first.
Where does sensitive data (secrets, certificates, configuration) live? In the cluster itself, encrypted at rest. The cluster is the vault; there's no mandatory external vault service. If you want to integrate with an external cloud vault (a cloud provider's KMS), there's an extension point, but the default configuration is self-sufficient.
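For intuition on "the cluster is the vault", this is the standard encrypt-at-rest pattern a store like that can use: authenticated encryption (AES-256-GCM) with a fresh random nonce per secret. A sketch of the general technique under our own assumptions, not HeroCtl's actual implementation:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// seal encrypts a secret with AES-256-GCM before it touches disk.
// The random nonce is prepended to the ciphertext so open can recover
// it; GCM also authenticates, so any tampering is detected on read.
func seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal: split off the nonce, then decrypt and verify.
func open(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32) // in practice, from the cluster keyring or an external KMS
	rand.Read(key)
	sealed, _ := seal(key, []byte("DATABASE_URL=postgres://..."))
	plain, _ := open(key, sealed)
	fmt.Println(string(plain))
}
```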
What's coming in the next posts
The blog's intent is technical and direct: no marketing fluff.
- Engineering: how consensus is configured, how defense against zombie containers holding ports works, why we chose in-memory snapshot over persisted bitmap for port allocation
- Comparisons: HeroCtl vs Coolify, HeroCtl vs Nomad, HeroCtl vs Kamal, HeroCtl vs Dokploy — real numbers, not opinion
- Case studies: setup with 1 server (replacing a simple panel), 3 servers (real HA), 10+ servers (scale)
- Releases: narrative changelog of features that ship
If you're a developer feeling that the colossus is too much and the self-hosted panel is too little, stick around. If you operate the technical orchestrator and are uncertain about the post-acquisition future, write to us — there's a migration path.
The intent is simple: container orchestration, without ceremony.