Migrating from Kubernetes to a simpler stack: a real case of complexity reduction
When a company adopts K8s too early, everyone pays. The reverse path — leaving K8s for simpler orchestration — is viable and more common than it seems. What to validate before, during and after.
The public narrative of the past six years has pointed in one direction only: everyone migrates to Kubernetes. Conferences, sponsored posts, SRE job listings, vendor case studies — the vector is always the same. You came from a bare virtual machine and moved up to K8s. You came from Heroku and moved up to K8s. You came from Docker Compose and moved up to K8s. The arrow points one way, and whoever isn't on it must be doing something wrong.
The silent reality nobody presents at conferences is the inverse vector: hundreds of teams migrate off Kubernetes after discovering they paid dearly for complexity they didn't need. It doesn't make headlines, but it happens every month. A fifteen-dev company with a six-node EKS cluster realizes the platform team has become half the engineering budget. A startup that adopted K8s on day two discovers that, three years later, it still spends an entire Friday per month just bumping the Helm chart versions of its operators. A product team that should be shipping features is instead debugging an admission controller webhook.
This post is the playbook for that reverse migration, with the real pitfalls we've seen happen. It's not a pitch — it's an operational manual. If you read this and decide to stay on Kubernetes, great. The informed decision to stay is as valuable as the informed decision to leave.
The qualifying question: should you even consider this?
Before anything else, rule the case in or out. Reverse migration only makes sense for a specific profile — and most teams thinking about leaving K8s aren't in that profile. The rest are researching alternatives when they should be hiring one more SRE or simplifying their current K8s setup.
The profile where the migration makes sense has six simultaneous signals:
Signal 1: the company has run Kubernetes in production for a year or more. Migration isn't an experiment. If the team has been on K8s for three months and is already complaining, the problem is onboarding, not the platform. Let the learning cycle complete before declaring bankruptcy.
Signal 2: the platform team has between one and three people. Companies with five or more engineers dedicated to platform work have an operational scale that justifies K8s. Below that, the tool tends to consume the entire team in maintenance.
Signal 3: the cluster has fewer than fifty servers in production. Above that number, the K8s ecosystem gives you tooling — horizontal node scaling, cross-region balancing, multi-cluster federation — that other stacks don't. Below it, you're paying overhead for capacity you don't use.
Signal 4: the apps are typical. HTTP web, relational database, in-memory cache, async jobs, a queue or two. If the stack includes an exotic distributed-database operator, a service mesh with sophisticated L7 policies, or GPU scheduling for model training, reverse migration gets complicated.
Signal 5: 80% of platform time goes to maintenance, not new features. If the platform team spends most weeks updating Helm charts, debugging a cluster version upgrade, or fixing a webhook that failed, that's a clear symptom. The platform has become an internal product that consumes itself.
Signal 6: total platform payroll exceeds 5% of MRR. It's not an absolute rule, but it's a useful metric. When the team that operates the infra costs more than a twentieth of monthly recurring revenue, the infra is too expensive for the company's current scale. (Worked example: two platform salaries of US$12,000/month against US$400,000 of MRR is 6%, over the line.)
If your company checks all six boxes, it's worth reading the rest of the post. If it checks three or four, read on but decide cautiously. If it checks one or two, the problem is probably something else.
Who definitely should not migrate
The same honesty applies in reverse. There's a profile for which leaving K8s is the wrong decision, and whoever fits it should close this tab.
A strong platform team with mature processes. If you have five platform engineers who master K8s, a stable CI/CD pipeline, written runbooks, and configured observability — abandoning all that to start from scratch on a simpler stack is throwing real investment away. The destination's simplicity doesn't compensate for the reset.
A stack that depends on critical operators. A relational database with operator-managed automatic replication, a distributed queue with operator-managed balancing, a columnar database with automatic bootstrap. These operators deliver real value. Trading them for "a human takes care of this" is operational regression, not simplification.
Compliance that nominally requires Kubernetes. Some audit frameworks — FedRAMP at certain levels, certain government contracts, some sector-specific security seals — list pre-approved tools. If your compliance officer needs to point at an existing certificate, K8s is the answer. Migrating to a tool that isn't on the list creates friction that costs more than the savings.
Multi-cluster federation in production. If you run workloads that move between clusters in different regions, with state replication coordinated by a tool like ArgoCD or FluxCD in multi-cluster mode, the K8s ecosystem has primitives other stacks don't. Migrating away from that is a six-month project at minimum.
ML/AI workloads with complex GPU scheduling. Distributed training, GPU partitioning, scheduling that understands specific hardware affinity. K8s has mature operators and device plugins for this. A simpler stack doesn't.
If you fit any of these five profiles, the honest recommendation is to stay where you are and optimize your current K8s setup.
Pre-flight assessment: one to two days before committing
Reverse migration starts with an inventory. Before scheduling a "let's leave K8s" meeting, the team needs to measure what it runs today. Without numbers, the decision is vibes — and vibes don't survive the first unforeseen problem during cutover.
Manifest inventory. Run kubectl get all -A --output yaml > all.yaml and count (note that get all covers only the core workload objects — list ConfigMaps, Secrets, and Ingresses separately). How many files in the manifest repository? How many aggregated lines? How many namespaces? Our informal measurement across small teams: a company with ten typical apps usually has between 1,500 and 4,000 lines of YAML, spread across Deployments, Services, Ingresses, ConfigMaps, Secrets, HorizontalPodAutoscalers, and the odd NetworkPolicy. Every one of those lines is migration work.
Helm release inventory. helm list -A shows every installed chart. Each one is a decision. The database operator chart — does it become a regular job at the destination, with manual replication? The ingress chart — does it become integrated router config? The monitoring chart — does it become an external agent? The more charts, the more migration time.
Operator inventory. kubectl get crds lists every Custom Resource Definition. Each CRD is a critical dependency that probably has no direct equivalent at the destination. If the output shows CRDs from three or four of the usual sources (cert-manager, ingress-nginx, prometheus-operator, sealed-secrets), that's within the expected range for a small team. If it shows thirty, the migration isn't trivial — you built a platform on top of the platform.
RBAC and complex-policy inventory. NetworkPolicies declaring isolation between namespaces, configured PodSecurityPolicies or Pod Security Standards, fine-grained RoleBindings. All of it needs an equivalent at the destination, and the equivalent is rarely 1:1.
Traffic volume. Requests per second at peak, simultaneous database connections, aggregate outbound throughput. The destination needs to absorb all of it. If you've never measured, measure now — before committing to a migration schedule.
Service-to-Ingress mapping. Each exposed Service becomes an entry point at the destination. The list of domains, associated certificates, configured sticky sessions, path-based routing rules. Without this list, the migration breaks exactly at cutover time.
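The measurable part of this inventory fits in a short script. A minimal sketch, using standard kubectl and helm commands; adjust the manifest repository path to your own:

    #!/usr/bin/env bash
    # Pre-flight inventory: rough numbers before committing to a migration.
    set -euo pipefail

    echo "== Namespaces =="
    kubectl get namespaces --no-headers | wc -l

    echo "== Aggregated YAML lines of live core objects =="
    kubectl get all -A -o yaml | wc -l

    echo "== Objects that 'get all' misses =="
    kubectl get configmaps,secrets,ingress -A --no-headers | wc -l

    echo "== Helm releases (each one is a decision) =="
    helm list -A

    echo "== CRDs (each one is a critical dependency) =="
    kubectl get crds --no-headers | wc -l

    echo "== NetworkPolicies / RoleBindings =="
    kubectl get networkpolicy -A --no-headers | wc -l
    kubectl get rolebindings -A --no-headers | wc -l

    echo "== Ingress entry points (domains, paths, TLS) =="
    kubectl get ingress -A

    echo "== Manifest repository size (adjust the path) =="
    find ./manifests -name '*.y*ml' | wc -l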
This assessment takes one to two days for a competent dev. It's cheap. Skipping it is the biggest source of migrations that blow through their schedule.
Target stack decision
There are four main options today for a small team, each with an explicit trade-off.
Option A: Docker Swarm. Direct compatibility with the Compose format, simple multi-server setup, low learning curve. Good for a one-dev team that already knows Compose. Serious limitation: Swarm has been in maintenance mode for a long time, with no active development of new features. It runs and it works, but you're betting on a tool that doesn't evolve.
Option B: Nomad. Similar to K8s in its declarative model, but simpler and shipped as a single binary. Good for those who like a robust declarative model and want real high availability. Limitation: the license moved to a source-available model in 2023, and the company behind it was acquired in 2025. For new adoption today, it's a path with an asterisk.
Option C: HeroCtl. An independent orchestrator with a replicated control plane, a single binary, and short configuration. Good for those who want operational simplicity and real high availability from day one. Honest limitation: a smaller ecosystem than K8s, without a deep library of ready-made operators.
Option D: a self-hosted panel (Coolify, Dokploy, and similar). A web panel that orchestrates Docker on one machine or a small set of them. Good for a very small team without formal HA requirements. Limitation: the architecture doesn't support real distributed consensus across multiple servers — once you grow, it becomes a single point of failure.
The choice depends on the profile. A one-dev team with no SLA requirement = Option D. A small team with a real HA requirement = Option C. A team that prefers a robust declarative model and accepts the restricted license = Option B. A team already invested in Compose = Option A.
The five steps of the migration
From here on, the playbook assumes the target stack is HeroCtl, but the skeleton applies to any destination. Adjust the conceptual mappings to Swarm/Nomad/Coolify according to your choice.
Step 1 — Set up the destination in parallel (one week)
Hard rule: never migrate in place. The current K8s cluster keeps running untouched throughout the migration. The destination is provisioned in parallel, on new servers, with a temporary domain or a test subdomain.
Three to five new Linux servers. Install the target stack. Validate that the network between servers works, that storage persists across reboots, that certificates are issued automatically, and that secrets can be injected into apps. Connect the destination to the same image registry the current K8s cluster uses — that way the exact image running in production goes to the destination without a rebuild.
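A minimal sketch of that validation as a script; the hostnames, port, and registry below are placeholders for your own values:

    #!/usr/bin/env bash
    # Smoke-check the new servers before touching any real app.
    set -euo pipefail

    SERVERS="node1.internal node2.internal node3.internal"   # placeholders
    REGISTRY="registry.example.com"                          # same one K8s uses
    PORT=443                                                 # your orchestrator's port

    for host in $SERVERS; do
      # Inter-server network: the orchestrator's port must be reachable.
      nc -z -w 3 "$host" "$PORT" && echo "$host: reachable" || echo "$host: FAIL"
    done

    # Same registry as production: the image must pull without a rebuild.
    docker pull "$REGISTRY/myapp:latest"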
This step is deliberately light. The intent is to prove the destination works with a synthetic app before committing to migrating a real app.
Step 2 — Migrating manifests to the destination spec (one to two weeks)
Most of the effort lives here. Each K8s workload needs to be re-expressed in the destination format. The conceptual mapping from K8s to HeroCtl serves as the reference:
- Deployment + ReplicaSet → job with replicas: N. The concept is the same: N copies of the same workload, balanced across servers.
- Service ClusterIP → internal service. In HeroCtl there's nothing to create — any task gets a name resolvable inside the cluster by default.
- Service LoadBalancer or Ingress → integrated ingress. No external operator, no separate cert-manager, no ingress-nginx — everything embedded in the orchestrator.
- Pod → task. A 1:1 concept.
- PersistentVolume → named volume. May require a data copy, depending on the storage backend used in K8s.
- ConfigMap → env block or file in the spec. There's no separate object.
- Secret → the orchestrator's integrated secret. Encrypted at rest in the control plane.
- HorizontalPodAutoscaler → scaling policy in the job spec. Triggered by CPU usage, RAM, or a custom metric.
- DaemonSet → job with a "1 per node" placement restriction.
- CronJob → periodic-type job with a cron expression.
- Helm chart → custom spec. Doesn't convert 1:1 — rewrite it by hand.
In raw line counts, the reduction is dramatic. A typical web app in K8s takes 30 to 50 lines of Deployment, plus 20 of Service, plus 50 to 100 of Ingress + cert-manager + annotations. Total: 100 to 170 lines. The HeroCtl equivalent sits between 30 and 50 lines, aggregating everything into a single file.
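To make the shape concrete, here is a sketch of what that single file can look like. The field names are illustrative assumptions, not HeroCtl's documented schema; check the actual spec reference before copying anything:

    # Illustrative only: keys are assumptions, not the documented schema.
    # The point is the shape -- one file aggregating what K8s spreads
    # across Deployment, Service, Ingress, HPA, and Secret references.
    cat > webapp.spec.yaml <<'EOF'
    job: webapp
    image: registry.example.com/webapp:1.4.2
    replicas: 3
    env:
      DATABASE_URL: secret://webapp/database-url
    ingress:
      domain: app.example.com
      tls: auto            # automatic certificate, no cert-manager
    scaling:
      min: 3
      max: 10
      metric: cpu
      threshold: 70
    volumes:
      - name: uploads
        path: /data/uploads
    EOF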
Average migration time per app: one to three days for a competent dev. Ten apps in three weeks is a realistic pace. If it goes much slower than that, there's a hidden operator or some complexity the assessment missed — stop and re-measure.
Step 3 — Database and storage migration (one to three days)
Two strategies. If the database is managed (RDS, Cloud SQL, or equivalent), the apps at the destination just point at the existing connection string and that's it — the database stays where it is, platform-agnostic. If the database is self-hosted on K8s, it's a manual dump-and-restore: pg_dump from the old database, pg_restore into the new one, with a short maintenance window at cutover time.
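A minimal sketch of the self-hosted case, assuming Postgres; the connection strings are placeholders, and the window starts when you stop the writers:

    # Dump-and-restore inside a short maintenance window.
    set -euo pipefail
    OLD="postgres://app@old-db.cluster.internal:5432/app"   # placeholder
    NEW="postgres://app@new-db.internal:5432/app"           # placeholder

    # Custom format: compressed, and restorable with parallel jobs.
    pg_dump --format=custom --file=app.dump "$OLD"

    # Restore into the new database with four parallel workers.
    pg_restore --jobs=4 --dbname="$NEW" app.dump

    # Point the destination apps at NEW, smoke-test, end the window.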
Persistent volumes from K8s become named volumes at the destination. This may require a data copy via rsync or snapshot — depending on the storage backend, that's an additional window.
Secrets are extracted from K8s and re-inserted at the destination. Use a secure channel (kubectl get secret -o yaml is just a means of reading; never commit an intermediate file). In HeroCtl, secrets are submitted via the API over TLS and stay encrypted in the control plane.
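A sketch of the hand-off without any intermediate file. The reading side is standard kubectl; the endpoint and payload on the submission side are assumptions about the destination's API, not its documented contract:

    # Read one secret value straight out of K8s -- nothing touches disk.
    DB_PASS=$(kubectl get secret webapp-db \
      -o jsonpath='{.data.password}' | base64 -d)

    # Endpoint and payload shape are assumptions; adapt to the real API.
    curl -sS --fail https://control.internal/api/v1/secrets \
      -H "Authorization: Bearer $HEROCTL_TOKEN" \
      -H "Content-Type: application/json" \
      -d "{\"name\": \"webapp-db-password\", \"value\": \"$DB_PASS\"}"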
Step 4 — Cutover (one to three hours, usually overnight)
The critical step. Pre-checks before any DNS change: a smoke test on the destination — login works, the main page loads, the database is connected, latency is acceptable, the queue processes a job, metrics arrive at monitoring. If any of these fails, abort the cutover.
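A sketch of those pre-checks as a script that aborts on the first failure; the URL and the health endpoint are placeholders for whatever your app exposes:

    #!/usr/bin/env bash
    # Cutover pre-checks against the destination's temporary domain.
    set -euo pipefail
    BASE="https://staging.example.com"   # placeholder

    # Main page loads with a 2xx.
    curl -sS --fail -o /dev/null "$BASE/"

    # Health endpoint confirms the database connection (placeholder path).
    curl -sS --fail "$BASE/healthz"

    # Latency acceptable: total time under 500 ms here -- tune to your SLO.
    t=$(curl -sS -o /dev/null -w '%{time_total}' "$BASE/")
    awk -v t="$t" 'BEGIN { exit (t < 0.5 ? 0 : 1) }'

    echo "pre-checks passed -- proceed with the DNS change"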
DNS prepared: TTL reduced to 60 seconds, twenty-four hours before the window. Without that, propagation takes hours and rollback is painful.
The cutover proper: change the DNS record to point at the destination IPs. Monitor 5xx rates and latency in five-minute windows. If something breaks significantly in the first thirty minutes, switch DNS back to K8s — a complete rollback in sixty seconds of additional propagation.
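For the first thirty minutes, even a crude watch loop helps. A sketch, one probe every ten seconds against a placeholder domain:

    # 180 probes x 10 s = 30 minutes of 5xx watching after the flip.
    for i in $(seq 1 180); do
      code=$(curl -sS -o /dev/null -w '%{http_code}' https://app.example.com/)
      if [ "$code" -ge 500 ]; then
        echo "ALERT: got $code at $(date) -- consider rolling DNS back"
      fi
      sleep 10
    done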
Keep the K8s cluster running as a standby for thirty days. Don't shut it down. The extra cost is justified: if some latent bug surfaces in week three on the destination, you still have a place to go back to.
Step 5 — Decommissioning K8s (one to two hours, after thirty days)
Thirty days after cutover, with no significant incident, it's time to shut down. In the self-hosted case, tear the nodes down with whatever provisioned them (kubectl alone doesn't delete a cluster); in the managed case, aws eks delete-cluster or the equivalent in other clouds — note that EKS requires the node groups to be deleted first. Cancel managed add-ons separately — the bill has items that don't disappear with delete-cluster alone.
Request a prorated refund for the current month of the managed plan, if the provider offers one. Cancel the worker instances. Take a final backup of the cluster state before deleting, in case of a future audit.
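For the EKS case, a sketch of the teardown order: backup first, node groups second, cluster last. The AWS CLI calls are standard; the cluster name is a placeholder:

    CLUSTER="prod-cluster"   # placeholder

    # Final state backup for future audits. Secrets deliberately excluded
    # from this plaintext file -- export them through a secure channel.
    kubectl get all,configmaps,ingress -A -o yaml > final-state-backup.yaml

    # Node groups must go before the cluster, or delete-cluster fails.
    for ng in $(aws eks list-nodegroups --cluster-name "$CLUSTER" \
                  --query 'nodegroups[]' --output text); do
      aws eks delete-nodegroup --cluster-name "$CLUSTER" --nodegroup-name "$ng"
      aws eks wait nodegroup-deleted --cluster-name "$CLUSTER" --nodegroup-name "$ng"
    done

    aws eks delete-cluster --name "$CLUSTER"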
The six pitfalls of the path
The technical assessment covers what you can measure. The pitfalls below are what escapes the assessment and breaks the schedule. Each one has already blown a deadline for some real team.
Pitfall 1: hidden operator dependencies. You think you don't run a complex operator, but cert-manager + ingress-nginx + sealed-secrets is already a stack of three. And there are probably more — kube-state-metrics for monitoring, external-dns to update DNS automatically, Reloader to restart pods when a ConfigMap changes. Map everything. Each operator is migration work that a superficial assessment misses.
Pitfall 2: assuming a Helm chart is rewritable in a day. A simple chart with five templates is a few hours of rewriting. A complex chart with thirty templates, nested values, pre-install/post-install hooks, and subchart dependencies can take a week just to map to an equivalent spec. Calibrate the estimate against the most complex chart, not the simplest.
Pitfall 3: undocumented sticky sessions. ingress-nginx supports persistent sessions via annotations. If the app depends on them (shopping cart, admin session, persistent websocket) and nobody documented it, the migration breaks exactly at cutover, when a user starts bouncing between two backend servers and loses session state. Audit the ingress configuration up front — don't trust what the team remembers.
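The audit can start as a one-liner; the keywords below match the real ingress-nginx affinity annotations:

    # Surface every ingress that declares cookie-based session affinity.
    kubectl get ingress -A -o yaml | grep -n -i 'affinity\|session-cookie'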
Pitfall 4: different resource-limit semantics. K8s uses request/limit with precise meaning: the request is a guarantee, the limit is a ceiling. The destination may have a different declarative model (a hard limit, an aggregate quota per job, or soft-limit semantics). A tuning error here breaks autoscaling — the app ends up underprovisioned in production and doesn't scale when it should, or overprovisioned and wastes capacity. Re-measure real consumption after cutover and adjust limits in the first week.
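Real consumption today is the baseline for those limits. A quick read, assuming metrics-server is installed in the cluster:

    # Top consumers by memory and by CPU -- the numbers to carry over.
    kubectl top pods -A --sort-by=memory | head -20
    kubectl top pods -A --sort-by=cpu | head -20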
Pitfall 5: log format. Some K8s ingresses emit logs as JSON by default, and the downstream parser (Loki, Datadog, ELK) is configured for exactly that format. The destination may emit plain text or a different shape. Downstream parsing then breaks silently — alerts stop firing because the pattern no longer matches. Verify the destination router's log format before cutover.
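One way to check, assuming you can reach the destination router's log file (the path is a placeholder):

    # Feed one log line to jq: valid JSON exits 0, anything else doesn't.
    tail -n 1 /var/log/router/access.log | jq -e . >/dev/null \
      && echo "JSON -- downstream parser keeps working" \
      || echo "not JSON -- fix the log pipeline before cutover"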
Pitfall 6: a coupled CI/CD pipeline. GitOps with ArgoCD or FluxCD pointing at K8s needs to be reworked. If the pipeline applies declarative manifests with kubectl apply or helm upgrade, that doesn't work at the destination. Adapter scripts at the deploy stage are necessary — receive the manifest, translate it to the new spec, submit it via API. Estimate one to two weeks just for the CI/CD pipeline, separate from the manifest migration time.
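A sketch of what that deploy-stage adapter can look like once the specs are translated and versioned in git; the endpoint and headers are assumptions about the destination's API:

    #!/usr/bin/env bash
    # Deploy-stage adapter: stands in for `kubectl apply` in the pipeline.
    set -euo pipefail

    SPEC_FILE="$1"                                  # the new-format spec, from git
    API="https://control.internal/api/v1/jobs"      # placeholder endpoint

    # --fail turns any HTTP error into a failed pipeline step.
    curl -sS --fail -X POST "$API" \
      -H "Authorization: Bearer $DEPLOY_TOKEN" \
      -H "Content-Type: application/yaml" \
      --data-binary @"$SPEC_FILE"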
Realistic schedule
Honest expectation calibration, in three size ranges.
A team of one to two devs with five to ten apps: four to six weeks total. Breakdown: one week of destination setup, two to three weeks of manifest migration and adjustment, one to three days of cutover, thirty days of parallel operation, one day of decommissioning. Note: migration work steals focus from product development during this period. Consider a feature-freeze window.
A team of three to five devs with twenty to fifty apps: eight to twelve weeks. The multiplication isn't linear — additional apps mostly enlarge the cutover test matrix. It's worth dedicating one person full-time to the migration and keeping the rest of the team on product.
A company with one hundred or more apps: a four-to-six-month project with one to two dedicated people. At this size, the migration becomes a formal project with a project manager, biweekly milestones, and status reports. It's not a sprint.
Typical post-migration results
Ranges observed across teams that completed the migration. They're not guarantees — they're reference points.
- Total RAM reduction: 30% to 50%. Kubernetes overhead is real, and it disappears when you leave. A cluster that used 32 GB of aggregate RAM ends up somewhere between 16 and 22 GB for the same workload.
- Cloud cost reduction: 40% to 70%. It comes from three fronts: no managed control plane (US$73/month per cluster leaves the budget), no NAT gateway per subnet (some providers charge per GB), and smaller instances become possible (platform overhead exits consumption).
- Deploy time: similar or slightly better. This isn't where the gain is — K8s is reasonably fast at deploys when well configured.
- Learning time for a new dev: one week, versus four to six on K8s. The mental model is simpler — fewer intermediate abstractions between "I want to run this" and "it's running".
- Monthly operation time: one to three dev-hours of maintenance, versus twenty to forty on K8s. The biggest gain. This is where the ROI materializes.
To calibrate that last metric: our public demo cluster runs on four servers totaling five vCPUs and ten gigabytes of RAM, with the control plane occupying between 200 and 400 MB per server. Electing a new coordinator, if the current one fails, takes about seven seconds. A typical application spec in HeroCtl has about fifty lines — against three hundred or more lines of YAML in Kubernetes for the equivalent "hello world" with TLS and ingress.
The inevitable question: will you end up back on Kubernetes eventually?
Honest answer: it depends on scale.
A team that grows to thirty or more devs, with one hundred or more servers in production, multi-region, with cross-cluster federation requirements, eventually hits the ceiling of a simpler stack. At that scale, K8s becomes a rational choice — the ecosystem gives you tools other stacks don't have. Migrating back is a months-long project, not days, but it's a viable path.
For startups that stay under fifty servers across five years — the vast majority of them — going back rarely makes sense. The operational gain of the simpler stack holds for the product's entire useful life.
Note that the return migration (HeroCtl → K8s) is also a weeks-long project, not days. This isn't a one-way decision. If the company grows much faster than expected, the path back exists — more expensive than having stayed, but it exists. Migrating now doesn't lock you in forever.
Questions we receive
How long until ROI? For a one-to-two-dev team with a small cluster, the migration pays for itself in three to six months — the salary-equivalent of the recovered maintenance time exceeds the cost of the migration project. For larger teams, it depends on how much the platform team was consuming in maintenance; typically six to twelve months.
Can I keep Kubernetes for a specific workload and migrate the rest? Yes, and in some cases that's the correct strategy. The workload with a critical operator (distributed database, queue with managed balancing) stays on K8s. The rest goes to the simpler stack. The two clusters coexist with separate domains or path-based routing on an upstream router. It costs a bit more than consolidating, but avoids rewriting what still works well.
Are complex Helm charts worth rewriting? Case by case. A third-party operator chart with fifty files: probably not worth it — keep it on K8s or change the technology. Your own chart with twenty templates: worth it; it's a few days of rewriting and it eliminates the Helm dependency.
Does ArgoCD work with HeroCtl? Not directly — ArgoCD was made to apply K8s manifests. But the GitOps concept carries over: a pipeline watches the repository, translates the destination spec into an API payload, and submits it via authenticated curl, as in the adapter sketched under Pitfall 6. A native plugin is under consideration; for now it's a fifty-line adapter script.
Will the team that learned Kubernetes be resentful? A legitimate question. The K8s learning curve is a real investment, and nobody likes seeing an investment discarded. Have the direct conversation: the skill doesn't disappear. K8s remains a market standard at large scale, and a dev who has mastered it stays employable and valuable. The migration is a product decision for the current scale, not a verdict on individual knowledge.
Is being cloud-agnostic more or less viable afterward? More viable, in practice. The simpler stack runs on any Linux server with Docker — bare metal, a VPS from any provider, an instance from any cloud. Managed K8s ties you to the provider (EKS on AWS, GKE on Google, AKS on Azure) — each with its own flavor. Leaving expands your options.
Is there a public case of a company that did this migration? Several, but most don't present at conferences (the narrative vector is still K8s for everyone). On forums and in informal conversations, reports are easy to find. If you want to talk to someone who has done the migration, write to us — we'll make the introduction.
Closing
The decision to leave Kubernetes for a simpler stack isn't an admission of defeat — it's the recognition that the right tool depends on the company's current scale, and that your current scale isn't the one in the giants' marketing material. Small team, small cluster, typical apps, platform consuming half the engineering budget: that's exactly the scenario where reverse migration pays off.
It's not an afternoon's decision. It's a four-to-six-week project for a small team, with inventory, mapping, an overnight cutover, thirty days of parallel operation, and a careful decommission. But it's a project whose ROI is measured in dev-hours recovered every month — every month, for the years the company has ahead of it.
If you want to try HeroCtl as a candidate destination:
curl -sSL get.heroctl.com/install.sh | sh
It runs on any Linux server with Docker. Three servers form a replicated control plane with real high availability. An application spec sits between thirty and fifty lines, aggregating everything needed (replication, ingress, automatic certificates, secrets). The permanently free Community plan covers the entire stack described here — only Business and Enterprise add SSO, granular RBAC, detailed auditing, code escrow, and SLA-backed support, geared toward companies with formal platform requirements.
For additional context, k3s vs HeroCtl: when each one makes sense addresses the choice when the team has already decided to leave vanilla K8s but hesitates between a lightweight K8s distribution and an independent orchestrator. And Kubernetes is overkill: when you don't need it makes the underlying argument for those not yet convinced that the complexity is unnecessary at their current scale.
Reverse migration isn't a conference headline. But it's the right decision for more teams than the public narrative admits.