Postgres in production: managed vs self-hosted, the honest math

RDS starts at US$15/month and ends at US$500. Self-hosting starts at US$0 and ends with you awake at 3 a.m. How to decide between the two without lying to yourself.

HeroCtl team · 14 min read

The "RDS or run Postgres on my cluster" decision is the one Brazilian SaaS most postpones. It shows up in a month-one architecture document, becomes a TODO in month three, becomes an internal fight in month six when the AWS bill comes in high three digits. And meanwhile, no one wants to choose — because every blog post on the subject is written by someone with bias. Those who work for a managed vendor say self-hosted will break you. Those who've maintained Postgres for fifteen years say RDS is a rip-off. Both sides leave things out.

This post opens the real spreadsheet. No frills, no side-taking. When RDS makes sense, when it doesn't, how much each scenario costs in reais, how much it costs in engineer-hours, and the five mistakes that turn self-hosted into an accident.

What managed services do for you (no irony)

Before any comparison, it's honest to recognize what RDS, Cloud SQL, Aurora, and modern Postgres-as-a-Service (Supabase, Neon, Crunchy) actually deliver. The marketing inflated it — but the product is real.

Automatic backup with configurable retention. You say "keep seven days," and it's done. Incremental snapshot, no visible maintenance window, no cron, no babysitter. For many teams, this item alone justifies the check.

Point-in-time recovery (PITR). You discover at eleven a.m. that a deploy at nine deleted an important field. In RDS, you restore to 08:55. No reading the WAL archiving manual, no praying for a transaction log to be intact in a bucket. Just console and button.

Automatic security patches. Postgres minor releases come out every three months, and each usually patches a CVE or two. On managed, they're applied in a maintenance window you define. Self-hosted, you discover you're behind when a compliance check hits.

One-click read replica. Want to scale reads? Turn on the replica, wait for it to catch up, point your application at it. Self-hosted, you configure streaming replication by hand, manage replication slots, monitor lag, and decide what happens when the connection drops.

Automatic Multi-AZ failover. In RDS Multi-AZ, the secondary instance takes over in 60–120 seconds when the primary dies, and the DNS endpoint routes itself. It's the most expensive and most useful feature of the product.

Integrated metrics, centralized logs. CloudWatch already has everything there. Slow queries, cache hit ratio, active connections, IO. You open the console and see.

Hours of operation you don't spend. This is the invisible item. Each of the features above is an afternoon you didn't spend. Twenty afternoons over a year add up to a part-time engineer.

Recognizing this is the honest starting point. RDS is a serious product. It's not hot air.

What managed services do NOT do (and no one talks about)

Here lives the asterisk. The limitations below aren't on page one of the documentation.

Migrating to another platform becomes a project. When you're in Aurora, leaving Aurora is a two-to-twelve-week project depending on the size. The dialect isn't pure Postgres — Aurora has its own extensions and behaviors. Leaving Cloud SQL for another cloud requires dump-restore, planned downtime, script rewrite, IAM tuning, redoing monitoring. The exit cost is what funds the entry discount.

Some popular extensions simply don't exist. TimescaleDB doesn't run on RDS (AWS offers its own equivalent that isn't compatible). pg_partman has an old version. pgvector arrived late. If your architecture depends on a specific extension, you may discover three months later that it isn't available in your region, in your version, or at all.

Cross-region egress traffic costs. You decide to put a replica in another region for disaster recovery. Every gigabyte leaving the primary region for the secondary pays a toll. In small workloads it's negligible. In workloads with 200 GB of writes per day, it becomes a parallel bill.

Latency between app and database if they're in different VPCs. This is the silent mistake. You bring up the app on one network and the database on another, with peering. Minimum latency goes from 0.3 ms (same network) to 2–4 ms (peered). That doesn't seem like much until your application makes 120 queries per request, at which point it becomes roughly 350 ms of phantom latency.

Detailed auditing costs extra. Who ran DROP TABLE? On RDS that means Performance Insights at the advanced tier (US$7 per vCPU per month) plus a logging plugin. It isn't on by default.

You don't really control the maintenance window. You "configure" a window, but in serious incidents AWS applies patches outside it. It has happened, and it will happen again.

The honest financial math

Reference exchange rate: R$5 per dollar. RDS prices in São Paulo region (sa-east-1), April 2026, on-demand. Self-hosted assumes DigitalOcean / Vultr / Hetzner VPS in São Paulo or Miami.

Small scenario: database under 10 GB, up to 100 connections/sec

| Item | RDS | Self-hosted |
| --- | --- | --- |
| Instance | db.t4g.micro (2 vCPU burst, 1 GB RAM) | 2 vCPU, 4 GB VPS already used by the app |
| Monthly cost | US$15 = R$75 | R$0 (fits alongside the app) |
| 10 GB gp3 storage | US$1.15 | included |
| 10 GB backup | US$0.95 | R$0.50 (S3-compatible) |
| Total | R$85/month | R$0.50/month |

Difference: R$84/month. In a year, R$1k. Doesn't change anyone's life. For an MVP, RDS is defensible just for the automatic backup.

Medium scenario: 50 GB, 1k connections/sec, 1 read replica

| Item | RDS | Self-hosted |
| --- | --- | --- |
| Primary | db.r6g.large (2 vCPU, 16 GB) | Dedicated 4 vCPU, 8 GB VPS — R$120 |
| Read replica | db.r6g.large | 4 vCPU, 8 GB VPS — R$120 |
| 50 GB gp3 storage | US$5.75 | included |
| 3,000 provisioned IOPS | US$60 | included |
| 50 GB backup | US$4.75 | R$5 (S3-compatible) |
| Bandwidth | US$10 | included |
| Total | US$280 = R$1,400/month | R$245/month |

Difference: R$1,155/month = R$13.8k/year. Here the conversation begins. Is it worth R$14k not to think about backup? For a team of two engineers, that's a month of one of their work. For a team of eight, it's negligible.

Large scenario: 500 GB, 10k connections/sec, real high availability

| Item | RDS Multi-AZ | Self-hosted cluster |
| --- | --- | --- |
| Primary | db.r6g.4xlarge (16 vCPU, 128 GB), Multi-AZ | Dedicated 16 vCPU, 64 GB VPS — R$650 |
| Multi-AZ sync replica | included | Dedicated 16 vCPU, 64 GB VPS — R$650 |
| Read replica | db.r6g.2xlarge | 8 vCPU, 32 GB VPS — R$320 |
| 500 GB io1 storage | US$60 | included |
| 10k provisioned IOPS | US$650 | local NVMe included |
| 500 GB automatic backup | US$48 | R$80 (WAL archiving) |
| Performance Insights advanced | US$112 | free (Prometheus) |
| Egress bandwidth | US$100 | included up to 20 TB |
| Total | US$2,100 = R$10.5k/month | R$1,700/month |

Difference: R$8.8k/month = R$105k/year. This is where managed becomes hard to defend financially. But the financial math is only half. The other half is time.

The time math (more important than the financial)

Engineer time in São Paulo costs between R$80 and R$250 per productive hour depending on seniority. Take R$150/hour as a weighted average. That's the multiplier to apply to each item below.

Initial setup. RDS via console: thirty minutes. You define instance, storage, security group, parameter group, and it's running. Self-hosted done right: four to eight hours. Postgres + PgBouncer + pgBackRest to S3 + monitoring + tuning of shared_buffers/work_mem + restore script + restore test. Doing this in half a day requires prior experience. Without experience, it becomes a whole sprint.
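
For reference, a minimal sketch of the install step, assuming Ubuntu with the PGDG apt repository (package names and versions vary by distro):

```sh
# Install the core components of the stack described in this post.
# Assumes Ubuntu with the PGDG apt repository already configured.
sudo apt update
sudo apt install -y postgresql-16 pgbouncer pgbackrest

# Sanity check before layering pooling, backup, and monitoring on top.
sudo -u postgres psql -c 'SELECT version();'
```

The four to eight hours aren't in the install; they're in wiring these pieces together and testing the restore path, which the next section walks through.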

Ongoing monthly operation. RDS: zero. You open the console when something screams. Self-hosted done right: two to four hours. Review slow queries, adjust a parameter that's gotten tight, verify backups ran, do the monthly restore test, apply minor version updates. That's the steady state. If you're spending more than that, something is wrong.

When it breaks at three a.m. On RDS, you open a ticket. AWS Business support responds within four hours for high severity, one hour for critical. You go back to bed and wake up to a workaround. Self-hosted, you are the support. If your monitoring didn't wake you, the customer did. If your DR plan lives in a document no one has read in six months, you're improvising.

The clear rule: monitoring, a written disaster recovery plan, and a monthly restore test are not optional in self-hosted. They're what separates "professional self-hosted" from "accident waiting to happen".

Minimum stack for production-grade self-hosted Postgres

You can't run Postgres in production without this base. Each component below solves a known failure mode.

Main Postgres on a dedicated server. Don't share disk with the application. The engine depends on predictable IOPS, and an app log growing unchecked can fill the volume and stop the database. Allocate a VPS just for the database, or at least a separate volume if it's the same VPS at first.

Connection pool with PgBouncer or Pgpool. Postgres allocates one process per connection. At two hundred direct connections, it consumes more memory than your application. PgBouncer in transaction mode solves it: dozens of real connections to the database serving thousands of application connections. Without it, you die in the first peak hour.
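
A minimal pgbouncer.ini sketch in transaction mode; the database name, auth file, and pool sizes below are illustrative and need sizing against your server's RAM and workload:

```sh
# Write a minimal transaction-mode PgBouncer config and restart the service.
cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
; thousands of app connections share a few dozen server connections
pool_mode = transaction
default_pool_size = 20
max_client_conn = 2000
EOF
sudo systemctl restart pgbouncer
```

The application then connects to port 6432, and PgBouncer keeps only default_pool_size real connections open to Postgres per database/user pair.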

Backup with pgBackRest or WAL-E. Don't use pg_dump in cron as a sole strategy. pg_dump is a logical dump — good for migrating versions, bad for recovering a large database at a precise moment. You want weekly pg_basebackup plus continuous WAL archiving to an S3-compatible bucket (Cloudflare R2, Backblaze B2, Wasabi, or S3 itself). pgBackRest does this and validates integrity.
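
A hedged sketch of that setup with pgBackRest; the endpoint, bucket, keys, and stanza name are placeholders for your own:

```sh
# Point pgBackRest at an S3-compatible bucket and turn on WAL archiving.
cat > /etc/pgbackrest/pgbackrest.conf <<'EOF'
[global]
repo1-type=s3
repo1-s3-endpoint=s3.example.com
repo1-s3-bucket=pg-backups
repo1-s3-region=auto
repo1-s3-key=REPLACE_ME
repo1-s3-key-secret=REPLACE_ME
repo1-path=/prod
repo1-retention-full=2

[prod]
pg1-path=/var/lib/postgresql/16/main
EOF

# Postgres ships every WAL segment through pgBackRest as it's produced.
sudo -u postgres psql -c "ALTER SYSTEM SET archive_mode = 'on';"
sudo -u postgres psql -c \
  "ALTER SYSTEM SET archive_command = 'pgbackrest --stanza=prod archive-push %p';"
sudo systemctl restart postgresql

# Initialize the stanza, take the first full backup, then cron it weekly.
sudo -u postgres pgbackrest --stanza=prod stanza-create
sudo -u postgres pgbackrest --stanza=prod --type=full backup
```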

Hot standby replica via streaming replication. A second server receiving the WAL in real time, ready to be promoted if the primary falls. Bonus: you use that same server for heavy read queries, offloading the primary.
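
A sketch of bootstrapping the standby with pg_basebackup, assuming a replicator role with REPLICATION privilege already exists on the primary (10.0.0.1 here is illustrative):

```sh
# Run on the standby server, starting from an empty data directory.
sudo systemctl stop postgresql
sudo -u postgres rm -rf /var/lib/postgresql/16/main
sudo -u postgres pg_basebackup \
  --host=10.0.0.1 --username=replicator \
  --pgdata=/var/lib/postgresql/16/main \
  --wal-method=stream \
  --write-recovery-conf   # writes standby.signal and primary_conninfo
sudo systemctl start postgresql

# Back on the primary: confirm streaming and keep an eye on replay lag.
sudo -u postgres psql -c \
  "SELECT client_addr, state, replay_lag FROM pg_stat_replication;"
```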

Monitoring with postgres_exporter + Prometheus + Grafana, or an equivalent plugin from the orchestrator you use. You want to see: active connections, cache hit ratio, transaction rate, replication lag, disk space, slow queries. Without this, you're driving with your eyes closed.
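
One way to wire that up, assuming Prometheus already runs somewhere and Docker is available on the database host (the image tag, credentials, and addresses are illustrative):

```sh
# --network=host lets the exporter reach Postgres on localhost and exposes
# metrics on :9187. The DSN user should be a read-only monitoring role.
docker run -d --name postgres-exporter --network=host \
  -e DATA_SOURCE_NAME="postgresql://monitor:secret@127.0.0.1:5432/postgres?sslmode=disable" \
  quay.io/prometheuscommunity/postgres-exporter

# Then add a scrape job to prometheus.yml, e.g.:
#   - job_name: postgres
#     static_configs:
#       - targets: ["db-host:9187"]
```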

Automated monthly restore test. Cron that picks the most recent backup, restores it on a temporary server, validates that some tables have rows. If that fails, alert the team. Backup that's never been restored is placebo. We've seen teams lose a whole week of data because the "backup" had been corrupted for three months and no one tested.
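
A rough sketch of such a drill with pgBackRest; the stanza, paths, database, table, and alert webhook are all placeholders, and the script assumes it runs from cron as the postgres user:

```sh
#!/bin/sh
# Monthly restore drill: restore the latest backup into a scratch directory,
# start a throwaway instance, and verify a known table has rows.
# set -e aborts on any failure; make sure cron surfaces non-zero exits.
set -e
SCRATCH=/tmp/restore-test
rm -rf "$SCRATCH" && mkdir -p "$SCRATCH" && chmod 700 "$SCRATCH"

# Restore to the first consistent point of the most recent backup.
pgbackrest --stanza=prod --pg1-path="$SCRATCH" --type=immediate restore

# Boot on a side port, count rows in a known table, shut down.
pg_ctl -D "$SCRATCH" -o "-p 5599" start
ROWS=$(psql -p 5599 -d app -tAc "SELECT count(*) FROM users;")
pg_ctl -D "$SCRATCH" -m fast stop

# Zero rows means the backup is suspect: alert the team.
[ "$ROWS" -gt 0 ] || curl -s -X POST "$ALERT_WEBHOOK" -d "restore test failed"
```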

The five mistakes that break self-hosted Postgres

They've been the same five for fifteen years. They don't innovate.

Not testing restore. We repeat it because it's the most common failure. A backup that's never been restored isn't a backup, it's a file. An automated monthly restore is the civilized minimum.

Keeping shared_buffers and work_mem at default. Postgres's defaults are designed to run on a small server without assuming anything. In production, shared_buffers should be around 25% of RAM, effective_cache_size 50–75%, and work_mem calculated per simultaneous connection. Without this, you have 128 MB of cache on a server with 16 GB of RAM, and performance is left on the table.
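
As an illustration, here are those ratios applied to a server with 16 GB of RAM (starting points to measure against, not universal values):

```sh
# Illustrative tuning for a 16 GB RAM server, following the ratios above.
sudo -u postgres psql <<'SQL'
ALTER SYSTEM SET shared_buffers = '4GB';          -- ~25% of RAM
ALTER SYSTEM SET effective_cache_size = '12GB';   -- ~75% of RAM
ALTER SYSTEM SET work_mem = '32MB';               -- per sort/hash, per connection
SQL
sudo systemctl restart postgresql   # shared_buffers only changes on restart
```

effective_cache_size is a planner hint, not an allocation, so 75% is safe; work_mem multiplies across concurrent sorts, which is why it stays modest.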

Not monitoring slow queries. A poorly written query by a distracted developer can lock the entire database. pg_stat_statements enabled, alert for any query going over 500 ms in production. Without this, you discover the problem when the customer opens a ticket.
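
A minimal sketch of turning both on:

```sh
# Enable pg_stat_statements (needs a restart) and log queries over 500 ms.
sudo -u postgres psql <<'SQL'
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
ALTER SYSTEM SET log_min_duration_statement = '500ms';
SQL
sudo systemctl restart postgresql

# After the restart, create the extension and inspect the worst offenders.
sudo -u postgres psql -d app <<'SQL'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- mean_exec_time is the column name on Postgres 13+; older versions use mean_time.
SELECT calls, mean_exec_time, left(query, 60) AS query
FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;
SQL
```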

Disk shared with the operating system. The system log fills /var, the database sits on the same volume, and Postgres stops accepting writes. Postgres has to be on a dedicated volume. NVMe if possible.

A single server without a replica. The server goes down (and it will, sooner or later; hardware fails), and you're looking at one to three hours of downtime restoring from backup. A synchronous replica on another server reduces that to seconds.

Postgres on an orchestrator like HeroCtl

This is where operational complexity drops. Not because Postgres got simpler — it's still complex — but because the orchestrator absorbs the plumbing part you'd normally write by hand.

Postgres as a cluster task. The service description is a configuration file of about thirty lines: official Postgres image, named volume for data, environment variables for credentials, reserved CPU and memory, restart policy. No systemd unit, no apt install, no manual firewall.

Persistence via replicated named volume. You say "this volume is replicated between two servers", and the orchestrator ensures the data exists on both. If the server running Postgres falls, the cluster reschedules on the second server with data already present. Recovery time in seconds, not hours.

Integrated automatic backup in the Business plan: continuous WAL archiving to S3-compatible object storage, weekly snapshot, configurable retention. The same RDS feature, no check to AWS.

Read replica as additional task. You describe a second service pointing to the first as upstream replication. Five extra lines in the manifest. No console, no clicks, no manual step.

Built-in metrics. The orchestrator is already collecting CPU, memory, and IO from each container. Adding postgres_exporter is one more fifteen-line task. No assembling a separate Prometheus, no provisioning Grafana, no spinning up another server.

Automatic failover if the coordinating server falls: the cluster elects another coordinator in around seven seconds and continues scheduling. Postgres itself comes back on the remaining servers right after.

The full description of a Postgres with replica + backup + metrics on HeroCtl is around one hundred lines. In Kubernetes, the equivalent is an external operator (CloudNativePG or Zalando) plus 300 lines of manifest, plus a separate monitoring stack, plus cert-manager for internal TLS between nodes. For the team of five, the difference is between an afternoon and a sprint.

Comparison table

| Criterion | RDS São Paulo | Cloud SQL | Supabase | Neon | Simple Postgres VPS | Postgres on HeroCtl |
| --- | --- | --- | --- | --- | --- | --- |
| Minimum cost (50 GB medium) | R$1,400/mo | R$1,300/mo | R$125/mo (Pro) | R$95/mo (Launch) | R$240/mo | R$245/mo |
| Automatic backup | yes | yes | yes | yes | you configure | yes (Business) |
| Point-in-time recovery | yes | yes | yes (Pro) | yes | you configure | yes (Business) |
| Real high availability | yes (Multi-AZ, paid) | yes | partial | yes | you configure | yes |
| Custom extensions | restricted | restricted | restricted | restricted | total | total |
| Lock-in | high | high | medium | medium | none | none |
| Exit migration | weeks | weeks | days | days | hours | hours |
| Included monitoring | partial | yes | yes | yes | you assemble | built-in |
| Minimum expertise | none | none | none | none | senior | mid-level |
| App↔db latency | 1–4 ms | 1–4 ms | 5–30 ms | 10–50 ms | 0.3 ms | 0.3 ms |
| Ideal range | any | any | up to 50 GB | up to 100 GB | indie | startup to mid-size |
| LGPD compliance via vendor | yes | yes | partial | partial | you document | you document |

No column wins at everything. Each is a coherent set of tradeoffs. Anyone trying to sell you a column as "the best" is selling you something.

Honest decision by profile

MVP up to 10 GB and up to a hundred connections/sec. Postgres as a container alongside the application, on a single VPS. Daily backup to S3-compatible object storage. Total cost, database and all, in the R$10/month range above what you already pay for the VPS. At some point you migrate — and migrating with 10 GB is a Sunday night, not a project. Start simple.

Indie hacker between 10 and 100 GB. Postgres on dedicated VPS, async replica on a second VPS, hourly backup to S3-compatible object (Cloudflare R2 or Backblaze B2 costs cents). Something between R$120 and R$200 per month total. If you have time to dedicate, this is the point where self-hosted pays off a lot.

Early startup between 100 and 500 GB. This is where the decision really lies. Evaluate RDS São Paulo on the LGPD compliance argument (AWS already has the datacenter certifications) — it'll come out in the R$1.5 to R$3k per month range. Or evaluate Postgres in a cluster managed by the orchestrator, on three dedicated VPSes, in the R$400 per month range — but it requires real operational discipline. It's not "self-hosted made easy". It's self-hosted with the orchestrator absorbing the plumbing part.

Heavy compliance or Enterprise. Managed makes sense when the audit framework asks for a vendor with specific certification. But read the contract — some RDS regions in Brazil still don't have all the certifications (HIPAA, FedRAMP, PCI level 1) that the American region has. If your auditor asks for a specific certificate, confirm the region has it before signing.

Questions inexperienced teams ask

Can I start self-hosted and migrate to RDS later? You can, and it's a valid strategy. Postgres is Postgres. You pg_dump the database, restore it to RDS, point the application at the new endpoint, and decommission the old server. Up to 50 GB, it's a few hours of work with a short maintenance window. The opposite path (leaving RDS for self-hosted) also works, but tools like AWS DMS make getting in easier than getting out.
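
A sketch of the dump-and-restore move, with placeholder hostnames; run it during a short write freeze so the cutover is consistent:

```sh
# Directory format enables parallel dump and restore via --jobs.
pg_dump --host=old-server --username=app --dbname=app \
  --format=directory --jobs=4 --file=app_dump

# --no-owner avoids role errors on RDS, where you don't get superuser.
pg_restore --host=mydb.example.sa-east-1.rds.amazonaws.com --username=app \
  --dbname=app --jobs=4 --no-owner app_dump
```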

Is RDS São Paulo reliable? The sa-east-1 region is one of the oldest AWS regions outside the United States, operating since 2011, with three independent availability zones. Global AWS incidents usually drag São Paulo along. Regional incidents take it down alone; that has happened twice in the last five years, for a few hours each time. Reliable enough for production, not reliable enough to skip a plan B.

Does backup with pg_dump in cron solve it? It works for an MVP, not for serious production. pg_dump is a logical dump: it doesn't preserve exact physical state, it's slow to restore for large databases, and it can't recover to minute X. The right combination is a weekly physical pg_basebackup plus continuous WAL archiving. Tool: pgBackRest.

When is it worth buying advanced Performance Insights? When you're on RDS, have more than five engineers touching the schema, and need to track "who ran this query?". On small teams, native pg_stat_statements already delivers 80% of the value — turn it on first and see if you need more.

And Supabase, Neon, Crunchy? They're different products built on top of Postgres. Supabase is Postgres + auth + generated REST API + file storage: good for a project that needs all of that integrated, less good if you just want a database. Neon separates storage and compute and sleeps when idle, which is great for staging environments and spiky workloads. Crunchy is pure Postgres with an enterprise focus and a Kubernetes operator. All three have reasonable free tiers for an MVP; worth testing before committing to RDS.

How do I do real HA without RDS Multi-AZ? Synchronous replica on a second server (synchronous_standby_names configured) ensures each commit was written to both before returning OK to the application. Failover via Patroni, or via orchestrator like HeroCtl. The sensitive point is split-brain: the replica can't promote itself without external confirmation. Patroni solves it with etcd as arbiter. HeroCtl solves it with the distributed control plane itself acting as arbiter — without setting up an extra service.
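
A minimal sketch of the synchronous part; 'standby1' is an illustrative name that must match the standby's application_name:

```sh
# On the primary: require one named standby to confirm each commit.
sudo -u postgres psql <<'SQL'
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby1)';
ALTER SYSTEM SET synchronous_commit = 'on';
SELECT pg_reload_conf();
SQL

# On the standby, primary_conninfo must advertise the matching name, e.g.:
#   primary_conninfo = 'host=10.0.0.1 user=replicator application_name=standby1'
```

Failover itself (who promotes, and when) still belongs to Patroni or the orchestrator; this setting only guarantees no acknowledged commit exists on a single server.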

Does HeroCtl run heavy Postgres in production for real? It does. The documentation's own public cluster serves this blog through a stack that includes self-hosted Postgres as a cluster task, with replica and backup. For workloads above 500 GB or with IOPS requirements in the 50k range, we recommend evaluating managed: not because the orchestrator can't handle it, but because AWS's provisioned IOPS and I/O control at that scale start to make a real operational difference.

Closing

There's no single answer to "managed or self-hosted Postgres". There's your spreadsheet. If you opened this post looking for confirmation, what you found was numbers — use them.

For profiles where self-hosted pays off but operation scares you, HeroCtl is the layer that reduces friction. Backup, replica, monitoring, and failover described in a hundred lines of configuration, running on your cluster, no check to vendor, no lock-in.

Install with:

```sh
curl -sSL get.heroctl.com/install.sh | sh
```

For more on the total cost of hosting SaaS in Brazil in 2026, read how much it costs to host a Brazilian SaaS. For the practical transition from Docker Compose to a cluster with real high availability, read Docker deploy in production, from Compose to cluster.

The honest math is the one that fits your spreadsheet. Run the numbers before deciding.

#postgres #database #rds #self-hosted #engineering