Redis (and Valkey) in production: managed vs self-hosted in 2026
Redis changed its license in 2024, Valkey was born as an OSS fork, and Dragonfly is topping benchmarks. In 2026, choosing a cache no longer means choosing Redis: it means choosing among 4 products. An honest analysis, with costs.
The question "managed or self-hosted Redis?" became a different question at the end of March 2024. That's when the company behind Redis switched the license from BSD to a combination of RSALv2 and SSPLv1, a pair of "source available" licenses designed to prevent cloud providers from offering Redis as a service without commercial licensing. The reaction was quick: the Linux Foundation launched Valkey as a direct fork of the last BSD version, with AWS, Google, and Oracle backing the development. In parallel, projects that already existed, KeyDB and Dragonfly, started appearing more often in the benchmarks of companies reassessing their stack.
TL;DR
In 2026, "Redis in production" became a category with four implementations disputing the same protocol: Redis OSS (BSD pre-2024 or RSAL after), Valkey (BSD, drop-in via fork), KeyDB (multi-thread, older fork), and Dragonfly (BSL, rewritten from scratch in C++). Self-hosting any of the four costs between R$30 and R$130 per month on a Hetzner VPS. The managed path costs from R$75 (ElastiCache micro) to R$1,000/month (13 GB instance), plus Upstash with serverless billing varying between US$0 and US$100/month. For a Brazilian startup with MRR below R$200k, self-hosted Valkey on its own cluster saves between R$300 and R$1,500 per month compared to managed options, eliminates RSAL license exposure, and keeps full compatibility with Redis clients. Switching stacks after adopting the commercial version is real pain; starting with the OSS-friendly version is the bet with the lowest exit cost. This post compares the four products, the three managed paths (ElastiCache, Upstash, Redis Cloud), and the minimum configuration to run Valkey in production without losing sleep.
The short story of the license change
Before March 2024, "Redis" was the dominant OSS cache: BSD, gigantic ecosystem, present in any stack that had ever fit the word "Rails" or "Node" on a résumé. The commercial vendor — Redis Inc, formerly Redis Labs — lived well off the managed product and the paid modules (Search, JSON, TimeSeries).
Then came the announcement: version 7.4 onward would ship under RSAL + SSPL, no longer BSD. In practical terms, the change directly targeted AWS, Google, and Azure. The reading among open source maintainers was broader: "if it happened to Redis, it can happen to any VC-funded project". It was the third recent case, after Elastic in 2021 and MongoDB in 2018, where a project that seemed consolidated changed the rules.
The Linux Foundation moved fast. Within days of the announcement, Valkey was formed as a fork of the last BSD version (7.2.4), with independent governance and weighty backers: AWS, Google Cloud, Oracle, Ericsson, Snap. In just over a year, AWS had migrated ElastiCache's default engine to Valkey. Google Memorystore followed. In 2026, Valkey stopped being an "experimental fork" and became a growing reference, with 7.x and 8.x releases shipping optimizations of its own that were never offered to Redis OSS.
The operational lesson for those choosing cache today: the mainstream moved. There's no longer the inertia of "no one was fired for choosing Redis" — the question in the architecture interview became "why Redis and not Valkey?". And the honest answer, in most cases, is "habit".
What are the four products disputing this market?
Redis OSS
The original. Versions before 7.4 are still under BSD and remain usable indefinitely — no one revokes a license retroactively. Versions 7.4 onward ship under RSAL/SSPL.
Pros: still huge community, battle-tested in production for over a decade, richer ecosystem (modules, integrations, books, talks). Almost every client library tested against Redis OSS first.
Cons: the RSAL prevents offering it as a service without commercial licensing. For those operating Redis for internal use, that's irrelevant; the restriction is about resale. The real risk is strategic: if the vendor changed the license once, it can change it again. Adopting Redis OSS in 2026 means betting that the next critical feature will land in the open branch rather than staying in the commercial product.
Valkey
The Linux Foundation fork. Took the code from 7.2.4 BSD and kept developing. Drop-in replacement at the protocol level: no client needs to change a line of code to swap Redis for Valkey.
Pros: permanent BSD guaranteed by neutral governance (it's not a company, it's a foundation). Big backers align incentives to keep the project healthy. Technical parity with Redis 7.x and growing development speed.
Cons: the brand is still being built — some third-party plugins and very specific SDKs still only list "Redis" in the README. In 2026 that's increasingly cosmetic, but it can show up in old integrations needing minor adaptation.
KeyDB
The multi-thread fork. Has existed since 2019, was acquired by Snap in 2022, lives today as Snap-Telemetry project. The architectural difference is fundamental: Redis OSS and Valkey are single-thread by design (one main thread processes all commands). KeyDB runs multi-thread by default.
Pros: on CPUs with 4+ cores, KeyDB delivers 2 to 3 times more throughput than single-thread Redis on the same hardware. API is compatible, so the client doesn't change. For CPU-bound workloads with high volume, it's the obvious choice.
Cons: smaller community, pace of adopting new Redis features usually lags quarters behind. Some new Redis features (Functions, certain extensions) take time to appear in KeyDB.
Dragonfly
The rewrite. Not a fork — it's a new implementation in modern C++, with hash table designed for cache (not Redis's generic structure), using io_uring on Linux for asynchronous I/O. Compatibility at the protocol level, not at the code level.
Pros: claims of 25× throughput in specific benchmarks (heavy pipelines on modern hardware). Real memory efficiency — 2 to 3 times more data in the same RAM as Redis. No implicit GIL of single-thread; scales vertically on a machine with 32+ cores.
Cons: the BSL (Business Source License): source available but commercially restricted for 4 years before each release converts to Apache 2.0. It's the same license pattern that caught other projects in the orchestration industry by surprise, which we covered in our post on why we built HeroCtl. Some commands are still incompatible with Redis in edge cases (complex Lua scripts, certain cluster operations).
Which to choose for a new project in 2026?
The short decision tree:
- Sensible default: Valkey. Permanent BSD, Redis parity, client doesn't need to change, future guaranteed by big backers. There's no technical reason to prefer Redis OSS for a new project in 2026.
- Critical performance: Dragonfly, if the application sustains above 100k operations per second and the team accepts the BSL license risk.
- Multi-thread without rewrite: KeyDB, if the bottleneck is CPU on big hardware and the team prefers not to migrate to Dragonfly.
- Extreme simplicity (1 VPS, low volume): Redis OSS 7.2.4 BSD still works perfectly. Crystallized as a stable version; will run on any Debian/Alpine for the next five years without complaining.
- Migrating from Redis Labs managed: Valkey is drop-in. Zero code changes. The migration is purely operational: replication, DNS swap, rollback if necessary.
Managed vs self-hosted: the math without frills
The numbers below are list price in May 2026, R$5/USD exchange rate.
AWS ElastiCache
Grows in steps per instance:
- cache.t4g.micro (1 GB): about US$15/month = R$75/month
- cache.t4g.small (2 GB): US$30/month = R$150/month
- cache.r6g.large (13 GB): about US$200/month = R$1,000/month
- cache.r6g.xlarge (26 GB): about US$400/month = R$2,000/month
Multi-AZ doubles the price (replica in another zone). Automatic backup is included. Real Multi-AZ failover is the main argument — you pay not to have to think about it.
Upstash
Serverless billing per command:
- Free tier: 256 MB, 500k commands/day
- Pay-as-you-go: US$0.2 per 100k commands
- For a startup with medium volume (1M commands/day, or 30M/month): about US$60/month = R$300/month
- For app with low peak: can stay between US$0 and US$10/month
The unique operational advantage: zero pre-allocated capacity. If the app sleeps, the bill sleeps. For Vercel/Cloudflare Workers, it's the natural complement. For sustained and predictable load, it ends up more expensive than ElastiCache.
Redis Cloud (direct offer from Redis Inc)
- Essentials Plan 30MB: free
- Pro Plan 5GB single-region: about US$50/month = R$250/month
- Pro Plan 10GB multi-AZ: about US$120/month = R$600/month
Includes commercial modules (Search, JSON, TimeSeries) that don't exist in Valkey or Redis OSS. If you use those modules, there's no direct alternative — it's Redis Cloud or buy commercial license and self-host.
Self-hosted on Hetzner
- CPX21 (3 vCPU, 4 GB RAM, 80 GB SSD): €7.99 = R$44/month. Fits 2 GB Valkey with room to spare.
- CPX31 (4 vCPU, 8 GB RAM, 160 GB SSD): €13.99 = R$78/month.
- Cluster of 3 CPX21 for Valkey + Sentinel HA: 3 × €7.99 = €24/month = R$130/month.
- Cluster of 3 CPX31 for serious data: €42/month = R$230/month.
For DigitalOcean, Linode, Vultr, multiply by approximately 1.5×. For AWS EC2, multiply by 2×. But in any case it stays cheaper than the equivalent managed.
Practical difference
For 8 GB cache workload with replication:
- ElastiCache Multi-AZ: ~R$1,000/month
- Redis Cloud Pro Multi-AZ: ~R$600/month
- Self-hosted Valkey on 3× Hetzner CPX31: R$230/month
- Single-node Valkey on 1× Hetzner CPX31 + S3 backup: R$80/month
Whoever chooses the managed path pays 3 to 10 times more for the same throughput. The difference is what you buy with that: contractual SLA, automatic multi-AZ failover, absence of 3 a.m. pager. For a small team, that may be worth the price. For a team that already operates Linux servers in production, it usually isn't.
Minimum production-grade Valkey stack
Configuration that withstands real production without theater:
- Container or systemd service on dedicated VPS. Don't share the machine with the application — cache and app compete for RAM, and when it goes wrong it goes wrong for both at the same time.
- maxmemory configured between 50 and 70% of available RAM. Leaving memory for the system and network buffers is more important than squeezing the last megabytes out of the cache.
- maxmemory-policy: allkeys-lru in pure cache mode (evict old keys when full); noeviction in storage mode (queues, sessions), where a write error is preferable to silently losing data.
- AOF persistence if the load is a job queue (Sidekiq, BullMQ, Resque). Without AOF, a restart loses any job that was queued but not yet processed. RDB is insufficient in that scenario because snapshots are periodic.
- RDB alone is sufficient if the load is pure cache (Rails cache, Django cache). If losing the cache on a restart only means "slow requests for a few seconds while it warms up", AOF is unnecessary overhead.
- Async replication to a standby on a second node. Manual failover with an internal DNS swap is acceptable in many cases; automatic failover requires Sentinel or Cluster.
- Daily AOF + RDB backups to S3 or compatible storage. Restic or rclone handle this well.
- Monitoring with redis_exporter exporting to Prometheus, plus alerts in Grafana or similar. Critical metrics: connected_clients, used_memory, evicted_keys, keyspace_hits/misses, and latency percentiles.
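Put together, a minimal valkey.conf for the pure-cache case might look like this. It's a sketch with illustrative values for a 4 GB VPS; for queue workloads, swap allkeys-lru for noeviction and turn appendonly on:

```shell
# Sketch of a minimal valkey.conf for a 4 GB VPS in pure cache mode.
# Values are illustrative; set maxmemory to 50-70% of the machine's RAM.
cat > valkey.conf <<'EOF'
bind 0.0.0.0
port 6379
maxmemory 2560mb
maxmemory-policy allkeys-lru
# Queue workloads (Sidekiq, BullMQ): use noeviction + appendonly yes instead.
appendonly no
# RDB snapshots: after 900s if 1 key changed, or 300s if 10 keys changed.
save 900 1
save 300 10
EOF
```

The same file works for Redis OSS and KeyDB; the directive names are identical.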
This setup runs comfortably on CPX21 (R$44/month) serving 50k+ ops/s sustained for average Brazilian app.
Sentinel or Cluster?
A question that confuses many teams coming to Redis for the first time.
Sentinel: 1 master + N replicas + 3+ sentinel processes monitoring. Automatic failover when the master goes down: a sentinel detects it, the sentinels vote, a replica becomes master, and clients receive the new endpoint via discovery. All on a single shard; the entire dataset fits on one node.
Cluster: dataset partitioned into 16384 slots distributed across 3+ masters. Each master has its own replicas. Multi-shard, horizontal capacity scaling — you can have 100 GB total with no individual node holding more than 20 GB.
The practical rule: Sentinel is enough up to ~100 GB dataset. Above that, Cluster is necessary. For most Brazilian startups, Sentinel is the right choice for simplicity — Cluster adds real complexity (key needs hashtag for multi-key operations, Lua scripts get restricted to a slot, some clients have bugs in cluster mode).
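The hashtag restriction above is worth seeing concretely: in cluster mode, multi-key commands only work when every key hashes to the same slot, and hashtags force that. An illustrative redis-cli session (assumes a running cluster; works identically against Valkey):

```shell
# Without a hashtag, user:1:name and user:1:email hash to different slots,
# so a multi-key MGET fails in cluster mode with a CROSSSLOT error.
# With {user:1} as a hashtag, only the text inside the braces is hashed,
# so both keys land in the same slot and MGET works:
redis-cli -c SET '{user:1}:name'  'Ana'
redis-cli -c SET '{user:1}:email' 'ana@example.com'
redis-cli -c MGET '{user:1}:name' '{user:1}:email'
```

The -c flag makes redis-cli follow cluster redirections automatically.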
Don't use Cluster for bragging rights. Use Sentinel until the metrics force your hand.
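For the Sentinel path, the minimum config is small. A sketch of sentinel.conf, with placeholder names and IPs (the same file goes on each of the 3 sentinel nodes):

```shell
# Sketch of a minimal sentinel.conf. "mymaster" is an arbitrary label and
# 10.0.0.1 is a placeholder for the master's address; adjust both.
cat > sentinel.conf <<'EOF'
port 26379
# Quorum of 2: two of the three sentinels must agree the master is down
# before a failover starts.
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
EOF
```

Start each process with `valkey-sentinel sentinel.conf`; the sentinels discover each other through the master.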
Sidekiq, BullMQ and friends patterns
Real use, not marketing diagram:
- Sidekiq Ruby: Redis needs AOF. Without AOF, any crash loses queued jobs that haven't yet been picked up. Sidekiq Pro adds "reliable fetch", which helps, but the backstop is still AOF.
- BullMQ Node: similar. AOF essential for durability. BullMQ uses data structures that depend on Redis transactional atomicity — restart without AOF can leave queue in inconsistent state.
- Resque Ruby: the father of all. AOF necessary for the same reasons.
- Pure cache (Rails.cache, Django cache, Laravel cache): can run without AOF, RDB sufficient. Losing cache on restart is acceptable.
- Pure pub/sub: doesn't even need persistence. Pub/sub is fire-and-forget by design.
Mixing cache and queue use on the same Redis works; just configure AOF (the most demanding workload determines the config). But for a serious workload, separating into two instances (one for cache without AOF, another for queues with AOF) is cleaner. Operationally cheap if an orchestrator is already running.
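A sketch of that two-instance split with Docker, using the official valkey/valkey image (container names and ports are illustrative):

```shell
# Two Valkey containers on one host: cache without AOF, queues with AOF.
docker run -d --name valkey-cache -p 6379:6379 valkey/valkey:8 \
  valkey-server --maxmemory 1gb --maxmemory-policy allkeys-lru --appendonly no
docker run -d --name valkey-queue -p 6380:6379 valkey/valkey:8 \
  valkey-server --appendonly yes --maxmemory-policy noeviction
```

The app then points its cache client at port 6379 and Sidekiq/BullMQ at 6380; each instance gets the persistence policy its workload actually needs.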
Is ElastiCache São Paulo reliable?
Yes — 99.99% contractual uptime SLA, multi-AZ in São Paulo region (sa-east-1), automatic backup, tested failover. Latency from Brazilian app to ElastiCache São Paulo stays at 1-3ms, indistinguishable from local Redis for most workloads.
The weak point isn't technical reliability, it's cost and lock-in. AWS Brazil charges about 30% more than North American regions for the same resource. And migrating from ElastiCache to another provider later involves dump/restore + coordinated cutover — not apocalypse, but it's weekend work.
Comparison table: 12 criteria
| Criterion | Redis OSS | Valkey | KeyDB | Dragonfly | ElastiCache | Upstash | Self-hosted Valkey |
|---|---|---|---|---|---|---|---|
| License | RSAL/SSPL (7.4+) | BSD | BSD | BSL → Apache after 4 years | Commercial AWS | Commercial Upstash | Permanent BSD |
| Threading | Single | Single | Multi | Multi | Single (engine 7) | Serverless | Configurable |
| Redis client compat. | 100% | 100% | 100% | 95%+ | 100% | 100% (subset of commands) | 100% |
| Baseline throughput | 100k ops/s | 100k ops/s | 250k ops/s | 1M+ ops/s | depends on inst. | depends on plan | 100-250k ops/s |
| AOF persistence | Yes | Yes | Yes | Yes | Yes (snapshot) | Managed | Yes |
| Replication | Yes | Yes | Yes | Yes | Multi-AZ | Multi-region | Yes (manual config) |
| Automatic failover | Sentinel/Cluster | Sentinel/Cluster | Sentinel/Cluster | Cluster | Built-in | Built-in | Sentinel/Cluster |
| Cost 8GB/month (R$) | 80 (VPS) | 80 (VPS) | 80 (VPS) | 80 (VPS) | 1000 (Multi-AZ) | 300-500 | 80-230 |
| Lock-in | Medium (license) | Low | Low | Medium (BSL) | High (AWS) | High (Upstash API) | Low |
| Premium modules | Paid | N/A | N/A | N/A | Add-on $$ | Limited | N/A |
| Operational | You | You | You | You | AWS | Upstash | You |
| SLA support | Paid | Community | Community | Paid | Included | Included | You |
When managed still makes sense
Honesty is the defense mechanism of any technical recommendation. There are four profiles where paying for managed is the right choice:
- Team without operational capacity for Redis cluster. If no one in the company knows how to debug a master that no longer responds, or interpret RDB fork latency, or take care of AOF backup — paying AWS to do that is rational. It's not an excuse, it's division of labor.
- Compliance requiring SOC2/ISO certified vendor. Audit asking for "certified vendor X" doesn't accept "we run Valkey on a Hetzner VPS". The path is ElastiCache, Redis Cloud, or similar with certifications in the contract.
- Volume needing instant scale. Application going from 100 req/s to 100k req/s in 5 minutes due to viral campaign — Upstash's serverless path is where it shines. Self-hosted needs reserved capacity beforehand; serverless grows on the fly.
- Fully serverless application. If the app runs on Vercel or Cloudflare Workers and Redis also needs to be serverless by billing model, Upstash is practically the only sane option. Connecting edge functions to a Redis on VPS implies bad cold start.
When self-hosting is obvious
And four profiles where paying managed is waste:
- Startup with R$10k–R$200k MRR optimizing cost. The difference between R$80/month and R$1,000/month of cache is 1% of the total cost of a small SaaS; it's also roughly 11 hours of a developer's salary. Worth doing the math.
- Predictable workload. If cache volume grows 10% per month, there's no advantage in serverless scaling. Reserved capacity on VPS is cheaper and more predictable.
- Team has 1+ person comfortable with Linux/Docker. If there's already someone who operates Postgres, nginx, Docker — Redis/Valkey is easier than any of them. Learning curve is days, not weeks.
- Already have own cluster. If the company runs an orchestrator (HeroCtl, Coolify, similar platform) with spare nodes, Valkey becomes just another job. Marginal cost close to zero — you already pay for the nodes.
HeroCtl as infrastructure for Valkey
For those operating HeroCtl, running Valkey in production is a short configuration exercise. A ~30-line file describes a job with:
- Official Valkey 8.x container
- Replicated named volume between nodes (data survives kill -9 of server)
- Reserved resources (RAM and CPU) with hard limits
- Health check on Valkey ping
- Internal routing between services (the app talks to valkey.servico.local without exposing a port to the internet)
Automated AOF + RDB backup to S3-compatible is available in the Business plan — without setting up external restic, without manual cron on the host. Valkey metrics come out via redis_exporter running as sidecar and appear in the internal Prometheus (already included as a job of the cluster itself, no external stack).
Sentinel failover is integrated with the orchestrator's control plane: if the Valkey master node falls, the cluster detects in around 7 seconds and the replica is promoted. The app's configuration is updated via service discovery — no manual redeploy.
For a startup with 4 servers running the orchestrator, this setup replaces an entire ElastiCache Multi-AZ deployment at zero marginal cost (the servers are already there). The real monthly difference is the salary-equivalent of one person, depending on the size of the operation.
Questions we get
Is Valkey compatible with Redis client libraries?
Yes, in 100% of practical cases. The protocol is identical — redis-cli, node-redis, ioredis, redis-rb, redis-py, go-redis, all work without changing a line. What changes is just the endpoint. In 2026, several libraries already announce explicit support for Valkey in the README, but that's cosmetic — the protocol is the same.
Can I migrate from managed Redis Labs to self-hosted Valkey without downtime?
Yes, with replication. Configure Valkey as Redis Labs replica (REPLICAOF host port), wait for sync (a few minutes to hours depending on dataset), promote Valkey to master (REPLICAOF NO ONE), do internal DNS cutover, decommission Redis Labs after observation period. Real error window is seconds during the swap.
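The steps above can be sketched as a runbook. Hostnames and the password variable are placeholders; managed Redis requires auth, so the replica needs masterauth before replication starts:

```shell
# Cutover runbook sketch. All names below are placeholders; adapt them.
NEW=valkey.internal                    # the self-hosted Valkey node
OLD=redis-12345.example-managed.com    # the managed Redis endpoint

# 0. Managed Redis is password-protected; give the replica the credential:
redis-cli -h "$NEW" CONFIG SET masterauth "$MANAGED_REDIS_PASSWORD"
# 1. Start replicating from the managed instance:
redis-cli -h "$NEW" REPLICAOF "$OLD" 6379
# 2. Poll until master_link_status:up and the offsets converge:
redis-cli -h "$NEW" INFO replication | grep master_link_status
# 3. Promote the replica to master:
redis-cli -h "$NEW" REPLICAOF NO ONE
# 4. Swap the internal DNS record to $NEW; decommission the managed
#    instance after an observation period.
```

Keep the old instance alive for a few days: rollback is just pointing DNS back.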
Is Dragonfly worth the BSL risk?
It depends on the company's horizon. BSL converts to Apache 2.0 after 4 years under the standard model, so today's code will be open by 2030. The risk is that the company behind it (DragonflyDB Inc) follows the path of Redis Inc and makes the conversion less friendly. For workloads that demand performance Valkey doesn't deliver (above 500k sustained ops/s), Dragonfly may be the right choice despite the risk. For everyone else, Valkey is the more conservative bet.
How much RAM does a Redis with 1 GB of useful data consume?
Practical math: a 1 GB dataset occupies between 1.3 and 2 GB of real RAM (structure overhead, fragmentation, client buffers, replication backlog). Configuring maxmemory at 60% of available RAM is a safe rule: a 4 GB instance gives ~2.4 GB of maxmemory, which holds roughly 1.5 to 2 GB of useful data with room to spare.
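The 60% rule as quick arithmetic:

```shell
# On a 4 GB instance, how much maxmemory should Valkey get?
total_mb=4096
maxmemory_mb=$(( total_mb * 60 / 100 ))   # 60% of total RAM
echo "maxmemory ${maxmemory_mb}mb"        # → maxmemory 2457mb
# With 1.3-2x structure overhead, ~2.4 GB of maxmemory holds roughly
# 1.5-2 GB of useful data.
```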
Does Sidekiq really need AOF? Sidekiq docs say it can run without.
The docs say it technically runs. In production, without AOF, any unexpected restart loses queued jobs that hadn't been persisted. For a "welcome email" queue, you find out when the customer complains. For a "recurring billing" queue, you find out when the accountant complains. AOF is cheap (a 5-10% I/O increment); the cost of not having it is large.
Cluster or Sentinel for an app processing 50k jobs/day?
Sentinel. 50k jobs/day is 0.6 ops/s on average and fits in 100 MB of Redis RAM. Cluster is overkill by an order of magnitude. Sentinel solves automatic failover with 1 master + 1 replica + 3 sentinels (3 sentinel processes on separate VPSes, which can coexist with other things).
Does ElastiCache São Paulo have good latency for an app running in São Paulo?
Yes, 1-3ms p99 within the same AZ. The problem isn't latency, it's cost and lock-in. Latency only becomes a topic if the app is on another provider (Hetzner FSN, DigitalOcean NYC) trying to talk to ElastiCache São Paulo; there it rises to 130-200ms and the argument disappears.
How do you back up self-hosted Valkey to survive a disaster?
Three layers. First: persistent AOF on local disk (survives a restart). Second: a daily RDB snapshot copied to S3-compatible storage (Wasabi, Backblaze B2, Cloudflare R2; all cheaper than AWS S3 for this case). Third: a weekly snapshot copied to another storage provider (second region, second vendor). Restic or rclone do the work. Total storage cost for a 4 GB Valkey backup: about US$1/month.
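The daily second-layer job can be sketched in a few lines of shell, assuming an rclone remote named s3backup (a hypothetical name) is already configured and the RDB path is the Debian default:

```shell
# Daily backup sketch for cron. Remote name and paths are placeholders.
STAMP=$(date +%F)                      # e.g. 2026-05-14
SRC=/var/lib/valkey/dump.rdb
DEST="s3backup:valkey-backups/dump-${STAMP}.rdb"
echo "would copy $SRC -> $DEST"
# On the real host, replace the echo above with:
#   rclone copyto "$SRC" "$DEST"
```

Pair it with a lifecycle rule on the bucket (expire objects after 30 days) so the storage bill stays flat.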
Closing
In 2026, "Redis in production" is a question with more nuance than it had in 2023. The original product's license changed, the Linux Foundation fork matured, the multi-thread alternatives are holding up, and the serverless offering has a real use case. Choosing among the four implementations and the three managed paths is an honest exercise; there's no single answer.
Our default recommendation for a Brazilian startup in 2026: self-hosted Valkey on its own cluster, Sentinel mode, AOF on if there are queues, monitoring with Prometheus. Cost in the R$80–R$230/month range, against R$600–R$2,000/month for equivalent managed alternatives. Full compatibility with any Redis library. No exposure to the RSAL license. A reversible migration if it ever becomes a problem.
To stand up this stack:
curl -sSL https://get.heroctl.com/install.sh | sh
And to read in parallel: Postgres in production: managed vs self-hosted (same analysis for the database) and How much does it cost to host a Brazilian SaaS in 2026 (the consolidated math of the whole stack).