Design a Distributed Cache (Redis-like)
Design a Redis-like distributed cache for a system design interview: LRU eviction, consistent hashing, replication, cache stampedes and hot keys — with interactive simulations of eviction and consistent hashing.
Why interviewers ask this question
Every large system you will ever design has a cache in it — so interviewers love making the cache itself the system under design. "Design Redis" tests distributed-systems fundamentals in their purest form: partitioning, replication, consistency, and failure handling, without the distraction of product features.
It is also a question where infrastructure teams (and companies like Redis Labs and Amazon) probe genuinely deep, because a wrong answer here — say, mod-N sharding — has a precise, demonstrable cost you are expected to know.
The 30-second answer
Step 1 — Requirements
Functional requirements
get(key),set(key, value, ttl),delete(key)— that is the whole API, and its simplicity is the point.- TTL-based expiry.
- Eviction when memory is full (LRU by default).
Non-functional requirements
- Latency: sub-millisecond — the entire reason the cache exists. Everything lives in RAM.
- Scale: say, 1 TB of cached data, 1M requests/sec — more than one machine, hence distributed.
- Availability over consistency: a cache that serves slightly stale data is useful; a cache that blocks for consensus is not. This single sentence shapes every choice below.
- Cache misses must be tolerable — the source of truth is always the database behind it.
Step 2 — A single node: hash table + eviction
Start on one machine — interviewers reward building up from a working core.
- Storage: an in-memory hash table → O(1) get/set.
- Expiry: lazy deletion (check TTL on read) plus a periodic sampling sweeper — exactly what Redis does. Purely lazy leaks memory for never-read keys; purely active burns CPU.
- Eviction: when memory is full, something must go. LRU — evict the least-recently-used key — is the default, implemented as a hash map + doubly-linked list so both lookup and "move to front" are O(1). This exact structure is also a favorite coding question; being fluent in it pays twice.
- Concurrency: Redis famously answers with a single-threaded event loop — no locks, no races, and it is still fast because everything is in RAM and operations are tiny.
Loading visualization...
| Policy | Evicts | Weakness |
|---|---|---|
| LRU | Least recently used key | One large scan flushes the whole cache |
| LFU | Least frequently used key | Old popular keys linger after they stop being hot |
| FIFO | Oldest inserted key | Ignores actual usage entirely |
| Random | A random key | Surprisingly decent; zero metadata cost |
Step 3 — Going distributed: consistent hashing
1 TB does not fit in one machine's RAM, so partition the keyspace. The naive approach — hash(key) % N — hides a disaster: when N changes (a node dies, or you add one), almost every key maps to a different node, and your hit rate drops to ~0 at the exact moment the cluster is already stressed. The database behind the cache then takes the full traffic. This scenario is why the question is asked.
Consistent hashing fixes it: place nodes on a hash ring; each key belongs to the next node clockwise. Adding or removing a node only moves ~1/N of the keys. Because raw ring placement is lumpy, each physical node appears as 100–200 virtual nodes on the ring, smoothing the distribution — and letting a beefier machine take proportionally more vnodes.
Loading visualization...
Who routes requests? Three options: a smart client library that knows the ring (memcached style — fastest, but ring updates must reach every client), a proxy tier (Twemproxy/Envoy — centralizes routing at the cost of a hop), or cluster-native redirects (Redis Cluster's MOVED responses). Any is acceptable — name the trade-off you are taking.
Step 4 — Replication and consistency
Without replication, a dead node means a cold shard and a database traffic spike (see the stampede section — the failure compounds). So each partition gets a primary and 1–2 replicas.
- Asynchronous replication: the primary acks the write, then streams it to replicas. A crash can lose the last few milliseconds of writes — for a cache this is fine, the database is still the source of truth. Synchronous replication would buy consistency at the cost of doubling write latency, inverting the cache's purpose.
- Failover: replicas heartbeat the primary; on failure a replica is promoted (Redis Sentinel pattern, or the cluster gossips and votes).
- Consistency model: eventual. A read from a lagging replica may be milliseconds stale. Say the magic words: stale-read window is bounded by replication lag, and the workload tolerates it.
Tradeoff: Asynchronous replication (the right call for a cache)
- Writes stay sub-millisecond — no cross-node round trip on the hot path
- Replicas can serve reads, multiplying read throughput
- Fast failover with only seconds of possible staleness
- A crashed primary loses its last unreplicated writes
- Reads from replicas can be briefly stale
- Split-brain needs care — fencing or quorum-based promotion
Step 5 — The failure modes that decide the interview
Cache stampede (thundering herd)
A hot key expires; 10,000 concurrent requests all miss, and all 10,000 hit the database for the same value. Defenses, in the order worth mentioning: request coalescing (only one loader per key per node; the rest wait), probabilistic early refresh (refresh shortly before expiry with jittered TTLs), and serve-stale-while-revalidate.
Hot keys
One celebrity key can saturate a single node no matter how well you shard. Fixes: replicate the hot key to multiple nodes and read from any (key#1..key#10 suffixing), or promote it to a tiny in-process cache on the application servers.
Cold start
A freshly restarted (or newly added) node has 0% hit rate. Cache warming — replaying recent keys or snapshotting — plus consistent hashing's bounded key movement keeps the blast radius small.
Big values
A 10 MB value behind a 1M RPS key is a bandwidth problem, not a memory problem. Cap value sizes and compress.
The trap hidden in this question
Step 6 — How applications use it: caching patterns
| Pattern | Write path | Read path | Use when |
|---|---|---|---|
| Cache-aside | App writes DB, invalidates cache | App checks cache, on miss reads DB and fills cache | Default choice; app controls everything |
| Read-through | App writes DB | Cache itself loads from DB on miss | You want loading logic centralized in the cache layer |
| Write-through | App writes cache; cache writes DB synchronously | Cache is always fresh | Read-heavy data that must not be stale |
| Write-behind | App writes cache; cache flushes to DB asynchronously | Fastest writes | Can tolerate losing buffered writes on crash |
In the interview, cache-aside with TTLs is almost always the right default, and invalidation is the follow-up: on a DB write, do you delete the cache entry or update it? Delete is safer — updating risks writing stale data under race conditions, while delete just costs one extra miss.
Common mistakes that cost offers
- Mod-N sharding — the single most damaging answer on this question; it collapses the hit rate on every topology change.
- Reaching for strong consistency — synchronous quorum writes in a cache betray a missed requirement; the database is the source of truth.
- No stampede story — hot-key expiry under concurrency is the most-asked follow-up; have coalescing and jittered TTLs ready.
- Ignoring memory management — a cache without an eviction policy is a memory leak with a network API.
- Persisting everything to disk — if most reads touch disk you have built a slow database, not a cache. (Optional snapshot/AOF persistence for faster warm-up is fine — as an option, not the hot path.)
Frequently asked questions
Why use consistent hashing instead of hash(key) % N?
With mod-N sharding, changing the node count remaps almost every key, so the hit rate collapses exactly when a node has just failed and the system is most fragile. Consistent hashing moves only about 1/N of the keys when a node joins or leaves, keeping the cache warm through topology changes.
Should a distributed cache use synchronous or asynchronous replication?
Asynchronous. A cache is an availability-and-latency system whose source of truth is the database behind it, so losing a few milliseconds of unreplicated writes on a crash is acceptable, while synchronous replication would add a cross-node round trip to every write and defeat the purpose of the cache.
What is a cache stampede and how do you prevent it?
A stampede happens when a popular key expires and thousands of concurrent requests all miss and hit the database simultaneously. Prevent it with request coalescing so only one request loads the key while others wait, jittered TTLs so keys do not expire together, and serving stale data while refreshing in the background.
What is the difference between Redis and Memcached in an interview answer?
Memcached is a pure multi-threaded key-value cache; Redis adds rich data structures, optional persistence, replication and cluster mode on a single-threaded core. For a "design a distributed cache" question, designing something Redis-like with consistent hashing, async replication, and LRU eviction covers what interviewers expect.
Which eviction policy should I choose?
LRU is the standard default and is implemented with a hash map plus doubly-linked list for O(1) operations. Mention LFU for frequency-skewed workloads and note LRU's weakness: a single large scan can flush the entire cache, which approximated variants like Redis's sampled LRU mitigate cheaply.
Reading only gets you halfway
Practice designing a Distributed Cache (Redis-like) step by step with an AI interviewer that evaluates your answers — free, no credit card.
Practice this problem free