Design WhatsApp (Real-Time Chat)
How to design WhatsApp in a system design interview: WebSocket gateways, message delivery and ordering, offline storage, group fan-out, and end-to-end encryption — with interactive simulations of the hard parts.
Why interviewers ask this question
Designing WhatsApp (or Messenger, Signal, Slack — the same question in different clothes) tests things a CRUD app never touches: persistent connections, message delivery guarantees, ordering, and fan-out. It is a staple at Meta, and common everywhere for mid-to-senior roles.
The question has a distinctive shape: the hard part is not storage or compute, it is that both ends of the system are unreliable phones that connect, disconnect, and change networks constantly — and users still expect every message to arrive exactly once, in order.
Step 1 — Requirements
Functional requirements
- 1:1 real-time text messaging with delivery states (sent ✓, delivered ✓✓, read).
- Group chats (cap them — WhatsApp allows 1,024 members; say the number out loud).
- Offline delivery: messages sent while a user is offline arrive when they reconnect.
- Online/last-seen presence.
- (De-scope after mentioning) media messages, voice/video calls, stories.
Non-functional requirements
- Scale: ~2B users, ~100B messages/day ≈ 1.2M messages/sec average, several × at peak.
- Latency: sub-second delivery when both parties are online.
- Delivery guarantee: at-least-once with idempotent handling — losing messages is the one unforgivable sin in chat.
- Privacy: end-to-end encryption; the server must not be able to read messages.
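The scale figure above is simple arithmetic, and it is worth being able to reproduce it on a whiteboard. A minimal sketch follows; the 3x peak factor is an assumption chosen for illustration, since the requirement only says "several ×".

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(messages_per_day: float) -> float:
    """Average messages per second for a given daily volume."""
    return messages_per_day / SECONDS_PER_DAY


def peak_rate(messages_per_day: float, peak_factor: float) -> float:
    """Peak rate, assuming peak traffic is peak_factor times the average."""
    return average_rate(messages_per_day) * peak_factor


avg = average_rate(100e9)          # 100B messages/day
peak = peak_rate(100e9, 3)         # 3x is an assumed peak factor
print(f"average: {avg / 1e6:.2f}M msg/s")   # average: 1.16M msg/s
print(f"peak:    {peak / 1e6:.2f}M msg/s")  # peak:    3.47M msg/s
```

The exact average is about 1.16M messages/sec, which the requirement rounds to 1.2M.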
Step 2 — The connection layer: why WebSockets
HTTP request/response cannot deliver a message to a phone — the server cannot initiate. Three options exist:
| Technique | How | Verdict for chat |
|---|---|---|
| Short polling | Client asks "anything new?" every few seconds | Wasteful at 2B users; seconds of latency — no |
| Long polling | Server holds the request open until a message arrives | Better, but reconnect churn and one-directional — fallback only |
| WebSocket | One persistent, bidirectional TCP connection | The answer: server pushes instantly, client sends on the same socket |
Each user keeps one WebSocket to a chat gateway server. A tuned server holds on the order of a few hundred thousand to 1M idle connections, so ~500M concurrent users need a fleet of somewhere between 500 and a couple of thousand gateways (call it roughly a thousand).
That creates the routing problem the whole design hinges on: user B is connected to some gateway — which one? A session service (backed by Redis) maps userId → gatewayId, written on connect and cleared on disconnect. Message flow: sender → their gateway → chat service → look up B's gateway in the session service → push down B's socket.
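The routing step can be sketched as follows. This is a minimal in-memory stand-in, not a real Redis client: the names `SessionService`, `Gateway`, and `route` are hypothetical, and the guard against stale disconnects is an added assumption rather than something the flow above specifies.

```python
from typing import Dict, List, Optional, Tuple


class SessionService:
    """In-memory stand-in for the Redis-backed userId -> gatewayId map."""

    def __init__(self) -> None:
        self._gateway_of: Dict[str, str] = {}

    def on_connect(self, user_id: str, gateway_id: str) -> None:
        self._gateway_of[user_id] = gateway_id

    def on_disconnect(self, user_id: str, gateway_id: str) -> None:
        # Assumption: only clear the entry if this gateway still owns it, so a
        # late disconnect from an old gateway cannot erase a newer connection.
        if self._gateway_of.get(user_id) == gateway_id:
            del self._gateway_of[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        return self._gateway_of.get(user_id)


class Gateway:
    """Records pushes instead of writing to a real socket."""

    def __init__(self, gateway_id: str) -> None:
        self.gateway_id = gateway_id
        self.pushed: List[Tuple[str, bytes]] = []

    def push(self, user_id: str, payload: bytes) -> None:
        self.pushed.append((user_id, payload))


def route(sessions: SessionService, gateways: Dict[str, Gateway],
          recipient_id: str, payload: bytes) -> bool:
    """Push to the recipient's gateway; return False if they are offline."""
    gateway_id = sessions.lookup(recipient_id)
    if gateway_id is None:
        return False
    gateways[gateway_id].push(recipient_id, payload)
    return True
```

A `False` return is the signal to fall back to the offline path described in Step 3.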
Step 3 — Delivery guarantees, ordering, and offline users
This section is where senior candidates separate from junior ones. Walk the lifecycle of one message:
- A sends. Client assigns a client-generated message ID (for idempotency) and sends over its socket.
- Server persists first, acks second. The chat service writes the message to storage before acking — an ack means "durable", not "I saw it". A's app shows ✓.
- Deliver to B. Session lookup finds B's gateway; the message is pushed. B's device acks receipt → A sees ✓✓. Read receipts are the same mechanism, triggered on view.
- B is offline? The session lookup fails, so the message stays in B's inbox (a per-user queue of undelivered messages), and a push notification fires via APNs/FCM. On reconnect, B drains the inbox, acking each message; acked messages are deleted.
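The lifecycle above can be sketched on the server side as follows. This is an illustrative in-memory model, not WhatsApp's implementation: `ChatService`, its method names, and the `is_online` callback are all hypothetical, and a Python dict stands in for durable storage.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Set


@dataclass(frozen=True)
class Message:
    message_id: str   # client-generated, used for idempotency
    sender: str
    recipient: str
    body: bytes


class ChatService:
    def __init__(self, is_online: Callable[[str], bool]) -> None:
        self._is_online = is_online
        self._inbox: Dict[str, Deque[Message]] = defaultdict(deque)
        # Simplification: a real system would scope IDs per sender and expire them.
        self._seen_ids: Set[str] = set()
        self.notifications: List[str] = []  # stand-in for APNs/FCM calls

    def handle_send(self, msg: Message) -> str:
        # Persist first: a retried send with a known ID is not stored twice.
        if msg.message_id not in self._seen_ids:
            self._seen_ids.add(msg.message_id)
            self._inbox[msg.recipient].append(msg)
        return "ack"  # ack second: the message is already stored

    def deliver_pending(self, recipient: str) -> List[Message]:
        """Return the messages to push, or notify if the recipient is offline."""
        pending = list(self._inbox[recipient])
        if not pending:
            return []
        if not self._is_online(recipient):
            self.notifications.append(recipient)
            return []
        return pending  # stand-in for pushing down the socket

    def handle_delivery_ack(self, recipient: str, message_id: str) -> None:
        # Only an ack from the recipient's device removes a message.
        self._inbox[recipient] = deque(
            m for m in self._inbox[recipient] if m.message_id != message_id
        )
```

Calling `deliver_pending` again on reconnect is the "drain the inbox" step: anything not yet acked is simply sent again.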
At-least-once + idempotency = effectively exactly-once. Retries can cause duplicates, so B deduplicates by the client-generated message ID.
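On the receiving device, deduplication can be as small as a set of seen IDs. The `Receiver` class below is a hypothetical sketch; a real client would persist the seen set and bound its size.

```python
from typing import List, Set


class Receiver:
    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.displayed: List[str] = []

    def on_message(self, message_id: str, text: str) -> str:
        """Show a message at most once, but ack every copy."""
        if message_id not in self._seen:
            self._seen.add(message_id)
            self.displayed.append(text)
        # Always ack, even for duplicates, so the server stops retrying.
        return message_id
```

The important detail is that a duplicate is still acked: the earlier ack may be the thing that got lost.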
Ordering: timestamps lie (clock skew, retries). Use a per-conversation monotonic sequence number assigned by whichever chat-service shard owns that conversation. Devices sort by it and can detect gaps ("message 41, 42, 44 — request 43").
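A minimal sketch of both halves, assuming a single process owns each conversation: `ConversationSequencer` plays the role of the owning shard and `find_gaps` is the client-side check. Both names are hypothetical.

```python
from typing import Dict, List


class ConversationSequencer:
    """Hands out 1, 2, 3, ... independently for each conversation."""

    def __init__(self) -> None:
        self._last: Dict[str, int] = {}

    def next_seq(self, conversation_id: str) -> int:
        seq = self._last.get(conversation_id, 0) + 1
        self._last[conversation_id] = seq
        return seq


def find_gaps(received: List[int], last_contiguous: int = 0) -> List[int]:
    """Sequence numbers missing between last_contiguous and max(received)."""
    if not received:
        return []
    have = set(received)
    return [s for s in range(last_contiguous + 1, max(received) + 1)
            if s not in have]


print(find_gaps([41, 42, 44], last_contiguous=40))  # [43]
```

A gap at the very end (the newest message never arrived) cannot be detected this way; the client only learns about it when a later message or a sync response reveals the higher sequence number.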
Step 4 — Storage: 100B messages/day
100B messages/day at ~200 bytes each is ~20 TB/day of writes. This is a write-heavy, append-only workload whose reads are almost always "recent messages in one conversation". That access pattern points directly at a wide-column store (Cassandra or HBase; Discord famously ran on Cassandra before migrating to ScyllaDB):
- Partition key: conversation_id, so all messages of a chat live together.
- Clustering key: sequence_number (descending), so "load the last 50 messages" is one contiguous read.
- Writes go to any replica (leaderless, tunable consistency), which is what lets it absorb 1M+ writes/sec.
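To see why this key layout makes the common read cheap, here is a small in-memory model of it. It is only an analogy for how a wide-column store groups and orders rows, not a database client; `MessageStore` and its methods are hypothetical names.

```python
from bisect import insort
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, bytes]  # (sequence number, ciphertext)


class MessageStore:
    def __init__(self) -> None:
        # Partition key -> rows kept sorted by the clustering key.
        self._partitions: Dict[str, List[Row]] = defaultdict(list)

    def append(self, conversation_id: str, seq: int, ciphertext: bytes) -> None:
        insort(self._partitions[conversation_id], (seq, ciphertext))

    def latest(self, conversation_id: str, limit: int = 50) -> List[Row]:
        """Newest-first slice from a single partition."""
        if limit <= 0:
            return []
        rows = self._partitions.get(conversation_id, [])
        return rows[-limit:][::-1]
```

This model keeps rows ascending and reverses on read; a store with a descending clustering order keeps the newest rows first, so the same query is a read from the head of the partition.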
WhatsApp's actual model is even cheaper: the server is a relay, not an archive — messages are deleted once delivered, and history lives on devices. Offering both models and their trade-off (server history = multi-device sync + storage cost; device history = privacy + cheap servers) is a strong senior move.
```sql
CREATE TABLE messages (
    conversation_id UUID,
    seq             BIGINT,     -- per-conversation sequence number
    sender_id       BIGINT,
    ciphertext      BLOB,       -- E2E encrypted; server never sees plaintext
    sent_at         TIMESTAMP,
    PRIMARY KEY ((conversation_id), seq)
) WITH CLUSTERING ORDER BY (seq DESC);
```
Step 5 — Group chats: the fan-out problem
A message to a 1,024-member group is not one message — it is one write, 1,024 deliveries. The chat service reads the member list, then for each member does the same session-lookup-and-push (or inbox-park) as the 1:1 case.
- Fan-out happens asynchronously via a queue — the sender's ack must not wait for 1,024 pushes.
- This is precisely why WhatsApp caps group size: fan-out cost grows linearly with members. Saying "the cap is a deliberate architectural choice, not a product whim" is a great line.
- Presence has the same explosion: naively broadcasting online/offline to every contact is O(contacts) per status flap — batch it, debounce it, and only push presence for open conversations.
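The asynchronous fan-out can be sketched with an in-process queue. This is a simplified illustration: `GroupFanout` is a hypothetical name, a `deque` stands in for a durable message queue, and the `deliver` callback stands in for the per-member session-lookup-and-push (or inbox-park) step.

```python
from collections import deque
from typing import Callable, Deque, Dict, Set, Tuple


class GroupFanout:
    def __init__(self, members: Dict[str, Set[str]],
                 deliver: Callable[[str, str], None]) -> None:
        self._members = members
        self._deliver = deliver
        self._queue: Deque[Tuple[str, str, str]] = deque()

    def send(self, group_id: str, sender: str, message_id: str) -> str:
        # One write (elided here), then enqueue. The ack does not wait
        # for any of the per-member pushes.
        self._queue.append((group_id, sender, message_id))
        return "ack"

    def run_worker(self) -> int:
        """Drain the queue; returns the number of deliveries performed."""
        deliveries = 0
        while self._queue:
            group_id, sender, message_id = self._queue.popleft()
            for member in sorted(self._members.get(group_id, set())):
                if member != sender:
                    self._deliver(member, message_id)
                    deliveries += 1
        return deliveries
```

The return value of `run_worker` makes the linear cost visible: one send to a group of N members produces N - 1 deliveries.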
Tradeoff: Fan-out on write (push to every member immediately)
Pros:
- Instant delivery to online members
- Simple read path: each user just reads their own inbox
Cons:
- Large groups amplify every message ×N
- Wasted work for members who never open the chat
- Hot groups (breaking-news channels) need special-casing toward fan-out-on-read
Step 6 — End-to-end encryption
You will not be asked to derive the Signal protocol, but you must get the model right:
- E2E means the server relays ciphertext it cannot read. TLS alone is not E2E — TLS protects the pipe, but the server sees plaintext.
- Each user has a public/private key pair; private keys never leave the device. Senders encrypt with the recipient's public key (in practice, a session key established via Diffie-Hellman key exchange — the Signal protocol adds forward secrecy by ratcheting keys every message).
- Consequences for your design (this is what interviewers actually probe): the server cannot search message content, cannot moderate content server-side, and multi-device support becomes genuinely hard — each device has its own keys, so messages are encrypted per-device.
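The model can be made concrete with a deliberately toy sketch. Everything here is an illustrative assumption and none of it is secure or representative of the actual Signal protocol: the group parameters are toy values, the XOR keystream stands in for a real authenticated cipher, and the one-line hash ratchet is a drastic simplification of key ratcheting. What it does show is that both devices derive the same key while the relay only ever handles ciphertext.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Toy Diffie-Hellman parameters, for illustration only.
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair() -> Tuple[int, int]:
    """Return (private, public); the private value never leaves the device."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def shared_key(my_private: int, their_public: int) -> bytes:
    """Both sides compute the same secret from their own private key."""
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


def ratchet(key: bytes) -> bytes:
    """Derive the next key one-way, so a new key does not reveal old ones."""
    return hmac.new(key, b"ratchet", hashlib.sha256).digest()


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy cipher (NOT secure): XOR with a SHA-256-derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


a_priv, a_pub = keypair()
b_priv, b_pub = keypair()
key_a = shared_key(a_priv, b_pub)   # computed on A's device
key_b = shared_key(b_priv, a_pub)   # computed on B's device
relayed = xor_cipher(key_a, b"hello")       # all the server ever sees
recovered = xor_cipher(key_b, relayed)      # decrypted on B's device
```

Only public keys and `relayed` cross the server, which is why server-side search and moderation of content are off the table in this design.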
Common mistakes that cost offers
- Designing chat over HTTP polling — the workload is push; reach for WebSockets in the first five minutes.
- No session service. "The gateway pushes to B" without explaining how you find B's gateway means the design does not actually work.
- Acking before persisting — one crash loses messages, and lost messages are game-over for a chat product.
- Ordering by timestamp — clock skew makes cross-device timestamps unreliable; per-conversation sequence numbers are the answer.
- Ignoring offline delivery: a large share of deliveries in a real system go through the inbox + push-notification path rather than the live socket.
- Claiming TLS gives end-to-end encryption — instant credibility loss on this question.
Frequently asked questions
Why does WhatsApp use WebSockets instead of HTTP polling?
Chat requires the server to push messages to clients instantly. Polling wastes enormous resources at 2 billion users and adds seconds of latency, while a persistent WebSocket lets both sides send at any moment over one connection. Long polling survives only as a fallback for restrictive networks.
How does WhatsApp deliver messages to offline users?
Undelivered messages are stored in a per-user inbox queue on the server and a push notification is sent through APNs or FCM. When the device reconnects, it drains the inbox, acknowledges each message, and the server deletes acknowledged messages.
How do you guarantee message ordering in a chat system?
Assign a monotonically increasing sequence number per conversation from the service shard that owns the conversation, rather than trusting timestamps, which suffer from clock skew. Clients sort by sequence number and can detect and re-request missing messages.
What database should I propose for a chat system in an interview?
A wide-column store like Cassandra or ScyllaDB fits best: the workload is write-heavy and append-only, and partitioning by conversation ID with messages clustered by sequence number makes "recent messages in this chat" a single fast read. Also mention WhatsApp's alternative: delete on delivery and keep history on devices.
Is TLS the same as end-to-end encryption?
No. TLS encrypts the connection between client and server, but the server still sees plaintext. End-to-end encryption means only the communicating devices hold the decryption keys, so the server relays ciphertext it can never read.