Medium


Updated 2026-07-04

Design WhatsApp (Real-Time Chat)

How to design WhatsApp in a system design interview: WebSocket gateways, message delivery and ordering, offline storage, group fan-out, and end-to-end encryption — with interactive simulations of the hard parts.

Commonly asked at Meta, Signal

Why interviewers ask this question

Designing WhatsApp (or Messenger, Signal, Slack — the same question in different clothes) tests things a CRUD app never touches: persistent connections, message delivery guarantees, ordering, and fan-out. It is a staple at Meta, and common everywhere for mid-to-senior roles.

The question has a distinctive shape: the hard part is not storage or compute; it is that both ends of the system are unreliable phones that connect, disconnect, and change networks constantly, and users still expect every message to arrive exactly once, in order.

The 30-second answer

Each phone holds a WebSocket to a chat gateway. A session service tracks which gateway each user is on. Messages route sender → gateway → chat service → recipient's gateway; if the recipient is offline, the message parks in a per-user inbox and a push notification fires. Acks at each hop give at-least-once delivery; per-conversation sequence numbers give ordering; E2E encryption means servers only ever see ciphertext.

Step 1 — Requirements

Functional requirements

1:1 real-time text messaging with delivery states (sent ✓, delivered ✓✓, read).
Group chats (cap them — WhatsApp allows 1,024 members; say the number out loud).
Offline delivery: messages sent while a user is offline arrive when they reconnect.
Online/last-seen presence.
(De-scope after mentioning) media messages, voice/video calls, stories.

Non-functional requirements

Scale: ~2B users, ~100B messages/day ≈ 1.2M messages/sec average, several × at peak.
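The throughput arithmetic is worth showing on the whiteboard. Below is a minimal Python sketch. The peak multiplier of 3 is an assumption standing in for "several × at peak". It is not a measured figure.

```python
# Back-of-envelope throughput from the scale requirement.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def messages_per_second(messages_per_day: float) -> float:
    """Average write rate implied by a daily message volume."""
    return messages_per_day / SECONDS_PER_DAY


MESSAGES_PER_DAY = 100e9  # ~100B messages/day
PEAK_FACTOR = 3           # assumption: "several x" average at peak

average = messages_per_second(MESSAGES_PER_DAY)
peak = average * PEAK_FACTOR
print(f"average ~ {average / 1e6:.2f}M msg/s, peak ~ {peak / 1e6:.1f}M msg/s")
```

Saying "100B divided by ~86,400 seconds is about 1.2M per second" out loud is usually enough. The sketch only makes the peak assumption explicit.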
Latency: sub-second delivery when both parties are online.
Delivery guarantee: at-least-once with idempotent handling — losing messages is the one unforgivable sin in chat.
Privacy: end-to-end encryption; the server must not be able to read messages.

Scope it fast

Candidates who try to design messaging + calls + stories in 45 minutes design none of them. Say: "I'll design 1:1 messaging deeply, extend it to groups, and touch encryption — calls and media are out of scope unless you want them." Interviewers respect controlled scope.

Step 2 — The connection layer: why WebSockets

Plain HTTP request/response cannot push a message to a phone, because only the client can initiate a request. Three options exist:

Getting messages to the client

Technique	How	Verdict for chat
Short polling	Client asks "anything new?" every few seconds	Wasteful at 2B users; seconds of latency — no
Long polling	Server holds the request open until a message arrives	Better, but reconnect churn and one-directional — fallback only
WebSocket	One persistent, bidirectional TCP connection	The answer: server pushes instantly, client sends on the same socket

Each user keeps one WebSocket to a chat gateway server. A tuned server holds on the order of a few hundred thousand to 1M idle connections, so serving ~500M concurrent users takes a fleet of roughly 500 to 1,000 gateways.
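The same back-of-envelope style works for sizing the gateway fleet. This sketch assumes 500M concurrent connections, as above. The `headroom` parameter is a hypothetical knob for spare capacity, such as failover and deploy drains. It is not a figure from this article.

```python
import math


def gateways_needed(concurrent_users: int, connections_per_gateway: int,
                    headroom: float = 1.0) -> int:
    """Gateway count so every concurrent user gets a socket slot.

    headroom > 1.0 reserves spare capacity (hypothetical knob: e.g. 1.5
    keeps 50% of slots free for failover and deploy drains).
    """
    return math.ceil(concurrent_users * headroom / connections_per_gateway)


CONCURRENT_USERS = 500_000_000
for per_gateway in (500_000, 1_000_000):
    count = gateways_needed(CONCURRENT_USERS, per_gateway)
    print(f"{per_gateway:,} conns/gateway -> {count:,} gateways")
```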

That creates the routing problem the whole design hinges on: user B is connected to some gateway — which one? A session service (backed by Redis) maps userId → gatewayId, written on connect and cleared on disconnect. Message flow: sender → their gateway → chat service → look up B's gateway in the session service → push down B's socket.
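The session service's contract fits in a few lines. The sketch below is an in-memory stand-in: a plain dict takes the place of Redis. `SessionService`, `route`, and the gateway names are hypothetical.

One detail is worth saying out loud: clear the entry only if it still points at the disconnecting gateway. A phone that hops networks can reconnect to a new gateway before the old one notices the dead socket.

```python
from typing import Dict, Optional


class SessionService:
    """In-memory stand-in for the Redis-backed userId -> gatewayId map."""

    def __init__(self) -> None:
        self._sessions: Dict[str, str] = {}

    def on_connect(self, user_id: str, gateway_id: str) -> None:
        # Written on connect; a reconnect simply overwrites the old gateway.
        self._sessions[user_id] = gateway_id

    def on_disconnect(self, user_id: str, gateway_id: str) -> None:
        # Cleared on disconnect, but only if the entry still points at this
        # gateway, so a late event from an old gateway cannot erase a newer
        # session.
        if self._sessions.get(user_id) == gateway_id:
            del self._sessions[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        return self._sessions.get(user_id)


def route(sessions: SessionService, recipient: str) -> str:
    """Decide where the chat service sends a message for `recipient`."""
    gateway = sessions.lookup(recipient)
    if gateway is None:
        return "offline: park in inbox and send push notification"
    return f"push down socket on {gateway}"


sessions = SessionService()
sessions.on_connect("bob", "gw-17")
print(route(sessions, "bob"))           # push down socket on gw-17
sessions.on_connect("bob", "gw-42")     # Bob switches from Wi-Fi to cellular
sessions.on_disconnect("bob", "gw-17")  # stale event from the old gateway
print(route(sessions, "bob"))           # push down socket on gw-42
```

In Redis, that compare-and-delete would need to be atomic, for example via a small Lua script run with EVAL.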

Interactive simulation (WebSocket Pulse): watch how messages flow over a persistent, full-duplex socket versus repeated polling, and what happens to latency and wasted requests as the message rate changes.

Step 3 — Delivery guarantees, ordering, and offline users

This section is where senior candidates separate from junior ones. Walk the lifecycle of one message:

A sends. Client assigns a client-generated message ID (for idempotency) and sends over its socket.
Server persists first, acks second. The chat service writes the message to storage before acking — an ack means "durable", not "I saw it". A's app shows ✓.
Deliver to B. Session lookup finds B's gateway; the message is pushed. B's device acks receipt → A sees ✓✓. Read receipts are the same mechanism, triggered on view.
B is offline? The session lookup fails, so the message stays in B's inbox (a per-user queue of undelivered messages), and a push notification fires via APNs/FCM. On reconnect, B drains the inbox, acking each message; acked messages are deleted.
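The lifecycle above, reduced to its ordering rule, can be sketched as follows. In-memory dicts and lists stand in for durable storage, the sender's ack channel, the gateways, and APNs/FCM. `ChatService`, `handle_send`, and `on_device_ack` are hypothetical names.

Note that a live push does not remove the message from the inbox. Only the device's ack does that.

```python
from typing import Dict, List, Tuple


class ChatService:
    """Persist first, ack second, then deliver live or park and notify."""

    def __init__(self, sessions: Dict[str, str]) -> None:
        self.sessions = sessions                      # userId -> gatewayId
        self.inbox: Dict[str, Dict[str, dict]] = {}   # userId -> {msg_id: message}
        self.sender_acks: List[str] = []              # single tick sent to the sender
        self.pushed: List[Tuple[str, str, str]] = []  # (gateway, user, msg_id)
        self.notifications: List[str] = []            # offline users notified

    def handle_send(self, sender: str, recipient: str, msg_id: str, body: bytes) -> None:
        message = {"id": msg_id, "from": sender, "body": body}
        # 1. Persist: the message is durable in the recipient's inbox...
        self.inbox.setdefault(recipient, {})[msg_id] = message
        # 2. ...before the sender is acked, so the tick means "durable".
        self.sender_acks.append(msg_id)
        # 3. Deliver live if a session exists; otherwise notify and wait.
        gateway = self.sessions.get(recipient)
        if gateway is not None:
            self.pushed.append((gateway, recipient, msg_id))
        else:
            self.notifications.append(recipient)

    def pending(self, user: str) -> List[dict]:
        # Everything the device has not acked yet, oldest first: this is what
        # gets redelivered on reconnect or after a gateway crash.
        return list(self.inbox.get(user, {}).values())

    def on_device_ack(self, user: str, msg_id: str) -> None:
        # Device ack = double tick for the sender; the relay copy is deleted.
        self.inbox.get(user, {}).pop(msg_id, None)


svc = ChatService(sessions={"bob": "gw-17"})
svc.handle_send("alice", "bob", "a-1", b"<ciphertext>")
svc.handle_send("alice", "carol", "a-2", b"<ciphertext>")  # carol is offline
svc.on_device_ack("bob", "a-1")
```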

At-least-once + idempotency = effectively exactly-once. Retries can cause duplicates, so B deduplicates by the client-generated message ID.
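On the receiving device, deduplication is a set lookup keyed on the client-generated message ID. Below is a minimal sketch with hypothetical names (`Receiver`, `on_message`). Duplicates are still acked, so the server stops retrying.

```python
from typing import List, Set


class Receiver:
    """Device-side handling of at-least-once delivery."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()  # client-generated message IDs already shown
        self.timeline: List[str] = []

    def on_message(self, msg_id: str, text: str) -> str:
        """Show a message at most once and return the ack to send back.

        Duplicates are acked too; otherwise the server would keep retrying.
        """
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.timeline.append(text)
        return f"ack:{msg_id}"


device = Receiver()
device.on_message("a-1", "hi")
device.on_message("a-1", "hi")  # retry after a lost ack
device.on_message("a-2", "you there?")
print(device.timeline)          # ['hi', 'you there?']
```

The `seen` set here grows without bound. A real client would presumably keep it per conversation and trim it to a recent window, since retries only replay recent messages.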

Ordering: timestamps lie (clock skew, retries). Use a per-conversation monotonic sequence number assigned by whichever chat-service shard owns that conversation. Devices sort by it and can detect gaps ("message 41, 42, 44 — request 43").
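Because one shard owns each conversation, assigning sequence numbers needs no cross-node coordination. Gap detection on the device is a set difference. The sketch below uses hypothetical names, and its in-memory counter is a simplification.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List


class ConversationSequencer:
    """Lives on the shard that owns each conversation. A plain counter per
    conversation is enough because there is exactly one writer."""

    def __init__(self) -> None:
        self._last: DefaultDict[str, int] = defaultdict(int)

    def assign(self, conversation_id: str) -> int:
        self._last[conversation_id] += 1
        return self._last[conversation_id]


def missing_sequences(received: Iterable[int]) -> List[int]:
    """Sequence numbers absent between the lowest and highest a device holds."""
    have = set(received)
    if not have:
        return []
    return [s for s in range(min(have), max(have) + 1) if s not in have]


seq = ConversationSequencer()
print([seq.assign("chat-ab") for _ in range(3)])  # [1, 2, 3]
print(seq.assign("chat-cd"))                      # 1: counters are per conversation
print(missing_sequences([41, 42, 44]))            # [43]
```

In a real deployment, the counter has to survive shard failover, or a new owner could reissue numbers. One way is to persist the sequence number with each message and resume from the highest stored value.

This check also only finds holes between messages the device already has. Noticing that the newest messages are missing requires knowing the conversation's latest sequence number.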

Interactive simulation (Distributed Message Queue): messages are appended to partitioned topics, and consumers in a group read exclusively from their own partitions, which guarantees ordering within a partition. See how messages park in a queue for an offline consumer, and how acks remove them: the exact mechanic behind the offline inbox.

Follow-up they will ask

"What if the gateway crashes after the server ack but before delivery to B?" Answer: nothing is lost — the message is already durable in B's inbox; delivery is retried until B acks. This is exactly why you persist before acking.

Step 4 — Storage: 100B messages/day

100B messages/day at ~200 bytes each is ~20 TB/day of writes. This is a write-heavy, append-only workload whose reads are almost always "recent messages in one conversation", and that access pattern points directly at a wide-column store (Cassandra/HBase; Discord famously stored its messages in Cassandra before migrating to ScyllaDB):

Partition key: conversation_id — all messages of a chat live together.
Clustering key: sequence_number (descending) — "load the last 50 messages" is one contiguous read.
Any node can coordinate a write (leaderless replication with tunable consistency), so there is no single-leader bottleneck; that is what lets the cluster absorb 1M+ writes/sec.

WhatsApp's actual model is even cheaper: the server is a relay, not an archive — messages are deleted once delivered, and history lives on devices. Offering both models and their trade-off (server history = multi-device sync + storage cost; device history = privacy + cheap servers) is a strong senior move.

messages table (Cassandra-style)

CREATE TABLE messages (
  conversation_id  UUID,
  seq              BIGINT,        -- per-conversation sequence number
  sender_id        BIGINT,
  ciphertext       BLOB,          -- E2E encrypted; server never sees plaintext
  sent_at          TIMESTAMP,
  PRIMARY KEY ((conversation_id), seq)
) WITH CLUSTERING ORDER BY (seq DESC);
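An in-memory model of the table above shows why this key design makes "load the last 50 messages" cheap. Each `conversation_id` is one partition, and rows stay sorted by `seq`, so a page is a single contiguous slice. `MessagesTable` and `page` are hypothetical names, and the CQL in the docstring is only the rough equivalent.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

Row = Tuple[int, bytes]  # (seq, ciphertext)


class MessagesTable:
    """One partition per conversation_id, rows kept sorted by seq inside it
    (a wide-column store keeps this order on disk)."""

    def __init__(self) -> None:
        self._partitions: DefaultDict[str, List[Row]] = defaultdict(list)

    def insert(self, conversation_id: str, seq: int, ciphertext: bytes) -> None:
        bisect.insort(self._partitions[conversation_id], (seq, ciphertext))

    def page(self, conversation_id: str, limit: int = 50,
             before_seq: Optional[int] = None) -> List[Row]:
        """Newest-first page, roughly what this CQL returns:
        SELECT seq, ciphertext FROM messages
        WHERE conversation_id = ? AND seq < ? LIMIT 50;
        Only one partition is touched and the rows are contiguous."""
        rows = self._partitions.get(conversation_id, [])
        end = len(rows) if before_seq is None else bisect.bisect_left(rows, (before_seq,))
        return list(reversed(rows[max(0, end - limit):end]))


table = MessagesTable()
for n in range(1, 121):
    table.insert("chat-ab", n, b"<ciphertext>")
latest = table.page("chat-ab")                         # "load the last 50 messages"
older = table.page("chat-ab", limit=3, before_seq=71)  # scroll back
print(latest[0][0], latest[-1][0], [s for s, _ in older])  # 120 71 [70, 69, 68]
```

Scrolling back passes the oldest `seq` on screen as `before_seq`. The next page is again one range read inside the same partition.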

Step 5 — Group chats: the fan-out problem

A message to a 1,024-member group is not one message — it is one write, 1,024 deliveries. The chat service reads the member list, then for each member does the same session-lookup-and-push (or inbox-park) as the 1:1 case.
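Below is a sketch of the per-member loop, with a queue that keeps the sender's ack off the critical path. Dicts stand in for the membership store, the session service, and the inboxes. `GroupFanout`, `send`, and `work_one` are hypothetical names, and the sender is skipped for simplicity.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class GroupFanout:
    """One write plus a queued job per group message; a worker does the pushes."""

    def __init__(self, members: Dict[str, List[str]], sessions: Dict[str, str]) -> None:
        self.members = members                            # group_id -> user IDs
        self.sessions = sessions                          # user_id -> gateway_id
        self.jobs: Deque[Tuple[str, str, str]] = deque()  # (group, sender, msg_id)
        self.pushed: List[Tuple[str, str, str]] = []      # (gateway, user, msg_id)
        self.inbox: Dict[str, List[str]] = {}             # user -> parked msg_ids

    def send(self, group_id: str, sender: str, msg_id: str) -> str:
        # The single durable write would happen here; then enqueue the
        # fan-out and ack immediately instead of waiting for N deliveries.
        self.jobs.append((group_id, sender, msg_id))
        return f"ack:{msg_id}"

    def work_one(self) -> None:
        """Fan out one queued message: the same lookup-and-push as 1:1."""
        group_id, sender, msg_id = self.jobs.popleft()
        for user in self.members[group_id]:
            if user == sender:
                continue
            gateway = self.sessions.get(user)
            if gateway is not None:
                self.pushed.append((gateway, user, msg_id))
            else:
                self.inbox.setdefault(user, []).append(msg_id)


fan = GroupFanout(members={"g1": ["ann", "ben", "cat", "dan"]},
                  sessions={"ann": "gw-1", "ben": "gw-2"})
ack = fan.send("g1", "ann", "m-1")  # returns before any delivery happens
fan.work_one()
print(fan.pushed)  # [('gw-2', 'ben', 'm-1')]
print(fan.inbox)   # {'cat': ['m-1'], 'dan': ['m-1']}
```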

Fan-out happens asynchronously via a queue — the sender's ack must not wait for 1,024 pushes.
This is precisely why WhatsApp caps group size: fan-out cost grows linearly with members. Saying "the cap is a deliberate architectural choice, not a product whim" is a great line.
Presence has the same explosion: naively broadcasting online/offline to every contact is O(contacts) per status flap — batch it, debounce it, and only push presence for open conversations.
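Debouncing and batching presence can be sketched with explicit timestamps, which makes the behavior easy to follow. The 10-second window and all names (`PresenceDebouncer`, `watchers`) are assumptions. `watchers` models "users who currently have this conversation open".

```python
from typing import Dict, List, Set, Tuple

DEBOUNCE_SECONDS = 10.0  # assumed value; a real system would tune this


class PresenceDebouncer:
    """Publish a status only after it has held for DEBOUNCE_SECONDS, in
    batches, and only to viewers who have that user's chat open."""

    def __init__(self) -> None:
        self.pending: Dict[str, Tuple[str, float]] = {}  # user -> (status, since)
        self.published: Dict[str, str] = {}             # last status broadcast
        self.watchers: Dict[str, Set[str]] = {}         # user -> viewers with chat open

    def on_status(self, user: str, status: str, now: float) -> None:
        if self.published.get(user) == status:
            # Flapped back to what everyone already sees: nothing to send.
            self.pending.pop(user, None)
        elif user not in self.pending or self.pending[user][0] != status:
            self.pending[user] = (status, now)

    def flush(self, now: float) -> List[Tuple[str, str, str]]:
        """Run periodically; returns the (viewer, user, status) pushes to make."""
        pushes = []
        for user, (status, since) in list(self.pending.items()):
            if now - since >= DEBOUNCE_SECONDS:
                del self.pending[user]
                self.published[user] = status
                for viewer in sorted(self.watchers.get(user, set())):
                    pushes.append((viewer, user, status))
        return pushes


p = PresenceDebouncer()
p.watchers["bob"] = {"alice"}
p.on_status("bob", "online", now=0.0)
print(p.flush(now=5.0))   # []: not stable long enough yet
print(p.flush(now=10.0))  # [('alice', 'bob', 'online')]
p.on_status("bob", "offline", now=20.0)  # enters a tunnel...
p.on_status("bob", "online", now=21.0)   # ...and comes back
print(p.flush(now=40.0))  # []: the flap never reached anyone
```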

Tradeoff: Fan-out on write (push to every member immediately)

Pros

Instant delivery to online members
Simple read path — each user just reads their own inbox

Cons

Large groups amplify every message ×N
Wasted work for members who never open the chat
Hot groups (breaking-news channels) need special-casing toward fan-out-on-read

Step 6 — End-to-end encryption

You will not be asked to derive the Signal protocol, but you must get the model right:

E2E means the server relays ciphertext it cannot read. TLS alone is not E2E — TLS protects the pipe, but the server sees plaintext.
Each user has a public/private key pair; private keys never leave the device. Senders encrypt with the recipient's public key (in practice, a session key established via Diffie-Hellman key exchange — the Signal protocol adds forward secrecy by ratcheting keys every message).
Consequences for your design (this is what interviewers actually probe): the server cannot search message content, cannot moderate content server-side, and multi-device support becomes genuinely hard — each device has its own keys, so messages are encrypted per-device.
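The model above can be demonstrated with a deliberately toy sketch. It is not the Signal protocol, and it is not secure. It makes these substitutions:
- a toy Diffie-Hellman group over the prime 2^127 - 1 instead of X25519;
- no X3DH prekeys and no DH ratchet step;
- an XOR keystream in place of a real authenticated cipher such as AES-GCM.

What it does show faithfully is the shape of the design. The server only ever handles public values and ciphertext. A symmetric KDF chain, HMAC-SHA256 with inputs 0x01 and 0x02 in the style of the Double Ratchet, gives every message its own key. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Toy DH group: Mersenne prime 2**127 - 1, generator 3. NOT secure.
P = 2**127 - 1
G = 3


def keypair() -> Tuple[int, int]:
    private = secrets.randbelow(P - 3) + 2  # stays on the device
    return private, pow(G, private, P)      # (private, public)


def shared_secret(my_private: int, their_public: int) -> bytes:
    # Both sides compute G**(a*b) mod P; the server only saw G**a and G**b.
    s = pow(their_public, my_private, P)
    return hashlib.sha256(s.to_bytes(16, "big")).digest()


def ratchet(chain_key: bytes) -> Tuple[bytes, bytes]:
    """One symmetric-ratchet step: input 0x01 yields the message key, 0x02
    the next chain key. Discarding old keys is what gives forward secrecy."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 counter-mode keystream (encrypts and decrypts).
    Stand-in for a real AEAD; provides no authentication."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, stream))


alice_private, alice_public = keypair()
bob_private, bob_public = keypair()
alice_chain = shared_secret(alice_private, bob_public)
bob_chain = shared_secret(bob_private, alice_public)

relayed = []  # everything the server ever handles: ciphertext only
for plaintext in (b"Hello world!", b"Hello world!"):
    key, alice_chain = ratchet(alice_chain)
    relayed.append(toy_cipher(key, plaintext))

decrypted = []
for blob in relayed:
    key, bob_chain = ratchet(bob_chain)
    decrypted.append(toy_cipher(key, blob))

print(decrypted)                 # [b'Hello world!', b'Hello world!']
print(relayed[0] != relayed[1])  # True: same text, different message keys
```

The multi-device consequence follows directly: each device has its own key pair and chain. Sending to a user with three devices therefore means three separate encryptions.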

Interactive simulation (E2E Encryption, Signal Protocol / Double Ratchet): Alice obtains Bob's public key and encrypts her message, the server relays an encrypted blob it cannot read, and Bob decrypts with his private key. Step through key exchange and message encryption to see exactly what the server can and cannot read.

Common mistakes that cost offers

Designing chat over HTTP polling — the workload is push; reach for WebSockets in the first five minutes.
No session service. "The gateway pushes to B" without explaining how you find B's gateway means the design does not actually work.
Acking before persisting — one crash loses messages, and lost messages are game-over for a chat product.
Ordering by timestamp — clock skew makes cross-device timestamps unreliable; per-conversation sequence numbers are the answer.
Ignoring offline delivery: in the real system a large share of deliveries go through the inbox + push-notification path, not the live socket.
Claiming TLS gives end-to-end encryption — instant credibility loss on this question.

Senior-level signals

Graceful gateway drain on deploy (reconnect storms are real), idempotency keys end to end, backpressure when a device reconnects to a 10,000-message backlog, and the server-relay vs server-archive storage trade-off. Two of these unprompted is a strong-hire signal.
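Backpressure on a reconnecting device with a large backlog can be as simple as a window of unacked messages. The server sends the next batch only as the device acks the previous one. Below is a sketch with a hypothetical `BacklogDrainer` and an assumed window of 100 messages.

```python
from collections import deque
from typing import Deque, List


class BacklogDrainer:
    """Drain a reconnecting device's inbox through a bounded window of
    unacked messages instead of dumping the whole backlog at once."""

    def __init__(self, backlog: List[str], window: int = 100) -> None:
        self.backlog: Deque[str] = deque(backlog)  # oldest first
        self.window = window                       # assumed tuning knob
        self.in_flight: List[str] = []             # sent, not yet acked

    def next_batch(self) -> List[str]:
        # Send only what the window allows; a device that has not acked
        # anything gets nothing new (that is the backpressure).
        room = min(self.window - len(self.in_flight), len(self.backlog))
        batch = [self.backlog.popleft() for _ in range(room)]
        self.in_flight.extend(batch)
        return batch

    def on_ack(self, msg_ids: List[str]) -> None:
        acked = set(msg_ids)
        # Relay model: acked messages are deleted server-side.
        self.in_flight = [m for m in self.in_flight if m not in acked]

    def done(self) -> bool:
        return not self.backlog and not self.in_flight


drain = BacklogDrainer([f"m{i}" for i in range(10_000)], window=100)
first = drain.next_batch()
print(len(first), len(drain.next_batch()))  # 100 0: window full until acks arrive
drain.on_ack(first[:40])
print(len(drain.next_batch()))              # 40
```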

Frequently asked questions

Why does WhatsApp use WebSockets instead of HTTP polling?

Chat requires the server to push messages to clients instantly. Polling wastes enormous resources at 2 billion users and adds seconds of latency, while a persistent WebSocket lets both sides send at any moment over one connection. Long polling survives only as a fallback for restrictive networks.

How does WhatsApp deliver messages to offline users?

Undelivered messages are stored in a per-user inbox queue on the server and a push notification is sent through APNs or FCM. When the device reconnects, it drains the inbox, acknowledges each message, and the server deletes acknowledged messages.

How do you guarantee message ordering in a chat system?

Assign a monotonically increasing sequence number per conversation from the service shard that owns the conversation, rather than trusting timestamps, which suffer from clock skew. Clients sort by sequence number and can detect and re-request missing messages.

What database should I propose for a chat system in an interview?

A wide-column store like Cassandra or ScyllaDB fits best: the workload is write-heavy and append-only, and partitioning by conversation ID with messages clustered by sequence number makes "recent messages in this chat" a single fast read. Also mention WhatsApp's alternative: delete on delivery and keep history on devices.

Is TLS the same as end-to-end encryption?

No. TLS encrypts the connection between client and server, but the server still sees plaintext. End-to-end encryption means only the communicating devices hold the decryption keys, so the server relays ciphertext it can never read.
