Design Facebook Messenger / WhatsApp

Step 1. Requirements

  • one-on-one conversations between users, including delivery acknowledgements
  • keep track of users' online/offline status
  • support persistent storage of chat history (NOTE: not all chat apps store chat histories; some do not store them at all, so clarify this requirement)
  • group chat
  • push notification
  • minimum latency
  • highly consistent: a user should see the same history across different platforms/devices
  • availability: ideally high, but can be sacrificed for consistency (NOTE!!! this is the opposite of many systems, which sacrifice consistency for availability)

Step 2. Data Estimation

Assumption: 500 M DAU / each user sends 40 msg per day / 100 bytes per message

Calculation:

  • 20 billion messages/day (~230 K messages/s) -> 2 TB/day -> ~3.65 PB over 5 years
  • 230 K msg/s -> 23 MB/s upload and 23 MB/s download, since each message comes in once and goes out once (sanity-checked in the snippet below)
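
A quick sanity check of these numbers (a throwaway Python sketch; the DAU, message rate, and message size are just the assumptions above):

    DAU = 500_000_000           # daily active users (assumption from above)
    MSGS_PER_USER = 40          # messages per user per day
    MSG_SIZE = 100              # bytes per message

    msgs_per_day = DAU * MSGS_PER_USER            # 20 billion
    msgs_per_sec = msgs_per_day / 86_400          # ~231 K msg/s
    bytes_per_day = msgs_per_day * MSG_SIZE       # ~2 TB/day
    bytes_5_years = bytes_per_day * 365 * 5       # ~3.65 PB
    bandwidth = msgs_per_sec * MSG_SIZE           # ~23 MB/s each direction

    print(f"{msgs_per_sec:,.0f} msg/s")                   # 231,481 msg/s
    print(f"{bytes_per_day / 1e12:.1f} TB/day")           # 2.0 TB/day
    print(f"{bytes_5_years / 1e15:.2f} PB over 5 years")  # 3.65 PB
    print(f"{bandwidth / 1e6:.1f} MB/s in and out")       # 23.1 MB/s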

Step 3. High level design

a chat server to 1) receive messages, 2) send messages, 3) acknowledge message delivery and track user status, 4) store messages

Workflow:

  1. user1 sends msg1 to user2 via the chat server
  2. the server receives msg1 and sends an acknowledgement to user1
  3. the server stores msg1 in the database
  4. the server sends msg1 to user2
  5. user2 receives msg1 and sends an acknowledgement to the server
  6. the server notifies user1 that msg1 has been delivered to user2 (sketched below)
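
A minimal in-memory sketch of this workflow (Python; the connections dict and store list are hypothetical stand-ins for the users' open connections and the message database):

    connections = {}   # userID -> connection (here: a plain callback)
    store = []         # stand-in for the message database

    def ack(user, note):
        print(f"[to {user}] {note}")

    def send_message(sender, receiver, msg):
        ack(sender, f"server received {msg}")          # step 2: ack the sender
        store.append((sender, receiver, msg))          # step 3: persist the msg
        if receiver in connections:
            connections[receiver](sender, msg)         # step 4: push to receiver
            # steps 5-6: receiver's client acks, server notifies the sender
            ack(sender, f"{msg} delivered to {receiver}")

    connections["user2"] = lambda s, m: print(f"[user2] {s} says: {m}")
    send_message("user1", "user2", "msg1")             # step 1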

Step 4. Detailed component design

  • message receiving and sending
  • message storing and retrieving
  • managing users' online/offline status

Message Receiving and Sending

Push vs. pull when receiving messages

After user1 sends msg1 to the server, how does user2 get msg1 from the server? Pull vs. push in chat apps:

Pull

  • user2 can pull new messages from the chat server periodically.
    • - the server needs to keep track of which messages are pending delivery
    • - user2 cannot get a message until the next pull, so messages are not received in a timely manner unless user2 pulls frequently
    • - frequent pulling mostly gets empty responses when there are no new messages, wasting resources

Push

  • once the server gets a message from a sender, it pushes the message to the receiver.
    • all active/online users need to keep a connection open with the server
    • + the server does not need to keep track of pending messages for online users (failed delivery is a separate topic)
    • + minimal latency: the message is delivered instantly over the open connection

We should choose PUSH

Push techniques: long polling and WebSocket

long polling

  • a technique where the server elects to hold a client's connection open for as long as possible, delivering a response only after data becomes available or a timeout threshold has been reached.
  • + implemented on top of XMLHttpRequest, so it is near-universally supported by devices
  • - requires more implementation work on the server

WebSocket - would be a better choice

  • a thin transport layer built on top of the device's TCP/IP stack
  • exposes an as-close-to-raw-as-possible TCP communication layer to web application developers
  • + faster than long polling, with less data in transit
  • - does not automatically recover when the connection is terminated; you need to implement that yourself, though many client libraries handle it (see the reconnect sketch after this list)
  • - old browsers do not support it
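
Since WebSocket connections don't recover on their own, the client usually wraps the connection in a reconnect loop. A rough sketch using the third-party websockets package (the endpoint URL and backoff values are made up):

    import asyncio
    import websockets  # third-party: pip install websockets

    async def listen(url):
        backoff = 1
        while True:
            try:
                async with websockets.connect(url) as ws:
                    backoff = 1                    # reset after a successful connect
                    async for msg in ws:           # messages pushed by the server
                        print("received:", msg)
            except (OSError, websockets.ConnectionClosed):
                await asyncio.sleep(backoff)       # wait, then reconnect
                backoff = min(backoff * 2, 60)     # exponential backoff, capped

    asyncio.run(listen("wss://chat.example.com/ws"))   # hypothetical endpoint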

How does long polling work?

  • the client issues a request, starting a long-polling connection
  • if there are no updates, the server does not reply; it keeps the long-polling connection open until it times out
  • as soon as the server has updates, it sends them to the client immediately
  • upon receiving the response, the client immediately issues another request so that a connection stays open for future updates (see the client loop sketched below)
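
The client side of that loop might look roughly like this (Python requests; the endpoint is hypothetical, and the server is assumed to hold the request open for up to ~30 s):

    import requests

    def long_poll(url):
        while True:
            try:
                # The server holds this request open until a message arrives
                # or its own ~30 s timeout fires; we allow slightly longer.
                resp = requests.get(url, timeout=35)
                if resp.status_code == 200:
                    for m in resp.json():          # deliver new messages
                        print("new message:", m)
                # any other status (e.g. 204): no updates, just re-issue
            except requests.exceptions.Timeout:
                pass                               # timed out, reconnect

    long_poll("https://chat.example.com/poll?user=user2")  # hypothetical endpoint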

How does the server track long-polling connections for a user?

A hash table. key: userID, value: connection object
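
In code, that table is just a dict guarded against concurrent access (a sketch; the connection object is whatever handle the server can write to):

    import threading

    class ConnectionRegistry:
        """The hash table above: userID -> open connection."""

        def __init__(self):
            self._lock = threading.Lock()
            self._conns = {}                   # key: userID, value: connection

        def register(self, user_id, conn):
            with self._lock:
                self._conns[user_id] = conn

        def unregister(self, user_id):
            with self._lock:
                self._conns.pop(user_id, None)

        def get(self, user_id):
            with self._lock:
                return self._conns.get(user_id)   # None: the user is offline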

Sending a message to an offline user

  • have the sender's UI resend the message when the receiver comes back online
  • or have the server store the message for a while and retry sending it once the receiver comes back online: when the user reconnects, check whether there are any pending messages (should the client pull them? should they sit in a queue such as SQS? or in a database?); the store-and-retry option is sketched after this list
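
The store-and-retry option, sketched with a plain in-memory dict standing in for the queue or database (reuses the ConnectionRegistry sketch above; conn.send is a hypothetical method on the connection object):

    from collections import defaultdict

    pending = defaultdict(list)      # userID -> messages awaiting delivery

    def deliver(registry, user_id, msg):
        conn = registry.get(user_id)
        if conn is not None:
            conn.send(msg)                       # receiver online: push now
        else:
            pending[user_id].append(msg)         # receiver offline: park the msg

    def on_user_connect(registry, user_id, conn):
        registry.register(user_id, conn)
        for msg in pending.pop(user_id, []):     # flush anything queued while away
            conn.send(msg)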

Chat servers needed: assuming 500 M connections are maintained at the same time, and a modern server can handle ~50 K concurrent connections, we need 500 M / 50 K = 10 K servers, which in turn requires load balancers.

how do we know which server holds which user's connection?

A load balancer that maps each userId to a server and redirects the request accordingly (LB stickiness).
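
A sketch of the mapping the load balancer could keep. The simplest version hashes the userId onto one of the chat servers; consistent hashing would reduce reshuffling when servers are added or removed, but plain modulo shows the idea (server names are made up):

    import hashlib

    SERVERS = [f"chat-{i}.example.com" for i in range(10_000)]   # the ~10 K servers

    def server_for(user_id: str) -> str:
        # Stable hash so the same user always lands on the same server.
        # (Python's built-in hash() is salted per process, so use hashlib.)
        digest = hashlib.md5(user_id.encode()).digest()
        return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]

    print(server_for("user1"))   # every lookup for user1 returns the same server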

sent acknowledgement

When the server receives a message from the sender, it 1) sends the message to the receiver and then sends an acknowledgement to the sender, and 2) stores the message in the database (the acknowledgement does not need to wait for the message to be stored in the DB; see the sketch below).
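
A sketch of point 2 with asyncio: the DB write is fired off as a background task so delivery and the ack don't wait on it (db_write, push_to_receiver, and send_ack are hypothetical stand-ins):

    import asyncio

    async def on_message(sender, receiver, msg):
        asyncio.create_task(db_write(msg))        # 2) fire-and-forget persistence
        await push_to_receiver(receiver, msg)     # 1) deliver to the receiver...
        await send_ack(sender, msg)               # ...then ack the sender right away

    async def db_write(msg):
        await asyncio.sleep(0.05)                 # stand-in for a slow DB insert
        print(f"stored: {msg}")

    async def push_to_receiver(receiver, msg):
        print(f"push to {receiver}: {msg}")

    async def send_ack(sender, msg):
        print(f"ack to {sender}: {msg} received")

    async def main():
        await on_message("user1", "user2", "M1")
        await asyncio.sleep(0.1)                  # demo only: let the DB write finish

    asyncio.run(main())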

message sequence

  1. user1 sends M1 to user2 at 1:00 PM / user2 sends M2 to user1 at 1:00 PM
  2. M1 arrives at the server at 1:01 PM / M2 arrives at the server at 1:02 PM
  3. user1 sees: M1, M2 / user2 sees: M2, M1

Message Database

  date/time | sender | receiver | message content
  1:01 PM   | user1  | user2    | M1
  1:02 PM   | user2  | user1    | M2

On the same device, user1's and user2's messages would be laid out like this (assuming the sender's messages appear on the right):

User1's UI - consistent with server time order:

  (right) User1: M1 (1:01 PM server time)
  (left)  User2: M2 (1:02 PM server time)

User2's UI - not consistent with server time order:

  (right) User2: M2 (1:02 PM server time)
  (left)  User1: M1 (1:01 PM server time)

For user2, if he logs in on another device and re-fetches the message history from the database, his UI will look different:

User2's UI - new device (ordered by server time):

  (left)  User1: M1 (1:01 PM server time)
  (right) User2: M2 (1:02 PM server time)

So do we need to maintain a separate sequence for every user??? TODO: I don't know how to handle this yet. ref: https://stackoverflow.com/questions/65953525/how-does-the-messenger-maintain-the-sequencing-of-the-messages-during-chat-and-w

data storage and retrieval

  • use HBase
  • use pagination
    • use Limit/Offset with SQL databases, which already have LIMIT and OFFSET as part of the SQL SELECT syntax. A request would look like GET /items?limit=20&offset=100 and would return 20 rows starting after the 100th row (rows 101-120)
    • + easiest to implement; almost no coding required other than passing the parameters directly to the SQL query
    • + stateless on the server
    • + works regardless of custom sort_by parameters
    • - not performant for large offset values. Say you perform a query with a large offset value of 1000000: the database needs to scan and count rows starting from 0, and will skip (i.e. throw away) the first 1000000 rows
    • - not consistent when new items are inserted into the table (i.e. page drift), as shown below

Page drift is especially noticeable when we order items by newest first. Consider the following, which orders by decreasing id:

Query GET /items?offset=0&limit=15

10 new items are added to the table

Query GET /items?offset=15&limit=15. The second query returns only 5 new items, because adding 10 new items moved the offset window back by 10 items. To fix this, the client would really need to offset by 25 for the second query (GET /items?offset=25&limit=15), but the client can't possibly know about other objects being inserted into the table.
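
The drift is easy to reproduce against an in-memory table (sqlite3 here, ordering by decreasing id as in the example):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    db.executemany("INSERT INTO items (id) VALUES (?)",
                   [(i,) for i in range(1, 31)])              # 30 existing items

    def page(offset):
        rows = db.execute(
            "SELECT id FROM items ORDER BY id DESC LIMIT 15 OFFSET ?", (offset,))
        return [r[0] for r in rows]

    first = page(0)                                   # ids 30..16
    db.executemany("INSERT INTO items (id) VALUES (?)",
                   [(i,) for i in range(31, 41)])     # 10 new items arrive
    second = page(15)                                 # ids 25..11, not 15..1
    print(sorted(set(first) & set(second), reverse=True))   # 10 ids shown twice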

