oneOokay posted:
Step 1: Requirements clarification
Functional requirements:
user can post/delete tweets: text, pic, video
user can follow/unfollow user
user can view tweets - newsfeeds/timeline
user can view all tweets for another user
user can view all the tweets from all the people they follow in a timeline
user can mark tweets as favorites / like / dislike
Additional requirements:
user can search for tweets based on keywords
user can reply to a tweet
user can edit a tweet
user can tag other users in a tweet
user can view trending tweets: top Twitter hashtags, top search trends, top tweets
tweet notifications
who to follow: recommended people to follow
Non-functional requirements:
Latency:
view newsfeeds - latency ~ 200ms
view pics/videos latency
Availability: should be highly available
Consistency: If user is not able to see a tweet for a while, that's fine (in the interest of availability)
Step 2: System specification estimations
Assumptions
Total users: 1 Billion / DAU: 200 Million (20% of total users) -> actually Twitter has a much higher DAU/total user ratio!
each user follows 200 users
each user favorites 5 tweets per day
each user visits their timeline/newsfeed 2 times a day and visits 5 other people's pages
newsfeeds displays 20 tweets per page
100 million new tweets per day / ~1,150 tweets per second (100 M / 86400 s)
picture: 200 KB / 1 out of 5 tweets has a picture
video: 2 MB / 1 out of 10 tweets has a video
calculation
total favorites a day: 200 Million (DAU) * 5 = 1 Billion favorites
total tweets that need to be generated in timelines/newsfeeds per day: 200 M (DAU) * (2 timeline visits + 5 other people's pages) * 20 tweets per page = 28 Billion tweets per day
compared with 100 M new tweets per day (about 1,150 per second), that is roughly 280 reads per write => read heavy
total storage for tweets a day: 100 M (tweets per day) * (280 + 30) bytes = ~30 GB / day
each tweet has 140 chars, 2 bytes to store a char without compression => total 280 bytes
30 bytes to store other metadata: ID, timestamp, user ID, etc.
total storage for media a day: (100 M / 5) pictures * 200 KB + (100 M / 10) videos * 2 MB = 4 TB + 20 TB = 24 TB / day
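The arithmetic above can be re-checked with a few lines of Python. This is only a sketch of the same back-of-envelope estimate; the variable names are made up for illustration, and decimal units (1 KB = 1000 bytes) are an assumption.

```python
# Back-of-envelope estimates from the assumptions listed above.
# Decimal units (1 KB = 1000 bytes) are an assumption of this sketch.
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

DAU = 200_000_000
NEW_TWEETS_PER_DAY = 100_000_000

# Each active user favorites 5 tweets per day.
favorites_per_day = DAU * 5

# 2 timeline visits + 5 visits to other users' pages, 20 tweets per page.
tweet_views_per_day = DAU * (2 + 5) * 20
read_write_ratio = tweet_views_per_day / NEW_TWEETS_PER_DAY

# 140 chars * 2 bytes per char + 30 bytes of metadata per tweet.
text_bytes_per_day = NEW_TWEETS_PER_DAY * (140 * 2 + 30)

# 1 in 5 tweets carries a 200 KB picture, 1 in 10 a 2 MB video.
picture_bytes_per_day = NEW_TWEETS_PER_DAY // 5 * 200 * KB
video_bytes_per_day = NEW_TWEETS_PER_DAY // 10 * 2 * MB
media_bytes_per_day = picture_bytes_per_day + video_bytes_per_day

print(f"favorites per day:   {favorites_per_day:,}")
print(f"tweet views per day: {tweet_views_per_day:,}")
print(f"reads per write:     {read_write_ratio:.0f}")
print(f"text storage:        {text_bytes_per_day / GB:.0f} GB/day")
print(f"media storage:       {media_bytes_per_day / TB:.0f} TB/day")
```

Text storage comes out at 31 GB/day, which the notes round to 30 GB/day.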
Bandwidth
ingress: 24 TB / day = 290 MB / second (the 30 GB/day of text is negligible next to 24 TB)
egress(out):
text : 28 B * 280 bytes / 86400 s = 93 MB /s
picture : 28 B / 5 * 200 KB / 86400s = 13 GB /s
video: (assume watch every 3rd video user sees in timeline) : 28 B /10/3 * 2 MB / 86400 s = 22 GB /s
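The bandwidth figures follow the same pattern and can be reproduced as below. Again a rough sketch: the names are illustrative, and decimal units are assumed, which is why ingress comes out slightly under the 290 MB/s quoted above (that figure matches binary units).

```python
# Bandwidth estimate from the daily volumes above.
# Decimal units (1 MB = 10**6 bytes) are an assumption of this sketch.
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12
SECONDS_PER_DAY = 86_400

tweet_views_per_day = 28 * 10**9   # from the timeline calculation
media_bytes_per_day = 24 * TB      # text (~30 GB/day) is ignored as negligible

# Ingress: new media uploaded per second.
ingress_mb_per_s = media_bytes_per_day / SECONDS_PER_DAY / MB

# Egress: bytes served to readers per second.
text_egress = tweet_views_per_day * 280 / SECONDS_PER_DAY
# 1 in 5 viewed tweets has a 200 KB picture.
picture_egress = tweet_views_per_day / 5 * 200 * KB / SECONDS_PER_DAY
# 1 in 10 viewed tweets has a 2 MB video, and every 3rd one is watched.
video_egress = tweet_views_per_day / 10 / 3 * 2 * MB / SECONDS_PER_DAY
total_egress_gb_per_s = (text_egress + picture_egress + video_egress) / GB

print(f"ingress:        {ingress_mb_per_s:.0f} MB/s")
print(f"text egress:    {text_egress / MB:.0f} MB/s")
print(f"picture egress: {picture_egress / GB:.0f} GB/s")
print(f"video egress:   {video_egress / GB:.0f} GB/s")
print(f"total egress:   {total_egress_gb_per_s:.0f} GB/s")
```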
Step 3: System APIs
Parameters of the API for posting a tweet:
api_dev_key (string): The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota.
tweet_data (string): The text of the tweet, typically up to 140 characters.
tweet_location (string): Optional location (longitude, latitude) this Tweet refers to.
user_location (string): Optional location (longitude, latitude) of the user adding the tweet.
media_ids (number[]): Optional list of media_ids to be associated with the Tweet. (all the media photo, video, etc. need to be uploaded separately).
Returns: (string) A successful post will return the URL to access that tweet. Otherwise, an appropriate HTTP error is returned.
Step 4: Define Data Model/Database Schema
User:
UserId: int (PK)
UserName: varchar(20)
Email: varchar(50)
DateOfBirth: datetime
RegisterDateTime: datetime
LastLogin: datetime
ApiDevKey: varchar(50)
Tweets:
TweetId: int(PK)
UserId: int (FK)
Content: varchar(140)
TweetLatitude: int
TweetLongitude: int
CreationDateTime: datetime
NumOfFavs: int
MediaIds
Liked:
TweetId: int(PK)
UserId: int(PK) todo why using both tweetID and UserId as the PK?
CreationDateTime: datetime
UserFollow:
UserId1: int (PK)
UserId2: int (PK) todo why using two userIds as PK?
CreatingDateTime: datetime
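One plausible answer to the two todo questions above: a like is identified by the (tweet, user) pair and a follow by the (follower, followee) pair, so a composite primary key keeps each pair unique without a separate ID column. The sketch below shows this with a trimmed-down version of the schema. SQLite is used only because it ships with Python; it is not a recommendation for the open database question, and the like helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    UserId   INTEGER PRIMARY KEY,
    UserName VARCHAR(20),
    Email    VARCHAR(50)
);
CREATE TABLE Tweets (
    TweetId          INTEGER PRIMARY KEY,
    UserId           INTEGER REFERENCES User(UserId),
    Content          VARCHAR(140),
    CreationDateTime DATETIME,
    NumOfFavs        INTEGER DEFAULT 0
);
-- Composite key: one row per (tweet, user) pair.
CREATE TABLE Liked (
    TweetId          INTEGER REFERENCES Tweets(TweetId),
    UserId           INTEGER REFERENCES User(UserId),
    CreationDateTime DATETIME,
    PRIMARY KEY (TweetId, UserId)
);
-- Composite key: one row per (follower, followee) pair.
CREATE TABLE UserFollow (
    UserId1          INTEGER REFERENCES User(UserId),
    UserId2          INTEGER REFERENCES User(UserId),
    CreatingDateTime DATETIME,
    PRIMARY KEY (UserId1, UserId2)
);
""")


def like(tweet_id: int, user_id: int) -> bool:
    """Record a like; return False if this user already liked this tweet."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Liked (TweetId, UserId, CreationDateTime) "
                "VALUES (?, ?, datetime('now'))",
                (tweet_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The composite primary key rejected a duplicate (tweet, user) pair.
        return False
```

With this key, the same user can like many tweets and a tweet can be liked by many users, but a repeated like of the same tweet by the same user is rejected.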
What database to choose? todo
Step 5: High-level Design
Per Step 2: write: 290 MB/s; read: 35 GB/s. It is a read-heavy system.
At a high level, we need
multiple application servers to serve all these requests
load balancers in front of application servers for traffic distributions
an efficient database that can store all the new tweets and can support a huge number of reads on the backend
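As a small illustration of the load balancer's role, the sketch below spreads requests across application servers in round-robin order. Round robin is an assumption here, since the notes do not name a balancing strategy, and the server names are made up.

```python
from itertools import cycle
from typing import Iterable, Tuple


class RoundRobinBalancer:
    """Hands each incoming request to the next application server in turn."""

    def __init__(self, servers: Iterable[str]):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one application server is required")
        self._servers = cycle(servers)

    def route(self, request: str) -> Tuple[str, str]:
        # Pick the next server in the rotation for this request.
        return next(self._servers), request


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.route(f"req-{i}")[0] for i in range(6)]
print(assignments)
```

A production balancer would also track server health and load; this only shows the distribution step.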