Step 1: Requirements clarification

Functional requirements:

  1. user can post/delete tweets: text, pic, video
  2. user can follow/unfollow user
  3. user can view tweets - newsfeeds/timeline
    1. user can view all tweets for another user
    2. user can view all the tweets from all the people they follow in a timeline
  4. user can mark tweets as favorites/ like/dislike

additional requirements

  1. user can search for tweets based on keywords
  2. user can reply to a tweet
  3. user can edit a tweet
  4. user can tag other users in a tweet
  5. user can view trending tweets: top Twitter tags, top search trends, top tweets
  6. tweet notification
  7. who to follow: recommended people to follow

Non-functional requirements:

  • Latency:
    • view newsfeeds - latency ~ 200ms
    • view pics/videos latency
  • Availability: should be highly available
  • Consistency: If user is not able to see a tweet for a while, that's fine (in the interest of availability)

Step 2: System specification estimations

Assumptions

  • Total users: 1 Billion / DAU: 200 Million (20 % of total users) -> actually Twitter has a much higher DAU/total user ratio!
  • user follows 200 users
  • each user favorites 5 tweets per day
  • each user visits their timeline/newsfeed 2 times a day / visits 5 other people's pages
  • newsfeeds displays 20 tweets per page
  • 100 million new tweets per day / ~1,150 tweets / second
  • picture: 200 KB / 1 out of 5 tweets has a picture
  • video : 2 MB / 1 out of 10 tweets has a video

calculation

  • total favorites a day: 200 Million (DAU) * 5 = 1 Billion favorites
  • total tweets to be served in timelines / newsfeeds per day: 200 M (DAU) * (2 timeline visits + 5 other people's pages) * 20 tweets per page = 28 Billion tweets per day
    • compare with 100 M new tweets per day (~1,150 per second) => read heavy
  • total storage for tweets a day: 100 M (tweets per day) * (280 + 30) bytes = 31 GB / day, roughly 30 GB / day
    • each tweet has 140 chars, 2 bytes to store a char without compression => total 280 bytes
    • 30 bytes to store other metadata: ID, timestamp, user ID, etc
  • total storage for medias a day: 100 M (tweets per day) / 5 pictures * 200 KB + 100 M (tweets per day) / 10 videos * 2 MB = 24TB/day
  • Bandwidth
    • ingress: 24 TB / day = 290 MB / second (the 30 GB / day of text is negligible next to 24 TB)
    • egress(out):
      • text : 28 B * 280 bytes / 86400 s = 91 MB /s
      • picture : 28 B / 5 * 200 KB / 86400s = 13 GB /s
      • video (assume users watch every 3rd video they see in the timeline): 28 B /10/3 * 2 MB / 86400 s = 22 GB /s
      • total: 35 GB/s
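
The arithmetic above can be re-checked with a short script. This is only a sketch built from the assumptions listed in Step 2; the variable names are made up for illustration. It uses decimal units (1 TB = 10^12 bytes), so ingress comes out near 278 MB/s; counting 1 TB as 1024 × 1024 MB gives the ~290 MB/s quoted above.

```python
# Back-of-envelope estimates from the Step 2 assumptions.
# Decimal units are an assumption of this sketch.
SECONDS_PER_DAY = 86_400
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

dau = 200 * 10**6                 # daily active users
new_tweets_per_day = 100 * 10**6
timeline_visits = 2               # own timeline visits per user per day
profile_visits = 5                # other users' pages visited per day
tweets_per_page = 20
text_bytes = 140 * 2              # 140 chars, 2 bytes each
meta_bytes = 30                   # ID, timestamp, user ID, etc.
picture_bytes = 200 * KB          # 1 in 5 tweets has a picture
video_bytes = 2 * MB              # 1 in 10 tweets has a video

favorites_per_day = dau * 5
tweet_views_per_day = dau * (timeline_visits + profile_visits) * tweets_per_page
write_tps = new_tweets_per_day / SECONDS_PER_DAY

text_storage_per_day = new_tweets_per_day * (text_bytes + meta_bytes)
media_storage_per_day = (
    (new_tweets_per_day // 5) * picture_bytes
    + (new_tweets_per_day // 10) * video_bytes
)

ingress = media_storage_per_day / SECONDS_PER_DAY
egress_text = tweet_views_per_day * text_bytes / SECONDS_PER_DAY
egress_pictures = tweet_views_per_day / 5 * picture_bytes / SECONDS_PER_DAY
# Assume users watch every 3rd video they see in the timeline.
egress_videos = tweet_views_per_day / 10 / 3 * video_bytes / SECONDS_PER_DAY
egress_total = egress_text + egress_pictures + egress_videos

print(f"favorites/day:    {favorites_per_day / 1e9:.0f} B")
print(f"tweet views/day:  {tweet_views_per_day / 1e9:.0f} B")
print(f"new tweets/sec:   {write_tps:.0f}")
print(f"text storage/day: {text_storage_per_day / GB:.0f} GB")
print(f"media storage/day:{media_storage_per_day / TB:.0f} TB")
print(f"ingress:          {ingress / MB:.0f} MB/s")
print(f"egress total:     {egress_total / GB:.0f} GB/s")
```

Changing any single assumption (for example the share of tweets with video) and re-running shows quickly which terms dominate: media, not text, drives both storage and bandwidth.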

Step 3: System Interface Design (API)

postTweets(api_dev_key, tweet_data, tweet_location, user_location, media_ids)

Parameters:

  • api_dev_key (string): The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota.
  • tweet_data (string): The text of the tweet, typically up to 140 characters.
  • tweet_location (string): Optional location (longitude, latitude) this Tweet refers to.
  • user_location (string): Optional location (longitude, latitude) of the user adding the tweet.
  • media_ids (number[]): Optional list of media_ids to be associated with the Tweet. (All media, such as photos and videos, need to be uploaded separately.)

Returns: (string)
A successful post will return the URL to access that tweet. Otherwise, an appropriate HTTP error is returned.

Step 4: Define Data Model/Database Schema

User:

  • UserId: int (PK)
  • UserName: varchar(20)
  • Email: varchar(50)
  • DateOfBirth: datetime
  • RegisterDateTime: datetime
  • LastLogin: datetime
  • ApiDevKey: varchar(50)

Tweets:

  • TweetId: int(PK)
  • UserId: int (FK)
  • Content: varchar(140)
  • TweetLatitude: int
  • TweetLongitude: int
  • CreationDateTime: datetime
  • NumOfFavs: int
  • MediaIds

Liked:

  • TweetId: int(PK)
  • UserId: int(PK) TODO: why are both TweetId and UserId part of the PK?
  • CreationDateTime: datetime

UserFollow:

  • UserId1: int (PK)
  • UserId2: int (PK) TODO: why are both user IDs part of the PK?
  • CreatingDateTime: datetime
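
One common reading of the two-column keys in Liked and UserFollow is a composite primary key: the pair of IDs identifies the row, so the same user cannot like the same tweet twice, while one tweet can still have many likes. The sketch below tries this with SQLite from Python's standard library. The choice of SQLite, the TEXT timestamps, and the `like` helper are assumptions for illustration only, not a database recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Liked (
        TweetId          INTEGER NOT NULL,
        UserId           INTEGER NOT NULL,
        CreationDateTime TEXT    NOT NULL,
        PRIMARY KEY (TweetId, UserId)   -- the pair is unique, not each column
    );
    CREATE TABLE UserFollow (
        UserId1          INTEGER NOT NULL,
        UserId2          INTEGER NOT NULL,
        CreatingDateTime TEXT    NOT NULL,
        PRIMARY KEY (UserId1, UserId2)  -- one row per pair of users
    );
    """
)


def like(tweet_id: int, user_id: int, when: str) -> bool:
    """Record a like; return False if this user already liked this tweet."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Liked VALUES (?, ?, ?)", (tweet_id, user_id, when)
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when the (TweetId, UserId) pair already exists.
        return False
```

With this schema, `like(1, 10, ...)` succeeds once and is rejected on a repeat, while `like(1, 11, ...)` and `like(2, 10, ...)` are both accepted because only the pair must be unique.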

What database to choose? todo

Step 5: High-level Design

Per step 2: write: 290 MB / s; read: 35 GB / s. It is a read-heavy system.

At a high level, we need

  • multiple application servers to serve all these requests
  • load balancers in front of application servers for traffic distributions
  • an efficient database that can store all the new tweets and can support a huge number of reads on the backend
  • file storage to store photos and videos

Using the same block diagram

