Estimate the scale of the system

It is always a good idea to estimate the scale of the system, because it helps check whether the designed system can fulfill its requirements. The quantities to estimate might include:

  • Number of users
  • Number of active users (NAU)
  • Requests per second (RPS)
  • Logins per second
  • Transactions per second (TPS) for e-commerce sites
  • Likes/dislikes per second, shares per second, comments per second for social media sites
  • Searches per second for sites with a search feature
  • Storage needed
  • Servers needed
  • Network bandwidth needed

To estimate the hardware resources needed, keep in mind that a computer system has four major resources:

  • CPU
  • Memory
  • Storage
  • Network

Estimate servers needed

Modern computer systems are multi-processor systems, ranging from a single CPU core to many cores. The following system exposes 32 logical CPUs (hardware threads): 2 sockets × 8 cores per socket × 2 threads per core.

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
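
The CPU(s) figure in the lscpu output is the product of three other fields. A minimal sketch of that arithmetic, where the function name logical_cpus is hypothetical and chosen only for illustration:

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of logical CPUs (hardware threads) the OS can schedule on."""
    return sockets * cores_per_socket * threads_per_core

# Values copied from the lscpu output above.
sockets, cores_per_socket, threads_per_core = 2, 8, 2

physical_cores = sockets * cores_per_socket
total_logical = logical_cpus(sockets, cores_per_socket, threads_per_core)

print(physical_cores)  # 16
print(total_logical)   # 32
```

Note that the machine has 16 physical cores. Two hardware threads on one core share execution resources, so 32 logical CPUs usually deliver less than twice the throughput of 16 cores.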

To estimate how many servers are needed, we can work through the following questions.

  1. How much work can a single CPU do?
  2. How much work can a single server do?
  3. How many servers are needed?

Let’s walk through an example.

Let’s say it takes 100ms for a single-core CPU system to handle a single client request. This means the system can handle 10 requests per second. Assuming throughput scales linearly with the number of CPUs, we can extrapolate that a system with 32 logical CPUs can handle 320 requests per second. Let’s say we have to handle 320,000 requests per second (RPS). This means 1000 servers are needed (320,000 / 320).

Notice that this is a rough calculation that only considers CPU needs. In practice, other performance bottlenecks might prevent a system from handling 320 requests per second. For example, the system might already be I/O bound before it runs out of CPU capacity. But this method still gives us a high-level estimate.
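
The three-step approach above can be sketched in a few lines. This is a CPU-only estimate under the same assumptions as the example (one request at a time per logical CPU, linear scaling); the function and parameter names are hypothetical.

```python
import math

def servers_needed(target_rps: int, latency_ms: float, logical_cpus: int) -> int:
    """CPU-only estimate of the number of servers for a target request rate.

    Assumes each logical CPU handles one request at a time and that
    throughput scales linearly with the number of logical CPUs.
    """
    rps_per_cpu = 1000 / latency_ms              # step 1: 100 ms/request -> 10 RPS
    rps_per_server = rps_per_cpu * logical_cpus  # step 2: 10 * 32 -> 320 RPS
    return math.ceil(target_rps / rps_per_server)  # step 3: round up to whole servers

print(servers_needed(320_000, 100, 32))  # 1000
```

Rounding up matters when the division is not exact: a target of 320,001 RPS would need 1001 servers under the same assumptions.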

Estimate storage needed

To estimate the storage needed, we can take the following steps.

  1. Identify the different data types
  2. Estimate the space needed for each data type
  3. Get the total space needed

Let’s take YouTube as an example to understand this approach.

  • Data types: videos, thumbnail images and comments.
  • Let’s assume there are roughly 2B users and that 5% of them (100M users) upload videos consistently. On average, each of these users uploads once a week (~50 videos per year), so roughly 13M videos (100M × 50 / 365, rounded down) are uploaded daily. Let’s assume a video is 10 minutes long on average and takes 50MB of storage space after compression. Let’s say each video has a thumbnail image of 20KB and about 5 comments of 200 bytes each. In total, the space needed for each video is 50MB + 20KB + 1KB, roughly 50MB. Multiplying by 13M videos gives about 619TB of storage per day (counting 1TB as 1024 × 1024 MB).
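
The storage arithmetic above can be reproduced with a short sketch. It uses the rounded figure of 13M videos per day and binary units (1TB = 1024⁴ bytes), which is what the 619TB figure implies; the function and constant names are hypothetical.

```python
# Binary units, matching the arithmetic in the text (assumption).
KB = 1024
MB = 1024 ** 2
TB = 1024 ** 4

def daily_storage_tb(videos_per_day: int, video_mb: float,
                     thumbnail_kb: float = 0, comments: int = 0,
                     comment_bytes: int = 0) -> float:
    """Total storage added per day, in TB, summed over all data types."""
    per_video_bytes = (video_mb * MB             # compressed video
                       + thumbnail_kb * KB       # thumbnail image
                       + comments * comment_bytes)  # comment text
    return videos_per_day * per_video_bytes / TB

# Video only: a little under 620 TB, which the text truncates to 619TB.
video_only = daily_storage_tb(13_000_000, 50)

# Including thumbnails and comments changes the total by well under 1 TB.
everything = daily_storage_tb(13_000_000, 50, thumbnail_kb=20,
                              comments=5, comment_bytes=200)

print(f"{video_only:.1f} TB, {everything:.1f} TB")
```

The second call shows why the thumbnail and comments can be dropped from a rough estimate: the video itself dominates the per-item size.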

Estimate network bandwidth needed

To estimate the network bandwidth, determine the incoming and outgoing data.

  • We already know there would be ~619TB of data uploaded to YouTube in a day. Dividing this by the number of seconds in a day (619TB / 86,400 seconds, with 1TB = 1024GB), the incoming data rate would be about 7.3GB/s.
  • Let’s say 10% of YouTube users are daily active users, which gives approximately 200M daily users. If each user watches 10 videos a day, YouTube would serve 2B views in a day. At roughly 50MB per video, this would result in ~93PB of outgoing data in a day (with 1PB = 1024 × 1024 × 1024 MB). Dividing this by the number of seconds in a day (93PB / 86,400 seconds), the outgoing data rate would be about 1128GB/s.
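
Both bandwidth figures come from the same division, which can be sketched as follows. The sketch uses binary units (1GB = 1024³ bytes) to match the numbers above, and the names are hypothetical. Because it starts from unrounded daily totals rather than the truncated 619TB and 93PB, its results differ slightly from the text (roughly 7.3GB/s in and 1130GB/s out).

```python
SECONDS_PER_DAY = 86_400
MB = 1024 ** 2
GB = 1024 ** 3

def rate_gb_per_s(bytes_per_day: float) -> float:
    """Average transfer rate in GB/s for a given daily data volume."""
    return bytes_per_day / SECONDS_PER_DAY / GB

# Incoming: 13M uploads per day at ~50MB each.
incoming_bytes = 13_000_000 * 50 * MB

# Outgoing: 200M daily users x 10 views each x ~50MB per view.
outgoing_bytes = 200_000_000 * 10 * 50 * MB

incoming_rate = rate_gb_per_s(incoming_bytes)
outgoing_rate = rate_gb_per_s(outgoing_bytes)

print(f"incoming: {incoming_rate:.1f} GB/s")
print(f"outgoing: {outgoing_rate:.0f} GB/s")
```

These are averages over a full day. Real traffic has peaks, so provisioned bandwidth would need headroom above these figures.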