General steps for system performance analysis

This post outlines general steps for system performance analysis. It covers the following topics.

  • The performance goals
  • Workload characterization
  • Drill-down analysis
  • Traditional performance tools
  • BCC/BPF performance tools

The Goals of performance analysis

In general, the goals of performance analysis are to improve end-user performance and reduce the operating cost. To achieve this, it’s important to make the performance measurable. We often use the following metrics to measure performance.

  • Rates - operation or request rates per second
  • Throughput - data transferred per second
  • Latency - the time to complete an operation or request, usually in milliseconds
  • Resource utilization - the resource usage in percentage
  • Cost - Price/performance ratio
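As a minimal sketch of how the first three metrics relate, the snippet below derives rate, throughput and mean latency from a list of completed requests. The `Request` record and the sample values are hypothetical and only serve the illustration; a real system would take these numbers from application logs or monitoring.

```python
from dataclasses import dataclass


@dataclass
class Request:
    start_s: float    # time the request was issued, in seconds (hypothetical field)
    end_s: float      # time the request completed, in seconds (hypothetical field)
    bytes_moved: int  # payload size in bytes (hypothetical field)


def summarize(requests):
    """Return (rate in ops/s, throughput in MiB/s, mean latency in ms)."""
    if not requests:
        raise ValueError("need at least one request")
    # Observation window: first issue time to last completion time.
    window_s = max(r.end_s for r in requests) - min(r.start_s for r in requests)
    if window_s <= 0:
        raise ValueError("observation window must be positive")
    rate = len(requests) / window_s
    throughput = sum(r.bytes_moved for r in requests) / window_s / (1024 * 1024)
    mean_latency_ms = sum(r.end_s - r.start_s for r in requests) / len(requests) * 1000
    return rate, throughput, mean_latency_ms


# Made-up sample: three requests over a two-second window.
sample = [
    Request(0.0, 0.5, 1024 * 1024),
    Request(0.5, 1.0, 1024 * 1024),
    Request(1.0, 2.0, 2 * 1024 * 1024),
]
rate, throughput, latency_ms = summarize(sample)
print(f"{rate:.2f} ops/s, {throughput:.2f} MiB/s, {latency_ms:.1f} ms mean latency")
```

A mean hides outliers, so in practice latency is often reported as percentiles as well.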

Rates, throughput and latency are usually the most important metrics for checking whether a performance goal is met. For example, suppose the throughput of a daily database backup, measured in MiB/s, is too low for the backup to complete within its backup window. We need to investigate the issue from the backup application down to the system layer to find a way to improve the performance. As a second example, suppose the latency at the cloud native storage volume layer is as high as 20ms while the underlying SSD latency is less than 1ms. This requires further analysis at the volume layer to find the cause of the roughly 19ms of extra latency.
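The backup example can be turned into a quick check. The sketch below computes the throughput a backup needs in order to finish within its window and compares it with a measured value. The data size, window length and measured throughput are made-up numbers, not figures from a real system.

```python
def required_throughput_mib_s(data_gib, window_hours):
    """Minimum sustained throughput (MiB/s) to move data_gib within window_hours."""
    if window_hours <= 0:
        raise ValueError("window must be positive")
    return data_gib * 1024 / (window_hours * 3600)


def fits_window(data_gib, measured_mib_s, window_hours):
    """True if the measured throughput is enough to finish inside the window."""
    return measured_mib_s >= required_throughput_mib_s(data_gib, window_hours)


# Hypothetical case: a 2 TiB database, a 4-hour window, 100 MiB/s measured.
needed = required_throughput_mib_s(2048, 4)
ok = fits_window(2048, 100, 4)
print(f"needed: {needed:.1f} MiB/s, meets goal: {ok}")
```

Stating the goal as a number like this makes it clear how large the gap is before any drill-down starts.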

Performance optimization is an endless effort, and how far to take it depends on the goal you are targeting. So, setting the goals is the first step before you start any further performance analysis activities.

Workload characterization

Performance analysis should be a systematic process. Before going further, you usually need to understand the system and application configuration and the workload applied to them. This is workload characterization.

Workload characterization tries to answer the following questions:

  • What application is running? What are the major components/features used in the application?
  • On what schedule does the workload run? What is the job concurrency?
  • What are the read/write patterns? Is the workload mixed read and write, read-only or write-only?
  • What are the rates, throughput and latency at application level?
  • What is the performance concern?

Sometimes, you may get a description of the workload from end-users. However, users usually do not describe the workload and its configuration clearly enough. It’s worth characterizing the workload with a custom profiler. An application-level workload profiler can be developed for this purpose, but this often requires application expertise. At the system level, you may leverage the BCC/BPF performance tools to profile the workload.
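As an illustration of what a simple profiler might report, the sketch below summarizes the read/write mix and average I/O size from a list of I/O records. The `(operation, size)` record format is an assumption made for this example. It is not the output format of any particular tool, so real trace output would need to be parsed into this shape first.

```python
from collections import Counter


def characterize(io_records):
    """Summarize the read/write mix of (operation, size_in_bytes) records."""
    ops = Counter()
    bytes_by_op = Counter()
    for op, size in io_records:
        ops[op] += 1
        bytes_by_op[op] += size
    total = sum(ops.values())
    if total == 0:
        raise ValueError("no I/O records")

    def avg_kib(op):
        # Average I/O size in KiB, or 0.0 if the operation never occurred.
        return bytes_by_op[op] / ops[op] / 1024 if ops[op] else 0.0

    return {
        "read_pct": 100.0 * ops["read"] / total,
        "write_pct": 100.0 * ops["write"] / total,
        "avg_read_kib": avg_kib("read"),
        "avg_write_kib": avg_kib("write"),
    }


# Hypothetical trace: three small reads and one large write.
trace = [("read", 4096), ("read", 4096), ("read", 8192), ("write", 65536)]
profile = characterize(trace)
print(profile)
```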

Drill-down analysis

Drill-down analysis starts from a clue and digs deeper, level by level, until you find the root cause of the performance issue.

The general process for drill-down analysis looks like this.

  1. Examine the high-level performance metrics and identify the point where performance degrades.
  2. For the workload point with degradation, focus on the four major system resources (CPU, memory, disk I/O and network) to see which one is the potential bottleneck.
  3. If it’s a hardware bottleneck, it might be resolved by scaling up or scaling out the system resources. Otherwise, it could be a software bottleneck in either kernel space or user space.
  4. Find a clue based on the collected metrics to drill down to the next level. Software bottleneck analysis often requires profiling and tracing effort to pinpoint the culprit.

To identify a hardware bottleneck, check whether any of the four major resources is saturated. For example, the system is likely CPU bound if the CPU utilization stays above 90%. The system is likely disk I/O bound if the disk is 100% busy and the wait queue is unexpectedly large. These thresholds are rules of thumb, so confirm the suspicion with further measurement.
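As a sketch of how such a CPU check could be computed by hand on Linux, the snippet below derives overall CPU utilization from two samples of the aggregate `cpu` line in `/proc/stat`. The two sample lines are made up for the example, and counting `iowait` as idle time is a simplifying assumption. The 90% threshold is the rule of thumb from the text, not a hard limit.

```python
def parse_cpu_line(line):
    """Return (total_ticks, idle_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Columns: user nice system idle iowait irq softirq steal.
    # guest and guest_nice are skipped because user and nice already include them.
    values = [int(v) for v in fields[1:9]]
    values += [0] * (8 - len(values))  # older kernels report fewer columns
    idle = values[3] + values[4]       # idle + iowait (assumption: iowait counts as idle)
    return sum(values), idle


def cpu_utilization_pct(before, after):
    """CPU utilization in percent between two samples of the 'cpu' line."""
    total_b, idle_b = parse_cpu_line(before)
    total_a, idle_a = parse_cpu_line(after)
    delta_total = total_a - total_b
    if delta_total <= 0:
        raise ValueError("samples must be taken at different times")
    return 100.0 * (1 - (idle_a - idle_b) / delta_total)


# Made-up samples taken some interval apart.
before = "cpu  1000 0 500 8000 500 0 0 0 0 0"
after = "cpu  1900 0 550 8040 510 0 0 0 0 0"
util = cpu_utilization_pct(before, after)
print(f"CPU utilization: {util:.1f}%", "(likely CPU bound)" if util > 90 else "")
```

On a live system the two lines would come from reading the first line of `/proc/stat` twice with a short sleep in between.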

If you go in the wrong direction, you will likely not find the root cause in one round of analysis. You then have to repeat the above steps to identify the right direction for root cause analysis (RCA). Keep in mind that finding a needle in a haystack is not easy work. You must be patient.

Traditional performance tools

During drill-down performance analysis, you can use the following tools commonly available on Linux. They are simple but very powerful in helping to determine the next direction of the analysis.

  • uptime - system load averages over the past 1 minute, 5 minutes and 15 minutes
  • dmesg - kernel messages, including system errors
  • vmstat - overview of system resource usage (processes, memory, swap, block I/O and CPU)
  • mpstat - per-CPU usage in different states
  • pidstat - CPU usage per process
  • iostat - disk I/O statistics (throughput, IOPS, latency, etc.)
  • netstat/sar - network throughput, TCP/IP connection stats
  • top - CPU/Memory usage per process and more
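The load averages that uptime prints are easier to judge once they are divided by the number of CPUs. The sketch below does this for a line in the `/proc/loadavg` format. The sample line and the CPU count are made up, and reading a per-CPU load above 1.0 as a sign of possible saturation is a common rule of thumb rather than a strict rule.

```python
def load_per_cpu(loadavg_text, cpu_count):
    """Return the 1, 5 and 15 minute load averages divided by the CPU count."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    # /proc/loadavg format: "<1min> <5min> <15min> <running>/<total> <last_pid>"
    one, five, fifteen = (float(v) for v in loadavg_text.split()[:3])
    return one / cpu_count, five / cpu_count, fifteen / cpu_count


# Made-up sample for a 4-CPU machine whose load is rising.
per_cpu = load_per_cpu("8.00 4.00 2.00 3/512 12345", 4)
print(per_cpu)
```

On a live system, `os.getloadavg()` and `os.cpu_count()` from the Python standard library provide the same inputs without parsing text.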

Please refer to this post for more detail on how to use traditional Linux tools to analyze performance.

BCC/BPF performance tools

The traditional tools give us a first look at system performance, especially resource usage. For further analysis, we can use (or create) BCC/BPF tools. Please refer to this post for more detail.