Understanding await in iostat

What is the meaning of await in iostat?

The following is the description of the await field in the iostat man page.

$ man iostat
The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

In other words, await is a measure of disk I/O latency in milliseconds. The latency is measured from the moment a request reaches the front of the I/O scheduler until the I/O completes.

I/O path

The I/O path mainly includes the following stages, from the block layer down to the underlying storage device.

  • Get the I/O requests from the application (filesystem)
  • Merge the I/O requests into the existing device queue
  • Dispatch the I/O requests (by the I/O scheduler) to the device driver
  • Hypervisor scheduler, if virtualized
  • Multipathing, if any
  • Hardware handling
  • HBA driver
  • Transport (bus)
  • FC switch routing, if any
  • Storage controller queuing, caching and processing
  • Actual disk latency

How is the await time calculated?

await is the average time per I/O, measured in milliseconds. When the HBA/SAN latency is marginal, it consists mainly of the time spent in the I/O scheduler queue plus the time the storage spends servicing the request.
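As a rough illustration of "average time per I/O", the sketch below derives an await-like figure from two snapshots of a /proc/diskstats line. It assumes the classic field layout (reads completed, ms spent reading, writes completed, ms spent writing at columns 4, 7, 8 and 11) and divides the change in time by the change in completed I/Os. The sample lines and the helper names parse_diskstats_line and await_ms are made up for the example, and this is not a copy of the iostat source.

```python
from typing import NamedTuple, Tuple


class DiskStats(NamedTuple):
    reads: int      # reads completed
    read_ms: int    # milliseconds spent reading
    writes: int     # writes completed
    write_ms: int   # milliseconds spent writing


def parse_diskstats_line(line: str) -> Tuple[str, DiskStats]:
    """Parse one /proc/diskstats line (assumed classic field layout)."""
    f = line.split()
    # f[0]=major, f[1]=minor, f[2]=device name, f[3]=reads completed,
    # f[6]=ms reading, f[7]=writes completed, f[10]=ms writing
    return f[2], DiskStats(int(f[3]), int(f[6]), int(f[7]), int(f[10]))


def await_ms(prev: DiskStats, curr: DiskStats) -> float:
    """Average time per completed I/O between two snapshots, in ms."""
    ios = (curr.reads - prev.reads) + (curr.writes - prev.writes)
    ticks = (curr.read_ms - prev.read_ms) + (curr.write_ms - prev.write_ms)
    return ticks / ios if ios else 0.0


# Hypothetical snapshots of the same device, taken some seconds apart.
before = "   8       0 sda 1000 10 80000 5000 2000 20 160000 30000 0 8000 35000"
after = "   8       0 sda 1100 10 88000 5600 2100 25 168000 31400 3 8500 37000"

name, prev = parse_diskstats_line(before)
_, curr = parse_diskstats_line(after)
print(name, await_ms(prev, curr))  # 200 I/Os took 2000 ms in total -> 10.0
```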

There are two queues involved in the I/O processing path.

  • The queue in the I/O scheduler
  • The queue on the storage side (e.g. the controller)

nr_requests limits the maximum number of I/Os in the sorted request queue. The submitting thread is blocked if its I/O cannot be merged or inserted into the scheduler queue because the queue is full. Note that nr_requests applies to reads and writes separately.

After an I/O is passed to the driver, it is no longer in the scheduler queue and does not count against the nr_requests limit. However, it still counts toward avgqu-sz. So avgqu-sz can reach the sum of nr_requests and the LUN queue_depth.
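To make the bound above concrete, here is a minimal sketch that reads both limits from sysfs and adds them. It assumes a SCSI-style device that exposes queue/nr_requests and device/queue_depth under /sys/block/<dev>; other device types may not have the second file. The function names are hypothetical, and the result is only the rough ceiling described above, not a value iostat itself reports.

```python
from pathlib import Path


def avgqu_sz_upper_bound(nr_requests: int, queue_depth: int) -> int:
    """Rough ceiling on avgqu-sz: scheduler queue plus device queue."""
    return nr_requests + queue_depth


def read_int(path: Path) -> int:
    """Read a single integer from a sysfs-style file."""
    return int(path.read_text().strip())


def avgqu_sz_upper_bound_for(device: str, sysfs_root: str = "/sys/block") -> int:
    """Look up both limits for a device (assumed sysfs layout) and add them."""
    base = Path(sysfs_root) / device
    nr_requests = read_int(base / "queue" / "nr_requests")
    queue_depth = read_int(base / "device" / "queue_depth")
    return avgqu_sz_upper_bound(nr_requests, queue_depth)
```

For example, on a host where sda reports nr_requests=128 and queue_depth=32, avgqu_sz_upper_bound_for("sda") would return 160.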

How is the svctm time measured?

await measures the I/O latency on a per-I/O basis, while svctm takes parallel I/O into account. For example, if 100 I/Os are submitted to the I/O scheduler in parallel and queued onto storage (say queue_depth=50), and all 100 I/Os complete in 10ms, the await time would be 10ms, but the svctm time would be only about 0.1ms (10ms of device busy time divided by 100 I/Os).
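The difference can be worked through with a small calculation. The sketch below assumes a simplified model: every one of the 100 parallel I/Os waits the full 10ms, and svctm is taken as the device busy time divided by the number of completed I/Os. Under that model svctm comes out far smaller than await. The function names are illustrative only.

```python
def await_ms(total_wait_ms: float, completed_ios: int) -> float:
    """Average latency per I/O: sum of individual wait times / I/O count."""
    return total_wait_ms / completed_ios


def svctm_ms(busy_ms: float, completed_ios: int) -> float:
    """Busy time of the device spread over every I/O it completed."""
    return busy_ms / completed_ios


ios = 100          # I/Os submitted in parallel
elapsed_ms = 10.0  # wall-clock time until all of them completed

# Simplification: each I/O is counted as waiting the whole 10 ms.
total_wait_ms = ios * elapsed_ms

print(await_ms(total_wait_ms, ios))  # 10.0
print(svctm_ms(elapsed_ms, ios))     # 0.1
```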

Follow up

Since await includes both the time spent in the I/O scheduler and the time spent in storage queuing and servicing, we may want to break it down into the two phases using blktrace. It can tell us the time spent in the I/O scheduler queue (I2D) and the actual I/O service latency (D2C). For further study of blktrace, you can read this article.