Using cgroups to limit block device bandwidth

The block I/O controller specifies upper IO rate limits on devices.

$ mount | egrep "/cgroup |/blkio"
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)

$ lscgroup  | grep blkio
blkio:/

In this post, we will learn how to use the following control files to limit block device bandwidth for the user tasks.

  • blkio.throttle.read_bps_device - Specifies upper limit on READ rate from the device. IO rate is specified in bytes per second. Rules are per device.
  • blkio.throttle.read_iops_device - Specifies upper limit on READ rate from the device. IO rate is specified in IO per second. Rules are per device.
  • blkio.throttle.write_bps_device - Specifies upper limit on WRITE rate to the device. IO rate is specified in bytes per second. Rules are per device.
  • blkio.throttle.write_iops_device - Specifies upper limit on WRITE rate to the device. IO rate is specified in io per second. Rules are per device.

Create blkio control group

Install libcgroup package to manage cgroups:

$ yum install libcgroup libcgroup-tools

Create blkio control group:

$ cgcreate -g blkio:/blkiolimited

$ lscgroup | grep blkio
blkio:/
blkio:/blkiolimited

$ ls /sys/fs/cgroup/blkio/blkiolimited/
blkio.bfq.io_service_bytes            blkio.bfq.weight                 blkio.throttle.io_service_bytes_recursive  blkio.throttle.read_iops_device   cgroup.procs
blkio.bfq.io_service_bytes_recursive  blkio.bfq.weight_device          blkio.throttle.io_serviced                 blkio.throttle.write_bps_device   notify_on_release
blkio.bfq.io_serviced                 blkio.reset_stats                blkio.throttle.io_serviced_recursive       blkio.throttle.write_iops_device  tasks
blkio.bfq.io_serviced_recursive       blkio.throttle.io_service_bytes  blkio.throttle.read_bps_device             cgroup.clone_children

Limit the block device bandwidth

Limit block device bandwidth for root group

To specify a bandwidth rate on particular device for root group, we can use the policy format as “major:minor bytes_per_second”.

The following example puts a bandwidth limit of 1MB/s on writes for root group on device having major/minor number 259:12.

$ ls -la /dev/ | grep "nvme3n1"
brw-rw----   1 root disk    259,  12 Dec 23 21:44 nvme3n1

$ echo "259:12  1048576" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device

Limit block device bandwidth for user defined group

The following examples specifies the IOPS limit on the device 259:12 for the user defined cgroups “blkio:/blkiolimited”.

Use control files to specify the limit directly:

$ cat /sys/fs/cgroup/blkio/blkiolimited/blkio.throttle.write_iops_device
$ echo "259:12  8192" > /sys/fs/cgroup/blkio/blkiolimited/blkio.throttle.write_iops_device
$ cat /sys/fs/cgroup/blkio/blkiolimited/blkio.throttle.write_iops_device
259:12 8192

Use libcgroup tools to specify the limit:

$ cgset -r blkio.throttle.write_iops_device="259:12 8192" blkiolimited
$ cgget -r blkio.throttle.write_iops_device blkiolimited
blkiolimited:
blkio.throttle.write_iops_device: 259:12 8192

Verify the disk bandwidth usage

Use fio to write 50G data on root group(unlimited bandwidth)

$ fio --blocksize=4k --ioengine=libaio --readwrite=randwrite --filesize=50G --group_reporting --direct=1 --iodepth=128 --end_fsync=1 --name=job1 --filename=/mnt/fio.dat


$ iostat -ktdx 5 | grep "nvme3n1"
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme3n1           0.00     0.05    0.00    8.02     0.01   508.54   126.83     0.06    7.80    0.08    7.80   0.04   0.03
nvme3n1           0.00     0.00    0.00 133828.60     0.00 535314.40     8.00     1.18    0.01    0.00    0.01   0.00  55.64
nvme3n1           0.00     0.20    0.00 246289.20     0.00 985157.60     8.00     2.21    0.01    0.00    0.01   0.00 100.04
nvme3n1           0.00     0.20    0.00 246518.80     0.00 986076.80     8.00     2.23    0.01    0.00    0.01   0.00 100.04
nvme3n1           0.00     0.20    0.00 244115.60     0.00 976462.40     8.00     2.24    0.01    0.00    0.01   0.00 100.00
nvme3n1           0.00     0.20    0.00 240184.00     0.00 960736.80     8.00     2.23    0.01    0.00    0.01   0.00 100.04
nvme3n1           0.00     0.20    0.00 250391.60     0.00 1001567.20     8.00     2.41    0.01    0.00    0.01   0.00 100.04
nvme3n1           0.00     0.20    0.00 262449.60     0.00 1049799.20     8.00     2.53    0.01    0.00    0.01   0.00 100.00
nvme3n1           0.00     0.20    0.00 252171.20     0.00 1008685.60     8.00     2.30    0.01    0.00    0.01   0.00 100.00
nvme3n1           0.00     0.20    0.00 236467.60     0.00 945872.00     8.00     2.16    0.01    0.00    0.01   0.00 100.00
nvme3n1           0.00     0.20    0.00 255060.80     0.00 1020244.00     8.00    17.04    0.07    0.00    0.07   0.00 100.00
nvme3n1           0.00     0.20    0.00 235199.60     0.00 940798.40     8.00    44.76    0.19    0.00    0.19   0.00 100.00
nvme3n1           0.00     0.20    0.00 18768.00     0.00 75072.80     8.00     5.81    0.31    0.00    0.31   0.00   8.40
nvme3n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
^C

Use fio to write 50G data on user defined group(limited bandwidth)

$ cgget -r blkio.throttle.write_iops_device blkiolimited
blkiolimited:
blkio.throttle.write_iops_device: 259:12 8192

$ cgexec -g blkio:blkiolimited fio --blocksize=4k --ioengine=libaio --readwrite=randwrite --filesize=50G --group_reporting --direct=1 --iodepth=128 --end_fsync=1 --name=job1 --filename=/mnt/fio.dat


$ iostat -ktdx 5 | grep "nvme3n1"
nvme3n1           0.00     0.05    0.00   24.54     0.01   574.51    46.82     0.06    2.57    0.08    2.57   0.01   0.04
nvme3n1           0.00     0.00    0.00 3112.20     0.00 12448.80     8.00     0.04    0.01    0.00    0.01   0.01   4.60
nvme3n1           0.00     0.20    0.00 8190.40     0.00 32762.40     8.00     0.11    0.01    0.00    0.01   0.01  11.92
nvme3n1           0.00     0.20    0.00 8190.40     0.00 32762.40     8.00     0.12    0.01    0.00    0.01   0.01  12.02
nvme3n1           0.00     0.20    0.00 8190.40     0.00 32762.40     8.00     0.12    0.01    0.00    0.01   0.01  12.22
nvme3n1           0.00     0.20    0.00 8190.40     0.00 32762.40     8.00     0.12    0.01    0.00    0.01   0.01  12.10
nvme3n1           0.00     0.20    0.00 8190.40     0.00 32762.40     8.00     0.12    0.01    0.00    0.01   0.02  12.36
^C

Reference