Using systemtap to analyze latency of the kernel module function

In this post, we continue to explore how to use SystemTap to analyze the latency of the kernel module function. In the following example, we want to analyze the latency of the function “nfsd_vfs_write” from the kernel module “nfsd”.

Deploy the SystemTap packages

Refer to this post to deploy SystemTap and its required packages.

Check the nfsd kernel module info

[root@host1 ~]# uname -r
3.10.0-1160.el7.x86_64

[root@host1 ~]# ls /lib/modules/`uname -r`/kernel/
arch  crypto  drivers  fs  kernel  lib  mm  net  sound  virt
[root@host1 ~]# ls /lib/modules/`uname -r`/kernel/fs
binfmt_misc.ko.xz  btrfs  cachefiles  ceph  cifs  cramfs  dlm  exofs  ext4  fat  fscache  fuse  gfs2  isofs  jbd2  lockd  mbcache.ko.xz  nfs  nfs_common  nfsd  nls  overlayfs  pstore  squashfs  udf  xfs

[root@host1 ~]# ls /lib/modules/`uname -r`/kernel/fs/ext4
ext4.ko.xz
[root@host1 ~]# ls /lib/modules/`uname -r`/kernel/fs/xfs
xfs.ko.xz
[root@host1 ~]# ls /lib/modules/`uname -r`/kernel/fs/btrfs
btrfs.ko.xz
[root@host1 ~]# ls /lib/modules/`uname -r`/kernel/fs/nfs
blocklayout  filelayout  flexfilelayout  nfs.ko.xz  nfsv3.ko.xz  nfsv4.ko.xz  objlayout


[root@host1 ~]# lsmod | grep "nfsd "
nfsd                  351321  13
[root@host1 ~]# modinfo nfsd
filename:       /lib/modules/3.10.0-1160.el7.x86_64/kernel/fs/nfsd/nfsd.ko.xz
license:        GPL
author:         Olaf Kirch <okir@monad.swb.de>
alias:          fs-nfsd
retpoline:      Y
rhelversion:    7.9
srcversion:     61A6390CD82AA4A7492CB06
depends:        auth_rpcgss,sunrpc,grace,lockd,nfs_acl
intree:         Y
vermagic:       3.10.0-1160.el7.x86_64 SMP mod_unload modversions
signer:         CentOS Linux kernel signing key
sig_key:        E1:FD:B0:E2:A7:E8:61:A1:D1:CA:80:A2:3D:CF:0D:BA:3A:A4:AD:F5
sig_hashalgo:   sha256
parm:           cltrack_prog:Path to the nfsdcltrack upcall program (string)
parm:           cltrack_legacy_disable:Disable legacy recoverydir conversion. Default: false (bool)
parm:           nfs4_disable_idmapping:Turn off server's NFSv4 idmapping when using 'sec=sys' (bool)

[root@host1 ~]# cat /proc/kallsyms |  egrep -i -w "nfsd_vfs_write"
ffffffffc094fdd0 t nfsd_vfs_write	[nfsd] 

SystemTap script

In the following SystemTap script, we have implemented

  • a probe to detect the kernel module “nfsd” and its function “nfsd_vfs_write”

  • a probe to detect the return of the function “nfsd_vfs_write”

  • handlers to get the execname, pid, tid and timestamp in each probe function

  • a probe as timer(5 seconds tracing)

  • a probe “end” to analyze the collected runtime of the function “nfsd_vfs_write” in the end of tracing

    [root@host1 ~]# cat nfs_trace.stp
    global count
    global start_t,diff_t[1000000]

    probe module(“nfsd”).function(“nfsd_vfs_write”){
    count++
    e = execname()
    p = pid()
    t = tid()
    start_t[e,p,t] = gettimeofday_us()
    }

    probe module(“nfsd”).function(“nfsd_vfs_write”).return {
    e = execname()
    p = pid()
    t = tid()
    start_ts = start_t[e,p,t]
    end_ts = gettimeofday_us()
    if(start_ts>0)
    diff_t[e,p,t,start_ts] = end_ts - start_ts
    }

    probe timer.s(5){exit()}

    probe end{
    count=1
    total_time=0
    foreach([e,p,t,ts] in diff_t){
    printf(“nfsd_vfs_write(%d %s %d %d %d) call time: %d\n”,count,e,p,t,ts,diff_t[e,p,t,ts])
    count++
    total_time+=diff_t[e,p,t,ts]
    }
    count–
    printf(“nfsd_vfs_write total call time(us): %d\n”,total_time)
    printf(“nfsd_vfs_write calls: %d\n”,count)
    printf(“nfsd_vfs_write average call time(us): %d\n”,total_time/count)
    }

Run the SystemTap script

[root@host1 ~]# cat stp.sh
runid=$1
sleep 10
stap -D MAXACTION=1000000 nfs_trace.stp > nfs_trace.${runid}.out

[root@host1 ~]# ./stp.sh kernel-3.10-run1

Note that, the script will wait for 10 seconds before tracing. This is to make sure the network workload to be monitored will be running stable.

Get the tracing result

[root@host1 ~]# cat stp_3.10/nfs_trace.1.out | head -3
nfsd_vfs_write(1 nfsd 3597 3597 1658189341055849) call time: 139
nfsd_vfs_write(2 nfsd 3598 3598 1658189341055864) call time: 136
nfsd_vfs_write(3 nfsd 3596 3596 1658189341055977) call time: 149
[root@host1 ~]# cat stp_3.10/nfs_trace.1.out | tail -5
nfsd_vfs_write(116340 nfsd 3595 3595 1658189346055536) call time: 182
nfsd_vfs_write(116341 nfsd 3598 3598 1658189346055621) call time: 237
nfsd_vfs_write total call time(us): 23050967
nfsd_vfs_write calls: 116341
nfsd_vfs_write average call time(us): 198

As we can see, there are totally 116341 calls for the function “nfsd_vfs_write” and the average call time is 198 us. This gives us a clear sense of the function call latency so that we can compare the same for different system configurations(e.g. different kernel versions).

SystemTap runtime errors

When we run the SystemTap script, we added a option “-D MAXACTION=1000000” to fix the following runtime error.

[root@host1 ~]# stap nfs_trace.stp > nfs_trace.out
ERROR: MAXACTION exceeded near identifier 'printf' at nfs_trace.stp:27:3
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]

What does “⁠MAXACTION exceeded” mean?

The probe handler attempted to execute too many statements in the probe handler. The default number of actions allowed in a probe handler is 1000.

Reference