Getting started with bpftrace

Intro to bpftrace

bpftrace is a high-level tracing language for Linux enhanced Berkeley Packet Filter (eBPF), available in recent Linux kernels (4.x and later). bpftrace uses LLVM as a backend to compile scripts to BPF bytecode and uses BCC to interact with the Linux BPF system, as well as with existing Linux tracing capabilities: kernel dynamic tracing (kprobes), user-level dynamic tracing (uprobes), and tracepoints. The bpftrace language is inspired by awk and C, and by predecessor tracers such as DTrace and SystemTap. bpftrace was created by Alastair Robertson.

bpftrace package install

$ curl https://repos.baslab.org/rhel/7/bpftools/bpftools.repo --output /etc/yum.repos.d/bpftools.repo
 
$ yum install bpftrace bpftrace-tools bpftrace-doc bcc-static bcc-tools
Installed:
  bcc-static.x86_64 0:0.21.0-1.el7  bpftrace.x86_64 0:0.13.0-2.el7  bpftrace-doc.noarch 0:0.13.0-2.el7   bpftrace-tools.noarch 0:0.13.0-2.el7
 
Updated:
  bcc-tools.x86_64 0:0.21.0-1.el7
 
Dependency Updated:
  bcc.x86_64 0:0.21.0-1.el7  python-bcc.noarch 0:0.21.0-1.el7

bpftrace ships with many ready-to-run tools, which are available under /usr/share/bpftrace/tools after installation.

$ cd /usr/share/bpftrace/tools
$ ls
bashreadline.bt  biostacks.bt  cpuwalk.bt  execsnoop.bt       killsnoop.bt  naptime.bt    pidpersec.bt  setuids.bt    syncsnoop.bt  tcpconnect.bt  tcpretrans.bt   vfscount.bt   xfsdist.bt
biolatency.bt    bitesize.bt   dcsnoop.bt  ext4dist.bt        loads.bt      oomkill.bt    runqlat.bt    statsnoop.bt  syscount.bt   tcpdrop.bt     tcpsynbl.bt     vfsstat.bt
biosnoop.bt      capable.bt    doc         gethostlatency.bt  mdflush.bt    opensnoop.bt  runqlen.bt    swapin.bt     tcpaccept.bt  tcplife.bt     threadsnoop.bt  writeback.bt

A first look at bpftrace: tracing the open() system call

Run a bpftrace program from the command line (a one-liner):

$ bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)); }'
Attaching 1 probe...

From a different terminal, start an iostat process to be traced:

$ iostat -ktdx 2

Monitor the tracing output:

$ bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)); }'
<omitted..>
iostat /proc/uptime
iostat /proc/stat
iostat /proc/diskstats
iostat /proc/uptime
iostat /proc/stat
iostat /proc/diskstats
iostat /proc/uptime
iostat /proc/stat
iostat /proc/diskstats
^C

The output shows, system-wide, the name of each process that calls open() and the filename it passes to the syscall. In the example above, the iostat process opens the following files every 2 seconds:

/proc/uptime
/proc/stat
/proc/diskstats
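
To narrow the system-wide output to a single process name, a filter on the comm builtin can be placed between the probe list and the action block. The following is a minimal sketch rather than a tool shipped with bpftrace: the file name opens_by_comm.bt is hypothetical, and the script assumes that both the sys_enter_open and sys_enter_openat tracepoints exist on the running kernel.

```shell
# Write a small bpftrace script to a file (hypothetical name: opens_by_comm.bt).
# The quoted 'EOF' stops the shell from expanding anything inside the script.
cat > opens_by_comm.bt <<'EOF'
tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat
/comm == "iostat"/
{
    printf("%s %s\n", comm, str(args->filename));
}
EOF

# Run it as root, then press Ctrl-C to stop:
#   bpftrace opens_by_comm.bt
```

Only events whose process name equals "iostat" reach the printf, so other processes on the system no longer appear in the output.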

List all tracepoints for open() and its variants:

$ bpftrace -l 'tracepoint:syscalls:sys_enter_open*'
tracepoint:syscalls:sys_enter_open
tracepoint:syscalls:sys_enter_open_by_handle_at
tracepoint:syscalls:sys_enter_open_tree
tracepoint:syscalls:sys_enter_openat
tracepoint:syscalls:sys_enter_openat2

Count calls to open() and its variants:

$ bpftrace -e 'tracepoint:syscalls:sys_enter_open* { @[probe]=count();}'
Attaching 5 probes...
^C
@[tracepoint:syscalls:sys_enter_open]: 66
@[tracepoint:syscalls:sys_enter_openat]: 3963
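
The count above is keyed by probe name. Keying the map by comm instead shows which processes issue the calls. This is a sketch under assumptions: the file name opens_per_comm.bt and the map name @opens are made up for illustration, and only the open and openat tracepoints are attached.

```shell
# Write a bpftrace script that counts open()/openat() calls per process name.
# The file name and the map name @opens are hypothetical.
cat > opens_per_comm.bt <<'EOF'
tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat
{
    @opens[comm] = count();
}
EOF

# Run it as root; bpftrace prints the map when you press Ctrl-C:
#   bpftrace opens_per_comm.bt
```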

bpftrace ships with opensnoop.bt, which traces both the entry and the exit of the open() and openat() syscalls.

$ cat opensnoop.bt
#!/usr/bin/bpftrace
/*
 * opensnoop        Trace open() syscalls.
 *                For Linux, uses bpftrace and eBPF.
 *
 * Also a basic example of bpftrace.
 *
 * USAGE: opensnoop.bt
 *
 * This is a bpftrace version of the bcc tool of the same name.
 *
 * Copyright 2018 Netflix, Inc.
 * Licensed under the Apache License, Version 2.0 (the "License")
 *
 * 08-Sep-2018        Brendan Gregg        Created this.
 */

BEGIN
{
    printf("Tracing open syscalls... Hit Ctrl-C to end.\n");
    printf("%-6s %-16s %4s %3s %s\n", "PID", "COMM", "FD", "ERR", "PATH");
}

tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat
{
    @filename[tid] = args->filename;
}

tracepoint:syscalls:sys_exit_open,
tracepoint:syscalls:sys_exit_openat
/@filename[tid]/
{
    $ret = args->ret;
    $fd = $ret > 0 ? $ret : -1;
    $errno = $ret > 0 ? 0 : - $ret;
    printf("%-6d %-16s %4d %3d %s\n", pid, comm, $fd, $errno, str(@filename[tid]));
    delete(@filename[tid]);
}

END
{
    clear(@filename);
}

It prints the process ID, command name, file descriptor, error number, and path for each open call.

$ ./opensnoop.bt | egrep "PID|iostat"
PID    COMM               FD ERR PATH
28992  iostat              3   0 /etc/ld.so.cache
28992  iostat              3   0 /lib64/libc.so.6
28992  iostat              3   0 /usr/lib/locale/locale-archive
28992  iostat              3   0 /sys/devices/system/cpu
28992  iostat              3   0 /proc/diskstats
28992  iostat              3   0 /etc/localtime
28992  iostat              3   0 /proc/uptime
28992  iostat              3   0 /proc/stat
28992  iostat              3   0 /proc/diskstats
28992  iostat              4   0 /etc/sysconfig/sysstat.ioconf
28992  iostat              3   0 /proc/uptime
28992  iostat              3   0 /proc/stat
28992  iostat              3   0 /proc/diskstats
^C
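
The FD and ERR columns both come from the single return value of the syscall: a positive value is the new file descriptor, and a negative value is a negated errno. The shell function below is a hypothetical helper, not part of bpftrace, that mirrors the two ternary expressions in the script shown above. Like that script, it treats only positive return values as descriptors.

```shell
# decode_ret: hypothetical helper that mirrors how the opensnoop.bt listing
# above derives the FD and ERR columns from a syscall return value.
decode_ret() {
    ret=$1
    if [ "$ret" -gt 0 ]; then
        fd=$ret               # success: the return value is the descriptor
        err=0
    else
        fd=-1                 # no descriptor to report
        err=$(( 0 - ret ))    # the kernel returns -errno, so negate it
    fi
    printf '%d %d\n' "$fd" "$err"
}

decode_ret 3     # prints "3 0": a successful open that returned descriptor 3
decode_ret -2    # prints "-1 2": a failed open, errno 2 (ENOENT)
```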

Reference