TIL: The basics of writing eBPF programs

2024-11-17 00:00:00 +0000 UTC

eBPF lets you run small, sandboxed programs inside an operating system’s kernel without rebuilding the kernel itself. eBPF programs are typically written in C and compiled to an ELF object file containing eBPF bytecode. When a program is loaded, the kernel verifies the bytecode and then JIT-compiles it to native code. Once attached, the program is invoked each time the kernel reaches the hook point it is attached to. The Linux kernel provides many hook points for eBPF programs.

To simplify the process of writing the userspace-interface for eBPF programs, several SDKs exist, including the Go eBPF SDK.

A simple eBPF program

The following eBPF program counts packets on a particular network interface. This program is a lightly modified version of the “getting started” example in the Go eBPF documentation.

//go:build ignore
// ^ included to exclude this file from the default go buildchain
//   This file will be compiled using go generate.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

// This struct is a BTF-style map definition. The loader (here, ebpf-go) reads it from the
// .maps section and uses it to fill in the arguments of the bpf syscall command
// BPF_MAP_CREATE. The syscall itself is documented in `man bpf`.
struct {
	// __uint is a macro defined in libbpf: https://docs.ebpf.io/ebpf-library/libbpf/ebpf/__uint/
	// This macro is used to define uint properties of maps. It is defined as:
	// 	#define __uint(name, val) int (*name)[val]
	// In this case, the value is a constant defined in libbpf. The full list of available map types
	// is at: https://docs.ebpf.io/linux/map-type/
	__uint(type, BPF_MAP_TYPE_ARRAY);

	// __type is a macro defined in libbpf: https://docs.ebpf.io/ebpf-library/libbpf/ebpf/__type/
	// It is used to define "type" properties in maps. It is defined as:
	// 	#define __type(name, val) typeof(val) *name
	// Here the key is a 32-bit unsigned integer, and value is a 64-bit unsigned integer.
	// The __-prefixed integral types are those defined in asm/types.h
	__type(key, __u32);
	__type(value, __u64);

	// max_entries is a property of the BPF map type "array"
	// indicating the maximum number of values stored in this array
	__uint(max_entries, 1);
} pkt_count SEC(".maps");

// Because this is a BPF_PROG_TYPE_XDP program, we put our code in
// a section of the ELF file whose name is prefixed with xdp.
// https://docs.ebpf.io/linux/program-type/BPF_PROG_TYPE_XDP/
SEC("xdp")
int count_packets() {
	// we have pkt_count.max_entries == 1,
	// so the key for our value will always be 0
	__u32 key = 0;

	// bpf_map_lookup_elem is a kernel helper made available by bpf/bpf_helpers.h
	// A list of helpers: https://docs.ebpf.io/linux/helper-function/
	// The helper runs inside the kernel and is called directly, without a syscall.
	// Its userspace counterpart is the bpf syscall command BPF_MAP_LOOKUP_ELEM.
	__u64 *count = bpf_map_lookup_elem(&pkt_count, &key);

	// bpf_map_lookup_elem can fail, and the BPF verifier is strict about null-checking
	if (count) {
		// __atomic_fetch_add is the modern equivalent of the legacy __sync
		// function used in the original example code. Since our program is an
		// XDP-type eBPF program, it is invoked for every packet received on a
		// particular network interface. This means our map will hold the count
		// of all packets arriving on that interface.

		// We'll choose the network interface when loading the eBPF program in our ebpf-go program.
		__atomic_fetch_add(count, 1, __ATOMIC_RELAXED);
	}

	// XDP defines constants indicating what to do
	// with the examined packets
	return XDP_PASS;
}

char __license[] SEC("license") = "Dual MIT/GPL";
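
The __uint and __type macros in the map definition can look odd, because the fields they declare are never assigned. A minimal userspace sketch of the trick follows, assuming GCC or Clang. The struct demo_map and the demo_* functions are hypothetical names used only for illustration, and __typeof__ is used in place of typeof so the file also builds in strict ISO modes. The point is that the numbers live in the field types, where a loader can read them back from the type information.

```c
#include <assert.h>
#include <stddef.h>

// Copies of the two libbpf macros so this builds as ordinary userspace C.
#ifndef __uint
#define __uint(name, val) int (*name)[val]
#endif
#ifndef __type
#define __type(name, val) __typeof__(val) *name
#endif

// Hypothetical stand-in for pkt_count. 2 is the value of BPF_MAP_TYPE_ARRAY.
struct demo_map {
	__uint(type, 2);                    // expands to: int (*type)[2];
	__type(key, unsigned int);          // expands to: unsigned int *key;
	__type(value, unsigned long long);  // expands to: unsigned long long *value;
	__uint(max_entries, 1);             // expands to: int (*max_entries)[1];
};

// Recover the integer that __uint encoded as an array length.
#define UINT_FIELD(field) \
	(sizeof(*((struct demo_map *)0)->field) / sizeof(int))

// Recover the size of the type that __type encoded as a pointee.
#define TYPE_FIELD_SIZE(field) sizeof(*((struct demo_map *)0)->field)

size_t demo_map_type(void)    { return UINT_FIELD(type); }
size_t demo_max_entries(void) { return UINT_FIELD(max_entries); }
size_t demo_key_size(void)    { return TYPE_FIELD_SIZE(key); }
size_t demo_value_size(void)  { return TYPE_FIELD_SIZE(value); }

// Sanity check that the encoded values round-trip.
void demo_map_check(void) {
	assert(demo_map_type() == 2);
	assert(demo_max_entries() == 1);
}
```

Nothing here is dereferenced at run time: sizeof only inspects types, so the null pointer cast is never evaluated.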

To generate eBPF bytecode for this function, we run go generate. We can learn a little more by using llvm-objdump to dump the compiled object file: llvm-objdump -d counter_bpfel.o.

The first few instructions are straightforward:

0:       b7 01 00 00 00 00 00 00 r1 = 0                            # __u32 key = 0
1:       63 1a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r1            # store key on the stack
2:       bf a2 00 00 00 00 00 00 r2 = r10                          # r2 = r10, the read-only frame pointer (top of our stack frame)
3:       07 02 00 00 fc ff ff ff r2 += -4                          # r2 now points to value of key on stack

Next we get some more interesting instructions:

4:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll # placeholder for the address of our map. opcode 0x18, dst r1; the loader patches in the map fd
6:       85 00 00 00 01 00 00 00 call 1                            # call bpf_map_lookup_elem. note this is opcode 0x85 (BPF_JMP | BPF_CALL) for calling a helper fn
7:       15 00 02 00 00 00 00 00 if r0 == 0 goto +2 <LBB0_2>       # null check on return value of bpf_map_lookup_elem
8:       b7 01 00 00 01 00 00 00 r1 = 1                            # r1 = 1, our increment size
9:       db 10 00 00 00 00 00 00 lock *(u64 *)(r0 + 0) += r1       # atomically increment the u64 value at memory location (r0+0) by r1

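Instruction 4 has opcode 0x18, which is “load immediate, double word.” It occupies two 8-byte slots, which is why the next instruction is numbered 6. In the object file the immediate is still 0, and the register byte 0x01 only names the destination register, r1; a relocation entry ties the instruction to pkt_count. At load time the loader creates the map, writes its file descriptor into the immediate, and sets the source register field to 0x1, which means “load the address of the eBPF map at the file descriptor provided in the immediate argument.” So r1 stores the address of our map after this instruction executes.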

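Instruction 6 is a call instruction. Opcode 0x85 with source register 0 is the command “call platform-agnostic helper function,” and the immediate selects the helper by its static ID in enum bpf_func_id. bpf_map_lookup_elem has helper ID 1. This is a helper function ID rather than a BTF type ID, although the enum that defines it can be read from the kernel’s BTF. See below for more information about obtaining these IDs on Linux.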

Instruction 7 is a JEQ comparing a register to an immediate value: if r0 is 0 (a null pointer), it jumps two instructions ahead and skips the increment. Instruction 8 stores the immediate value 1 in r1.
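
To check a reading of the hex bytes in the dump, it helps to split an instruction into its fields by hand. The sketch below assumes the little-endian layout used by counter_bpfel.o: one opcode byte, one register byte with the destination in the low nibble and the source in the high nibble, a 16-bit offset, and a 32-bit immediate. The struct decoded_insn and decode_insn names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// One decoded 8-byte eBPF instruction slot.
struct decoded_insn {
	uint8_t opcode;   // byte 0
	uint8_t dst_reg;  // low nibble of byte 1
	uint8_t src_reg;  // high nibble of byte 1
	int16_t off;      // bytes 2-3, little-endian, signed
	int32_t imm;      // bytes 4-7, little-endian, signed
};

// Decode a single little-endian instruction slot.
struct decoded_insn decode_insn(const uint8_t bytes[8]) {
	assert(bytes != NULL);

	struct decoded_insn d;
	d.opcode = bytes[0];
	d.dst_reg = bytes[1] & 0x0f;
	d.src_reg = bytes[1] >> 4;
	d.off = (int16_t)(uint16_t)(bytes[2] | (bytes[3] << 8));
	d.imm = (int32_t)((uint32_t)bytes[4] |
	                  ((uint32_t)bytes[5] << 8) |
	                  ((uint32_t)bytes[6] << 16) |
	                  ((uint32_t)bytes[7] << 24));
	return d;
}
```

Feeding it the bytes of instruction 1 (63 1a fc ff 00 00 00 00) gives opcode 0x63, dst_reg 10, src_reg 1, and off -4, which matches the disassembly *(u32 *)(r10 - 4) = r1. Instruction 6 (85 00 00 00 01 00 00 00) decodes to opcode 0x85 with both registers 0 and imm 1.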

Instruction 9 is more complex: it is an atomic add of r1 to the u64 at the address returned in r0 by bpf_map_lookup_elem.
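
The same builtins work in ordinary userspace C, which makes it easy to see what the increment does outside the kernel. This is a small sketch assuming GCC or Clang; count_one and count_one_legacy are hypothetical names, and the counter passed in stands in for the single map slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Same call as in count_packets. __atomic_fetch_add returns the value the
// counter held before the addition.
uint64_t count_one(uint64_t *count) {
	assert(count != NULL);
	return __atomic_fetch_add(count, 1, __ATOMIC_RELAXED);
}

// The legacy spelling. It also returns the previous value, but always
// acts as a full memory barrier instead of taking a memory order argument.
uint64_t count_one_legacy(uint64_t *count) {
	assert(count != NULL);
	return __sync_fetch_and_add(count, 1);
}
```

Both functions leave the counter one higher than before. The eBPF program ignores the returned value, which is consistent with the dump showing a plain `lock ... +=` with immediate 0 rather than a fetching variant.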

The rest of the program is simple:

0000000000000050 <LBB0_2>:
10:       b7 00 00 00 02 00 00 00 r0 = 2                           # set return value to 2 (XDP_PASS)
11:       95 00 00 00 00 00 00 00 exit                             # return

To determine the helper ID of bpf_map_lookup_elem, I first ran bpftool btf dump file /sys/kernel/btf/vmlinux format c, then searched the output for the bpf_func_id enum:

enum bpf_func_id {
	BPF_FUNC_unspec = 0,
	BPF_FUNC_map_lookup_elem = 1,
	BPF_FUNC_map_update_elem = 2,
	BPF_FUNC_map_delete_elem = 3,
// ....
};
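
To connect the enum back to the disassembly, here is a small lookup sketch that turns the immediate of a call instruction into a helper name. The table contains only the four entries visible in the dump above, so it is deliberately incomplete; helper_name and helper_id are hypothetical names for this illustration, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Only the entries shown in the dump; the real enum is much longer.
static const char *const helper_names[] = {
	"BPF_FUNC_unspec",
	"BPF_FUNC_map_lookup_elem",
	"BPF_FUNC_map_update_elem",
	"BPF_FUNC_map_delete_elem",
};

#define HELPER_COUNT (sizeof(helper_names) / sizeof(helper_names[0]))

// Map the immediate of a call instruction to a helper name.
// Returns NULL when the ID is outside this truncated table.
const char *helper_name(int32_t imm) {
	if (imm < 0 || (size_t)imm >= HELPER_COUNT)
		return NULL;
	return helper_names[imm];
}

// Reverse lookup: helper name to ID, or -1 if it is not in the table.
int32_t helper_id(const char *name) {
	assert(name != NULL);
	for (size_t i = 0; i < HELPER_COUNT; i++) {
		if (strcmp(helper_names[i], name) == 0)
			return (int32_t)i;
	}
	return -1;
}
```

With this table, helper_name(1) returns "BPF_FUNC_map_lookup_elem", which is the helper behind the call 1 at instruction 6.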

Tags: C linux ebpf