eBPF - What is it? WH-questions

Simplify eBPF architecture

Before being loaded into the kernel, eBPF prog MUST pass a certain set of requirements.
Verification involves executing the eBPF program within the VM. Doing so allows the verifier, with 10,000+ lines of code, to perform a series of checks.
The verifier will traverse the potential paths the eBPF program may take when executed in the kernel, making sure the program does indeed run to completion without any looping that would cause a kernel lockup. Other checks, from valid register state, program size, to out of bound jumps, must also be met.
Almost immediately, eBPF sets itself apart from LKMs with important safety controls in place.
If all checks are passed, the eBPFprogram is loaded and compiled into the kernel at a point in a code path and listens for the right signal. That signal comes in the form of an event that passes where the program is loaded in the code path. Once triggered, the bytecode executes and collects information as per its instructions.

😳 So what does eBPF do? It lets programmers safely execute custom bytecode within the Linux kernel without modifying or adding to kernel source code. While still a far cry from replacing LKMs as a whole, eBPF programs introduce custom code to interact with protected hardware resources with minimal threat to the kernel.

How eBPF Works

Anatomy of an eBPF Program

Events and Hooking

eBPF programs are triggered by events that pass a particular location in the kernel. These events are captured at hooks when a specific set of instructions are executed in a single run. When triggered, these hooks will execute an eBPF program, letting us capture or manipulate data. The diversity of hook locations is one of the many aspects that makes eBPF so useful. A quick sampling of these locations include:

System Calls - Inserted when user space functions transfer execution to the kernel
Function Entry and Exit - Intercepts calls to pre-existing functions
Network Events - Executes when packets are received
Kprobes and uprobes - Attach to probes for kernel or user functions

Helper calls

When eBPF programs are triggered at their hook points, they make calls to helper functions. These special functions are what makes eBPF feature-rich in accessing memory. For example, helpers can perform a wide variety of tasks:

Search, update, and delete key-value pairs in tables
Generate a pseudo-random number
Collect and flag tunnel metadata
Chain eBPF programs together, known as tail calls
Perform tasks with sockets, like binding, retrieve cookies, redirect packets, etc.

These helper functions must be defined by the kernel, meaning there is a whitelist of calls eBPF programs can make. But the number is large and continues to grow.

Maps - Data Structures

To store and share data between the program and kernel or user spaces, eBPF makes use of maps. As implied by the name, maps are key-value pairs. Supporting a number of different data structures, like hash tables, arrays, and tries, programs are able to send and receive data in maps using helper functions.

Executing an eBPF Program

Loading and Verifying

The kernel expects all eBPF programs to be loaded as bytecode, so unless bytecode is being written, we need a way to compile higher level languages. To build out this compiler, eBPF uses LLVM as its back-end infrastructure on which a front-end for any programming language can be built. Because eBPF programs are written in C, that language front end is Clang. But before compiled bytecode can be hooked anywhere, it must pass a series of checks. By simulating the program in a VM-like construct, an in-kernel verifier can prevent programs that loop, do not have the right permissions, or crash the kernel. If the program passes all checks, program bytecode will be loaded onto the hook point using a bpf() system call

Just-In-Time (JIT) Compiler

After verification, eBPF bytecode is JIT’d into native machine code. eBPF has a modern design, meaning it has been upgraded to be 64-bit encoded with 11 total registers. This closely maps eBPF to hardware for x86_64, ARM, and arm64 architecture, amongst others. Fast compilation at runtime makes it possible for eBPF to remain performant even as it must first pass through a VM.

Summary

Putting this conceptual jigsaw together, eBPF programs are inserted into a hook point after passing a number of safety checks. When they are triggered by an event, programs execute immediately, using a combination of helper functions and maps to manipulate and store data. We’ll take a closer look at how these components work together in the next section