eBPF end to end

Contributor deep-dive · built on NetWatch's real eBPF

eBPF, end to end.
From a kernel hook to a process name.

How a ~250-line program lets NetWatch say which process opened a connection — by running safe, verified code inside the Linux kernel. We build the whole idea from first principles, then walk the exact aya program that ships in netwatch-sdk, bug scars and all.

the eBPF VM · the verifier · maps & ring buffers · kprobes · aya / Rust · CO-RE & BTF · the #38 war story

Five acts · ~23 stops · three live simulations you can drive. Everything here is grounded in files you can open: crates/ebpf-programs/src/main.rs, crates/common/src/lib.rs, src/ebpf/source.rs.

Why run code in the kernel at all

The visibility you want lives on the wrong side of a wall

NetWatch wants to name the process behind every connection. But the moment a socket is interesting — the connect() — the answer (pid, comm) is known only for an instant, deep in the kernel, and only to the task making the call.

  • Polling /proc races the kernel: short-lived flows open and close between two reads. You see the connection, never the owner.
  • A kernel module could see it — but a bug is a kernel panic, and you'd recompile per kernel. Nobody ships that to end users.
  • Packet capture alone sees bytes on the wire, not the task that produced them.

You need to run your logic at the kernel event, with the kernel's guarantee that your logic can't crash or hang it.

Userspace · netwatch (your TUI) · reads /proc, libpcap · always a step behind

Kernel · tcp_v4_connect() · pid, comm, dest known HERE · gone microseconds later

eBPF erases the wall: a sliver of your code, running right at the kernel event.

The whole idea in one picture

Event → hook → your program → map → userspace

Every eBPF system is this pipeline. A kernel event hits a hook; the kernel runs your verified program; the program stashes data in a map; userspace reads the map. Here is one connect() flowing through it:


The eBPF pipeline

  • event · connect() · a task calls into the kernel
  • hook · kprobe · attached to tcp_v4_connect
  • program · your bytecode · verified + JITed, runs in-kernel
  • map · ring buffer · shared kernel⇄user memory
  • userspace · netwatch · reads, decodes, attributes

The kernel boundary sits between program/map and userspace — but the map straddles it. That straddle is the whole trick.

Program types & attach points

eBPF is a family — pick the hook that sees your event

A program's type decides where it attaches, what context it gets, and which helpers it may call. NetWatch lives in the kprobe corner — dynamic instrumentation of an arbitrary kernel function — because tcp_v4_connect is exactly the moment it cares about.

kprobe / kretprobe ← us

Attach to (almost) any kernel function, entry or return. Maximum reach; pays for it in stability — the function signature is an internal detail that can shift between kernels.

tracepoint stable

Kernel-blessed static hooks with a documented, stable layout. Fewer of them, but they don't move under you. The grown-up alternative to a kprobe when one exists.

XDP L2, fastest

Runs at the driver before an sk_buff even exists — drop/redirect at line rate. The DDoS / load-balancer corner.

tc (traffic control) L3

Ingress/egress on the qdisc with full packet context. Where shaping and per-packet policy live.

LSM / cgroup policy

Security hooks and per-cgroup socket/connect gates — enforcement points, not just observation.

perf / uprobe profiling

Sampling and user-space function probes. Out of scope here, but the same machine underneath.

Same VM, verifier, maps, and JIT under all of them. Learn the kprobe path and the rest is mostly a different context struct and helper set.

A tiny RISC machine inside the kernel

11 registers, a 512-byte stack, 8 bytes per instruction

eBPF isn't a scripting language — it's a real instruction set the kernel interprets or JIT-compiles to native code. Deliberately minimal, so it can be proven safe before it runs.

  • r0 · return value
  • r1–r5 · call args (ctx in r1)
  • r6–r9 · callee-saved
  • r10 · frame ptr (read-only)

Every instruction slot has the same 64-bit shape: opcode(8) · dst(4) · src(4) · offset(16) · imm(32). The one exception is the 64-bit immediate load (lddw), which occupies two consecutive slots. The verifier reasons over a graph of these instructions.
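To make the field layout concrete, here is a minimal host-side sketch that decodes one 8-byte slot. It assumes the little-endian (bpfel) encoding of the kernel's struct bpf_insn, where dst_reg sits in the low nibble of byte 1; the Insn type and decode function are illustrative names, not part of aya or the kernel.

```rust
/// One 8-byte eBPF instruction slot, mirroring the fields of the kernel's
/// `struct bpf_insn`. Names are illustrative, not an aya API.
#[derive(Debug, PartialEq)]
pub struct Insn {
    pub opcode: u8, // instruction class + operation
    pub dst: u8,    // destination register (4 bits)
    pub src: u8,    // source register (4 bits)
    pub off: i16,   // signed offset (memory access or jump distance)
    pub imm: i32,   // signed 32-bit immediate
}

/// Decode a little-endian (bpfel) instruction slot.
pub fn decode(b: [u8; 8]) -> Insn {
    Insn {
        opcode: b[0],
        // Byte 1 packs both registers: dst in the low nibble, src in the high.
        dst: b[1] & 0x0f,
        src: b[1] >> 4,
        off: i16::from_le_bytes([b[2], b[3]]),
        imm: i32::from_le_bytes([b[4], b[5], b[6], b[7]]),
    }
}
```

For example, the bytes `bf 61 00 00 00 00 00 00` encode `r1 = r6` (opcode 0xbf is a 64-bit register-to-register mov), so decode reports dst 1 and src 6.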

The shared memory between two worlds

Maps are how kernel code and your process touch the same bytes

A program is stateless and short-lived — it runs, returns, and forgets. Maps are the memory that outlives a single run and crosses the kernel boundary. Both the BPF program (via helpers) and userspace (via the bpf() syscall) read and write them.

HASH / LRU_HASH

Key→value tables. The bread and butter — e.g. a flow-key → metadata cache. LRU variant evicts under pressure instead of failing.

ARRAY / PERCPU_ARRAY

Index→value, fixed size. Per-CPU variants give each core its own copy, so updates from different CPUs never contend with each other.

RINGBUF ← us

A multi-producer, single-consumer (MPSC) queue for streaming events to userspace. It preserves order and supports reserve/commit. NetWatch's EVENTS map.

PERF_EVENT_ARRAY predecessor

The older per-CPU event channel. Ring buffer superseded it (one shared buffer, ordering, less copying) on kernels ≥ 5.8.

PROG_ARRAY / maps-of-maps

Tail calls and nesting — programs that jump to other programs. Power tools; not needed for our path.

SK_STORAGE / TASK_STORAGE

Storage attached to a socket or task lifetime — the kernel frees it when the object dies. Elegant for per-flow state.

Why a ring buffer for us: connect() events are a stream, order matters, and we want backpressure (drop, don't corrupt) when userspace stalls.

The reason any of this is safe

The verifier proves your program before the kernel will run it

Loading a program isn't "trust me." The kernel statically proves three things by walking every path through your instructions:

  • It terminates: historically the control flow had to be a DAG, and since Linux 5.3 bounded loops are allowed too, so there is no way to hang the kernel.
  • Memory safety — every pointer is tracked with known bounds; no read/write escapes its object.
  • No leaks — resources you acquire (a reserved ring-buffer slot) must be released on every exit path.

It does this by simulating execution: for each instruction it tracks the type and range of every register, forking its analysis at branches. Reject → your program never loads.
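The path-walking idea can be sketched on a toy instruction set. This is a deliberately tiny model, assuming hypothetical opcodes (Work, Reserve, Submit, BranchTo, Exit) rather than real eBPF; it only tracks how many references each path holds, forks at every branch, and rejects the first path that exits while still holding a slot. Real verifier state also covers register types and value ranges, plus the NULL check on a reserve result, all of which this sketch ignores.

```rust
use std::collections::HashSet;

/// Toy instruction set, just enough to model reference tracking.
/// Hypothetical: real eBPF has no such opcodes.
#[derive(Clone, Copy)]
pub enum Op {
    Work,            // any ordinary instruction (e.g. a probe read)
    Reserve,         // acquire a ring-buffer slot: a reference
    Submit,          // release that reference
    BranchTo(usize), // conditional jump: both outcomes are explored
    Exit,            // BPF_EXIT
}

/// Walk every path from instruction 0, carrying the number of references
/// held, and reject the program on the first bad path found.
pub fn verify(prog: &[Op]) -> Result<(), String> {
    let mut pending = vec![(0usize, 0u32)]; // (pc, references held)
    let mut seen = HashSet::new();
    while let Some((pc, refs)) = pending.pop() {
        // Skip states already checked (a crude stand-in for state pruning).
        if !seen.insert((pc, refs)) {
            continue;
        }
        match prog.get(pc).copied() {
            None => return Err(format!("pc {pc}: falls off the end")),
            Some(Op::Work) => pending.push((pc + 1, refs)),
            Some(Op::Reserve) => pending.push((pc + 1, refs + 1)),
            Some(Op::Submit) if refs == 0 => {
                return Err(format!("pc {pc}: release without a reference"))
            }
            Some(Op::Submit) => pending.push((pc + 1, refs - 1)),
            // Forward jumps only, so every path terminates (the old DAG rule).
            Some(Op::BranchTo(t)) if t <= pc => return Err(format!("pc {pc}: back-edge")),
            Some(Op::BranchTo(t)) => {
                pending.push((pc + 1, refs)); // branch not taken
                pending.push((t, refs)); // branch taken
            }
            Some(Op::Exit) if refs > 0 => return Err(format!("pc {pc}: reference leak")),
            Some(Op::Exit) => {}
        }
    }
    Ok(())
}
```

Reading first and reserving last passes; reserving first and then taking an early exit on a failed read is rejected with "pc 5: reference leak", the same shape of rejection the stepper shows.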


Verifier stepper — reference accounting

Flip to "reserve first" and step to the early return: that is the same rejection that shaped NetWatch's real program. And because the whole proof is complete once loading succeeds, a sandbox can drop CAP_BPF right after load.

How sandboxed code reaches the kernel

Helpers are the syscall table for eBPF — and some are GPL-only

Your program can't just dereference kernel pointers or call kernel functions. It calls a fixed, audited set of helpers — and, on modern kernels, kfuncs. Each is vetted for the verifier's safety model.

  • bpf_probe_read_kernel() — safely copy from a kernel address (faults become errors, not panics).
  • bpf_get_current_pid_tgid() / _comm() — who is running right now.
  • bpf_ktime_get_ns() — a monotonic timestamp.
  • bpf_ringbuf_reserve/submit() — the map ops behind aya's reserve()/submit().
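One detail worth knowing about bpf_get_current_pid_tgid(): it packs two IDs into a single u64, the thread-group ID (what userspace calls the process ID) in the upper 32 bits and the kernel task (thread) ID in the lower 32. A minimal sketch of the unpacking; split_pid_tgid is a hypothetical name.

```rust
/// Split the u64 returned by bpf_get_current_pid_tgid() into (tgid, pid).
/// tgid: thread-group ID, the userspace "process ID".
/// pid:  kernel task ID, i.e. the calling thread.
pub fn split_pid_tgid(v: u64) -> (u32, u32) {
    let tgid = (v >> 32) as u32;
    let pid = v as u32; // `as` truncation keeps the low 32 bits
    (tgid, pid)
}
```

For attributing a connection to a process, the tgid half is usually the one to report, since every thread of a browser shares it.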
ebpf-programs/src/main.rs

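A real footgun: the object needs a 4-byte license section declaring a GPL-compatible license, or the kernel refuses calls to GPL-only helpers. Forget it and the load fails with an error that points nowhere near the cause. NetWatch documents it in the source so the next person doesn't lose an afternoon.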
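A sketch of what such a section conventionally looks like in an aya kernel crate. The string "GPL\0" is an assumption on our part, chosen because it matches the 4-byte size above; the `unsafe(...)` wrapper around link_section is required from the Rust 2024 edition on and accepted by earlier editions on Rust 1.82 and later (older examples write plain `#[link_section = "license"]`).

```rust
// A 4-byte, NUL-terminated license string placed in an ELF section named
// "license". The loader hands it to the kernel at BPF_PROG_LOAD, and the
// kernel checks it before allowing GPL-only helpers.
// Assumption: "GPL\0" is inferred from the 4-byte size, not copied from NetWatch.
#[unsafe(link_section = "license")]
#[used] // keep the symbol even though no code references it
pub static LICENSE: [u8; 4] = *b"GPL\0";
```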

Compile once, run on many kernels

BTF describes kernel types; CO-RE relocates against them at load

A struct sock's field offsets differ between kernel builds. Hardcode them and your program is wrong on the next box. The fix is two pieces:

  • BTF (BPF Type Format) — compact type info for the running kernel, shipped at /sys/kernel/btf/vmlinux.
  • CO-RE (Compile Once – Run Everywhere) — your object carries relocations like "offset of field X"; the loader patches them against the target's BTF at load time.

The result: one .o that adapts, instead of a build per kernel (the old BCC model that shipped Clang to every machine).

your .o + CO-RE relocations
loader reads target BTF (/sys/kernel/btf/vmlinux)
patch every "field offset" to THIS kernel
verifier-ready, kernel-correct bytecode

NetWatch's Phase-1 program reads a copied-in sockaddr at fixed, ABI-stable offsets, so it leans on CO-RE less than a program chasing struct sock internals would. The same machinery is what will keep the planned eBPF Phase-2 hooks portable.
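The patching step can be pictured with a toy model. Real CO-RE relocation records reference BTF type IDs and access strings, not bare field names, and libbpf/aya do the resolution; this sketch only shows the core move of replacing a compiled-in offset with the running kernel's offset. FieldReloc and apply_relocs are hypothetical names, and the offsets in the example are invented.

```rust
use std::collections::HashMap;

/// A CO-RE-style relocation: "instruction `insn` accesses field `field`".
pub struct FieldReloc {
    pub insn: usize,
    pub field: &'static str,
}

/// Toy CO-RE patching. `offsets` holds each instruction's memory offset as
/// compiled; `target_btf` maps field names to their byte offset on the
/// running kernel. Each relocated offset is rewritten in place, and a field
/// missing from the target BTF fails the load.
pub fn apply_relocs(
    offsets: &mut [i16],
    relocs: &[FieldReloc],
    target_btf: &HashMap<&str, i16>,
) -> Result<(), String> {
    for r in relocs {
        let off = target_btf
            .get(r.field)
            .ok_or_else(|| format!("field {} not in target BTF", r.field))?;
        offsets[r.insn] = *off;
    }
    Ok(())
}
```

The design point is that the object ships the question ("where is this field?") rather than the answer, so one .o fits every kernel whose BTF can answer it.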

Four ways to write the same bytecode

Why NetWatch writes its kernel code in Rust (aya)

BCC

Python/C, compiles BPF on the target with embedded Clang. Great for ad-hoc tracing; heavy to ship (an LLVM toolchain on every user's machine).

libbpf + C

The canonical modern path. C source, CO-RE, a small loader library. Battle-tested; but it's a second language and build system bolted onto a Rust project.

bpftrace

A high-level DSL for one-liners and scripts. Perfect for exploration, not for embedding a long-lived program inside a product.

aya (Rust) ← us

Kernel and userspace in Rust, no libbpf/C dependency, CO-RE supported. One language, one toolchain, one build — it disappears into the NetWatch tree.

The deciding factor for a Rust product isn't raw capability — all four emit the same verified bytecode. It's cohesion: aya lets the kernel program share types with userspace (one #[repr(C)] struct, both sides) and build with cargo, no foreign toolchain to vendor or teach contributors.

Trade-off acknowledged: aya's ecosystem is younger than libbpf's, and you write some unsafe by hand that libbpf's CO-RE macros would hide. NetWatch judged the cohesion worth it.

From Rust source to an embeddable object

A freestanding binary for a target with no operating system

The kernel program compiles to bpfel-unknown-none — little-endian BPF, no OS. That constraint explains every odd line at the top of the file:

  • #![no_std] — no standard library; there's no allocator or syscalls down here.
  • #![no_main] — the kernel calls your hook, not a main().
  • a #[panic_handler] — required for no_std; it just loops, because there's nothing to panic to.

Output: netwatch_sdk_ebpf.o, an ELF object of BPF bytecode + map definitions + that license section.

top of the kernel crate

Built by scripts/build-ebpf.sh, then copied to target/bpf/netwatch_sdk_ebpf.o so the userspace crate can swallow it whole (next slide).

What actually happens at startup

ELF bytes → bpf() syscall → verify → JIT → attach

The .o is baked into the userspace binary with include_bytes! — no file to ship or lose. At runtime aya walks it through the kernel's loading protocol:

include_bytes!(…/netwatch_sdk_ebpf.o)
aya parses ELF: programs, maps, relocations, license
bpf(BPF_MAP_CREATE) — ring buffer gets an fd
bpf(BPF_PROG_LOAD) — kernel VERIFIES, then JITs
attach kprobe to tcp_v4_connect — now live

Two things worth internalizing:

  • Everything is a file descriptor. The loaded program and the map are fds the process owns. Close them and the kernel tears down the attachment — which is exactly how NetWatch's Drop detaches cleanly.
  • Verification happens at load, once. The expensive proof is paid at PROG_LOAD; after that the JITed code runs at native speed on every event.
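The "close the fd, the kernel detaches" lifecycle maps naturally onto Rust's Drop. A host-side sketch, assuming a hypothetical Attachment type that only records its teardown in a shared log; real code would close the program and link fds at that point.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for an attached probe: it owns an "fd" and logs
/// its teardown when dropped, the way closing the real fds makes the
/// kernel detach the kprobe.
pub struct Attachment {
    pub fd: i32,
    pub log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Attachment {
    fn drop(&mut self) {
        // Real code would close(fd) here; the kernel then detaches the probe.
        self.log
            .borrow_mut()
            .push(format!("closed fd {} -> detached", self.fd));
    }
}
```

Because teardown rides on scope exit, there is no separate "detach" call to forget, even on an early return or a panic that unwinds.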

This is also where it can fail in the field — missing CAP_BPF, a kernel that rejects the probe, no BTF. NetWatch surfaces each as EbpfStatus::Unavailable rather than crashing (Act V).

crates/common/src/lib.rs · shared by both sides

One #[repr(C)] struct, agreed byte-for-byte

The kernel program writes these bytes; userspace reads them out of the same ring-buffer slot. There's no serialization — it's raw memory. So the layout must be identical and predictable on both sides.

  • #[repr(C)] pins field order & C alignment — no Rust field reordering.
  • _pad0: [u8; 3] is explicit padding: it pushes tgid to offset 4 so the u32 is naturally aligned. Leave it out and the compiler inserts silent padding — fine until the two sides disagree.
  • All address fields stay in network byte order; userspace converts on decode.


ConnectV4Event — 48 bytes

This struct is the seam between kernel and userspace. Get its layout wrong and every field downstream is garbage — silently.
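The explicit-padding technique, and a cheap way to guard it, can be shown on a smaller struct. DemoEvent below is hypothetical and is NOT NetWatch's real 48-byte ConnectV4Event; it just reuses the same trick of a u8 followed by `_pad0: [u8; 3]` so that tgid lands on offset 4. The const assertions (which need Rust 1.77+ for offset_of!) make the build fail if the layout ever drifts.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical event illustrating explicit padding under #[repr(C)].
/// Without `_pad0`, repr(C) would still insert 3 invisible padding bytes;
/// spelling them out keeps the layout visible to both sides.
#[repr(C)]
pub struct DemoEvent {
    pub kind: u8,
    pub _pad0: [u8; 3], // tgid starts at offset 4, naturally aligned
    pub tgid: u32,
    pub daddr: u32,     // network byte order
    pub dport: u16,     // network byte order
    pub _pad1: [u8; 2], // ts_ns starts at offset 16, naturally aligned
    pub ts_ns: u64,
}

// Compile-time guards: if the layout changes, the build fails here.
const _: () = assert!(offset_of!(DemoEvent, tgid) == 4);
const _: () = assert!(offset_of!(DemoEvent, ts_ns) == 16);
const _: () = assert!(size_of::<DemoEvent>() == 24);
```

Putting guards like these in the shared crate means the kernel build and the userspace build both check the same numbers.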

ebpf-programs/src/main.rs · the program itself

Read what you need, then build the event

#[kprobe] tcp_v4_connect

Every line is doing one of three jobs:

  • Get the context: ProbeContext is the kprobe's window onto the probed function's registers. ctx.arg(1) is the second argument, struct sockaddr *uaddr.
  • Ask the kernel via helpers: pid/tgid, comm, and a timestamp come from helpers, not from poking memory. Safe by construction.
  • Carefully read kernel memory: bpf_probe_read_kernel copies the destination port and address out of uaddr at the sockaddr_in offsets, fault-safe.

Notice saddr and sport are hardcoded to 0. That's not laziness — it's a fact about when this code runs. Which is the whole next slide.
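The sockaddr_in reads can be modeled on the host. struct sockaddr_in puts sin_family (u16) at offset 0, sin_port (u16, network order) at offset 2, and sin_addr (u32, network order) at offset 4. In this sketch a plain byte slice stands in for kernel memory, and the hypothetical probe_read returns None on an out-of-range read, loosely mirroring how bpf_probe_read_kernel turns a fault into an error rather than a crash.

```rust
/// Offsets inside `struct sockaddr_in`.
const SIN_PORT_OFF: usize = 2; // u16, network byte order
const SIN_ADDR_OFF: usize = 4; // u32, network byte order

/// Hypothetical fault-safe read: out-of-range reads yield None.
fn probe_read<const N: usize>(mem: &[u8], off: usize) -> Option<[u8; N]> {
    let src = mem.get(off..off.checked_add(N)?)?;
    let mut out = [0u8; N];
    out.copy_from_slice(src);
    Some(out)
}

/// Pull (daddr, dport) out of a raw sockaddr_in. The bytes are copied
/// as-is, so both values stay in network byte order for userspace to flip.
pub fn read_dest(uaddr: &[u8]) -> Option<(u32, u16)> {
    let port = u16::from_ne_bytes(probe_read::<2>(uaddr, SIN_PORT_OFF)?);
    let addr = u32::from_ne_bytes(probe_read::<4>(uaddr, SIN_ADDR_OFF)?);
    Some((addr, port))
}
```

Every read happens before anything is reserved, and any failure simply returns, which is exactly the ordering the verifier will insist on later.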

Why read uaddr and not the socket? Because of a bug that made eBPF attribution silently return nothing — issue #38. →

When you read decides what you see

At kprobe entry, the socket isn't filled in yet

A kprobe on tcp_v4_connect fires at function entry, before its body runs. The earlier version read the socket's own fields (skc_daddr, skc_dport) there and got all zeros: the kernel only populates them later, during routing. Every event was discarded, so attribution silently never worked.


Execution of tcp_v4_connect(sk, uaddr, len)

The fix: read the destination from uaddr (valid at entry, copied in by the syscall layer), and accept that source addr/port aren't known yet → key attribution on (daddr, dport).

Why the reads come before the reserve

A reserved slot is a reference you must always release

Look back at the program's ordering: it reads every field first, and only then reserves the ring-buffer slot. That order isn't taste — the verifier requires it.

Once you call reserve() you hold a reference. The verifier proves that on every path to BPF_EXIT, that reference is released (submit or discard). An early return after a reserve — e.g. a read that failed — is a leak, and the program is rejected. So: do everything fallible before you reserve.

Flip the toggle in the stepper on "The verifier" (slide 7) to watch the real rejection fire. Here's the rule in the source's own words →

main.rs — the comment that encodes the rule

The verifier turns a whole class of resource-leak bugs into a load-time "no." You either structure the code correctly or it never loads.

256 KiB of shared, ordered, lossy-by-design queue

reserve → write → submit, with backpressure

The ring buffer is the conveyor belt from kernel to userspace. The program reserve()s a slot, write()s the event into it, and submit(0)s — the 0 flag meaning "wake a polling consumer."

  • Sized for bursts: each 48-byte event plus its 8-byte record header takes 56 bytes, so 256 KiB holds about 4,680 events, close to a second of backlog at ~5k connect/sec if userspace stalls.
  • Lossy, not corrupting — when it's full, reserve() returns None and the event is dropped. Never a torn write.
  • Ordered & multi-producer — every CPU can submit; the consumer sees a single ordered stream.
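The capacity and drop behaviour can be simulated with a byte-bounded queue. This ToyRing is a hypothetical model, not aya's RingBuf API: it collapses reserve, write, and submit into one try_push, charges each record an 8-byte header plus its payload rounded up to 8 bytes (as the kernel's BPF ring buffer does), and refuses new records instead of overwriting old ones.

```rust
use std::collections::VecDeque;

/// Per-record header size; payload lengths round up to a multiple of 8.
const HDR: usize = 8;

fn record_size(len: usize) -> usize {
    HDR + (len + 7) / 8 * 8
}

/// Toy byte-bounded FIFO modelling the EVENTS ring buffer's capacity and
/// drop behaviour. Hypothetical names; not aya's API.
pub struct ToyRing {
    cap_bytes: usize,
    used: usize,
    queue: VecDeque<Vec<u8>>,
    pub dropped: u64,
}

impl ToyRing {
    pub fn new(cap_bytes: usize) -> Self {
        ToyRing { cap_bytes, used: 0, queue: VecDeque::new(), dropped: 0 }
    }

    /// Producer side: reserve + write + submit in one step. With no room,
    /// the event is counted as dropped; queued records are never touched.
    pub fn try_push(&mut self, event: &[u8]) -> bool {
        let need = record_size(event.len());
        if self.used + need > self.cap_bytes {
            self.dropped += 1;
            return false;
        }
        self.used += need;
        self.queue.push_back(event.to_vec());
        true
    }

    /// Consumer side: take the oldest record, freeing its space.
    pub fn consume(&mut self) -> Option<Vec<u8>> {
        let ev = self.queue.pop_front()?;
        self.used -= record_size(ev.len());
        Some(ev)
    }
}
```

Filling a 256 KiB ToyRing with 48-byte events accepts 4,681 of them before the first drop, and consuming one record immediately makes room for one more.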


The submit(0) wake is a latency/throughput knob: BPF_RB_NO_WAKEUP trades immediacy for fewer wakeups once you know your consumer.

src/ebpf/source.rs · EventSource

Load the object, attach the probe, drain to a channel

source.rs — load → attach → read (Linux+ebpf)

Three moves, mirroring the kernel side:

  • Embed, don't ship. include_bytes! folds the .o into the executable — nothing to install or path-resolve at runtime.
  • Attach by name. aya finds the tcp_v4_connect program in the object, loads it (verify + JIT), and attaches the kprobe.
  • Own the lifecycle via fds. The Bpf handle owns the program and map; its Drop closes the fds and the kernel detaches the probe. No explicit teardown.

Decoding flips the network-byte-order fields back and hands a typed EbpfEvent to the rest of NetWatch over an mpsc channel.
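A sketch of that decode-and-forward step. RawEvent and DecodedEvent are hypothetical subsets standing in for NetWatch's real ConnectV4Event and EbpfEvent; the point is the byte-order handling (the address bytes are already a.b.c.d in memory, the port needs from_be) and the std mpsc hand-off.

```rust
use std::net::Ipv4Addr;
use std::sync::mpsc;

/// Hypothetical subset of the raw ring-buffer record.
/// Address and port are still in network byte order.
pub struct RawEvent {
    pub tgid: u32,
    pub daddr: u32,
    pub dport: u16,
}

/// Hypothetical typed event handed to the rest of the app.
#[derive(Debug, PartialEq)]
pub struct DecodedEvent {
    pub pid: u32, // the tgid, i.e. the userspace process ID
    pub dst: Ipv4Addr,
    pub dport: u16,
}

pub fn decode_event(raw: &RawEvent) -> DecodedEvent {
    DecodedEvent {
        pid: raw.tgid,
        // daddr's in-memory bytes are already in network order (a.b.c.d).
        dst: Ipv4Addr::from(raw.daddr.to_ne_bytes()),
        // The port's bytes are big-endian; convert to host order.
        dport: u16::from_be(raw.dport),
    }
}

/// Decode a batch and forward it over a channel.
pub fn forward(raws: &[RawEvent], tx: &mpsc::Sender<DecodedEvent>) {
    for raw in raws {
        // A closed receiver means the consumer went away; stop quietly.
        if tx.send(decode_event(raw)).is_err() {
            break;
        }
    }
}
```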

On non-Linux, or without the ebpf feature, this whole path is a stub returning UnsupportedPlatform — every other OS still compiles.

The whole vertical slice, one event

From connect() to a process name on screen

Everything so far, assembled. A browser opens a connection; seven steps later NetWatch's Connections tab labels it with the owning process, something packet capture alone could never do.


chrome → 142.250.72.4:443

The key insight made physical: attribution is keyed on (daddr, dport) because that's the only stable identity available at kprobe entry. The #38 fix and this final label are the same decision.
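The join at the end of the slice can be sketched as a lookup table keyed on the remote side only. Attribution, record, and label are hypothetical names, not NetWatch's code; the key is (daddr, dport) because saddr and sport are still 0 when the kprobe fires.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Hypothetical attribution table: (daddr, dport) -> (pid, comm).
#[derive(Default)]
pub struct Attribution {
    by_dest: HashMap<(Ipv4Addr, u16), (u32, String)>,
}

impl Attribution {
    /// Record an eBPF connect event.
    pub fn record(&mut self, dst: Ipv4Addr, dport: u16, pid: u32, comm: &str) {
        self.by_dest.insert((dst, dport), (pid, comm.to_string()));
    }

    /// Label a flow seen by packet capture, using only its remote endpoint.
    pub fn label(&self, dst: Ipv4Addr, dport: u16) -> Option<&str> {
        self.by_dest.get(&(dst, dport)).map(|(_, comm)| comm.as_str())
    }
}
```

The trade-off of a destination-only key: two processes connecting to the same daddr:dport collide, and in this sketch the most recent event wins.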

Privilege you earn, then give back

Three guarantees stack up to "safe in an end-user tool"

Running code in the kernel sounds reckless for a TUI people install. It isn't, because the risky parts are proven safe before the program runs, and the capability is dropped right after load.

  • The verifier already proved termination, memory safety, and no leaks — a panic or hang isn't on the table.
  • The fixed set of audited helpers, GPL-gated where it matters, is the only way to reach the kernel; you can't call arbitrary kernel code.
  • The sandbox drops CAP_BPF / CAP_PERFMON / CAP_NET_RAW the instant load and attach finish. The program keeps running, but the process can no longer load another one.
startup — hold CAP_BPF (need it to load)
bpf(PROG_LOAD) — verifier proves the program
attach kprobe · take ring-buffer fd
sandbox::apply() — DROP CAP_BPF + Landlock
run loop — probe still firing, privilege gone

This is the same "earn privilege, then drop it" arc as packet capture in NetWatch's sandbox — eBPF just joins the list of things opened before the drop.

What the model costs you

The constraints are the price of the safety

Kprobes are unstable contracts

You're hooking an internal function. Its signature or existence can change between kernels — which is why a tracepoint is preferred when one covers your event.

512-byte stack

Tiny. Large structs go in a map (or per-CPU scratch), not on the stack. The ring-buffer reserve() pattern also keeps the event out of stack space.

No unbounded loops

The verifier must prove termination. Bounded loops only; anything it can't bound, it rejects. Some "obvious" code simply won't load.

Helper / feature availability

Ring buffers need kernel ≥ 5.8; some helpers are newer still. Older kernels mean a different code path or a graceful "no."

It can fail at attach

Missing CAP_BPF, no BTF, a kernel that refuses the probe. NetWatch reports these as state, never a crash →

Verifier errors are cryptic

"Invalid ELF header" for a missing license; "reference leak" for a misordered reserve. Half of eBPF skill is reading these. This deck's war story is one such scar.

enum EbpfStatus { Active · Unavailable(reason) · NotCompiled } — every failure mode is a UI state, so the tool degrades instead of dying. Off by default; you opt in with --features ebpf.
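A sketch of "failure is a UI state". The three variant names come from the enum above; the String payload of Unavailable, the label text, and the status_from helper are assumptions for illustration.

```rust
/// The three states named above. Assumption: Unavailable carries a String reason.
#[derive(Debug, PartialEq)]
pub enum EbpfStatus {
    Active,
    Unavailable(String),
    NotCompiled,
}

impl EbpfStatus {
    /// Every state renders as text in the UI; none of them aborts.
    pub fn label(&self) -> String {
        match self {
            EbpfStatus::Active => "eBPF: active".to_string(),
            EbpfStatus::Unavailable(reason) => format!("eBPF: unavailable ({reason})"),
            EbpfStatus::NotCompiled => {
                "eBPF: not compiled (build with --features ebpf)".to_string()
            }
        }
    }
}

/// Turn a load/attach attempt into a status instead of propagating the error.
pub fn status_from(result: Result<(), String>) -> EbpfStatus {
    match result {
        Ok(()) => EbpfStatus::Active,
        Err(reason) => EbpfStatus::Unavailable(reason),
    }
}
```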

Where this goes next — and where you come in

One kprobe today; full attribution tomorrow

Phase 1 is a single IPv4 TCP kprobe. The roadmap's eBPF Phase 2 widens it until NetWatch can name the process behind a QUIC flow it decrypts, something nothing else in the category does end to end.

  • tcp_v6_connect — same story, IPv6.
  • UDP send path — so QUIC (which NetWatch already decrypts) gets per-process attribution, not just /proc guesses.
  • inet_sock_set_state — catch flows by state transition, including short-lived ones the connect-entry probe can miss.

Every one of these is the same question this deck has been answering, asked again:

which hook sees this event?
what's actually populated when it fires?
what's the stable key for attribution?
read it before reserve, ship the event

Adding a hook means reading the same kernel networking source a netdev contributor reads — so this is also a real on-ramp into upstream kernel work. The #38 timeline is the kind of question you'll keep answering.

The whole map, once more

From first principles to a process name on screen

The model Act I

event → hook → verified program → map → userspace. A kprobe at the one moment the kernel knows who owns a socket.

The machine Act II

A tiny VM, maps as shared memory, and a verifier that proves termination, memory safety, and no leaks before anything runs.

The toolchain Act III

aya keeps kernel and userspace in one Rust build; no_std object → bpf() → verify → JIT → attach, all owned by fds.

The real program Act IV

A #[repr(C)] ABI, reads before the reserve, the #38 timing bug, and the ring buffer carrying one event to userspace.

Safety Act V

Proven safe, GPL-gated, capability dropped after load. Failure is a UI state, never a crash.

Read the source

crates/ebpf-programs/src/main.rs · crates/common/src/lib.rs · src/ebpf/source.rs. ~250 lines, now legible end to end.

Next stop: add a hook (Act V), or pick a BPF selftest and send your first kernel patch.
