Contributor deep-dive · built on NetWatch's real eBPF
eBPF, end to end.
From a kernel hook to a process name.
How a ~250-line program lets NetWatch say which process opened
a connection — by running safe, verified code inside the Linux kernel. We build the whole
idea from first principles, then walk the exact aya program that ships in netwatch-sdk,
bug scars and all.
Five acts · ~23 stops · three live simulations you can drive.
Everything here is grounded in files you can open: crates/ebpf-programs/src/main.rs,
crates/common/src/lib.rs, src/ebpf/source.rs.
Why run code in the kernel at all
The visibility you want lives on the wrong side of a wall
NetWatch wants to name the process behind every connection. But the moment a socket is
interesting — the connect() — the answer (pid, comm) is known only
for an instant, deep in the kernel, and only to the task making the call.
- Polling /proc races the kernel: short-lived flows open and close between two reads. You see the connection, never the owner.
- A kernel module could see it, but a bug is a kernel panic, and you'd recompile per kernel. Nobody ships that to end users.
- Packet capture alone sees bytes on the wire, not the task that produced them.
You need to run your logic at the kernel event, with the kernel's guarantee that your logic can't crash or hang it.
[Diagram: Userspace | Kernel]
eBPF erases the wall: a sliver of your code, running right at the kernel event.
The whole idea in one picture
Event → hook → your program → map → userspace
Every eBPF system is this pipeline. A kernel event hits a
hook; the kernel runs your verified program; it stashes data in a
map; userspace reads the map. Press play and watch one connect() flow through.
The eBPF pipeline
The kernel boundary sits between program/map and userspace — but the map straddles it. That straddle is the whole trick.
Program types & attach points
eBPF is a family — pick the hook that sees your event
A program's type decides where it attaches, what context it
gets, and which helpers it may call. NetWatch lives in the kprobe corner — dynamic
instrumentation of an arbitrary kernel function — because tcp_v4_connect is exactly the
moment it cares about.
kprobe / kretprobe ← us
Attach to (almost) any kernel function, entry or return. Maximum reach; pays for it in stability — the function signature is an internal detail that can shift between kernels.
tracepoint stable
Kernel-blessed static hooks with a documented, stable layout. Fewer of them, but they don't move under you. The grown-up alternative to a kprobe when one exists.
XDP L2, fastest
Runs at the driver before an
sk_buff even exists — drop/redirect at line rate. The DDoS / load-balancer corner.
tc (traffic control) L3
Ingress/egress on the qdisc with full packet context. Where shaping and per-packet policy live.
LSM / cgroup policy
Security hooks and per-cgroup socket/connect gates — enforcement points, not just observation.
perf / uprobe profiling
Sampling and user-space function probes. Out of scope here, but the same machine underneath.
Same VM, verifier, maps, and JIT under all of them. Learn the kprobe path and the rest is mostly a different context struct and helper set.
A tiny RISC machine inside the kernel
11 registers, a 512-byte stack, 8 bytes per instruction
eBPF isn't a scripting language — it's a real instruction set the kernel interprets or JIT-compiles to native code. Deliberately minimal, so it can be proven safe before it runs.
Every instruction is the same 64-bit shape.
Decode a BPF instruction
opcode(8) · dst(4) · src(4) · offset(16) · imm(32) — 64 bits, always.
The verifier reasons over a graph of these.
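To make the field layout concrete, here is a minimal sketch of a decoder in plain userspace Rust. It assumes the little-endian encoding used by the bpfel target, with dst in the low nibble and src in the high nibble of the second byte; the `Insn` type and `decode` function are illustrative names, not part of NetWatch.

```rust
/// One decoded eBPF instruction, following the slide's field order.
#[derive(Debug, PartialEq)]
struct Insn {
    opcode: u8,
    dst: u8, // 4-bit destination register
    src: u8, // 4-bit source register
    offset: i16,
    imm: i32,
}

/// Decode 8 bytes laid out as on a little-endian BPF target.
fn decode(raw: [u8; 8]) -> Insn {
    Insn {
        opcode: raw[0],
        dst: raw[1] & 0x0f, // low nibble of byte 1
        src: raw[1] >> 4,   // high nibble of byte 1
        offset: i16::from_le_bytes([raw[2], raw[3]]),
        imm: i32::from_le_bytes([raw[4], raw[5], raw[6], raw[7]]),
    }
}

fn main() {
    // `r1 = 42`: a 64-bit ALU move-immediate, opcode 0xb7.
    let insn = decode([0xb7, 0x01, 0x00, 0x00, 0x2a, 0x00, 0x00, 0x00]);
    println!("{:?}", insn);
}
```

Because every instruction has this one shape, a decoder is a handful of shifts and masks, which is part of why the format is tractable for static analysis.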
The shared memory between two worlds
Maps are how kernel code and your process touch the same bytes
A program is stateless and short-lived — it runs, returns, and forgets.
Maps are the memory that outlives a single run and crosses the kernel boundary. Both the BPF
program (via helpers) and userspace (via the bpf() syscall) read and write them.
HASH / LRU_HASH
Key→value tables. The bread and butter — e.g. a flow-key → metadata cache. LRU variant evicts under pressure instead of failing.
ARRAY / PERCPU_ARRAY
Index→value, fixed size. Per-CPU variants give each core its own copy so the program never contends a lock.
RINGBUF ← us
A multi-producer, single-consumer (MPSC) queue for
streaming events to userspace. Preserves order, supports reserve/commit.
NetWatch's EVENTS map.
PERF_EVENT_ARRAY predecessor
The older per-CPU event channel. Ring buffer superseded it (one shared buffer, ordering, less copying) on kernels ≥ 5.8.
PROG_ARRAY / maps-of-maps
Tail calls and nesting — programs that jump to other programs. Power tools; not needed for our path.
SK_STORAGE / TASK_STORAGE
Storage attached to a socket or task lifetime — the kernel frees it when the object dies. Elegant for per-flow state.
Why a ring buffer for us: connect() events are a stream, order
matters, and we want backpressure (drop, don't corrupt) when userspace stalls.
The reason any of this is safe
The verifier proves your program before the kernel will run it
Loading a program isn't "trust me." The kernel statically proves three things by walking every path through your instructions:
- It terminates: every path reaches an exit, and loops are accepted only when the verifier can bound them. No way to hang the kernel.
- Memory safety — every pointer is tracked with known bounds; no read/write escapes its object.
- No leaks — resources you acquire (a reserved ring-buffer slot) must be released on every exit path.
It does this by simulating execution: for each instruction it tracks the type and range of every register, forking its analysis at branches. Reject → your program never loads.
Verifier stepper — reference accounting
Flip to "reserve first" and step to the early return — the same rejection that shaped
NetWatch's real program. This is also why a sandbox can drop CAP_BPF after load.
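The reference-accounting rule can be imitated in a few lines. What follows is a toy model, not the kernel's verifier: the `Op` instruction set and `check` function are invented for illustration, and the model treats every fallible read as a possible early return while ignoring everything else the real verifier tracks (register types, ranges, a reserve that itself fails).

```rust
/// Toy instruction set: just enough to model reference accounting.
#[derive(Clone, Copy)]
enum Op {
    FallibleRead, // may fail; the failure path returns immediately
    Reserve,      // acquire a ring-buffer slot (a tracked reference)
    Submit,       // release the slot
    Exit,         // normal return
}

/// Reject the program if any exit path would still hold a reference.
fn check(prog: &[Op]) -> Result<(), String> {
    let mut held: u32 = 0;
    for (i, op) in prog.iter().enumerate() {
        match *op {
            Op::FallibleRead if held > 0 => {
                return Err(format!("op {}: early return would leak {} reference(s)", i, held));
            }
            Op::FallibleRead => {}
            Op::Reserve => held += 1,
            Op::Submit if held == 0 => {
                return Err(format!("op {}: submit without a reserved slot", i));
            }
            Op::Submit => held -= 1,
            Op::Exit if held > 0 => {
                return Err(format!("op {}: exit with {} unreleased reference(s)", i, held));
            }
            Op::Exit => return Ok(()),
        }
    }
    Err("no exit instruction".to_string())
}

fn main() {
    let reads_first = [Op::FallibleRead, Op::FallibleRead, Op::Reserve, Op::Submit, Op::Exit];
    let reserve_first = [Op::Reserve, Op::FallibleRead, Op::Submit, Op::Exit];
    println!("reads first:   {:?}", check(&reads_first));
    println!("reserve first: {:?}", check(&reserve_first));
}
```

The second program is rejected at the read, not at the exit: the early-return path is where the reference would leak, which is the shape of rejection the stepper demonstrates.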
How sandboxed code reaches the kernel
Helpers are the syscall table for eBPF — and some are GPL-only
Your program can't just dereference kernel pointers or call kernel functions. It calls a fixed, audited set of helpers — and, on modern kernels, kfuncs. Each is vetted for the verifier's safety model.
- bpf_probe_read_kernel(): safely copy from a kernel address (faults become errors, not panics).
- bpf_get_current_pid_tgid() / bpf_get_current_comm(): who is running right now.
- bpf_ktime_get_ns(): a monotonic timestamp.
- bpf_ringbuf_reserve() / bpf_ringbuf_submit(): the map ops behind aya's reserve() / submit().
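A real footgun: GPL-only helpers are available only to programs that declare a GPL-compatible license, and that declaration lives in a 4-byte license section. Forget it and the load fails with an error that points nowhere near the cause. NetWatch documents it in the source so the next person doesn't lose an afternoon.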
Compile once, run on many kernels
BTF describes kernel types; CO-RE relocates against them at load
A struct sock's field offsets differ between kernel builds. Hardcode
them and your program is wrong on the next box. The fix is two pieces:
- BTF (BPF Type Format): compact type info for the running kernel, shipped at /sys/kernel/btf/vmlinux.
- CO-RE (Compile Once – Run Everywhere): your object carries relocations like "offset of field X"; the loader patches them against the target's BTF at load time.
The result: one .o that adapts, instead of a build per kernel
(the old BCC model that shipped Clang to every machine).
NetWatch's Phase-1 program reads a copied-in sockaddr at fixed,
ABI-stable offsets, so it leans on CO-RE less than a program chasing struct sock
internals would — but the same machinery is what makes the eBPF Phase-2 hooks portable.
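The relocation idea can be shown with a toy model in ordinary Rust. This is only a sketch of the concept: `Layout` and `relocate` are hypothetical names, and the byte offsets below are made up for illustration rather than taken from any real kernel's BTF.

```rust
use std::collections::HashMap;

/// Stand-in for one kernel's type info: field name -> byte offset.
type Layout = HashMap<&'static str, usize>;

/// Resolve each named field access against the target layout, the way a
/// loader patches "offset of field X" relocations before the program runs.
fn relocate(fields: &[&'static str], target: &Layout) -> Result<Vec<usize>, String> {
    fields
        .iter()
        .map(|f| {
            target
                .get(f)
                .copied()
                .ok_or_else(|| format!("field `{}` not found in target type info", f))
        })
        .collect()
}

fn main() {
    // Two imaginary kernel builds with different (made-up) offsets.
    let kernel_a: Layout = HashMap::from([("skc_daddr", 0), ("skc_dport", 12)]);
    let kernel_b: Layout = HashMap::from([("skc_daddr", 4), ("skc_dport", 16)]);

    // The "object file" names fields; it never hardcodes numbers.
    let accesses = ["skc_daddr", "skc_dport"];
    println!("kernel A: {:?}", relocate(&accesses, &kernel_a));
    println!("kernel B: {:?}", relocate(&accesses, &kernel_b));
}
```

The same list of accesses resolves to different offsets per target, and a field missing from the target becomes a load-time error instead of a wrong read.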
Four ways to write the same bytecode
Why NetWatch writes its kernel code in Rust (aya)
BCC
Python/C, compiles BPF on the target with embedded Clang. Great for ad-hoc tracing; heavy to ship (an LLVM toolchain on every user's machine).
libbpf + C
The canonical modern path. C source, CO-RE, a small loader library. Battle-tested; but it's a second language and build system bolted onto a Rust project.
bpftrace
A high-level DSL for one-liners and scripts. Perfect for exploration, not for embedding a long-lived program inside a product.
aya (Rust) ← us
Kernel and userspace in Rust, no libbpf/C dependency, CO-RE supported. One language, one toolchain, one build — it disappears into the NetWatch tree.
The deciding factor for a Rust product isn't raw
capability — all four emit the same verified bytecode. It's cohesion: aya lets the kernel
program share types with userspace (one #[repr(C)] struct, both sides) and build with
cargo, no foreign toolchain to vendor or teach contributors.
Trade-off acknowledged: aya's ecosystem is younger than libbpf's, and you write some
unsafe by hand that libbpf's CO-RE macros would hide. NetWatch judged the cohesion
worth it.
From Rust source to an embeddable object
A freestanding binary for a target with no operating system
The kernel program compiles to bpfel-unknown-none — little-endian
BPF, no OS. That constraint explains every odd line at the top of the file:
- #![no_std]: no standard library; there's no allocator or syscalls down here.
- #![no_main]: the kernel calls your hook, not a main().
- a #[panic_handler]: required for no_std; it just loops, because there's nothing to panic to.
Output: netwatch_sdk_ebpf.o, an ELF object of BPF bytecode +
map definitions + that license section.
Built by scripts/build-ebpf.sh, then copied to
target/bpf/netwatch_sdk_ebpf.o so the userspace crate can swallow it whole
(next slide).
What actually happens at startup
ELF bytes → bpf() syscall → verify → JIT → attach
The .o is baked into the userspace binary with
include_bytes! — no file to ship or lose. At runtime aya walks it through the
kernel's loading protocol:
Two things worth internalizing:
- Everything is a file descriptor. The loaded program and the map are fds the process owns. Close them and the kernel tears down the attachment, which is exactly how NetWatch's Drop detaches cleanly.
- Verification happens at load, once. The expensive proof is paid at PROG_LOAD; after that the JITed code runs at native speed on every event.
This is also where it can fail in the field — missing CAP_BPF, a
kernel that rejects the probe, no BTF. NetWatch surfaces each as
EbpfStatus::Unavailable rather than crashing (Act V).
crates/common/src/lib.rs · shared by both sides
One #[repr(C)] struct, agreed byte-for-byte
The kernel program writes these bytes; userspace reads them out of the same ring-buffer slot. There's no serialization — it's raw memory. So the layout must be identical and predictable on both sides.
- #[repr(C)] pins field order and C alignment: no Rust field reordering.
- _pad0: [u8; 3] is explicit padding: it pushes tgid to offset 4 so the u32 is naturally aligned. Leave it out and the compiler inserts silent padding, which is fine until the two sides disagree.
- All address fields stay in network byte order; userspace converts on decode.
ConnectV4Event — 48 bytes
This struct is the seam between kernel and userspace. Get its layout wrong and every field downstream is garbage — silently.
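A layout like this can be pinned down with size and offset checks. The struct below is a hypothetical reconstruction, not the definition in crates/common/src/lib.rs: only `_pad0`, `tgid` at offset 4, and the 48-byte total come from the slide, while the leading `kind` byte, the field order, and the name `ConnectV4EventSketch` are assumptions chosen to add up to 48.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical 48-byte event layout (see assumptions in the lead-in).
#[repr(C)]
#[derive(Clone, Copy, Default)]
struct ConnectV4EventSketch {
    kind: u8,       // assumed leading tag byte
    _pad0: [u8; 3], // explicit padding so `tgid` lands on offset 4
    tgid: u32,
    pid: u32,
    saddr: u32, // network byte order
    daddr: u32, // network byte order
    sport: u16, // network byte order
    dport: u16, // network byte order
    ts_ns: u64,
    comm: [u8; 16],
}

// Compile-time guard: the build fails if the size ever drifts.
const _: () = assert!(size_of::<ConnectV4EventSketch>() == 48);

/// Measure where `tgid` actually sits inside the struct.
fn tgid_offset() -> usize {
    let ev = ConnectV4EventSketch::default();
    let base = &ev as *const ConnectV4EventSketch as usize;
    std::ptr::addr_of!(ev.tgid) as usize - base
}

fn main() {
    println!(
        "size = {}, align = {}, tgid at offset {}",
        size_of::<ConnectV4EventSketch>(),
        align_of::<ConnectV4EventSketch>(),
        tgid_offset()
    );
}
```

Putting the assertion in the shared crate means a layout change breaks the build on both sides at once, rather than producing garbage fields at runtime.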
ebpf-programs/src/main.rs · the program itself
Read what you need, then build the event
Every line is doing one of three jobs:
- Get the context: ProbeContext is the kprobe's window onto the probed function's registers. ctx.arg(1) is the second argument, struct sockaddr *uaddr.
- Ask the kernel via helpers: pid/tgid, comm, and a timestamp come from helpers, not from poking memory. Safe by construction.
- Carefully read kernel memory: bpf_probe_read_kernel copies the destination port and address out of uaddr at the sockaddr_in offsets, fault-safe.
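The sockaddr_in offsets in the third job are easy to see in userspace. This sketch runs on an ordinary byte buffer rather than kernel memory, so there is no bpf_probe_read_kernel here; `read_dest` and the two constants are illustrative names. It assumes the standard Linux struct sockaddr_in layout: a 2-byte family, a 2-byte port at offset 2, and a 4-byte address at offset 4.

```rust
use std::net::Ipv4Addr;

const SIN_PORT_OFFSET: usize = 2; // right after the 2-byte sin_family
const SIN_ADDR_OFFSET: usize = 4;

/// Copy the destination out of a raw sockaddr_in image.
/// Returns (port, address), both still in network byte order,
/// or None if the buffer is too short to hold them.
fn read_dest(uaddr: &[u8]) -> Option<([u8; 2], [u8; 4])> {
    if uaddr.len() < SIN_ADDR_OFFSET + 4 {
        return None;
    }
    let (p, a) = (SIN_PORT_OFFSET, SIN_ADDR_OFFSET);
    Some((
        [uaddr[p], uaddr[p + 1]],
        [uaddr[a], uaddr[a + 1], uaddr[a + 2], uaddr[a + 3]],
    ))
}

fn main() {
    // Build the bytes a connect() to 142.250.72.4:443 would pass in.
    let mut uaddr = [0u8; 16];
    uaddr[0..2].copy_from_slice(&2u16.to_ne_bytes()); // AF_INET is 2 on Linux
    uaddr[2..4].copy_from_slice(&443u16.to_be_bytes());
    uaddr[4..8].copy_from_slice(&[142, 250, 72, 4]);

    if let Some((port, addr)) = read_dest(&uaddr) {
        println!("{}:{}", Ipv4Addr::from(addr), u16::from_be_bytes(port));
    }
}
```

The length check plays the role that the fault-safe helper plays in the kernel: a bad read turns into a value you can handle, not a crash.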
Notice saddr and sport are hardcoded
to 0. That's not laziness — it's a fact about when this code runs. Which is
the whole next slide.
Why read uaddr and not the socket? Because of a bug that made eBPF
attribution silently return nothing: issue #38.
When you read decides what you see
At kprobe entry, the socket isn't filled in yet
A kprobe on tcp_v4_connect fires at
function entry — before its body runs. The earlier version read the socket's own fields
(skc_daddr, skc_dport) there and got all
zeros: the kernel only populates them later, during routing. Every event was discarded; attribution
silently never worked. Scrub the timeline and watch what's readable when.
Execution of tcp_v4_connect(sk, uaddr, len)
The fix: read the destination from uaddr (valid at entry, copied in by the
syscall layer), and accept that source addr/port aren't known yet → key attribution on
(daddr, dport).
Why the reads come before the reserve
A reserved slot is a reference you must always release
Look back at the program's ordering: it reads every field first, and only then reserves the ring-buffer slot. That order isn't taste — the verifier requires it.
Once you call reserve() you hold a
reference. The verifier proves that on every path to BPF_EXIT, that reference
is released (submit or discard). An early
return after a reserve — e.g. a read that failed — is a leak, and the program is
rejected. So: do everything fallible before you reserve.
Flip the toggle in the stepper on "The verifier" (slide 7) to watch the real rejection fire.
The verifier turns a whole class of resource-leak bugs into a compile-time "no." You either structure the code correctly or it never loads.
256 KiB of shared, ordered, lossy-by-design queue
reserve → write → submit, with backpressure
The ring buffer is the conveyor belt from kernel to userspace. The program
reserve()s a slot, write()s the event into it, and
submit(0)s. The 0 flag keeps the kernel's default wakeup behavior, which is what
notifies a polling consumer.
- Sized for bursts — 256 KiB absorbs ~5k connect/sec for a few hundred ms if userspace stalls.
- Lossy, not corrupting: when it's full, reserve() returns None and the event is dropped. Never a torn write.
- Ordered and multi-producer: every CPU can submit; the consumer sees a single ordered stream.
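The sizing claim can be checked with back-of-envelope arithmetic. This sketch assumes the 48-byte event from the shared struct plus the kernel ring buffer's 8-byte per-record header, with each record rounded up to 8 bytes; the function names are illustrative, and the result is an upper bound for a consumer that is fully stalled.

```rust
const RING_BYTES: usize = 256 * 1024;
const EVENT_BYTES: usize = 48;
const RECORD_HEADER_BYTES: usize = 8; // per-record header in the BPF ring buffer

/// How many records fit, with each record rounded up to a multiple of 8 bytes.
fn records_that_fit(ring_bytes: usize, payload_bytes: usize) -> usize {
    let record = (payload_bytes + RECORD_HEADER_BYTES + 7) / 8 * 8;
    ring_bytes / record
}

/// How long a stalled consumer can be absorbed at a steady event rate.
fn stall_budget_ms(records: usize, events_per_sec: usize) -> usize {
    records * 1000 / events_per_sec
}

fn main() {
    let records = records_that_fit(RING_BYTES, EVENT_BYTES);
    println!(
        "{} records, about {} ms of stall at 5000 events/sec",
        records,
        stall_budget_ms(records, 5000)
    );
}
```

Under these assumptions the buffer holds a little under a second of events at 5k connects per second, so "a few hundred ms" leaves headroom for bursts above the steady rate.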
Drive the producer and consumer and watch it fill, wake, and drop.
EVENTS: RingBuf (capacity 16 shown)
The submit(0) wake is a latency/throughput knob:
BPF_RB_NO_WAKEUP trades immediacy for fewer wakeups once you know your consumer.
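The drop-when-full behavior is simple enough to model in userspace. `ToyRing` below is a hypothetical stand-in, not aya's RingBuf: it collapses reserve, write, and submit into one call and counts slots instead of bytes, but it keeps the two properties that matter here, ordered delivery and dropping new events when full.

```rust
use std::collections::VecDeque;

/// Toy bounded queue: ordered, and lossy when full.
struct ToyRing<T> {
    slots: VecDeque<T>,
    capacity: usize,
    dropped: u64,
}

impl<T> ToyRing<T> {
    fn new(capacity: usize) -> Self {
        ToyRing { slots: VecDeque::with_capacity(capacity), capacity, dropped: 0 }
    }

    /// Producer side: returns false and counts a drop when there is no room.
    /// Existing entries are never overwritten.
    fn try_submit(&mut self, event: T) -> bool {
        if self.slots.len() == self.capacity {
            self.dropped += 1;
            return false;
        }
        self.slots.push_back(event);
        true
    }

    /// Consumer side: take everything queued so far, oldest first.
    fn drain(&mut self) -> Vec<T> {
        self.slots.drain(..).collect()
    }
}

fn main() {
    let mut ring = ToyRing::new(2);
    for event in 1..=3 {
        let accepted = ring.try_submit(event);
        println!("event {} accepted: {}", event, accepted);
    }
    println!("dropped so far: {}", ring.dropped);
    println!("drained: {:?}", ring.drain());
}
```

Counting drops instead of blocking is the design choice the slide describes: the producer runs inside a kernel hook and can never wait on userspace.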
src/ebpf/source.rs · EventSource
Load the object, attach the probe, drain to a channel
Three moves, mirroring the kernel side:
- Embed, don't ship. include_bytes! folds the .o into the executable, so there is nothing to install or path-resolve at runtime.
- Attach by name. aya finds the tcp_v4_connect program in the object, loads it (verify + JIT), and attaches the kprobe.
- Own the lifecycle via fds. The Bpf handle owns the program and map; its Drop closes the fds and the kernel detaches the probe. No explicit teardown.
Decoding flips the network-byte-order fields back and hands a typed
EbpfEvent to the rest of NetWatch over an mpsc channel.
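The decode step might look roughly like this. It is a sketch under assumptions: `RawEvent` and `EbpfEventSketch` are cut-down, hypothetical stand-ins for the real types, and it assumes the kernel side copied the network-order bytes straight into integer fields, so userspace swaps them with `from_be`.

```rust
use std::net::Ipv4Addr;
use std::sync::mpsc;

/// Fields as they arrive from the ring buffer (addresses in network byte order).
struct RawEvent {
    tgid: u32,
    daddr: u32,
    dport: u16,
}

/// Typed, host-order event handed to the rest of the program.
#[derive(Debug, PartialEq)]
struct EbpfEventSketch {
    tgid: u32,
    dest: Ipv4Addr,
    port: u16,
}

/// Flip the network-order fields into host order and typed values.
fn decode_event(raw: &RawEvent) -> EbpfEventSketch {
    EbpfEventSketch {
        tgid: raw.tgid,
        dest: Ipv4Addr::from(u32::from_be(raw.daddr)),
        port: u16::from_be(raw.dport),
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();

    // Simulate the bytes for 142.250.72.4:443 landing in integer fields.
    let raw = RawEvent {
        tgid: 4242, // made-up process id
        daddr: u32::from_ne_bytes([142, 250, 72, 4]),
        dport: u16::from_ne_bytes([0x01, 0xBB]),
    };

    tx.send(decode_event(&raw)).expect("receiver is alive");
    println!("{:?}", rx.recv().expect("sender is alive"));
}
```

Doing the byte swap once at the boundary means nothing downstream of the channel has to remember which fields are still in wire order.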
On non-Linux, or without the ebpf feature, this whole path is a stub
returning UnsupportedPlatform — every other OS still compiles.
The whole vertical slice, one event
From connect() to a process name on screen
Everything so far, assembled. A browser opens a connection; seven steps later NetWatch's Connections tab labels it with the owning process — something packet capture alone could never do. Step through the real data as it transforms.
chrome → 142.250.72.4:443
The key insight made physical: attribution is keyed on (daddr, dport)
because that's the only stable identity available at kprobe entry. The #38 fix and this final label are the
same decision.
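Keying on (daddr, dport) can be sketched as a small lookup table. `Attribution` and `Owner` are hypothetical names, the pid is made up, and the table here is a plain HashMap in which the most recent event for a destination wins; how NetWatch resolves two processes talking to the same destination is not covered by this sketch.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

#[derive(Debug, Clone, PartialEq)]
struct Owner {
    pid: u32,
    comm: String,
}

/// Destination-keyed attribution table: (daddr, dport) -> owning process.
#[derive(Default)]
struct Attribution {
    by_dest: HashMap<(Ipv4Addr, u16), Owner>,
}

impl Attribution {
    /// Called for each decoded connect event; the latest owner wins.
    fn record(&mut self, daddr: Ipv4Addr, dport: u16, owner: Owner) {
        self.by_dest.insert((daddr, dport), owner);
    }

    /// Called when a connection needs a label.
    fn label(&self, daddr: Ipv4Addr, dport: u16) -> Option<&Owner> {
        self.by_dest.get(&(daddr, dport))
    }
}

fn main() {
    let mut table = Attribution::default();
    let dest = Ipv4Addr::new(142, 250, 72, 4);
    table.record(dest, 443, Owner { pid: 4242, comm: "chrome".to_string() });

    match table.label(dest, 443) {
        Some(owner) => println!("{}:443 -> {} (pid {})", dest, owner.comm, owner.pid),
        None => println!("{}:443 -> unknown", dest),
    }
}
```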
Privilege you earn, then give back
Three guarantees stack up to "safe in an end-user tool"
Running code in the kernel sounds reckless for a TUI people install. It isn't, because the program is proven safe before it runs and the capability is dropped right after load.
- The verifier already proved termination, memory safety, and no leaks — a panic or hang isn't on the table.
- The GPL gate keeps you to audited helpers; you can't call arbitrary kernel code.
- The sandbox drops CAP_BPF / CAP_PERFMON / CAP_NET_RAW the instant load+attach finish. The program keeps running, but nothing can load another one.
This is the same "earn privilege, then drop it" arc as packet capture in NetWatch's sandbox — eBPF just joins the list of things opened before the drop.
What the model costs you
The constraints are the price of the safety
Kprobes are unstable contracts
You're hooking an internal function. Its signature or existence can change between kernels — which is why a tracepoint is preferred when one covers your event.
512-byte stack
Tiny. Large structs go in a map (or per-CPU scratch), not on the
stack. The ring-buffer reserve() pattern also keeps the event out of stack space.
No unbounded loops
The verifier must prove termination. Bounded loops only; anything it can't bound, it rejects. Some "obvious" code simply won't load.
Helper / feature availability
Ring buffers need kernel ≥ 5.8; some helpers are newer still. Older kernels mean a different code path or a graceful "no."
It can fail at attach
Missing CAP_BPF, no BTF, a kernel
that refuses the probe. NetWatch reports these as state, never a crash.
Verifier errors are cryptic
"Invalid ELF header" for a missing license; "reference leak" for a misordered reserve. Half of eBPF skill is reading these. This deck's war story is one such scar.
enum EbpfStatus { Active · Unavailable(reason) · NotCompiled } — every
failure mode is a UI state, so the tool degrades instead of dying. Off by default; you opt in with
--features ebpf.
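One way to read "every failure mode is a UI state" is as a total mapping from load outcomes to a status value. The three variants below come from the slide; everything else is an assumption for illustration, including the `String` reason, the `LoadError` enum, and both function names.

```rust
#[derive(Debug, PartialEq)]
enum EbpfStatus {
    Active,
    Unavailable(String), // reason type is assumed to be a string
    NotCompiled,
}

/// Hypothetical load failures, mirroring the ones the deck lists.
#[derive(Debug)]
enum LoadError {
    MissingCapability,
    NoBtf,
    ProbeRejected,
}

/// Every outcome maps to a status; nothing here can panic.
fn status_from(compiled_in: bool, load: Result<(), LoadError>) -> EbpfStatus {
    if !compiled_in {
        return EbpfStatus::NotCompiled;
    }
    match load {
        Ok(()) => EbpfStatus::Active,
        Err(LoadError::MissingCapability) => EbpfStatus::Unavailable("missing CAP_BPF".to_string()),
        Err(LoadError::NoBtf) => EbpfStatus::Unavailable("kernel exposes no BTF".to_string()),
        Err(LoadError::ProbeRejected) => EbpfStatus::Unavailable("kernel rejected the probe".to_string()),
    }
}

/// What a status line in the UI could show for each state.
fn status_line(status: &EbpfStatus) -> String {
    match status {
        EbpfStatus::Active => "eBPF: active".to_string(),
        EbpfStatus::Unavailable(reason) => format!("eBPF: unavailable ({})", reason),
        EbpfStatus::NotCompiled => "eBPF: not compiled in".to_string(),
    }
}

fn main() {
    println!("{}", status_line(&status_from(true, Ok(()))));
    println!("{}", status_line(&status_from(true, Err(LoadError::NoBtf))));
    println!("{}", status_line(&status_from(false, Ok(()))));
}
```

Because both matches are exhaustive, adding a new failure variant forces a decision about how it is displayed at compile time.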
Where this goes next — and where you come in
One kprobe today; full attribution tomorrow
Phase 1 is a single IPv4 TCP kprobe. The roadmap's eBPF Phase 2 widens it until NetWatch can decrypt a QUIC flow and name the process that owns it — something nothing else in the category does end to end.
- tcp_v6_connect: same story, IPv6.
- UDP send path: so QUIC (which NetWatch already decrypts) gets per-process attribution, not just /proc guesses.
- inet_sock_set_state: catch flows by state transition, including short-lived ones the connect-entry probe can miss.
Every one of these is the same question this deck has been answering, asked again:
Adding a hook means reading the same kernel networking source a netdev contributor reads — so this is also a real on-ramp into upstream kernel work. The #38 timeline is the kind of question you'll keep answering.
The whole map, once more
From first principles to a process name on screen
The model Act I
event → hook → verified program → map → userspace. A kprobe at the one moment the kernel knows who owns a socket.
The machine Act II
A tiny VM, maps as shared memory, and a verifier that proves termination, memory safety, and no leaks before anything runs.
The toolchain Act III
aya keeps kernel and userspace in
one Rust build; no_std object → bpf() → verify → JIT →
attach, all owned by fds.
The real program Act IV
A #[repr(C)]
ABI, reads before the reserve, the #38 timing bug, and the ring buffer carrying one event to userspace.
Safety Act V
Proven safe, GPL-gated, capability dropped after load. Failure is a UI state, never a crash.
Read the source
crates/ebpf-programs/src/main.rs ·
crates/common/src/lib.rs · src/ebpf/source.rs. ~250 lines,
now legible end to end.
Next stop: add a hook (Act V), or pick a BPF selftest and send your first kernel patch.