introduce a fuzz testing web interface #20958

andrewrk · 2024-08-06T03:19:01Z

Makes it so that when you pass --fuzz to zig build, the build runner starts listening on a port (configurable with --port) and serves a fuzzing web interface.

The web interface shows source code, reusing the Zig source rendering code from Autodocs, and displays points of interest inline as red or green dots that indicate code coverage. These points correspond to the PC addresses that LLVM selects with the sancov pass, which inserts an inline 8-bit counter at every control flow graph edge.
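To make the red/green dots concrete, here is a minimal Python model of per-edge 8-bit counters. It is a sketch, not the actual instrumentation: the `EdgeCounters` class, the `classify` function, and the saturating increment are all assumptions made for illustration.

```python
# Hypothetical model of inline 8-bit edge counters; every name here is illustrative.
class EdgeCounters:
    def __init__(self, num_edges: int) -> None:
        # One byte per instrumented control flow edge, all starting at zero.
        self.counters = bytearray(num_edges)

    def hit(self, edge: int) -> None:
        # Assumption: saturate at 255 so a hot edge never wraps back to zero
        # and looks uncovered. Real instrumentation may behave differently.
        if self.counters[edge] < 255:
            self.counters[edge] += 1

    def dots(self) -> list:
        # "green" for an edge seen at least once, "red" otherwise.
        return ["green" if c else "red" for c in self.counters]


def classify(x: int, counters: EdgeCounters) -> str:
    # A toy function with three manually "instrumented" edges.
    counters.hit(0)  # function entry
    if x < 0:
        counters.hit(1)  # negative branch
        return "negative"
    counters.hit(2)  # non-negative branch
    return "non-negative"


counters = EdgeCounters(3)
for value in (1, 2, 3):
    classify(value, counters)
print(counters.dots())  # ['green', 'red', 'green']
```

The negative branch stays red because no input reached it, which is the kind of gap the web interface is meant to surface.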

The data is live-updating via the WebSocket standard.

closes #20812

#352 remains open because the system is not yet integrated with unit testing and other kinds of code coverage combination. However, this is a clear step towards completing that issue.

After this PR is merged, I think it will be quite a fun and rewarding area to contribute to!

Status

Just to be clear, while there is a fun, flashy demo here, I would describe the fuzzing capabilities of Zig as still alpha quality. That is, feel free to give it a try if you want to get involved in the development efforts of fuzzing. However, deriving any amount of real value out of fuzzing isn't really possible yet because some key enhancements remain to be done. You can see quite a few follow-up tasks in the dedicated section below.

Demo

2024-08-05.18-11-01.mp4

Screenshots

Follow-Up Work

separate the build runner process from the configure process #20981
make the source hyperlinks work in the fuzzing web interface (currently they construct autodoc URLs) #20982
fuzzer web interface: add a UI element showing a list of source files sorted by total uncovered points of interest #20983
fuzzer web interface: put the source view into a scrolling box so the main UI doesn't go offscreen #20984
write fuzz inputs to a shared memory region before running a task #20803
fuzz testing: support macOS #20986
fuzz testing: support Windows #20987
fuzzing: support more than one executable at once #20988
debug info audit - many virtual memory addresses have strange source locations #20989
-OReleaseSafe breaks fuzzing entry points feature; incorrect already-sorted assumption #20990
unreachable code paths need to be excluded from having coverage instrumentation #20992
more optimized and correct management of 8-bit PC counters #20994
fuzzer web interface: play a little jingle when it finds a bug! #20995
include the concept of a set of modules of interest in coverage
- you still want instrumentation for the purposes of fuzzing but you only want to find bugs in some code
fuzzer web interface: ability to scroll to source locations that newly gain coverage #20996
There should be a way to gain access to the interesting inputs while the fuzzing process is still ongoing.
More stats
support code coverage when testing #352
- solvable by running each unit test once after rebuilding in fuzz test mode in order to notice the coverage

As a reminder, the main motivation for tackling fuzzing right now is that fuzzing is an essential tool in ensuring incremental compilation is robust and therefore can be enabled by default. The yak stack looks like this:

Release 1.0, which depends on
Finishing the language, which depends on
Implementing incremental compilation because it might affect language decisions, which depends on
Fuzz testing, which depends on
A way to visualize the effectiveness of the fuzzer's algorithm, which is this PR.

deflock · 2024-08-06T17:28:40Z

Isn't --port too generic and should be --fuzz-port? 🤔 Or it'll be used not just for fuzzing?

andrewrk · 2024-08-06T19:19:59Z

> Isn't --port too generic and should be --fuzz-port? 🤔 Or it'll be used not just for fuzzing?

Keen eye. You have noticed my plan to eventually make this the Build System Interface rather than only the fuzzer interface.

Actually, I should change it to --listen= to match the other equivalent CLI options.

* new .zig-cache subdirectory: 'v'
  - stores coverage information with filename of hash of PCs that want coverage. This hash is a hex encoding of the 64-bit coverage ID.
* build runner
  * fixed bug in file system inputs when a compile step has an overridden zig_lib_dir field set.
  * set some std lib options optimized for the build runner
    - no side channel mitigations
    - no Transport Layer Security
    - no crypto fork safety
  * add a --port CLI arg for choosing the port the fuzzing web interface listens on. it defaults to choosing a random open port.
  * introduce a web server, and serve a basic single page application
    - shares wasm code with autodocs
    - assets are created live on request, for convenient development experience. main.wasm is properly cached if nothing changes.
    - sources.tar comes from file system inputs (introduced with the `--watch` feature)
  * receives coverage ID from test runner and sends it on a thread-safe queue to the WebServer.
* test runner
  - takes a zig cache directory argument now, for where to put coverage information.
  - sends coverage ID to parent process
* fuzzer
  - puts its logs (in debug mode) in .zig-cache/tmp/libfuzzer.log
  - computes coverage_id and makes it available with `fuzzer_coverage_id` exported function.
  - the memory-mapped coverage file is now namespaced by the coverage id in hex encoding, in `.zig-cache/v`
* tokenizer
  - add a fuzz test to check that several properties are upheld
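As a rough illustration of how a coverage file could be namespaced by coverage ID, the sketch below hashes a list of PCs to a 64-bit value and hex-encodes it under `.zig-cache/v`. The hash function (BLAKE2b truncated to 8 bytes), the zero-padded 16-digit encoding, and the helper names `coverage_id` and `coverage_path` are assumptions for illustration; the commit message does not say which hash or padding the fuzzer uses.

```python
import hashlib
import struct
from pathlib import PurePosixPath


def coverage_id(pcs) -> int:
    # Stand-in hash (assumption): BLAKE2b truncated to 64 bits, fed with each
    # PC as a little-endian u64. The real fuzzer may use a different hash.
    h = hashlib.blake2b(digest_size=8)
    for pc in pcs:
        h.update(struct.pack("<Q", pc))
    return int.from_bytes(h.digest(), "little")


def coverage_path(cache_dir: str, cov_id: int) -> PurePosixPath:
    # Assumption: lowercase hex, zero-padded to 16 digits.
    return PurePosixPath(cache_dir) / "v" / format(cov_id, "016x")


pcs = [0x401000, 0x401010, 0x401040]  # made-up instrumented addresses
print(coverage_path(".zig-cache", coverage_id(pcs)))
```

Because the ID depends only on the set of instrumented PCs, two processes working with the same executable would arrive at the same file name without coordinating.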

with debug info resolved. begin efforts of providing `std.debug.Info`, a cross-platform abstraction for loading debug information into an in-memory format that supports queries such as "what is the source location of this virtual memory address?" Unlike `std.debug.SelfInfo`, this API does not assume the debug information in question happens to match the host CPU architecture, OS, or other target properties.

* std.debug.Dwarf: add `sortCompileUnits` along with a field to track the state for the purpose of assertions and correct API usage. This makes batch lookups faster.
  - in the future, findCompileUnit should be enhanced to rely on sorted compile units as well.
* implement `std.debug.Dwarf.resolveSourceLocations` as well as `std.debug.Info.resolveSourceLocations`. It's still pretty slow, since it calls getLineNumberInfo for each array element, repeating a lot of work unnecessarily.
* integrate these APIs with `std.Progress` to understand what is taking so long.

The output I'm seeing from this tool shows a lot of missing source locations. In particular, the main area of interest is missing for my tokenizer fuzzing example.
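The benefit of sorting for batch lookups can be sketched as a single merge pass over sorted PCs and a sorted line table, instead of one independent search per address. This Python version is a simplified illustration, not the `std.debug` implementation: the table layout and the function name `resolve_source_locations` are hypothetical, and real DWARF line programs also carry end-of-sequence markers that are ignored here.

```python
def resolve_source_locations(line_table, sorted_pcs):
    """Resolve many addresses in one pass.

    line_table: list of (start_address, file, line), sorted by start_address.
    sorted_pcs: addresses to resolve, sorted ascending.
    Returns one (file, line) tuple per PC, or None if the PC precedes the table.
    """
    results = []
    row = -1  # index of the last table row whose start address is <= pc
    for pc in sorted_pcs:
        # Advance the cursor; it never moves backwards because both inputs
        # are sorted, so total work is O(len(line_table) + len(sorted_pcs)).
        while row + 1 < len(line_table) and line_table[row + 1][0] <= pc:
            row += 1
        results.append(None if row < 0 else line_table[row][1:])
    return results


# Made-up table for illustration.
table = [
    (0x1000, "tokenizer.zig", 10),
    (0x1010, "tokenizer.zig", 12),
    (0x1040, "tokenizer.zig", 20),
    (0x2000, "main.zig", 3),
]
for loc in resolve_source_locations(table, [0x0800, 0x1000, 0x1018, 0x2004]):
    print(loc)
```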

* libfuzzer: close file after mmap
* fuzzer/main.js: connect with EventSource and debug dump the messages. currently this prints how many fuzzer runs have been attempted to console.log.
* extract some `std.debug.Info` logic into `std.debug.Coverage`. Prepares for consolidation across multiple different executables which share source files, and makes it possible to send all the PC/SourceLocation mapping data with 4 memcpy'd arrays.
* std.Build.Fuzz:
  - spawn a thread to watch the message queue and signal event subscribers.
  - track coverage map data
  - respond to /events URL with EventSource messages on a timer
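For readers unfamiliar with EventSource, the wire format is plain text: optional `event:` lines, one or more `data:` lines, and a blank line that ends the message. The sketch below builds such a frame in Python. The helper name `sse_message` and the `n_runs=42` payload are invented for illustration, and this intermediate commit's approach sits alongside the WebSocket support that a later commit in this PR adds.

```python
from typing import Optional


def sse_message(data: str, event: Optional[str] = None) -> bytes:
    # Build one server-sent event frame as defined by the EventSource format.
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads are sent as one "data:" line per payload line.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the message.
    return ("\n".join(lines) + "\n\n").encode("utf-8")


# Hypothetical payload: how many fuzzer runs have been attempted so far.
print(sse_message("n_runs=42"))
```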

* libfuzzer: track unique runs instead of deduplicated runs
  - easier for consumers to notice when to recheck the covered bits.
* move common definitions to `std.Build.Fuzz.abi`.

build runner sends all the information the fuzzer web interface client needs in order to display inline coverage information along with source code.

this fix bypasses the slice bounds, reading garbage data for up to the last 7 bits (which are technically supposed to be ignored). that's going to need to be fixed, let's fix that along with switching from byte elems to usize elems.
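The padding-bit problem mentioned here can be illustrated with a small Python sketch that counts covered PCs in a packed bit set while masking off the unused bits of the final byte. The function name `count_covered` and the least-significant-bit-first ordering are assumptions; the real code operates on the memory-mapped coverage file and may lay bits out differently.

```python
def count_covered(bits: bytes, num_pcs: int) -> int:
    # Count set bits among the first num_pcs bits, ignoring any padding bits
    # in the last byte. Assumption: bit i lives at bits[i // 8] >> (i % 8).
    full_bytes, remainder = divmod(num_pcs, 8)
    total = sum(bin(b).count("1") for b in bits[:full_bytes])
    if remainder:
        mask = (1 << remainder) - 1  # keep only the low `remainder` bits
        total += bin(bits[full_bytes] & mask).count("1")
    return total


# 10 PCs packed into 2 bytes; the top 6 bits of the second byte are padding
# and deliberately filled with garbage here.
print(count_covered(bytes([0b00000001, 0b11111111]), 10))  # 3
```

Without the mask, the garbage padding bits would inflate the count to 9.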

andrewrk added the zig build system, release notes, and fuzzing labels Aug 6, 2024

andrewrk force-pushed the fuzz branch from 311f8f5 to 1b91ea0 Compare August 6, 2024 23:34

andrewrk added 24 commits August 7, 2024 00:48

fuzzer: track code coverage from all runs

97643c1

When a unique run is encountered, track it in a bit set memory-mapped into the fuzz directory so it can be observed by other processes, even while the fuzzer is running.

fuzzer: log errors and move deduplicated runs to shared mem

ffc050e

fuzzer: share zig to html rendering with autodocs

107b272

std.debug.FixedBufferReader is fine

66954e8

it does not need to be deprecated

std.debug.Dwarf: precompute .debug_line table

1792258

yields a 60x speedup for resolveSourceLocations in debug builds

std.Debug.Info: remove std.Progress integration

c2ab461

it's too fast to need it now

std.debug.Info.resolveSourceLocations: O(N) implementation

53aa9d7

README: update how std lib docs are found in a release build

5f92a03

std.posix: add some more void bits

d36c182

prevents unnecessary compilation errors on wasm32-freestanding

add std.http.WebSocket

b9fd0ee

std.debug.Coverage: use extern structs

2292563

helps the serialization use case

fuzzer web ui: render stats

f56d113

fuzzer web ui: add coverage stat

6e6164f

fuzzer web ui: introduce entry points

e64a009

so you can have somewhere to start browsing

fuzzing web ui: make entry point links clickable

db69641

fuzzer web UI: navigate by source location index

ef4c219

This will help scroll the point of interest into view

fuzzer web UI: annotated PCs in source view

3d48602

fuzzer web UI: render PCs with red or green depending on coverage

38227e9

fuzzer web ui: resolve cwd in sources.tar

bfc2ee0

because the wasm code needs to string match against debug information

andrewrk added 10 commits August 7, 2024 00:48

dump-cov: show seen PCs

895fa87

fuzzer web ui: fail scrolling into view gracefully

1484f17

wasm zig source rendering: fix annotation location off-by-one

5f5a7b5

update branch for latest std.sort changes

8dae629

std.debug: fix compile errors on windows and macos

40edd11

Compilation: fix not showing sub-errors for autodocs

ff503ed

Compilation: fix -femit-docs

904fcda

build runner: --fuzz not yet supported on Windows

2a651ea

update coff_dwarf standalone test to new API

d721d9a

and make it still test compilation on non-Windows

andrewrk force-pushed the fuzz branch from 1b91ea0 to d721d9a Compare August 7, 2024 07:48

andrewrk merged commit 0e99f51 into master Aug 7, 2024
10 checks passed

andrewrk deleted the fuzz branch August 7, 2024 18:55

gwenzek added a commit to zml/docs that referenced this pull request Sep 13, 2024

Vendor zig Wasm renderer: ziglang/zig#20958

57f6ebf
