introduce a fuzz testing web interface #20958

Merged
merged 34 commits into from
Aug 7, 2024


andrewrk
Member

andrewrk commented Aug 6, 2024

Makes it so that when you pass --fuzz to zig build, the build runner starts listening on a port (configurable with --port) and serves a fuzzing web interface.

The web interface shows source code, reusing the same Zig code as Autodocs, and displays points of interest inline as red or green dots that indicate code coverage. These points correspond to the PC addresses that LLVM selects with the sancov pass, which inserts an inline 8-bit counter at every control flow graph edge.
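
To make the counter-to-dot relationship concrete, here is a minimal sketch in Python rather than the PR's actual Zig or wasm code. It assumes one 8-bit counter per instrumented edge, stored in the same order as the list of PC addresses, and it assumes green means "hit at least once" and red means "never hit". The addresses and helper names are made up for illustration.

```python
# Hypothetical data layout: counters[i] is the 8-bit hit counter for the
# control flow edge whose PC address is pcs[i]. All values are made up.

def bump(counters: bytearray, edge: int) -> None:
    """Increment one edge counter, wrapping at 8 bits (an assumption here)."""
    counters[edge] = (counters[edge] + 1) & 0xFF


def dot_colors(pcs, counters):
    """Map each PC to 'green' if its counter is nonzero, else 'red'."""
    return {pc: ("green" if hits else "red") for pc, hits in zip(pcs, counters)}


pcs = [0x1000, 0x1010, 0x1024]
counters = bytearray(len(pcs))  # every edge starts at zero hits

bump(counters, 0)  # edge 0 executed once
bump(counters, 2)  # edge 2 executed twice
bump(counters, 2)

colors = dot_colors(pcs, counters)
```

The UI only needs the zero versus nonzero distinction per PC in this model; the exact hit counts are not used for coloring.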

The data is live-updating via the WebSocket standard.

closes #20812

#352 remains open because the system is not yet integrated with unit testing and other kinds of code coverage combination. Still, this is clearly a step towards completing that issue.

However, after this PR is merged, I think it will be quite a fun and rewarding area to contribute to!

Status

Just to be clear, while there is a fun, flashy demo here, I would describe the fuzzing capabilities of Zig as still alpha quality. That is, feel free to give it a try if you want to get involved in the development efforts of fuzzing. However, deriving any amount of real value out of fuzzing isn't really possible yet due to some key enhancements remaining to do:

You can see there are quite a few follow-up tasks left to do in that dedicated section below.

Demo

2024-08-05.18-11-01.mp4

Screenshots

image

image

Follow-Up Work

As a reminder, the main motivation for tackling fuzzing right now is that fuzzing is an essential tool in ensuring incremental compilation is robust and therefore can be enabled by default. The yak stack looks like this:

  1. Release 1.0, which depends on
  2. Finishing the language, which depends on
  3. Implementing incremental compilation because it might affect language decisions, which depends on
  4. Fuzz testing, which depends on
  5. A way to visualize the effectiveness of the fuzzer's algorithm, which is this PR.

andrewrk added the zig build system, release notes, and fuzzing labels Aug 6, 2024
deflock commented Aug 6, 2024

Isn't --port too generic and should be --fuzz-port? 🤔 Or it'll be used not just for fuzzing?

Member Author

andrewrk commented Aug 6, 2024

> Isn't --port too generic and should be --fuzz-port? 🤔 Or it'll be used not just for fuzzing?

Keen eye. You have noticed my plan to eventually make this the Build System Interface rather than only the fuzzer interface.

Actually, I should change it to --listen= to match the other equivalent CLI options.

When a unique run is encountered, track it in a bit set memory-mapped
into the fuzz directory so it can be observed by other processes, even
while the fuzzer is running.
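
As an illustration of the idea (not the fuzzer's real code, which is Zig), the following Python sketch shows a bit set backed by a memory-mapped file that a second mapping can read while the first one is still open. The file name, the capacity, and the least-significant-bit-first layout are all assumptions of this sketch.

```python
import mmap
import os
import tempfile

N_BITS = 128  # hypothetical capacity of the bit set


def set_bit(buf, index):
    """Set bit `index`, LSB-first within each byte (an assumed layout)."""
    buf[index // 8] |= 1 << (index % 8)


def get_bit(buf, index):
    return bool(buf[index // 8] & (1 << (index % 8)))


# Create a zero-filled backing file; "seen.bits" is a made-up name.
path = os.path.join(tempfile.mkdtemp(), "seen.bits")
with open(path, "wb") as f:
    f.write(bytes(N_BITS // 8))

# "Fuzzer" side: map the file read-write. The mapping remains valid after
# the file object is closed.
with open(path, "r+b") as f:
    writer = mmap.mmap(f.fileno(), 0)
set_bit(writer, 5)
set_bit(writer, 77)
writer.flush()

# "Observer" side: an independent read-only mapping of the same file.
with open(path, "rb") as f:
    reader = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
seen = [i for i in range(N_BITS) if get_bit(reader, i)]

reader.close()
writer.close()
```

In a real setup the observer would be a separate process polling the mapping; both mappings live in one process here only to keep the sketch self-contained.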
* new .zig-cache subdirectory: 'v'
  - stores coverage information in files named after a hash of the PCs
    that want coverage. This hash is a hex encoding of the 64-bit coverage ID.
* build runner
  * fixed bug in file system inputs when a compile step has an
    overridden zig_lib_dir field set.
  * set some std lib options optimized for the build runner
    - no side channel mitigations
    - no Transport Layer Security
    - no crypto fork safety
  * add a --port CLI arg for choosing the port the fuzzing web interface
    listens on. it defaults to choosing a random open port.
  * introduce a web server, and serve a basic single page application
    - shares wasm code with autodocs
    - assets are created live on request, for convenient development
      experience. main.wasm is properly cached if nothing changes.
    - sources.tar comes from file system inputs (introduced with the
      `--watch` feature)
  * receives coverage ID from test runner and sends it on a thread-safe
    queue to the WebServer.
* test runner
  - takes a zig cache directory argument now, for where to put coverage
    information.
  - sends coverage ID to parent process
* fuzzer
  - puts its logs (in debug mode) in .zig-cache/tmp/libfuzzer.log
  - computes coverage_id and makes it available with
    `fuzzer_coverage_id` exported function.
  - the memory-mapped coverage file is now namespaced by the coverage id
    in hex encoding, in `.zig-cache/v`
* tokenizer
  - add a fuzz test to check that several properties are upheld
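
The coverage ID naming scheme described above can be sketched as follows. This is Python, not the PR's Zig code, and it rests on several assumptions: BLAKE2b truncated to 64 bits stands in for whatever hash the fuzzer really uses, the PCs are packed as little-endian 64-bit integers, and the file name is zero-padded to 16 hex digits. The function names are hypothetical.

```python
import hashlib
import os
import struct


def coverage_id(pcs):
    """Derive a 64-bit ID from a list of PC addresses.

    BLAKE2b with an 8-byte digest is a stand-in hash for this sketch only.
    """
    raw = struct.pack(f"<{len(pcs)}Q", *pcs)
    digest = hashlib.blake2b(raw, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def coverage_path(cache_dir, cov_id):
    """Place the coverage file in the 'v' subdirectory, named by the hex ID."""
    return os.path.join(cache_dir, "v", f"{cov_id:016x}")


cov_id = coverage_id([0x1000, 0x1010, 0x1024])
cov_path = coverage_path(".zig-cache", cov_id)
```

Because the name depends only on the set of instrumented PCs, any process that knows the coverage ID can locate the same file without further coordination.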
with debug info resolved.

begin efforts of providing `std.debug.Info`, a cross-platform
abstraction for loading debug information into an in-memory format that
supports queries such as "what is the source location of this virtual
memory address?"

Unlike `std.debug.SelfInfo`, this API does not assume the debug
information in question happens to match the host CPU architecture, OS,
or other target properties.
* std.debug.Dwarf: add `sortCompileUnits` along with a field to track
  the state for the purpose of assertions and correct API usage.
  This makes batch lookups faster.
  - in the future, findCompileUnit should be enhanced to rely on sorted
    compile units as well.
* implement `std.debug.Dwarf.resolveSourceLocations` as well as
  `std.debug.Info.resolveSourceLocations`. It's still pretty slow, since
  it calls getLineNumberInfo for each array element, repeating a lot of
  work unnecessarily.
* integrate these APIs with `std.Progress` to understand what is taking
  so long.
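
To show why sorting helps batch lookups, here is a small Python sketch of address-to-source-location resolution over sorted ranges. It is not the `std.debug.Dwarf` implementation; the `LineTable` class, its tuple layout, and the sample addresses are hypothetical. Each entry covers the half-open range from `start_pc` to `end_pc`.

```python
from bisect import bisect_right


class LineTable:
    """Hypothetical table of (start_pc, end_pc, file, line) entries."""

    def __init__(self, entries):
        # Sort once up front so every lookup can use binary search.
        self.entries = sorted(entries)
        self.starts = [entry[0] for entry in self.entries]

    def resolve(self, pc):
        """Return (file, line) for pc, or None if no entry covers it."""
        i = bisect_right(self.starts, pc) - 1
        if i < 0:
            return None
        start, end, file, line = self.entries[i]
        return (file, line) if pc < end else None

    def resolve_many(self, pcs):
        """Resolve a batch of addresses against the sorted table."""
        return [self.resolve(pc) for pc in pcs]


table = LineTable([
    (0x2000, 0x2010, "b.zig", 7),
    (0x1000, 0x1008, "a.zig", 3),
])
locations = table.resolve_many([0x1004, 0x2000, 0x1800, 0x0500])
```

Addresses that fall outside every range resolve to `None`, which is one way a tool like this can end up reporting missing source locations.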

The output I'm seeing from this tool shows a lot of missing source
locations. In particular, the main area of interest is missing for my
tokenizer fuzzing example.
it does not need to be deprecated
yields a 60x speedup for resolveSourceLocations in debug builds
* libfuzzer: close file after mmap
* fuzzer/main.js: connect with EventSource and debug dump the messages.
  currently this prints how many fuzzer runs have been attempted to
  console.log.
* extract some `std.debug.Info` logic into `std.debug.Coverage`.
  Prepares for consolidation across multiple different executables which
  share source files, and makes it possible to send all the
  PC/SourceLocation mapping data with 4 memcpy'd arrays.
* std.Build.Fuzz:
  - spawn a thread to watch the message queue and signal event
    subscribers.
  - track coverage map data
  - respond to /events URL with EventSource messages on a timer
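
For readers unfamiliar with EventSource, the wire format of a server-sent event is simple enough to sketch. The Python below formats one message as defined by the HTML standard: optional `event:` field, one `data:` line per line of payload, and a blank line as terminator. The event name `n_runs` and the payload are made up; the messages the build runner actually sends are not shown in this PR text.

```python
from typing import Optional


def sse_message(data: str, event: Optional[str] = None) -> bytes:
    """Format one server-sent event frame for an EventSource client."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads become one "data:" field per line.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return ("\n".join(lines) + "\n\n").encode("utf-8")


frame = sse_message("42", event="n_runs")  # hypothetical event name
```

A server answering on a timer would write one such frame per tick to a response with content type `text/event-stream` and keep the connection open.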
prevents unnecessary compilation errors on wasm32-freestanding
helps the serialization use case
* libfuzzer: track unique runs instead of deduplicated runs
  - easier for consumers to notice when to recheck the covered bits.
* move common definitions to `std.Build.Fuzz.abi`.

The build runner sends the fuzzer web interface client all the
information it needs to display inline coverage information along with
source code.
so you can have somewhere to start browsing
This will help scroll the point of interest into view
because the wasm code needs to string match against debug information
Labels
fuzzing release notes This PR should be mentioned in the release notes. zig build system std.Build, the build runner, `zig build` subcommand, package management
Development

Successfully merging this pull request may close these issues.

add a UI to fuzzing to report progress, code coverage, interesting inputs, and other stats
2 participants