introduce a fuzz testing web interface #20958
Merged
andrewrk added the "zig build system", "release notes", and "fuzzing" labels on Aug 6, 2024.
Isn't |
Keen eye. You have noticed my plan to eventually make this the Build System Interface rather than only the fuzzer interface. Actually, I should change it to |
When a unique run is encountered, track it in a bit set memory-mapped into the fuzz directory so it can be observed by other processes, even while the fuzzer is running.
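The idea of a bit set memory-mapped into the fuzz directory can be sketched in Python. This is a minimal illustration, not the fuzzer's code: the class name `SharedBitSet` and the file layout (raw bits only, least significant bit first) are hypothetical. Because the mapping is shared, a second process that maps the same file sees bits as soon as the writer sets them.

```python
import mmap
import os
import tempfile


class SharedBitSet:
    """Bit set stored in a memory-mapped file so other processes can read it live.

    Hypothetical layout: the file holds only the raw bits, least significant bit first.
    """

    def __init__(self, path: str, n_bits: int) -> None:
        self.n_bits = n_bits
        n_bytes = (n_bits + 7) // 8
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # Grow the file if needed so that every bit has backing storage.
            if os.fstat(fd).st_size < n_bytes:
                os.ftruncate(fd, n_bytes)
            # Shared, writable mapping: stores become visible to every process
            # that maps the same file, with no explicit write() calls.
            self.map = mmap.mmap(fd, n_bytes)
        finally:
            # The mapping stays valid after the descriptor is closed.
            os.close(fd)

    def set(self, i: int) -> None:
        self.map[i // 8] |= 1 << (i % 8)

    def is_set(self, i: int) -> bool:
        return bool(self.map[i // 8] & (1 << (i % 8)))


# Usage: a writer (the fuzzer) and a reader (an observer) map the same file.
path = os.path.join(tempfile.mkdtemp(), "coverage")
writer = SharedBitSet(path, 100)
writer.set(42)
reader = SharedBitSet(path, 100)
```

Closing the descriptor right after mapping mirrors the later "libfuzzer: close file after mmap" change; the mapping itself keeps the data reachable.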
* new .zig-cache subdirectory: 'v'
  - stores coverage information with filename of hash of PCs that want coverage. This hash is a hex encoding of the 64-bit coverage ID.
* build runner
  * fixed bug in file system inputs when a compile step has an overridden zig_lib_dir field set.
  * set some std lib options optimized for the build runner
    - no side channel mitigations
    - no Transport Layer Security
    - no crypto fork safety
  * add a --port CLI arg for choosing the port the fuzzing web interface listens on. it defaults to choosing a random open port.
  * introduce a web server, and serve a basic single page application
    - shares wasm code with autodocs
    - assets are created live on request, for convenient development experience. main.wasm is properly cached if nothing changes.
    - sources.tar comes from file system inputs (introduced with the `--watch` feature)
  * receives coverage ID from test runner and sends it on a thread-safe queue to the WebServer.
* test runner
  - takes a zig cache directory argument now, for where to put coverage information.
  - sends coverage ID to parent process
* fuzzer
  - puts its logs (in debug mode) in .zig-cache/tmp/libfuzzer.log
  - computes coverage_id and makes it available with `fuzzer_coverage_id` exported function.
  - the memory-mapped coverage file is now namespaced by the coverage id in hex encoding, in `.zig-cache/v`
* tokenizer
  - add a fuzz test to check that several properties are upheld
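To make the naming scheme concrete, here is a sketch of turning the list of instrumented PCs into a 64-bit coverage ID and a file name under `.zig-cache/v`. The text does not say which hash function is used, so 64-bit FNV-1a is a stand-in, and the helper names `coverage_id` and `coverage_file` are hypothetical.

```python
import struct

# 64-bit FNV-1a parameters. FNV-1a is only a stand-in: the commit says the
# coverage ID is a 64-bit hash of the PCs, not which hash function is used.
FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def coverage_id(pcs) -> int:
    """Hash the sequence of instrumented PC addresses into a 64-bit coverage ID."""
    h = FNV_OFFSET_BASIS
    for pc in pcs:
        # Feed each PC as 8 little-endian bytes (assumes 64-bit addresses).
        for byte in struct.pack("<Q", pc):
            h ^= byte
            h = (h * FNV_PRIME) & MASK64
    return h


def coverage_file(pcs, cache_dir: str = ".zig-cache") -> str:
    """Coverage file path: the ID as 16 lowercase hex digits inside the 'v' directory."""
    return f"{cache_dir}/v/{coverage_id(pcs):016x}"
```

Because the ID depends only on the PCs, two processes that instrument the same executable compute the same file name, which is what lets the build runner find the fuzzer's coverage file from the ID alone.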
with debug info resolved.

Begin efforts to provide `std.debug.Info`, a cross-platform abstraction for loading debug information into an in-memory format that supports queries such as "what is the source location of this virtual memory address?" Unlike `std.debug.SelfInfo`, this API does not assume the debug information in question happens to match the host CPU architecture, OS, or other target properties.
* std.debug.Dwarf: add `sortCompileUnits` along with a field to track the state for the purpose of assertions and correct API usage. This makes batch lookups faster.
  - in the future, findCompileUnit should be enhanced to rely on sorted compile units as well.
* implement `std.debug.Dwarf.resolveSourceLocations` as well as `std.debug.Info.resolveSourceLocations`. It's still pretty slow, since it calls getLineNumberInfo for each array element, repeating a lot of work unnecessarily.
* integrate these APIs with `std.Progress` to understand what is taking so long.

The output I'm seeing from this tool shows a lot of missing source locations. In particular, the main area of interest is missing for my tokenizer fuzzing example.
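Why sorting compile units helps batch lookups can be shown with a small sketch: once units are sorted by start address, each query is a binary search instead of a linear scan. This assumes each compile unit covers one contiguous, non-overlapping address range (real DWARF units can have several ranges), and `CompileUnit` and `resolve_batch` are hypothetical names, not the `std.debug.Dwarf` API.

```python
import bisect
from dataclasses import dataclass


@dataclass
class CompileUnit:
    low_pc: int   # first address covered by this unit
    high_pc: int  # one past the last covered address
    name: str


def resolve_batch(units, addresses):
    """Map each address to the name of the unit containing it, or None."""
    # Sort once (the analogue of sortCompileUnits), then reuse for every query.
    units = sorted(units, key=lambda u: u.low_pc)
    starts = [u.low_pc for u in units]
    result = []
    for addr in addresses:
        # Index of the last unit starting at or before addr.
        i = bisect.bisect_right(starts, addr) - 1
        if i >= 0 and addr < units[i].high_pc:
            result.append(units[i].name)
        else:
            result.append(None)
    return result
```

The sort cost is paid once per batch, so resolving many PCs costs O(log n) per address rather than O(n).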
it does not need to be deprecated
yields a 60x speedup for resolveSourceLocations in debug builds
it's too fast to need it now
* libfuzzer: close file after mmap
* fuzzer/main.js: connect with EventSource and debug dump the messages. currently this prints how many fuzzer runs have been attempted to console.log.
* extract some `std.debug.Info` logic into `std.debug.Coverage`. Prepares for consolidation across multiple different executables which share source files, and makes it possible to send all the PC/SourceLocation mapping data with 4 memcpy'd arrays.
* std.Build.Fuzz:
  - spawn a thread to watch the message queue and signal event subscribers.
  - track coverage map data
  - respond to /events URL with EventSource messages on a timer
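An EventSource client reads a `text/event-stream` response in which each message is a set of `field: value` lines ended by a blank line. A sketch of encoding one such message follows; the event name and payload used in the usage check are hypothetical, since the commit only says the server sends messages on a timer (the PR description later states that the final version uses WebSocket instead).

```python
from typing import Optional


def sse_event(data: str, event: Optional[str] = None) -> bytes:
    """Encode one Server-Sent Events message, the wire format EventSource consumes."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload gets its own "data:" field; the browser
    # rejoins them with newlines before delivering the message.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the message.
    return ("\n".join(lines) + "\n\n").encode("utf-8")
```

A server would send the response header `Content-Type: text/event-stream` once and then write one such message per timer tick, for example a hypothetical `sse_event(f"runs: {n}")`.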
prevents unnecessary compilation errors on wasm32-freestanding
helps the serialization use case
* libfuzzer: track unique runs instead of deduplicated runs
  - easier for consumers to notice when to recheck the covered bits.
* move common definitions to `std.Build.Fuzz.abi`.

The build runner sends the fuzzer web interface client all the information it needs in order to display inline coverage information along with source code.
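The commit only says that counting unique runs makes it easier for consumers to notice when to recheck the covered bits. The sketch below reads that as "rescan the bits only when the unique-run counter has moved". The header layout (two little-endian u64 counters followed by the bits) and the `CoverageWatcher` name are assumptions for illustration, not the actual `std.Build.Fuzz.abi` definitions.

```python
import struct

# Hypothetical header at the start of the shared coverage file:
# two little-endian u64 counters, followed by the coverage bits.
HEADER = struct.Struct("<QQ")  # (n_runs, unique_runs)


class CoverageWatcher:
    """Consumer that rescans coverage bits only when unique_runs changes."""

    def __init__(self) -> None:
        self.last_unique_runs = None

    def poll(self, buf: bytes):
        """Return the number of covered bits if they may have changed, else None."""
        _n_runs, unique_runs = HEADER.unpack_from(buf, 0)
        if unique_runs == self.last_unique_runs:
            return None  # no new unique run since the last poll: skip the rescan
        self.last_unique_runs = unique_runs
        bits = buf[HEADER.size:]
        return sum(bin(b).count("1") for b in bits)
```

Comparing a single integer per poll is cheap, while the full bit scan only happens when new coverage may exist.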
so you can have somewhere to start browsing
This will help scroll the point of interest into view
because the wasm code needs to string match against debug information
this fix bypasses the slice bounds, reading garbage data for up to the last 7 bits (which are technically supposed to be ignored). That's going to need to be fixed; let's fix it along with switching from byte elems to usize elems.
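The "last 7 bits" problem arises when the number of PCs is not a multiple of 8: the final byte has padding bits that must be ignored. A sketch of counting covered PCs while masking that padding follows, assuming byte elements and least-significant-bit-first ordering; `count_covered` is a hypothetical helper, not code from this PR.

```python
def count_covered(bits: bytes, n_pcs: int) -> int:
    """Count set bits among the first n_pcs bits, ignoring padding in the last byte."""
    full, rem = divmod(n_pcs, 8)
    # Whole bytes contain only real PCs.
    total = sum(bin(b).count("1") for b in bits[:full])
    if rem:
        # Keep only the low `rem` bits of the partial byte (LSB-first assumed);
        # the remaining 8 - rem bits (at most 7) are padding.
        mask = (1 << rem) - 1
        total += bin(bits[full] & mask).count("1")
    return total
```

With usize elements the same idea applies, except the partial element can hold up to 63 padding bits on a 64-bit target.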
and make it still test compilation on non-Windows
This was referenced Aug 7, 2024.
gwenzek added a commit to zml/docs that referenced this pull request on Sep 13, 2024.
Makes it so that when you pass `--fuzz` to `zig build`, the build runner starts listening on a port (configurable with `--port`) and serves a fuzzing web interface.

The web interface shows source code, sharing Zig code with Autodocs, and displays points of interest inline as red or green dots that indicate code coverage. These points correspond to the PC addresses that LLVM selects with the sancov pass, which inserts an inline 8-bit counter at every control flow graph edge.
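A sketch of how per-PC data could become red and green dots: each PC maps to a source location, and a nonzero 8-bit counter marks it as covered. One dot per PC and "nonzero means green" are assumptions for illustration; `SourceLocation` and `coverage_dots` are hypothetical names, and the real interface does this in Zig compiled to wasm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    file: str
    line: int
    column: int


def coverage_dots(pc_locations, counters):
    """Group dots by file: (line, column, color), sorted by position.

    pc_locations[i] is the source location of PC i; counters[i] is its
    8-bit hit counter (nonzero means the edge was reached at least once).
    """
    if len(pc_locations) != len(counters):
        raise ValueError("expected exactly one counter per PC")
    dots = {}
    for loc, count in zip(pc_locations, counters):
        color = "green" if count else "red"
        dots.setdefault(loc.file, []).append((loc.line, loc.column, color))
    for entries in dots.values():
        entries.sort()
    return dots
```

Keeping the PC-to-location table separate from the counters means only the small counter array has to be refreshed as the fuzzer runs.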
The data is live-updating via the WebSocket standard.
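For reference, the server-to-client framing defined by RFC 6455 is small enough to sketch. The function below builds one unfragmented, unmasked frame (servers must not mask frames they send). The opening handshake is not shown, and what the build runner actually puts in each payload is not described here, so `ws_frame` is only an illustration of the wire format.

```python
import struct


def ws_frame(payload: bytes, opcode: int = 0x2) -> bytes:
    """Build one unfragmented, unmasked server-to-client WebSocket frame (RFC 6455)."""
    # First byte: FIN bit set, then the opcode (0x1 = text, 0x2 = binary).
    header = bytearray([0x80 | opcode])
    n = len(payload)
    if n < 126:
        header.append(n)                 # 7-bit length, mask bit clear
    elif n < (1 << 16):
        header.append(126)               # 16-bit extended length follows
        header += struct.pack(">H", n)
    else:
        header.append(127)               # 64-bit extended length follows
        header += struct.pack(">Q", n)
    return bytes(header) + payload
```

Binary frames suit coverage data well, since arrays of bits or counters can be sent as raw bytes without text encoding.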
closes #20812
#352 remains open because the system is not yet integrated with unit testing or with combining other kinds of code coverage. However, this is obviously a step towards completing that issue.
However, after this PR is merged, I think it will be quite a fun and rewarding area to contribute to!
Status
Just to be clear, while there is a fun, flashy demo here, I would describe the fuzzing capabilities of Zig as still alpha quality. That is, feel free to give it a try if you want to get involved in fuzzing development. However, deriving any real value out of fuzzing isn't possible yet, because some key enhancements remain to be done.
There are quite a few follow-up tasks left; they are listed in the dedicated section below.
Demo
Demo video: 2024-08-05.18-11-01.mp4
Follow-Up Work
- `-OReleaseSafe` breaks fuzzing entry points feature; incorrect already-sorted assumption #20990

As a reminder, the main motivation for tackling fuzzing right now is that fuzzing is an essential tool for ensuring that incremental compilation is robust and can therefore be enabled by default. The yak stack looks like this: