Fuzzing WebURL

Fuzzing is a way of testing an API with generated data, and can help discover bugs in scenarios that developers wouldn't think to test. For more information on fuzzing, see google/fuzzing: why fuzz?.

WebURL uses Swift's built-in support for LLVM's libFuzzer to run these kinds of tests against its API. Note that the version of Swift included with Apple's SDK does not include this support; to fuzz on an Apple platform, you must first install a toolchain from swift.org.

Currently, the following fuzzers are available:

  • url-parse-reparse

    Parses some bytes generated by the fuzzer (regardless of whether they are valid UTF-8), and if successful, re-parses the URL's serialized representation and checks that the result is exactly the same. This ensures that the parser behaves itself (it won't crash or attempt an out-of-bounds access, even if you literally give it random bytes to parse), and validates one of the goals of the URL Living Standard: that a successfully parsed URL will not change no matter how many times it is serialized to, or parsed from, a string.

    To build and run this fuzzer, use the fuzz-parser script:

    ./fuzz-parser.sh -max_len=20 -dict=url.dict
    
  • foundation-to-web

    Parses some bytes generated by the fuzzer as a Swift String, then as a Foundation URL. If successful, the URL is converted to a WebURL and the results are checked for semantic equivalence. This has been incredibly helpful in developing Foundation interoperability, and fuzz-testing Foundation against WebURL's implementation.

    To build and run this fuzzer, use the fuzz-ftow script:

    ./fuzz-ftow.sh -jobs=4 -max_len=16 -dict=url.dict Corpora/foundation-to-web
    
  • web-to-foundation

    Parses some bytes generated by the fuzzer as a WebURL, and if successful, attempts to convert it to a Foundation URL. If conversion is also successful, the results are checked for semantic equivalence.

    To build and run this fuzzer, use the fuzz-wtof script:

    ./fuzz-wtof.sh -jobs=4 -max_len=16 -dict=url.dict Corpora/web-to-foundation
    
  • web-foundation-roundtrip

    Parses some bytes generated by the fuzzer as a WebURL, and encodes it for Foundation using the .encodedForFoundation property. The result is then converted to a Foundation URL. If successful, the Foundation URL must round-trip back to a WebURL, and that WebURL must be identical to the URL returned by .encodedForFoundation.

    To build and run this fuzzer, use the fuzz-foundation-roundtrip script:

    ./fuzz-foundation-roundtrip.sh -jobs=4 -max_len=16 -dict=url.dict Corpora/foundation-roundtrip
    

Any arguments provided to the fuzz-*.sh scripts will be forwarded to the fuzzer executable. Use the -help=1 argument for more information about the supported arguments.

Eventually, the goal is to also add fuzzers for the resolve function (parsing a relative URL against a base URL), and as many setters and mutating APIs as is practical. Ideally, the fuzzers would run continuously as the code changes, to detect any bugs that may be introduced. In practice, each one is given ~12 hours of exercise every 4-6 weeks, or whenever particularly large code changes are made. That's good for a couple hundred million iterations, and if there are issues, the fuzzers tend to find them fairly quickly.
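The periodic ~12 hour runs described above could be planned with a small script. The following is only a dry-run sketch: it prints one command per fuzzer rather than running anything. It assumes the scripts forward `-max_total_time` (a standard libFuzzer option, in seconds) like any other argument; `FUZZ_SECONDS` and `print_fuzz_plan` are hypothetical names, and the script names, `-max_len` values, and corpus directories are the ones used elsewhere in this document.

```shell
#!/bin/sh
# Dry-run sketch: prints one command per fuzzer, each capped at 12 hours.
# FUZZ_SECONDS and print_fuzz_plan are hypothetical names, not part of the repo.
FUZZ_SECONDS=$((12 * 60 * 60))

print_fuzz_plan() {
  # Each input line is: <script> <max_len> <corpus directory>
  while read -r script max_len corpus; do
    printf './%s -max_total_time=%s -max_len=%s -dict=url.dict %s\n' \
      "$script" "$FUZZ_SECONDS" "$max_len" "$corpus"
  done <<EOF
fuzz-parser.sh 20 Corpora/parse-reparse
fuzz-ftow.sh 16 Corpora/foundation-to-web
fuzz-wtof.sh 16 Corpora/web-to-foundation
fuzz-foundation-roundtrip.sh 16 Corpora/foundation-roundtrip
EOF
}

print_fuzz_plan
```

Piping the printed plan to `sh` would run the fuzzers one after another; reviewing the output first is a cheap way to check the arguments before committing to a two-day run.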

Corpora

Fuzzing is greatly helped by the presence of a corpus - a collection of API inputs which exercise as many unique code paths as possible. The fuzzer is able to discover these inputs by itself and store them in a directory you specify. Storing the corpus allows you to stop and restart the fuzzer while keeping the coverage discovered by previous runs.

To specify where the fuzzer should store its corpus, supply a directory path as an argument. If the directory already contains a corpus, fuzzing will be resumed using that data. The following command generates a corpus at "Corpora/parse-reparse":

./fuzz-parser.sh -max_len=20 -dict=url.dict Corpora/parse-reparse

(Note: the contents of the "Corpora" directory are ignored by git, so you can safely store corpora there without worrying that you might commit them.)
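Because the same directory argument either starts a new corpus or resumes an existing one, it can be handy to check which of the two will happen before launching a long run. A minimal sketch, assuming a corpus is simply a directory of input files; `corpus_status` is a hypothetical helper, not something the repository provides:

```shell
#!/bin/sh
# corpus_status (hypothetical helper): creates the corpus directory if it is
# missing and reports how many input files it already holds.
corpus_status() {
  dir=$1
  mkdir -p "$dir"
  # Count regular files; tr strips the padding some wc implementations add.
  count=$(find "$dir" -type f | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "$dir: empty, fuzzing starts from scratch"
  else
    echo "$dir: $count input file(s), fuzzing resumes from them"
  fi
}
```

For example, `corpus_status Corpora/parse-reparse` could be run just before the fuzz-parser command above.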

However, it is usually better not to start from nothing; it will take the fuzzer a long time to discover a good set of inputs. Generating a seed corpus allows you to give the fuzzer a bit of a head-start, and greatly reduces the time required to achieve good coverage.

The "Seeding" directory contains scripts which you can use to generate a seed corpus. You can edit these scripts, add some URL strings (possibly copied from the benchmark or test suites, or anything else you think of), and then run them to create a directory of sample inputs for the fuzzer to use when building its own corpus. For example:

./Seeding/generate_corpus_parse-reparse.swift

mkdir -p Corpora/parse-reparse
./fuzz-parser.sh -max_len=20 -dict=url.dict Corpora/parse-reparse Seeding/parse-reparse

The first line invokes a Swift script, which generates a corpus from the URL strings contained in the script and stores the results in "Seeding/parse-reparse". We then supply that corpus to the fuzzer after its own corpus directory; it will analyse the seed inputs, and any interesting code paths it discovers will inform how it creates or adds to the corpus in the "Corpora/parse-reparse" directory.
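For a quick experiment without editing the Swift scripts, a seed directory can also be written by hand. The sketch below is not how the repository's seeding scripts work; it only relies on the fact that libFuzzer reads a corpus as a directory of files, one input per file. `write_seed_corpus` and the `seed-N` file names are hypothetical.

```shell
#!/bin/sh
# write_seed_corpus (hypothetical helper): writes each remaining argument to
# its own file in the given directory, one fuzzer input per file.
write_seed_corpus() {
  dir=$1
  shift
  mkdir -p "$dir"
  n=0
  for url in "$@"; do
    n=$((n + 1))
    # printf '%s' avoids a trailing newline, which would become part of the input.
    printf '%s' "$url" > "$dir/seed-$n"
  done
  echo "wrote $n seed inputs to $dir"
}
```

For example, `write_seed_corpus Seeding/my-seeds 'http://example.com/a?b#c' 'file:///usr/bin'` (where "Seeding/my-seeds" is a made-up directory name) produces a directory that could be passed to a fuzz script after its own corpus directory, in the same position as "Seeding/parse-reparse" above.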