Welcome to the June 2022 report from the Reproducible Builds project. In these reports, we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries.
Save the date!
Despite several delays, we are pleased to announce dates for our in-person summit this year:
The event will happen in/around Venice (Italy), and we intend to pick a venue reachable via the train station and an international airport. However, the precise venue will depend on the number of attendees.
Please see the announcement mail from Mattia Rizzolo, and do keep an eye on the mailing list for further announcements as it will hopefully include registration instructions.
News
David Wheeler filed an issue against the Rust programming language to report that builds are “not reproducible because full path to the source code is in the panic and debug strings”. Luckily, as one of the responses mentions: “the --remap-path-prefix
solves this problem and has been used to great effect in build systems that rely on reproducibility (Bazel, Nix) to work at all” and that “there are efforts to teach cargo about it here”.
The Python Security team announced that:
The
ctx
hosted project on PyPI was taken over via user account compromise and replaced with a malicious project which contained runtime code which collected the content ofos.environ.items()
when instantiating Ctx objects. The captured environment variables were sent as a base64 encoded query parameter to a Heroku application […]
As their announcement later goes onto state, version-pinning using “hash-checking mode” can prevent this attack, although this does depend on specific installations using this mode, rather than a prevention that can be applied systematically.
Developer vanitasvitae published an interesting and entertaining blog post detailing the blow-by-blow steps of debugging a reproducibility issue in PGPainless, a library which “aims to make using OpenPGP in Java projects as simple as possible”.
Whilst their in-depth research into the internals of the .jar
may have been unnecessary given that diffoscope would have identified the, it must be said that there is something to be said with occasionally delving into seemingly “low-level” details, as well describing any debugging process. Indeed, as vanitasvitae writes:
Yes, this would have spared me from 3h of debugging 😉 But I probably would also not have gone onto this little dive into the JAR/ZIP format, so in the end I’m not mad.
Kees Cook published a short and practical blog post detailing how he uses reproducibility properties to aid work to replace one-element arrays in the Linux kernel. Kees’ approach is based on the principle that if a (small) proposed change is considered equivalent by the compiler, then the generated output will be identical… but only if no other arbitrary or unrelated changes are introduced. Kees mentions the “fantastic” diffoscope tool, as well as various kernel-specific build options (eg. KBUILD_BUILD_TIMESTAMP
) in order to “prepare my build with the ‘known to disrupt code layout’ options disabled”.
Stefano Zacchiroli gave a presentation at GDR Sécurité Informatique based in part on a paper co-written with Chris Lamb titled Increasing the Integrity of Software Supply Chains. (Tweet)
Debian
In Debian in this month, 28 reviews of Debian packages were added, 35 were updated and 27 were removed this month adding to our knowledge about identified issues. Two issue types were added: nondeterministic_checksum_generated_by_coq
and nondetermistic_js_output_from_webpack
.
After Holger Levsen found hundreds of packages in the bookworm distribution that lack .buildinfo
files, he uploaded 404 source packages to the archive (with no meaningful source changes). Currently bookworm now shows only 8 packages without .buildinfo
files, and those 8 are fixed in unstable and should migrate shortly. By contrast, Debian unstable will always have packages without .buildinfo
files, as this is how they come through the NEW queue. However, as these packages were not built on the official build servers (ie. they were uploaded by the maintainer) they will never migrate to Debian testing. In the future, therefore, testing should never have packages without .buildinfo
files again.
Roland Clobus posted yet another in-depth status report about his progress making the Debian Live images build reproducibly to our mailing list. In this update, Roland mentions that “all major desktops build reproducibly with bullseye, bookworm and sid” but also goes on to outline the progress made with automated testing of the generated images using openQA.
GNU Guix
Vagrant Cascadian made a significant number of contributions to GNU Guix:
-
Submitted patches to fix reproducibility issues in keyutils and isl as well as reported two bugs affecting reproducibility testing […][…].
-
23 specific fixes related to reproducibility. [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]
-
Proposed setting
FORCE_SOURCE_DATE=1
in the environment of all builds in order to fix numerous timestamp issues in documentation generation tools. -
Identified reproducibility issues in the
maradns
package as it appears to embed a random prime number. (Patch) -
Responded in a thread to point out that GNU Guix already has the infrastructure in place to verify the reproducibility of downloaded substitutes for the vast majority of packages.
-
Lastly, Vagrant performed an evaluation of the unreproducible packages that remain in the distribution.
Elsewhere in GNU Guix, Ludovic Courtès published a paper in the journal The Art, Science, and Engineering of Programming called Building a Secure Software Supply Chain with GNU Guix:
This paper focuses on one research question: how can [Guix]((https://www.gnu.org/software/guix/) and similar systems allow users to securely update their software? […] Our main contribution is a model and tool to authenticate new Git revisions. We further show how, building on Git semantics, we build protections against downgrade attacks and related threats. We explain implementation choices. This work has been deployed in production two years ago, giving us insight on its actual use at scale every day. The Git checkout authentication at its core is applicable beyond the specific use case of Guix, and we think it could benefit to developer teams that use Git.
A full PDF of the text is available.
openSUSE
In the world of openSUSE, SUSE announced at SUSECon that they are preparing to meet SLSA level 4. (SLSA (Supply chain Levels for Software Artifacts) is a new industry-led standardisation effort that aims to protect the integrity of the software supply chain.)
However, at the time of writing, timestamps within RPM archives are not normalised, so bit-for-bit identical reproducible builds are not possible. Some in-toto provenance files published for SUSE’s SLE-15-SP4 as one result of the SLSA level 4 effort. Old binaries are not rebuilt, so only new builds (e.g. maintenance updates) have this metadata added.
Lastly, Bernhard M. Wiedemann posted his usual monthly openSUSE reproducible builds status report.
diffoscope
diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 215
, 216
and 217
to Debian unstable. Chris Lamb also made the following changes:
-
New features:
-
Bug fixes:
-
Output improvements:
- Don’t leak the (likely-temporary) pathname when comparing PDF documents. […]
-
Logging improvements:
- Update test fixtures for GNU readelf 2.38 (now in Debian unstable). […][…]
- Be more specific about the minimum required version of
readelf
(ie. binutils), as it appears that this ‘patch’ level version change resulted in a change of output, not the ‘minor’ version. […] - Use our
@skip_unless_tool_is_at_least
decorator (NB.at_least
) over@skip_if_tool_version_is
(NB.is
) to fix tests under Debian stable. […] - Emit a warning if/when we are handling a UNIX
TERM
signal. […]
-
Codebase improvements:
In addition, Edward Betts updated a broken link to the RSS on the diffoscope homepage and Vagrant Cascadian updated the diffoscope package in GNU Guix […][…][…].
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
-
Bernhard M. Wiedemann:
build-compare
caused a regression for a few days.python-fasttext
(CPU-related issue).
-
Chris Lamb:
- #1012614 filed against
node-dommatrix
. - #1012766 filed against
rtpengine
. - #1012790 filed against
sphinxcontrib-mermaid
. - #1012792 filed against
yaru-theme
. - #1012836 filed against
mapproxy
(forwarded upstream). - #1013257 filed against
libxsmm
. - #1014041 filed against
yt-dlp
(forwarded upstream). - #891263 was filed against puppet in February 2018 and the patch was finally proposed for inclusion upstream.
- #1012614 filed against
Testing framework
The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
-
Holger Levsen:
- Add a package set for packages that use the R programming language […] as well as one for Rust […].
- Improve package set matching for Python […] and font-related […] packages.
- Install the
lz4
,lzop
andxz-utils
packages on all nodes in order to detect running kernels. […] - Improve the cleanup mechanisms when testing the reproducibility of Debian Live images. […][…]
- In the automated node health checks, deprioritise the “generic kernel warning”. […]
-
Roland Clobus (Debian Live image reproducibility):
- Add various maintenance jobs to the Jenkins view. […]
- Cleanup old workspaces after 24 hours. […]
- Cleanup temporary workspace and resulting directories. […]
- Implement a number of fixes and improvements around publishing files. […][…][…]
- Don’t attempt to preserve the file timestamps when copying artifacts. […]
And finally, node maintenance was also performed by Mattia Rizzolo […].
Mailing list and website
On our mailing list this month:
-
David Wheeler started a thread stating his desire that reproducible builds and GitBOM are able to work together simultaneously. David first describes the goals of both GitBOM and reproducibility, outlines the potential problems and even outlines a number of prospective solutions.
-
In a similar vein, David Wheeler also posted about the problems with Profile-Guided Optimisation (PGO) in relation to reproducible builds.
-
Roland Clobus “copied in” our mailing list with a question about whether enabling link-time optimisations (LTO) in Debian as a whole might cause reproducibility problems.
-
Mattia Rizzolo posted a request for assistance regarding the translations of our website.
Lastly, Chris Lamb updated the main Reproducible Builds website and documentation in a number of small ways, but primarily published an interview with Hans-Christoph Steiner of the F-Droid project. Chris Lamb also added a Coffeescript example for parsing and using the SOURCE_DATE_EPOCH
environment variable […]. In addition, Sebastian Crane very-helpfully updated the screenshot of salsa.debian.org’s “request access” button on the How to join the Salsa group. […]
Contact
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
-
IRC:
#reproducible-builds
onirc.oftc.net
. -
Twitter: @ReproBuilds
-
Mailing list:
[email protected]