Welcome to the September 2024 report from the Reproducible Builds project!
Our reports attempt to outline what we’ve been up to over the past month, highlighting news items from elsewhere in tech where they are related. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.
Table of contents:
- New binsider tool to analyse ELF binaries
- Unreproducibility of GHC Haskell compiler “95% fixed”
- Mailing list summary
- Towards a 100% bit-for-bit reproducible OS…
- Two new reproducibility-related academic papers
- Distribution work
- diffoscope
- Other software development
- Android toolchain core count issue reported
- New Gradle plugin for reproducibility
- Website updates
- Upstream patches
- Reproducibility testing framework
New binsider tool to analyse ELF binaries
Reproducible Builds developer Orhun Parmaksız has announced a fantastic new tool to analyse the contents of ELF binaries. According to the project’s README page:
Binsider can perform static and dynamic analysis, inspect strings, examine linked libraries, and perform hexdumps, all within a user-friendly terminal user interface!
More information about Binsider’s features and how it works can be found within Binsider’s documentation pages.
Unreproducibility of GHC Haskell compiler “95% fixed”
A seven-year-old bug about the nondeterminism of object code generated by the Glasgow Haskell Compiler (GHC) received a recent update, with Rodrigo Mesquita noting that the issue is:
95% fixed by [merge request] !12680 when -fobject-determinism is enabled. […]
The linked merge request has since been merged, and Rodrigo goes on to say that:
After that patch is merged, there are some rarer bugs in both interface file determinism (eg. #25170) and in object determinism (eg. #25269) that need to be taken care of, but the great majority of the work needed to get there should have been merged already. When merged, I think we should close this one in favour of the more specific determinism issues like the two linked above.
Mailing list summary
On our mailing list this month:
- Fay Stegerman let everyone know that she started a thread on the Fediverse about the problems caused by unreproducible zlib/deflate compression in .zip and .apk files, and later followed up with the results of her subsequent investigation.
- Long-time developer kpcyrd wrote that “there has been a recent public discussion on the Arch Linux GitLab [instance] about the challenges and possible opportunities for making the Linux kernel package reproducible”, all relating to the CONFIG_MODULE_SIG flag. […]
- Bernhard M. Wiedemann followed up on an in-person conversation at our recent Hamburg 2024 summit about the potential presence of Reproducible Builds in recognised standards. […]
- Fay Stegerman also wrote about her worry about the “possible repercussions for RB tooling of Debian migrating from zlib to zlib-ng”, as reproducibility requires identical compressed data streams. […]
- Martin Monperrus wrote to the list announcing the latest release of maven-lockfile, which is designed to aid “building Maven projects with integrity”. […]
- Lastly, Bernhard M. Wiedemann wrote about the potential role of reproducible builds in combatting silent data corruption, as detailed in a recent Tweet and scholarly paper on faulty CPU cores. […]
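Both of Fay’s threads come down to one fact. A .zip or .apk can only be bit-for-bit identical if the compressor emits exactly the same deflate stream. The following is a minimal sketch using only Python’s standard library. The helper names make_zip, same_bytes and same_contents are our own hypothetical ones, and the archive layout is purely illustrative. The sketch compares two archives twice: once on their raw bytes, and once on their entry names and decompressed contents. If the second check passes but the first fails, the difference lies only in the compressed representation. That is the kind of mismatch a different deflate implementation could introduce.

```python
import hashlib
import io
import zipfile


def make_zip(payload: bytes, level: int) -> bytes:
    """Build a one-entry archive with a fixed timestamp and the given deflate level."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("data.txt", date_time=(1980, 1, 1, 0, 0, 0))
        info.compress_type = zipfile.ZIP_DEFLATED
        zf.writestr(info, payload, compresslevel=level)
    return buf.getvalue()


def same_bytes(a: bytes, b: bytes) -> bool:
    """Bit-for-bit comparison, which is what a reproducibility check ultimately needs."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


def same_contents(a: bytes, b: bytes) -> bool:
    """Compare entry names and decompressed data, ignoring how the data was compressed."""
    with zipfile.ZipFile(io.BytesIO(a)) as za, zipfile.ZipFile(io.BytesIO(b)) as zb:
        if za.namelist() != zb.namelist():
            return False
        return all(za.read(name) == zb.read(name) for name in za.namelist())


payload = b"reproducible builds " * 1000
print(same_bytes(make_zip(payload, 9), make_zip(payload, 9)))     # True
print(same_contents(make_zip(payload, 1), make_zip(payload, 9)))  # True
```

The deflate format allows many valid encodings of the same data. Archives built at different compression levels, or with a different zlib implementation, may or may not be byte-identical, even when same_contents reports a match.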
Towards a 100% bit-for-bit reproducible OS…
Bernhard M. Wiedemann began writing about his journey towards a 100% bit-for-bit reproducible operating system on the openSUSE wiki:
This is a report of Part 1 of my journey: building 100% bit-reproducible packages for every package that makes up [openSUSE’s] minimalVM image. This target was chosen as the smallest useful result/artifact. The larger package-sets get, the more disk-space and build-power is required to build/verify all of them.
This work was sponsored by NLnet’s NGI Zero fund.
Two new reproducibility-related academic papers
Marvin Strangfeld published his bachelor’s thesis, “Reproducibility of Computational Environments for Software Development”, at RWTH Aachen University. The author gives a more precise theoretical definition of computational environments than previous definitions, and this definition can be applied to describe real-world computational environments. Marvin also provides a definition of reproducibility in computational environments, enabling discussions about the extent to which an environment can be made reproducible. The thesis is available to browse or download in PDF format.
In addition, Shenyu Zheng, Bram Adams and Ahmed E. Hassan of Queen’s University, ON, Canada have published an article on “hermeticity” in Bazel-based build systems:
A hermetic build system manages its own build dependencies, isolated from the host file system, thereby securing the build process. Although, in recent years, new artifact-based build technologies like Bazel offer build hermeticity as a core functionality, no empirical study has evaluated how effectively these new build technologies achieve build hermeticity. This paper studies 2,439 non-hermetic build dependency packages of 70 Bazel-using open-source projects by analyzing 150 million Linux system file calls collected in their build processes. We found that none of the studied projects has a completely hermetic build process, largely due to the use of non-hermetic top-level toolchains. […]
Distribution work
In Debian this month, 14 reviews of Debian packages were added, 12 were updated and 20 were removed, all adding to our knowledge about identified issues. A number of issue types were updated as well. […][…]
In addition, Holger opened 4 bugs against the debrebuild component of the devscripts suite of tools. In particular:
- #1081047: Fails to download .dsc file.
- #1081048: Does not work with a proxy.
- #1081050: Fails to create a debrebuild.tar.
- #1081839: Fails with E: mmdebstrap failed to run error.
Last month, an issue was filed to update the Salsa CI pipeline (used by thousands of Debian packages) so that it no longer tests for reproducibility with reprotest’s build_path variation. Holger Levsen provided a rationale for this change in the issue. The same change has already been made to the tests performed by tests.reproducible-builds.org. This month, Santiago R. R. closed the issue with a clear explanation that build path variation is no longer the default, and of how developers can re-enable it if they want to.
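The following hypothetical Python sketch illustrates what the build_path variation detects. The build_artifact “compiler” and all paths are invented for illustration and do not reflect how reprotest or any real compiler works. An artifact that embeds its build directory changes whenever that directory changes. Remapping the directory prefix to a fixed value before it is recorded makes the output independent of where the build ran. This is similar in spirit to GCC’s -ffile-prefix-map=OLD=NEW option.

```python
from __future__ import annotations

import hashlib


def build_artifact(
    source: str, build_dir: str, prefix_map: dict[str, str] | None = None
) -> bytes:
    """Hypothetical 'compiler' that records the path of the source file in its output."""
    path = f"{build_dir}/src/main.c"
    # Rewrite the first matching prefix, similar in spirit to -ffile-prefix-map=OLD=NEW.
    for old, new in (prefix_map or {}).items():
        if path.startswith(old):
            path = new + path[len(old):]
            break
    return f"debug-path={path}\n{source}".encode()


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


src = "int main(void) { return 0; }"

# Same source, two build directories, no remapping: the embedded path differs.
a = build_artifact(src, "/build/pkg-1")
b = build_artifact(src, "/tmp/another-dir")
print(digest(a) == digest(b))  # False

# With a prefix map, both builds record /usr/src/pkg/src/main.c.
a = build_artifact(src, "/build/pkg-1", {"/build/pkg-1": "/usr/src/pkg"})
b = build_artifact(src, "/tmp/another-dir", {"/tmp/another-dir": "/usr/src/pkg"})
print(digest(a) == digest(b))  # True
```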
In openSUSE news, Bernhard M. Wiedemann published another report for that distribution.
diffoscope
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading version 278 to Debian:
- New features:
  - Add a helpful contextual message to the output if comparing Debian .orig tarballs within .dsc files without the ability to “fuzzy-match” away the leading directory. […]
- Bug fixes:
- Misc:
For trydiffoscope, the command-line client for the web-based version of diffoscope, Chris Lamb also:
- Added an explicit python3-setuptools dependency. (#1080825)
- Bumped the Standards-Version to 4.7.0. […]
Other software development
disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into system calls to reliably flush out reproducibility issues. This month, version 0.5.11-4 was uploaded to Debian unstable by Holger Levsen, making the following changes:
- Replace build-dependency on the obsolete pkg-config package with one on pkgconf, following a Lintian check. […]
- Bump Standards-Version field to 4.7.0, with no related changes needed. […]
In addition, reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, version 0.7.28 was uploaded to Debian unstable by Holger Levsen, including a change by Jelle van der Waa to move away from the pipes Python module to shlex, as the former will be removed in Python version 3.13 […].
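Since Python 3.3, pipes.quote has simply been shlex.quote re-exported. Code that used pipes only for shell quoting can therefore usually switch modules directly. The sketch below shows the general pattern. It is not a copy of the actual reprotest change, and build_command is a hypothetical helper.

```python
from __future__ import annotations

import shlex

# Before (the pipes module is deprecated since Python 3.11 and removed in 3.13):
#     import pipes
#     cmd = "ls -l " + pipes.quote(path)
# After: shlex.quote is a drop-in replacement.


def build_command(argv: list[str]) -> str:
    """Join an argument vector into one shell-safe command string."""
    return " ".join(shlex.quote(arg) for arg in argv)


argv = ["echo", "build path with spaces", "it's"]
cmd = build_command(argv)
print(cmd)                       # echo 'build path with spaces' 'it'"'"'s'
print(shlex.split(cmd) == argv)  # True
```

On Python 3.8 and later, shlex.join(argv) performs the same joining in one call.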
Android toolchain core count issue reported
Fay Stegerman reported an issue with the Android toolchain where a part of the build system generates a different classes.dex file (and thus a different .apk) depending on the number of cores available during the build, thereby breaking Reproducible Builds:
We’ve rebuilt [tag v3.6.1] multiple times (each time in a fresh container): with 2, 4, 6, 8, and 16 cores available, respectively:
- With 2 and 4 cores we always get an unsigned APK with SHA-256 14763d682c9286ef….
- With 6, 8, and 16 cores we get an unsigned APK with SHA-256 35324ba4c492760… instead.
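The report does not say which part of the toolchain is responsible. The following is therefore only a generic, hypothetical illustration, not a model of D8, R8 or any other Android tool. Suppose work is split into one shard per available core and the results are concatenated in shard order. Then the output order, and hence its hash, depends on the core count. Merging the results in a canonical (here, sorted) order removes that dependency. The class names and the round-robin sharding are assumptions made for the example.

```python
from __future__ import annotations

import hashlib


def shard(items: list[str], workers: int) -> list[list[str]]:
    """Distribute items round-robin across one shard per worker."""
    shards: list[list[str]] = [[] for _ in range(workers)]
    for i, item in enumerate(items):
        shards[i % workers].append(item)
    return shards


def naive_merge(shards: list[list[str]]) -> list[str]:
    """Concatenate shard outputs in shard order, so the result depends on the shard count."""
    return [item for s in shards for item in s]


def canonical_merge(shards: list[list[str]]) -> list[str]:
    """Sort the merged output, so the result no longer depends on how work was split."""
    return sorted(item for s in shards for item in s)


def digest(entries: list[str]) -> str:
    return hashlib.sha256("\n".join(entries).encode()).hexdigest()[:16]


classes = ["A", "B", "C", "D", "E", "F", "G"]  # hypothetical class names
for cores in (2, 4, 6, 8, 16):
    s = shard(classes, cores)
    print(cores, digest(naive_merge(s)), digest(canonical_merge(s)))
```

The script prints one line per core count. The naive digest changes across core counts, while the canonical one is the same on every line.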
New Gradle plugin for reproducibility
A new plugin for the Gradle build tool for Java has been released. This easily-enabled plugin results in:
reproducibility settings [being] applied to some of Gradle’s built-in tasks that should really be the default. Compatible with Java 8 and Gradle 8.3 or later.
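Gradle’s archive tasks already expose settings such as preserveFileTimestamps and reproducibleFileOrder. We have not checked exactly which settings this plugin changes. To keep all examples in one language, the Python sketch below shows what those two kinds of setting achieve for a .zip or .jar-style archive. Entries are written in sorted order with a fixed timestamp and fixed permissions. As a result, two builds that collect the same inputs in a different order produce identical bytes. The file names and contents are hypothetical.

```python
from __future__ import annotations

import io
import zipfile

# Earliest timestamp the ZIP format can represent; used for every entry.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def reproducible_zip(files: dict[str, bytes]) -> bytes:
    """Write entries in sorted name order with a fixed timestamp and fixed permissions."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # cf. Gradle's reproducibleFileOrder
            # Fixed date regardless of input mtimes (cf. preserveFileTimestamps = false).
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # rw-r--r-- regardless of the input file
            zf.writestr(info, files[name])
    return buf.getvalue()


# Same hypothetical inputs, collected in a different order by two builds.
build1 = {"b/B.class": b"class B", "a/A.class": b"class A"}
build2 = {"a/A.class": b"class A", "b/B.class": b"class B"}
print(reproducible_zip(build1) == reproducible_zip(build2))  # True
```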
Website updates
There were a rather substantial number of improvements made to our website this month, including:
- Chris Lamb:
  - Attempt to use GitLab CI to ‘artifact’ the website; hopefully useful for testing branches. […]
  - Correct the linting rule whilst building the website. […]
  - Make a number of small changes to Kees’ post written by Vagrant. […][…][…]
  - Add the Civil Infrastructure Platform to the Projects page. […]
  - Miscellaneous administration of misfiled images. […][…]
- Evangelos Tzaras made a huge number of changes related to the recent Hamburg 2024 summit […][…][…][…][…] and proposed an infographic about which question Reproducible Builds is trying to answer.
- Holger Levsen added his two presentations (Reproducible Builds: The First Eleven Years and Preserving *other* build artifacts) to the website. […]
- Jelle van der Waa completely modernised the System Images documentation, noting that “a lot has changed since 2017(!); ext4, erofs and FAT filesystems can now be made reproducible”. […]
- Developer RyanSquared replaced the continuous integration test link for Arch Linux on our Projects page with an external instance […][…] and updated the documentation to reflect the dependencies required to build the website […].
- Vagrant Cascadian pushed a lengthy interview with Linux developer Kees Cook. […][…][…][…]
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - agama-integration-tests (random)
  - contrast (FTBFS-nocheck)
  - cpython (FTBFS-2038)
  - crash (parallelism, race)
  - ghostscript (toolchain date)
  - glycin-loaders (FTBFS-j1)
  - gstreamer-plugins-rs (date, other)
  - kernel-doc/Sphinx (toolchain bug, parallelism/race)
  - kernel (parallelism in BTF)
  - libcamera (random key)
  - libgtop (uname -r)
  - libsamplerate (random temporary directory)
  - lua-luarepl (FTBFS)
  - meson (toolchain)
  - netty (modification time in .a)
  - nvidia-persistenced (date)
  - nvidia-xconfig (date-related issue)
  - obs-build (build-tooling corruption)
  - perl (Perl records kernel version)
  - pinentry (make efl droppable)
  - python-PyGithub (FTBFS 2024-11-25)
  - python-Sphinx (parallelism/race)
  - python-chroma-hnswlib (CPU)
  - python-libcst
  - python-pygraphviz (random timing)
  - python312 (.pyc embeds modification time)
  - python312 (drop .pyc from documentation time)
  - scap-security-guide (date)
  - seahorse (parallelism)
  - subversion (minor Java .jar modification times)
  - xen/acpica (date-related issue in toolchain)
  - xmvn (random)
- Fridrich Strba:
- Chris Lamb:
  - #1082702 filed against magic-wormhole-transit-relay.
  - #1082706 filed against python-sphobjinv.
  - #1082707 filed against lomiri-content-hub.
  - #1082796 filed against python-mt-940.
  - #1082806 filed against tree-puzzle.
  - #1083053 filed against muon-meson.
- James Addison:
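Two of the python312 entries concern .pyc files, which by default record the source file’s modification time in their header. The following is a minimal sketch using only the standard library. The helper compile_with_mtime is our own, and we have not checked whether the upstream patches use this mechanism. The sketch compiles the same source with two different modification times. The timestamp-based .pyc files differ, while the hash-based ones introduced by PEP 552 in Python 3.7 do not.

```python
import os
import py_compile
import tempfile
from pathlib import Path

Mode = py_compile.PycInvalidationMode


def compile_with_mtime(src: Path, mtime: int, mode: Mode) -> bytes:
    """Set the source file's mtime, byte-compile it, and return the .pyc bytes."""
    os.utime(src, (mtime, mtime))
    cfile = src.with_suffix(".pyc")
    py_compile.compile(str(src), cfile=str(cfile), doraise=True, invalidation_mode=mode)
    return cfile.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "example.py"
    src.write_text("print('hello, reproducible world')\n")

    # Timestamp-based .pyc: header bytes 8-12 store the source mtime.
    ts_a = compile_with_mtime(src, 1_000_000_000, Mode.TIMESTAMP)
    ts_b = compile_with_mtime(src, 1_700_000_000, Mode.TIMESTAMP)

    # Hash-based .pyc (PEP 552): the header stores a hash of the source instead.
    h_a = compile_with_mtime(src, 1_000_000_000, Mode.CHECKED_HASH)
    h_b = compile_with_mtime(src, 1_700_000_000, Mode.CHECKED_HASH)

print(ts_a == ts_b)  # False
print(h_a == h_b)    # True
```

py_compile uses CHECKED_HASH as its default invalidation mode when the SOURCE_DATE_EPOCH environment variable is set.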
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In September, a number of changes were made by Holger Levsen, including:
- Debian-related changes:
  - Upgrade the osuosl4 node to Debian trixie in anticipation of running debrebuild and rebuilderd there. […][…][…]
  - Temporarily mark the osuosl4 node as offline due to ongoing xfs_repair filesystem maintenance. […][…]
  - Do not warn about (very old) broken nodes. […]
  - Add the riscv64 architecture to the multiarch version skew tests for Debian trixie and sid. […][…][…]
  - Mark the virt{32,64}b nodes as down. […]
- Misc changes:
In addition, Vagrant Cascadian recorded a disk failure for the virt32b and virt64b nodes […], performed some maintenance of the cbxi4a node […][…] and marked most armhf architecture systems as being back online.
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Mastodon: @[email protected]
- Mailing list: [email protected]
- Twitter: @ReproBuilds