Welcome to the December 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. As a rather rapid recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries (more).
Reproducible Builds: Increasing the Integrity of Software Supply Chains awarded IEEE Software “Best Paper” award
In February 2022, we announced in these reports that a paper written by Chris Lamb and Stefano Zacchiroli, titled Reproducible Builds: Increasing the Integrity of Software Supply Chains (PDF), was now available in the March/April 2022 issue of IEEE Software.
This month, however, IEEE Software announced that this paper has won their Best Paper award for 2022.
Reproducibility to affect package migration policy in Debian
In a post summarising the activities of the Debian Release Team at a recent in-person Debian event in Cambridge, UK, Paul Gevers announced a change to the way packages are “migrated” into the staging area for the next stable Debian release based on their reproducibility status:
The folks from the Reproducibility Project have come a long way since they started working on it 10 years ago, and we believe it’s time for the next step in Debian. Several weeks ago, we enabled a migration policy in our migration software that checks for regression in reproducibility. At this moment, that is presented as just for info, but we intend to change that to delays in the not so distant future. We eventually want all packages to be reproducible. To stimulate maintainers to make their packages reproducible now, we’ll soon start to apply a bounty [speedup] for reproducible builds, like we’ve done with passing autopkgtests for years. We’ll reduce the bounty for successful autopkgtests at that moment in time.
Speranza: “Usable, privacy-friendly software signing”
Kelsey Merrill, Karen Sollins, Santiago Torres-Arias and Zachary Newman have developed a new system called Speranza, which is aimed at reassuring software consumers that the product they are getting has not been tampered with and is coming directly from a source they trust. A write-up on TechXplore.com goes into some more details:
“What we have done,” explains Sollins, “is to develop, prove correct, and demonstrate the viability of an approach that allows the [software] maintainers to remain anonymous.” Preserving anonymity is obviously important, given that almost everyone—software developers included—value their confidentiality. This new approach, Sollins adds, “simultaneously allows [software] users to have confidence that the maintainers are, in fact, legitimate maintainers and, furthermore, that the code being downloaded is, in fact, the correct code of that maintainer.” […]
The corresponding paper is published on the arXiv preprint server in various formats, and the announcement has also been covered in MIT News.
Nondeterministic Git bundles
Paul Baecher published an interesting blog post on Reproducible git bundles. For those who are not familiar with them, Git bundles are used for the “offline” transfer of Git objects without an active server sitting on the other side of a network connection. Anyway, Paul wrote about writing a backup system for his entire system, but:
I noticed that a small but fixed subset of [Git] repositories are getting backed up despite having no changes made. That is odd because I would think that repeated bundling of the same repository state should create the exact same bundle. However [it] turns out that for some repositories, bundling is nondeterministic.
Paul goes on to describe his solution: “forcing git to be single threaded makes the output deterministic”. The article was also discussed on Hacker News.
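As a rough illustration of the single-threaded approach, the sketch below wraps `git bundle create` in Python and sets Git's `pack.threads` option to 1 for that one invocation. The report does not quote the exact setting Paul used, so treating `pack.threads=1` as the fix is an assumption; the helper names `bundle_command` and `create_bundle` are hypothetical.

```python
import subprocess
from pathlib import Path


def bundle_command(output: Path) -> list[str]:
    """Build a `git bundle create` command that packs objects with one thread."""
    return [
        "git",
        # Assumption: limiting pack threads is what makes the output stable.
        "-c", "pack.threads=1",
        "bundle", "create", str(output),
        "--all",  # include every ref in the bundle
    ]


def create_bundle(repo: Path, output: Path) -> None:
    """Run the command inside `repo`; raises CalledProcessError on failure."""
    subprocess.run(bundle_command(output), cwd=repo, check=True)
```

Bundling the same repository state twice with `create_bundle` and comparing checksums of the two files would be one way to check whether the output is stable on a given machine.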
Output from libxslt now deterministic
libxslt is the XSLT C library developed for the GNOME project, where XSLT itself is an XML language to define transformations for XML files. This month, it was revealed that the result of the generate-id() XSLT function is now deterministic across multiple transformations, fixing many issues with reproducible builds. As the Git commit by Nick Wellnhofer describes:
Rework the generate-id() function to return deterministic values. We use
a simple incrementing counter and store ids in the "psvi" member of
nodes which was freed up by previous commits. The presence of an id is
indicated by a new "source node" flag.
This fixes long-standing problems with reproducible builds, see
https://bugzilla.gnome.org/show_bug.cgi?id=751621
This also hardens security, as the old implementation leaked the
difference between a heap and a global pointer, see
https://bugs.chromium.org/p/chromium/issues/detail?id=1356211
The old implementation could also generate the same id for dynamically
created nodes which happened to reuse the same memory. Ids for namespace
nodes were completely broken. They now use the id of the parent element
together with the hex-encoded namespace prefix.
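The idea in the commit message can be sketched in a few lines of Python. This is not libxslt's code: the `Node` and `IdGenerator` classes and the exact `id…ns…` string format are hypothetical. The sketch only shows why a counter stored on the node gives stable ids where memory addresses do not.

```python
class Node:
    """Stand-in for an XML node; `psvi` mimics the spare slot the commit mentions."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.psvi = None  # holds the assigned id once one has been generated


class IdGenerator:
    """Hands out ids from an incrementing counter instead of memory addresses."""

    def __init__(self):
        self._counter = 0

    def generate_id(self, node):
        # Assign an id on first request only; later calls reuse the stored value.
        if node.psvi is None:
            self._counter += 1
            node.psvi = self._counter
        return f"id{node.psvi}"

    def namespace_id(self, element, prefix):
        # Namespace nodes: parent element id plus the hex-encoded prefix.
        return f"{self.generate_id(element)}ns{prefix.encode('utf-8').hex()}"
```

Because the ids depend only on the order in which nodes are first asked for an id, two runs over the same input produce the same strings regardless of where the allocator happens to place the nodes.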
Community updates
A number of improvements were made to our website, including Chris Lamb fixing the generate-draft script to not blow up if the input files have been corrupted today or even in the past […], Holger Levsen updating the Hamburg 2023 summit page to add a link to the farewell post […] and to add a picture of a Post-It note […], and Pol Dellaiera updating the paragraph about tar and the --clamp-mtime flag […].
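For readers unfamiliar with the idea behind tar's --clamp-mtime, here is a minimal Python sketch of the same technique using the standard `tarfile` module: timestamps newer than a reference time are lowered to it, older ones are left alone, and entries are written in sorted order with fixed ownership. The `build_tar` helper and its in-memory input format are hypothetical and exist only for illustration.

```python
import io
import tarfile


def build_tar(files, source_date_epoch):
    """Build a tar archive from a dict of name -> (data bytes, mtime seconds)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # fixed ordering, independent of input order
            data, mtime = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            # Clamp: only timestamps newer than the reference are changed.
            info.mtime = min(mtime, source_date_epoch)
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

With this approach, two builds that touch the same files at different times should still produce byte-identical archives, while files that are genuinely older than the reference time keep their original timestamps.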
On our mailing list this month, Bernhard M. Wiedemann posted an interesting summary on some of the reasons why packages are still not reproducible in 2023.
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes, including processing objdump symbol comment filter inputs as Python bytes (and not str) instances […], and Vagrant Cascadian extended diffoscope support for GNU Guix […] and updated the version in that distribution to version 253 […].
“Challenges of Producing Software Bill Of Materials for Java”
Musard Balliu, Benoit Baudry, Sofia Bobadilla, Mathias Ekstedt, Martin Monperrus, Javier Ron, Aman Sharma, Gabriel Skoglund, César Soto-Valero and Martin Wittlinger (!) of the KTH Royal Institute of Technology in Sweden, have published an article in which they:
… deep-dive into 6 tools and the accuracy of the SBOMs they produce for complex open-source Java projects. Our novel insights reveal some hard challenges regarding the accurate production and usage of software bills of materials.
The paper is available on arXiv.
Debian Non-Maintainer campaign
As mentioned in previous reports, the Reproducible Builds team within Debian has been organising a series of online and offline sprints in order to clear the huge backlog of submitted reproducible builds patches by performing so-called NMUs (Non-Maintainer Uploads).
During December, Vagrant Cascadian performed a number of such uploads, including:
- crack […] (#1021521 & #1021522)
- dustmite […] (#1020878 & #1020879)
- edid-decode […] (#1020877)
- gentoo […] (#1024284)
- haskell98-report […] (#1024007)
- infinipath-psm […] (#990862)
- lcm […] (#1024286)
- libapache-mod-evasive […] (#1020800)
- libccrtp […] (#860470)
- libinput […] (#995809)
- lirc […] (#979019, #979023 & #979024)
- mm-common […] (#977177)
- mpl-sphinx-theme […] (#1005826)
- psi […] (#1017473)
- python-parse-type […] (#1002671)
- ruby-tioga […] (#1005727)
- ucspi-proxy […] (#1024125)
- ypserv […] (#983138)
In addition, Holger Levsen performed three “no-source-change” NMUs in order to address the last packages without .buildinfo files in Debian trixie, specifically lorene (0.0.0~cvs20161116+dfsg-1.1), maria (1.3.5-4.2) and ruby-rinku (1.7.3-2.1).
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In December, a number of changes were made by Holger Levsen:
- Debian-related changes:
- Arch Linux-related changes:
- Misc changes:
  - Install the python3-setuptools and swig packages, which are now needed to build OpenWrt. […]
  - Install pkg-config needed to build Coreboot artifacts. […]
  - Detect failures due to an issue where the fakeroot tool is implicitly required but not automatically installed. […]
  - Detect failures due to rename of the vmlinuz file. […]
  - Improve the grammar of an error message. […]
  - Document that freebsd-jenkins.debian.net has been updated to FreeBSD 14.0. […]
In addition, node maintenance was performed by Holger Levsen […] and Vagrant Cascadian […].
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - apr (hostname issue)
  - dune (parallelism)
  - epy (time-based .pyc issue)
  - fpc (Year 2038)
  - gap (date)
  - gh (FTBFS in 2024)
  - kubernetes (fixed random build path)
  - libgda (date)
  - libguestfs (tar)
  - metamail (date)
  - mpi-selector (date)
  - neovim (randomness in Lua)
  - nml (time-based .pyc)
  - pommed (parallelism)
  - procmail (benchmarking)
  - pysnmp (FTBFS in 2038)
  - python-efl (drop Sphinx doctrees)
  - python-pyface (time)
  - python-pytest-salt-factories (time-based .pyc issue)
  - python-quimb (fails to build on single-CPU systems)
  - python-rdflib (random)
  - python-yarl (random path)
  - qt6-webengine (parallelism issue in documentation)
  - texlive (Gzip modification time issue)
  - waf (time-based .pyc)
  - warewulf (CPIO modification time and inode issue)
  - xemacs (toolchain hostname)
- Chris Lamb:
  - #1057710 filed against python-aiostream.
  - #1057721 filed against openpyxl.
  - #1058681 filed against python-multipletau.
  - #1059013 filed against wxmplot.
  - #1059014 filed against stunnel4.
- James Addison:
  - #1059592 & #1059631 filed against qttools-opensource-src.
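Several of the patches listed above concern timestamps that end up embedded in build output, such as the Gzip modification time issue. As a hedged illustration of that general class of fix (not of any specific patch above), the following Python sketch writes a gzip stream with a zero modification time and no stored filename, so that compressing the same bytes twice gives identical output. The `gzip_deterministic` helper is hypothetical.

```python
import gzip
import io


def gzip_deterministic(data: bytes) -> bytes:
    """Compress `data` without recording the current time or a filename."""
    buf = io.BytesIO()
    # mtime=0 writes a zero timestamp into the gzip header;
    # filename="" keeps the header free of an original file name.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()
```

Without the explicit `mtime`, Python's `GzipFile` records the current time in the header, which is one common way otherwise identical builds end up differing.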
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Mailing list: [email protected]
- Mastodon: @reproducible_builds
- Twitter: @ReproBuilds