Welcome to the latest report from the Reproducible Builds project for June 2021. In these reports we outline the most important things that have been happening in the world of reproducible builds in the past month. As ever, if you are interested in contributing to the project, please visit the Contribute page on our website.
Community news
Jake Edge of Linux Weekly News (LWN) published a lengthy article on 16th June describing various steps taken by the Fedora Linux distribution with respect to preventing supply-chain attacks:
The specter of more events like the SolarWinds supply-chain attacks is something that concerns many in our communities—and beyond. Linux distributions provide a supply chain that obviously needs to be protected against attackers injecting malicious code into the update stream. This problem recently came up on the Fedora devel mailing list, which led to a discussion covering a few different topics. For the most part, Fedora users are protected against such attacks, which is not to say there is nothing more to be done, of course.
The Google Security Blog introduced a new framework called “Supply chain Levels for Software Artifacts”, or SLSA (to be pronounced as ‘salsa’). In particular, SLSA level 4 (“currently the highest level”) not only requires a two-person review of all changes but also “a hermetic, reproducible build process” due to its “many auditability and reliability benefits”. This is a highly welcome inclusion in Google’s requirements. However, by associating reproducible builds only with the highest level of supply-chain security in the list, it might lead others to conclude that only the most secure systems can benefit from reproducible builds, whereas the Reproducible Builds project believes that many more users, if not all, can do so.
Many media outlets (including The Verge, etc.) reported on how the United States’ FBI operated a messaging app as a ‘honeypot trap’ for a long period of time, leading to hundreds of arrests. According to the UK’s Financial Times, court documents describe how the FBI persuaded a software developer facing prison to allow the FBI to commandeer the app and to introduce it to suspected criminals:
Over the course of the next three years, the operation was able to inspect about 27m messages over 11,800 devices as ANOM gained popularity in criminal circles globally, pushed by the developer but also a network of crime “influencers” — experts in encrypted phones who encourage others to use such devices.
As the Financial Times reports, “it is unclear what exactly prompted the FBI and others to reveal the operation”, although others have suggested it may result from legal limits in timeframes for intercepting communications. The FBI’s operation raises ethical concerns which overlap with beliefs held by proponents of Reproducible Builds, not least of all because even the most unimpeachable actions by actors may result in the incidental surveillance of innocent people.
In similar legal news, Susan Landau posted to the Lawfare blog about the potential dangers posed by evidentiary software. In particular, she discusses concerns that proprietary software may be fundamentally incompatible with the right of defendants to know the nature of the evidence against them, a right that is explicitly enshrined, for instance, in the Sixth Amendment of the United States Constitution. However,
At the time of our writing the article on the use of software as evidence, there was no overriding requirement that [United States] law enforcement provide a defendant with the code so that they might examine it themselves.
This is relevant here because if the inability to consult the relevant source code does violate such rights, it may follow that a secure and reproducible build process will also be required. After all, it is the output of the binary versions of the source code that is used to convict suspects, not the source code itself. As Susan points out:
Mistakes happen with software and sometimes the only way to find errors is to study the code itself—both of which have important implications for courtroom use of software programs.
The Reproducible Builds project restarted their IRC meetings this month. Taking place on the #reproducible-builds channel on the OFTC IRC network, the log of the meeting on 29th June is now available online, and the next meeting is due to take place on July 27th at 15:00 UTC (agenda).
Ars Technica are reporting that “counterfeit” packages in PyPI, the official Python package repository, contained secret code that installed cryptomining software on infected machines: “So-called typosquatting attacks succeed when targets accidentally mistype a name such as typing mplatlib or maratlib instead of the legitimate and popular package, matplotlib”. The article is at pains to point out that PyPI is not abused any more than other repositories are:
Last year, packages downloaded thousands of times from RubyGems installed malware that attempted to intercept bitcoin payments. Two years before that, someone backdoored a 2-million-user code library hosted in NPM. Sonatype has tracked more than 12,000 malicious NPM packages since 2019.
Distribution work
Ariadne Conill published a detailed blog post this month describing their work on security issues and concerns in the Alpine Linux distribution. In particular, Ariadne included an interesting section on an effort “to prove the reproducibility of Alpine package builds”:
To this end, I hope to have the Alpine 3.15 build fully reproducible. This will require some changes to abuild so that it produces buildinfo files, as well as a rebuilder backend. We plan to use the same buildinfo format as Arch Linux, and will likely adapt some of the other reproducible builds work Arch has done to Alpine.
Ariadne mentioned plans to have a meeting and a sprint during July, to be organised in and around the #alpine-reproducible channel on the OFTC IRC network, and later posted a round-up of security initiatives in Alpine during June which mentions, amongst many other things, the ability to demonstrate reproducible Alpine install images for the Raspberry Pi.
Elsewhere in Alpine news, kpcyrd posted a series of Tweets explaining the steps he took to produce a reproducible Alpine image. [1] [2]
For openSUSE, Bernhard M. Wiedemann posted his monthly reproducible builds status report.
The NixOS Linux distribution pulled off a technical and publicity coup this month by announcing that the ISO_minimal.x86_64-Linux image is 100% reproducible. The announcement was widely discussed on Hacker News, where the article has received in excess of 200 comments.
In early June, Nilesh Patra asked for help making Debian’s brian package build reproducibly. Felix C. Stegerman proposed two patches which seem to have fixed the remaining issues (#989693). These were submitted upstream, where they were shortly merged.
Felix C. Stegerman announced the release of v1.0.0 of apksigcopier, a tool to copy, extract and patch .apk signatures needed to facilitate reproducible builds on the F-Droid Android application store. Holger Levsen subsequently sponsored an upload to Debian. Felix C. Stegerman also reported that Android builds are sometimes not reproducible due to a bug in Android’s coreLibraryDesugaring. […]
Elsewhere in F-Droid, the Swiss COVID Certificate mobile app (which uses reproducible builds) has been added to F-Droid — the F-Droid developers have mentioned that the upstream developers have been very helpful in making this happen. Relatedly, the Android version of the Electrum Bitcoin Wallet has been made reproducible.
Lastly, Hannes Mehnert announced the launch of the reproducible MirageOS build infrastructure, together with where to obtain ‘unikernels’: “To provide a high level of assurance and trust, if you distribute binaries in 2021, you should have a recipe how they can be reproduced in a bit-by-bit identical way.”
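To make the “bit-by-bit identical” criterion concrete, the following is a minimal Python sketch of how two independently built artifacts might be compared. It is not taken from the MirageOS infrastructure; the function names `sha256_of` and `is_bit_identical` are hypothetical and exist only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_identical(first: Path, second: Path) -> bool:
    """True if two build artifacts have the same size and SHA-256 digest."""
    # Comparing sizes first avoids hashing when the files obviously differ.
    if first.stat().st_size != second.stat().st_size:
        return False
    return sha256_of(first) == sha256_of(second)
```

A rebuilder would typically run something like this on the distributed binary and on its own rebuild; when the digests differ, a tool such as diffoscope can then be used to find out why.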
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - deepdiff (report a ‘build failure in 2022’ issue)
  - dulwich (build fails in the future due to expired GPG key)
  - gtksourceview4 (report that build fails in uniprocessor machine)
  - ipxe (ar(1) call needs to be deterministic)
  - json-lib (report a date / epoch issue)
  - kernel-default (two sorting and random-related issues)
  - lepton (drop call to -march=native)
  - lighttpd1 (build fails in 2036)
  - openvas-smb (date and Portable Executable timestamp issue)
  - python-MapProxy (report a ‘build fails on uniprocessor machine’ issue)
  - python-gcsfs (report a ‘build fails on uniprocessor machine’ issue)
- Nilesh Patra:
- Vagrant Cascadian:
  - #989963 filed against tclap.
  - #989965 filed against gtk-sharp3.
  - #989966 filed against gtk-sharp3.
  - #990084 filed against graphicsmagick.
  - #990246, #990247 and #990248 filed against vlc.
  - #990253 filed against pmix.
  - #990254 filed against openmpi.
  - #990300 filed against auctex.
  - #990323 filed against volume-key.
  - #990327 filed against cppunit.
  - #990329 filed against rpm.
  - #990332 filed against libcddb.
  - #990338 filed against autogen.
  - #990339 filed against matplotlib.
Separate to this, Hans-Christoph Steiner noted there is a reproducibility-related bug in Python’s standard zipfile library. This problem makes it hard to create reproducible .zip files. In particular, Hans would like to have more input from Python people, since it is not clear how best to resolve the problem.
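For context, the sketch below shows the kind of care that is commonly needed to get byte-identical archives out of zipfile. It does not reproduce or fix the specific bug Hans reported, whose details are not given here. The helper name `write_deterministic_zip` and the constant `FIXED_DATE_TIME` are hypothetical. Note also that the compressed bytes may still vary between zlib versions, which this sketch does not address.

```python
import io
import zipfile

# Hypothetical fixed timestamp; 1980-01-01 is the earliest date ZIP can store.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def write_deterministic_zip(files: dict[str, bytes]) -> bytes:
    """Build a .zip whose bytes depend only on the member names and contents."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Sort the names so that insertion order does not leak into the archive.
        for name in sorted(files):
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed Unix permissions
            info.create_system = 3  # record "Unix" regardless of the build host
            archive.writestr(info, files[name])
    return buffer.getvalue()
```

Called twice with the same files in a different order, for example `{"b.txt": b"two", "a.txt": b"one"}` and `{"a.txt": b"one", "b.txt": b"two"}`, the function should return identical bytes on the same machine.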
diffoscope
diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it provides human-readable diffs from many kinds of binary formats.
This month, Chris Lamb made a number of changes including releasing version 177. In addition, Chris updated the try.diffoscope.org service to reflect that Bytemark were acquired by the Iomart Group. […]
- Balint Reczey:
- Jean-Romain Garnier:
  - Overhaul the Mach-O executable file comparator. […][…][…][…][…]
  - Implement tests for the Mach-O comparator. […][…][…]
  - Switch to new argument format for the LLVM compiler. […]
  - Fix test_libmix_differences in testsuite for the ELF format. […][…]
  - Improve macOS compatibility for the Mach-O comparator. […]
  - Add llvm-readobj and llvm-objdump to the internal EXTERNAL_TOOLS data structure. […]
- Mattia Rizzolo:
Website and documentation
A number of changes were made to the main Reproducible Builds website and documentation this month, including:
- Arnout Engelen:
- Chris Lamb:
  - Use an ellipsis […] and drop a full stop […] to clarify ‘more items’ links.
  - Update the link and logo to Google Open Source Security Team. […]
  - Reduce the amount of bold text on the homepage. […]
  - Document the non-reproducibility arising from abbreviated Git hashes depending on the number of total objects in a Git repository. […]
- Hervé Boutemy:
  - Add a Reproducible Central section to the JVM page. […]
- Holger Levsen:
  - Add busybox to the list of software respecting the SOURCE_DATE_EPOCH environment variable for build timestamps if available. […]
- Mattia Rizzolo:
  - Fix a typo in a CSS class name. […]
  - Add the (now-superseded) Linux Foundation Core Infrastructure Initiative to the list of historical sponsors. […]
Testing framework
The Reproducible Builds project operates a Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made:
- Holger Levsen:
  - Debian-related changes:
    - Initial stab at building and comparing Debian Live images. […]
    - Run the lb build Debian Live command with sudo(8). […][…]
    - Use safer and more common rm -rf syntax in/around Debian Live images. […]
    - Sync build results of Live images to our Jenkins instance. […]
    - Create a Debian unstable schroot for running diffoscope on the osuosl173 node so it can be used to test Debian Live images. […]
    - Cope with the Tails build manifests now only containing binary package names. […]
    - Do not incorrectly detect diskspace issues on OpenSSL builds. […]
    - Delete the reproducible_compare_Debian_sha1sums jobs. […]
  - Automatic node health check improvements:
  - Misc:
- Mattia Rizzolo:
- Roland Clobus spent significant time on automatically building Debian Live images twice and comparing the output if they differ (Jenkins job page). This included:
- Vagrant Cascadian:
Finally, build node maintenance was performed by Holger Levsen […][…][…], Mattia Rizzolo […][…][…][…] and Vagrant Cascadian […].
Misc development news
Dan Shearer from the LumoSQL database project posted to the rb-general mailing list about reproducibility and microcode updates, emphasis ours:
Here at LumoSQL we do repeated runs testing SQLite of various versions and configurations, storing the results in an SQLite database. Here is an example of the kind of variation that justifies what some have called our ‘too-fussy’ test suite, a microcode update that changes behaviour from one day to another.
Finally, in last month’s report we wrote about Paul Spooren proposing a patch for the BusyBox suite of UNIX utilities so that it uses SOURCE_DATE_EPOCH for build timestamps if available. This was merged during June by Denys Vlasenko.
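As an illustration of the general convention (not of the BusyBox patch itself, which is written in C and is not reproduced here), a build script might honour SOURCE_DATE_EPOCH roughly as follows. The function name `build_timestamp` is hypothetical.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> datetime:
    """Return the build time, preferring SOURCE_DATE_EPOCH when it is set."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    # Fall back to the current time only when the variable is absent.
    seconds = int(epoch) if epoch is not None else int(time.time())
    # Always use UTC so the result does not depend on the build host's timezone.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

With SOURCE_DATE_EPOCH set to the same value, every rebuild embeds the same timestamp, whereas falling back to the wall clock makes each build differ.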
If you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter (@ReproBuilds) and Mastodon (@[email protected]).
- Reddit: /r/ReproducibleBuilds
- Mailing list: [email protected]