Welcome to the May 2020 report from the Reproducible Builds project.
One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. Nonetheless, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes.
In these reports we outline the most important things that we and the rest of the community have been up to over the past month.
News
The Corona-Warn app, which helps trace infection chains of SARS-CoV-2/COVID-19 in Germany, received a feature request asking that it build reproducibly.
A number of academics from Cornell University have published a paper titled Backstabber’s Knife Collection which reviews various open source software supply chain attacks:
Recent years saw a number of supply chain attacks that leverage the increasing use of open source during software development, which is facilitated by dependency managers that automatically resolve, download and install hundreds of open source packages throughout the software life cycle.
In related news, the LineageOS Android distribution announced that a hacker had access to the infrastructure of their servers after exploiting an unpatched vulnerability.
Marcin Jachymiak of the Sia decentralised cloud storage platform posted on their blog that their siac and siad utilities can now be built reproducibly:
This means that anyone can recreate the same binaries produced from our official release process. Now anyone can verify that the release binaries were created using the source code we say they were created from. No single person or computer needs to be trusted when producing the binaries now, which greatly reduces the attack surface for Sia users.
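The check described in the quote above amounts to comparing a locally rebuilt binary with the published one, byte for byte. A minimal sketch in Python, assuming both files are already on disk; the helper names `sha256_of` and `matches_release` are illustrative and are not part of Sia's tooling:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release(local_build: Path, official_release: Path) -> bool:
    """True if the locally rebuilt binary is bit-for-bit identical to the release.

    Both arguments are hypothetical paths chosen by the person verifying.
    """
    return sha256_of(local_build) == sha256_of(official_release)
```

If several independent rebuilders report the same digest as the official release, no single machine in the release process has to be trusted.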
Synchronicity is a distributed build system for Rust build artifacts which have been published to crates.io. The goal of Synchronicity is to provide a distributed binary transparency system which is independent of any central operator.
The Comparison of Linux distributions article on Wikipedia now features a Reproducible Builds column indicating whether each distribution is working towards reproducible builds and how far it has progressed.
Distribution work
In Debian this month:
- Paul Wise continued a discussion that was started in February regarding the storing and distribution of build logs and other related artifacts and their relationship to reproducible builds. For example, the binutils package ships its own, unreproducible, log files in its binary packages. It was followed up by replies from Chris Lamb and Matthias Klose.
- 34 reviews of Debian packages were added, 20 were updated and 122 were removed this month, adding to our knowledge about identified issues. Chris Lamb added and categorised a new ocaml_cmti_files toolchain issue.
In Alpine Linux, an issue was filed, and closed, regarding the reproducibility of .apk packages.
Allan McRae of the Arch Linux project posted their third Reproducible builds progress report to the arch-dev-public mailing list, which includes the following call for help:
We also need help to investigate and fix the packages that fail to reproduce that we have not investigated as of yet.
In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update.
Software development
diffoscope
Chris Lamb made the changes listed below to diffoscope, our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. He also prepared and uploaded versions 142, 143, 144, 145 and 146 to Debian, PyPI, etc.
- Comparison improvements:
  - Improve fuzzy matching of JSON files as file(1) now supports recognising JSON data. (#106)
  - Refactor .changes and .buildinfo handling to show all details (including the GnuPG header and footer components) even when referenced files are not present. (#122)
  - Use our BuildinfoFile comparator (etc.) regardless of whether the associated files (such as the orig.tar.gz and the .deb) are present. […]
  - Include GnuPG signature data when comparing .buildinfo, .changes, etc. […]
  - Add support for printing Android APK signatures via apksigner(1). (#121)
  - Identify “iOS App Zip archive data” as .zip files. (#116)
  - Add support for Apple Xcode .mobileprovision files. (#113)
- Bug fixes:
- Output improvements:
  - Never emit the same id="foo" anchor reference twice in the HTML output, otherwise identically-named parts cannot be linked to via a #foo anchor. (#120)
  - Never emit an empty “id” anchor either; it is not possible to link to #. […]
  - Don’t pretty-print the output when using the --json presenter; it will usually be too complicated to be readable by a human anyway. […]
  - Use SHA256 rather than MD5 when generating page names for the HTML directory-style presenter. (#124)
- Reporting improvements:
  - Clarify the message when we truncate the number of lines to standard error […] and reduce the maximum number of lines printed to 25 as usually the error is obvious by then […].
  - Print the amount of free space that we have available in our temporary directory as a debugging message. […]
  - Clarify Command […] failed with exit code messages to remove the duplicate exited with exit but also to note that diffoscope is interpreting this as an error. […]
  - Don’t leak the full path of the temporary directory in Command […] exited with 1 messages. (#126)
  - Clarify the warning message when we cannot import the debian Python module. […]
  - Don’t repeat stderr from {} if both commands emit the same output. […]
  - Clarify that an external command emits for both files, otherwise it can look like we are repeating ourselves when, in reality, it is being run twice. […]
- Testsuite improvements:
- Dockerfile improvements:
  - Add a .dockerignore file to whitelist files we actually need in our container. (#105)
  - Use ARG instead of ENV when setting up the DEBIAN_FRONTEND environment variable at runtime. (#103)
  - Run as a non-root user in container. (#102)
  - Install/remove the build-essential during build so we can install the recommended packages from Git. […]
- Codebase improvements:
  - Bump the officially required version of Python from 3.5 to 3.6. (#117)
  - Drop the (default) shell=False keyword argument to subprocess.Popen so that the potentially-unsafe shell=True is more obvious. […]
  - Perform string normalisation in Black […] and include the Black output in the assertion failure too […].
  - Inline MissingFile’s special handling of deb822 to prevent leaking through abstract layers. […][…]
  - Allow a bare try/except block when cleaning up temporary files with respect to the flake8 quality assurance tool. […]
  - Rename in_dsc_path to dsc_in_same_dir to clarify the use of this variable. […]
  - Abstract out the duplicated parts of the debian_fallback class […] and add descriptions for the file types. […]
  - Various commenting and internal documentation improvements. […][…]
  - Rename the Openssl command class to OpenSSLPKCS7 to accommodate other command names with this prefix. […]
- Misc:
  - Rename the --debugger command-line argument to --pdb. […]
  - Normalise filesystem stat(2) “birth times” (i.e. st_birthtime) in the same way we do with the stat(1) command’s Access: and Change: times to fix a nondeterministic build failure in GNU Guix. (#74)
  - Ignore case when ordering our file format descriptions. […]
  - Drop, add and tidy various module imports. […][…][…][…]
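Two of the output improvements above (never emitting a duplicate or empty id anchor, and naming pages with SHA256) can be illustrated with a short sketch. This is not diffoscope's actual code: the `AnchorRegistry` class, the `page_name` function and the "-2" suffix scheme are assumptions made purely for illustration.

```python
import hashlib
from typing import Optional, Set


def page_name(path: str) -> str:
    """Derive a stable page name from a path using SHA-256 (hypothetical helper)."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()


class AnchorRegistry:
    """Hands out HTML id values that are never empty and never repeated."""

    def __init__(self) -> None:
        self._used: Set[str] = set()

    def unique_id(self, name: str) -> Optional[str]:
        # An empty id cannot be linked to (the link would just be "#"),
        # so signal that no id attribute should be emitted at all.
        if not name:
            return None
        candidate, suffix = name, 1
        # Append a numeric suffix until the candidate has not been used yet.
        while candidate in self._used:
            suffix += 1
            candidate = f"{name}-{suffix}"
        self._used.add(candidate)
        return candidate
```

With one registry per HTML page, two identically named parts would receive the ids "foo" and "foo-2", so each keeps its own link target.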
In addition:
- Jean-Romain Garnier fixed a general issue where, for example, LibarchiveMember’s has_same_content method was called regardless of the underlying type of file. […]
- Daniel Fullmer fixed an issue where some filesystems could only be mounted read-only. (!49)
- Emanuel Bronshtein provided a patch to prevent a build of the Docker image containing parts of the build’s. (#123)
- Mattia Rizzolo added an entry to debian/py3dist-overrides to ensure the rpm-python module is used in package dependencies (#89) and moved to using the new execute_after_* and execute_before_* Debhelper rules […].
Chris Lamb also performed a huge overhaul of diffoscope’s website:
- Add a completely new design. […][…]
- Dynamically generate our contributor list […] and supported file formats […] from the main Git repository.
- Add a separate, canonical page for every new release. […][…][…]
- Generate a ‘latest release’ section and display that with the corresponding date on the homepage. […]
- Add an RSS feed of our releases […][…][…][…][…] and add to Planet Debian […].
- Use Jekyll’s absolute_url and relative_url where possible […][…] and move a number of configuration variables to _config.yml […][…].
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
  - golang-packaging (toolchain issue, affecting times in minikube)
  - jboss-logging-tools (toolchain issue, affecting date for resteasy)
  - linux_logo (sort find output to avoid inheriting filesystem order)
  - moonjit (generate reproducible output by default if SOURCE_DATE_EPOCH is set)
  - vala (report ASLR nondeterminism)
- Jelle van der Waa:
  - earlyoom (timestamps in Gzip files)
  - fmt (don’t install sphinx-build cached files as they are unneeded and unreproducible)
  - nvidia-settings (timestamp in Gzip files)
- Chris Lamb:
  - #959714 filed against ataqv.
  - #960313 filed against elinks.
  - #960386 filed against briquolo.
  - #960388 filed against cryptominisat.
  - #960590 filed against wolfssl.
  - #960591 filed against mistral.
  - #960607 filed against python-watcherclient.
  - #960669 filed against tree-puzzle.
  - #961009 filed against nulib2.
  - #961202 filed against process-cpp.
  - #961494 filed against bowtie2.
  - #961495 filed against properties-cpp.
  - #961582 filed against wand (forwarded upstream).
  - #961657 filed against vows.
- Vagrant Cascadian:
  - #961747 filed against libstatgrab.
  - #961764 filed against texi2html.
  - #961766 filed against grub.
  - #961830 filed against systemtap.
  - #961942 filed against mono.
  - mescc-tools: Inherit CFLAGS in a Makefile, allowing -ffile-prefix-map/-fdebug-prefix-map to sanitise build paths (merged upstream).
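Several of the patches above address the same few causes: timestamps embedded in Gzip files, output that ignores SOURCE_DATE_EPOCH, and file lists that inherit filesystem order. The following Python sketch shows the general shape of such fixes. It is not taken from any of the patches listed, and the function names are hypothetical.

```python
import gzip
import io
import os
import time


def build_timestamp() -> int:
    """Use SOURCE_DATE_EPOCH if it is set, otherwise fall back to the current time."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))


def gzip_reproducibly(data: bytes, mtime: int) -> bytes:
    """Compress data with a fixed mtime and no filename in the Gzip header."""
    buffer = io.BytesIO()
    with gzip.GzipFile(filename="", mode="wb", fileobj=buffer, mtime=mtime) as handle:
        handle.write(data)
    return buffer.getvalue()


def sorted_files(root: str) -> list:
    """Walk a directory tree in sorted order rather than filesystem order."""
    found = []
    for directory, subdirs, files in os.walk(root):
        subdirs.sort()  # sorting in place controls the order os.walk descends
        found.extend(os.path.join(directory, name) for name in sorted(files))
    return found
```

Calling gzip_reproducibly(data, build_timestamp()) with SOURCE_DATE_EPOCH exported should give byte-identical archives across rebuilds with the same Python and zlib versions, because the only time-dependent header field is pinned.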
Other tools
Elsewhere in our tooling:
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. In May, Chris Lamb uploaded version 1.8.1-1 to Debian unstable and Bernhard M. Wiedemann fixed an “off-by-one” error when parsing PNG image modification times. (#16)
In disorderfs, our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues, Chris Lamb replaced the term “dirents” with “directory entries” in human-readable output/log messages […] and applied the astyle source code formatter with the default settings to the main disorderfs.cpp source file […].
Holger Levsen bumped the debhelper-compat level to 13 in disorderfs […] and reprotest […], and for the GNU Guix distribution Vagrant Cascadian updated disorderfs to version 0.5.10 […] and diffoscope to version 145 […].
Project documentation & website
- Carl Dong:
  - Clarify some potential confusion around GCC libtool. […]
- Chris Lamb:
  - Rename the “Who” page to “Projects”. […]
  - Ensure that Jekyll enters the _docs subdirectory to find the _docs/index.md file after an internal move. (#27)
  - Wrap ltmain.sh etc. in preformatted quotes. […]
  - Wrap the SOURCE_DATE_EPOCH Python examples onto more lines to prevent visual overflow on the page. […]
  - Correct a “preferred” spelling error. […]
- Holger Levsen:
  - Sort our Academic publications page by publication year […] and add “Trusting Trust” and “Fully Countering Trusting Trust through Diverse Double-Compiling” […].
- Juri Dispan:
  - Update the URL for faketime to the project’s GitHub page. (!57)
Testing framework
We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org and that, amongst many other tasks, tracks the status of our reproducibility efforts and identifies any regressions that have been introduced. Holger Levsen made the following changes:
-
System health status:
-
- Fail loudly if there are more than three .buildinfo files with the same name. […]
- Fix a typo which prevented /usr merge variation on Debian unstable. […]
- Temporarily ignore PHP’s horde (https://www.horde.org/) packages in Debian bullseye. […]
- Document how to reboot all nodes in parallel, working around molly-guard. […]
- Further work on a Debian package rebuilder:
  - Work around and document various issues in the debrebuild script. […][…][…][…]
  - Improve output in the case of errors. […][…][…][…]
  - Improve documentation and future goals […][…][…][…], in particular documenting two real-world test cases for an “impossible to recreate build environment” […].
  - Find the right source package to rebuild. […]
  - Increase the frequency we run the script. […][…][…][…]
  - Improve downloading and selection of the sources to build. […][…][…]
  - Improve version string handling. […]
  - Handle build failures better. […][…][…]
  - Also consider “architecture all” .buildinfo files. […][…]
In addition:
- kpcyrd, for Alpine Linux, updated the alpine_schroot.sh script now that a patch for abuild had been released upstream. […]
- Alexander Couzens of the OpenWrt project renamed the brcm47xx target to bcm47xx. […]
- Mattia Rizzolo fixed the printing of the build environment during the second build […][…][…] and made a number of improvements to the script that deploys Jenkins across our infrastructure […][…][…].
Lastly, Vagrant Cascadian clarified in the documentation that you need to be user jenkins to run the blacklist command […] and the usual build node maintenance was performed by Holger Levsen […][…][…], Mattia Rizzolo […][…] and Vagrant Cascadian […][…][…].
Mailing list:
There were a number of discussions on our mailing list this month:
Paul Spooren started a thread titled Reproducible Builds Verification Format which reopens the discussion around a schema for sharing the results from distributed rebuilders:
To make the results accessible, storable and create tools around them, they should all follow the same schema, a reproducible builds verification format. The format tries to be as generic as possible to cover all open source projects offering precompiled source code. It stores the rebuilder results of what is reproducible and what not.
Hans-Christoph Steiner of the Guardian Project also continued his previous discussion regarding making our website translatable.
Lastly, Leo Wandersleb posted a detailed request for feedback on a question of supply chain security and other issues of software review; Leo is the founder of the Wallet Scrutiny project which aims to prove the security of Android Bitcoin Wallets:
Do you own your Bitcoins or do you trust that your app allows you to use “your” coins while they are actually controlled by “them”? Do you have a backup? Do “they” have a copy they didn’t tell you about? Did anybody check the wallet for deliberate backdoors or vulnerabilities? Could anybody check the wallet for those?
Elsewhere, Leo had posted instructions on his attempts to reproduce the binaries for the BlueWallet Bitcoin wallet for iOS and Android platforms.
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter: @ReproBuilds
- Reddit: /r/ReproducibleBuilds
- Mailing list: [email protected]
This month’s report was written by Bernhard M. Wiedemann, Chris Lamb, Holger Levsen, Jelle van der Waa and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.