Welcome to the November 2019 report from the Reproducible Builds project.
To summarise our project: whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is therefore to ensure that no flaws have been introduced during this compilation process by promising that identical results are always generated from a given source, thus allowing multiple third parties to come to a consensus on whether a build was compromised.
In this month’s report, we cover:
- Media coverage and events — Enter the Reproducibility Challenge, etc.
- Upstream news — OCaml, Mes, Maven, etc.
- Distribution work — The latest reports from Arch, Debian and openSUSE, etc.
- Software development — Holiday bonanza of patches, work on diffoscope, etc.
- Contributing — How to get in touch…
If you are interested in contributing to our project, please visit our Contribute page on our website.
Media coverage and events
We held our fifth annual Reproducible Builds summit between the 1st and 8th December in Marrakesh, Morocco. A full, in-depth report will be posted next month…
On November 16th, Vagrant Cascadian presented There and Back Again, Reproducibly at SeaGL in Seattle, Washington.
Chris Lamb was featured on The Manifest package management podcast in an episode called Reproducible Builds project and Debian package management.
ReScience C is an open-access journal that targets computational research and encourages the explicit replication of already published research. This month they announced their Ten Years Reproducibility Challenge which promotes the idea that old code — in this instance, a “scientific article [published] before January 1st 2010” — should also run on modern hardware and software in order to check one can obtain the same scientific results in the future.
Upstream news
Mike Hommey pushed a change to the Mozilla build system to add and print error messages when differences are found between builds, as requested in bug #1597903.
There was fresh activity on an old pull request for the OCaml programming language regarding the usage and adoption of the BUILD_PATH_PREFIX_MAP environment variable that is used to ensure that software packages do not embed build-time paths into generated files. On the pull request in question Gabriel Scherer was kind enough to provide many helpful examples on how to use the rewrite rules.
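As a rough illustration of what such rewrite rules do, the following Python sketch applies a BUILD_PATH_PREFIX_MAP value to a path. It is not the OCaml implementation. The function names are hypothetical, and the details (colon-separated target=source entries, the %#, %+ and %. escapes, later entries taking priority, empty entries being skipped) reflect our reading of the specification and should be treated as assumptions.

```python
def decode(part):
    # Undo the escaping of ':' (%.), '=' (%+) and '%' (%#).
    # '%#' is decoded last so that a decoded '%' is never re-interpreted.
    return part.replace("%.", ":").replace("%+", "=").replace("%#", "%")


def parse_prefix_map(value):
    """Split a BUILD_PATH_PREFIX_MAP value into (target, source) pairs."""
    pairs = []
    for part in value.split(":"):
        if not part:
            continue  # assumption: silently skip empty entries
        target, source = part.split("=")  # raises ValueError if malformed
        pairs.append((decode(target), decode(source)))
    return pairs


def map_path(path, value):
    """Rewrite a build-time path prefix; the most recently added entry wins."""
    for target, source in reversed(parse_prefix_map(value)):
        if path.startswith(source):
            return target + path[len(source):]
    return path


if __name__ == "__main__":
    print(map_path("/home/user/src/foo/main.c", "foo=/home/user/src/foo"))
```

A tool that embeds paths would call something like map_path() on each path before writing it out, so that two builds in different directories record the same string.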
Jan Nieuwenhuizen announced the release of GNU Mes 0.21 and Jeremiah Orians announced the release of mescc-tools-seed version 1.1:
Capable of bootstrapping from a simple hex assembler all the way to a cross-platform C compiler. Work is still ongoing [to] result in a full bootstrap from a 357 byte bootstrap binary all the way to GCC.
Hervé Boutemy announced the release of three base Apache Maven plugins (maven-source-plugin, maven-jar-plugin and maven-assembly-plugin 3.2.0) to get Reproducible Builds as a “direct output” from this build system. For more information, please see the “Configuring for Reproducible Builds” section of their documentation.
Eli Schwartz reported a bug against the GNU groff typesetting system for incomplete SOURCE_DATE_EPOCH environment variable support; the output files appeared to be embedding the build timezone.
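The groff report illustrates a general pitfall: reading SOURCE_DATE_EPOCH is not sufficient if the timestamp is then formatted in the build machine's local timezone. The following Python sketch (not groff's code; build_date is a hypothetical helper) shows one way to avoid this by always formatting in UTC.

```python
import os
import time
from datetime import datetime, timezone


def build_date(environ=os.environ):
    """Return a date string suitable for embedding in build output.

    Uses SOURCE_DATE_EPOCH when it is set, falling back to the current
    time, and always formats in UTC so that the timezone of the build
    machine cannot leak into the result.
    """
    epoch = int(environ.get("SOURCE_DATE_EPOCH", time.time()))
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(build_date({"SOURCE_DATE_EPOCH": "1575158400"}))
```

Passing the environment as a parameter is only a convenience for testing; a real tool would normally read os.environ directly.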
Distribution work
Arch Linux
A slight but temporary decline in the Arch Linux reproducibility status was determined to be due to a bug in the continuous integration framework where one build was run with --nocheck whilst the other was not, resulting in the test dependencies being installed for only one of the builds. This led to differences in the BUILDINFO file, which records the build dependencies.
Morten Linderud (Foxboron) wrote a blog post on the progress of reproducible builds for Arch packages, including how to reproduce packages and a roadmap of future work.
The standard Arch development tools package (devtools) now contains a new tool called makerepropkg which can reproduce a package from the Arch repositories given a seed PKGBUILD file.
A lot of work has been put into getting the “[core]” system more reproducible; every package has been rebuilt with a new version of pacman which resolved a previous issue with storing the package size. Build failures and download issues have also been resolved, which has led to an increase in the number of reproducible packages in this distribution’s continuous integration setup.
openSUSE
Bernhard M. Wiedemann posted a summary of openSUSE updates for 2019, covering rpm, a high-level openSUSE status and fixes for problems with .pyc files, which are also relevant to Arch Linux.
The report also summarises the current reproducibility status as follows:
In addition to this, Bernhard also published his monthly Reproducible Builds status update.
Debian
Thorsten Glaser filed a bug against the debhelper packaging library to request that it sets and exports a umask of 022 for all operations as a possible “harmonisation potential”. A varying umask can result in unreproducible packages as the file permissions on the build system can be embedded into archives generated by the build system.
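To make the mechanism concrete, the following Python sketch creates the same file under two different umask values and reads back the permission bits that a tar archive records for it. It assumes a POSIX system, is unrelated to debhelper's own code, and archived_mode is a hypothetical helper name.

```python
import io
import os
import tarfile
import tempfile


def archived_mode(umask):
    """Create a file under the given umask and return the mode tar stores for it."""
    previous = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "example.txt")
            # open() requests mode 0o666; the kernel then clears the umask bits.
            with open(path, "w") as handle:
                handle.write("hello\n")
            buffer = io.BytesIO()
            with tarfile.open(fileobj=buffer, mode="w") as tar:
                tar.add(path, arcname="example.txt")
            # Read the archive back to see what was actually recorded.
            buffer.seek(0)
            with tarfile.open(fileobj=buffer) as tar:
                return tar.getmember("example.txt").mode
    finally:
        os.umask(previous)  # always restore the caller's umask


if __name__ == "__main__":
    for mask in (0o022, 0o077):
        print(oct(mask), "->", oct(archived_mode(mask)))
```

Two otherwise identical builds run under these two umask values would therefore produce archives that differ in their recorded permissions, which is the variation a fixed umask of 022 removes.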
Chris Lamb categorised a large number of packages and issues in the Reproducible Builds “notes” repository, including adding a new ocaml_dune_captures_build_path toolchain issue […].
Vagrant Cascadian filed a bug against the Lintian static analyser for Debian packages to request that it checks for missing and/or unsigned .buildinfo files. He also uploaded the latest version of GNU Mes to the unstable distribution.
Other
Natanael Copa (@n_copa) posted on Twitter that he was finally able to make a fully reproducible package for Alpine Linux.
The NixOS distribution announced that they plan to run a Christmas Hackathon hosted by Smarkets in London, England on 9th December.
Software development
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Arnout Engelen:
- Bernhard M. Wiedemann:
  - abseil-cpp (sort the output of find/readdir(2))
  - afl (date)
  - brp-check-suse (to strip link-time optimisation (LTO) data from .o object files)
  - buzztrax (report a parallelism/nondeterminism issue from GTK-Doc)
  - cardpeek (fix a previous patch)
  - cecilia (strip date and time in a .png image file)
  - lib3270 (merged; date)
  - maven-plugin-bundle (fix a Java date)
  - nulloy (.zip issue, already filed upstream)
  - opencensus-cpp (sort the result of find/readdir(2))
  - OpenSC (generate consistent DocBook identifiers)
  - pcc (fix a build failure from LTO in .a archive files)
  - perl-HTTP-Cookies (fix a build failure in 2025)
  - pocl (report compile-time CPU detection)
  - python-oslo.reports (drop unnecessary files with randomness)
  - sql-parser (sort find/readdir(2))
  - vim (report a build failure when built without parallelism)
  - Various updates to the RPM package manager:
- Chris Lamb:
  - #943954 filed against tm-align.
  - #943956 filed against snakemake (forwarded upstream).
  - #944131 filed against splitpatch (forwarded upstream).
  - #944214 filed against libaqbanking.
  - #944520 filed against isbg (forwarded upstream).
  - #944782 filed against python-sybil (forwarded upstream).
  - #945105 filed against intel-gpu-tools.
  - #945576 filed against superlu-dist.
  - #945822 filed against liblopsub.
  - genpy
- Vagrant Cascadian:
  - #944694 filed against resource-agents (forwarded upstream).
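Several of the patches above sort the output of find or readdir(2). The underlying problem is that directory entries are returned in an arbitrary, filesystem-dependent order, so any build step that iterates over them can produce differently ordered output. A minimal Python sketch of the usual fix follows; list_sources is a hypothetical helper and is not taken from any of the patched packages.

```python
import os
import tempfile


def list_sources(directory):
    """Return the file names in a directory in a stable order.

    os.listdir() makes no ordering guarantee, so the result is sorted
    before the order can influence anything generated from it.
    """
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("b.c", "a.c", "c.c"):
            open(os.path.join(tmp, name), "w").close()
        print(list_sources(tmp))
```

In shell-based build scripts, the equivalent is typically to pipe the output of find through sort, ideally with a fixed locale so that the collation order does not vary either.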
diffoscope
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of non-deterministic behaviour.
diffoscope versions 131, 132 and 133 were uploaded to Debian unstable by Chris Lamb. He also made the following changes:
- New features / improvements:
  - Allow all possible .zip file variations to return from external tools with non-zero exit codes, not just known types we can identify (e.g. Java .jmod and .jar files). (#78)
  - Limit .dsc and .buildinfo file matching to files in ASCII or UTF-8 format. (#77)
  - Bump the previous max_page_size limit from 400 kB to 4 MB. […]
  - Clarify in the HTML and text outputs that the limits are per-format, not global. (#944882)
  - Don’t use line-based buffering when communicating with subprocesses in “binary” mode. (#75)
- Regression fixes:
  - Correct the substitution/filtering of paths in ELF output to avoid unnecessary differences depending on the path name provided and the command line. (#945572)
  - Silence/correct a Python SyntaxWarning message due to incorrectly comparing an integer by identity vs. equality. (#945531)
- Testsuite improvements:
  - Refresh the OCaml test fixtures to support versions greater than 4.08.1. […]
  - Update an Android manifest test to reflect that parsed XML attributes are returned in a new/sorted manner under Python 3.8. […]
  - Dramatically truncate the tcpdump expected diff to 8 KB from ~600 KB to reduce the size of the release tarball. […]
  - Add a self-test to encourage new test data files to be generated dynamically, or at least to ensure that no new ones are added without an explicit override. […]
  - Add a comment that the text_ascii1 and text_ascii2 fixture files are used in multiple tests so it is not trivial to remove/replace them. […]
  - Drop two more test fixture files for the directory tests. […]
  - Don’t run our self-test against the output of the Black source code reformatter with versions earlier than “ours” as it will generate different results. […]
  - Update an XML test for Python 3.8. […]
  - Drop an unused BASE_DIR global. […]
- Code improvements:
Other contributions were also made by:
- Jelle van der Waa:
- Mattia Rizzolo:
  - Install python3-all whilst running the autopkgtests as we want to run the tests against all supported Python versions. […]
  - Use apt-get instead of apt in our Dockerfile. […]
  - Add zstd to our test dependencies after the resolution of #34. […]
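The SyntaxWarning regression fix listed above (#945531) concerns a common Python mistake: comparing an integer with "is" (identity) rather than "==" (equality). The sketch below is not diffoscope's code and warns_on_compile is a hypothetical helper. Assuming Python 3.8 or later, it shows that the compiler flags the identity comparison against a literal but not the equality comparison.

```python
import warnings


def warns_on_compile(source):
    """Return True if compiling the given source emits a SyntaxWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return any(issubclass(w.category, SyntaxWarning) for w in caught)


if __name__ == "__main__":
    # Identity comparison against a literal: flagged on Python 3.8+.
    print(warns_on_compile("returncode = 1\nreturncode is 0\n"))
    # Equality comparison: the correct way to compare integers by value.
    print(warns_on_compile("returncode = 1\nreturncode == 0\n"))
```

The identity form can appear to work for small integers because CPython caches them, which is why such bugs can go unnoticed until the interpreter starts warning about them.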
strip-nondeterminism
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, Chris Lamb added the file package as a dependency for libfile-stripnondeterminism-perl (#945212), moved away from the deprecated $ADTTMP variable […] and made two uploads in total (1.6.2-1 & 1.6.3-1).
Project website
There was yet more effort put into our website this month, including:
- Chris Lamb dropped a duplicated use of the term “community” and other words […][…], corrected the capitalisation of GitHub & GitLab […] and corrected the use of an “an” […].
- Daniel Edgecumbe added a section on initramfs and .cpio files to our Archive Metadata page. […]
- Hervé Boutemy added a link to the Maven Guide to Configuring for Reproducible Builds to our JVM page. […]
- Holger Levsen added a link to the openSUSE reproducible-builds CI graph and made several commits in preparation for the Reproducible Builds summit in Marrakesh in December.
- Jelle van der Waa added Arch Linux-specific links for diffoscope and friends to our Tools page. […]
Test framework
We operate a comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made:
- Alexander Couzens (OpenWrt): Fix a typo in the kirkwood architecture. […]
- Holger Levsen:
  - Debian:
    - Display newer suites first on pages showing the oldest build results. […]
    - Use the fully-qualified domain name (FQDN) when specifying hostnames in our list of offline nodes. […]
    - Reflect that coccia.debian.org has changed IP address. […]
    - Ignore the maximum transmission unit (MTU) on eth0 when checking for host health. […]
    - Perform the “/usr merge” variation in the unstable, experimental and bullseye distributions but not on buster. […]
  - Arch Linux:
  - OpenWrt:
  - Misc:
    - Attempt to fix the PureOS package set. […]
    - Shorten a “HOWTO” header a tiny bit. […]
    - Drop hack to fix the clock. […]
    - Improve a script header; patches are even more welcome than bugs! […]
    - Disable the use of the OpenSSH ControlMaster feature to prevent Jenkins killing connections. […]
    - Make a number of improvements to our boilerplate texts/scripts. […][…][…]
- Jelle van der Waa: Skip running the Arch Linux tests for continuous builds and rebuilds. […][…]
- Mattia Rizzolo:
  - Set the maximum size for HTML pages generated by diffoscope to 1 MB (the current default is 400 kB). […][…]
  - Update and improve the backup routines for the email relay system managing reproducible-builds.org. […][…]
- Vagrant Cascadian:
  - Ensure OpenSSH authorized_keys files are processed in the correct directory regardless of where they are run from. […]
  - Reduce the level of parallelism on armhf systems with a lot of cores to reduce swapping on highly parallel builds, additionally ensuring that the levels of parallelism are odd and even numbers on the first and second builds respectively. […]
The usual node maintenance was performed by Holger Levsen. […][…][…][…]
Contributing
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. Alternatively, you can get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Twitter: @ReproBuilds
- Mailing list: [email protected]
This month’s report was written by Arnout Engelen, Chris Lamb, Holger Levsen, Jelle van der Waa, Bernhard M. Wiedemann and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.