
Debian Bug report logs - #1010957
man-db: unreproducible index.db: contents depend on directory read order

Package: src:man-db; Maintainer for src:man-db is Colin Watson <cjwatson@debian.org>;

Reported by: Johannes Schauer Marin Rodrigues <josch@debian.org>

Date: Sat, 14 May 2022 06:51:01 UTC

Severity: normal

Tags: fixed-upstream

Found in version man-db/2.10.2-1

Fixed in version man-db/2.11.0-1

Done: Colin Watson <cjwatson@debian.org>

Bug is archived. No further changes may be made.



Report forwarded to debian-bugs-dist@lists.debian.org, josch@debian.org, reproducible-bugs@lists.alioth.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#1010957; Package src:man-db. (Sat, 14 May 2022 06:51:03 GMT) (full text, mbox, link).


Acknowledgement sent to Johannes Schauer Marin Rodrigues <josch@debian.org>:
New Bug report received and forwarded. Copy sent to josch@debian.org, reproducible-bugs@lists.alioth.debian.org, Colin Watson <cjwatson@debian.org>. (Sat, 14 May 2022 06:51:04 GMT) (full text, mbox, link).


Message #5 received at submit@bugs.debian.org (full text, mbox, reply):

From: Johannes Schauer Marin Rodrigues <josch@debian.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: man-db: unreproducible index.db: contents depend on directory read order
Date: Sat, 14 May 2022 08:46:48 +0200
[Message part 1 (text/plain, inline)]
Source: man-db
Version: 2.10.2-1
Severity: normal
User: reproducible-builds@lists.alioth.debian.org
Usertags: randomness
X-Debbugs-Cc: josch@debian.org, reproducible-bugs@lists.alioth.debian.org

Hi,

the contents of index.db are unreproducible across different
hosts/filesystems. With the same host/filesystem, it works fine:

    $ export SOURCE_DATE_EPOCH=1652473183
    $ mmdebstrap --variant=standard unstable out1.tar
    $ mmdebstrap --variant=standard unstable out2.tar
    $ cmp out1.tar out2.tar
    $ echo $?
    0

Now let's have mmdebstrap use a filesystem mounted with disorderfs as its
TMPDIR to simulate the problem:

    $ mkdir emptydir disorder
    $ sudo disorderfs --multi-user=yes --shuffle-dirents=yes --reverse-dirents=no emptydir disorder
    $ export TMPDIR=$(pwd)/disorder
    $ mmdebstrap --variant=standard unstable out1.tar
    $ mmdebstrap --variant=standard unstable out2.tar
    $ diffoscope out1.tar out2.tar | grep ├──
    ├── file list
    ├── ./var/cache/man/cs/index.db
    ├── ./var/cache/man/da/index.db
    ├── ./var/cache/man/de/index.db
    ├── ./var/cache/man/es/index.db
    ├── ./var/cache/man/fr/index.db
    ...

I attached the contents of /var/cache/man/index.db so that you can see
that it is indeed the order that differs between individual runs.

Thanks!

cheers, josch
[index1.db (application/x-gdbm, attachment)]
[index2.db (application/x-gdbm, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#1010957; Package src:man-db. (Sat, 14 May 2022 22:24:03 GMT) (full text, mbox, link).


Message #8 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Johannes Schauer Marin Rodrigues <josch@debian.org>
To: 1010957@bugs.debian.org
Subject: Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Sun, 15 May 2022 00:20:57 +0200
[Message part 1 (text/plain, inline)]
Control: tag -1 + patch

Hi,

Quoting Johannes Schauer Marin Rodrigues (2022-05-14 08:46:48)
> the contents of index.db are unreproducible across different
> hosts/filesystems. With the same host/filesystem, it works fine:
> 
>     $ export SOURCE_DATE_EPOCH=1652473183
>     $ mmdebstrap --variant=standard unstable out1.tar
>     $ mmdebstrap --variant=standard unstable out2.tar
>     $ cmp out1.tar out2.tar
>     $ echo $?
>     0
> 
> Now let's have mmdebstrap use a filesystem mounted with disorderfs as its
> TMPDIR to simulate the problem:
> 
>     $ mkdir emptydir disorder
>     $ sudo disorderfs --multi-user=yes --shuffle-dirents=yes --reverse-dirents=no emptydir disorder
>     $ export TMPDIR=$(pwd)/disorder
>     $ mmdebstrap --variant=standard unstable out1.tar
>     $ mmdebstrap --variant=standard unstable out2.tar
>     $ diffoscope out1.tar out2.tar | grep ├──
>     ├── file list
>     ├── ./var/cache/man/cs/index.db
>     ├── ./var/cache/man/da/index.db
>     ├── ./var/cache/man/de/index.db
>     ├── ./var/cache/man/es/index.db
>     ├── ./var/cache/man/fr/index.db
>     ...
> 
> I attached the contents of /var/cache/man/index.db so that you can see
> that it is indeed the order that differs between individual runs.

I now have a patch (attached).

Thanks!

cheers, josch
[reproducible (text/x-diff, attachment)]
[signature.asc (application/pgp-signature, inline)]

Added tag(s) patch. Request was from Johannes Schauer Marin Rodrigues <josch@debian.org> to 1010957-submit@bugs.debian.org. (Sat, 14 May 2022 22:24:03 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#1010957; Package src:man-db. (Wed, 18 May 2022 08:27:02 GMT) (full text, mbox, link).


Message #13 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Johannes Schauer Marin Rodrigues <josch@debian.org>
To: cjwatson@debian.org
Cc: 1010957@bugs.debian.org
Subject: Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Wed, 18 May 2022 10:25:42 +0200
[Message part 1 (text/plain, inline)]
Hi Colin,

Quoting Johannes Schauer Marin Rodrigues (2022-05-15 00:20:57)
> I now have a patch (attached).

do you have an approximate ETA for when you think you'll be able to upload a
version of man-db that fixes this issue? I'm trying to decide whether I should
make a new mmdebstrap release that works around this or wait for a man-db
upload that addresses this issue.

Thanks!

cheers, josch
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#1010957; Package src:man-db. (Wed, 18 May 2022 23:39:02 GMT) (full text, mbox, link).


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. (Wed, 18 May 2022 23:39:02 GMT) (full text, mbox, link).


Message #18 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Colin Watson <cjwatson@debian.org>
To: Johannes Schauer Marin Rodrigues <josch@debian.org>, 1010957@bugs.debian.org
Subject: Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Thu, 19 May 2022 00:36:37 +0100
On Sun, May 15, 2022 at 12:20:57AM +0200, Johannes Schauer Marin Rodrigues wrote:
> I now have a patch (attached).

Thanks for your patch!

I found a few problems with it.  Don't worry about sending a new patch
to address these; I have fixes in my local tree for most things I've
commented on here and am working on the rest.  I'm just letting you know
where things stand.

First, we should add the appropriate Gnulib modules for upstream
portability.  Then:

> @@ -356,8 +356,8 @@ static void add_dir_entries (MYDBM_FILE
>  	 * or . files (such as current, parent dir).
>  	 */
>  
> -	dir = opendir (infile);
> -	if (!dir) {
> +	n = scandir(infile, &namelist, NULL, alphasort);

IMO we might as well move the filtering currently being done in the
while loop below to a scandir filter function.

I would prefer not to use alphasort here, because it's locale-dependent
(not that it will matter very much in practice with the sorts of file
names that typically appear in manual page directories, but I can
imagine edge cases).  A variant that's deliberately locale-independent
is only a few lines of code.
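
As a rough illustration of the two suggestions above (a sketch only, not the
code that was eventually committed to man-db): the dot-file check from the old
readdir loop can become a scandir filter, and a strcmp-based comparator avoids
the strcoll call that makes alphasort locale-dependent. The names keep_entry,
compare_bytes and list_sorted are hypothetical and exist only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* scandir filter mirroring the old loop's check: skip names that start
 * with '.' and are shorter than three characters (".", "..", ".x"). */
static int keep_entry(const struct dirent *entry)
{
	const char *name = entry->d_name;

	return !(name[0] == '.' && strlen(name) < 3);
}

/* Locale-independent comparator: plain byte order via strcmp, whereas
 * alphasort compares with strcoll and so follows LC_COLLATE. */
static int compare_bytes(const struct dirent **a, const struct dirent **b)
{
	return strcmp((*a)->d_name, (*b)->d_name);
}

/* Hypothetical helper: print the filtered entries of `path` in byte
 * order and return how many there were, or -1 on error. */
static int list_sorted(const char *path)
{
	struct dirent **namelist;
	int n;

	assert(path != NULL);
	n = scandir(path, &namelist, keep_entry, compare_bytes);
	if (n < 0)
		return -1;
	for (int i = 0; i < n; i++) {
		puts(namelist[i]->d_name);
		free(namelist[i]);
	}
	free(namelist);
	return n;
}
```

With these callbacks the result no longer depends on either the directory
read order or the caller's locale settings.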

> @@ -367,13 +367,13 @@ static void add_dir_entries (MYDBM_FILE
>  
>          /* strlen(newdir->d_name) could be replaced by newdir->d_reclen */
>  
> -	while ((newdir = readdir (dir)) != NULL) {
> -		if (*newdir->d_name == '.' &&
> -		    strlen (newdir->d_name) < (size_t) 3)
> +	while (n--) {

I guess this might have been borrowed from the example in the scandir(3)
manual page, because it goes in reverse order; I don't think there's a
good reason to do that.

> +	free(namelist);

This leaks memory.  We need to free all the elements of this list as
well.
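
For reference, a minimal sketch of what freeing the list correctly looks like,
combined with a forward rather than reverse walk. It keeps the alphasort call
from the submitted patch for brevity; free_namelist and print_entries are
hypothetical names used only here, not man-db functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* scandir allocates every entry separately and then the array that
 * holds them, so both levels have to be released. */
static void free_namelist(struct dirent **namelist, int n)
{
	for (int i = 0; i < n; i++)
		free(namelist[i]);
	free(namelist);
}

/* Hypothetical helper: print the entries of `path` in ascending sorted
 * order and return their count, or -1 if scandir fails. */
static int print_entries(const char *path)
{
	struct dirent **namelist;
	int n;

	assert(path != NULL);
	n = scandir(path, &namelist, NULL, alphasort);
	if (n < 0)
		return -1;
	/* Walk forwards; `while (n--)` would visit the list in reverse. */
	for (int i = 0; i < n; i++)
		puts(namelist[i]->d_name);
	free_namelist(namelist, n);
	return n;
}
```

Because no filter is passed, the count includes "." and "..".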

>  	order_files (infile, &names);

This means that the conversion to scandir in this function is in
practice going to be ineffective, because we immediately turn around and
re-sort the list by the physical locations of the first blocks of the
corresponding files.  Won't this have just as much of an effect on
reproducibility in principle, even if it doesn't happen to affect
mmdebstrap in your tests, presumably due to something like disk order
typically being similar between runs if your disk isn't too full?

I'm experimenting with simply removing the order_files call here, on the
basis that other performance improvements to mandb(8) have made it less
critical.  It does seem to slow things down slightly even on an SSD,
though that may be measurement error; I still need to compare timings on
a rotational drive (but I will).

More interestingly, it changes accessdb(8) output in ways that aren't
just obvious consequences of sorting (multi keys change due to that, but
as far as I can tell that's fine).  So far I've spotted the following
issues:

 * A number of new entries are introduced for some reason (and the
   question is why they were missing beforehand).

 * The targets of WHATIS_MAN references change (admittedly in cases
   where there are multiple possibilities, but we should be picking a
   deterministic one rather than just the first).

 * Symlinks flip between ULT_MAN and SO_MAN depending on whether the
   symlink happens to sort before its target.

 * The database's idea of whether pages require processing via tbl(1)
   (and probably other preprocessors) changes in some cases.  (Fixed in
   https://gitlab.com/cjwatson/man-db/-/commit/1873051fdb.)
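
To illustrate the general idea behind the second point (picking a winner that
does not depend on scan order), here is a generic sketch. It is not man-db's
actual selection logic; pick_stable and the example page names are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: among several equally valid candidates, return
 * the one that sorts first in byte order instead of the one that was
 * encountered first.  Returns NULL when there are no candidates. */
static const char *pick_stable(const char *const *candidates, size_t count)
{
	const char *best = NULL;

	assert(candidates != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		if (best == NULL || strcmp(candidates[i], best) < 0)
			best = candidates[i];
	return best;
}
```

Any total order over the candidates would do; what matters is that the same
set always yields the same choice, whatever order the directory scan produced.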

I think these are mostly existing bugs, but I intend to fix them first
before attempting to land your patch, because this is all delicate
enough that I really want to be sure of exactly what's changing.  (Since
these are all order-dependent bugs, it may even be that fixing all these
will make parts of your patch unnecessary; but we'll see.)

I will keep working on this, and expect to be able to get a new release
out in a week or two.

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]



Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#1010957; Package src:man-db. (Thu, 19 May 2022 02:06:02 GMT) (full text, mbox, link).


Message #21 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Johannes Schauer Marin Rodrigues <josch@debian.org>
To: 1010957@bugs.debian.org, Colin Watson <cjwatson@debian.org>
Subject: Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Thu, 19 May 2022 04:03:33 +0200
[Message part 1 (text/plain, inline)]
Hi Colin,

Quoting Colin Watson (2022-05-19 01:36:37)
> On Sun, May 15, 2022 at 12:20:57AM +0200, Johannes Schauer Marin Rodrigues wrote:
> > I now have a patch (attached).
> 
> Thanks for your patch!
> 
> I found a few problems with it.  Don't worry about sending a new patch
> to address these; I have fixes in my local tree for most things I've
> commented on here and am working on the rest.  I'm just letting you know
> where things stand.

thank you!! :)

> First, we should add the appropriate Gnulib modules for upstream portability.
> Then:
> 
> > @@ -356,8 +356,8 @@ static void add_dir_entries (MYDBM_FILE
> >        * or . files (such as current, parent dir).
> >        */
> >  
> > -     dir = opendir (infile);
> > -     if (!dir) {
> > +     n = scandir(infile, &namelist, NULL, alphasort);
> 
> IMO we might as well move the filtering currently being done in the
> while loop below to a scandir filter function.
> 
> I would prefer not to use alphasort here, because it's locale-dependent
> (not that it will matter very much in practice with the sorts of file
> names that typically appear in manual page directories, but I can
> imagine edge cases).  A variant that's deliberately locale-independent
> is only a few lines of code.

That makes sense. Thank you.

> > @@ -367,13 +367,13 @@ static void add_dir_entries (MYDBM_FILE
> >  
> >          /* strlen(newdir->d_name) could be replaced by newdir->d_reclen */
> >  
> > -     while ((newdir = readdir (dir)) != NULL) {
> > -             if (*newdir->d_name == '.' &&
> > -                 strlen (newdir->d_name) < (size_t) 3)
> > +     while (n--) {
> 
> I guess this might have been borrowed from the example in the scandir(3)
> manual page, because it goes in reverse order; I don't think there's a
> good reason to do that.

In fact, my patch is borrowed from existing patches that the reproducible
builds team submitted to packages and...

> > +     free(namelist);
> 
> This leaks memory.  We need to free all the elements of this list as
> well.

...all these patches are missing a free() of the individual list members, so
they all leak memory. Thanks a lot for spotting this -- I guess now I can
prepare a few more patches for other packages. This would not've happened if I
indeed had read the scandir(3) manual page more carefully, which does free()
correctly. XD

> >       order_files (infile, &names);
> 
> This means that the conversion to scandir in this function is in
> practice going to be ineffective, because we immediately turn around and
> re-sort the list by the physical locations of the first blocks of the
> corresponding files.  Won't this have just as much of an effect on
> reproducibility in principle, even if it doesn't happen to affect
> mmdebstrap in your tests, presumably due to something like disk order
> typically being similar between runs if your disk isn't too full?

This is interesting. I've tested my patch by executing man-db on a filesystem
mounted with disorderfs --shuffle-dirents=yes and observed that using scandir()
was somehow necessary in both locations.

> I'm experimenting with simply removing the order_files call here, on the
> basis that other performance improvements to mandb(8) have made it less
> critical.  It does seem to slow things down slightly even on an SSD, though
> that may be measurement error; I still need to compare timings on a
> rotational drive (but I will).

Thanks!

> More interestingly, it changes accessdb(8) output in ways that aren't just
> obvious consequences of sorting (multi keys change due to that, but as far as
> I can tell that's fine).  So far I've spotted the following issues:
> 
>  * A number of new entries are introduced for some reason (and the
>    question is why they were missing beforehand).
> 
>  * The targets of WHATIS_MAN references change (admittedly in cases
>    where there are multiple possibilities, but we should be picking a
>    deterministic one rather than just the first).
> 
>  * Symlinks flip between ULT_MAN and SO_MAN depending on whether the
>    symlink happens to sort before its target.
> 
>  * The database's idea of whether pages require processing via tbl(1)
>    (and probably other preprocessors) changes in some cases.  (Fixed in
>    https://gitlab.com/cjwatson/man-db/-/commit/1873051fdb.)
> 
> I think these are mostly existing bugs, but I intend to fix them first
> before attempting to land your patch, because this is all delicate
> enough that I really want to be sure of exactly what's changing.  (Since
> these are all order-dependent bugs, it may even be that fixing all these
> will make parts of your patch unnecessary; but we'll see.)
> 
> I will keep working on this, and expect to be able to get a new release out
> in a week or two.

Thanks a lot for working on this! Once you think you are done, feel free to
ping me in case you'd like me to run my test suite with man-db including all
your changes, making sure that they have the desired effect in the environment
I'm running it in. Currently, my workaround is to create a chroot in a TMPDIR
mounted as tmpfs. It seems that the tmpfs directory entries are somehow
returned deterministically even across different tmpfs mounts.

Thanks!

cheers, josch
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#1010957; Package src:man-db. (Thu, 22 Sep 2022 15:51:02 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Colin Watson <cjwatson@debian.org>. (Thu, 22 Sep 2022 15:51:02 GMT) (full text, mbox, link).


Message #26 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Holger Levsen <holger@layer-acht.org>
To: 1010957@bugs.debian.org, josch@debian.org
Subject: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Thu, 22 Sep 2022 15:48:30 +0000
[Message part 1 (text/plain, inline)]
hi!

Colin, what's the status of this bug? You said you were working on improving
josch's patch in May 2022...?! :)

Also, the bug is currently tagged 'patch', I guess it's appropriate to remove
that tag?

josch: btw you said you submitted other patches missing freeing of memory,
have you updated those other patches?


-- 
cheers,
	Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

We live in a world where teenagers get more and more desperate trying to
convince adults to behave like grown ups.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#1010957; Package src:man-db. (Thu, 22 Sep 2022 19:57:02 GMT) (full text, mbox, link).


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. (Thu, 22 Sep 2022 19:57:03 GMT) (full text, mbox, link).


Message #31 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Colin Watson <cjwatson@debian.org>
To: Holger Levsen <holger@layer-acht.org>, 1010957@bugs.debian.org
Cc: josch@debian.org
Subject: Re: Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Thu, 22 Sep 2022 20:53:07 +0100
Control: tag -1 - patch

On Thu, Sep 22, 2022 at 03:48:30PM +0000, Holger Levsen wrote:
> Colin, what's the status of this bug? You said you were working on improving
> josch's patch in May 2022...?! :)

Yeah, this has taken me a bit longer than expected, but I have in fact
been making some progress.  josch's patch has been very useful in that
it provides an easy way to see differences between unsorted and sorted
traversal, and I've taken my goal as being to drive those differences to
zero.  The only bit I've committed so far has been:

  https://gitlab.com/cjwatson/man-db/-/commit/bb0f7086ba4ce4503761737bf612088c03b6c495

I also have a few hundred lines of somewhat untidy patch that I'll
commit in a few pieces as soon as I'm certain of it; this is all
essentially about stabilizing the decisions about which database entries
win compared to which other entries, so that the end result doesn't
change depending on the scan order.  With that, I'm down to on the order
of 150 lines of diff of accessdb output against the result of josch's
patch, and I think there are only about one or two problems left.

A lot of the remaining difficulties are due to somewhat impenetrable old
code which appeared to be trying to micro-optimize memory usage in a way
that I don't think makes sense nowadays, so I may take a bit of a
digression into reorganizing some of this.

I'll update this bug as I make further progress.

> Also, the bug is currently tagged 'patch', I guess it's appropriate to remove
> that tag?

Done.

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]



Removed tag(s) patch. Request was from Colin Watson <cjwatson@debian.org> to 1010957-submit@bugs.debian.org. (Thu, 22 Sep 2022 19:57:03 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#1010957; Package src:man-db. (Thu, 22 Sep 2022 20:03:02 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Colin Watson <cjwatson@debian.org>. (Thu, 22 Sep 2022 20:03:02 GMT) (full text, mbox, link).


Message #38 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Holger Levsen <holger@layer-acht.org>
To: Colin Watson <cjwatson@debian.org>
Cc: 1010957@bugs.debian.org, josch@debian.org
Subject: Re: Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Thu, 22 Sep 2022 20:01:45 +0000
[Message part 1 (text/plain, inline)]
Hi Colin,

On Thu, Sep 22, 2022 at 08:53:07PM +0100, Colin Watson wrote:
> Yeah, this has taken me a bit longer than expected, but I have in fact
> been making some progress.  josch's patch has been very useful in that
> it provides an easy way to see differences between unsorted and sorted
> traversal, and I've taken my goal as being to drive those differences to
> zero.  The only bit I've committed so far has been:
> 
>   https://gitlab.com/cjwatson/man-db/-/commit/bb0f7086ba4ce4503761737bf612088c03b6c495

cool, thanks for the update and all your man-db work!

> I'll update this bug as I make further progress.

great, thanks again! 


-- 
cheers,
	Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Imagine god created trillions of galaxies but freaks out because some dude
kisses another.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#1010957; Package src:man-db. (Sun, 25 Sep 2022 22:21:05 GMT) (full text, mbox, link).


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. (Sun, 25 Sep 2022 22:21:05 GMT) (full text, mbox, link).


Message #43 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Colin Watson <cjwatson@debian.org>
To: Holger Levsen <holger@layer-acht.org>, 1010957@bugs.debian.org
Cc: josch@debian.org
Subject: Re: Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Sun, 25 Sep 2022 23:18:19 +0100
This weekend's work has been:

  https://gitlab.com/cjwatson/man-db/-/compare/bb0f7086ba...5d2863d0a0

A lot of this was code rearrangement that I needed to do before I could
make progress on the real issues, but if you look at the NEWS.md diff
you'll see a number of changes that relate to this bug.  With all of
that, there are 33 lines of diff of accessdb output remaining on my
system against the result of josch's patch, which come down to two
issues:

 * unstable choice of whatis target for pages with many entries in NAME,
   some but not all of which are represented as symlinks in the
   filesystem to a file name that is not itself in NAME (there are some
   examples of this in libbsd-dev and libmd-dev)
 * some difficulty deciding exactly what to do with cross-section links
   in some cases (inetd.conf(5) → inetd(8))

I'll need a bit more concentrated hacking time here, but I'll continue
to work on these; this has been a great opportunity to clean up some
truly unpleasant bits of code.  Once I have the accessdb diff down to
zero, we'll see whether there's any further instability in the on-disk
GDBM representation, and also whether there are any other issues that
don't show up in the set of pages I have installed.

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]



Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#1010957; Package src:man-db. (Mon, 26 Sep 2022 13:09:02 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Colin Watson <cjwatson@debian.org>. (Mon, 26 Sep 2022 13:09:02 GMT) (full text, mbox, link).


Message #48 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Holger Levsen <holger@layer-acht.org>
To: Colin Watson <cjwatson@debian.org>
Cc: 1010957@bugs.debian.org, josch@debian.org
Subject: Re: Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Mon, 26 Sep 2022 13:07:08 +0000
[Message part 1 (text/plain, inline)]
Hi Colin,

On Sun, Sep 25, 2022 at 11:18:19PM +0100, Colin Watson wrote:
> This weekend's work has been:
>   https://gitlab.com/cjwatson/man-db/-/compare/bb0f7086ba...5d2863d0a0

wow, impressive!

(and thank you for taking care of man-db for so many years now! :)

[...]
> I'll need a bit more concentrated hacking time here, but I'll continue
> to work on these; this has been a great opportunity to clean up some
> truly unpleasant bits of code.  Once I have the accessdb diff down to
> zero, we'll see whether there's any further instability in the on-disk
> GDBM representation, and also whether there are any other issues that
> don't show up in the set of pages I have installed.

sounds great! also thank you for keeping us updated here, i'm looking
forward to hearing more good news eventually! :)


-- 
cheers,
	Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

I'm looking forward to Corona being a beer again and Donald a duck.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#1010957; Package src:man-db. (Sun, 02 Oct 2022 15:03:04 GMT) (full text, mbox, link).


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. (Sun, 02 Oct 2022 15:03:04 GMT) (full text, mbox, link).


Message #53 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Colin Watson <cjwatson@debian.org>
To: Holger Levsen <holger@layer-acht.org>, 1010957@bugs.debian.org
Cc: josch@debian.org
Subject: Re: Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Sun, 2 Oct 2022 16:00:58 +0100
Control: tag -1 fixed-upstream

Success!

  https://gitlab.com/cjwatson/man-db/-/compare/5d2863d0a0...866c3571d3

As well as more localized testing, I built a .deb with this and used
josch's instructions from the start of this bug to build mmdebstrap
tarballs via disorderfs, using
"--hook-dir=/usr/share/mmdebstrap/hooks/file-mirror-automount
--include=./man-db_2.10.3~20221002-1_amd64.deb" to inject the new .deb.
The two resulting tarballs had somewhat differing file lists (timestamps
etc.), but all the actual files in the tarballs were bitwise-identical.

Feel free to do any other testing you think might be useful.  There's a
bootstrapped source tarball attached as an artifact to the
"build-distcheck" CI job in GitLab that you can easily use to build a
snapshot .deb if you need one.

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]



Added tag(s) fixed-upstream. Request was from Colin Watson <cjwatson@debian.org> to 1010957-submit@bugs.debian.org. (Sun, 02 Oct 2022 15:03:04 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#1010957; Package src:man-db. (Sun, 02 Oct 2022 15:54:02 GMT) (full text, mbox, link).


Message #58 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Johannes Schauer Marin Rodrigues <josch@debian.org>
To: 1010957@bugs.debian.org, Colin Watson <cjwatson@debian.org>, Holger Levsen <holger@layer-acht.org>
Subject: Re: Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Sun, 02 Oct 2022 17:50:07 +0200
[Message part 1 (text/plain, inline)]
Quoting Colin Watson (2022-10-02 17:00:58)
> Success!
> 
>   https://gitlab.com/cjwatson/man-db/-/compare/5d2863d0a0...866c3571d3

Thank you!! :D

> 
> As well as more localized testing, I built a .deb with this and used
> josch's instructions from the start of this bug to build mmdebstrap
> tarballs via disorderfs, using
> "--hook-dir=/usr/share/mmdebstrap/hooks/file-mirror-automount
> --include=./man-db_2.10.3~20221002-1_amd64.deb" to inject the new .deb.
> The two resulting tarballs had somewhat differing file lists (timestamps
> etc.), but all the actual files in the tarballs were bitwise-identical.

Did you maybe forget the "export SOURCE_DATE_EPOCH=XXX" step? Just replace XXX
with the output of `date +%s` but make sure that both mmdebstrap invocations
see the same value for SOURCE_DATE_EPOCH and then there should be zero
differences and a "cmp" should be sufficient to make sure that it works.

Thanks!

cheers, josch
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#1010957; Package src:man-db. (Sun, 02 Oct 2022 16:57:02 GMT) (full text, mbox, link).


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. (Sun, 02 Oct 2022 16:57:03 GMT) (full text, mbox, link).


Message #63 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Colin Watson <cjwatson@debian.org>
To: Johannes Schauer Marin Rodrigues <josch@debian.org>
Cc: 1010957@bugs.debian.org, Holger Levsen <holger@layer-acht.org>
Subject: Re: Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Sun, 2 Oct 2022 17:56:19 +0100
On Sun, Oct 02, 2022 at 05:50:07PM +0200, Johannes Schauer Marin Rodrigues wrote:
> Quoting Colin Watson (2022-10-02 17:00:58)
> > As well as more localized testing, I built a .deb with this and used
> > josch's instructions from the start of this bug to build mmdebstrap
> > tarballs via disorderfs, using
> > "--hook-dir=/usr/share/mmdebstrap/hooks/file-mirror-automount
> > --include=./man-db_2.10.3~20221002-1_amd64.deb" to inject the new .deb.
> > The two resulting tarballs had somewhat differing file lists (timestamps
> > etc.), but all the actual files in the tarballs were bitwise-identical.
> 
> Did you maybe forget the "export SOURCE_DATE_EPOCH=XXX" step? Just replace XXX
> with the output of `date +%s` but make sure that both mmdebstrap invocations
> see the same value for SOURCE_DATE_EPOCH and then there should be zero
> differences and a "cmp" should be sufficient to make sure that it works.

I thought I'd set SOURCE_DATE_EPOCH, but I'd failed to pass it through
sudo.  After fixing that, I indeed get cmp-identical tarballs.

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]
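
The message above does not say how the variable was passed through sudo. As a hedged illustration of the failure mode only, the sketch below uses `env -i` as a stand-in for a wrapper that resets the environment (sudo does this by default under the env_reset policy). It shows that an exported variable is lost unless it is handed over explicitly. The comments at the end list common ways to do this with sudo; whether they are permitted depends on the local sudoers policy.

```shell
#!/bin/sh
set -eu

export SOURCE_DATE_EPOCH=1652473183

# env -i starts the child with an empty environment, much as an
# environment-resetting wrapper would: the exported value is gone.
lost=$(env -i /bin/sh -c 'echo "${SOURCE_DATE_EPOCH:-unset}"')

# Passing the variable explicitly on the command line keeps it.
kept=$(env -i SOURCE_DATE_EPOCH="$SOURCE_DATE_EPOCH" \
    /bin/sh -c 'echo "${SOURCE_DATE_EPOCH:-unset}"')

echo "reset environment: $lost"
echo "explicitly passed: $kept"

# With sudo itself, two common forms (assumptions, not taken from the
# bug log; both are subject to sudoers policy) are:
#   sudo --preserve-env=SOURCE_DATE_EPOCH mmdebstrap ...
#   sudo env SOURCE_DATE_EPOCH="$SOURCE_DATE_EPOCH" mmdebstrap ...
```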



Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#1010957; Package src:man-db. (Mon, 03 Oct 2022 15:30:02 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Colin Watson <cjwatson@debian.org>. (Mon, 03 Oct 2022 15:30:03 GMT) (full text, mbox, link).


Message #68 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Holger Levsen <holger@layer-acht.org>
To: Colin Watson <cjwatson@debian.org>
Cc: 1010957@bugs.debian.org, josch@debian.org
Subject: Re: Bug#1010957: status update? Re: Bug#1010957: man-db: unreproducible index.db: contents depend on directory read order
Date: Mon, 3 Oct 2022 15:27:25 +0000
[Message part 1 (text/plain, inline)]
On Sun, Oct 02, 2022 at 04:00:58PM +0100, Colin Watson wrote:
> Control: tag -1 fixed-upstream
> Success!
>   https://gitlab.com/cjwatson/man-db/-/compare/5d2863d0a0...866c3571d3

awesome!

On Sun, Oct 02, 2022 at 05:56:19PM +0100, Colin Watson wrote:
> I thought I'd set SOURCE_DATE_EPOCH, but I'd failed to pass it through
> sudo.  After fixing that, I indeed get cmp-identical tarballs.

very nice! much cheers!


-- 
cheers,
	Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Plastic bottles: made to last forever, designed to throw away.
[signature.asc (application/pgp-signature, inline)]

Reply sent to Colin Watson <cjwatson@debian.org>:
You have taken responsibility. (Sat, 15 Oct 2022 15:30:06 GMT) (full text, mbox, link).


Notification sent to Johannes Schauer Marin Rodrigues <josch@debian.org>:
Bug acknowledged by developer. (Sat, 15 Oct 2022 15:30:07 GMT) (full text, mbox, link).


Message #73 received at 1010957-close@bugs.debian.org (full text, mbox, reply):

From: Debian FTP Masters <ftpmaster@ftp-master.debian.org>
To: 1010957-close@bugs.debian.org
Subject: Bug#1010957: fixed in man-db 2.11.0-1
Date: Sat, 15 Oct 2022 15:28:30 +0000
Source: man-db
Source-Version: 2.11.0-1
Done: Colin Watson <cjwatson@debian.org>

We believe that the bug you reported is fixed in the latest version of
man-db, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 1010957@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Colin Watson <cjwatson@debian.org> (supplier of updated man-db package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 1.8
Date: Sat, 15 Oct 2022 15:48:48 +0100
Source: man-db
Architecture: source
Version: 2.11.0-1
Distribution: unstable
Urgency: medium
Maintainer: Colin Watson <cjwatson@debian.org>
Changed-By: Colin Watson <cjwatson@debian.org>
Closes: 1010957 1012078
Changes:
 man-db (2.11.0-1) unstable; urgency=medium
 .
   * New upstream release:
     - Allow the reproduction of bitwise-identical databases regardless of
       scan order (closes: #1010957).
     - Run preprocessors in the correct order (closes: #1012078).
Checksums-Sha1:
 d9094bc254ee84e260e781f4d2bc484af1d51940 2418 man-db_2.11.0-1.dsc
 66656a467f33aedbe639ccbb7c2048f2892c15a9 1923260 man-db_2.11.0.orig.tar.xz
 09aeeec2b13bc5f12850da9ff4624057fc6abf81 833 man-db_2.11.0.orig.tar.xz.asc
 8b5b339370c3ec0b34a181083b20f39653cc06ee 73400 man-db_2.11.0-1.debian.tar.xz
Checksums-Sha256:
 5bd20792d86773cece2ad6d03390f6bb6cc23ec8d2444b2624ce428a4ba25850 2418 man-db_2.11.0-1.dsc
 4130e1a6241280359ef5e25daec685533c0a1930674916202ab0579e5a232c51 1923260 man-db_2.11.0.orig.tar.xz
 c190edcbbc7b16d192d0babcaa2562cfb21616caec7bba1de16c172daf16ecf6 833 man-db_2.11.0.orig.tar.xz.asc
 87b3fe615143b7c8f706ccb79114eefa4f389744f90caf865238b596fc097ec4 73400 man-db_2.11.0-1.debian.tar.xz
Files:
 d02a481d186dcd259a24b8e07173dbf3 2418 doc important man-db_2.11.0-1.dsc
 ad12e19d4f86d866a3858decf6989746 1923260 doc important man-db_2.11.0.orig.tar.xz
 1dbea9a762e6d1a382f47f56cc772451 833 doc important man-db_2.11.0.orig.tar.xz.asc
 1bd1c261051de2eef406e98b7eebcfd0 73400 doc important man-db_2.11.0-1.debian.tar.xz

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEErApP8SYRtvzPAcEROTWH2X2GUAsFAmNKyKcACgkQOTWH2X2G
UAvolw/+OTIaXnnMUkbdEMVeZ/Em0jrHTM1rLad4kZsEqQNbSg1xqI8l97Q8kDI0
2aE6PdYcBF44qqgY+AjPLqRNmgBErvyagzpU9VZFKClAzHv8B+SuES/CN3rE8zsm
3EY7ir3uwQi93NIeSXXrMdh9Bfh0ktMoo0uOnlKMb8L+dNrVSkdDOXx6dH338dZ8
DkfzSKEGXpTznPlKOYRNnckVC80aInXs/Yjr/LsZZXJK+qPBN0Llha9IRPscrDoq
Fg/OnoX+UJPry0+hFMyYrwINstjCzyXRw3QbTWMPTzW9/4oNa8n37xeqSzfBAR+c
tg2RSjZitspka8P/1356M/EJ85CCR5dEb+URDlYRIRb7MtCO1OVzeYGvQyZJBOGD
SgQvBtkn4E9/iL7Cdnu3Y4aTaU3hi6sOps2pdCs/dHEOiVaf9hvK7silvHF6FQB8
7uUIGhgwucrILuivgpQ1XyQzSTb4va9iOvIwo/T9ucu20XJcI7VsjaeaA04mJJM0
7Klc8uHKgDes6zHYExL4G9MF6nu2lBflxMyMaB0KAxizyj46fZHR0RC3vLQ8MaxB
RyKaz0du4TpsxlpqACG3c0i80PGIFwuRyYRp0+kk/YEMk1JjIK1UPt2vbBl6ng9f
ID3pjhH49cUxFK/9B6gNPCF5d3bALoEA5/wSHmQjZgkXrUeA+yc=
=gelV
-----END PGP SIGNATURE-----




Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#1010957; Package src:man-db. (Sun, 16 Oct 2022 16:12:02 GMT) (full text, mbox, link).


Message #76 received at 1010957@bugs.debian.org (full text, mbox, reply):

From: Johannes Schauer Marin Rodrigues <josch@debian.org>
To: cjwatson@debian.org
Cc: 1010957@bugs.debian.org, holger@layer-acht.org
Subject: Re: Bug#1010957 closed by Debian FTP Masters <ftpmaster@ftp-master.debian.org> (reply to Colin Watson <cjwatson@debian.org>) (Bug#1010957: fixed in man-db 2.11.0-1)
Date: Sun, 16 Oct 2022 18:09:32 +0200
[Message part 1 (text/plain, inline)]
Hi Colin,

Quoting Debian Bug Tracking System (2022-10-15 17:30:07)
> This is an automatic notification regarding your Bug report
> which was filed against the src:man-db package:
> 
> #1010957: man-db: unreproducible index.db: contents depend on directory read order
> 
> It has been closed by Debian FTP Masters <ftpmaster@ftp-master.debian.org> (reply to Colin Watson <cjwatson@debian.org>).

thank you! I just confirmed that man-db 2.11.0-1 indeed fixes this and removed
my workaround from mmdebstrap:

https://gitlab.mister-muffin.de/josch/mmdebstrap/commit/aac7157820c6e278e140e132d25bdcce979fd4bc

cheers, josch
[signature.asc (application/pgp-signature, inline)]

Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Fri, 18 Nov 2022 07:27:52 GMT) (full text, mbox, link).



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Fri Jan 31 00:19:59 2025; Machine Name: bembo
