• Bug#1068483: dpkg-genbuildinfo: Should buildinfo files copy the hash of

    From Adrian Bunk@21:1/5 to All on Sat Apr 6 09:51:29 2024
    XPost: linux.debian.bugs.dist

    Package: dpkg-dev
    Version: 1.22.6
    Severity: normal
    X-Debbugs-Cc: reproducible-builds@lists.alioth.debian.org

    A thought I already wrote in a recent debian-devel discussion:

    In theory source package filenames should be eternally and globally
    unique, but in practice there are corner cases where this assumption
    might break, for example:
    - *stable-security does not currently have a copy of the sources
      in the main archive, one always has to upload the source archive
      there and this might accidentally be a different orig.tar
    - dak does not keep an eternal history of everything it ever knew,
      e.g. RM and later re-NEW of a source version might have a different
      source .orig.tar or even different sources for a Debian revision
    - Debian and Ubuntu might have different orig.tar for the same version,
      if Ubuntu updated a package before Debian did, or with packages
      where development is completely independent in Debian and Ubuntu
      (e.g. OpenStack, KDE)

    The reason for different files might be as trivial as "git archive"
    not always producing the same output when running in different
    environments, e.g. the autogenerated tarball for a git tag on GitHub
    might have different checksums depending on whether it is downloaded
    today or next year despite identical contents due to slightly
    different gzip compression.
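
    As an illustration of the point above, the following Python sketch
    compresses the same bytes twice and changes only the timestamp stored
    in the gzip header. The header timestamp is just one assumed source of
    variation, chosen because it is easy to reproduce; it is not
    necessarily what differs between two downloads of a GitHub tarball.
    The payload is a made-up stand-in for a real orig.tar.

```python
import gzip
import hashlib

# Stand-in for the contents of an orig.tar (hypothetical data).
payload = b"hello source tree\n" * 1000

# Same payload and compression level; only the timestamp written into
# the gzip header differs between the two streams.
first = gzip.compress(payload, compresslevel=9, mtime=0)
second = gzip.compress(payload, compresslevel=9, mtime=1)

digest_first = hashlib.sha256(first).hexdigest()
digest_second = hashlib.sha256(second).hexdigest()

# The compressed files have different checksums ...
files_differ = digest_first != digest_second
# ... although both decompress to byte-identical contents.
contents_match = gzip.decompress(first) == gzip.decompress(second) == payload

print(files_differ, contents_match)
```

    A filename alone therefore cannot tell two such tarballs apart, while
    a recorded hash can.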

    Should buildinfo files contain the hashes of the source package,
    to clearly define what sources have been used?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Guillem Jover@1:229/2 to Adrian Bunk on Sat Apr 6 09:52:39 2024
    XPost: linux.debian.bugs.dist
    From: guillem@debian.org

    Hi!

    On Sat, 2024-04-06 at 02:56:02 +0300, Adrian Bunk wrote:
    > Package: dpkg-dev
    > Version: 1.22.6
    > Severity: normal
    > X-Debbugs-Cc: reproducible-builds@lists.alioth.debian.org
    >
    > A thought I already wrote in a recent debian-devel discussion:
    >
    > In theory source package filenames should be eternally and globally
    > unique, but in practice there are corner cases where this assumption
    > might break, for example:
    > - *stable-security does not currently have a copy of the sources
    >   in the main archive, one always has to upload the source archive
    >   there and this might accidentally be a different orig.tar
    > - dak does not keep an eternal history of everything it ever knew,
    >   e.g. RM and later re-NEW of a source version might have a different
    >   source .orig.tar or even different sources for a Debian revision
    > - Debian and Ubuntu might have different orig.tar for the same version,
    >   if Ubuntu updated a package before Debian did, or with packages
    >   where development is completely independent in Debian and Ubuntu
    >   (e.g. OpenStack, KDE)
    >
    > The reason for different files might be as trivial as "git archive"
    > not always producing the same output when running in different
    > environments, e.g. the autogenerated tarball for a git tag on GitHub
    > might have different checksums depending on whether it is downloaded
    > today or next year despite identical contents due to slightly
    > different gzip compression.
    >
    > Should buildinfo files contain the hashes of the source package,
    > to clearly define what sources have been used?

    Ideally? Yes, and I think we considered that at the time when we
    introduced the .buildinfo files. Although a ref to the .dsc does get
    included if the build is also creating the source package.

    The problem is that when dpkg-buildpackage is not building the source
    package, there is no guarantee the source package is going to be
    present, or that if it is present it matches what is currently being
    built from the working directory.

    Thanks,
    Guillem

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From Guillem Jover@1:229/2 to Ximin Luo on Tue Apr 9 03:40:01 2024
    XPost: linux.debian.bugs.dist
    From: guillem@debian.org

    Control: forcemerge 882511 1068483

    Hi!

    After replying to Adrian's report, I recalled there being a previous one
    that was similar, and then recalled that I had an even older branch that
    implemented a potential solution for this. See below.

    On Thu, 2017-11-23 at 16:23:29 +0100, Ximin Luo wrote:
    > Package: dpkg-dev
    > Version: 1.19.0.4
    > Severity: wishlist
    > Tags: patch
    >
    > dpkg-buildpackage currently does not automatically list the source .dsc
    > nor its hash in the call to dpkg-genbuildinfo when doing a binary-only
    > build. This is understandable because in a binary-only build,
    > dpkg-buildpackage does not have any concept of a source package and
    > therefore does not know (and cannot verify) if the working tree was
    > actually generated from any .dsc or not.
    >
    > However, the caller knows this information, and it is useful for
    > reproducible builds to track exactly which (i.e. hash-wise) source code
    > generates which binary packages. So it should be possible for the caller
    > to tell dpkg-buildpackage, "yes please do include the .dsc hash in the
    > buildinfo, I am telling you it is correct, you can assume this safely".
    >
    > Tools like sbuild/pbuilder could then do this, as well as users or
    > rebuilders.
    >
    > The attached patch implements this in the simplest way possible. It
    > allows the caller to run something like:
    >
    > $ dpkg-buildpackage --no-sign -b --buildinfo-option=--build=full
    >
    > The resulting $pkg_$ver_$arch.buildinfo then contains the .dsc and its
    > hash.
    >
    > However this requires the caller to know which option to pass, which
    > would either be
    >
    > --buildinfo-option=--build=full
    > --buildinfo-option=--build=any,source
    > --buildinfo-option=--build=all,source
    >
    > depending on whether the original build request (to dpkg-buildpackage)
    > was a -b, -B, or -A.
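
    The mapping described in the report can be written down as a small
    lookup. This is only a sketch of what a calling tool would need to
    know; the function name is hypothetical and nothing like it ships
    with dpkg. The pairing of -b, -B and -A with the three option strings
    follows the order given in the report.

```python
# Hypothetical helper: which --buildinfo-option a caller would add,
# given the binary-only build flag it passes to dpkg-buildpackage.
BUILDINFO_BUILD_TYPES = {
    "-b": "full",        # any + all, plus source
    "-B": "any,source",  # arch-dependent only, plus source
    "-A": "all,source",  # arch-independent only, plus source
}


def buildinfo_option_for(build_flag: str) -> str:
    """Return the extra option for a binary-only build flag."""
    try:
        build_types = BUILDINFO_BUILD_TYPES[build_flag]
    except KeyError:
        raise ValueError(f"not a binary-only build flag: {build_flag!r}")
    return f"--buildinfo-option=--build={build_types}"


print(buildinfo_option_for("-B"))
```

    Having to carry such a table in every caller is the usability problem
    the report goes on to describe.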

    > For this reason, it may be better (more usable) to add a
    > --force-source-in-buildinfo flag (or similar name) and when this is
    > switched on, do this instead:
    >
    > -push @buildinfo_opts, "--build=$build_types" if build_has_none(BUILD_DEFAULT);
    > +push @buildinfo_opts, "--build=$build_types,source" if build_has_none(BUILD_DEFAULT);
    >
    > Let me know if you like this idea and I'll be happy to implement that
    > instead of the attached patch.

    The problem with this solution is that it is prone to accidental use,
    as it is very easy for a user to unknowingly have recreated the sources
    from a locally extracted tree (be that modified or not).

    On Sat, 2024-04-06 at 02:57:40 +0200, Guillem Jover wrote:
    > On Sat, 2024-04-06 at 02:56:02 +0300, Adrian Bunk wrote:
    > > Package: dpkg-dev
    > > Version: 1.22.6
    > > Severity: normal
    > > X-Debbugs-Cc: reproducible-builds@lists.alioth.debian.org
    > >
    > > A thought I already wrote in a recent debian-devel discussion:
    > >
    > > In theory source package filenames should be eternally and globally
    > > unique, but in practice there are corner cases where this assumption
    > > might break, for example:
    > > - *stable-security does not currently have a copy of the sources
    > >   in the main archive, one always has to upload the source archive
    > >   there and this might accidentally be a different orig.tar
    > > - dak does not keep an eternal history of everything it ever knew,
    > >   e.g. RM and later re-NEW of a source version might have a different
    > >   source .orig.tar or even different sources for a Debian revision
    > > - Debian and Ubuntu might have different orig.tar for the same version,
    > >   if Ubuntu updated a package before Debian did, or with packages
    > >   where development is completely independent in Debian and Ubuntu
    > >   (e.g. OpenStack, KDE)
    > >
    > > The reason for different files might be as trivial as "git archive"
    > > not always producing the same output when running in different
    > > environments, e.g. the autogenerated tarball for a git tag on GitHub
    > > might have different checksums depending on whether it is downloaded
    > > today or next year despite identical contents due to slightly
    > > different gzip compression.
    > >
    > > Should buildinfo files contain the hashes of the source package,
    > > to clearly define what sources have been used?
    >
    > Ideally? Yes, and I think we considered that at the time when we
    > introduced the .buildinfo files. Although a ref to the .dsc does get
    > included if the build is also creating the source package.
    >
    > The problem is that when dpkg-buildpackage is not building the source
    > package, there is no guarantee the source package is going to be
    > present, or that if it is present it matches what is currently being
    > built from the working directory.

    I've now finished the change I had in that branch, which implements
    support so that dpkg-buildpackage can be passed a .dsc or a source-dir,
    and in the former will first extract it, and for both then it will
    change directory to the source tree. If it got passed a .dsc then it
    will instruct dpkg-genbuildinfo to include a ref to it.

    Which I think accomplishes the requested behavior in a safe way? I've
    attached what I've got, which I'm planning on merging for 1.22.7. I'll
    probably split that into two commits though before merging.
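
    To make "a ref to it" concrete: Debian control files list checksums
    as a hash, a size in bytes and a filename. The Python sketch below
    computes such an entry for a given file. It is an illustration only,
    not code from the attached patch (which is Perl), and the function
    name is hypothetical.

```python
import hashlib
import os


def sha256_entry(path: str) -> str:
    """Return '<sha256> <size> <filename>' for the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large source files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    size = os.path.getsize(path)
    return f"{digest.hexdigest()} {size} {os.path.basename(path)}"
```

    Called on hello_2.10-3.dsc, this yields one line of the kind that
    would sit under a Checksums-Sha256 field, pinning the exact source
    package used for the build.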

    Thanks,
    Guillem

    From 972a9630fdb25705ca011c9b6b9c8a0a75bca6ea Mon Sep 17 00:00:00 2001
    From: Guillem Jover <guillem@debian.org>
    Date: Wed, 17 Aug 2016 00:54:47 +0200
    Subject: [PATCH] dpkg-buildpackage: Add support for building from a .dsc or
    dir

    This adds support to build directly from a source package .dsc, or to
    specify a source directory to use for the build. In the first case we

    [continued in next message]

  • From Vagrant Cascadian@1:229/2 to Guillem Jover on Thu Apr 11 00:30:01 2024
    XPost: linux.debian.bugs.dist
    From: vagrant@reproducible-builds.org

    On 2024-04-09, Guillem Jover wrote:
    > I've now finished the change I had in that branch, which implements
    > support so that dpkg-buildpackage can be passed a .dsc or a source-dir,
    > and in the former will first extract it, and for both then it will
    > change directory to the source tree. If it got passed a .dsc then it
    > will instruct dpkg-genbuildinfo to include a ref to it.
    >
    > Which I think accomplishes the requested behavior in a safe way? I've
    > attached what I've got, which I'm planning on merging for 1.22.7. I'll
    > probably split that into two commits though before merging.

    Had a chance to take this for a test run, and it appears to work, though
    with a few surprises...

    dpkg-buildpackage -- hello_2.10-3.dsc

    Ends up regenerating the .dsc, as --build=any,all,source by default
    ... which may end up with a different .dsc checksum in the .buildinfo
    than .dsc that was passed on the commandline. Which makes some sense,
    but maybe would be better to error out? I would not expect to regenerate
    the .dsc if you're passing dpkg-buildpackage a .dsc!


    dpkg-buildpackage --build=any,all -- /path/to/hello_2.10-3.dsc

    Fails to find the .dsc file, as it appears to extract the sources to
    hello-2.10 and then expects to find ../hello_2.10-3.dsc


    All that said ... this seemed to work for me:

    dpkg-buildpackage --build=any,all -- hello_2.10-3.dsc

    So yay, progress! Thanks!


    All of the above cases do not clean up the hello-2.10 extracted from the
    .dsc file, so re-running any of the above need to manually clean that or
    run from a clean directory or experience various failure modes with the
    existing hello-2.10 directory.


    So a few little glitches, but overall this seems close to something we
    have really wanted for reproducible builds! And just for good measure,
    thanks!


    live well,
    vagrant

  • From Guillem Jover@1:229/2 to Vagrant Cascadian on Thu Apr 11 13:20:01 2024
    XPost: linux.debian.bugs.dist
    From: guillem@debian.org

    Hi!

    On Wed, 2024-04-10 at 15:22:45 -0700, Vagrant Cascadian wrote:
    > On 2024-04-09, Guillem Jover wrote:
    > > I've now finished the change I had in that branch, which implements
    > > support so that dpkg-buildpackage can be passed a .dsc or a source-dir,
    > > and in the former will first extract it, and for both then it will
    > > change directory to the source tree. If it got passed a .dsc then it
    > > will instruct dpkg-genbuildinfo to include a ref to it.
    > >
    > > Which I think accomplishes the requested behavior in a safe way? I've
    > > attached what I've got, which I'm planning on merging for 1.22.7. I'll
    > > probably split that into two commits though before merging.
    >
    > Had a chance to take this for a test run, and it appears to work, though
    > with a few surprises...

    Ah, thanks for the testing, that was very helpful! :)

    > dpkg-buildpackage -- hello_2.10-3.dsc
    >
    > Ends up regenerating the .dsc, as --build=any,all,source by default
    > ... which may end up with a different .dsc checksum in the .buildinfo
    > than .dsc that was passed on the commandline. Which makes some sense,
    > but maybe would be better to error out? I would not expect to regenerate
    > the .dsc if you're passing dpkg-buildpackage a .dsc!

    Hmm, right I think I had documented that locally in the manual page,
    but I can see how this can be surprising. I've for now switched the
    code to not regenerate the .dsc when that is being passed, but the
    problem is that I think the three options are potentially correct:

    * regen: If you built the source on a stable/unstable system, then
      you'd want to regenerate it on the target one (for unstable or a
      backport or stable update), otherwise we might get compatibility
      issues or missed updates. It is also what is being requested when
      calling dpkg-buildpackage (as in "please build source and
      binaries" :).
    * no-regen: If we rebuild then we might end up with inconsistent
      sources if these are tracked in different places, and if you pass
      it the sources then it seems logical to expect them not to be
      regenerated.
    * error: This is the safe option of "both options are correct, let's
      do none :D", of deferring the interface behavior.

    Even though I changed it to no-regen for now, I'm thinking, though,
    that the regen behavior is the more correct one.

    > dpkg-buildpackage --build=any,all -- /path/to/hello_2.10-3.dsc
    >
    > Fails to find the .dsc file, as it appears to extract the sources to
    > hello-2.10 and then expects to find ../hello_2.10-3.dsc

    Ah, right, this is expected to be a filename not a pathname. (Placing
    the source elsewhere is not currently feasible, see #657401; I mean I
    guess dpkg-buildpackage could copy the source but…).

    I've now added a check, although I'll be reworking it a bit before
    merging, because it will emit confusing output if you specify «./filename.dsc» as not being in the current directory. :)
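
    The distinction between a filename and a pathname, including the
    «./filename.dsc» case mentioned above, can be sketched as follows.
    This is an assumed illustration in Python, not the check that was
    added to dpkg-buildpackage; both function names are hypothetical.

```python
import os


def is_bare_filename(arg: str) -> bool:
    """Strict check: the argument has no directory component at all."""
    return arg != "" and os.path.basename(arg) == arg


def is_in_current_directory(arg: str) -> bool:
    """Lenient check: after lexical normalization, no directory part is left.

    This accepts './hello_2.10-3.dsc' as well as 'hello_2.10-3.dsc'.
    Symlinks are not resolved; only the spelling of the path is examined.
    """
    return arg != "" and os.path.dirname(os.path.normpath(arg)) == ""


for candidate in ("hello_2.10-3.dsc", "./hello_2.10-3.dsc",
                  "/path/to/hello_2.10-3.dsc"):
    print(candidate, is_bare_filename(candidate),
          is_in_current_directory(candidate))
```

    A strict check rejects «./filename.dsc» even though the file is in
    the current directory, which is where a misleading error message
    could come from.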

    > All that said ... this seemed to work for me:
    >
    > dpkg-buildpackage --build=any,all -- hello_2.10-3.dsc
    >
    > So yay, progress! Thanks!

    Great, thanks!

    > All of the above cases do not clean up the hello-2.10 extracted from the
    > .dsc file, so re-running any of the above need to manually clean that or
    > run from a clean directory or experience various failure modes with the
    > existing hello-2.10 directory.

    I've also added an explicit check, and dpkg-buildpackage now will error
    out if the directory already exists. I don't think removing a
    pre-existing directory would be safe (at least w/o an explicit option
    to do so). But perhaps, as you hinted, removing the source tree (for a successful build) after finishing would indeed be an option, hmm.
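
    A refuse-if-present check of the kind described above might look
    like the Python sketch below. It assumes the extraction directory is
    named <source>-<upstream version>, which matches the hello-2.10
    directory seen for hello_2.10-3.dsc in this thread; the real tools
    take the version from the .dsc contents rather than from its
    filename. Both function names are hypothetical.

```python
import os


def extraction_dir(dsc_filename: str) -> str:
    """Guess the source tree directory name from a .dsc filename."""
    if not dsc_filename.endswith(".dsc"):
        raise ValueError(f"not a .dsc filename: {dsc_filename!r}")
    stem = dsc_filename[: -len(".dsc")]
    # Source package names cannot contain '_', so the first one
    # separates the name from the version.
    source, _, version = stem.partition("_")
    # Drop the Debian revision (after the last '-'), if there is one.
    upstream = version.rsplit("-", 1)[0] if "-" in version else version
    return f"{source}-{upstream}"


def ensure_absent(path: str) -> None:
    """Error out instead of reusing or deleting an existing directory."""
    if os.path.exists(path):
        raise FileExistsError(f"refusing to extract over existing {path}")


print(extraction_dir("hello_2.10-3.dsc"))
```

    Erroring out leaves any cleanup to the user, which avoids deleting a
    tree that might hold local changes.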

    > So a few little glitches, but overall this seems close to something we
    > have really wanted for reproducible builds! And just for good measure,
    > thanks!

    I force-pushed the reworked code into:

    https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/log/?h=pu/dpkg-buildpackage-dsc

    Thanks,
    Guillem
