Package: dpkg-dev
Version: 1.22.6
Severity: normal
X-Debbugs-Cc: reproducible-builds@lists.alioth.debian.org
A thought I already wrote in a recent debian-devel discussion:
In theory source package filenames should be eternally and globally
unique, but in practice there are corner cases where this assumption
might break, for example:
- *stable-security does not currently have a copy of the sources
in the main archive, one always has to upload the source archive
there and this might accidentally be a different orig.tar
- dak does not keep an eternal history of everything it ever knew,
e.g. RM and later re-NEW of a source version might have a different
source .orig.tar or even different sources for a Debian revision
- Debian and Ubuntu might have different orig.tar for the same version,
if Ubuntu updated a package before Debian did, or with packages
where development is completely independent in Debian and Ubuntu
(e.g. OpenStack, KDE)
The reason for different files might be as trivial as "git archive"
not always producing the same output when running in different
environments, e.g. the autogenerated tarball for a git tag on GitHub
might have different checksums depending on whether it is downloaded
today or next year despite identical contents due to slightly
different gzip compression.
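The gzip point can be illustrated with a minimal sketch. It assumes GNU gzip, coreutils (seq, sha256sum) and mktemp are available; the payload is a stand-in for a tarball, and all file and variable names are made up for the example.

```shell
# Sketch: identical content, different gzip settings, different checksums.
workdir=$(mktemp -d)

# Stand-in for the uncompressed content of an orig tarball.
seq 1 20000 > "$workdir/payload"

# Compress the same bytes at two levels. -n leaves the file name and
# timestamp out of the gzip header, so only the compression differs.
gzip -n -1 -c "$workdir/payload" > "$workdir/fast.gz"
gzip -n -9 -c "$workdir/payload" > "$workdir/best.gz"

# Checksums of the compressed files, as they would appear in a .dsc.
hash_fast=$(sha256sum "$workdir/fast.gz" | cut -d' ' -f1)
hash_best=$(sha256sum "$workdir/best.gz" | cut -d' ' -f1)

# Checksums of the decompressed content.
content_fast=$(gzip -dc "$workdir/fast.gz" | sha256sum | cut -d' ' -f1)
content_best=$(gzip -dc "$workdir/best.gz" | sha256sum | cut -d' ' -f1)

rm -rf "$workdir"

if [ "$hash_fast" != "$hash_best" ] && [ "$content_fast" = "$content_best" ]; then
    echo "compressed files differ, contents are identical"
fi
```

Two files with the same name and the same unpacked content can therefore still carry different hashes, which is the case a source hash in the buildinfo would disambiguate.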
Should buildinfo files contain the hashes of the source package,
to clearly define what sources have been used?
Package: dpkg-dev
Version: 1.19.0.4
Severity: wishlist
Tags: patch
dpkg-buildpackage currently does not automatically list the source .dsc or its
hash in the call to dpkg-genbuildinfo when doing a binary-only build. This is
understandable because in a binary-only build, dpkg-buildpackage does not have
any concept of a source package and therefore does not know (and cannot verify)
whether the working tree was actually generated from any .dsc or not.
However, the caller knows this information, and it is useful for reproducible
builds to track exactly which (i.e. hash-wise) source code generates which
binary packages. So it should be possible for the caller to tell
dpkg-buildpackage, "yes please do include the .dsc hash in the buildinfo, I am
telling you it is correct, you can assume this safely".
Tools like sbuild/pbuilder could then do this, as well as users or rebuilders.
The attached patch implements this in the simplest way possible. It allows the
caller to run something like:
$ dpkg-buildpackage --no-sign -b --buildinfo-option=--build=full
The resulting $pkg_$ver_$arch.buildinfo then contains the .dsc and its hash.
However, this requires the caller to know which option to pass, which would be one of
--buildinfo-option=--build=full
--buildinfo-option=--build=any,source
--buildinfo-option=--build=all,source
depending on whether the original build request (to dpkg-buildpackage) was a -b, -B, or -A.
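A caller such as sbuild or pbuilder could wrap that mapping in a small helper. The sketch below only encodes the three cases listed above; the function name buildinfo_option_for is hypothetical, and the resulting option only has the described effect with the patch attached to this report.

```shell
# Hypothetical helper: map the binary-only build flag given to
# dpkg-buildpackage to the matching --buildinfo-option value from this report.
buildinfo_option_for() {
    case "$1" in
        -b) echo "--buildinfo-option=--build=full" ;;
        -B) echo "--buildinfo-option=--build=any,source" ;;
        -A) echo "--buildinfo-option=--build=all,source" ;;
        *)  echo "unsupported build flag: $1" >&2
            return 1 ;;
    esac
}

# Usage (needs dpkg-dev with the attached patch, so left commented out):
# flag=-B
# dpkg-buildpackage --no-sign "$flag" "$(buildinfo_option_for "$flag")"
```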
For this reason, it may be better (more usable) to add a --force-source-in-buildinfo
flag (or similar name) and when this is switched on, do this instead:
-push @buildinfo_opts, "--build=$build_types" if build_has_none(BUILD_DEFAULT);
+push @buildinfo_opts, "--build=$build_types,source" if build_has_none(BUILD_DEFAULT);
Let me know if you like this idea and I'll be happy to implement that instead of
the attached patch.
On Sat, 2024-04-06 at 02:56:02 +0300, Adrian Bunk wrote:
> Should buildinfo files contain the hashes of the source package,
> to clearly define what sources have been used?
Ideally? Yes, and I think we considered that at the time when we
introduced the .buildinfo files. Although a ref to the .dsc does get
included if the build is also creating the source package.
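When a .buildinfo does carry such a ref, a rebuilder can check it against the .dsc at hand. This is a rough sketch, assuming the checksum entries are indented lines of the form " <sha256> <size> <filename>" under Checksums-Sha256; the function name buildinfo_lists_dsc is hypothetical.

```shell
# Hypothetical helper: succeed only if the given .buildinfo lists the given
# .dsc with its current SHA-256 hash and size.
buildinfo_lists_dsc() {
    buildinfo=$1
    dsc=$2
    hash=$(sha256sum "$dsc" | cut -d' ' -f1)
    size=$(wc -c < "$dsc" | tr -d ' ')
    name=$(basename "$dsc")
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF " $hash $size $name" "$buildinfo"
}

# Usage (file names are illustrative):
# buildinfo_lists_dsc hello_2.10-3_amd64.buildinfo hello_2.10-3.dsc \
#     && echo "buildinfo refers to this exact .dsc"
```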
The problem is that when dpkg-buildpackage is not building the source
package, there is no guarantee the source package is going to be
present, or that if it is present it matches what is currently being
built from the working directory.
I've now finished the change I had in that branch, which implements
support so that dpkg-buildpackage can be passed a .dsc or a source-dir.
In the former case it will first extract it, and in both cases it will
then change directory to the source tree. If it got passed a .dsc then it
will instruct dpkg-genbuildinfo to include a ref to it.
Which I think accomplishes the requested behavior in a safe way? I've
attached what I've got, which I'm planning on merging for 1.22.7. I'll
probably split that into two commits though before merging.
On 2024-04-09, Guillem Jover wrote:
> I've now finished the change I had in that branch, which implements
> support so that dpkg-buildpackage can be passed a .dsc or a source-dir.
> In the former case it will first extract it, and in both cases it will
> then change directory to the source tree. If it got passed a .dsc then it
> will instruct dpkg-genbuildinfo to include a ref to it.
> Which I think accomplishes the requested behavior in a safe way? I've
> attached what I've got, which I'm planning on merging for 1.22.7. I'll
> probably split that into two commits though before merging.
Had a chance to take this for a test run, and it appears to work, though
with a few surprises...
dpkg-buildpackage -- hello_2.10-3.dsc
This ends up regenerating the .dsc, as the default is --build=any,all,source
... which may end up with a different .dsc checksum in the .buildinfo
than the .dsc that was passed on the command line. Which makes some sense,
but maybe it would be better to error out? I would not expect to regenerate
the .dsc if you're passing dpkg-buildpackage a .dsc!
dpkg-buildpackage --build=any,all -- /path/to/hello_2.10-3.dsc
Fails to find the .dsc file, as it appears to extract the sources to
hello-2.10 and then expects to find ../hello_2.10-3.dsc
All that said ... this seemed to work for me:
dpkg-buildpackage --build=any,all -- hello_2.10-3.dsc
So yay, progress! Thanks!
None of the above cases clean up the hello-2.10 directory extracted from the
.dsc file, so re-running any of them requires manually cleaning that up or
running from a clean directory, or else experiencing various failure modes
with the existing hello-2.10 directory.
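Until the extracted tree gets cleaned up automatically, one workaround is to do each run in a throwaway directory. The sketch below is only one way to do that; the helper name run_in_scratch_dir and the paths in the usage comment are hypothetical, and the dpkg-buildpackage call assumes the proposed .dsc support is installed.

```shell
# Hypothetical helper: run a command inside a fresh temporary directory and
# remove that directory afterwards, whatever the command's exit status.
run_in_scratch_dir() {
    scratch=$(mktemp -d) || return 1
    status=0
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$scratch" && "$@" ) || status=$?
    rm -rf "$scratch"
    return "$status"
}

# Usage (left commented out): copy the source package in, build from the
# .dsc by its relative name, and copy the results out before the scratch
# directory is removed.
# run_in_scratch_dir sh -c '
#     cp /path/to/hello_2.10* . &&
#     dpkg-buildpackage --build=any,all -- hello_2.10-3.dsc &&
#     cp ./*.deb ./*.buildinfo /path/to/results/
# '
```

Passing the .dsc by its relative name inside the scratch directory also sidesteps the failure seen with an absolute path.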
So a few little glitches, but overall this seems close to something we
have really wanted for reproducible builds! And just for good measure, thanks!