• [PATCH] New dpkg-source build option: --no-generate-diff

    From Daniel Richard G.@21:1/5 to All on Wed May 22 09:20:01 2024
    I've rebased this patch against current Git.

    To recap, this option enables (re)building a source package without
    cross-checking against the orig tarball, in scenarios where the latter is
    superfluous and expensive. Some benchmarks from a package I work with
    basically tell the story. This is with my patch applied:

    $ ls -l *.orig.tar.xz
    -rw-r--r-- 1 skunk users 843547668 May 16 19:08 ungoogled-chromium_125.0.6422.60.orig.tar.xz

    $ time -p dpkg-source --no-preparation --abort-on-upstream-changes --build ungoogled-chromium-src
    dpkg-source.pl: info: using source format '3.0 (quilt)'
    dpkg-source.pl: info: building ungoogled-chromium using existing ./ungoogled-chromium_125.0.6422.60.orig.tar.xz
    dpkg-source.pl: info: using patch list from debian/patches/series
    dpkg-source.pl: info: building ungoogled-chromium in ungoogled-chromium_125.0.6422.60-1xtradeb1.2204.1.debian.tar.xz
    dpkg-source.pl: info: building ungoogled-chromium in ungoogled-chromium_125.0.6422.60-1xtradeb1.2204.1.dsc
    real 628.93
    user 198.92
    sys 88.00

    $ time -p dpkg-source --no-generate-diff --no-preparation --abort-on-upstream-changes --build ungoogled-chromium-src
    dpkg-source.pl: info: using source format '3.0 (quilt)'
    dpkg-source.pl: info: building ungoogled-chromium using existing ./ungoogled-chromium_125.0.6422.60.orig.tar.xz
    dpkg-source.pl: info: building ungoogled-chromium in ungoogled-chromium_125.0.6422.60-1xtradeb1.2204.1.debian.tar.xz
    dpkg-source.pl: info: building ungoogled-chromium in ungoogled-chromium_125.0.6422.60-1xtradeb1.2204.1.dsc
    real 11.62
    user 10.86
    sys 0.74
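
    For a sense of scale, the two runs can be compared directly. This is
    only a small sketch: the numbers are copied from the `time -p` output
    above, and the names (baseline, no_diff, speedup) are illustrative.

```python
# Seconds copied from the two `time -p` runs shown above.
baseline = {"real": 628.93, "user": 198.92, "sys": 88.00}
no_diff = {"real": 11.62, "user": 10.86, "sys": 0.74}


def speedup(before: float, after: float) -> float:
    """Return how many times shorter `after` is compared to `before`."""
    return before / after


real_speedup = speedup(baseline["real"], no_diff["real"])
saved_seconds = baseline["real"] - no_diff["real"]

print(f"wall-clock speedup: {real_speedup:.1f}x")
print(f"wall-clock time saved: {saved_seconds:.2f} s")
```

    On these figures the wall-clock time drops by a factor of roughly 54,
    for this one package on this one machine.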


    --Daniel


    ---
    man/dpkg-source.pod | 11 ++++++++++-
    scripts/Dpkg/Source/Package/V2.pm | 32 +++++++++++++++++++++++++++++---
    2 files changed, 39 insertions(+), 4 deletions(-)

    diff --git a/man/dpkg-source.pod b/man/dpkg-source.pod
    index 736f42fd2..4102fd986 100644
    --- a/man/dpkg-source.pod
    +++ b/man/dpkg-source.pod
    @@ -647,7 +647,8 @@ patches have been applied during the extraction.

    B<Building>

    -All original tarballs found in the current directory are extracted in a
    +If B<--no-generate-diff> is not given, then
    +all original tarballs found in the current directory are extracted in a
    temporary directory by following the same logic as for the unpack, the
    debian directory is copied over in the temporary directory, and all
    patches except the automatic patch (B<debian-changes->I<version>
    @@ -791,6 +792,14 @@ Those options are only allowed
    in B<debian/source/local-options> so that all generated source
    packages have the same behavior by default.

    +=item B<--no-generate-diff>
    +
    +D
  • From Guillem Jover@21:1/5 to Daniel Richard G. on Thu Jul 11 05:40:01 2024
    Hi!

    On Wed, 2024-05-22 at 03:02:50 -0400, Daniel Richard G. wrote:
    I've rebased this patch against current Git.

    Thanks for the patch, and sorry for not commenting earlier. As I
    mentioned elsewhere, I've had this in mind, but I guess drafted
    multiple replies in my head which ended up never being delivered.

    (Also source package formats have become something of a contentious
    topic, and it feels sometimes a bit demotivating to work on these.)

    To recap, this option enables (re)building a source package without
    cross-checking against the orig tarball, in scenarios where the
    latter is superfluous and expensive. Some benchmarks from a package I
    work with basically tell the story. This is with my patch applied:

    Is this superfluous because you don't need the source at all (in which
    case I think a better option might be to not generate it) or because
    you (or the tool driving dpkg-buildpackage or dpkg-source) knows that
    there have been no changes (for example the source-tree being in a
    VCS)? Or perhaps this is needed for example for a CI or build system
    to transport the source-tree across installations, then perhaps a
    different transient format could be used for that purpose only (for
    example «dpkg-source --format="3.0 (git)" -b dir»)?

    For the patch itself, I'm not very fond of the semantics it
    introduces, because while something similar can probably be specified
    for format 1.0, that one has rather loose semantics and is more prone
    to error. Personally I don't trust myself to remember if I've done
    changes to a tree (if it's not tracked by a VCS), so I see this
    diluting its robustness and checks.

    Depending on the scenarios you have in mind, a better option might
    perhaps be to either make dpkg-source integrate more tightly with a
    VCS, or perhaps create a new source format. Both of which I've had
    in mind for a while, but see the motivation bit above.

    For the VCS integration, something I've been pondering (and started
    drafting on a git branch) is a new extraction mode where dpkg-source
    would for example unpack the orig tarball(s) from a "3.0 (quilt)"
    source package, and create a git repo out of those, while tracking
    the digests for those tarballs. Then import the patches into git
    commits (and remove them from debian/patches/). The problem is that
    I don't really want to have to massage or track patch format
    differences to make them valid git commits, like some tools do. Then
    the build process could regenerate the "3.0 (quilt)" out from that
    working directory. And this could give you the speed you need, while
    not losing on robustness I guess? Perhaps a solution to the guarantee
    of having patches that are always git commits, could be to add a format
    variant from the quilt one where that holds (say "3.0 (git-format-patch)"
    or whatever), which would make that trivial then, although harder to
    deploy. Or maybe just keep "3.0 (quilt)" but add a file marker under
    debian/source/ or an option under debian/source/options.
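
    Whatever form such an extraction mode takes, the patch-import step
    would have to walk debian/patches/series in order. A minimal sketch of
    that one step follows. It is not dpkg-source code: read_series and
    demo are hypothetical names, the patch names are made up, and the
    comment handling is an assumption modelled on the usual quilt series
    conventions.

```python
import re
import tempfile
from pathlib import Path

# Assumed convention: a comment starts at '#' when the '#' is at the start
# of the line or is preceded by whitespace.
_COMMENT = re.compile(r"(^|\s+)#.*$")


def read_series(series_path: Path) -> list[str]:
    """Return patch file names from a series file, in application order.

    Blank lines and comments are skipped. Anything after the patch name
    (such as a -p1 option) is ignored in this sketch.
    """
    patches = []
    for raw in series_path.read_text(encoding="utf-8").splitlines():
        line = _COMMENT.sub("", raw.strip())
        if line:
            patches.append(line.split()[0])
    return patches


def demo() -> list[str]:
    """Parse a throwaway series file with made-up patch names."""
    with tempfile.TemporaryDirectory() as tmp:
        series = Path(tmp) / "series"
        series.write_text(
            "# hypothetical patch names\n"
            "fix-build.patch\n"
            "\n"
            "rename-binary.patch -p1  # trailing comment\n",
            encoding="utf-8",
        )
        return read_series(series)


if __name__ == "__main__":
    for name in demo():
        print(name)
```

    Each returned name would then map to one commit, which is exactly
    where the patch-format concerns mentioned above come in.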

    The other option could be to create a new git-based source format,
    my thinking has been on something resembling the current "3.0 (git)"
    format but shallow with a single commit for the upstream part,
    and a packaging branch with the packaging bits and the patches
    against upstream. This could be useful for git snapshots, or
    upstreams that do not release tarballs. And could give you the speed
    you look for too I guess?


    For the latter it could be interesting to get other parties that have
    been working on source and VCS handling, although my concern is that
    at this point the various groups appear to have very strong views on
    how this should look, and I think there might even be
    disagreement on fundamental base topics. But I'd still like to think
    these are not insurmountable, but perhaps this would require an
    in-person gathering to be effective at all, which is limiting.

    Thanks,
    Guillem

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Daniel Richard G.@21:1/5 to Guillem Jover on Thu Jul 11 22:40:01 2024
    Hi Guillem,

    On Wed, 2024 Jul 10 23:30-04:00, Guillem Jover wrote:

    Thanks for the patch, and sorry for not commenting earlier. As I
    mentioned elsewhere, I've had this in mind, but I guess drafted
    multiple replies in my head which ended up never being delivered.

    (Also source package formats have become something of a contentious
    topic, and it feels sometimes a bit demotivating to work on these.)

    I'm glad to hear from you :) But I have to apologize for inadvertently
    misleading you: my patch and use case were not intended to be anywhere
    near so conceptually thorny. I'm vaguely aware of the different
    supported source package formats, from having read the man page numerous
    times, and also how dpkg-source can integrate with Git in some
    workflows. But I never thought any of that would be relevant for this
    contribution.

    To recap, this option enables (re)building a source package without
    cross-checking against the orig tarball, in scenarios where the
    latter is superfluous and expensive. Some benchmarks from a package I
    work with basically tell the story. This is with my patch applied:

    Is this superfluous because you don't need the source at all (in which
    case I think a better option might be to not generate it) or because
    you (or the tool driving dpkg-buildpackage or dpkg-source) knows that
    there have been no changes (for example the source-tree being in a
    VCS)? Or perhaps this is needed for example for a CI or build system
    to transport the source-tree across installations, then perhaps a
    different transient format could be used for that purpose only (for
    example «dpkg-source --format="3.0 (git)" -b dir»)?

    I really should have been more explicit about my use case, rather than
    hand-waving it and presuming the motivation was apparent. Let me tell
    you what I'm doing, and hopefully that will paint the intended picture.

    I am working with the Debian "chromium" package, using the official one
    (from unstable) as a starting point, and making modifications to it.
    There are two different kinds of modifications in play:

    1. Making it buildable on Ubuntu 22.04/jammy and later supported
    releases (since Ubuntu only provides this package as a "snap");

    2. Adding modifications from the ungoogled-chromium project, to make a
    "Chromium without Google" application.

    (At present, I am maintaining packages with #1 and #1+#2 applied in an
    Ubuntu PPA, and have an automated process to update them.)

    The modifications mainly consist of tweaks to files under debian/, and
    adding patches to the existing patch series. (#2 is a bit more involved,
    since a lot of s/chromium/ungoogled-chromium/ renaming is involved, but
    that's not particularly relevant here.)

    However, a very important aspect of any modified package I provide is
    that the orig source tarball remains identical to Debian's. It's over
    900 MB nowadays, I don't want to mess with it, and I don't want to
    have to distribute it myself (just point people to a Debian mirror). I
    want all my modifications to be contained in the .debian.tar.xz and
    .dsc files alone.

    So what I do is I unpack the stock .debian.tar.xz file, add the new
    patches, update the series, and modify the rules/control/changelog/etc.
    files appropriately. Now, all I need is a new .debian.tar.xz and .dsc
    file. The .orig.tar.xz is already there, and should remain as-is.

    What I want dpkg-source to do here is

    * Read the modified debian/ tree, generate the .debian.tar.xz, hash it

    * Hash the .orig.tar.xz file

    * Generate the .dsc
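
    The first two steps can be sketched as follows. This is a rough
    illustration only, not what dpkg-source does internally:
    sha256_and_size, pack_debian_dir and demo are hypothetical names, the
    file names are invented, and the archive is written with Python's
    default tar settings rather than whatever options dpkg-source uses.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_and_size(path: Path) -> tuple[str, int]:
    """Hash a file in 1 MiB chunks so a large tarball is never loaded whole."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def pack_debian_dir(source_dir: Path, out_tarball: Path) -> None:
    """Archive only source_dir/debian, stored under the name 'debian'."""
    with tarfile.open(out_tarball, "w:xz") as tar:
        tar.add(source_dir / "debian", arcname="debian")


def demo() -> tuple[list[str], str, int]:
    """Build a throwaway tree and show that only debian/ gets archived."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "pkg-src"
        (src / "debian").mkdir(parents=True)
        (src / "debian" / "changelog").write_text("demo\n", encoding="utf-8")
        (src / "unrelated.txt").write_text("ignored\n", encoding="utf-8")
        out = Path(tmp) / "pkg_1.0-1.debian.tar.xz"
        pack_debian_dir(src, out)
        with tarfile.open(out) as tar:
            names = sorted(tar.getnames())
        digest, size = sha256_and_size(out)
        return names, digest, size


if __name__ == "__main__":
    names, digest, size = demo()
    print(names, digest, size)
```

    The same sha256_and_size helper would be pointed at the existing
    .orig.tar.xz, which is read once for hashing and never unpacked.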

    I do *not* want it to check that the patches apply correctly to the orig
    source, because not only is unpacking that giant tarball time-consuming
    and obnoxious, I have a separate test-build process that does that
    validation for me (in addition to checking other aspects of the build,
    like does the source configure correctly). I want to say "thank you but
    no" to dpkg-source's cross-checking.

    So that's my use case, in its full concreteness. (Please feel free to
    ask about any aspects I may have glossed over, in case they are of
    interest.) Generalizing a bit, what it comes down to is producing just
    the .debian.tar.* and .dsc files, against existing orig tarball(s), and
    doing nothing with the orig tarball(s) beyond hashing them in order to
    generate the .dsc. (Also implied is not looking at any files in the
    source directory outside of the debian/ subdir. I typically build the
    chromium package with *only* debian/ in the source dir.)

    When I first ran into the problem of dpkg-source taking ages to build,
    I thought of throwing together a script that just does the specific
    tar(1) command to generate the .debian.tar.xz file, and a simple
    template to generate the .dsc. But I knew that was a half-arsed
    solution, so I prepared this patch.
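
    For illustration, the "simple template" part of such a script might
    look roughly like this. It is a deliberately partial sketch:
    render_dsc_fragment is a hypothetical name, the package name, version
    and payloads are placeholders, and a real .dsc carries more fields
    than shown (for example Maintainer, Architecture and the Files list)
    and is normally signed.

```python
import hashlib


def render_dsc_fragment(
    source: str, version: str, files: list[tuple[str, str, int]]
) -> str:
    """Render a few .dsc fields from (file name, sha256 hex, size) entries."""
    lines = [
        "Format: 3.0 (quilt)",
        f"Source: {source}",
        f"Version: {version}",
        "Checksums-Sha256:",
    ]
    for name, sha256_hex, size in files:
        # Each checksum entry is a continuation line: space, hash, size, name.
        lines.append(f" {sha256_hex} {size} {name}")
    return "\n".join(lines) + "\n"


# Placeholder payloads stand in for the real tarballs in this sketch.
placeholder_orig = b"not a real orig tarball"
placeholder_debian = b"not a real debian tarball"

fragment = render_dsc_fragment(
    "example",
    "1.0-1",
    [
        (
            "example_1.0.orig.tar.xz",
            hashlib.sha256(placeholder_orig).hexdigest(),
            len(placeholder_orig),
        ),
        (
            "example_1.0-1.debian.tar.xz",
            hashlib.sha256(placeholder_debian).hexdigest(),
            len(placeholder_debian),
        ),
    ],
)
print(fragment, end="")
```

    Keeping every remaining field correct by hand is the part that makes
    this approach fragile.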

    Subjectively, I don't like that the patch couches this modality as
    "don't generate an automatic diff" rather than a more straightforward
    "just generate the debian/dsc files against existing orig tarball(s)
    without doing any checks," but the former seemed like it was necessary
    to fit into the man page conceptually. If you disagree, I'm happy to
    rename the option and rewrite the doc text; just let me know what
    approach I should take.

    For the patch itself, I'm not very fond of the semantics it
    introduces, because while something similar can probably be specified
    for format 1.0, that one has rather loose semantics and is more prone
    to error. Personally I don't trust myself to remember if I've done
    changes to a tree (if it's not tracked by a VCS), so I see this
    diluting its robustness and checks.

    Depending on the scenarios you have in mind, a better option might
    perhaps be to either make dpkg-source integrate more tightly with a
    VCS, or perhaps create a new source format. Both of which I've had
    in mind for a while, but see the motivation bit above.

    I can't comment on this nor the rest, and I hope it's now clear why :]
    I'm sorry for putting you through this whole thought process
    unnecessarily; I can only hope the time you spent on it will bear fruit
    in other areas of dpkg's development.


    --Daniel


    --
    Daniel Richard G. || skunk@iSKUNK.ORG
    My ASCII-art .sig got a bad case of Times New Roman.
