• Representing Debian Metadata in Git

    From Simon Richter@21:1/5 to All on Tue Aug 20 09:20:01 2024
    Hi,

    there's a bit of a discussion within Debian on collaborating using Git.

    One of the long-standing issues is that there are multiple ways Debian packaging can be represented in a git tree, and none of them are optimal.

    The problem at hand is that the packaging workflow consists of

    1. importing an upstream release
    2. optionally stripping out undistributable parts
    3. adding packaging metadata
    4. optionally adding a patch stack

    The workflow for upgrading a package is

    1. import a new upstream release
    2. apply and possibly modify the exclusion list
    3. apply the packaging metadata, updating it in the process
    4. rebase the patch stack

    Right now, git is used mainly as a network file system, and only tagged releases are expected to be consistent enough to compile, because often
    going from one consistent state to another as an atomic operation would
    require multiple changes to be applied in the same commit.

    The imported archive is represented either directly as a tree (which may
    be imported from the upstream project if no files are undistributable
    for Debian), or via a mechanism that can reproduce a compressed archive
    that is bitwise identical to the upstream release, from a tree and some additional patch data.

    The patch stack is stored as a set of patches inside a directory, and
    rebased using quilt.

    An alternate representation stores the patch stack as a branch that is
    rebased using git, and then exported to single files.

    The Debian changelog is stored as a file inside Git, but some automation
    exists to update this from Git commit messages.

    Debian changelog entries refer to bugs in the Debian Bug Tracking
    system. There is a desire to also incorporate forges (currently, GitLab)
    and refer to the forges' issue tracker from commit messages (where the
    issue tracker is used for team collaboration, while the Debian BTS is
    used for user-visible bugs).

    All of this is very silly, because we're essentially storing metadata as
    data because we cannot express in Git what we're actually doing, and the conflicting priorities people have have led to conflicting solutions.

    I'd like to xkcd 927 this now, and find a common mapping.

    From a requirements perspective, I'd like to be able to

    - express patches as commits:
      - allow cherry-picking upstream commits as Debian patches
      - allow cherry-picking Debian patches for upstream submission
    - generate the Debian changelog from changes committed to Git
    - express filter steps for generating the upstream archive(s) from a
      tree-ish and some metadata
    - store upstream signatures inside Git
    - keep a history of patches, including patches applied to previously
      released packages

    A possible implementation would be a type of Git "user extension" object
    that contains

    - an extension name
    - an object type (interpreted by the extension)
    - type-tagged references to other objects
    - other type-tagged data

    Validity of the object would be determined by the extension, and git
    would treat this object as mostly opaque (i.e. whenever one is
    encountered, the extension needs to be called). The only exception would
    be references, because we need to be able to transfer these objects and
    all their dependencies efficiently (so the extension would generate a
    list of references that should be recursively packed or omitted).
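
A minimal sketch of the four parts listed above, assuming a plain in-memory layout. The names `TypedRef`, `ExtensionObject`, `pack` and `refs_to_pack` are hypothetical and not part of Git or of the proposal; the only behaviour modelled is that core Git needs the list of references to follow when transferring objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedRef:
    """A type-tagged reference to another object (hypothetical layout)."""
    tag: str            # meaning assigned by the extension, e.g. "tree"
    object_id: str      # hex object name of the referenced object
    pack: bool = True   # whether a transfer should follow this reference


@dataclass(frozen=True)
class ExtensionObject:
    """Hypothetical 'user extension' object, mostly opaque to core Git."""
    extension: str                              # extension name, e.g. "debian"
    object_type: str                            # interpreted only by the extension
    refs: tuple[TypedRef, ...] = ()             # type-tagged references
    data: tuple[tuple[str, bytes], ...] = ()    # other type-tagged data

    def refs_to_pack(self) -> list[str]:
        """Object ids that a transfer should pack recursively."""
        return [ref.object_id for ref in self.refs if ref.pack]


# Illustrative object; the ids and the "pristine-delta" tag are made up.
obj = ExtensionObject(
    extension="debian",
    object_type="upstream-archive",
    refs=(
        TypedRef("tree", "a" * 40),
        TypedRef("pristine-delta", "b" * 40, pack=False),
    ),
    data=(("version", b"1.2.3"),),
)
print(obj.refs_to_pack())
```

Everything except `refs_to_pack` stays uninterpreted here, which mirrors the idea that validity is the extension's business and only reachability is Git's.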

    On top of that, we could represent a Debian package through special
    objects, such as

    - debian::debian-dir (a tree-like object referenced from the root
      tree, contains a tree for plain files plus links to special objects
      for generated items, such as patch stacks)
    - debian::upstream-archive (a tree-like object that marks the boundary
      between objects imported from upstream, and objects that are part of
      packaging, and gives instructions for regenerating the upstream
      archives without storing them as blobs)
    - debian::update-upstream (a commit-like object to move to a new
      upstream-archive object, this contains the upstream version number
      that the following upload object must use)
    - debian::changelog-entry (a commit-like object that adds an item to
      the Debian changelog)
    - debian::upload (a commit-like object that adds a version to the
      Debian changelog)
    - debian::rebase-patches (a commit-like object that links the patch
      stacks before and after a rebase)
    - ...
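
To illustrate how a changelog could fall out of such commit-like objects, here is a rough sketch. `ChangelogEntry` and `Upload` merely stand in for debian::changelog-entry and debian::upload; the classes, the `render` function and the package name are assumptions for illustration, and the maintainer/date trailer line of a real changelog stanza is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ChangelogEntry:
    """Stands in for a debian::changelog-entry object."""
    text: str


@dataclass
class Upload:
    """Stands in for a debian::upload object."""
    version: str
    distribution: str = "unstable"


def render(package, history):
    """Render changelog stanzas, newest first, from an oldest-first history."""
    stanzas, pending = [], []
    for obj in history:
        if isinstance(obj, ChangelogEntry):
            pending.append(obj.text)      # collect items until the next upload
        elif isinstance(obj, Upload):
            header = f"{package} ({obj.version}) {obj.distribution}; urgency=medium"
            lines = [header, ""] + [f"  * {text}" for text in pending]
            stanzas.append("\n".join(lines))
            pending = []
    return "\n\n".join(reversed(stanzas))


history = [
    ChangelogEntry("Initial release."),
    Upload("1.0-1"),
    ChangelogEntry("New upstream release."),
    ChangelogEntry("Refresh patches."),
    Upload("1.1-1"),
]
print(render("hello", history))
```

Entries that are not yet followed by an upload are simply not rendered, which corresponds to unreleased changes.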

    Changes to packaging would still be represented as commit objects
    containing a tree, but that tree would contain a special entry for the
    "debian" subdirectory that points to the last packaging change.

    This is very high-level so far, because I'd like to get some feedback
    first on whether it makes sense to pursue this further. This would use
    up the last unused three-bit object type in Git, so it will have to be
    very generic on this side to not block future development -- and it
    would require a lot of design effort on the Debian side as well to
    hammer out the details.

    Any feelings/objections/missed requirements?

    Simon

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris Hofstaedtler@21:1/5 to All on Wed Aug 21 23:40:01 2024
    Hi Simon,

    * Simon Richter <sjr@debian.org> [240820 09:11]:
    > One of the long-standing issues is that there are multiple ways Debian
    > packaging can be represented in a git tree, and none of them are optimal.
    > [..]
    > A possible implementation would be a type of Git "user extension" object
    > that contains
    >
    > - an extension name
    > - an object type (interpreted by the extension)
    > - type-tagged references to other objects
    > - other type-tagged data
    > [..]
    >
    > Any feelings/objections/missed requirements?

    In the current DEP14/DEP18 discussions a lot of discussion was had
    about how we should represent Debian things in git; your mail also
    goes into this direction.

    My *feeling* is we should do the opposite - that is, represent less
    Debian stuff in git, and especially do it in less Debian-specific
    ways. IOW, no git extensions, no setup with multiple branches that
    contain more or less unrelated things, etc.

    I think we should move more towards a setup that is easily
    understood by people not closely following our Debian-specific
    things. We should avoid surprising things, again that would include
    the multiple branches and any git extensions.

    Before pushing for new ways of representing Debian stuff in git, I
    think it would be a good idea to learn from all the other distros
    and distro-like systems successfully using git [1]. Debian is not
    the only distro that wants to use git to capture changes and
    encourage contributions to its packages.

    Chris

    [1] alpine, homebrew, freebsd ports come to mind immediately. nixos
    and others too.

  • From Russ Allbery@21:1/5 to Chris Hofstaedtler on Thu Aug 22 00:20:01 2024
    Chris Hofstaedtler <zeha@debian.org> writes:

    > My *feeling* is we should do the opposite - that is, represent less
    > Debian stuff in git, and especially do it in less Debian-specific
    > ways. IOW, no git extensions, no setup with multiple branches that
    > contain more or less unrelated things, etc.

    +1

    I think this is particularly important for attracting new contributors
    and easing the onboarding process. There are a lot of odd Debian-specific
    things that people have to learn because they're necessary to make Debian
    work. I am dubious that the Git representation is one of them, and would
    rather continue down the path of providing Debian tools and processes that
    reduce the delta between how Debian packaging uses Git and how most free
    software development uses Git.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Chris Hofstaedtler@21:1/5 to All on Thu Aug 22 02:00:01 2024
    * rsbecker@nexbridge.com <rsbecker@nexbridge.com> [240822 01:21]:
    > > > Any feelings/objections/missed requirements?
    > >
    > > In the current DEP14/DEP18 discussions a lot of discussion was had about how we
    > > should represent Debian things in git; your mail also goes into this direction.
    > >
    > > My *feeling* is we should do the opposite - that is, represent less Debian stuff in git,
    > > and especially do it in less Debian-specific ways. IOW, no git extensions, no setup
    > > with multiple branches that contain more or less unrelated things, etc.
    > > [..]
    >
    > On the other side (perhaps), git is increasingly being used in the Ops setting for
    > DevOps and DevSecOps. Production configurations for high-value applications are
    > moving to storing those configurations into git for tracing and audit. Git is an
    > enabler for good production operations practices.

    Don't get me wrong. Yes, we should use git to do what git is good
    for (tracking changes, etc).

    We should not invent new ways of using git that no one else uses.
    I'd like to reduce the delta of "how Debian uses git" to "how
    everyone else uses git" to, hopefully, zero.

    Chris

  • From rsbecker@nexbridge.com@21:1/5 to Chris Hofstaedtler on Thu Aug 22 01:40:01 2024
    On Wednesday, August 21, 2024 5:38 PM, Chris Hofstaedtler wrote:
    > * Simon Richter <sjr@debian.org> [240820 09:11]:
    > > One of the long-standing issues is that there are multiple ways Debian
    > > packaging can be represented in a git tree, and none of them are optimal.
    > > [..]
    > > A possible implementation would be a type of Git "user extension"
    > > object that contains
    > >
    > > - an extension name
    > > - an object type (interpreted by the extension)
    > > - type-tagged references to other objects
    > > - other type-tagged data
    > > [..]
    > >
    > > Any feelings/objections/missed requirements?
    >
    > In the current DEP14/DEP18 discussions a lot of discussion was had about how we
    > should represent Debian things in git; your mail also goes into this direction.
    >
    > My *feeling* is we should do the opposite - that is, represent less Debian stuff in git,
    > and especially do it in less Debian-specific ways. IOW, no git extensions, no setup
    > with multiple branches that contain more or less unrelated things, etc.
    >
    > I think we should move more towards a setup that is easily understood by people
    > not closely following our Debian-specific things. We should avoid surprising things,
    > again that would include the multiple branches and any git extensions.
    >
    > Before pushing for new ways of representing Debian stuff in git, I think it would be a
    > good idea to learn from all the other distros and distro-like systems successfully
    > using git [1]. Debian is not the only distro that wants to use git to capture changes
    > and encourage contributions to its packages.

    On the other side (perhaps), git is increasingly being used in the Ops setting for
    DevOps and DevSecOps. Production configurations for high-value applications are
    moving to storing those configurations into git for tracing and audit. Git is an
    enabler for good production operations practices. My $0.02 (and my customers')

    --Randall

  • From Jeremy Stanley@21:1/5 to rsbecker@nexbridge.com on Thu Aug 22 02:20:01 2024
    On 2024-08-21 19:11:40 -0400 (-0400), rsbecker@nexbridge.com wrote:
    [...]
    > On the other side (perhaps), git is increasingly being used in the
    > Ops setting for DevOps and DevSecOps. Production configurations
    > for high-value applications are moving to storing those
    > configurations into git for tracing and audit. Git is an enabler
    > for good production operations practices. My $0.02 (and my
    > customers')

    This is nothing new though. Long before Git existed, before people
    started using terms like DevOps, it was fairly typical for sysadmins
    (that's what we called ourselves back then) to track the entirety of
    /etc in RCS. Yes, having an auditable change history for your
    configuration is useful, but Git didn't invent that. Git has merely
    supplanted all prior version control systems, for this use case as
    well as others.
    --
    Jeremy Stanley

  • From Aurélien COUDERC@21:1/5 to All on Thu Aug 22 19:20:01 2024
    Hi!

    On Wednesday, 21 August 2024, 23:37:38 CEST, Chris Hofstaedtler wrote:
    > Hi Simon,
    >
    > * Simon Richter <sjr@debian.org> [240820 09:11]:
    > > One of the long-standing issues is that there are multiple ways Debian
    > > packaging can be represented in a git tree, and none of them are optimal.

    […]

    > > Any feelings/objections/missed requirements?
    >
    > In the current DEP14/DEP18 discussions a lot of discussion was had
    > about how we should represent Debian things in git; your mail also
    > goes into this direction.

    In the Qt/KDE Team (~600-700 source packages) we’ve taken the complete
    opposite approach. We keep debian/-only repos in salsa and don’t put the
    upstream source in git anywhere, only in the uploads to the archive.

    Updating a package to a new upstream version is then as simple as a new
    changelog entry, and uscan / dpkg-buildpackage / sbuild handle the rest
    for us.

    I personally think it’s crazy / not a good use of my time to try and mix
    both upstream and packaging history in the same repo and try to make git
    dance around that when handling new upstream releases. The extents of the
    ongoing d-devel discussions on the topic tend to reinforce that feeling.

    Keeping debian and upstream changes separate is a nice feature.

    I’d even qualify the debian-only workflow as essential for packages with
    large source trees like Qt WebEngine that embeds Chromium. The
    source-included workflows add orders of magnitude of overhead in this kind
    of situation. (For some value of $fun, try cloning the mesa or Firefox
    repos from a sloppy Internet connection for a packaging analysis or an
    occasional contribution.)


    Happy hacking,
    --
    Aurélien

  • From Blair Noctis@21:1/5 to Simon Richter on Thu Aug 22 21:30:01 2024
    On 2024-08-20 15:10, Simon Richter wrote:
    (...)
    > Right now, git is used mainly as a network file system, and only tagged
    > releases are expected to be consistent enough to compile, because often
    > going from one consistent state to another as an atomic operation would
    > require multiple changes to be applied in the same commit.
    >
    > The imported archive is represented either directly as a tree (which may
    > be imported from the upstream project if no files are undistributable
    > for Debian), or via a mechanism that can reproduce a compressed archive
    > that is bitwise identical to the upstream release, from a tree and some
    > additional patch data.
    >
    > The patch stack is stored as a set of patches inside a directory, and
    > rebased using quilt.
    >
    > An alternate representation stores the patch stack as a branch that is
    > rebased using git, and then exported to single files.
    >
    > The Debian changelog is stored as a file inside Git, but some automation
    > exists to update this from Git commit messages.
    >
    > Debian changelog entries refer to bugs in the Debian Bug Tracking
    > system. There is a desire to also incorporate forges (currently, GitLab)
    > and refer to the forges' issue tracker from commit messages (where the
    > issue tracker is used for team collaboration, while the Debian BTS is
    > used for user-visible bugs).
    >
    > All of this is very silly, because we're essentially storing metadata as
    > data because we cannot express in Git what we're actually doing, and the
    > conflicting priorities people have have led to conflicting solutions.
    >
    > I'd like to xkcd 927 this now, and find a common mapping.

    Here's my very likely very naive 2 cents: we are basically maintaining a
    fork for each non-native package.

    Being a fork, a "Debianized" package can also live like other "upstream" forks: with its own branch based on the original, make necessary changes
    and record them as commits; merge original onto its own branch, dealing
    with conflicts; maintain its own changelog; rinse and repeat.

    Debian-specific metadata can be represented structurally in commit
    messages, or if necessary, (still) in a plain debian/ subdirectory that
    won't conflict with upstream.
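
One way to read "represented structurally in commit messages" is a block of `Key: value` trailers at the end of the message. The sketch below is a simplified parser under that assumption; the keys `Closes` and `Forwarded`, the bug number and the URL are made-up examples, and the real rules of `git interpret-trailers` are more involved than this.

```python
import re

# A trailer line is "Key: value"; keys are letters, digits and hyphens.
TRAILER = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.+)$")


def parse_trailers(message):
    """Return (key, value) pairs from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []  # subject line only, so there is no trailer block
    pairs = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER.match(line.strip())
        if not match:
            return []  # the last paragraph is prose, not trailers
        pairs.append((match.group(1), match.group(2)))
    return pairs


message = """Fix build with GCC 14

Add a missing include.

Closes: #1234567
Forwarded: https://example.org/upstream/pull/1
"""
print(parse_trailers(message))
```

A tool could walk the packaging branch and collect such pairs instead of reading them from files under debian/.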

    Then,

    > From a requirements perspective, I'd like to be able to
    >
    >  - express patches as commits:
    >    - allow cherry-picking upstream commits as Debian patches
    >    - allow cherry-picking Debian patches for upstream submission
    >  - generate the Debian changelog from changes committed to Git
    >  - express filter steps for generating the upstream archive(s) from a
    >    tree-ish and some metadata
    >  - store upstream signatures inside Git
    >  - keep a history of patches, including patches applied to previously
    >    released packages

    these are naturally met; and

    (...)
    > Changes to packaging would still be represented as commit objects
    > containing a tree, but that tree would contain a special entry for the
    > "debian" subdirectory that points to the last packaging change.

    is no longer needed.

    > This is very high-level so far, because I'd like to get some feedback
    > first on whether it makes sense to pursue this further. This would use
    > up the last unused three-bit object type in Git, so it will have to be
    > very generic on this side to not block future development -- and it
    > would require a lot of design effort on the Debian side as well to
    > hammer out the details.

    Even less thought out, but probably easier to implement once the design
    is finished. ;)

    --
    Sdrager,
    Blair Noctis


  • From Marco d'Itri@21:1/5 to coucouf@coucouf.fr on Thu Aug 22 23:20:01 2024
    On Aug 22, Aurélien COUDERC <coucouf@coucouf.fr> wrote:

    > I personally think it’s crazy / not a good use of my time to try and
    > mix both upstream and packaging history in the same repo and try to
    > make git dance around that when handling new upstream releases. The
    > extents of the ongoing d-devel discussions on the topic tend to
    > reinforce that feeling.

    Oh well. FWIW I think it's crazy and not a good use of my time to NOT
    have the complete upstream history in the same repository that I use for
    Debian packaging. :-)

    > (For some value of $fun, try cloning the mesa or
    > Firefox repos from a sloppy Internet connection for a packaging
    > analysis or an occasional contribution.)

    But --depth 1 should work around this.

    --
    ciao,
    Marco

  • From Sean Whitton@21:1/5 to Simon Richter on Fri Aug 23 04:30:01 2024
    Hello,

    I think that more than you realise of this already exists :)

    On Tue 20 Aug 2024 at 04:10pm +09, Simon Richter wrote:

    > From a requirements perspective, I'd like to be able to
    >
    > - express patches as commits:
    >   - allow cherry-picking upstream commits as Debian patches
    >   - allow cherry-picking Debian patches for upstream submission

    git-debrebase and git-dpm already achieve this.

    > - express filter steps for generating the upstream archive(s) from a
    >   tree-ish and some metadata

    Files-Excluded in d/copyright is for this.
    I guess that you disprefer that because it's part of the tree, though.
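
As a rough illustration of what such a filter step does, the sketch below drops paths matching a list of glob patterns, in the spirit of the Files-Excluded field in debian/copyright. The function name, the file list and the patterns are invented for the example, and `fnmatchcase` only approximates the matching that the real tooling performs.

```python
from fnmatch import fnmatchcase


def filter_upstream(paths, excluded_patterns):
    """Drop every path that matches one of the exclusion globs."""
    return [
        path
        for path in paths
        if not any(fnmatchcase(path, pattern) for pattern in excluded_patterns)
    ]


# Hypothetical upstream file list and exclusion globs.
upstream = ["src/main.c", "docs/rfc1234.txt", "vendor/blob.bin", "README"]
excluded = ["docs/rfc*.txt", "vendor/*"]
print(filter_upstream(upstream, excluded))
```

Because the patterns live in the tree, the same commit that imports a release also records how the repacked archive was derived from it.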

    > - store upstream signatures inside Git

    Well, there's signatures on their tags.

    > - keep a history of patches, including patches applied to previously
    >   released packages

    This is already there with git-debrebase and git-dpm, though it is a bit
    fiddly to dig it out.

    --
    Sean Whitton
