• bootstrapping /usr-merged systems (was: Re: DEP 17: Improve support fo

    From Helmut Grohne@1:229/2 to Luca Boccassi on Wed May 17 11:40:01 2023
    XPost: linux.debian.devel
    From: helmut@subdivi.de

    Hi,

    This bootstrap aspect got me and I discussed this with a number of
    people and did some research.

    On Sun, May 07, 2023 at 12:51:21PM +0100, Luca Boccassi wrote:
    I don't think this is true? At least not in the broader sense: if you
    compile something on Debian, it will obviously get linked against
    libraries and dependencies as they are in Debian.
    Perhaps what you mean is that, given an entire separate sysroot-like
    tree, passing the appropriate compiler and linker flags and
    environment variables, you can use the local compiler we ship to build
    'foreign' programs. That is true, but again it requires setting up the
    environment appropriately, including linker flags. And the caller
    needs to ensure the environment, including linker flags, is
    appropriate for the target environment (I guess 'host' environment, in
    GNU parlance). Therefore, I don't think it would be unreasonable to
    require that if the target environment is split-usr, then the caller
    also needs to specify an appropriate
    '-Wl,--dynamic-linker=/lib/ld-whatever' option.

    Given the feedback, I am convinced that changing PT_INTERP is a stupid
    idea regardless of whether it is technically feasible. There must be a
    better way. Let's step back a bit.

    The underlying problem here is performing the initial filesystem
    bootstrap. The semantics of this are a bit vague as they are not spelled
    out in policy, so we will have to derive them from implementations.

    I think the major players are (in descending popularity):
    * debootstrap
    * mmdebstrap
    * cdebootstrap
    * multistrap

    multistrap predates mmdebstrap and when there was no mmdebstrap, I used
    it a lot. When attempting to test it, I totally couldn't convince it to
    bootstrap from an unsigned or locally signed repository. The patch in
    #908451 didn't cut it. I also note that it creates a /lib64 -> /lib
    symbolic link which feels quite incompatible with merged-/usr. For
    these reasons, I am dropping multistrap from the tools under
    consideration and recommend removing it from the archive. If you happen
    to use multistrap, now would be a good moment to tell me. Personally,
    all of my use cases of multistrap have been converted to mmdebstrap and
    that made a lot of things simpler.

    cdebootstrap vaguely works though unsigned operation seems dysfunctional
    as it runs apt-get update during cdebootstrap-helper-apt.postinst and
    that fails. I happen to not have figured out why and treat this failure
    as a success.

    So the most popular implementations quite evidently are debootstrap and
    mmdebstrap and both "just work". I note though that they work quite
    differently:
    * debootstrap (depending on flags including --variant) pre-merges its
    chroot while mmdebstrap relies on packages doing it.

    I think that the question whether a distribution is merged is a
    property of the distribution and not the bootstrap tool, so I
    strongly recommend following mmdebstrap's view on this. The
    debootstrap way means that we have to include patches for every
    derivative, which is a process that does not scale well.

    * mmdebstrap operates in two phases. It first unpacks and configures a
    rather minimal set of packages and then proceeds to adding packages
    passed to --include in a second phase once essential is fully
    configured while debootstrap immediately unpacks everything.

    I think the debootstrap approach is slightly worse here, because it
    means that preinst scripts of non-essential packages cannot rely on
    essential packages having been configured.

    In any case, we have to deal with both behaviours.
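The ordering difference described above can be sketched as a toy event model. This is only an illustration under stated assumptions: the function names and the package lists are hypothetical, and neither function reflects the real implementation of debootstrap or mmdebstrap. It merely encodes "unpack everything, then configure" versus "configure essential first, then add the rest".

```python
def one_pass(essential, extra):
    """Unpack every package first, then configure (toy version of the
    behaviour attributed to debootstrap in the message above)."""
    events = [("unpack", p) for p in essential + extra]
    events += [("configure", p) for p in essential + extra]
    return events


def two_phase(essential, extra):
    """Unpack and configure the essential set, then handle the extra
    packages (toy version of the behaviour attributed to mmdebstrap)."""
    events = [("unpack", p) for p in essential]
    events += [("configure", p) for p in essential]
    events += [("unpack", p) for p in extra]
    events += [("configure", p) for p in extra]
    return events


def essential_ready_at_preinst(events, essential, pkg):
    """A preinst runs just before its package is unpacked. Return True if
    all essential packages were configured before that point."""
    idx = events.index(("unpack", pkg))
    configured = {p for step, p in events[:idx] if step == "configure"}
    return set(essential) <= configured


# Illustrative package names only.
essential = ["base-files", "dash", "libc6"]
extra = ["hello"]
print(essential_ready_at_preinst(two_phase(essential, extra), essential, "hello"))
print(essential_ready_at_preinst(one_pass(essential, extra), essential, "hello"))
```

In this model only the two-phase order lets the preinst of a non-essential package rely on a configured essential set.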

    After this little excursion into bootstrap technology, let's go back to
    the /usr-merge and its effects.

    I think at this point, we have quite universal consensus about the goal
    of moving files to their canonical location (i.e. from / to /usr) as a
    solution to the aliasing problems while we do not have consensus on
    precisely how to do this (i.e. with changing dpkg or without). If you
    believe that this is not consensus, please speak up.

    So in a distant future our packages will not contain any files in /bin
    or /lib. In particular, this affects /bin/sh and the dynamic loader,
    both of which are required to run maintainer scripts, which are
    currently required for creating the symbolic links. Boom.
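A minimal sketch of the problem, assuming a toy chroot built in a temporary directory: the shell exists only at its canonical /usr/bin location, and a "#!/bin/sh" maintainer script can only start if /bin/sh resolves. The helper names are hypothetical and the "shell" is a placeholder file; nothing here runs dpkg or a real maintainer script.

```python
import os
import tempfile
from pathlib import Path


def make_chroot(root: Path, with_compat_symlinks: bool) -> None:
    """Create a toy tree where sh lives only in /usr/bin."""
    (root / "usr/bin").mkdir(parents=True)
    (root / "usr/bin/sh").write_text("placeholder for a shell\n")
    if with_compat_symlinks:
        # Relative target so that the link resolves inside the toy root.
        os.symlink("usr/bin", root / "bin")


def can_start_maintainer_scripts(root: Path) -> bool:
    """A '#!/bin/sh' script needs <root>/bin/sh to resolve."""
    return (root / "bin/sh").exists()


def demo() -> dict:
    """Map 'compat symlink present?' to 'can /bin/sh be found?'."""
    results = {}
    for with_links in (False, True):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            make_chroot(root, with_links)
            results[with_links] = can_start_maintainer_scripts(root)
    return results


print(demo())
```

Without the symlink the lookup fails, which is the chicken-and-egg situation: the script that would create the link cannot be started.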

    Solutions have been proposed to this and I think they all fall into one
    of the following four categories.

    1. Don't move. We just keep those files that require a particular
    location (such as /bin/sh or the dynamic loader) in their
    non-canonical location. As such, maintainer scripts will be able to
    run and perform the conversion to symbolic links afterwards.

    2. Move and ship links. Since we unpack all essential data.tar before
    running the first maintainer script, having one package contain the
    compatibility symlinks is enough to fix the problem.

    3. Move and avoid using non-canonical locations. This is the approach
    where we write maintainer scripts as #!/usr/bin/sh and considered
    changing PT_INTERP.

    4. Change the bootstrap protocol. In essence, this has been attempted
    in debootstrap by creating these symlinks prior to unpack, but no
    consensus has evolved around this approach yet. The category is

    [continued in next message]

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From =?utf-8?Q?Bj=C3=B8rn_Mork?=@1:229/2 to Marco d'Itri on Fri Jun 9 13:10:01 2023
    XPost: linux.debian.devel
    From: bjorn@mork.no

    Marco d'Itri <md@Linux.IT> writes:

    as we all know every Debian maintainer can veto any systemic changes
    that they do not like.

    I don't think /usr-merge would have happened if this was true. And
    I believe you know that very well.

    I find your remark disrespectful. And I'm trying hard to assume good
    faith here. Please help me. What are you trying to achieve by it? Was
    it meant as a joke? If so, then it was a bit misplaced I'm afraid.

    Maybe you should re-read https://www.debian.org/code_of_conduct ?


    Bjørn

  • From Marco d'Itri@1:229/2 to bjorn@mork.no on Fri Jun 9 13:40:01 2023
    XPost: linux.debian.devel
    From: md@Linux.IT

    On Jun 09, Bjørn Mork <bjorn@mork.no> wrote:

    as we all know every Debian maintainer can veto any systemic changes
    that they do not like.
    I don't think /usr-merge would have happened if this was true. And
    I believe you know that very well.
    Actually merging /usr happened in a suboptimal way because I had to work
    around this lack of collaboration, so yes: this is true.
    It happened, but despite the vetoes.

    --
    ciao,
    Marco

  • From Sven Joachim@1:229/2 to Sven Joachim on Sat Jun 10 09:00:01 2023
    XPost: linux.debian.devel
    From: svenjoac@gmx.de

    On 2023-06-10 08:35 +0200, Sven Joachim wrote:

    On 10.06.2023 at 07:35, Helmut Grohne wrote:

    One of the approaches to making bootstrapping work was adding the
    symlinks to some data.tar. That has been category 2 from my earlier
    mail. We definitely cannot add /bin as a directory to one package and
    /bin as a symlink to another (unless using diversions), because the
    resulting behaviour is dependent on the unpack order when used with
    dpkg. Also any bootstrap tool that unpacks with tar -k (such as
    debootstrap) requires changes to support this. So this pretty much
    precludes completing the transition in a way that just unpacking all
    data.tar of essential packages gives you a working chroot. In effect,
    this requires a proposal to change the bootstrap protocol (category 4)
    in order to make sense.

    There is a loophole that I ignored here. While /bin cannot be both a
    directory and a symlink at the same time, we can upgrade it. So if we
    somehow managed to get one and only one package to contain /bin as a
    directory, we could upgrade that to a symlink.

    I think the goal should be to get to this state eventually.

    Unfortunately, any
    external package that still ships stuff in /bin breaks this. In effect,
    any addon repository or old package can break your system.

    You lost me. We have converted /bin to a symlink already, have many
    packages that ship files there and yet our systems do not break. Could
    you please elaborate?

    Thinking about it once more, I understand that unpacking old or external
    packages during the bootstrap phase could break it if those are unpacked
    before the package shipping the /bin symlink, and this is what you meant.
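The order dependence can be illustrated with a toy unpacker that, loosely like tar -k, never replaces a path that already exists. Everything below is an assumption made for illustration: the member lists stand in for data.tar contents, "base-files shipping /bin as a symlink" is the hypothetical end state from this thread, and the old package is invented. Real tar and dpkg behave in more nuanced ways.

```python
import os
import tempfile
from pathlib import Path

# Toy "data.tar" contents as ordered (path, kind, payload) members.
BASE_FILES = [
    ("usr", "dir", None),
    ("usr/bin", "dir", None),
    ("bin", "symlink", "usr/bin"),
]
OLD_PKG = [
    ("bin", "dir", None),
    ("bin/foo", "file", "old tool\n"),
]


def unpack_keep_existing(root: Path, members) -> None:
    """Extract members, skipping any path that is already present."""
    for rel, kind, payload in members:
        dest = root / rel
        if os.path.lexists(dest):
            continue  # keep whatever was unpacked first
        if kind == "dir":
            dest.mkdir()
        elif kind == "symlink":
            os.symlink(payload, dest)
        else:
            dest.write_text(payload)


def bin_type_after(order) -> str:
    """Unpack the packages in the given order and report what /bin is."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for members in order:
            unpack_keep_existing(root, members)
        return "symlink" if (root / "bin").is_symlink() else "directory"


print(bin_type_after([BASE_FILES, OLD_PKG]))
print(bin_type_after([OLD_PKG, BASE_FILES]))
```

With the symlink-shipping package first, the old file lands in /usr/bin through the link; with the old package first, /bin is created as a real directory and the symlink is never put in place.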

    Cheers,
    Sven

  • From Sven Joachim@1:229/2 to All on Sat Jun 10 08:40:01 2023
    XPost: linux.debian.devel
    From: svenjoac@gmx.de

    On 10.06.2023 at 07:35, Helmut Grohne wrote:

    One of the approaches to making bootstrapping work was adding the
    symlinks to some data.tar. That has been category 2 from my earlier
    mail. We definitely cannot add /bin as a directory to one package and
    /bin as a symlink to another (unless using diversions), because the
    resulting behaviour is dependent on the unpack order when used with
    dpkg. Also any bootstrap tool that unpacks with tar -k (such as
    debootstrap) requires changes to support this. So this pretty much
    precludes completing the transition in a way that just unpacking all
    data.tar of essential packages gives you a working chroot. In effect,
    this requires a proposal to change the bootstrap protocol (category 4)
    in order to make sense.

    There is a loophole that I ignored here. While /bin cannot be both a
    directory and a symlink at the same time, we can upgrade it. So if we
    somehow managed to get one and only one package to contain /bin as a
    directory, we could upgrade that to a symlink.

    I think the goal should be to get to this state eventually.

    Unfortunately, any
    external package that still ships stuff in /bin breaks this. In effect,
    any addon repository or old package can break your system.

    You lost me. We have converted /bin to a symlink already, have many
    packages that ship files there and yet our systems do not break. Could
    you please elaborate?

    Cheers,
    Sven

  • From Sven Joachim@1:229/2 to Helmut Grohne on Sat Jun 10 19:30:01 2023
    XPost: linux.debian.devel
    From: svenjoac@gmx.de

    On 2023-06-10 10:39 +0200, Helmut Grohne wrote:

    Hi Sven,

    On Sat, Jun 10, 2023 at 08:35:44AM +0200, Sven Joachim wrote:
    Unfortunately, any
    external package that still ships stuff in /bin breaks this. In effect,
    any addon repository or old package can break your system.

    You lost me. We have converted /bin to a symlink already, have many
    packages that ship files there and yet our systems do not break. Could
    you please elaborate?

    I'm sorry. I see how I am mixing up use cases all the time. What is
    broken here is smooth upgrades (or package removal). Let me add detail.

    dpkg has two kinds of filesystem resources. These are owned objects and
    shared objects. A regular file usually is owned by one and only one
    package. A directory is often shared between multiple packages. A
    regular file can also be shared between multiple (Multi-Arch: same)
    instances of the same package. So whenever a shared object is dropped
    from a package (due to upgrading or removing the package), dpkg
    checks whether this shared object now is unreferenced. If that happens,
    it actually deletes it from the filesystem.

    So we kinda need to distinguish the actual filesystem view from the dpkg
    database view in this discussion. While the filesystem can now (since
    bookworm) be assumed to always have the symlinks, dpkg has a (shared)
    object there. It doesn't track the type yet (though Guillem is
    working[1] on that).

    Now we imagine a situation where we managed to get past this transition
    somehow and the end state is that no package in trixie ships /bin other
    than base-files, which ships it as a symlink.

    That is what I would perceive as the goal. In this case, the /bin
    symlink is safe from being removed by dpkg since it is owned by a
    package. Or am I missing something?

    Or maybe we finished the
    transition by having no package ship /bin and we modified the bootstrap
    protocol to create the symlinks in another way.

    I do not think this is possible, for the two reasons you gave below.

    There are two use cases that are at risk now:

    * You have some old bookworm package around that still ships a file in
    /bin. You no longer need this package and remove it. Since this was
    the last package (on your system) to contain /bin (in data.tar), dpkg
    observes that /bin can go away and deletes your symlink. Boom.

    * You have some external repository that contains a package which still
    ships something in /bin. At some point the vendor got the message
    about moving files and moves them to /usr/bin and this - again - is
    when your /bin symlink vanishes during the package upgrade.

    Cheers,
    Sven

  • From Luca Boccassi@1:229/2 to Helmut Grohne on Sat Jun 10 21:00:01 2023
    XPost: linux.debian.devel
    From: bluca@debian.org

    On Sat, 10 Jun 2023 at 09:40, Helmut Grohne <helmut@subdivi.de> wrote:
    My takeaway here is that while I see the protective diversion as the
    "obviously superior solution", this clearly is not consensus at this
    time. It also means that when rewriting DEP 17, I need to spend quite a
    bit of text on rationale. Thank you.

    I would caution to avoid interpreting clarifying questions being asked
    as dissent. It's good to ask questions and clarify details about
    corner cases, but I wouldn't automatically write them down as
    disagreement. At least that's my reading of recent parts of this
    thread.

    Kind regards,
    Luca Boccassi

  • From Helmut Grohne@1:229/2 to Sven Joachim on Sat Jun 10 10:50:02 2023
    XPost: linux.debian.devel
    From: helmut@subdivi.de

    Hi Sven,

    On Sat, Jun 10, 2023 at 08:35:44AM +0200, Sven Joachim wrote:
    Unfortunately, any
    external package that still ships stuff in /bin breaks this. In effect,
    any addon repository or old package can break your system.

    You lost me. We have converted /bin to a symlink already, have many
    packages that ship files there and yet our systems do not break. Could
    you please elaborate?

    I'm sorry. I see how I am mixing up use cases all the time. What is
    broken here is smooth upgrades (or package removal). Let me add detail.

    dpkg has two kinds of filesystem resources. These are owned objects and
    shared objects. A regular file usually is owned by one and only one
    package. A directory is often shared between multiple packages. A
    regular file can also be shared between multiple (Multi-Arch: same)
    instances of the same package. So whenever a shared object is dropped
    from a package (due to upgrading or removing the package), dpkg
    checks whether this shared object now is unreferenced. If that happens,
    it actually deletes it from the filesystem.
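The reference tracking described above can be sketched with a small model. It is an assumption-laden toy: the class, its fields and the package names are hypothetical and do not mirror dpkg's real data structures, and details such as dpkg refusing to remove non-empty directories are ignored.

```python
from collections import defaultdict


class ToyDpkgDb:
    """Toy model of which packages reference which paths."""

    def __init__(self):
        self.owners = defaultdict(set)  # path -> packages that list it
        self.filesystem = set()         # paths that exist on disk

    def install(self, package, paths):
        for path in paths:
            self.owners[path].add(package)
            self.filesystem.add(path)

    def remove(self, package):
        """Drop the package's references and delete every path that no
        remaining package references. Returns the deleted paths."""
        deleted = []
        # Reverse sort so that files are handled before their directory.
        for path in sorted(self.owners, reverse=True):
            if package in self.owners[path]:
                self.owners[path].discard(package)
                if not self.owners[path]:
                    del self.owners[path]
                    self.filesystem.discard(path)
                    deleted.append(path)
        return deleted


db = ToyDpkgDb()
db.install("pkg-a", ["/usr/share/toy", "/usr/share/toy/a.txt"])
db.install("pkg-b", ["/usr/share/toy", "/usr/share/toy/b.txt"])
print(db.remove("pkg-a"))  # the shared directory is still referenced
print(db.remove("pkg-b"))  # the last reference goes, so does the directory
```

The shared directory survives the first removal because pkg-b still lists it, and disappears with the second.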

    So we kinda need to distinguish the actual filesystem view from the dpkg
    database view in this discussion. While the filesystem can now (since
    bookworm) be assumed to always have the symlinks, dpkg has a (shared)
    object there. It doesn't track the type yet (though Guillem is
    working[1] on that).

    Now we imagine a situation where we managed to get past this transition
    somehow and the end state is that no package in trixie ships /bin other
    than base-files, which ships it as a symlink. Or maybe we finished the
    transition by having no package ship /bin and we modified the bootstrap
    protocol to create the symlinks in another way. There are two use cases
    that are at risk now:

    * You have some old bookworm package around that still ships a file in
    /bin. You no longer need this package and remove it. Since this was
    the last package (on your system) to contain /bin (in data.tar), dpkg
    observes that /bin can go away and deletes your symlink. Boom.

    * You have some external repository that contains a package which still
    ships something in /bin. At some point the vendor got the message
    about moving files and moves them to /usr/bin and this - again - is
    when your /bin symlink vanishes during the package upgrade.

    So at this time, I think we basically have three ways of dealing with
    this:

    1. Add a protective diversion for the affected locations (and keep that
    until forky at least).
    2. Ship the affected symlinks as directories in some essential package
    until we are sure that no package ships these directories (even in
    external repositories).
    3. Modify dpkg in some way to handle this case.
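The first risk case and the effect of option 2 can be sketched in a toy model that keeps the filesystem view and the database view apart. All names ("oldpkg", "essential-pkg") are hypothetical, the database deliberately records paths without types, and options 1 and 3 (diversions, changes to dpkg) are not modelled here.

```python
def remove_package(db, filesystem, package):
    """Toy removal: drop the package's references, then delete every
    path that no remaining package lists."""
    for path in list(db):
        db[path].discard(package)
        if not db[path]:
            del db[path]
            filesystem.pop(path, None)


def bin_survives_removal(essential_ships_bin: bool) -> bool:
    # Filesystem view: /bin is already a symlink on a merged-/usr system,
    # so the old package's file physically lives in /usr/bin.
    filesystem = {"/bin": "symlink -> usr/bin", "/usr/bin/oldtool": "file"}
    # Database view: the old package still lists the aliased paths.
    db = {"/bin": {"oldpkg"}, "/bin/oldtool": {"oldpkg"}}
    if essential_ships_bin:
        # Option 2: an essential package keeps a reference to /bin.
        db["/bin"].add("essential-pkg")
    remove_package(db, filesystem, "oldpkg")
    return "/bin" in filesystem


print(bin_survives_removal(False))
print(bin_survives_removal(True))
```

As long as some installed package still references /bin in the database, removing the last old package does not take the symlink with it in this model.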

    I hope this made things more clear. Also note that this mail is purely
    concerned with dpkg package operations and entirely ignores the
    bootstrap use case.

    My takeaway here is that while I see the protective diversion as the
    "obviously superior solution", this clearly is not consensus at this
    time. It also means that when rewriting DEP 17, I need to spend quite a
    bit of text on rationale. Thank you.

    Helmut

    [1] https://wiki.debian.org/Teams/Dpkg/Spec/MetadataTracking

  • From Timo =?utf-8?Q?R=C3=B6hling?=@1:229/2 to All on Sun Jun 11 23:20:01 2023
    XPost: linux.debian.devel
    From: roehling@debian.org

    * Luca Boccassi <bluca@debian.org> [2023-06-10 19:54]:
    I would caution to avoid interpreting clarifying questions being asked
    as dissent. It's good to ask questions and clarify details about
    corner cases, but I wouldn't automatically write them down as
    disagreement. At least that's my reading of recent parts of this
    thread.

    This is also my understanding. And for the record, I want to
    emphasize that I am very much in favor of the plan that Helmut came
    up with, for a number of reasons:

    [Full disclosure: I had a few in-person discussions with Helmut in
    Hamburg last month, so I am probably somewhat biased by now.]


    1. Helmut has shown experimentally that his transition plan can
    work. There are always unknown unknowns, of course, but at the very
    least, we do not have to break any use-cases intentionally.

    2. The transition will leave us in a well-defined state post-trixie
    without the need to add (and continue to maintain) any crutches
    (or "special cases") for dpkg.

    3. Almost all problematic cases can be dealt with by some black
    magic in a single usrmerge-support package. It is not pretty, but it
    will get the job done; a bunch of trickery to make dpkg do the Right
    Thing despite its incomplete knowledge of aliased paths.

    4. We will be able to detect the few cases where the Right Thing does
    not happen transparently, and we can even give advance warning to
    affected package maintainers what they should and should not do. If
    the maintainers of those packages pre-upload their transitioned
    packages to experimental for some automated tests and verification,
    we can avoid any breakage in unstable and testing.


    Of course, you do not have to take my word for any of this. I am a
    big fan of Helmut's approach with experimental verification and
    data-driven discovery. Have a look at his published test scripts and
    try to poke holes in them. The more people do this, the more
    confidence we can have that this might actually work after all.


    Cheers
    Timo


    --
    ⢀⣴⠾⠻⢶⣦⠀ ╭────────────────────────────────────────────────────╮
    ⣾⠁⢠⠒⠀⣿⡁ │ Timo Röhling │
    ⢿⡄⠘⠷⠚⠋⠀ │ 9B03 EBB9 8300 DF97 C2B1 23BF CC8C 6BDD 1403 F4CA │
    ⠈⠳⣄⠀⠀⠀⠀ ╰────────────────────────────────────────────────────╯

  • From Luca Boccassi@1:229/2 to roehling@debian.org on Tue Jun 27 21:50:03 2023
    XPost: linux.debian.devel
    From: bluca@debian.org

    On Sun, 11 Jun 2023 at 22:17, Timo Röhling <roehling@debian.org> wrote:

    * Luca Boccassi <bluca@debian.org> [2023-06-10 19:54]:
    I would caution to avoid interpreting clarifying questions being asked
    as dissent. It's good to ask questions and clarify details about
    corner cases, but I wouldn't automatically write them down as
    disagreement. At least that's my reading of recent parts of this
    thread.

    This is also my understanding. And for the record, I want to
    emphasize that I am very much in favor of the plan that Helmut came
    up with, for a number of reasons:

    [Full disclosure: I had a few in-person discussions with Helmut in
    Hamburg last month, so I am probably somewhat biased by now.]


    1. Helmut has shown experimentally that his transition plan can
    work. There are always unknown unknowns, of course, but at the very
    least, we do not have to break any use-cases intentionally.

    2. The transition will leave us in a well-defined state post-trixie
    without the need to add (and continue to maintain) any crutches
    (or "special cases") for dpkg.

    3. Almost all problematic cases can be dealt with by some black
    magic in a single usrmerge-support package. It is not pretty, but it
    will get the job done; a bunch of trickery to make dpkg do the Right
    Thing despite its incomplete knowledge of aliased paths.

    4. We will be able to detect the few cases where the Right Thing does
    not happen transparently, and we can even give advance warning to
    affected package maintainers what they should and should not do. If
    the maintainers of those packages pre-upload their transitioned
    packages to experimental for some automated tests and verification,
    we can avoid any breakage in unstable and testing.


    Of course, you do not have to take my word for any of this. I am a
    big fan of Helmut's approach with experimental verification and
    data-driven discovery. Have a look at his published test scripts and
    try to poke holes in them. The more people do this, the more
    confidence we can have that this might actually work after all.

    Hi Helmut,

    Any update on this topic? I believe you were working on a write-up,
    how's that going?

    Kind regards,
    Luca Boccassi
