• Bug#980148: mesa-vulkan-drivers: file content conflict in Multi-Arch:sa

    From Michel Dänzer@21:1/5 to Helmut Grohne on Mon Apr 7 10:40:01 2025
    On 2025-04-04 20:42, Helmut Grohne wrote:
    On Fri, Jan 15, 2021 at 12:06:14PM +0100, Michel Dänzer wrote:
    On 2021-01-15 12:02 p.m., Thorsten Glaser wrote:
    Package: mesa-vulkan-drivers
    […]
    Multi-Arch: same

    The file /usr/share/vulkan/icd.d/intel_icd.x86_64.json differs.

    amd64:

    {
    "ICD": {
    "api_version": "1.2.145",
    "library_path": "/usr/lib/x86_64-linux-gnu/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
    }

    x32:

    {
    "ICD": {
    "api_version": "1.2.145",
    "library_path": "/usr/lib/x86_64-linux-gnux32/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
    }

    This file must be moved out of /usr/share and into a multiarch library
    path.

    Looks to me like the filename is wrong on x32.

    How do you reach that conclusion? At least the multiarch tuple is
    appropriate for x32.

    Actually, I think I was referring to the name of the JSON file containing x86_64 instead of something x32 specific.


    --
    Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer https://redhat.com \ Libre software enthusiast

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Helmut Grohne@21:1/5 to All on Mon Apr 7 17:20:02 2025
    On Mon, Apr 07, 2025 at 10:26:58AM +0200, Michel Dänzer wrote:
    Actually, I think I was referring to the name of the JSON file containing x86_64 instead of something x32 specific.

    Right. And there we have the same problem as with ARM32. The filename
    only encodes the CPU, but Debian supports multiple ABIs on the same CPU.

    The question remains whether the filename can be expanded to cover all
    the variance in Debian architectures or whether M-A:same should be
    dropped.

    This also poses a problem when varying the kernel (e.g. Hurd) or the
    libc (e.g. musl). In both cases, I do not expect the name of the JSON
    file to change but the contained tuple will change.
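
    The conflict discussed in this thread (the same path shipped by two architectures with different content) can be checked for mechanically. The following is only a minimal sketch: find_conflicts is a hypothetical helper, and it assumes the two packages have already been unpacked into separate directory trees (for example with dpkg-deb -x). It prints every path that exists in both trees but is not byte-identical, which is the condition under which dpkg refuses to co-install Multi-Arch: same packages.

```shell
#!/bin/sh
# Hypothetical helper, not part of any Debian tool.
# Usage: find_conflicts TREE_A TREE_B
# TREE_A and TREE_B are assumed to be unpacked package trees.
find_conflicts() {
    a=$1
    b=$2
    # List regular files relative to TREE_A, then compare each one
    # with the file at the same relative path in TREE_B.
    (cd "$a" && find . -type f) | while IFS= read -r f; do
        if [ -f "$b/$f" ] && ! cmp -s "$a/$f" "$b/$f"; then
            # Same path, different content: report it without "./".
            echo "${f#./}"
        fi
    done
}
```

    Identical files are not reported, since dpkg can share those between architectures.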

    Helmut

  • From Simon McVittie@21:1/5 to Helmut Grohne on Mon Apr 14 18:40:01 2025
    On Fri, 04 Apr 2025 at 20:42:48 +0200, Helmut Grohne wrote:
    On Fri, Jan 15, 2021 at 12:06:14PM +0100, Michel Dänzer wrote:
    On 2021-01-15 12:02 p.m., Thorsten Glaser wrote:
    Package: mesa-vulkan-drivers
    […]
    Multi-Arch: same

    The file /usr/share/vulkan/icd.d/intel_icd.x86_64.json differs.

    amd64:

    {
    "ICD": {
    "api_version": "1.2.145",
    "library_path": "/usr/lib/x86_64-linux-gnu/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
    }

    x32:

    {
    "ICD": {
    "api_version": "1.2.145",
    "library_path": "/usr/lib/x86_64-linux-gnux32/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
    }
    ...
    A similar problem exists for armel and armhf. Both install /usr/share/vulkan/icd.d/lvp_icd.armv8l.json and /usr/share/vulkan/icd.d/radeon_icd.armv8l.json.
    ...
    Given that these files reference shared libraries, they are inherently architecture-dependent and that makes them technically inappropriate to
    ship below /usr/share.

    Sort of, it depends how you look at them. These manifest files declare that a particular Vulkan driver exists, and some facts about it; that remains a true statement even if Vulkan is currently being initialized by the Vulkan-Loader (libvulkan) from an architecture that cannot load this particular driver. Loaders are expected to be able to recognise that a particular driver is not for them, and gracefully not load it. In practice this works fine, because all of our architectures can be distinguished by their ELF headers (and if that wasn't the case, multiarch co-installation of ordinary shared libraries would go badly wrong).

    The problem here is that Mesa's upstream build system is trying to disambiguate the manifests' filenames in order to avoid collisions, but is doing so with an architecture name that is not sufficiently unique: namely Meson's cpu(), which does not vary between architectures that run on essentially the same hardware and differ only by ABI design choices, like amd64/x32 (word size) and armel/armhf (whether to assume and use hardware floating point support).

    Meson's cpu() also does not distinguish between ABIs that have the same instruction set but different endianness, so I believe we would have a
    similar collision between ppc64 and ppc64el, or between mips and mipsel.

    I can see two ways to resolve #980148 without needing to change the
    search path for Vulkan drivers:

    1. As far as I'm aware, the basename of these files never matters: all
    that matters is their content. So Mesa's debian/rules could do something
    like this (assuming file-rename(1p) from the rename package):

    file-rename 's/(.*)\.([^.]+?)\.json$/$1.$ENV{DEB_HOST_ARCH}.json/' \
    debian/tmp/usr/share/vulkan/icd.d/*.json

    to replace the "x86_64" or "armv8l" part of the filename with a string
    that is definitely distinct for each pair of Debian architectures,
    resulting in filenames like intel_icd.amd64.json and intel_icd.x32.json.

    Or it could use $ENV{DEB_HOST_MULTIARCH} for longer-but-maybe-clearer
    filenames like intel_icd.x86_64-linux-gnu.json, which would be necessary
    if we want to allow mesa-vulkan-drivers:amd64,
    mesa-vulkan-drivers:hurd-amd64 and mesa-vulkan-drivers:kfreebsd-amd64
    to be co-installed.

    Mesa upstream probably will not want to do this because they don't have
    a better taxonomy of architectures than what Meson provides, but it
    would be fairly easy to do in debian/rules between dh_auto_install and
    dh_install, for example with the file-rename(1) invocation above.

    Or, maybe Mesa upstream would be willing to accept a patch adding a
    build option like 'architecture_string', to be used in these filenames
    instead of Meson's cpu() if non-empty, and we could run
    `meson setup -Darchitecture_string="${DEB_HOST_ARCH}"`?
    But that seems like something that would be better done post-trixie.

    2. Or, Mesa could give its Vulkan drivers the same file layout as its
    Vulkan layers (which happens to be the same as the Nvidia proprietary
    driver's Vulkan driver), taking advantage of the fact that on Debian, each
    of its drivers is installed into ld.so's default load path for shared
    libraries. So instead of hard-coding the full path of the library, it could
    set the library_path field to be just the basename, resulting in the same
    JSON content on every architecture:

    {
    "ICD": {
    "api_version": "1.2.145",
    "library_path": "libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
    }

    and then rename the file to a name that is intentionally the same
    for every architecture (like intel_icd.json), so that they *always*
    collide, and dpkg's multiarch refcounting resolves this by only keeping
    one copy.

    Mesa upstream probably will not want to do this by default because they
    have to assume that their users might be installing Mesa into a non-default
    prefix like /opt/mesa25 where their driver library would not be found
    without using an absolute or relative path, but it would be reasonably easy
    to implement this in debian/rules with some file-rename and sed, again
    between dh_auto_install and dh_install.

    Or maybe Mesa upstream would accept a build option to make it switch
    the generated JSON to be this way instead, although, again, that seems
    like something for post-trixie.

    I would recommend choosing (1.) or (2.) but not both: I've seen reports that loading the same Vulkan driver more than once can break things, so we should avoid having an intel_icd.x86_64.json and intel_icd.i686.json that both describe "library_path": "libvulkan_intel.so".
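
    Option 2 above could be sketched roughly as follows. This is an illustration only, not the script attached later in this thread: share_manifest is a hypothetical helper name, the sed expression assumes that "library_path" and its value sit on one line as in the examples quoted above, and the approach only makes sense when the driver library is in ld.so's default search path.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: share_manifest PATH/TO/DRIVER_icd.CPU.json
# The argument is assumed to contain a directory component.
share_manifest() {
    old=$1
    dir=${old%/*}
    base=${old##*/}            # e.g. intel_icd.x86_64.json
    stem=${base%.json}         # e.g. intel_icd.x86_64
    stem=${stem%.*}            # e.g. intel_icd
    new="$dir/$stem.json"

    # Reduce the library_path value to its basename by deleting
    # everything up to and including the last "/" inside the quotes.
    sed 's|\("library_path"[[:space:]]*:[[:space:]]*"\)[^"]*/|\1|' \
        "$old" > "$new.tmp"
    mv -- "$new.tmp" "$new"

    # Remove the CPU-specific name unless it was already the final name.
    [ "$old" = "$new" ] || rm -f -- "$old"
}
```

    A caller would run it once per manifest, for example over debian/tmp/usr/share/vulkan/icd.d/*.json between dh_auto_install and dh_install.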

    Would it be feasible to transition these files
    [from] /usr/share/vulkan/icd.d to /usr/lib/<triplet>/vulkan/icd.d?

    I don't think that would be a great idea. There is a non-Debian-specific specification for how Vulkan drivers and layers are to be discovered, and components outside Debian can and do rely on it (in particular Steam's container runtime framework relies on knowing how to find Vulkan drivers, independent of libvulkan).

    If the search path does need to be varied by architecture, it should preferably be done on an "upstream first" basis, to minimize the number of Debianisms - probably by adding the compile-time LIBDIR to https://github.com/KhronosGroup/Vulkan-Loader/blob/main/docs/LoaderLayerInterface.md#linux-layer-discovery
    and its reference implementation, after EXTRASYSCONFDIR but before $XDG_DATA_HOME. Prior art for this is that the freedesktop.org Flatpak
    runtimes (which have a Debian-like multiarch layout for ${libdir}) patch
    or reconfigure their Vulkan loader to search /usr/lib/${multiarch}/GL/vulkan/whatever.d and /usr/lib/${multiarch}/vulkan/whatever.d at that position in the search
    path. (However, this has resulted in me having to maintain extra code in Steam's container runtime framework to detect this divergence from upstream and compensate for it.)

    If this was done, it's the sort of coordinated transition (mesa + vulkan-loader + possibly others) that we shouldn't be doing for trixie at this stage unless there's no alternative. So I would recommend choosing one of the two strategies I suggested above, or some similar option that doesn't involve a transition.

    smcv

  • From Helmut Grohne@21:1/5 to Simon McVittie on Mon Apr 14 22:00:01 2025
    Hi Simon,

    On Mon, Apr 14, 2025 at 05:23:01PM +0100, Simon McVittie wrote:
    Loaders are expected to be able to recognise that a particular driver is not for them, and gracefully not load it. In practice this works fine, because all
    of our architectures can be distinguished by their ELF headers (and if that wasn't the case, multiarch co-installation of ordinary shared libraries would go badly wrong).

    I'm sorry to disappoint you, but reality is not like that.

    You can actually run kfreebsd-amd64 binaries on a Linux kernel as their
    ELF header looks the same. Not that they do useful stuff, but they may
    go far enough as to reset your system clock. I've actually encountered
    that.

    Then, if you combine armel and armhf, those architectures also have ELF
    headers that are mostly indistinguishable. I'm not sure what happens
    exactly, but it isn't good.

    What also gets interesting is when you try to combine e.g. amd64 and musl-linux-amd64. Those cannot be told apart by their ELF header either.

    The elf-arch tool from arch-test attempts to map ELF headers to Debian architectures, but it can only do so much.

    So no, as long as we support armel and armhf simultaneously, we cannot
    tell architectures apart by their ELF header.
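
    To make the point about ELF headers concrete, here is a small sketch that prints the three header fields most commonly used to tell architectures apart: EI_CLASS (word size), EI_DATA (byte order) and e_machine. The function names are hypothetical and the script does no validation of the ELF magic. For amd64 versus x32 the class differs (2 versus 1) while e_machine is 62 (EM_X86_64) in both. For armel versus armhf all three fields are the same (32-bit, little-endian, e_machine 40 for EM_ARM), so these fields alone cannot separate that pair.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# elf_byte FILE OFFSET: print the unsigned value of one byte of FILE.
elf_byte() {
    od -An -tu1 -j "$2" -N 1 "$1" | tr -d ' \n'
}

# elf_ident FILE: print class, byte order and e_machine of FILE.
elf_ident() {
    file=$1
    class=$(elf_byte "$file" 4)   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    data=$(elf_byte "$file" 5)    # EI_DATA: 1 = little-endian, 2 = big-endian
    b0=$(elf_byte "$file" 18)     # e_machine occupies bytes 18 and 19
    b1=$(elf_byte "$file" 19)
    if [ "$data" = 2 ]; then
        machine=$((b0 * 256 + b1))    # big-endian: high byte first
    else
        machine=$((b1 * 256 + b0))    # little-endian: low byte first
    fi
    echo "class=$class data=$data machine=$machine"
}
```

    Running elf_ident on the same library from two architectures and comparing the output shows quickly whether a loader looking only at these fields could reject the wrong one.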

    The problem here is that Mesa's upstream build system is trying to disambiguate
    the manifests' filenames in order to avoid collisions, but is doing so with an
    architecture name that is not sufficiently unique: namely Meson's cpu(), which
    does not vary between architectures that run on essentially the same hardware and differ only by ABI design choices, like amd64/x32 (word size) and armel/armhf (whether to assume and use hardware floating point support).

    I concur.

    Meson's cpu() also does not distinguish between ABIs that have the same instruction set but different endianness, so I believe we would have a similar collision between ppc64 and ppc64el, or between mips and mipsel.

    And libc!

    I can see two ways to resolve #980148 without needing to change the
    search path for Vulkan drivers:

    1. As far as I'm aware, the basename of these files never matters: all
    that matters is their content. So Mesa's debian/rules could do something
    like this (assuming file-rename(1p) from the rename package):

    file-rename 's/(.*)\.([^.]+?)\.json$/$1.$ENV{DEB_HOST_ARCH}.json/' \
    debian/tmp/usr/share/vulkan/icd.d/*.json

    to replace the "x86_64" or "armv8l" part of the filename with a string
    that is definitely distinct for each pair of Debian architectures,
    resulting in filenames like intel_icd.amd64.json and intel_icd.x32.json.

    Or it could use $ENV{DEB_HOST_MULTIARCH} for longer-but-maybe-clearer
    filenames like intel_icd.x86_64-linux-gnu.json, which would be necessary
    if we want to allow mesa-vulkan-drivers:amd64,
    mesa-vulkan-drivers:hurd-amd64 and mesa-vulkan-drivers:kfreebsd-amd64
    to be co-installed.

    Mesa upstream probably will not want to do this because they don't have
    a better taxonomy of architectures than what Meson provides, but it
    would be fairly easy to do in debian/rules between dh_auto_install and
    dh_install, for example with the file-rename(1) invocation above.

    Or, maybe Mesa upstream would be willing to accept a patch adding a
    build option like 'architecture_string', to be used in these filenames
    instead of Meson's cpu() if non-empty, and we could run
    `meson setup -Darchitecture_string="${DEB_HOST_ARCH}"`?
    But that seems like something that would be better done post-trixie.

    This sounds very reasonable to me. Including the post-trixie part.
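
    For packagers who would rather not depend on file-rename, the same renaming can be sketched in plain POSIX shell. rename_manifests is a hypothetical helper, and the directory and architecture string are passed in explicitly; in debian/rules they would presumably be debian/tmp/usr/share/vulkan/icd.d and $(DEB_HOST_ARCH) or $(DEB_HOST_MULTIARCH). Like the regular expression quoted above, it only touches names of the form NAME.CPU.json.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: rename_manifests DIR ARCH
rename_manifests() {
    dir=$1
    arch=$2
    for old in "$dir"/*.json; do
        [ -e "$old" ] || continue      # glob matched nothing
        base=${old##*/}                # e.g. intel_icd.x86_64.json
        stem=${base%.json}             # e.g. intel_icd.x86_64
        case $stem in
            *.*) stem=${stem%.*} ;;    # drop the CPU part: intel_icd
            *) continue ;;             # no CPU part, leave the file alone
        esac
        new="$dir/$stem.$arch.json"
        # Skip files that already carry the desired name.
        [ "$old" = "$new" ] || mv -- "$old" "$new"
    done
}
```

    For example, rename_manifests debian/tmp/usr/share/vulkan/icd.d x32 would turn intel_icd.x86_64.json into intel_icd.x32.json.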

    2. Or, Mesa could give its Vulkan drivers the same file layout as its
    Vulkan layers (which happens to be the same as the Nvidia proprietary
    driver's Vulkan driver), taking advantage of the fact that on Debian, each
    of its drivers is installed into ld.so's default load path for shared
    libraries. So instead of hard-coding the full path of the library, it could
    set the library_path field to be just the basename, resulting in the same
    JSON content on every architecture:

    {
    "ICD": {
    "api_version": "1.2.145",
    "library_path": "libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
    }

    and then rename the file to a name that is intentionally the same
    for every architecture (like intel_icd.json), so that they *always*
    collide, and dpkg's multiarch refcounting resolves this by only keeping
    one copy.

    Mesa upstream probably will not want to do this by default because they
    have to assume that their users might be installing Mesa into a non-default
    prefix like /opt/mesa25 where their driver library would not be found
    without using an absolute or relative path, but it would be reasonably easy
    to implement this in debian/rules with some file-rename and sed, again
    between dh_auto_install and dh_install.

    Or maybe Mesa upstream would accept a build option to make it switch
    the generated JSON to be this way instead, although, again, that seems
    like something for post-trixie.

    Given what I said earlier about the inability to tell ELF headers apart
    and the real problems observed in trying to do so, I have a preference
    for the first option.

    I don't think that would be a great idea. There is a non-Debian-specific specification for how Vulkan drivers and layers are to be discovered, and components outside Debian can and do rely on it (in particular Steam's container runtime framework relies on knowing how to find Vulkan drivers, independent of libvulkan).

    Fair enough.

    If this was done, it's the sort of coordinated transition (mesa + vulkan-loader
    + possibly others) that we shouldn't be doing for trixie at this stage unless there's no alternative. So I would recommend choosing one of the two strategies
    I suggested above, or some similar option that doesn't involve a transition.

    In general, I doubt we fix this for trixie other than dropping M-A:same
    maybe.

    Helmut

  • From Simon McVittie@21:1/5 to Simon McVittie on Wed Apr 23 21:00:01 2025
    Control: tags -1 + patch

    On Wed, 23 Apr 2025 at 15:59:00 +0100, Simon McVittie wrote:
    I think the "option 2" that I proposed is entirely feasible for trixie, actually. I'm testing an implementation now.

    https://salsa.debian.org/xorg-team/lib/mesa/-/merge_requests/55 works successfully on my Intel system. I haven't verified that it results in co-installable Mesa for armel + armhf, but it should (it makes each pair
    of architectures essentially equivalent to any other pair).

    Patch attached here for convenience; if I see a maintainer response on
    the MR, I will assume that the MR is canonical and stop sending updated
    patches to the bug.

    Or if the mesa maintainers think the change I'm proposing is too
    intrusive at this stage of the release process, the other possibilities
    I see here would be:

    - drop the severity, with the justification that this is a Policy
    violation, but maybe not a bad enough Policy violation to qualify as
    a "serious" one, since dpkg correctly detects the conflict and prevents
    file loss, and multiarch armel + armhf systems are presumably rare in
    practice

    - or ask the release team for a trixie-ignore tag

    For forky, it might be a good idea to talk to Mesa upstream about
    whether Mesa's Vulkan drivers can be made to behave more like its Vulkan
    layers and EGL driver, generating JSON manifests that have simple names
    like `virtio_icd.json` and contain a simple basename like
    `"library_path": "libvulkan_virtio.so"`. But I think that's out of scope
    for trixie.

    Thanks,
    smcv

    From 82eed41b2456c20eac16f86456a8c583753a94f2 Mon Sep 17 00:00:00 2001
    From: Simon McVittie <smcv@debian.org>
    Date: Wed, 23 Apr 2025 15:27:11 +0100
    Subject: [PATCH] Share a single JSON manifest per Vulkan driver between all
    architectures

    Because our Vulkan driver libraries are installed to a directory in the
    dynamic linker's search path, we can list them in their JSON manifest
    by the library's basename rather than its absolute path. This makes the
    content of the JSON manifest the same for each architecture, so it can
    be a single file shared between architectures via dpkg's multiarch file reference-counting.

    This avoids multiarch file collisions between pairs of architectures
    that have the same Meson CPU name but a different library directory,
    such as armel and armhf.

    The JSON manifests for Mesa's EGL driver and Vulkan layers, and for the
    Nvidia proprietary driver's Vulkan driver, are already implemented
    this way.

    Closes: #980148
    Signed-off-by: Simon McVittie <smcv@debian.org>
    ---
    debian/merge-vulkan-driver-manifests.sh | 37 +++++++++++++++++++++++++
    debian/rules | 1 +
    2 files changed, 38 insertions(+)
    create mode 100755 debian/merge-vulkan-driver-manifests.sh

    diff --git a/debian/merge-vulkan-driver-manifests.sh b/debian/merge-vulkan-driver-manifests.sh
    new file mode 100755
    index 00000000000..199867ccc05
    --- /dev/null
    +++ b/debian/merge-vulkan-driver-manifests.sh
    @@ -0,0 +1,37 @@
    +#!/bin/sh
    +# Copyright 2025 Collabora Ltd.
    +# SPDX-License-Identifier: MIT
    +
    +# Usage: debian/merge-vulkan-driver-manifests.sh debian/tmp
    +# DEB_HOST_MULTIARCH must be set in the environment.
    +#
    +# If the JSON manifest describing a Vulkan driver contains for example
    +# "library_path": "/usr/lib/x86_64-linux-gnu/libvulkan_lvp.so"
    +# then replace it with
    +# "library_path": "libvulkan_lvp.so"
    +# to get the same content for each architecture, and rename from for example
    +# "lvp_icd.x86_64.json"
    +# to
    +# "lvp_icd.json"
    +# so that the same JSON manifes