• Bug#1098661: linux: fails to boot on VisionFive 2: Unhandled exception:

    From Bastian Blank@21:1/5 to Aurelien Jarno on Sun Feb 23 21:50:01 2025
    On Sat, Feb 22, 2025 at 11:59:38AM +0100, Aurelien Jarno wrote:
    Starting with version 6.13.3-1~exp1, the riscv64 kernel is shipped as an
    EFI binary with the payload compressed with zstd (using the EFI_ZBOOT
    config option). In addition to breaking non-EFI systems, this change
    simply prevents the kernel from booting on a VisionFive 2 board:

    Please re-assign to the bootloader package.

    Bastian

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Aurelien Jarno@21:1/5 to Bastian Blank on Sun Feb 23 22:10:01 2025
    On 2025-02-23 21:45, Bastian Blank wrote:
    On Sat, Feb 22, 2025 at 11:59:38AM +0100, Aurelien Jarno wrote:
    Starting with version 6.13.3-1~exp1, the riscv64 kernel is shipped as an
    EFI binary with the payload compressed with zstd (using the EFI_ZBOOT
    config option). In addition to breaking non-EFI systems, this change
    simply prevents the kernel from booting on a VisionFive 2 board:

    Please re-assign to the bootloader package.

    I disagree. The bootloader is u-boot and while it might be fixable at
    this level, Debian should be bootable on the original firmware.

    BTW, you never explained the reason for your changes. It only brings a
    smaller kernel, nothing more. And a working kernel is better than a
    smaller kernel that does not work.

    --
    Aurelien Jarno GPG: 4096R/1DDD8C9B aurelien@aurel32.net http://aurel32.net

  • From Aurelien Jarno@21:1/5 to All on Mon Feb 24 19:30:01 2025
    Hi,

    Let me summarize the situation for external reviewers.

    The kernel for riscv64 used to rely on CONFIG_EFI_STUB=y, enabling the
    kernel to be used either as an EFI executable or as a conventional ELF
    file. Unlike x86, this requires the kernel to be uncompressed, which is
    why it was shipped as vmlinux. Note that this is not the only
    architecture where the kernel is uncompressed; this is also the case
    for ppc64el and many ports architectures.

    Commit 16b5ae589a679 ("[arm64, riscv64] Enable EFI_ZBOOT") [1] changed
    three things for riscv64:
    1) Changed the kernel file that ends up in the package from the
    uncompressed one (arch/riscv/boot/Image) to the compressed one
    (arch/riscv/boot/vmlinuz.efi)
    2) Enabled EFI_ZBOOT to compress the kernel payload and include a
    decompressor in the EFI binary
    3) Changed the kernel compression from GZIP to ZSTD

    Note that technically changes 2 and 3 have basically no effect on the
    resulting package without change 1. Please also note that change 1 was
    done without renaming the kernel from vmlinux to vmlinuz to match the
    (probably unwritten) convention of shipping compressed kernels as
    vmlinuz and uncompressed ones as vmlinux. OTOH such a rename would
    probably have broken many things.

    This change was made without checking with the porters and without any
    justification. I quickly noticed the commit and was worried about change
    1, as it basically enforces UEFI booting. Although Debian Installer
    defaults to a UEFI installation with the standard ISO media or UKI
    image, it is technically possible to use a system booting directly from
    U-Boot, which some users prefer (this is particularly useful for
    switching between non-UEFI vendor kernels and Debian kernels). In
    addition, a non-UEFI kernel is important for KVM, as it currently
    doesn't support running in S-mode, therefore requiring a non-UEFI
    kernel to be loaded directly without any firmware.

    As a porter, I asked on IRC for the riscv64 part of the change to be
    reverted. I was told that this was not possible, as Debian Installer does
    not support non-UEFI, that this change would target forky only, and that
    I could simply use a script to extract the payload from the UEFI kernel.
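
    For readers wondering what such an extraction script involves, here is
    a minimal Python sketch. It assumes the EFI zboot header layout used by
    the kernel's EFI stub ("MZ" at offset 0, "zimg" at offset 4,
    little-endian payload offset and size at offsets 8 and 12, and a
    NUL-terminated compression type at offset 24); that layout should be
    checked against the kernel tree before relying on it. The function
    names are hypothetical. Only gzip is decompressed, to stay within the
    Python standard library; a zstd payload is rejected by the
    decompression helper and has to be fed to an external zstd tool.

```python
import gzip
import struct

# Assumed zboot header layout: "MZ", 2 pad bytes, "zimg", payload offset,
# payload size, 8 reserved bytes, 32-byte NUL-padded compression type.
ZBOOT_HEADER = struct.Struct("<2s2x4sII8x32s")


def extract_payload(image: bytes) -> tuple[str, bytes]:
    """Return (compression_type, compressed_payload) from a zboot image."""
    if len(image) < ZBOOT_HEADER.size:
        raise ValueError("image is too short to hold a zboot header")
    mz, zimg, offset, size, comp = ZBOOT_HEADER.unpack_from(image)
    if mz != b"MZ" or zimg != b"zimg":
        raise ValueError("not an EFI zboot image")
    payload = image[offset:offset + size]
    if len(payload) != size:
        raise ValueError("payload is truncated")
    comp_type = comp.split(b"\0", 1)[0].decode("ascii")
    return comp_type, payload


def decompress_payload(comp_type: str, payload: bytes) -> bytes:
    """Decompress a gzip payload; other types need an external tool."""
    if comp_type == "gzip":
        return gzip.decompress(payload)
    raise NotImplementedError(f"no stdlib decompressor for {comp_type!r}")
```

    A caller would read the kernel file, pass its bytes to
    extract_payload(), and either decompress the result or write it out
    for an external decompressor.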

    The situation worsened when I realized that the changes do not even work
    on a real riscv64 board installed using the standard Debian installer:

    | Loading Linux 6.13-riscv64 ...
    | Loading initial ramdisk ...
    | EFI stub: Decompressing Linux Kernel...
    | Unhandled exception: Store/AMO access fault
    | EPC: 00000000fb64a6ea RA: 00000000fb64a6da TVAL: 0000000040020020
    | EPC: 000000003b9046ea RA: 000000003b9046da reloc adjusted
    |
    | Code: 0506 9526 4783 0015 4703 0005 3583 ed84 (0e23 fef9)
    | UEFI image [0x00000000fe6aa000:0x00000000fe6d0fff] '/efi\boot\bootriscv64.efi'
    | UEFI image [0x00000000fb646000:0x00000000fbe933ff] pc=0x46ea
    |
    |
    | resetting ...
    | reset not supported yet
    | ### ERROR ### Please RESET the board ###
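
    The pc=0x46ea value in the trace can be cross-checked by hand: it is
    the faulting EPC minus the load address of the UEFI image that
    contains it. A small Python sketch using only the addresses printed
    above (the helper name locate and the label "second image" are made up
    for illustration, since the log does not name that image):

```python
# Image ranges copied from the two "UEFI image [...]" lines of the trace.
IMAGES = {
    "bootriscv64.efi": (0xFE6AA000, 0xFE6D0FFF),
    "second image": (0xFB646000, 0xFBE933FF),
}


def locate(address: int, images: dict = IMAGES) -> tuple[str, int]:
    """Return (image_name, offset_in_image) for an absolute address."""
    for name, (start, end) in images.items():
        if start <= address <= end:
            return name, address - start
    raise ValueError(f"{address:#x} is not inside any listed image")


for label, value in (("EPC", 0xFB64A6EA), ("RA", 0xFB64A6DA)):
    name, offset = locate(value)
    print(f"{label}: {name} + {offset:#x}")
# EPC: second image + 0x46ea
# RA: second image + 0x46da
```

    Both addresses fall inside the second image, and the EPC offset
    matches the pc=0x46ea printed by U-Boot.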

    Sure, this has been tested, as mentioned in the MR [2], but it appears
    that booting a kernel with QEMU + EDK2 is not comparable to booting a
    kernel with a real board + U-Boot + Grub. I agree that there is an issue
    in the firmware / bootloader / kernel stack (my current wild guess is
    that it's a Grub issue), but that change still currently results in a
    non-working kernel.

    At this stage I have not seen any strong argument for the original
    commit. The reasons that have been given a posteriori are:
    - Smaller images, so often faster load times.
    - Feature parity between architectures.
    - Fulfils the (U)EFI interface and works fine in edk2.

    I don't believe the above reasons are enough to enforce a UEFI-only
    kernel and break the boot on existing boards. In addition, the "forky
    only" argument doesn't hold, as many newer riscv64 devices are expected
    during the lifetime of trixie and will require a kernel from
    trixie-backports. That is why I submitted an MR [3] to revert the
    riscv64-specific part of the commit.

    Regards
    Aurelien


    [1] https://salsa.debian.org/kernel-team/linux/-/commit/16b5ae589a679acbc9e43de9cb691f42fe058068
    [2] https://salsa.debian.org/kernel-team/linux/-/merge_requests/1362
    [3] https://salsa.debian.org/kernel-team/linux/-/merge_requests/1384

    --
    Aurelien Jarno GPG: 4096R/1DDD8C9B aurelien@aurel32.net http://aurel32.net


  • From Aurelien Jarno@21:1/5 to Bastian Blank on Mon Feb 24 23:00:01 2025
    On 2025-02-24 22:32, Bastian Blank wrote:
    On Sat, Feb 22, 2025 at 11:59:38AM +0100, Aurelien Jarno wrote:
    | Loading Linux 6.13-riscv64 ...
    | Loading initial ramdisk ...
    | EFI stub: Decompressing Linux Kernel...
    | Unhandled exception: Store/AMO access fault
    | EPC: 00000000fb64a6ea RA: 00000000fb64a6da TVAL: 0000000040020020
    | EPC: 000000003b9046ea RA: 000000003b9046da reloc adjusted
    |
    | Code: 0506 9526 4783 0015 4703 0005 3583 ed84 (0e23 fef9)
    | UEFI image [0x00000000fe6aa000:0x00000000fe6d0fff] '/efi\boot\bootriscv64.efi'
    | UEFI image [0x00000000fb646000:0x00000000fbe933ff] pc=0x46ea

    I dug a bit. Yes, this is the file from
    linux-image-6.13-riscv64_6.13.3-1~exp1_riscv64.deb. It contains the
    mentioned instructions:

    | 46da: 0506 slli a0,a0,0x1
    | 46dc: 9526 add a0,a0,s1
    | 46de: 00154783 lbu a5,1(a0)
    | 46e2: 00054703 lbu a4,0(a0)
    | 46e6: ed843583 ld a1,-296(s0)
    | 46ea: fef90e23 sb a5,-4(s2)
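
    The listing lines up with the "Code:" line of the trace once the
    16-bit parcels are regrouped: RISC-V instructions are stored as
    little-endian 16-bit parcels, and a first parcel whose two low bits
    are 11 starts a 32-bit instruction. A minimal Python sketch of that
    regrouping follows. The function name is hypothetical, encodings
    longer than 32 bits are deliberately rejected, and the parenthesised
    parcels are assumed to be U-Boot's way of marking the faulting
    instruction.

```python
CODE_LINE = "0506 9526 4783 0015 4703 0005 3583 ed84 (0e23 fef9)"


def group_parcels(code_line: str) -> list[int]:
    """Regroup 16-bit parcels from a 'Code:' dump into instruction words."""
    parcels = [int(tok.strip("()"), 16) for tok in code_line.split()]
    words = []
    i = 0
    while i < len(parcels):
        low = parcels[i]
        if low & 0b11 != 0b11:
            # Compressed (16-bit) instruction: one parcel.
            words.append(low)
            i += 1
        elif low & 0b11111 == 0b11111:
            raise ValueError("encodings longer than 32 bits are not handled")
        else:
            # Standard 32-bit instruction: low parcel first, then high.
            if i + 1 >= len(parcels):
                raise ValueError("dump ends in the middle of an instruction")
            words.append((parcels[i + 1] << 16) | low)
            i += 2
    return words


print([f"{w:x}" for w in group_parcels(CODE_LINE)])
# ['506', '9526', '154783', '54703', 'ed843583', 'fef90e23']
```

    The last word, fef90e23, is the sb at offset 46ea, which is the
    offset of the faulting EPC within the image.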

    I did not manage to get the crash you mentioned. The u-boot out of
    u-boot-qemu_2024.01+dfsg-7_all.deb can start both the uncompressed EFI
    file and the zboot compressed one. Sadly it fails for an unrelated
    reason shortly after that in both cases.

    I have not been able to reproduce the crash under QEMU. I believe it
    could be due to the fact that QEMU doesn't trap unaligned accesses. So
    far I have only reproduced the issue on real hardware.

    It works fine when the kernel is directly started from U-Boot with
    bootefi. It only fails when U-Boot launches Grub and Grub launches the
    EFI file.

    You should use OpenSBI as the bios, and U-Boot in S-mode as the kernel.

    --
    Aurelien Jarno GPG: 4096R/1DDD8C9B aurelien@aurel32.net http://aurel32.net

  • From Bastian Blank@21:1/5 to Aurelien Jarno on Tue Feb 25 13:40:01 2025
    On Mon, Feb 24, 2025 at 10:50:58PM +0100, Aurelien Jarno wrote:
    It works fine when the kernel is directly started from U-Boot with
    bootefi. It only fails when U-Boot launches Grub and Grub launches the
    EFI file.

    Okay, so exactly one use case, "hardware -> u-boot -> grub -> kernel",
    does not work. I also found reports that it seems to work for others on
    this hardware.[1] So this whole ordeal is not a bug fix, but a workaround
    for another, as yet unidentified, bug in one of the components.

    So, I see the following steps to find out what the heck happens:
    - Upgrade u-boot. The version in Debian is one year old and several new
    releases have come out since then.
    - Build u-boot with SHOW_REGS to see what exactly it failed on. The
    already shown TVAL register should contain the trapping address, and
    that is pretty near to the loaded u-boot.
    - Try to find out what this code is for. Sadly the Linux package does not
    retain debugging information for the EFI wrappers.
    - Change the instruction into a trap to be able to see the same error in
    other environments and compare.
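
    To make the TVAL step concrete: the instruction at the faulting EPC is
    fef90e23 (sb a5,-4(s2) in the listing quoted earlier in the thread),
    an S-type store. The following Python sketch decodes its base register
    and offset, and then derives the value s2 would have held if TVAL
    really is the store's target address. That last step is an
    assumption: the trace does not print the registers, which is what a
    SHOW_REGS build is meant to provide. The function name is
    hypothetical.

```python
ABI_NAMES = (
    "zero ra sp gp tp t0 t1 t2 s0 s1 "
    "a0 a1 a2 a3 a4 a5 a6 a7 "
    "s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 t3 t4 t5 t6"
).split()
STORE_NAMES = {0: "sb", 1: "sh", 2: "sw", 3: "sd"}


def decode_store(word: int) -> tuple[str, str, str, int]:
    """Decode a 32-bit RISC-V S-type store into (op, rs2, rs1, imm)."""
    if word & 0x7F != 0x23:
        raise ValueError("not a STORE opcode")
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    # The immediate is split: bits 31:25 are imm[11:5], bits 11:7 imm[4:0].
    imm = ((word >> 25) << 5) | ((word >> 7) & 0x1F)
    if imm & 0x800:
        imm -= 0x1000  # sign-extend the 12-bit immediate
    return STORE_NAMES[funct3], ABI_NAMES[rs2], ABI_NAMES[rs1], imm


op, src, base, imm = decode_store(0xFEF90E23)
print(f"{op} {src},{imm}({base})")      # sb a5,-4(s2)

TVAL = 0x40020020  # from the trace; assumed to be the store address
print(f"{base} = {TVAL - imm:#x}")      # s2 = 0x40020024
```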

    You should use OpenSBI as the bios, and U-Boot in S-mode as the kernel.

    Yeah, thanks, found that as well. With that Linux is able to boot
    correctly.

    Bastian

    [1]: At least I read the last lines in this log this way:
    https://libera.irclog.whitequark.org/u-boot/2024-05-10
    --
    If I can have honesty, it's easier to overlook mistakes.
    -- Kirk, "Space Seed", stardate 3141.9
