• Bug#1104670: linux-image-6.12.25-amd64: system does not shut down - GHE

    From Laurent Bonnaud@1:229/2 to All on Sun May 4 14:00:01 2025
    XPost: linux.debian.bugs.dist
    From: L.Bonnaud@laposte.net

    Hi,

    This bug is similar to bugs #1053750 and #1034718, which have been archived.

    In the 6.1.x kernel branch, the problem has become worse:

    - Previously the kernel would output an error in /var/lib/systemd/pstore/ but would shut down anyway.

    - Now, with kernel 6.1.135-1, the shutdown is blocked as with 6.12.x kernels (see below).

    --
    Laurent.

    <30>[ 961.098671] systemd-shutdown[1]: Rebooting.
    <6>[ 961.098743] kvm: exiting hardware virtualization
    <6>[ 961.361878] megaraid_sas 0000:17:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
    <6>[ 961.414526] ACPI: PM: Preparing to enter system sleep state S5
    <0>[ 963.828210] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
    <0>[ 963.828213] {1}[Hardware Error]: event severity: fatal
    <0>[ 963.828214] {1}[Hardware Error]: Error 0, type: fatal
    <0>[ 963.828216] {1}[Hardware Error]: section_type: PCIe error
    <0>[ 963.828216] {1}[Hardware Error]: port_type: 0, PCIe end point
    <0>[ 963.828217] {1}[Hardware Error]: version: 3.0
    <0>[ 963.828218] {1}[Hardware Error]: command: 0x0002, status: 0x0010
    <0>[ 963.828220] {1}[Hardware Error]: device_id: 0000:01:00.1
    <0>[ 963.828221] {1}[Hardware Error]: slot: 6
    <0>[ 963.828222] {1}[Hardware Error]: secondary_bus: 0x00
    <0>[ 963.828223] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x1563
    <0>[ 963.828224] {1}[Hardware Error]: class_code: 020000
    <0>[ 963.828225] {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00018000
    <0>[ 963.828226] {1}[Hardware Error]: aer_uncor_severity: 0x000ef010
    <0>[ 963.828227] {1}[Hardware Error]: TLP Header: 40000001 0000000f 90028090 00000000
    <0>[ 963.828229] GHES: Fatal hardware error but panic disabled
    <0>[ 963.828230] Kernel panic - not syncing: GHES: Fatal hardware error
    <4>[ 963.828231] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-34-amd64 #1 Debian 6.1.135-1
    <4>[ 963.828234] Hardware name: Dell Inc. PowerEdge R540/0PRWNC, BIOS 2.23.0 01/09/2025
    <4>[ 963.828235] Call Trace:
    <4>[ 963.828238] <NMI>
    <4>[ 963.828240] dump_stack_lvl+0x44/0x5c
    <4>[ 963.828247] panic+0x118/0x2f4
    <4>[ 963.828253] __ghes_panic.cold+0x28/0x28
    <4>[ 963.828258] ghes_notify_nmi+0x1db/0x370
    <4>[ 963.828263] nmi_handle+0x5a/0x120
    <4>[ 963.828269] default_do_nmi+0x40/0x130
    <4>[ 963.828273] exc_nmi+0x11e/0x150
    <4>[ 963.828276] end_repeat_nmi+0x16/0x67
    <4>[ 963.828281] RIP: 0010:mwait_idle_with_hints.constprop.0+0x48/0x90
    <4>[ 963.828286] Code: 48 89 d1 65 48 8b 04 25 80 fb 01 00 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d 8f ac b0 00 b9 01 00 00 00 48 89 f8 0f 01 c9 <65> 48 8b 04 25 80 fb 01 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b
    <4>[ 963.828287] RSP: 0018:ffffffffb2e03e18 EFLAGS: 00000046
    <4>[ 963.828290] RAX: 0000000000000020 RBX: 0000000000000003 RCX: 0000000000000001
    <4>[ 963.828292] RDX: 0000000000000000 RSI: ffffffffb2fa0160 RDI: 0000000000000020
    <4>[ 963.828293] RBP: 0000000000000003 R08: 0000000000000002 R09: 000000003a518aaa
    <4>[ 963.828295] R10: 0000000000000018 R11: 000000000000afc8 R12: ffffffffb2fa0160
    <4>[ 963.828296] R13: ffffffffb2fa02b0 R14: 0000000000000003 R15: 0000000000000000
    <4>[ 963.828300] ? mwait_idle_with_hints.constprop.0+0x48/0x90
    <4>[ 963.828303] ? mwait_idle_with_hints.constprop.0+0x48/0x90
    <4>[ 963.828305] </NMI>
    <4>[ 963.828306] <TASK>
    <4>[ 963.828307] intel_idle_ibrs+0x75/0x90
    <4>[ 963.828309] cpuidle_enter_state+0x89/0x420
    <4>[ 963.828315] cpuidle_enter+0x29/0x40
    <4>[ 963.828317] do_idle+0x202/0x2a0
    <4>[ 963.828323] cpu_startup_entry+0x26/0x30
    <4>[ 963.828326] rest_init+0xca/0xd0
    <4>[ 963.828328] arch_call_rest_init+0xa/0x14
    <4>[ 963.828333] start_kernel+0x70a/0x733
    <4>[ 963.828336] secondary_startup_64_no_verify+0xe5/0xeb
    <4>[ 963.828343] </TASK>
    <0>[ 963.828357] Kernel Offset: 0x30400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
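
    To compare pstore dumps like the one above across kernel versions, it can help to pull the "[Hardware Error]" fields out of the raw text. The following is a minimal sketch, not part of the original report: the function name parse_hw_error_fields is hypothetical, and the script assumes each record has the "<level>[ timestamp] message" shape seen in this log. It also copes with two records that ended up on one physical line.

```python
import re

# One printk record: "<level>[ timestamp] message". The message ends where the
# next "<level>[" marker starts, or at the end of the line.
RECORD = re.compile(r"<(\d+)>\[\s*(\d+\.\d+)\]\s*(.*?)(?=\s*<\d+>\[|$)")
# GHES detail lines look like "{1}[Hardware Error]: key: value, key: value".
HW_ERR = re.compile(r"\{\d+\}\[Hardware Error\]:\s*(.*)")


def parse_hw_error_fields(text):
    """Return (key, value) pairs from the [Hardware Error] lines, in log order.

    A list is returned rather than a dict because the log reuses the key
    "device_id" for both the PCI address and the PCI device ID.
    """
    fields = []
    for line in text.splitlines():
        for _level, _timestamp, message in RECORD.findall(line):
            match = HW_ERR.match(message)
            if not match:
                continue
            for part in match.group(1).split(", "):
                key, sep, value = part.partition(": ")
                if sep:  # fragments without "key: value" shape are skipped
                    fields.append((key.strip(), value.strip()))
    return fields


if __name__ == "__main__":
    import sys
    for key, value in parse_hw_error_fields(sys.stdin.read()):
        print(f"{key} = {value}")
```

    Saved under any file name, the script reads a pstore dump on standard input and prints one field per line.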

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From Laurent Bonnaud@1:229/2 to All on Sun May 4 11:50:01 2025
    XPost: linux.debian.bugs.dist
    From: L.Bonnaud@laposte.net

    Package: src:linux
    Version: 6.12.25-1
    Severity: normal

    Dear Maintainer,

    when I try to reboot this system (by entering the "reboot" command), the screen becomes black and then nothing happens. The system never finishes its shutdown.

    Here is a debug log from /var/lib/systemd/pstore:

    <30>[ 642.476392] systemd-shutdown[1]: Rebooting.
    <6>[ 642.763811] megaraid_sas 0000:17:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
    <6>[ 643.140087] ACPI: PM: Preparing to enter system sleep state S5
    <0>[ 646.279684] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
    <0>[ 646.279686] {1}[Hardware Error]: event severity: fatal
    <0>[ 646.279688] {1}[Hardware Error]: Error 0, type: fatal
    <0>[ 646.279690] {1}[Hardware Error]: section_type: PCIe error
    <0>[ 646.279691] {1}[Hardware Error]: port_type: 0, PCIe end point
    <0>[ 646.279692] {1}[Hardware Error]: version: 3.0
    <0>[ 646.279693] {1}[Hardware Error]: command: 0x0002, status: 0x0010
    <0>[ 646.279694] {1}[Hardware Error]: device_id: 0000:01:00.1
    <0>[ 646.279696] {1}[Hardware Error]: slot: 6
    <0>[ 646.279697] {1}[Hardware Error]: secondary_bus: 0x00
    <0>[ 646.279698] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x1563
    <0>[ 646.279699] {1}[Hardware Error]: class_code: 020000
    <0>[ 646.279700] {1}[Hardware Error]: aer_cor_status: 0x00002000, aer_cor_mask: 0x000031c1
    <0>[ 646.279701] {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00018000
    <0>[ 646.279702] {1}[Hardware Error]: aer_uncor_severity: 0x000ef010
    <0>[ 646.279703] {1}[Hardware Error]: TLP Header: 40000001 0000030f 90028090 00000000
    <0>[ 646.279706] GHES: Fatal hardware error but panic disabled
    <0>[ 646.279707] Kernel panic - not syncing: GHES: Fatal hardware error
    <4>[ 646.279709] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.12.25-amd64 #1 Debian 6.12.25-1
    <4>[ 646.279714] Tainted: [W]=WARN
    <4>[ 646.279714] Hardware name: Dell Inc. PowerEdge R540/0PRWNC, BIOS 2.23.0 01/09/2025
    <4>[ 646.279716] Call Trace:
    <4>[ 646.279719] <NMI>
    <4>[ 646.279721] dump_stack_lvl+0x5d/0x80
    <4>[ 646.279727] panic+0x118/0x2db
    <4>[ 646.279734] __ghes_panic.cold+0x28/0x28
    <4>[ 646.279738] ghes_notify_nmi+0x30e/0x3b0
    <4>[ 646.279742] nmi_handle+0x5e/0x120
    <4>[ 646.279747] default_do_nmi+0x40/0x130
    <4>[ 646.279751] exc_nmi+0x122/0x1a0
    <4>[ 646.279754] end_repeat_nmi+0xf/0x53
    <4>[ 646.279758] RIP: 0010:intel_idle_ibrs+0x87/0x100
    <4>[ 646.279762] Code: 3e 0f ae f0 31 d2 48 89 f0 48 89 d1 0f 01 c8 48 8b 06 a8 08 75 14 66 90 0f 00 2d 80 1a 41 00 b9 01 00 00 00 4c 89 c8 0f 01 c9 <f0> 80 66 02 df f0 83 44 24 fc 00 48 8b 06 a8 08 74 0b 65 81 25 84
    <4>[ 646.279764] RSP: 0018:ffffffffb5603e28 EFLAGS: 00000046
    <4>[ 646.279767] RAX: 0000000000000020 RBX: 0000000000000003 RCX: 0000000000000001
    <4>[ 646.279768] RDX: 0000000000000000 RSI: ffffffffb5610940 RDI: 0000000000000001
    <4>[ 646.279770] RBP: ffffffffb57b6e40 R08: 0000000000000000 R09: 0000000000000020
    <4>[ 646.279771] R10: 0000000000000008 R11: ffff96781fc34764 R12: ffffffffb57b6e40
    <4>[ 646.279772] R13: ffffffffb57b6f90 R14: 0000000000000003 R15: 0000000000000000
    <4>[ 646.279776] ? intel_idle_ibrs+0x87/0x100
    <4>[ 646.279780] ? intel_idle_ibrs+0x87/0x100
    <4>[ 646.279783] </NMI>
    <4>[ 646.279783] <TASK>
    <4>[ 646.279784] cpuidle_enter_state+0x7e/0x420
    <4>[ 646.279788] cpuidle_enter+0x2d/0x40
    <4>[ 646.279792] do_idle+0x1e5/0x240
    <4>[ 646.279798] cpu_startup_entry+0x29/0x30
    <4>[ 646.279800] rest_init+0xcc/0xd0
    <4>[ 646.279803] start_kernel+0x74c/0x750
    <4>[ 646.279809] x86_64_start_reservations+0x24/0x30
    <4>[ 646.279813] x86_64_start_kernel+0x95/0xa0
    <4>[ 646.279816] common_startup_64+0x13e/0x141
    <4>[ 646.279823] </TASK>
    <0>[ 646.279837] Kernel Offset: 0x32a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
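
    The aer_cor_status, aer_uncor_status and mask values in the log above are bit fields. The following is a minimal sketch for decoding them, not part of the original report: the bit names are assumed to follow the PCI_ERR_COR_* and PCI_ERR_UNC_* definitions in the Linux kernel's pci_regs.h, only the commonly listed bits are included, and decode_bits is a hypothetical helper name.

```python
# Bit positions of the PCIe AER Uncorrectable Error Status register
# (assumed to match PCI_ERR_UNC_* in the kernel's pci_regs.h).
UNCORRECTABLE_BITS = {
    0: "Undefined",
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}

# Bit positions of the PCIe AER Correctable Error Status register
# (assumed to match PCI_ERR_COR_* in the kernel's pci_regs.h).
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "Replay Number Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_bits(value, names):
    """List the names of all bits set in a 32-bit register value."""
    return [
        names.get(bit, f"bit {bit} (unknown)")
        for bit in range(32)
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # Values copied from the 6.12.25 log above.
    print("aer_cor_status:    ", decode_bits(0x00002000, CORRECTABLE_BITS))
    print("aer_uncor_status:  ", decode_bits(0x00100000, UNCORRECTABLE_BITS))
    print("aer_uncor_mask:    ", decode_bits(0x00018000, UNCORRECTABLE_BITS))
    print("aer_uncor_severity:", decode_bits(0x000EF010, UNCORRECTABLE_BITS))
```

    Under these assumptions, aer_uncor_status 0x00100000 decodes to Unsupported Request and aer_cor_status 0x00002000 to Advisory Non-Fatal Error. The Unsupported Request bit is not set in aer_uncor_severity 0x000ef010.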

    Thanks,

    -- Package-specific info:
    ** Version:
    Linux version 6.12.25-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.25-1 (2025-04-25)

    ** Command line:
    BOOT_IMAGE=/boot/vmlinuz-6.12.25-amd64 root=UUID=d4026c7c-61cc-435f-81c5-76194e22454e ro quiet net.ifnames=0

    ** Tainted: W (512)
    * kernel issued warning
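
    The numeric taint value maps to flag letters bit by bit. The following is a minimal sketch of that mapping, not part of the original report: the letter order is assumed to follow the kernel's tainted-kernels documentation (bits 0 to 17), and decode_taint is a hypothetical helper name.

```python
# Taint flag letters indexed by bit position, bit 0 first
# (assumed order, per the kernel's tainted-kernels documentation).
TAINT_FLAGS = "PFSRMBUDAWCIOELKXT"


def decode_taint(value):
    """Return the taint flag letters for every bit set in `value`."""
    return [
        TAINT_FLAGS[bit]
        for bit in range(len(TAINT_FLAGS))
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # 512 is the value reported above; it is bit 9, the "W" (warning) flag.
    print(decode_taint(512))
```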

    ** Kernel log:
    Unable to read kernel log; any relevant messages should be attached

    ** Model information
    sys_vendor: Dell Inc.
    product_name: PowerEdge R540
    product_version:
    chassis_vendor: Dell Inc.
    chassis_version:
    bios_vendor: Dell Inc.
    bios_version: 2.23.0
    board_vendor: Dell Inc.
    board_name: 0PRWNC
    board_version: A07

    ** Configuration for modprobe:
    blacklist acpi_power_meter
    blacklist arkfb
    blacklist aty128fb
    blacklist atyfb
    blacklist radeonfb
    blacklist cirrusfb
    blacklist cyber2000fb
    blacklist kyrofb
    blacklist matroxfb_base
    blacklist mb862xxfb
    blacklist neofb
    blacklist pm2fb
    blacklist pm3fb
    blacklist s3fb
    blacklist savagefb
    blacklist sisfb
    blacklist tdfxfb
    blacklist tridentfb
    blacklist vt8623fb

    [continued in next message]
