• 6.1.0: NVME drive goes offline randomly even with: nvme_core.default_ps

    From Justin Piszcz@21:1/5 to All on Mon Jul 1 14:40:01 2024
    Hello,

    Note: I've also followed up on the LKML with this inquiry but as
    Debian stable uses an older kernel (6.1.0), I was wondering if anyone
    on this list has run into this problem?

    Kernel: 6.1.0-17-amd64
    Distribution: Debian stable
    Arch: x86_64

    I have two NVMe drives in a BTRFS RAID-1. After the first time this
    happened, I added the following to the kernel command line at boot:
    nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
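
To confirm that both options actually took effect after a reboot, the boot command line can be compared with what the running kernel reports. This is a minimal sketch, not part of the original report; it assumes the usual `/proc/cmdline` and `/sys/module/.../parameters` paths are present on the machine, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(text: str) -> Dict[str, str]:
    """Split a kernel command line into {name: value}; bare flags map to ''.

    Quoted values containing spaces are not handled in this sketch.
    """
    params = {}
    for token in text.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def read_text(path: str) -> Optional[str]:
    """Return the stripped file contents, or None if the file is unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def report() -> None:
    """Print the boot-time settings next to the values the kernel exposes."""
    cmdline = parse_cmdline(read_text("/proc/cmdline") or "")
    print("cmdline nvme_core.default_ps_max_latency_us:",
          cmdline.get("nvme_core.default_ps_max_latency_us"))
    print("cmdline pcie_aspm:", cmdline.get("pcie_aspm"))
    print("runtime default_ps_max_latency_us:",
          read_text("/sys/module/nvme_core/parameters/default_ps_max_latency_us"))
    print("runtime ASPM policy:",
          read_text("/sys/module/pcie_aspm/parameters/policy"))
```

Calling `report()` on the affected host prints `None` for any value that is missing, which would indicate the option never reached the kernel.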

    This greatly reduced the frequency of the issue (the last uptime was
    ~70 days). However, it has occurred twice since then; this time I had
    netconsole up to capture the crash.

    The full kernel netconsole output before, during, and after the crash: https://installkernel.tripod.com/20240701-6.1.0-crash.txt

    The model & firmware version of both drives are identical:
    Model Number: Samsung SSD 990 PRO with Heatsink 4TB
    Firmware Version: 4B2QJXD7

    Motherboard being used:
    Manufacturer: ASUSTeK COMPUTER INC.
    Product Name: Pro WS W680-ACE IPMI

    Is there a workaround or potential fix for this issue?

    The issue starts when this occurs:
    [6078737.345641] nvme nvme2: I/O 154 (I/O Cmd) QID 6 timeout, aborting
    [6078737.348143] nvme nvme2: I/O 155 (I/O Cmd) QID 6 timeout, aborting
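
When scanning a long netconsole capture for the start of an incident, it can help to pull out only the NVMe timeout records. The following is a rough sketch that assumes the message format shown in the two lines above; the regular expression and function name are illustrative and not taken from any tool.

```python
import re

# Matches records such as:
# [6078737.345641] nvme nvme2: I/O 154 (I/O Cmd) QID 6 timeout, aborting
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvme (?P<ctrl>nvme\d+): "
    r"I/O (?P<tag>\d+) \((?P<kind>[^)]*)\) QID (?P<qid>\d+) timeout"
)


def find_timeouts(log_text):
    """Return one dict per NVMe timeout record found anywhere in log_text."""
    events = []
    for m in TIMEOUT_RE.finditer(log_text):
        events.append({
            "timestamp": float(m.group("ts")),
            "controller": m.group("ctrl"),
            "tag": int(m.group("tag")),
            "qid": int(m.group("qid")),
        })
    return events
```

Because it uses `finditer` on the whole text, the sketch also copes with captures where several records ended up on one physical line.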

    Then later, a kernel panic:
    [6078894.702941] BTRFS error (device nvme0n1p2): error writing primary super block to device 2
    [6078894.707920] BTRFS warning (device nvme0n1p2): csum hole found for disk bytenr range [3659038877598419968, 3659038877598424064)
    [6078894.708310] BTRFS critical (device nvme0n1p2): unable to find chunk map for logical 3659038877598419968 length 4096
    [6078894.708652] BUG: kernel NULL pointer dereference, address: 000000000000005a
    [6078894.708879] #PF: supervisor read access in kernel mode
    [6078894.709107] #PF: error_code(0x0000) - not-present page
    [6078894.709292] PGD 0 P4D 0
    [6078894.709509] Oops: 0000 [#1] PREEMPT SMP NOPTI
    [6078894.709692] CPU: 12 PID: 3349611 Comm: kworker/u64:18 Not tainted 6.1.0-17-amd64 #1 Debian 6.1.69-1
    [6078894.709856] Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3401 03/19/2024
    [6078894.710022] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
    [6078894.710267] RIP: 0010:btrfs_get_io_geometry+0x13/0xf0 [btrfs]
    [6078894.710483] Code: f4 ff ff ff e9 67 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 41 56 49 89 c9 48 89 cf 41 55 41 54 55 53 <4c> 8b 76 70 89 d3 31 d2 4c 8b 5e 18 41 8b 4e 10 45 8b 6e 14 4d 29
    [6078894.710692] RSP: 0018:ffffa9cfc6657c08 EFLAGS: 00010286
    [6078894.710711] BTRFS error (device nvme0n1p2): error writing primary super block to device 2
    [6078894.710876] RAX: ffffffffffffffea RBX: ffffffffffffffea RCX: 32c7847906990c00
    [6078894.710882] RDX: 0000000000000000 RSI: ffffffffffffffea RDI: 32c7847906990c00
    [6078894.710882] RBP: ffffa9cfc6657d28 R08: ffffa9cfc6657cc8 R09: 32c7847906990c00
    [6078894.710882] R10: 0000000000000003 R11: ffff9a3efff6dc28 R12: ffff9a2018195000
    [6078894.710883] R13: 0000000000000001 R14: 0000000000001000 R15: ffffa9cfc6657d50
    [6078894.710884] FS: 0000000000000000(0000) GS:ffff9a3e7fb00000(0000) knlGS:0000000000000000
    [6078894.710884] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [6078894.710885] CR2: 000000000000005a CR3: 0000000bfc210000 CR4: 0000000000750ee0
    [6078894.710885] PKRU: 55555554
    [6078894.710885] Call Trace:
    [6078894.710887] <TASK>
    [6078894.710891] ? page_fault_oops+0xd2/0x2b0
    [6078894.710889] ? __die_body.cold+0x1a/0x1f
    [6078894.710893] ? exc_page_fault+0x70/0x170
    [6078894.715724] ? asm_exc_page_fault+0x22/0x30
    [6078894.716084] ? btrfs_get_io_geometry+0x13/0xf0 [btrfs]
    [6078894.716470] BTRFS error (device nvme0n1p2): error writing primary super block to device 2
    [6078894.716462] ? btrfs_get_chunk_map.cold+0x15/0x42 [btrfs]
    [6078894.717384] __btrfs_map_block+0xc4/0xe40 [btrfs]
    [6078894.717771] ? kmem_cache_free+0x15/0x310
    [6078894.718147] btrfs_submit_bio+0xa2/0x240 [btrfs]
    [6078894.718571] btrfs_repair_one_sector+0x29f/0x3a0 [btrfs]
    [6078894.718972] ? btrfs_submit_data_write_bio+0x110/0x110 [btrfs]
    [6078894.719364] end_compressed_bio_read+0x118/0x2f0 [btrfs]
    [6078894.719753] process_one_work+0x1c4/0x380
    [6078894.720135] worker_thread+0x4d/0x380
    [6078894.720510] ? rescuer_thread+0x3a0/0x3a0
    [6078894.720864] kthread+0xd7/0x100
    [6078894.721224] ? kthread_complete_and_exit+0x20/0x20
    [6078894.721579] ret_from_fork+0x1f/0x30
    [6078894.721984] </TASK>
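
One detail can be read directly from the register dump: RSI holds 0xffffffffffffffea, the marked instruction bytes `<4c> 8b 76 70` decode to a load from offset 0x70 relative to RSI, and the faulting address (CR2) is 0x5a. The arithmetic below is a small worked check, not a diagnosis. It suggests, but does not prove, that an error-encoded pointer was dereferenced rather than a true NULL.

```python
import errno

MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


rsi = 0xFFFFFFFFFFFFFFEA   # RSI from the register dump above
displacement = 0x70        # from "<4c> 8b 76 70": mov 0x70(%rsi),%r14
cr2 = 0x5A                 # faulting address reported in CR2

err = to_signed64(rsi)                      # RSI read as a signed value
name = errno.errorcode.get(-err)            # symbolic errno name, if any
fault_addr = (rsi + displacement) & MASK64  # address the load would touch

print(err, name, hex(fault_addr), fault_addr == cr2)
```

The computed address matches CR2, and the signed value of RSI corresponds to a standard errno, which is consistent with the preceding "unable to find chunk map" message.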

    Justin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Kjörling@21:1/5 to All on Mon Jul 1 14:40:02 2024
    On 1 Jul 2024 08:23 -0400, from jpiszcz@lucidpixels.com (Justin Piszcz):
    Kernel: 6.1.0-17-amd64
    Distribution: Debian stable
    Arch: x86_64

    Your system is about half a year out of date. For Bookworm, 6.1.0-17
    (6.1.69) is from early January; 6.1.0-18 (6.1.76) is from about a week
    into February; and current is now 6.1.0-22 (6.1.94). (Upstream 6.1 is
    at 6.1.96 since a few days ago.)

    I suggest upgrading first, and seeing if the problem persists.
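
To check which Debian ABI release and upstream stable version a machine is running, both can be pulled from the `uname -a` line or from the banner in an oops. This is a small sketch under the assumption that the string has the Debian layout seen in this thread; the regular expression and function name are illustrative only.

```python
import re

# Example input: "6.1.0-17-amd64 #1 Debian 6.1.69-1"
UNAME_RE = re.compile(
    r"(?P<abi>\d+\.\d+\.\d+-\d+)-\S+ .*?Debian (?P<upstream>\d+\.\d+\.\d+)-\d+"
)


def debian_kernel_versions(uname_line):
    """Return (abi_release, upstream_tuple), or None if the line does not match."""
    m = UNAME_RE.search(uname_line)
    if m is None:
        return None
    upstream = tuple(int(part) for part in m.group("upstream").split("."))
    return m.group("abi"), upstream


running = debian_kernel_versions("6.1.0-17-amd64 #1 Debian 6.1.69-1")
current = (6, 1, 94)  # upstream version of 6.1.0-22, as stated above
if running is not None:
    print(running[0], "is behind" if running[1] < current else "is current")
```

Comparing the versions as integer tuples avoids the string-comparison trap where "6.1.9" would sort after "6.1.69".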

    --
    Michael Kjörling 🔗 https://michael.kjorling.se “Remember when, on the Internet, nobody cared that you were a dog?”

  • From Justin Piszcz@21:1/5 to c9bc136c6063@ewoof.net on Mon Jul 1 17:50:02 2024
    Hello,

    Thanks, I've upgraded to the latest kernel version and will see if the
    issue recurs.

    $ uname -a
    Linux int 6.1.0-22-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21) x86_64 GNU/Linux

    Regards,
    Justin




  • From Michael Kjörling@21:1/5 to All on Mon Jul 1 19:10:01 2024
    On 1 Jul 2024 12:52 -0400, from jpiszcz@lucidpixels.com (Justin Piszcz):
    With the latest stable kernel (6.1.0-22), it crashed (below) shortly
    after boot (1-2hr), with the prior version (6.1.0-17) it had been
    stable other than the NVME dropping out. Will try/test with a newer
    bpo kernel or similar..

    Thank you for testing with an up-to-date kernel.

    If you are able to, consider testing with a vanilla upstream kernel
    compiled using the Debian kernel configuration settings for that same
    branch. If that crashes too, then it's not something that Debian has
    introduced but is rather either an upstream issue or something about
    your hardware.

    --
    Michael Kjörling 🔗 https://michael.kjorling.se “Remember when, on the Internet, nobody cared that you were a dog?”

  • From Justin Piszcz@21:1/5 to jpiszcz@lucidpixels.com on Mon Jul 1 19:00:01 2024
    Hello,

    With the latest stable kernel (6.1.0-22), it crashed (below) shortly
    after boot (1-2hr), with the prior version (6.1.0-17) it had been
    stable other than the NVME dropping out. Will try/test with a newer
    bpo kernel or similar..
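
The capture that follows is listed newest-first, as exported from the log viewer. A minimal sketch for putting such a capture back into chronological order is shown here; it assumes every record carries a bracketed kernel timestamp and that any line without one is a wrapped continuation of the record before it. The function name is hypothetical.

```python
import re

# Bracketed kernel timestamp, e.g. "[ 4105.156656]"
TS_RE = re.compile(r"\[\s*\d+\.\d+\]")


def oldest_first(lines):
    """Reverse a newest-first capture while keeping wrapped lines with their record."""
    preamble = []   # lines seen before the first timestamped record
    records = []    # each record is a list: the timestamped line plus continuations
    for line in lines:
        if TS_RE.search(line):
            records.append([line])
        elif records:
            records[-1].append(line)
        else:
            preamble.append(line)
    ordered = list(preamble)
    for record in reversed(records):
        ordered.extend(record)
    return ordered
```

Reversing the record order, rather than sorting by timestamp, keeps the records from before and after a reboot in separate groups even though the kernel clock restarts.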

    7/1/2024 12:47 notice user machine-name.int [ 14.565265] netconsole-setup: Test log message to verify netconsole configuration.
    7/1/2024 12:47 notice user machine-name.int [ 14.565063] netconsole: network logging started
    7/1/2024 12:45 notice user machine-name.int [ 4105.156656] </TASK>
    7/1/2024 12:45 notice user machine-name.int [ 4105.156656] R13: 00007f2e401fafa0 R14: 00007f2e401faa10 R15: 00007f2e401faa50
    7/1/2024 12:45 notice user machine-name.int [ 4105.156655] R10: 00007ffd76755080 R11: 0000000000000293 R12: 0000000000000000
    7/1/2024 12:45 notice user machine-name.int [ 4105.156655] RBP: 00007f2e401faaa0 R08: 0000000000000005 R09: 0000000000000039
    7/1/2024 12:45 notice user machine-name.int [ 4105.156655] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000088
    7/1/2024 12:45 notice user machine-name.int [ 4105.156655] RAX: 0000000000000000 RBX: 00001ed40a513e18 RCX: 00007f2e4550b90a
    7/1/2024 12:45 notice user machine-name.int [ 4105.156654] RSP: 002b:00007f2e401fa9f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
    7/1/2024 12:45 notice user machine-name.int [ 4105.156654] Code: 48 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c 24 0c e8 a3 ce f8 ff 8b 7c 24 0c 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 03 cf f8 ff 8b 44 24
    7/1/2024 12:45 notice user machine-name.int [ 4105.156653] RIP: 0033:0x7f2e4550b90a
    7/1/2024 12:45 notice user machine-name.int [ 4105.156652] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
    7/1/2024 12:45 notice user machine-name.int [ 4105.156651] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156651] ? do_syscall_64+0x61/0xb0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156650] ? syscall_exit_to_user_mode+0x1e/0x40
    7/1/2024 12:45 notice user machine-name.int [ 4105.156649] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156648] ? do_syscall_64+0x61/0xb0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156647] ? syscall_exit_to_user_mode+0x1e/0x40
    7/1/2024 12:45 notice user machine-name.int [ 4105.156647] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156646] ? do_syscall_64+0x61/0xb0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156645] ? syscall_exit_to_user_mode+0x1e/0x40
    7/1/2024 12:45 notice user machine-name.int [ 4105.156644] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156643] do_syscall_64+0x61/0xb0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156642] syscall_exit_to_user_mode+0x17/0x40
    7/1/2024 12:45 notice user machine-name.int [ 4105.156640] exit_to_user_mode_prepare+0x1e8/0x1f0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156639] task_work_run+0x56/0x90
    7/1/2024 12:45 notice user machine-name.int [ 4105.156637] __fput+0xe2/0x250
    7/1/2024 12:45 notice user machine-name.int [ 4105.156636] dput+0x132/0x310
    7/1/2024 12:45 notice user machine-name.int [ 4105.156634] __dentry_kill+0xdc/0x170
    7/1/2024 12:45 notice user machine-name.int [ 4105.156633] evict+0xcd/0x1d0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156631] ? sugov_init+0x350/0x350
    7/1/2024 12:45 notice user machine-name.int [ 4105.156611] btrfs_evict_inode+0x79/0x3c0 [btrfs]
    7/1/2024 12:45 notice user machine-name.int [ 4105.156608] truncate_inode_pages_range+0x26c/0x450
    7/1/2024 12:45 notice user machine-name.int [ 4105.156605] find_get_entries+0x70/0x180
    7/1/2024 12:45 notice user machine-name.int [ 4105.156604] xas_find+0x14d/0x1d0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156603] xas_load+0x30/0x40
    7/1/2024 12:45 notice user machine-name.int [ 4105.156603] <TASK>
    7/1/2024 12:45 notice user machine-name.int [ 4105.156603] </NMI>
    7/1/2024 12:45 notice user machine-name.int [ 4105.156602] ? minmax_running_min+0xe0/0xe0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156601] ? minmax_running_min+0xe0/0xe0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156600] ? minmax_running_min+0xe0/0xe0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156598] ? end_repeat_nmi+0x16/0x67
    7/1/2024 12:45 notice user machine-name.int [ 4105.156597] ? exc_nmi+0x11e/0x150
    7/1/2024 12:45 notice user machine-name.int [ 4105.156597] ? default_do_nmi+0x40/0x130
    7/1/2024 12:45 notice user machine-name.int [ 4105.156595] ? nmi_handle+0x5a/0x120
    7/1/2024 12:45 notice user machine-name.int [ 4105.156594] ? nmi_cpu_backtrace_handler+0xd/0x20
    7/1/2024 12:45 notice user machine-name.int [ 4105.156592] ? nmi_cpu_backtrace.cold+0x1c/0x79
    7/1/2024 12:45 notice user machine-name.int [ 4105.156591] <NMI>
    7/1/2024 12:45 notice user machine-name.int [ 4105.156591] Call Trace:
    7/1/2024 12:45 notice user machine-name.int [ 4105.156590] PKRU: 55555554
    7/1/2024 12:45 notice user machine-name.int [ 4105.156590] CR2: 00007f2e30720000 CR3: 00000003df93e000 CR4: 0000000000750ee0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156590] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    7/1/2024 12:45 notice user machine-name.int [ 4105.156589] FS: 00007f2e401fd6c0(0000) GS:ffff90b13fa00000(0000) knlGS:0000000000000000
    7/1/2024 12:45 notice user machine-name.int [ 4105.156589] R13: ffffaf0bc7c63aa0 R14: 000000000000000c R15: ffffffffffffffff
    7/1/2024 12:45 notice user machine-name.int [ 4105.156588] R10: fffffffffffffffe R11: 00000000000015c0 R12: ffffaf0bc7c63aa0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156588] RBP: fffffffffffffffe R08: 00000000000015c0 R09: ffffffffffffffc0
    7/1/2024 12:45 notice user machine-name.int [ 4105.156588] RDX: 0000000000000002 RSI: ffff909850e1a238 RDI: ffffaf0bc7c63a18
    7/1/2024 12:45 notice user machine-name.int [ 4105.156587] RAX: ffff909850e1a23a RBX: ffffaf0bc7c63b18 RCX: 0000000000000000
    7/1/2024 12:45 notice user machine-name.int [ 4105.156587] RSP: 0018:ffffaf0bc7c63a00 EFLAGS: 00000206
    7/1/2024 12:45 notice user machine-name.int [ 4105.156586] Code: 89 4f 0c 48 8b 57 08 48 89 57 10 c3 cc cc cc cc 48 8b 57 10 48 89 07 48 c1 e8 20 48 89 57 08 c3 cc cc cc cc cc cc cc cc cc cc <0f> b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44
    7/1/2024 12:45 notice user machine-name.int [ 4105.156583] RIP: 0010:xas_descend+0x0/0x90
    7/1/2024 12:45 notice user machine-name.int [ 4105.156583] Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3401 03/19/2024
    7/1/2024 12:45 notice user machine-name.int [ 4105.156582] CPU: 8 PID: 9315 Comm: ThreadPoolForeg Tainted: G W L 6.1.0-22-amd64 #1 Debian 6.1.94-1
    7/1/2024 12:45 notice user machine-name.int [ 4105.156580] NMI backtrace for cpu 8
    7/1/2024 12:45 notice user machine-name.int [ 4105.156327] Sending NMI from CPU 10 to CPUs 8:
    7/1/2024 12:45 notice user machine-name.int [ 4105.156061] rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x100/.
    7/1/2024 12:45 notice user machine-name.int [ 4105.155760] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 8-.... } 37197 jiffies s: 2161 root: 0x1/.
    7/1/2024 12:45 notice user machine-name.int [ 4100.016074] </TASK>
    7/1/2024 12:45 notice user machine-name.int [ 4100.015829] R13: 00007f2e401fafa0 R14: 00007f2e401faa10 R15: 00007f2e401faa50
    7/1/2024 12:45 notice user machine-name.int [ 4100.015580] R10: 00007ffd76755080 R11: 0000000000000293 R12: 0000000000000000
    7/1/2024 12:45 notice user machine-name.int [ 4100.015327] RBP: 00007f2e401faaa0 R08: 0000000000000005 R09: 0000000000000039
    7/1/2024 12:45 notice user machine-name.int [ 4100.015074] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000088
    7/1/2024 12:45 notice user machine-name.int [ 4100.014823] RAX: 0000000000000000 RBX: 00001ed40a513e18 RCX: 00007f2e4550b90a
    7/1/2024 12:45 notice user machine-name.int [ 4100.014573] RSP: 002b:00007f2e401fa9f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
    7/1/2024 12:45 notice user machine-name.int [ 4100.014326] Code: 48 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c 24 0c e8 a3 ce f8 ff 8b 7c 24 0c 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 03 cf f8 ff 8b 44 24
    7/1/2024 12:45 notice user machine-name.int [ 4100.014097] RIP: 0033:0x7f2e4550b90a
    7/1/2024 12:45 notice user machine-name.int [ 4100.013858] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
    7/1/2024 12:45 notice user machine-name.int [ 4100.013621] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:45 notice user machine-name.int [ 4100.013393] ? do_syscall_64+0x61/0xb0
    7/1/2024 12:45 notice user machine-name.int [ 4100.013157] ? syscall_exit_to_user_mode+0x1e/0x40
    7/1/2024 12:45 notice user machine-name.int [ 4100.012927] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:45 notice user machine-name.int [ 4100.012689] ? do_syscall_64+0x61/0xb0
    7/1/2024 12:45 notice user machine-name.int [ 4100.012450] ? syscall_exit_to_user_mode+0x1e/0x40
    7/1/2024 12:45 notice user machine-name.int [ 4100.012211] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:45 notice user machine-name.int [ 4100.011973] ? do_syscall_64+0x61/0xb0
    7/1/2024 12:45 notice user machine-name.int [ 4100.011728] ? syscall_exit_to_user_mode+0x1e/0x40
    7/1/2024 12:45 notice user machine-name.int [ 4100.011491] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:45 notice user machine-name.int [ 4100.011264] do_syscall_64+0x61/0xb0
    7/1/2024 12:45 notice user machine-name.int [ 4100.011027] syscall_exit_to_user_mode+0x17/0x40
    7/1/2024 12:45 notice user machine-name.int [ 4100.010790] exit_to_user_mode_prepare+0x1e8/0x1f0
    7/1/2024 12:45 notice user machine-name.int [ 4100.010555] task_work_run+0x56/0x90
    7/1/2024 12:45 notice user machine-name.int [ 4100.010318] __fput+0xe2/0x250
    7/1/2024 12:45 notice user machine-name.int [ 4100.010090] dput+0x132/0x310
    7/1/2024 12:45 notice user machine-name.int [ 4100.009860] __dentry_kill+0xdc/0x170
    7/1/2024 12:45 notice user machine-name.int [ 4100.009624] evict+0xcd/0x1d0
    7/1/2024 12:45 notice user machine-name.int [ 4100.009394] ? sugov_init+0x350/0x350
    7/1/2024 12:45 notice user machine-name.int [ 4100.009136] btrfs_evict_inode+0x79/0x3c0 [btrfs]
    7/1/2024 12:45 notice user machine-name.int [ 4100.008898] truncate_inode_pages_range+0x26c/0x450
    7/1/2024 12:45 notice user machine-name.int [ 4100.008664] find_get_entries+0x70/0x180
    7/1/2024 12:45 notice user machine-name.int [ 4100.008423] xas_find+0x14d/0x1d0
    7/1/2024 12:45 notice user machine-name.int [ 4100.008183] xas_load+0x30/0x40
    7/1/2024 12:45 notice user machine-name.int [ 4100.007950] ? xas_descend+0x18/0x90
    7/1/2024 12:45 notice user machine-name.int [ 4100.007706] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
    7/1/2024 12:45 notice user machine-name.int [ 4100.007471] <TASK>
    7/1/2024 12:45 notice user machine-name.int [ 4100.007238] </IRQ>
    7/1/2024 12:45 notice user machine-name.int [ 4100.006984] ? sysvec_apic_timer_interrupt+0x69/0x90
    7/1/2024 12:45 notice user machine-name.int [ 4100.006736] ? __sysvec_apic_timer_interrupt+0x5a/0x110
    7/1/2024 12:45 notice user machine-name.int [ 4100.006489] ? hrtimer_interrupt+0xf4/0x210
    7/1/2024 12:45 notice user machine-name.int [ 4100.006228] ? __hrtimer_run_queues+0x10f/0x2b0
    7/1/2024 12:45 notice user machine-name.int [ 4100.005969] ? lockup_detector_update_enable+0x50/0x50
    7/1/2024 12:45 notice user machine-name.int [ 4100.005716] ? watchdog_timer_fn+0x1a4/0x200
    7/1/2024 12:45 notice user machine-name.int [ 4100.005455] <IRQ>
    7/1/2024 12:45 notice user machine-name.int [ 4100.005200] Call Trace:
    7/1/2024 12:45 notice user machine-name.int [ 4100.004947] PKRU: 55555554
    7/1/2024 12:45 notice user machine-name.int [ 4100.004693] CR2: 00007f2e30720000 CR3: 00000003df93e000 CR4: 0000000000750ee0
    7/1/2024 12:45 notice user machine-name.int [ 4100.004436] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    7/1/2024 12:45 notice user machine-name.int [ 4100.004178] FS: 00007f2e401fd6c0(0000) GS:ffff90b13fa00000(0000) knlGS:0000000000000000
    7/1/2024 12:45 notice user machine-name.int [ 4100.003927] R13: ffffaf0bc7c63aa0 R14: 000000000000000c R15: ffffffffffffffff
    7/1/2024 12:45 notice user machine-name.int [ 4100.003653] R10: fffffffffffffffe R11: 00000000000015c0 R12: ffffaf0bc7c63aa0
    7/1/2024 12:45 notice user machine-name.int [ 4100.003396] RBP: fffffffffffffffe R08: 00000000000015c0 R09: ffffffffffffffc0
    7/1/2024 12:45 notice user machine-name.int [ 4100.003134] RDX: 0000000000000017 RSI: ffff909850e1bb50 RDI: ffffaf0bc7c63a18
    7/1/2024 12:45 notice user machine-name.int [ 4100.002872] RAX: ffff90926c3a0db2 RBX: ffffaf0bc7c63b18 RCX: 0000000000000006
    7/1/2024 12:45 notice user machine-name.int [ 4100.002613] RSP: 0018:ffffaf0bc7c63a00 EFLAGS: 00000206
    7/1/2024 12:45 notice user machine-name.int [ 4100.002353] Code: c1 e8 20 48 89 57 08 c3 cc cc cc cc cc cc cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 <48> 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 48 3d fd 00 00 00
    7/1/2024 12:45 notice user machine-name.int [ 4100.002091] RIP: 0010:xas_descend+0x18/0x90
    7/1/2024 12:45 notice user machine-name.int [ 4100.001833] Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3401 03/19/2024
    7/1/2024 12:45 notice user machine-name.int [ 4100.001586] CPU: 8 PID: 9315 Comm: ThreadPoolForeg Tainted: G W L 6.1.0-22-amd64 #1 Debian 6.1.94-1
    7/1/2024 12:45 notice user machine-name.int [ 4100.000190] usbhid hid
    ast sr_mod i2c_algo_bit cdrom drm_vram_helper nvme drm_ttm_helper
    ixgbe nvme_core ahci ttm xhci_pci libahci t10_pi xfrm_algo xhci_hcd drm_kms_helper dca crc64_rocksoft libata crc64 mdio_devres crc_t10dif
    i2c_i801 crc32_pclmul crct10dif_generic usbcore scsi_mod drm
    intel_lpss_pci crc32c_intel igc libphy crct10dif_pclmul intel_lpss
    i2c_smbus video usb_common scsi_common crct10dif_common mdio vmd
    idma64 fan wmi pinctrl_alderlake button
    7/1/2024 12:45 notice user eric
    7/1/2024 12:45 notice user machine-name.int [ 4099.999271]
    snd_compress soundwire_bus aesni_intel snd_hda_intel crypto_simd snd_intel_dspcfg snd_intel_sdw_acpi cryptd snd_hda_codec snd_hda_core
    xfs snd_hwdep eeepc_wmi snd_pcm_oss asus_wmi battery rapl
    snd_mixer_oss iTCO_wdt platform_profile intel_pmc_bxt pmt_telemetry sparse_keymap nfnetlink_log snd_pcm mei_hdcp mei_wdt ipmi_ssif iTCO_vendor_support ledtrig_audio sd_mod intel_cstate pmt_class
    acpi_ipmi snd_timer rfkill wmi_bmof intel_uncore watchdog pcspkr
    nft_log ipmi_si mei_me snd joydev cdc_acm soundcore ipmi_devintf mei
    intel_vsec ipmi_msghandler acpi_tad intel_pmc_core acpi_pad sg evdev
    nfsd auth_rpcgss nf_tables parport_pc nfs_acl lockd ppdev grace lp
    parport nfnetlink sunrpc fuse loop dm_mod efi_pstore configfs
    ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress
    cdc_ether usbnet mii efivarfs raid10 raid456 async_raid6_recov
    async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c
    crc32c_generic raid1 raid0 multipath linear md_mod uas usb_storage
    hid_gen
    7/1/2024 12:45 notice user machine-name.int [ 4099.999244] Modules
    linked in: tls bluetooth jitterentropy_rng drbg ansi_cprng
    ecdh_generic ecc crc16 xt_nat xt_tcpudp veth xt_conntrack
    xt_MASQUERADE nf_conntrack_netlink xfrm_user xt_addrtype nft_compat br_netfilter bridge nfsv3 nfs fscache netfs tcp_bbr sch_fq tun
    netconsole nvme_fabrics overlay pps_ldisc cfg80211 8021q garp stp mrp
    llc lz4 lz4_compress zram zsmalloc binfmt_misc nls_ascii nls_cp437
    vfat fat nft_masq nft_redir intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nft_chain_nat nf_nat
    kvm_intel kvm snd_hda_codec_realtek snd_sof_pci_intel_tgl
    snd_hda_codec_generic irqbypass snd_sof_intel_hda_common
    soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci ghash_clmulni_intel snd_sof_xtensa_dsp sha512_ssse3 sha512_generic snd_sof sha256_ssse3 nft_ct sha1_ssse3 snd_sof_utils snd_soc_hdac_hda nf_conntrack snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi nf_defrag_ipv6 nf_defrag_ipv4 snd_soc_core
    7/1/2024 12:45 notice user machine-name.int [ 4099.999021] watchdog: BUG: soft lockup - CPU#8 stuck for 164s! [ThreadPoolForeg:9315]
    7/1/2024 12:44 notice user machine-name.int [ 4074.036689] </TASK>
    7/1/2024 12:44 notice user machine-name.int [ 4074.036506] R13: 00007f2e401fafa0 R14: 00007f2e401faa10 R15: 00007f2e401faa50
    7/1/2024 12:44 notice user machine-name.int [ 4074.036321] R10: 00007ffd76755080 R11: 0000000000000293 R12: 0000000000000000
    7/1/2024 12:44 notice user machine-name.int [ 4074.036140] RBP: 00007f2e401faaa0 R08: 0000000000000005 R09: 0000000000000039
    7/1/2024 12:44 notice user machine-name.int [ 4074.035959] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000088
    7/1/2024 12:44 notice user machine-name.int [ 4074.035778] RAX: 0000000000000000 RBX: 00001ed40a513e18 RCX: 00007f2e4550b90a
    7/1/2024 12:44 notice user machine-name.int [ 4074.035589] RSP: 002b:00007f2e401fa9f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
    7/1/2024 12:44 notice user machine-name.int [ 4074.035406] Code: 48 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c 24 0c e8 a3 ce f8 ff 8b 7c 24 0c 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 03 cf f8 ff 8b 44 24
    7/1/2024 12:44 notice user machine-name.int [ 4074.035240] RIP: 0033:0x7f2e4550b90a
    7/1/2024 12:44 notice user machine-name.int [ 4074.035062] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
    7/1/2024 12:44 notice user machine-name.int [ 4074.034893] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:44 notice user machine-name.int [ 4074.034722] ? do_syscall_64+0x61/0xb0
    7/1/2024 12:44 notice user machine-name.int [ 4074.034550] ? syscall_exit_to_user_mode+0x1e/0x40
    7/1/2024 12:44 notice user machine-name.int [ 4074.034381] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:44 notice user machine-name.int [ 4074.034214] ? do_syscall_64+0x61/0xb0
    7/1/2024 12:44 notice user machine-name.int [ 4074.034042] ? syscall_exit_to_user_mode+0x1e/0x40
    7/1/2024 12:44 notice user machine-name.int [ 4074.033870] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:44 notice user machine-name.int [ 4074.033698] ? do_syscall_64+0x61/0xb0
    7/1/2024 12:44 notice user machine-name.int [ 4074.033524] ? syscall_exit_to_user_mode+0x1e/0x40
    7/1/2024 12:44 notice user machine-name.int [ 4074.033350] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:44 notice user machine-name.int [ 4074.033181] do_syscall_64+0x61/0xb0
    7/1/2024 12:44 notice user machine-name.int [ 4074.033009] syscall_exit_to_user_mode+0x17/0x40
    7/1/2024 12:44 notice user machine-name.int [ 4074.032839] exit_to_user_mode_prepare+0x1e8/0x1f0
    7/1/2024 12:44 notice user machine-name.int [ 4074.032669] task_work_run+0x56/0x90
    7/1/2024 12:44 notice user machine-name.int [ 4074.032490] __fput+0xe2/0x250
    7/1/2024 12:44 notice user machine-name.int [ 4074.032311] dput+0x132/0x310
    7/1/2024 12:44 notice user machine-name.int [ 4074.032137] __dentry_kill+0xdc/0x170
    7/1/2024 12:44 notice user machine-name.int [ 4074.031956] evict+0xcd/0x1d0
    7/1/2024 12:44 notice user machine-name.int [ 4074.031763] ? sugov_init+0x350/0x350
    7/1/2024 12:44 notice user machine-name.int [ 4074.031547] btrfs_evict_inode+0x79/0x3c0 [btrfs]
    7/1/2024 12:44 notice user machine-name.int [ 4074.031355] truncate_inode_pages_range+0x26c/0x450
    7/1/2024 12:44 notice user machine-name.int [ 4074.031162] find_get_entries+0x70/0x180
    7/1/2024 12:44 notice user machine-name.int [ 4074.030956] xas_find+0x14d/0x1d0
    7/1/2024 12:44 notice user machine-name.int [ 4074.030749] xas_load+0x30/0x40
    7/1/2024 12:44 notice user machine-name.int [ 4074.030525] ? minmax_running_min+0xe0/0xe0
    7/1/2024 12:44 notice user machine-name.int [ 4074.030301] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
    7/1/2024 12:44 notice user machine-name.int [ 4074.030074] <TASK>
    7/1/2024 12:44 notice user machine-name.int [ 4074.029840] </IRQ>
    7/1/2024 12:44 notice user machine-name.int [ 4074.029588] ? sysvec_apic_timer_interrupt+0x69/0x90
    7/1/2024 12:44 notice user machine-name.int [ 4074.029337] ? __sysvec_apic_timer_interrupt+0x5a/0x110
    7/1/2024 12:44 notice user machine-name.int [ 4074.029078] ? hrtimer_interrupt+0xf4/0x210
    7/1/2024 12:44 notice user machine-name.int [ 4074.028813] ? __hrtimer_run_queues+0x10f/0x2b0
    7/1/2024 12:44 notice user machine-name.int [ 4074.028544] ? tick_sched_do_timer+0xa0/0xa0
    7/1/2024 12:44 notice user machine-name.int [ 4074.028241] ? tick_sched_timer+0x63/0x80
    7/1/2024 12:44 notice user machine-name.int [ 4074.027972] ? tick_sched_handle+0x22/0x60
    7/1/2024 12:44 notice user machine-name.int [ 4074.027693] ? update_process_times+0x70/0xb0
    7/1/2024 12:44 notice user machine-name.int [ 4074.027423] ? timekeeping_advance+0x377/0x570
    7/1/2024 12:44 notice user machine-name.int [ 4074.027153] ? timekeeping_update+0xdd/0x130
    7/1/2024 12:44 notice user machine-name.int [ 4074.026882] ? raw_notifier_call_chain+0x41/0x60
    7/1/2024 12:44 notice user machine-name.int [ 4074.026599] ? sched_slice+0x87/0x140
    7/1/2024 12:44 notice user machine-name.int [ 4074.026320] ? update_load_avg+0x7e/0x780
    7/1/2024 12:44 notice user machine-name.int [ 4074.026030] ? rcu_sched_clock_irq.cold+0xe8/0x459
    7/1/2024 12:44 notice user machine-name.int [ 4074.025746] ? rcu_dump_cpu_stacks+0xa4/0xe0
    7/1/2024 12:44 notice user machine-name.int [ 4074.025466] <IRQ>
    7/1/2024 12:44 notice user machine-name.int [ 4074.025174] Call Trace:
    7/1/2024 12:44 notice user machine-name.int [ 4074.024891] PKRU: 55555554
    7/1/2024 12:44 notice user machine-name.int [ 4074.024598] CR2: 00007f2e30720000 CR3: 00000003df93e000 CR4: 0000000000750ee0
    7/1/2024 12:44 notice user machine-name.int [ 4074.024311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    7/1/2024 12:44 notice user machine-name.int [ 4074.024019] FS: 00007f2e401fd6c0(0000) GS:ffff90b13fa00000(0000) knlGS:0000000000000000
    7/1/2024 12:44 notice user machine-name.int [ 4074.023738] R13: ffffaf0bc7c63aa0 R14: 000000000000000c R15: ffffffffffffffff
    7/1/2024 12:44 notice user machine-name.int [ 4074.023460] R10: fffffffffffffffe R11: 00000000000015c0 R12: ffffaf0bc7c63aa0
    7/1/2024 12:44 notice user machine-name.int [ 4074.023182] RBP: fffffffffffffffe R08: 00000000000015c0 R09: ffffffffffffffc0
    7/1/2024 12:44 notice user machine-name.int [ 4074.022904] RDX: 0000000000000002 RSI: ffff90926c3a0db0 RDI: ffffaf0bc7c63a18
    7/1/2024 12:44 notice user machine-name.int [ 4074.022624] RAX: ffff90926c3a0db2 RBX: ffffaf0bc7c63b18 RCX: 0000000000000000
    7/1/2024 12:44 notice user machine-name.int [ 4074.022346] RSP: 0018:ffffaf0bc7c63a00 EFLAGS: 00000246
    7/1/2024 12:44 notice user machine-name.int [ 4074.022069] Code: 89 4f 0c 48 8b 57 08 48 89 57 10 c3 cc cc cc cc 48 8b 57 10 48 89 07 48 c1 e8 20 48 89 57 08 c3 cc cc cc cc cc cc cc cc cc cc <0f> b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44
    7/1/2024 12:44 notice user machine-name.int [ 4074.021802] RIP: 0010:xas_descend+0x0/0x90
    7/1/2024 12:44 notice user machine-name.int [ 4074.021530] Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3401 03/19/2024
    7/1/2024 12:44 notice user machine-name.int [ 4074.021267] CPU: 8 PID: 9315 Comm: ThreadPoolForeg Tainted: G W L 6.1.0-22-amd64 #1 Debian 6.1.94-1
    7/1/2024 12:44 notice user machine-name.int [ 4074.021006] (t=36756 jiffies g=463081 q=1184920 ncpus=32)
    7/1/2024 12:44 notice user machine-name.int [ 4074.020693] rcu: 8-....: (36731 ticks this GP) idle=fd64/1/0x4000000000000000 softirq=280582/280584 fqs=18362
    7/1/2024 12:44 notice user machine-name.int [ 4074.020355] rcu: INFO: rcu_preempt self-detected stall on CPU
    7/1/2024 12:44 notice user machine-name.int [ 4068.017200] </TASK>
    7/1/2024 12:44 notice user machine-name.int [ 4068.016939] R13: 00007f2e401fafa0 R14: 00007f2e401faa10 R15: 00007f2e401faa50
    7/1/2024 12:44 notice user machine-name.int [ 4068.016679] R10: 00007ffd76755080 R11: 0000000000000293 R12: 0000000000000000
    7/1/2024 12:44 notice user machine-name.int [ 4068.016422] RBP: 00007f2e401faaa0 R08: 0000000000000005 R09: 0000000000000039
    7/1/2024 12:44 notice user machine-name.int [ 4068.016161] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000088
    7/1/2024 12:44 notice user machine-name.int [ 4068.015911] RAX: 0000000000000000 RBX: 00001ed40a513e18 RCX: 00007f2e4550b90a
    7/1/2024 12:44 notice user machine-name.int [ 4068.015658] RSP: 002b:00007f2e401fa9f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
    7/1/2024 12:44 notice user machine-name.int [ 4068.015413] Code: 48 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c 24 0c e8 a3 ce f8 ff 8b 7c 24 0c 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 03 cf f8 ff 8b 44 24
    7/1/2024 12:44 notice user machine-name.int [ 4068.015177] RIP: 0033:0x7f2e4550b90a
    7/1/2024 12:44 notice user machine-name.int [ 4068.014932] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
    7/1/2024 12:44 notice user machine-name.int [ 4068.014689] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:44 notice user machine-name.int [ 4068.014455] ? do_syscall_64+0x61/0xb0
    7/1/2024 12:44 notice user machine-name.int [ 4068.014212] ? syscall_exit_to_user_mode+0x1e/0x40
    7/1/2024 12:44 notice user machine-name.int [ 4068.013970] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:44 notice user machine-name.int [ 4068.013727] ? do_syscall_64+0x61/0xb0
    7/1/2024 12:44 notice user machine-name.int [ 4068.013485] ? syscall_exit_to_user_mode+0x1e/0x40
    7/1/2024 12:44 notice user machine-name.int [ 4068.013243] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:44 notice user machine-name.int [ 4068.013001] ? do_syscall_64+0x61/0xb0
    7/1/2024 12:44 notice user machine-name.int [ 4068.012764] ? syscall_exit_to_user_mode+0x1e/0x40
    7/1/2024 12:44 notice user machine-name.int [ 4068.012500] ? exit_to_user_mode_prepare+0x44/0x1f0
    7/1/2024 12:44 notice user machine-name.int [ 4068.012255] do_syscall_64+0x61/0xb0
    7/1/2024 12:44 notice user machine-name.int [ 4068.012010] syscall_exit_to_user_mode+0x17/0x40
    7/1/2024 12:44 notice user machine-name.int [ 4068.011765] exit_to_user_mode_prepare+0x1e8/0x1f0
    7/1/2024 12:44 notice user machine-name.int [ 4068.011527] task_work_run+0x56/0x90
    7/1/2024 12:44 notice user machine-name.int [ 4068.011280] __fput+0xe2/0x250
    7/1/2024 12:44 notice user machine-name.int [ 4068.011042] dput+0x132/0x310
    7/1/2024 12:44 notice user machine-name.int [ 4068.010798] __dentry_kill+0xdc/0x170
    7/1/2024 12:44 notice user machine-name.int [ 4068.010537] evict+0xcd/0x1d0
    7/1/2024 12:44 notice user machine-name.int [ 4068.010275] ? sugov_init+0x350/0x350
    7/1/2024 12:44 notice user machine-name.int [ 4068.010000] btrfs_evict_inode+0x79/0x3c0 [btrfs]
    7/1/2024 12:44 notice user machine-name.int [ 4068.009732] truncate_inode_pages_range+0x26c/0x450
    7/1/2024 12:44 notice user machine-name.int [ 4068.009475] find_get_entries+0x70/0x180
    7/1/2024 12:44 notice user machine-name.int [ 4068.009218] ? xas_find+0x14d/0x1d0
    7/1/2024 12:44 notice user machine-name.int [ 4068.008957] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
    7/1/2024 12:44 notice user machine-name.int [ 4068.008698] <TASK>
    7/1/2024 12:44 notice user machine-name.int [ 4068.008427] </IRQ>
    7/1/2024 12:44 notice user machine-name.int [ 4068.008163] ? sysvec_apic_timer_interrupt+0x69/0x90
    7/1/2024 12:44 notice user machine-name.int [ 4068.007899] ? __sysvec_apic_timer_interrupt+0x5a/0x110
    7/1/2024 12:44 notice user machine-name.int [ 4068.007637] ? hrtimer_interrupt+0xf4/0x210
    7/1/2024 12:44 notice user machine-name.int [ 4068.007366] ? __hrtimer_run_queues+0x10f/0x2b0
    7/1/2024 12:44 notice user machine-name.int [ 4068.007102] ? lockup_detector_update_enable+0x50/0x50
    7/1/2024 12:44 notice user machine-name.int [ 4068.006832] ? watchdog_timer_fn+0x1a4/0x200
    7/1/2024 12:44 notice user machine-name.int [ 4068.006559] <IRQ>
    7/1/2024 12:44 notice user machine-name.int [ 4068.006281] Call Trace:
    7/1/2024 12:44 notice user machine-name.int [ 4068.006005] PKRU: 55555554
    7/1/2024 12:44 notice user machine-name.int [ 4068.005719] CR2: 00007f2e30720000 CR3: 00000003df93e000 CR4: 0000000000750ee0
    7/1/2024 12:44 notice user machine-name.int [ 4068.005437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    7/1/2024 12:44 notice user machine-name.int [ 4068.005163] FS: 00007f2e401fd6c0(0000) GS:ffff90b13fa00000(0000) knlGS:0000000000000000
    7/1/2024 12:44 notice user machine-name.int [ 4068.004890] R13: ffffaf0bc7c63aa0 R14: 000000000000000c R15: ffffffffffffffff
    7/1/2024 12:44 notice user machine-name.int [ 4068.004615] R10: fffffffffffffffe R11: 00000000000015c0 R12: ffffaf0bc7c63aa0
    7/1/2024 12:44 notice user machine-name.int [ 4068.004338] RBP: fffffffffffffffe R08: 00000000000015c0 R09: ffffffffffffffc0
    7/1/2024 12:44 notice user machine-name.int [ 4068.004067] RDX: 0000000000000000 RSI: ffff90926c3a0db0 RDI: ffffaf0bc7c63a18
    7/1/2024 12:44 notice user machine-name.int [ 4068.003787] RAX: ffffe42ec7fe7400 RBX: ffffaf0bc7c63b18 RCX: 0000000000000000
    7/1/2024 12:44 notice user machine-name.int [ 4068.003517] RSP: 0018:ffffaf0bc7c63a10 EFLAGS: 00000246
    7/1/2024 12:44 notice user machine-name.int [ 4068.003249] Code: cc cc cc cc 4c 89 d9 0f b6 d0 83 e1 3f 48 39 ca 0f 84 15 ff ff ff 41 8d 43 ff 83 e0 3f 83 c0 01 e9 06 ff ff ff e8 c3 f0 ff ff <48> 85 c0 0f 85 5a ff ff ff 48 8b 77 18 40 f6 c6 03 75 c0 48 85 f6
    7/1/2024 12:44 notice user machine-name.int [ 4068.002986] RIP: 0010:xas_find+0x14d/0x1d0
    7/1/2024 12:44 notice user machine-name.int [ 4068.002716] Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3401 03/19/2024
    7/1/2024 12:44 notice user machine-name.int [ 4068.002459] CPU: 8 PID: 9315 Comm: ThreadPoolForeg Tainted: G W L 6.1.0-22-amd64 #1 Debian 6.1.94-1
    7/1/2024 12:44 notice user machine-name.int [ 4068.001011] usbhid hid
    ast sr_mod i2c_algo_bit cdrom drm_vram_helper nvme drm_ttm_helper
    ixgbe nvme_core ahci ttm xhci_pci libahci t10_pi xfrm_algo xhci_hcd drm_kms_helper dca crc64_rocksoft libata crc64 mdio_devres crc_t10dif
    i2c_i801 crc32_pclmul crct10dif_generic usbcore scsi_mod drm
    intel_lpss_pci crc32c_intel igc libphy crct10dif_pclmul intel_lpss
    i2c_smbus video usb_common scsi_common crct10dif_common mdio vmd
    idma64 fan wmi pinctrl_alderlake button
    7/1/2024 12:44 notice user eric
    7/1/2024 12:44 notice user machine-name.int [ 4068.000093]
    snd_compress soundwire_bus aesni_intel snd_hda_intel crypto_simd snd_intel_dspcfg snd_intel_sdw_acpi cryptd snd_hda_codec snd_hda_core
    xfs snd_hwdep eeepc_wmi snd_pcm_oss asus_wmi battery rapl
    snd_mixer_oss iTCO_wdt platform_profile intel_pmc_bxt pmt_telemetry sparse_keymap nfnetlink_log snd_pcm mei_hdcp mei_wdt ipmi_ssif iTCO_vendor_support ledtrig_audio sd_mod intel_cstate pmt_class
    acpi_ipmi snd_timer rfkill wmi_bmof intel_uncore watchdog pcspkr
    nft_log ipmi_si mei_me snd joydev cdc_acm soundcore ipmi_devintf mei
    intel_vsec ipmi_msghandler acpi_tad intel_pmc_core acpi_pad sg evdev
    nfsd auth_rpcgss nf_tables parport_pc nfs_acl lockd ppdev grace lp
    parport nfnetlink sunrpc fuse loop dm_mod efi_pstore configfs
    ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress
    cdc_ether usbnet mii efivarfs raid10 raid456 async_raid6_recov
    async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c
    crc32c_generic raid1 raid0 multipath linear md_mod uas usb_storage
    hid_gen
    7/1/2024 12:44 notice user machine-name.int [ 4068.000065] Modules
    linked in: tls bluetooth jitterentropy_rng drbg ansi_cprng
    ecdh_generic ecc crc16 xt_nat xt_tcpudp veth xt_conntrack
    xt_MASQUERADE nf_conntrack_netlink xfrm_user xt_addrtype nft_compat br_netfilter bridge nfsv3 nfs fscache netfs tcp_bbr sch_fq tun
    netconsole nvme_fabrics overlay pps_ldisc cfg80211 8021q garp stp mrp

    [continued in next message]
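
For anyone reading the capture above: the records appear newest-first (the kernel timestamps in brackets decrease going down the page), and some long records are wrapped onto following lines. A minimal sketch for putting such a capture back into chronological order, assuming each record carries a bracketed `seconds.microseconds` kernel timestamp; the function name `sort_by_kernel_time` is made up for illustration and is not part of any tool mentioned in this thread:

```python
import re

# Kernel timestamp as printed in the capture, e.g. "[ 4068.002986]".
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def sort_by_kernel_time(lines):
    """Return log lines ordered by kernel timestamp, oldest first.

    A line without a timestamp is treated as a wrapped continuation and
    stays attached to the timestamped line that precedes it in the input.
    Continuation lines that appear before any timestamped line are dropped.
    """
    records = []
    for line in lines:
        match = TS_RE.search(line)
        if match:
            records.append((float(match.group(1)), [line]))
        elif records:
            records[-1][1].append(line)
    # list.sort is stable, so records with equal timestamps keep input order.
    records.sort(key=lambda rec: rec[0])
    return [line for _, group in records for line in group]
```

Usage would be along the lines of `print("\n".join(sort_by_kernel_time(open("capture.txt").read().splitlines())))`, where `capture.txt` is a placeholder filename. This only helps with ordering; it cannot recover text that netconsole dropped or truncated.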

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Justin Piszcz@21:1/5 to cyprussocialite@gmail.com on Wed Jul 3 17:30:02 2024
    On Tue, Jul 2, 2024 at 5:17 AM Dmitrii Odintcov
    <cyprussocialite@gmail.com> wrote:

    > Hi,
    >
    > Just had it happen to me again - on boot this time - with the
    > recommended `nvme_core.default_ps_max_latency_us=0 pcie_aspm=off`.
    >
    > Worth noting that it's only happening with one of two SSDs I have
    > installed - the other being Samsung SSD 980 PRO 2TB - and that they've
    > been working fine for several years until this started happening
    > sometime last month.

    One thing I noticed: when I boot without any special kernel
    parameters, I see the following errors. I've read that they can be
    caused by a buggy BIOS:

    $ lspci | grep -i 1b.4
    00:1b.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port (rev 11)

    Jul 2 14:09:00 x kernel: [ 1213.547609] pcieport 0000:00:1b.4: AER: Multiple Corrected error message received from 0000:00:1b.4
    Jul 2 14:09:00 x kernel: [ 1213.547843] pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Jul 2 14:09:00 x kernel: [ 1213.548011] pcieport 0000:00:1b.4: device [8086:7ac4] error status/mask=00000040/00002000
    Jul 2 14:09:00 x kernel: [ 1213.548182] pcieport 0000:00:1b.4: [ 6] BadTLP
    Jul 2 14:09:43 x kernel: [ 1256.906830] pcieport 0000:00:1b.4: AER: Multiple Corrected error message received from 0000:00:1b.4
    Jul 2 14:09:43 x kernel: [ 1256.907169] pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Jul 2 14:09:43 x kernel: [ 1256.907395] pcieport 0000:00:1b.4: device [8086:7ac4] error status/mask=00000040/00002000
    Jul 2 14:09:43 x kernel: [ 1256.907648] pcieport 0000:00:1b.4: [ 6] BadTLP
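
As a reading aid for the `error status/mask=00000040/00002000` lines above: both values are bitmasks over the PCIe AER Correctable Error Status register. The sketch below decodes them, assuming the bit layout from the PCIe specification and the short names the Linux AER driver prints as far as I know them (`BadTLP` for bit 6 matches the `[ 6] BadTLP` line in the log). The helper names are made up for illustration.

```python
import re

# Bit positions in the AER Correctable Error Status register, with the
# short names used in Linux AER log output (assumed; unlisted bits are
# reserved and reported here as "bitN").
AER_CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}

STATUS_MASK_RE = re.compile(r"status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_correctable(value):
    """Return the names of all bits set in a 32-bit correctable-error value."""
    return [
        AER_CORRECTABLE_BITS.get(bit, f"bit{bit}")
        for bit in range(32)
        if value & (1 << bit)
    ]


def decode_status_mask(log_line):
    """Extract and decode the status/mask pair from one AER log line.

    Returns (status_names, mask_names), or None if the line has no pair.
    """
    match = STATUS_MASK_RE.search(log_line)
    if match is None:
        return None
    status, mask = (int(group, 16) for group in match.groups())
    return decode_correctable(status), decode_correctable(mask)
```

Applied to the line above, the status decodes to `BadTLP` only, and the mask to `NonFatalErr`, i.e. the only masked correctable error is the advisory non-fatal one, so the BadTLP reports are not being suppressed.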

    I am now testing with pci=nommconf. So far it has been 20 hours with no
    crash and none of the Corrected errors shown above, but I need to give
    it more time. I am curious whether you see any improvement on your side
    when testing with only pci=nommconf?
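
When comparing results between machines it helps to confirm which of these parameters actually took effect on the running kernel. A minimal sketch that checks a kernel command line for the parameters discussed in this thread; the function names are made up for illustration, and matching is by exact token, so a combined form such as `pci=nommconf,<other option>` would not be recognised:

```python
# Parameters mentioned in this thread; adjust to whatever is being tested.
WANTED = [
    "nvme_core.default_ps_max_latency_us=0",
    "pcie_aspm=off",
    "pci=nommconf",
]


def missing_params(cmdline, wanted=WANTED):
    """Return the entries of `wanted` that are not tokens of `cmdline`."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()
```

For example, `missing_params(read_cmdline())` returns an empty list when all three parameters are active, and otherwise lists the ones that are absent.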

    Justin
