• Bug#1106268: linux-image-6.12.29-amd64: amdgpu DMCUB errors and lockups

    From Alper Nebi Yasak@1:229/2 to Ozgur Karatas on Thu May 22 12:50:01 2025
    XPost: linux.debian.bugs.dist
    From: alpernebiyasak@gmail.com

    On 2025-05-22 13:06 +03:00, Ozgur Karatas wrote:
    On Thu, 22 May 2025, 12:57, Alper Nebi Yasak <alpernebiyasak@gmail.com>
    wrote:

    Package: src:linux
    Version: 6.12.29-1
    Severity: important
    Tags: upstream
    Forwarded: https://gitlab.freedesktop.org/drm/amd/-/issues/4238

    Hello,

    This is a hardware or firmware error on the GPU, causing driver
    incompatibility and kernel-not-responding errors and resulting in a
    crash.
    Is this the latest Debian version? Did you also install an AMD driver?

    This is on Debian unstable with the linux-image-6.12.29-amd64 and
    firmware-amd-graphics (=20250410-2) packages from Debian, without
    proprietary drivers from AMD. I'm using a Radeon 6800 XT, in case that
    helps.

    [...]

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From Alper Nebi Yasak@1:229/2 to All on Thu May 22 12:00:01 2025
    XPost: linux.debian.bugs.dist
    From: alpernebiyasak@gmail.com

    Package: src:linux
    Version: 6.12.29-1
    Severity: important
    Tags: upstream
    Forwarded: https://gitlab.freedesktop.org/drm/amd/-/issues/4238

    Hi,

    My PC consistently freezes within a few minutes of booting Linux
    v6.12.29, but it works fine on v6.12.27 and v6.14.6. I've seen new
    errors from the amdgpu module in the kernel log (dmesg):

    [ 3249.244690] amdgpu 0000:0d:00.0: amdgpu: [drm] amdgpu: AUX partially written
    [ 3249.244692] amdgpu 0000:0d:00.0: amdgpu: [drm] amdgpu: AUX reply command not ACK: 0x01.
    [ 3249.246142] amdgpu 0000:0d:00.0: amdgpu: [drm] amdgpu: AUX partially written
    [ 3249.246144] amdgpu 0000:0d:00.0: amdgpu: [drm] amdgpu: AUX reply command not ACK: 0x01.
    [ 3273.453141] amdgpu 0000:0d:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
    [ 3273.453163] amdgpu 0000:0d:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
    [ 3310.955288] amdgpu 0000:0d:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
    [ 3310.959676] amdgpu 0000:0d:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
    [ 3311.183514] amdgpu 0000:0d:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
    [ 3311.407523] amdgpu 0000:0d:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
    [ 3314.373183] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 2302686514 wd_nsec: 2302685676
    [ 3315.677400] amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000028 SMN_C2PMSG_82:0x00000000
    [ 3315.677407] amdgpu 0000:0d:00.0: amdgpu: Failed to enable gfxoff!
    [ 3320.662787] amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000028 SMN_C2PMSG_82:0x00000000
    [ 3320.662794] amdgpu 0000:0d:00.0: amdgpu: Failed to enable gfxoff!
    [ 3320.806277] amdgpu 0000:0d:00.0: amdgpu: Dumping IP State
    [ 3325.715100] amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000028 SMN_C2PMSG_82:0x00000000
    [ 3325.715107] amdgpu 0000:0d:00.0: amdgpu: Failed to enable gfxoff!
    [ 3325.720230] amdgpu 0000:0d:00.0: amdgpu: Dumping IP State Completed
    [ 3325.720283] amdgpu 0000:0d:00.0: amdgpu: ring vcn_dec_0 timeout, signaled seq=367410, emitted seq=367411
    [ 3325.720287] amdgpu 0000:0d:00.0: amdgpu: Process information: process RDD Process pid 4872 thread firefox:cs0 pid 5309
    [ 3325.720290] amdgpu 0000:0d:00.0: amdgpu: GPU reset begin!
    [...]
    [ 3335.748049] watchdog: Watchdog detected hard LOCKUP on cpu 23
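    The errors above follow a recognisable sequence (AUX NAKs, DMCUB
    diagnostics, SMU busy / gfxoff failures, a ring timeout, a GPU reset and
    finally a hard lockup). A minimal sketch for spotting that sequence in a
    saved log is below, assuming the category labels, the helper names
    (scan, unsignaled_fences) and the substring list are our own choices
    copied from this excerpt, not an amdgpu API or an exhaustive list of
    errors. Reading "emitted seq - signaled seq" as the number of
    outstanding fences on the ring is likewise an interpretation.

```python
import re
from dataclasses import dataclass
from typing import Iterable

# Hypothetical category labels mapped to substrings copied from the dmesg
# excerpt in this report. Not an exhaustive list of amdgpu error messages.
SIGNATURES = {
    "aux_nack": "AUX reply command not ACK",
    "dmcub": "DMCUB error",
    "smu_busy": "SMU: I'm not done with your previous command",
    "gfxoff": "Failed to enable gfxoff",
    "ring_timeout": "timeout, signaled seq=",
    "gpu_reset": "GPU reset begin!",
    "hard_lockup": "Watchdog detected hard LOCKUP",
}

# dmesg prefixes each line with "[seconds.microseconds]" since boot.
TIMESTAMP = re.compile(r"^\s*\[\s*(\d+\.\d+)\]")
RING = re.compile(r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)")


@dataclass
class Hit:
    time: float    # seconds since boot, as printed by dmesg
    category: str  # one of the SIGNATURES keys
    line: str


def scan(lines: Iterable[str]) -> list[Hit]:
    """Return one Hit per timestamped line matching a known signature, in log order."""
    hits = []
    for line in lines:
        stamp = TIMESTAMP.match(line)
        if stamp is None:
            continue  # skip "[...]" elisions and untimestamped lines
        for category, needle in SIGNATURES.items():
            if needle in line:
                hits.append(Hit(float(stamp.group(1)), category, line.strip()))
                break
    return hits


def unsignaled_fences(line: str):
    """For a ring-timeout line, return (ring name, emitted - signaled), else None."""
    m = RING.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(3)) - int(m.group(2))
```

    On this excerpt the hard lockup comes about 86.5 seconds after the first
    AUX NAK, and vcn_dec_0 has a single unsignaled fence. To scan the
    attached log directly, the lines of gzip.open("dmesg.txt.gz", "rt") can
    be passed to scan().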

    There are already upstream reports about these errors, where people
    tracked them to commit cfb2d41831ee ("drm/amd/display: more liberal
    vmin/vmax update for freesync"). Setting my display to a fixed 60Hz
    appears to help: the messages above appeared shortly after I switched
    back to 144Hz (which would enable FreeSync) after running for about an
    hour at 60Hz.

    Apparently the commit is also in v6.15-rc6/7, in case you're thinking
    of an experimental upload any time soon.

    Thanks,
    Alper


    -- Package-specific info:
    ** Kernel log:
    See dmesg.txt.gz attachment

    ** Model information
    sys_vendor: ASUS
    product_name: System Product Name
    product_version: System Version
    chassis_vendor: Default string
    chassis_version: Default string
    bios_vendor: American Megatrends Inc.
    bios_version: 3301
    board_vendor: ASUSTeK COMPUTER INC.
    board_name: TUF GAMING B550M-E WIFI
    board_version: Rev X.0x

    ** Configuration for modprobe:
    blacklist microcode
    blacklist vmw_vsock_vmci_transport
    blacklist arkfb
    blacklist aty128fb
    blacklist atyfb
    blacklist radeonfb
    blacklist cirrusfb
    blacklist cyber2000fb
    blacklist kyrofb
    blacklist matroxfb_base
    blacklist mb862xxfb
    blacklist neofb
    blacklist pm2fb
    blacklist pm3fb
    blacklist s3fb
    blacklist savagefb
    blacklist sisfb
    blacklist tdfxfb
    blacklist tridentfb
    blacklist vt8623fb
    blacklist microcode
    blacklist pcspkr
    options snd_pcsp index=-2
    options cx88_alsa index=-2
    options snd_atiixp_modem index=-2
    options snd_intel8x0m index=-2
    options snd_via82xx_modem index=-2
    options md_mod start_ro=1
    options snd_dummy pcm_devs=4
    options bonding max_bonds=0
    options dummy numdummies=0
    options ifb numifbs=0
    options ipv6 disable=1

    ** Network interface configuration:
    *** /etc/network/interfaces:

    source /etc/network/interfaces.d/*

    auto lo
    iface lo inet loopback

    ** PCI devices:
    00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex [1022:1480]
    Subsystem: ASUSTeK Computer Inc. Device [1043:8808]
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

    00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]
    Subsystem: ASUSTeK Computer Inc. Device [1043:8808]

    [continued in next message]

  • From Ozgur Karatas@1:229/2 to All on Thu May 22 12:20:01 2025
    XPost: linux.debian.bugs.dist
    From: o@zgur.org

    On Thu, 22 May 2025, 12:57, Alper Nebi Yasak <alpernebiyasak@gmail.com>
    wrote:

    Package: src:linux
    Version: 6.12.29-1
    Severity: important
    Tags: upstream
    Forwarded: https://gitlab.freedesktop.org/drm/amd/-/issues/4238

    Hi,


    Hello,

    This is a hardware or firmware error on the GPU, causing driver
    incompatibility and kernel-not-responding errors and resulting in a
    crash.
    Is this the latest Debian version? Did you also install an AMD driver?

  • From Axel Regnat@1:229/2 to All on Sat Jun 7 14:40:02 2025
    XPost: linux.debian.bugs.dist
    From: axel.regnat@posteo.de

    Today 6.12.30 landed in testing and is also affected by this: after a
    few minutes my system freezes and I can only reboot using Magic SysRq.

    6.12.27 works fine, so I'll have to stick with that kernel until this
    is fixed.
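    Whether Magic SysRq can still reboot a frozen machine depends on the
    kernel.sysrq bitmask. A minimal sketch that decodes it is below,
    assuming the bit values documented in the kernel's admin-guide sysrq
    page; the helper names and the use of the common R-E-I-S-U-B reboot
    sequence as the check are our own illustrative choices. Per that
    documentation, the mask restricts keyboard invocation only, not writes
    to /proc/sysrq-trigger.

```python
# Bit values as documented for /proc/sys/kernel/sysrq (0 = disable all,
# 1 = enable all, otherwise a bitmask of the functions below).
SYSRQ_BITS = {
    0x2: "console log level",
    0x4: "keyboard control (SAK, unraw)",
    0x8: "debugging dumps",
    0x10: "sync",
    0x20: "remount read-only",
    0x40: "signal processes",
    0x80: "reboot/poweroff",
    0x100: "nice RT tasks",
}

# Which bit each key of the R-E-I-S-U-B sequence depends on (illustrative).
REISUB = {"r": 0x4, "e": 0x40, "i": 0x40, "s": 0x10, "u": 0x20, "b": 0x80}


def enabled_functions(mask: int) -> list[str]:
    """List the SysRq function groups a given mask enables."""
    if mask == 1:
        return list(SYSRQ_BITS.values())
    return [name for bit, name in SYSRQ_BITS.items() if mask & bit]


def allowed_keys(mask: int) -> dict[str, bool]:
    """Report, per R-E-I-S-U-B key, whether the keyboard invocation is allowed."""
    if mask == 1:
        return {key: True for key in REISUB}
    return {key: bool(mask & bit) for key, bit in REISUB.items()}


def read_sysrq_mask(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current mask from procfs (Linux only)."""
    with open(path) as fh:
        return int(fh.read().strip())
```

    For example, allowed_keys(read_sysrq_mask()) would show whether E and I
    (terminate/kill all processes) are permitted before relying on the full
    sequence; a mask of 0 disables every key.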
