• Bug#1106743: amdgpu: random brief system hangs with drm DMCUB error (1/

    From Forest@1:229/2 to All on Thu May 29 03:10:01 2025
    XPost: linux.debian.bugs.dist
    From: forestix@nom.one

    Package: src:linux
    Version: 6.12.29-1
    Severity: important
    X-Debbugs-Cc: forestix@nom.one

    Dear Maintainer,

    Since upgrading from kernel 6.12.27 to 6.12.29, my system occasionally
    hangs (evidenced by the mouse pointer freezing in place) for roughly
    0.3 - 2 seconds at a time.

    These hangs are accompanied by the following line in dmesg, often
    repeated several times:

    amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
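
    To gauge how often the error above occurs, saved kernel log text can be
    scanned for it. The following is only a minimal sketch: the function name
    count_dmcub_errors and the regular expression are illustrative
    assumptions, not part of any existing tool, and the pattern is derived
    solely from the single line quoted above.

```python
import re

# Pattern built from the dmesg line quoted in the report. The PCI address
# is matched generically so that other slots are counted too (assumption).
DMCUB_ERROR = re.compile(
    r"amdgpu [0-9a-f:.]+: \[drm\] \*ERROR\* "
    r"dc_dmub_srv_log_diagnostic_data: DMCUB error"
)


def count_dmcub_errors(log_text):
    """Return the number of lines in log_text that contain the DMCUB error."""
    return sum(1 for line in log_text.splitlines() if DMCUB_ERROR.search(line))
```

    The function takes plain text, so it can be fed the saved output of
    dmesg or journalctl -k.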

    I believe this upstream report is the problem I am experiencing:

    https://gitlab.freedesktop.org/drm/amd/-/issues/4238

    The problem vanishes when I use a custom build of kernel 6.12.29 with
    commit ID 468034a06a6e8043c5b50f9cd0cac730a6e497b5 reverted. Upstream
    commit ID: f1c6be3999d2be2673a51a9be0caf9348e254e52

    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=468034a06a6e8043c5b50f9cd0cac730a6e497b5

    -- Package-specific info:
    ** Version:
    Linux version 6.12.29-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.29-1 (2025-05-18)

    ** Command line:
    BOOT_IMAGE=/boot/vmlinuz-6.12.29-amd64 root=UUID=<my-root-uuid> ro quiet cryptdevice=<my-device-id> root=/dev/mapper/<my-device-name> splash

    ** Not tainted

    ** Kernel log:
    [ 9.176996] kvm_amd: Virtual GIF supported
    [ 9.176997] kvm_amd: Virtual NMI enabled
    [ 9.218072] MCE: In-kernel MCE decoding enabled.
    [ 9.220383] EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT)
    [ 9.220386] EDAC amd64: F19h_M60h detected (node 0).
    [ 9.220387] EDAC MC: UMC0 chip selects:
    [ 9.220388] EDAC amd64: MC: 0: 0MB 1: 0MB
    [ 9.220390] EDAC amd64: MC: 2: 16384MB 3: 0MB
    [ 9.220391] EDAC MC: UMC1 chip selects:
    [ 9.220392] EDAC amd64: MC: 0: 0MB 1: 0MB
    [ 9.220393] EDAC amd64: MC: 2: 16384MB 3: 0MB
    [ 9.246759] intel_rapl_common: Found RAPL domain package
    [ 9.246761] intel_rapl_common: Found RAPL domain core
    [ 9.246761] amd_atl: AMD Address Translation Library initialized
    [ 9.252209] usbcore: registered new interface driver btusb
    [ 9.254299] snd_hda_intel 0000:03:00.1: enabling device (0000 -> 0002)
    [ 9.254378] snd_hda_intel 0000:03:00.1: Handle vga_switcheroo audio client
    [ 9.254380] snd_hda_intel 0000:03:00.1: Force to non-snoop mode
    [ 9.254628] snd_hda_intel 0000:0e:00.1: enabling device (0000 -> 0002)
    [ 9.254660] snd_hda_intel 0000:0e:00.1: Handle vga_switcheroo audio client
    [ 9.254691] snd_hda_intel 0000:0e:00.6: enabling device (0000 -> 0002)
    [ 9.262640] snd_hda_intel 0000:0e:00.1: bound 0000:0e:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
    [ 9.263406] snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
    [ 9.264995] input: HD-Audio Generic HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:08.1/0000:0e:00.1/sound/card1/input13
    [ 9.265027] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.1/sound/card0/input15
    [ 9.265069] input: HDA ATI HDMI HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.1/sound/card0/input16
    [ 9.265097] input: HD-Audio Generic HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:08.1/0000:0e:00.1/sound/card1/input14
    [ 9.265178] input: HDA ATI HDMI HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.1/sound/card0/input17
    [ 9.265280] input: HDA ATI HDMI HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.1/sound/card0/input18
    [ 9.266471] Bluetooth: hci0: HW/SW Version: 0x008a008a, Build Time: 20241106151414
    [ 9.268902] mt7921e 0000:0b:00.0: enabling device (0000 -> 0002)
    [ 9.274449] mt7921e 0000:0b:00.0: ASIC revision: 79610010
    [ 9.275674] snd_hda_codec_realtek hdaudioC2D0: ALCS1200A: SKU not ready 0x00000000
    [ 9.276132] snd_hda_codec_realtek hdaudioC2D0: autoconfig for ALCS1200A: line_outs=3 (0x14/0x15/0x16/0x0/0x0) type:line
    [ 9.276135] snd_hda_codec_realtek hdaudioC2D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
    [ 9.276136] snd_hda_codec_realtek hdaudioC2D0: hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
    [ 9.276138] snd_hda_codec_realtek hdaudioC2D0: mono: mono_out=0x0
    [ 9.276139] snd_hda_codec_realtek hdaudioC2D0: inputs:
    [ 9.276140] snd_hda_codec_realtek hdaudioC2D0: Rear Mic=0x18
    [ 9.276141] snd_hda_codec_realtek hdaudioC2D0: Front Mic=0x19
    [ 9.276142] snd_hda_codec_realtek hdaudioC2D0: Line=0x1a
    [ 9.288600] input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:08.1/0000:0e:00.6/sound/card2/input19
    [ 9.288645] input: HD-Audio Generic Rear Mic as /devices/pci0000:00/0000:00:08.1/0000:0e:00.6/sound/card2/input20
    [ 9.288685] input: HD-Audio Generic Front Mic as /devices/pci0000:00/0000:00:08.1/0000:0e:00.6/sound/card2/input21
    [ 9.288718] input: HD-Audio Generic Line as /devices/pci0000:00/0000:00:08.1/0000:0e:00.6/sound/card2/input22
    [ 9.288752] input: HD-Audio Generic Line Out Front as /devices/pci0000:00/0000:00:08.1/0000:0e:00.6/sound/card2/input23
    [ 9.288785] input: HD-Audio Generic Line Out Surround as /devices/pci0000:00/0000:00:08.1/0000:0e:00.6/sound/card2/input24
    [ 9.288820] input: HD-Audio Generic Line Out CLFE as /devices/pci0000:00/0000:00:08.1/0000:0e:00.6/sound/card2/input25
    [ 9.288858] input: HD-Audio Generic Front Headphone as /devices/pci0000:00/0000:00:08.1/0000:0e:00.6/sound/card2/input26
    [ 9.351548] mt7921e 0000:0b:00.0: HW/SW Version: 0x8a108a10, Build Time: 20241106151007a

    [ 9.361738] mt7921e 0000:0b:00.0: WM Firmware Version: ____010000, Build Time: 20241106151045
    [ 9.445516] Bluetooth: hci0: Device setup in 188743 usecs
    [ 9.445519] Bluetooth: hci0: HCI Enhanced Setup Synchronous Connection command is advertised, but not supported.

    [continued in next message]

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From Uwe Kleine-König@1:229/2 to Forest on Wed Jun 4 11:10:01 2025
    XPost: linux.debian.bugs.dist
    From: u.kleine-koenig@baylibre.com

    Hello,

    On Wed, May 28, 2025 at 06:03:15PM -0700, Forest wrote:
    > Package: src:linux
    > Version: 6.12.29-1
    > Severity: important
    > X-Debbugs-Cc: forestix@nom.one
    >
    > Dear Maintainer,
    >
    > Since upgrading from kernel 6.12.27 to 6.12.29, my system occasionally
    > hangs (evidenced by the mouse pointer freezing in place) for roughly
    > 0.3 - 2 seconds at a time.
    >
    > These hangs are accompanied by the following line in dmesg, often
    > repeated several times:
    >
    > amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
    >
    > I believe this upstream report is the problem I am experiencing:
    >
    > https://gitlab.freedesktop.org/drm/amd/-/issues/4238
    >
    > The problem vanishes when I use a custom build of kernel 6.12.29 with
    > commit ID 468034a06a6e8043c5b50f9cd0cac730a6e497b5 reverted. Upstream
    > commit ID: f1c6be3999d2be2673a51a9be0caf9348e254e52
    >
    > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=468034a06a6e8043c5b50f9cd0cac730a6e497b5


    Note there is some confusion about the affected commits because the
    offending change is in Linus Torvalds' tree twice: once as f1c6be3999d2
    in v6.15-rc6 (with Cc: stable) and once as cfb2d41831ee, to be included
    in v6.16-rc1 (without Cc: stable).

    Upstream reverted cfb2d41831ee in commit
    1b824eef269db44d068bbc0de74c94a8e8f9ce02, which is currently waiting in
    next to go into v6.16. It's marked for stable.

    f1c6be3999d2 got backported to:

        6.14.7   4ec308a4104bc71a431c75cc9babf49303645617
        6.12.29  468034a06a6e8043c5b50f9cd0cac730a6e497b5
        6.6.91   c8a91debb020298c74bba0b9b6ed720fa98dc4a9

    So this might need some prodding to get the backports all done correctly
    if the stable maintainers don't notice that f1c6be3999d2 ==
    cfb2d41831ee.
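
    The backport list above can be turned into a small lookup that, given a
    kernel version string, returns the stable commit that carried the change
    into that series. This is only a sketch: the names BACKPORTS,
    parse_version and backport_commit are hypothetical, the table contains
    nothing beyond the three backports listed above, and it knows nothing
    about later reverts landing in newer stable releases.

```python
# Backports of upstream commit f1c6be3999d2, copied from the list above.
# Key: (major, minor) stable series.
# Value: (patch level of the release that received it, stable commit ID).
BACKPORTS = {
    (6, 14): (7, "4ec308a4104bc71a431c75cc9babf49303645617"),
    (6, 12): (29, "468034a06a6e8043c5b50f9cd0cac730a6e497b5"),
    (6, 6): (91, "c8a91debb020298c74bba0b9b6ed720fa98dc4a9"),
}


def parse_version(version):
    """Turn '6.12.29-1' or '6.12.29' into the tuple (6, 12, 29)."""
    upstream = version.split("-", 1)[0]  # drop a Debian revision, if any
    major, minor, patch = (int(part) for part in upstream.split("."))
    return major, minor, patch


def backport_commit(version):
    """Return the stable commit ID that introduced the change, or None.

    None means the series is not in the table or the release predates the
    backport. A non-None result does NOT prove the release is still
    affected, since a later revert is not tracked here.
    """
    major, minor, patch = parse_version(version)
    entry = BACKPORTS.get((major, minor))
    if entry is None:
        return None
    first_patch, commit = entry
    return commit if patch >= first_patch else None
```

    For the versions named in the report, backport_commit("6.12.29-1") gives
    the commit that was reverted in the custom build, while
    backport_commit("6.12.27") gives None.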

    Best regards
    Uwe

