• Bug#1104732: linux-image-6.1.0-34-amd64: amdgpu error, kernel cpu loop,

    From mg@1:229/2 to All on Mon May 5 13:40:02 2025
    XPost: linux.debian.bugs.dist
    From: mg-public-addr@protonmail.com

    Package: src:linux
    Version: 6.1.135-1
    Severity: important
    X-Debbugs-Cc: mg-public-addr@protonmail.com

    Dear Maintainer,

    The system logs an error, followed by 100% CPU usage on one core (by the kernel, I believe), which results in the message

    ```
    [drm:dc_add_plane_to_context [amdgpu]] *ERROR* Head pipe not found for stream_state 00000000b7629c18 !
    ```

    being written to the kernel log endlessly, as fast as the CPU can produce it.

    The trigger for this issue is unclear to me, as it will not happen on every boot of the system, and can take hours, days or
    weeks to appear after a reboot.

    Other system functions appear to work normally, or at least are not degraded in any way I have noticed, as long as the logs are rotated.

    Rebooting the system is my current approach when this error happens, and buys time until it occurs again. This system has run
    without issue for >60 days on an affected kernel version, so I suspect there is no guarantee this bug will always appear.

    The system is run as a mostly headless server and does not hibernate, sleep or suspend. It *is* connected via an HDMI cable to a TV
    that turns on and off throughout the day, which is one of the few inputs relevant to the amdgpu driver that I suspect could be a
    trigger. The TV is connected to the motherboard HDMI port, using the iGPU of a Ryzen 2200G.
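
One way to test the HDMI suspicion is to record when the connector state changes and compare those times with the first WARNING in the kernel log. The following is a minimal sketch, not something from the report: it assumes the connectors are exposed as `card*-*/status` files under `/sys/class/drm` (the usual DRM sysfs layout), and the function names are hypothetical. A TV that merely goes into standby may not change the reported status at all, in which case this would show nothing.

```python
import time
from pathlib import Path


def read_connector_states(drm_root):
    """Return {connector_name: status} for every connector under drm_root."""
    states = {}
    for status_file in sorted(Path(drm_root).glob("card*-*/status")):
        try:
            states[status_file.parent.name] = status_file.read_text().strip()
        except OSError:
            # Connector vanished or is unreadable; skip it for this round.
            continue
    return states


def diff_states(old, new):
    """Return (connector, old_status, new_status) for every changed connector."""
    changes = []
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes.append((name, old.get(name), new.get(name)))
    return changes


def watch(drm_root="/sys/class/drm", interval=1.0):
    """Poll forever and print a timestamped line for each status change."""
    previous = read_connector_states(drm_root)
    while True:
        time.sleep(interval)
        current = read_connector_states(drm_root)
        for name, old, new in diff_states(previous, current):
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S%z")
            print(f"{stamp} {name}: {old} -> {new}", flush=True)
        previous = current
```

Calling `watch()` from a small script and redirecting its output to a file would give a list of hotplug times to line up against the kernel log timestamps.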

    This system is running openmediavault (installed from that project's install media), but I am reporting it here as I suspect it does not make
    modifications to the kernel and core Debian system.

    The last time this was known to be stable for me was on the 5.10 kernels under bullseye; the issue has appeared with the 6.x
    kernels on both bullseye and bookworm.

    I have captured two instances of this from separate dates, included below. The last line is the one that then repeats indefinitely.
    It is difficult to capture, as the logs will quickly either fill up the hard drive or get rotated
    out, which means that it has been hard to observe anything other than the final message in the logs! It has recurred
    around 6-10 times in total over a 12-month period.

    Please note that the last kernel log included by `reportbug` is from a fresh reboot of the system where this issue has not
    yet occurred, so it may be of no use; otherwise it would only capture the spammed message!
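
Because the repeated message buries everything that came before it, a saved log is easier to read once consecutive repeats are collapsed. The sketch below is only an illustration: it assumes syslog-style lines of the form `timestamp host kernel: [uptime] message`, as in the excerpts in this report, and the names `PREFIX`, `message_body` and `collapse_repeats` are hypothetical. It does not stop the log from filling the disk; it only makes a captured file readable.

```python
import re

# Matches the syslog timestamp, host name and optional kernel uptime stamp, e.g.
# "2024-07-19T20:17:35.445701+01:00 rhino kernel: [72604.746570] "
PREFIX = re.compile(r"^\s*\S+\s+\S+\s+kernel:\s+(?:\[\s*\d+\.\d+\]\s+)?")


def message_body(line):
    """Strip the timestamp prefix so identical messages compare equal."""
    return PREFIX.sub("", line.rstrip("\n"))


def collapse_repeats(lines):
    """Yield each line once, followed by a count when it repeats back to back."""
    previous_body = None
    repeats = 0
    for line in lines:
        body = message_body(line)
        if body == previous_body:
            repeats += 1
            continue
        if repeats:
            yield f"(previous message repeated {repeats} more times)"
        yield line.rstrip("\n")
        previous_body = body
        repeats = 0
    if repeats:
        yield f"(previous message repeated {repeats} more times)"
```

Feeding it an open log file (for example a copy of the kernel log saved before rotation) and printing each yielded line leaves the initial WARNING and stack trace intact, followed by a single summary line for the flood.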

    Logs from the first time I caught the issue:

    2024-07-19T20:17:35.445701+01:00 rhino kernel: [72604.746570] ------------[ cut here ]------------
    2024-07-19T20:17:35.445717+01:00 rhino kernel: [72604.746574] WARNING: CPU: 1 PID: 56 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:3074 dc_update_planes_and_stream+0x342/0x870 [amdgpu]
    2024-07-19T20:17:35.445720+01:00 rhino kernel: [72604.747029] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink overlay cpufreq_powersave cpufreq_ondemand cpufreq_conservative cpufreq_userspace sunrpc quota_v2 quota_tree binfmt_misc intel_rapl_msr nls_ascii nls_cp437 intel_rapl_common vfat snd_hda_codec_realtek fat stv6111 btusb snd_hda_codec_generic btrtl lnbh25 edac_mce_amd ledtrig_audio btbcm btintel snd_hda_codec_hdmi kvm_amd amdgpu snd_hda_intel btmtk stv0910 iwlmvm kvm snd_intel_dspcfg snd_intel_sdw_acpi bluetooth irqbypass snd_hda_codec mac80211 gpu_sched drm_buddy jitterentropy_rng ghash_clmulni_intel snd_hda_core libarc4 drm_display_helper sha256_ssse3 sha512_ssse3 snd_hwdep sha1_ssse3 sha512_generic cec snd_pcm drbg iwlwifi rc_core ansi_cprng drm_ttm_helper snd_timer aesni_intel ttm snd ddbridge crypto_simd cryptd soundcore
    2024-07-19T20:17:35.445722+01:00 rhino kernel: [72604.747114] drm_kms_helper ecdh_generic dvb_core ecc mc rapl cfg80211 ccp wmi_bmof pcspkr sp5100_tco k10temp sg rfkill evdev acpi_cpufreq button wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel softdog watchdog nct6775 drm nct6775_core hwmon_vid dm_mod fuse loop efi_pstore configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic zstd_compress efivarfs raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common xhci_pci ahci libahci crc32_pclmul crc32c_intel xhci_hcd libata igb usbcore scsi_mod i2c_algo_bit dca i2c_piix4 scsi_common usb_common video wmi gpio_amdpt gpio_generic
    2024-07-19T20:17:35.445724+01:00 rhino kernel: [72604.747206] CPU: 1 PID: 56 Comm: kworker/1:1H Not tainted 6.1.0-23-amd64 #1 Debian 6.1.99-1
    2024-07-19T20:17:35.445725+01:00 rhino kernel: [72604.747212] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Gaming-ITX/ac, BIOS P3.40 07/17/2019
    2024-07-19T20:17:35.445737+01:00 rhino kernel: [72604.747215] Workqueue: events_highpri dm_irq_work_func [amdgpu]
    2024-07-19T20:17:35.445738+01:00 rhino kernel: [72604.747659] RIP: 0010:dc_update_planes_and_stream+0x342/0x870 [amdgpu]
    2024-07-19T20:17:35.445740+01:00 rhino kernel: [72604.748095] Code: 48 2b 14 25 28 00 00 00 0f 85 38 05 00 00 48 83 c4 50 5b 5d 41 5c 41 5d 41 5e 41 5f e9 57 4e 9a de 45 85 ed 0f 84 51 fe ff ff <0f> 0b 31 c0 eb ca 8b 93 50 06 00 00 83 fa 01 0f 84 68 fe ff ff 48
    2024-07-19T20:17:35.445741+01:00 rhino kernel: [72604.748099] RSP: 0018:ffffbeb1c040f870 EFLAGS: 00010202
    2024-07-19T20:17:35.445742+01:00 rhino kernel: [72604.748103] RAX: 0000000000000000 RBX: ffff962e90544000 RCX: 0000000000000000
    2024-07-19T20:17:35.445744+01:00 rhino kernel: [72604.748106] RDX: 0000000000000000 RSI: ffff962eb3a00000 RDI: ffff962e90544000
    2024-07-19T20:17:35.445745+01:00 rhino kernel: [72604.748108] RBP: ffffbeb1c040fc68 R08: 0000000000000000 R09: 0000000000000004

    [continued in next message]

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From Salvatore Bonaccorso@1:229/2 to All on Mon May 5 16:30:01 2025
    XPost: linux.debian.bugs.dist
    From: carnil@debian.org

    Control: tags -1 + moreinfo

    Hi

    thanks for your report.

    [continued in next message]

  • From mg@1:229/2 to Salvatore Bonaccorso on Wed May 7 21:10:02 2025
    XPost: linux.debian.bugs.dist
    From: mg-public-addr@protonmail.com

    Thank you for taking the time to look into this, Salvatore.

    > I assume you cannot say more specifically when you first saw the problem
    > appear in the 6.1.y series?

    Sorry, no. From memory it was around then, but I cannot find any logs that pin down a particular version.

    > Would you be able to test a newer stable series as well (ideally 6.12.y,
    > as will be shipped in trixie, or a mainline kernel)?

    I may need some instruction on building or applying any kernel patches and on how to switch to an alternative kernel on Debian. I am familiar with git and building software, although not with the kernel or C applications.

    > I have searched the upstream issues in https://gitlab.freedesktop.org/drm/amd/-/issues and did not find
    > anything directly matching your report. Could you please report it
    > upstream and post the upstream issue back here so we can link
    > them?
    >
    > Since you own the hardware, having you interact directly with upstream
    > on the matter would help speed up the debugging.
    > Can you do that?

    I have filed the bug upstream at https://gitlab.freedesktop.org/drm/amd/-/issues/4212 and will follow up with them, unless I need some Debian-specific help.

    Kind regards,
    MG.

    [continued in next message]
