• Re: Hardware question

    From Felix Miata@21:1/5 to All on Fri Feb 28 04:40:02 2025
    Van Snyder composed on 2025-02-27 19:06 (UTC-0800):

    > While running at runlevel 3 in Debian 12.5, I got the following messages:
    >
    > mce: [Hardware Error]: CPU: 8 Machine Check: 0 Bank 0: 8000004000040005
    > mce: [Hardware Error]: TSC 1838aa435b6d
    > mce: [Hardware Error]: PROCESSOR 0: b0671 TIME 140710368 SOCKET 0 APIC 20 microcode 12b
    >
    > Motherboard is Micro-Star International Z790 GAMING PLUS WIFI (MS-7E06)
    >
    > Processor is Intel(R) Core(TM) i9-14900K model 183 with 0x12b microcode
    >
    > RAM is Micron Technology Part Number: CP16G56C46U5.C8D
    >
    > The MB and CPU and RAM are only a few weeks old. Should I try to get
    > warranty replacement of something?

    Your kernel is older than your CPU by about a year, so likely doesn't have enough
    backporting to fully support it properly. A newer kernel could be all it takes to
    make those MCEs go away.
    --
    Evolution as taught in public schools is, like religion,
    based on faith, not based on science.

    Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

    Felix Miata

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Smith@21:1/5 to Van Snyder on Fri Feb 28 20:40:01 2025
    Hi,

    On Fri, Feb 28, 2025 at 11:27:40AM -0800, Van Snyder wrote:
    > What's "mce?"

    https://en.wikipedia.org/wiki/Machine-check_exception

    > "apt update" says everything is up to date, but the kernel is 6.1.0-18.
    > I believe there are several newer ones, maybe up to 6.1.0-31?

    You can try a kernel from bookworm-backports or even download the source
    of the latest Linux kernel and build a deb package for it.

    Thanks,
    Andy

    --
    https://bitfolk.com/ -- No-nonsense VPS hosting

  • From Greg Wooledge@21:1/5 to Van Snyder on Fri Feb 28 20:40:02 2025
    On Fri, Feb 28, 2025 at 11:27:40 -0800, Van Snyder wrote:
    > "apt update" says everything is up to date, but the kernel is 6.1.0-18.
    > I believe there are several newer ones, maybe up to 6.1.0-31?

    That's correct. You're probably missing the metapackage that brings
    in new kernels automatically. For an amd64 machine, that metapackage
    is named "linux-image-amd64". (If you use DKMS kernel modules, you'll
    also want the corresponding linux-headers-* metapackage.)

  • From Greg Wooledge@21:1/5 to Van Snyder on Fri Feb 28 22:20:01 2025
    On Fri, Feb 28, 2025 at 12:46:35 -0800, Van Snyder wrote:
    > On Fri, 2025-02-28 at 14:34 -0500, Greg Wooledge wrote:
    > > On Fri, Feb 28, 2025 at 11:27:40 -0800, Van Snyder wrote:
    > > > "apt update" says everything is up to date, but the kernel is 6.1.0-18.
    > > > I believe there are several newer ones, maybe up to 6.1.0-31?
    > >
    > > That's correct. You're probably missing the metapackage that brings
    > > in new kernels automatically. For an amd64 machine, that metapackage
    > > is named "linux-image-amd64". (If you use DKMS kernel modules, you'll
    > > also want the corresponding linux-headers-* metapackage.)
    >
    > The NVidia 570 driver is a kernel package. What's the name of the corresponding linux-headers-* metapackage?

    Whatever your linux-image-* metapackage is called, you replace "image"
    with "headers" to get the other metapackage name.

    For amd64 they would be linux-image-amd64 and linux-headers-amd64.
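
    The naming rule above can be sketched in a few lines of Python. This is only an illustration of the "replace image with headers" rule; the function name headers_metapackage is made up for this example and is not part of any Debian tool.

```python
def headers_metapackage(image_metapackage: str) -> str:
    """Map a linux-image-* package name to its linux-headers-* counterpart.

    Hypothetical helper: it only rewrites the name, it does not check
    that either package exists in the archive.
    """
    prefix = "linux-image-"
    if not image_metapackage.startswith(prefix):
        raise ValueError(f"not a linux-image package name: {image_metapackage!r}")
    # Keep everything after the prefix (the flavour, e.g. "amd64")
    return "linux-headers-" + image_metapackage[len(prefix):]


if __name__ == "__main__":
    print(headers_metapackage("linux-image-amd64"))
```

    The same substitution also works for a versioned package name such as linux-image-6.1.0-31-amd64.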

  • From Felix Miata@21:1/5 to All on Fri Feb 28 21:40:01 2025
    Van Snyder composed on 2025-02-28 11:27 (UTC-0800):

    > On Thu, 2025-02-27 at 22:35 -0500, Felix Miata wrote:
    >
    > > Your kernel is older than your CPU by about a year, so likely doesn't
    > > have enough backporting to fully support it properly. A newer kernel
    > > could be all it takes to make those MCEs go away.
    >
    > What's "mce?"

    The part you failed to quote:

    mce: [Hardware Error]: CPU: 8 Machine Check: 0 Bank 0: 8000004000040005
    mce: [Hardware Error]: TSC 1838aa435b6d
    mce: [Hardware Error]: PROCESSOR 0: b0671 TIME 140710368 SOCKET 0 APIC 20 microcode 12b
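
    For anyone who wants to look inside the "Bank 0" value from the log, here is a minimal Python sketch that splits it into fields. It assumes the architectural IA32_MCi_STATUS bit layout described in the machine-check chapter of the Intel SDM; the function name and the tiny lookup table are invented for this example, and model-specific details are not decoded.

```python
# Sketch: split an Intel IA32_MCi_STATUS value into its architectural fields.
# Assumption: bit positions follow the Intel SDM machine-check chapter.

FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds valid data
    58: "ADDRV",  # MCi_ADDR holds valid data
    57: "PCC",    # processor context corrupt
}

# Only the simple error code that appears in this thread is named here.
SIMPLE_CODES = {0x0005: "internal parity error"}


def decode_mci_status(status: int) -> dict:
    """Return the set flag names and numeric fields of a 64-bit status word."""
    mca_code = status & 0xFFFF                            # bits 15:0
    return {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "corrected_count": (status >> 38) & 0x7FFF,       # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,   # bits 31:16
        "mca_code": mca_code,
        "mca_meaning": SIMPLE_CODES.get(mca_code, "not decoded in this sketch"),
    }


if __name__ == "__main__":
    for key, value in decode_mci_status(0x8000004000040005).items():
        print(f"{key}: {value}")
```

    Under those assumptions the logged value has only VAL set (UC is clear, so the error was corrected), a corrected-error count of 1, and MCA error code 0x0005.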

    Machine Check (Exception) is a "Hardware Error", typically caused not by a hardware error, but by attempting to use hardware newer than the software, the kernel in particular.

    > "apt update" says everything is up to date, but the kernel is 6.1.0-18.
    > I believe there are several newer ones, maybe up to 6.1.0-31?

    "Newer" kernels do include those newer than your 6.1.0-18, such as its "update"
    derivative 6.1.0-31. Updates are newer by date, due to the security and bug fixes
    they contain. But little if any support for newer hardware is added to update
    kernels, so their support for newer hardware is generally limited or absent.

    "Newer" in another sense means 6.2.x and up. Kernels that are newer in this sense
    support hardware that appeared after the 6.1 version selected for the original
    release of the Debian 12 operating system you are using.

    In yet another sense of the word "newer", the one most likely applicable to your
    environment, it means a version newer than your i9-14900K CPU, which was released
    to market in the fourth quarter of 2023, when the newest available upstream kernel
    was 6.5.x, four releases newer than Bookworm's 6.1.x.

  • From Joe@21:1/5 to Van Snyder on Sat Mar 1 10:10:01 2025
    On Fri, 28 Feb 2025 22:00:41 -0800
    Van Snyder <van.snyder@sbcglobal.net> wrote:

    > On Fri, 2025-02-28 at 12:46 -0800, Van Snyder wrote:
    > > On Fri, 2025-02-28 at 14:34 -0500, Greg Wooledge wrote:
    > > > On Fri, Feb 28, 2025 at 11:27:40 -0800, Van Snyder wrote:
    > > > > "apt update" says everything is up to date, but the kernel is 6.1.0-18.
    > > > > I believe there are several newer ones, maybe up to 6.1.0-31?
    > > >
    > > > That's correct. You're probably missing the metapackage that brings
    > > > in new kernels automatically. For an amd64 machine, that metapackage
    > > > is named "linux-image-amd64". (If you use DKMS kernel modules, you'll
    > > > also want the corresponding linux-headers-* metapackage.)
    > >
    > > The NVidia 570 driver is a kernel package. What's the name of the corresponding linux-headers-* metapackage?
    >
    > The NVidia 570 driver is a kernel package. It's installed by
    > downloading and running a bash script that requires the kernel
    > headers. If I get the metapackage linux-image-amd64 will I need to
    > rebuild the NVidia driver every time it loads a new kernel?
    >
    > If I get a new kernel by way of "apt update" it leaves a line in grub
    > to load older kernels. Will the metapackage do that so I can at least
    > boot something until I rebuild the NVidia 570 driver to go with a new
    > kernel?

    It will normally install a new kernel and keep the previous one. I
    can't remember if it removes earlier ones or whether I do that, as I
    run apt autoremove regularly. Autoremove certainly leaves the last
    kernel installed as well as the current one.

    --
    Joe

  • From Greg Wooledge@21:1/5 to Timothy M Butterworth on Sat Mar 1 15:00:01 2025
    On Sat, Mar 01, 2025 at 04:06:37 -0500, Timothy M Butterworth wrote:
    > On Sat, Mar 1, 2025 at 4:04 AM Joe <joe@jretrading.com> wrote:
    > > On Fri, 28 Feb 2025 11:27:40 -0800
    > > Van Snyder <van.snyder@sbcglobal.net> wrote:
    > > > On Thu, 2025-02-27 at 22:35 -0500, Felix Miata wrote:
    > > > > Your kernel is older than your CPU by about a year, so likely
    > > > > doesn't have enough backporting to fully support it properly.
    > > > > A newer kernel could be all it takes to make those MCEs go away.
    > > >
    > > > What's "mce?"
    > > >
    > > > "apt update" says everything is up to date, but the kernel is
    > > > 6.1.0-18. I believe there are several newer ones, maybe up to
    > > > 6.1.0-31?
    >
    > When is the last time you rebooted? Debian does not provide live kernel
    > patching. If apt says you are all up to date but uname shows an older
    > kernel then it is time to reboot.

    That's only true if the linux-image-$ARCH metapackage is installed,
    and if the user has been using "apt update" + "apt upgrade" (or
    equivalents) on a regular basis.
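
    To tell the two situations apart, one can compare the running kernel release with the releases that are actually installed. The Python sketch below does that comparison on plain strings; the function names are invented for this example, and the release strings are just the ones mentioned in this thread with an assumed "-amd64" suffix. On a real system the running release could come from platform.release() (the same string "uname -r" prints).

```python
import re


def abi_key(release: str) -> list:
    """Split a release like '6.1.0-18-amd64' into a naturally sortable key.

    Assumes the release string starts with a digit, so numbers and text
    always alternate at the same positions in every key.
    """
    parts = re.split(r"(\d+)", release)
    return [int(p) if p.isdigit() else p for p in parts if p]


def newest_installed(releases: list) -> str:
    """Return the highest release using numeric, not alphabetical, ordering."""
    return max(releases, key=abi_key)


def reboot_needed(running: str, installed: list) -> bool:
    """True if a newer kernel than the running one is already installed."""
    return abi_key(newest_installed(installed)) > abi_key(running)


if __name__ == "__main__":
    # Illustrative values only, taken from the versions named in the thread
    running = "6.1.0-18-amd64"
    installed = ["6.1.0-18-amd64", "6.1.0-31-amd64"]
    print("newest installed:", newest_installed(installed))
    print("reboot needed:", reboot_needed(running, installed))
```

    If the newest installed release equals the running one while the archive has newer kernels, a reboot will not help and the missing metapackage is the more likely explanation.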

  • From Hans@21:1/5 to All on Sat Mar 1 10:00:01 2025
    On Friday, 28 February 2025, 21:46:35 CET, Van Snyder wrote:
    > On Fri, 2025-02-28 at 14:34 -0500, Greg Wooledge wrote:
    > > On Fri, Feb 28, 2025 at 11:27:40 -0800, Van Snyder wrote:
    > > > "apt update" says everything is up to date, but the kernel is 6.1.0-18.
    > > > I believe there are several newer ones, maybe up to 6.1.0-31?
    > >
    > > That's correct. You're probably missing the metapackage that brings
    > > in new kernels automatically. For an amd64 machine, that metapackage
    > > is named "linux-image-amd64". (If you use DKMS kernel modules, you'll
    > > also want the corresponding linux-headers-* metapackage.)
    >
    > The NVidia 570 driver is a kernel package. What's the name of the corresponding linux-headers-* metapackage?

    If the kernel is 6.1.0-31-amd64, the kernel headers package is

    linux-headers-6.1.0-31-amd64

    If in doubt, install the "module-assistant" package and start it with the command "m-a". Its ncurses interface lets you easily prepare your build environment for compiling kernel modules. It is self-explanatory and will install all the packages you need for the currently running kernel.

    Hope this helps.


    Best

    Hans

  • From Hans@21:1/5 to All on Sat Mar 1 21:10:01 2025
    On Saturday, 1 March 2025, 20:27:25 CET, Van Snyder wrote:
    > On Fri, 2025-02-28 at 22:00 -0800, Van Snyder wrote:
    > > That's correct. You're probably missing the metapackage that brings
    > > in new kernels automatically. For an amd64 machine, that metapackage
    > > is named "linux-image-amd64". (If you use DKMS kernel modules, you'll
    > > also want the corresponding linux-headers-* metapackage.)
    >
    > The NVidia 570 driver is a kernel package that's installed by running
    > a bash script that requires the kernel headers.
    >
    > If I get the metapackages linux-image-amd64 and linux-headers-amd64,
    > will I need to rebuild the NVidia driver every time it loads a new
    > kernel?

    Yes, when you get a new kernel version, let's say 6.1.0-32-amd64, then the nvidia kernel module has to be rebuilt.

    So, after an upgrade that brings a new kernel, leave the old kernel installed. If the nvidia build fails and X does not start, you can reboot and start with the older kernel.

    Once the new kernel boots well, X is loaded, and the nvidia module and drivers are working, you can uninstall the older kernel.

    With an upgrade the build of the nvidia kernel module should run automatically.

    Hans

  • From Frank Guthausen@21:1/5 to Van Snyder on Sat Mar 1 21:10:01 2025
    On Sat, 01 Mar 2025 11:27:25 -0800
    Van Snyder <van.snyder@sbcglobal.net> wrote:

    > If I get the metapackages linux-image-amd64 and linux-headers-amd64,
    > will I need to rebuild the NVidia driver every time it loads a new
    > kernel?

    Probably yes. If you install a fixed kernel version, you'll probably
    need only the headers for exactly this kernel version. But an outdated
    kernel version might have security issues.
    --
    kind regards
    Frank


  • From Andy Smith@21:1/5 to Van Snyder on Sat Mar 1 23:40:01 2025
    Hi,

    On Sat, Mar 01, 2025 at 02:24:04PM -0800, Van Snyder wrote:
    > The NVidia kernel module is built by running a bash script. It's not a
    > .deb package.
    >
    > Will it still be automagically rebuilt?

    If it's a DKMS module, which is what my nvidia driver is, then it will try
    to be built for any kernel install and should work as long as you have
    headers installed. Though there have been times that things have changed
    and its build is broken.

    If you need a newer kernel to properly support your CPU or motherboard
    then my gut feeling is that you'll need something newer than the 6.1.x
    provided by Debian stable packages, as these tend to be a long term
    stable kernel with only bug fixes backported, not new features or device
    support. So you may have to see what is in backports or experimental.

    Thanks,
    Andy


  • From Anssi Saari@21:1/5 to Van Snyder on Sun Mar 2 20:40:02 2025
    Van Snyder <van.snyder@sbcglobal.net> writes:

    > I install the driver by running the NVIDIA-Linux-x86_64-570.124.04.run script at runlevel 3, then rebooting.

    Why?

    > Is that DKMS?

    To be clear, it's a manual installation of drivers from the
    manufacturer. Definitely not DKMS. If you want to install stuff manually,
    you get to update it yourself.

    Was there some reason to go with this instead of just installing the
    nvidia-driver package from non-free? Even the free nouveau driver seems
    to have fairly good support for your GPU.

  • From Hans@21:1/5 to All on Mon Mar 3 09:40:02 2025
    > The nvidia-driver package from non-free apparently doesn't work with a
    > Quadro K2200. NVidia recommends the 570 driver.
    >
    > When I first installed the system, the left-hand pane of Evolution
    > would spontaneously scroll, even if a different window had keyboard and
    > mouse focus, and the mouse cursor disappeared when I put it into a
    > window title bar or boundary.
    >
    > The nouveau driver would occasionally lock up.
    >
    > Those problems were cured by installing the NVidia driver.

    I believe that for the Quadro K2200 the kernel driver is the "nvidia-open-kernel" package. This is for the newest graphics cards.

    Did you try this one?

    Best

    Hans

  • From Anssi Saari@21:1/5 to Van Snyder on Tue Mar 4 09:10:01 2025
    Van Snyder <van.snyder@sbcglobal.net> writes:

    > The nvidia-driver package from non-free apparently doesn't work with a Quadro K2200.

    But you didn't check? The release notes tell a different story.

    > NVidia recommends the 570 driver.

    They always recommend the latest.
