• Re: Automatic reboot on kernel crash in Debian 12 - how?

    From Michael Kjörling@21:1/5 to All on Tue Apr 16 11:40:01 2024
    On 16 Apr 2024 11:22 +0200, from george@nsup.org (Nicolas George):
    >> Do I need to set some more settings to ensure that the system will
    >> automatically reboot on a panic? If so, what?
    >
    > If the crash was bad enough to freeze the kernel before it could
    > trigger the reboot, there is nothing the software can do.
    >
    > You need a hardware watchdog.

    Are you saying that the settings themselves are reasonable for the
    purpose, and that this particular crash just happened to be such a one
    that no software running on the system in question can reasonably help
    with that scenario?

    This happened on a VM that I can't directly influence the hardware
    configuration of (a commercially provided VPS), but I should be able
    to jury-rig something using the provider's API if necessary.

    --
    Michael Kjörling 🔗 https://michael.kjorling.se “Remember when, on the Internet, nobody cared that you were a dog?”

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Kjörling@21:1/5 to All on Tue Apr 16 11:50:01 2024
    On 16 Apr 2024 11:42 +0200, from george@nsup.org (Nicolas George):
    >> Are you saying that the settings themselves are reasonable for the
    >> purpose, and that this particular crash just happened to be such a one
    >> that no software running on the system in question can reasonably help
    >> with that scenario?
    >
    > No, unfortunately I do not have the gift of divination, it would be
    > convenient. I am saying that you cannot use software to protect
    > yourself entirely from software bugs.

    Well, naturally. But if there is some setting which I _could_ set that
    would get me closer to my desired state, I would still like to know
    which one and perhaps even what might be an appropriate value for it.

    --
    Michael Kjörling 🔗 https://michael.kjorling.se “Remember when, on the Internet, nobody cared that you were a dog?”

  • From Nicolas George@21:1/5 to All on Tue Apr 16 11:30:01 2024
    Michael Kjörling (12024-04-16):
    > However, this morning I woke up to one of those systems showing a
    > kernel crash dump and being frozen. Unfortunately the first part of
    > the crash dump had scrolled past so I couldn't tell what class of
    > problem caused the crash.
    >
    > Do I need to set some more settings to ensure that the system will
    > automatically reboot on a panic? If so, what?

    If the crash was bad enough to freeze the kernel before it could
    trigger the reboot, there is nothing the software can do.

    You need a hardware watchdog. If your motherboard has one, just install
    and enable the corresponding daemon, and check it works by SIGSTOPing
    it.

    If your motherboard does not have one, you can probably DIY one from a
    RPi or an Arduino.
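
The SIGSTOP check mentioned above could look roughly like the sketch below. It assumes the daemon's PID is already known; the function names freeze_process and thaw_process are made up for illustration. If the hardware watchdog works, the machine should reset once the timer expires without being fed, so only try this on a system you can afford to lose for a moment.

```shell
# Sketch: freeze a watchdog daemon so it stops feeding the hardware timer.
# Expectation (not guaranteed): the machine resets once the timeout expires.
# freeze_process and thaw_process are hypothetical helper names.

# Send SIGSTOP to the given PID after checking that it is a live process.
freeze_process() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: freeze_process PID" >&2; return 2 ;;
    esac
    kill -0 "$1" 2>/dev/null || { echo "no such process: $1" >&2; return 1; }
    kill -STOP "$1"
}

# Resume the daemon if the machine did not reset (watchdog not working).
thaw_process() {
    kill -CONT "$1"
}
```

With the Debian watchdog package, and assuming its daemon process is named watchdog, the call would be something like freeze_process "$(pidof watchdog)". If nothing happens within the configured timeout, run thaw_process with the same PID and look at the daemon's configuration.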

    Regards,

    --
    Nicolas George

  • From Michael Kjörling@21:1/5 to All on Tue Apr 16 11:20:01 2024
    I have a handful of Debian 12 systems that I want to configure such
    that they reboot automatically in case of a problem. I have set them
    up with userspace scripts (executed through cron) to reboot if
    something goes wrong there; that appears to work as expected if I
    induce an issue that those scripts check for. That leaves kernel-level
    issues.

    To try to configure this, I have created a file
    /etc/sysctl.d/local.conf (owned by root:root, mode 0644).

    # cat /etc/sysctl.d/local.conf
    kernel.panic = 120
    kernel.panic_on_oops = 1
    kernel.panic_on_stackoverflow = 1
    kernel.panic_on_io_nmi = 1
    #

    With the exception of panic_on_stackoverflow, as far as I can tell
    these are in effect after a reboot:

    # sysctl kernel.panic kernel.panic_on_oops kernel.panic_on_stackoverflow kernel.panic_on_io_nmi
    kernel.panic = 120
    kernel.panic_on_oops = 1
    sysctl: cannot stat /proc/sys/kernel/panic_on_stackoverflow: No such file or directory
    kernel.panic_on_io_nmi = 1
    #
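
A quick way to see which keys from such a file the running kernel actually exposes is sketched below. It is only an illustration: the function names and the PROC_SYS_ROOT override are made up for this example, and the dot-to-slash conversion is naive (it would mangle keys whose components contain dots, such as VLAN interface names).

```shell
# Sketch: report which keys from a sysctl.d-style file exist under /proc/sys.
# PROC_SYS_ROOT is a hypothetical override, used only to make this testable.

# Convert a dotted sysctl key (kernel.panic) to its path under /proc/sys.
sysctl_key_to_path() {
    printf '%s/%s\n' "${PROC_SYS_ROOT:-/proc/sys}" "$(printf '%s' "$1" | tr . /)"
}

# Print "ok KEY" or "missing KEY" for every "key = value" line in the file.
check_sysctl_file() {
    # Drop comment lines (# or ;) and blank lines, then split at '='.
    sed -e '/^[[:space:]]*[#;]/d' -e '/^[[:space:]]*$/d' "$1" |
    while IFS='=' read -r key _value; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        [ -n "$key" ] || continue
        if [ -e "$(sysctl_key_to_path "$key")" ]; then
            printf 'ok %s\n' "$key"
        else
            printf 'missing %s\n' "$key"
        fi
    done
}
```

Running check_sysctl_file /etc/sysctl.d/local.conf on the system above would be expected to flag kernel.panic_on_stackoverflow as missing, matching the "cannot stat" error. As far as I know that key only appears on kernels built with CONFIG_DEBUG_STACKOVERFLOW.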

    However, this morning I woke up to one of those systems showing a
    kernel crash dump and being frozen. Unfortunately the first part of
    the crash dump had scrolled past so I couldn't tell what class of
    problem caused the crash.

    Do I need to set some more settings to ensure that the system will
    automatically reboot on a panic? If so, what?

    I know that best is to not crash; this is _in case of_.

    --
    Michael Kjörling 🔗 https://michael.kjorling.se “Remember when, on the Internet, nobody cared that you were a dog?”

  • From Nicolas George@21:1/5 to All on Tue Apr 16 11:50:01 2024
    Michael Kjörling (12024-04-16):
    > Are you saying that the settings themselves are reasonable for the
    > purpose, and that this particular crash just happened to be such a one
    > that no software running on the system in question can reasonably help
    > with that scenario?

    No, unfortunately I do not have the gift of divination, it would be
    convenient. I am saying that you cannot use software to protect
    yourself entirely from software bugs.

    > This happened on a VM that I can't directly influence the hardware
    > configuration of (a commercially provided VPS), but I should be able
    > to jury-rig something using the provider's API if necessary.

    You probably can. But first check if your VM has an emulated hardware
    watchdog.
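
One way to check from inside the guest is to look at what the kernel has registered under /sys/class/watchdog. The sketch below is an illustration only: the function name and the WATCHDOG_SYSFS override are made up, and the identity attribute is, as far as I know, only present when the kernel was built with CONFIG_WATCHDOG_SYSFS.

```shell
# Sketch: list watchdog devices the kernel has registered.
# WATCHDOG_SYSFS is a hypothetical override so the function can be tested.

# Prints one line per device; returns 0 if any device was found, 1 otherwise.
list_watchdogs() {
    root="${WATCHDOG_SYSFS:-/sys/class/watchdog}"
    found=1
    for dev in "$root"/watchdog*; do
        [ -d "$dev" ] || continue
        found=0
        # The identity attribute may be absent on some kernel builds.
        if [ -r "$dev/identity" ]; then
            printf '%s %s\n' "${dev##*/}" "$(cat "$dev/identity")"
        else
            printf '%s (identity unavailable)\n' "${dev##*/}"
        fi
    done
    return "$found"
}
```

If list_watchdogs prints nothing and returns non-zero, the guest probably has no watchdog device the kernel knows about, or the driver module is not loaded. On QEMU/KVM guests the emulated device is often an i6300esb, but whether a given provider exposes one is something only the provider's documentation or a test will tell.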

    Regards,

    --
    Nicolas George
