• Could Gnome's "install pending software updates" cause installation scr

    From Lucas B. Cohen@21:1/5 to All on Fri Mar 29 12:20:01 2024
    Hi,

    I've had a bit of a headache understanding why my Debian bookworm
    system suddenly panicked at boot with an 'unable to mount root fs'
    error. Turns out the first of my two menuentries in grub.cfg was no
    longer specifying the linux root by its device UUID (as I was
    expecting it to do, by honoring GRUB_DISABLE_LINUX_UUID != true);
    instead it was using the device node/file (/dev/md0 in this case,
    hence the kernel panic).

    I've pored over the grub scripts a bit but they're quite complex.
    I've noticed that:

    - uninstalling the second of two kernels caused the remaining one to
    correctly use the device UUID in grub.cfg;

    - reinstalling that second kernel caused grub.cfg to use UUIDs in all
    menuentries, as expected.

    (The kernels were the two most recent stable ones: 6.1.0-17 and -18.)
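
One way to spot this symptom is to list the `linux` lines of grub.cfg whose root= argument is not a UUID. This is only a sketch: the function name is made up for illustration, and the default path assumes Debian's usual /boot/grub/grub.cfg.

```shell
# Print every `linux` line of a grub.cfg whose root= argument is not a UUID.
# The function name is hypothetical; the default path is the usual Debian one.
list_non_uuid_roots() {
    cfg="${1:-/boot/grub/grub.cfg}"
    grep -E '^[[:space:]]*linux(16|efi)?[[:space:]]' "$cfg" \
        | grep -E 'root=' \
        | grep -vE 'root=(UUID|PARTUUID)='
}
```

Running `list_non_uuid_roots` with no argument checks the live grub.cfg; passing a path checks a backup copy instead. No output means every entry found uses a UUID.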

    This leads me to suspect that my grub.cfg might have been damaged in the
    way described above because update-grub might have been called in some
    unusual, limited execution environment. I'd very recently powered off my
    system and left the default "install pending software updates" option
    checked by accident, which caused every updated package from the 12.5
    point release to be pulled. I'm guessing that linux-image-6.1.0-18 was
    part of it.

    Has anyone witnessed something similar? Would anyone here care to check
    this somehow? Or should I open a bug against gnome-desktop without waiting?

    Thank you for any insight.

    Apologies for possible e-mail client misconfiguration.

    Regards,

    --
    Lucas

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Henning Follmann@21:1/5 to Lucas B. Cohen on Fri Mar 29 16:10:01 2024
    On Fri, Mar 29, 2024 at 12:01:27PM +0100, Lucas B. Cohen wrote:
    > Hi,
    >
    > I've had a bit of a headache understanding why my Debian bookworm
    > system suddenly panicked at boot with an 'unable to mount root fs'
    > error. Turns out the first of my two menuentries in grub.cfg was no
    > longer specifying the linux root by its device UUID (as I was
    > expecting it to do, by honoring GRUB_DISABLE_LINUX_UUID != true);
    > instead it was using the device node/file (/dev/md0 in this case,
    > hence the kernel panic).

    Was there any error message during the update?
    I think what might have gone wrong is that you ran out of space on /boot.


    > I've pored over the grub scripts a bit but they're quite complex.
    > I've noticed that:

    Yeah, don't do that. These files are all automatically managed.
    All changes should be made in /etc/default/grub or in the config files
    in /etc/default/grub.d. The grub config files are then created by running
    update-grub



    > - uninstalling the second of two kernels caused the remaining one to
    > correctly use the device UUID in grub.cfg;

    and that might have freed enough space on /boot.
    So now everything works again :)


    > - reinstalling that second kernel caused grub.cfg to use UUIDs in all
    > menuentries, as expected.
    >
    > (The kernels were the two most recent stable ones: 6.1.0-17 and -18.)
    >
    > This leads me to suspect that my grub.cfg might have been damaged in
    > the way described above because update-grub might have been called in
    > some unusual, limited execution environment. I'd very recently powered
    > off my system and left the default "install pending software updates"
    > option checked by accident, which caused every updated package from
    > the 12.5 point release to be pulled. I'm guessing that
    > linux-image-6.1.0-18 was part of it.
    >
    > Has anyone witnessed something similar? Would anyone here care to
    > check this somehow? Or should I open a bug against gnome-desktop
    > without waiting?


    Usually it requires some trickery to install a new kernel on machines which might not have enough remaining space on the boot partition.

    For simple housekeeping it often is sufficient to run
    apt autoremove
    after recent updates (once you have confirmed that the newly installed
    kernel boots fine).
    That usually frees enough space for a possible new update.
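
To see whether /boot is actually close to full before blaming it, something along these lines could be used. It is a rough sketch: both function names and the 80% threshold are arbitrary choices for illustration, not anything apt or grub define.

```shell
# Print the percentage of space used on the filesystem holding a directory.
usage_percent() {
    df -P "${1:-/boot}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when that filesystem is fuller than a threshold (80 is an arbitrary pick).
check_boot_space() {
    dir="${1:-/boot}"
    limit="${2:-80}"
    used=$(usage_percent "$dir")
    [ -n "$used" ] || return 2
    if [ "$used" -gt "$limit" ]; then
        echo "$dir is ${used}% full; consider: apt-get --simulate autoremove"
        return 1
    fi
    echo "$dir is ${used}% full"
}
```

`apt-get --simulate autoremove` only prints what would be removed, so it is a safe first step before running the real `apt autoremove`.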


    -H

    --
    Henning Follmann | hfollmann@itcfollmann.com

  • From David Wright@21:1/5 to Henning Follmann on Fri Mar 29 16:50:01 2024
    On Fri 29 Mar 2024 at 11:06:45 (-0400), Henning Follmann wrote:
    > On Fri, Mar 29, 2024 at 12:01:27PM +0100, Lucas B. Cohen wrote:
    >
    >> I've had a bit of a headache understanding why my Debian bookworm
    >> system suddenly panicked at boot with an 'unable to mount root fs'
    >> error. Turns out the first of my two menuentries in grub.cfg was no
    >> longer specifying the linux root by its device UUID (as I was
    >> expecting it to do, by honoring GRUB_DISABLE_LINUX_UUID != true);
    >> instead it was using the device node/file (/dev/md0 in this case,
    >> hence the kernel panic).
    >
    > Was there any error message during the update?
    > I think what might have gone wrong is that you ran out of space on /boot.
    >
    >> I've pored over the grub scripts a bit but they're quite complex.
    >> I've noticed that:
    >
    > Yeah, don't do that. These files are all automatically managed.
    > All changes should be made in /etc/default/grub or in the config files
    > in /etc/default/grub.d. The grub config files are then created by running
    > update-grub
    >
    >> - uninstalling the second of two kernels caused the remaining one to
    >> correctly use the device UUID in grub.cfg;
    >
    > and that might have freed enough space on /boot.
    > So now everything works again :)
    >
    >> - reinstalling that second kernel caused grub.cfg to use UUIDs in all
    >> menuentries, as expected.
    >>
    >> (The kernels were the two most recent stable ones: 6.1.0-17 and -18.)
    >>
    >> This leads me to suspect that my grub.cfg might have been damaged in
    >> the way described above because update-grub might have been called in
    >> some unusual, limited execution environment. I'd very recently powered
    >> off my system and left the default "install pending software updates"
    >> option checked by accident, which caused every updated package from
    >> the 12.5 point release to be pulled. I'm guessing that
    >> linux-image-6.1.0-18 was part of it.

    I'd write "upgraded" rather than "pulled", if that's what you meant.

    >> Has anyone witnessed something similar? Would anyone here care to
    >> check this somehow? Or should I open a bug against gnome-desktop
    >> without waiting?
    >
    > Usually it requires some trickery to install a new kernel on machines
    > which might not have enough remaining space on the boot partition.
    >
    > For simple housekeeping it often is sufficient to run
    > apt autoremove
    > after recent updates (once you have confirmed that the newly installed
    > kernel boots fine).
    > That usually frees enough space for a possible new update.

    You can also reduce the space taken up by initrd files, which are
    getting rather large nowadays if they are built with MODULES=most
    rather than MODULES=dep.

    When you have at least two working kernels, remove any unnecessary
    backups, copy the older kernel's initrd somewhere else, then rebuild
    it with MODULES=dep. If that kernel still boots ok, then you probably
    have a lot more room available now for the next kernel upgrade.
    Finally, reboot into the newer kernel.
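
The MODULES switch described above could be sketched as follows. The helper name is hypothetical, and it deliberately works on whatever file it is given so it can be tried on a copy of the config first; the commented commands assume the stock initramfs-tools layout and a backup directory that is made up for illustration.

```shell
# Rewrite the MODULES= setting in an initramfs-tools style config file.
# Hypothetical helper: $1 is the config file, $2 is "dep" or "most".
set_initramfs_modules() {
    conf="$1"
    mode="$2"
    scratch=$(mktemp) || return 1
    if sed "s/^MODULES=.*/MODULES=$mode/" "$conf" > "$scratch"; then
        cat "$scratch" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$scratch"
    return "$status"
}

# On a real system the steps might then look like this (as root; the kernel
# version is the older one from this thread, the backup directory is made up):
#   cp /boot/initrd.img-6.1.0-17-amd64 /root/initrd-backup/
#   set_initramfs_modules /etc/initramfs-tools/initramfs.conf dep
#   update-initramfs -u -k 6.1.0-17-amd64
```

Keeping the backup copy of the old initrd means the MODULES=most image can be put back if the slimmer one fails to boot.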

    Cheers,
    David.

  • From Lucas B. Cohen@21:1/5 to Henning Follmann on Fri Mar 29 21:00:01 2024
    On Fri 29 Mar 2024 at 11:06:45 (-0400), Henning Follmann wrote:
    > On Fri, Mar 29, 2024 at 12:01:27PM +0100, Lucas B. Cohen wrote:
    >> Hi,
    >>
    >> I've had a bit of a headache understanding why my Debian bookworm
    >> system suddenly panicked at boot with an 'unable to mount root fs'
    >> error. Turns out the first of my two menuentries in grub.cfg was no
    >> longer specifying the linux root by its device UUID (as I was
    >> expecting it to do, by honoring GRUB_DISABLE_LINUX_UUID != true);
    >> instead it was using the device node/file (/dev/md0 in this case,
    >> hence the kernel panic).
    >
    > Was there any error message during the update?
    > I think what might have gone wrong is that you ran out of space on /boot.

    Space on /boot couldn't have been the issue: I have 1 GB allocated to
    that partition, and those 2 kernels only take up about a third of that
    space.

    There was no visible error message at the time, as it's all hidden from
    the user's view by Gnome, right before power off. However, I'm checking
    my /var/log/apt/term.log, where it was handily stored, and here's what
    I'm seeing:

    - it seems that grub-mkconfig (the grub script called by Debian's
    update-grub wrapper) was in fact never called during that update
    sequence! (Therefore Gnome's handling of updates is off the hook.)
    Perhaps it was because of some bad dkms and linux-headers interaction.
    Some module failed to build, which cascaded into leaving the kernel and
    headers packages in the 'unconfigured' state:

    Building module:
    Cleaning build area...
    env NV_VERBOSE=1 make -j12 modules KERNEL_UNAME=6.1.0-18-amd64.......................(bad exit status: 2)
    Error! Bad return status for module build on kernel: 6.1.0-18-amd64 (x86_64)
    Consult /var/lib/dkms/nvidia-current/525.147.05/build/make.log for more information.
    Error! One or more modules failed to install during autoinstall.
    Refer to previous errors for more information.
    dkms: autoinstall for kernel: 6.1.0-18-amd64 failed!
    run-parts: /etc/kernel/postinst.d/dkms exited with return code 11
    dpkg: error processing package linux-image-6.1.0-18-amd64 (--configure):
    installed linux-image-6.1.0-18-amd64 package post-installation script subprocess returned error exit status 1

    dpkg: dependency problems prevent configuration of linux-headers-amd64:
    linux-headers-amd64 depends on linux-headers-6.1.0-18-amd64 (= 6.1.76-1); however:
    Package linux-headers-6.1.0-18-amd64 is not configured yet.

    - Consequence: my grub.cfg was only regenerated two days later,
    incidentally, during a scheduled unattended-upgrades run:

    Log started: 2024-03-28 09:56:03
    [...]
    Removing linux-image-6.1.0-15-amd64 (6.1.66-1) ...
    /etc/kernel/prerm.d/dkms:
    [...]
    depmod...
    /etc/kernel/postrm.d/initramfs-tools:
    [...]
    /etc/kernel/postrm.d/zz-update-grub:
    Generating grub configuration file ...
    Found background image: /usr/share/images/desktop-base/desktop-grub.png
    Found linux image: /boot/vmlinuz-6.1.0-18-amd64
    Found linux image: /boot/vmlinuz-6.1.0-17-amd64
    Found initrd image: /boot/initrd.img-6.1.0-17-amd64
    Warning: Not executing os-prober.
    done
    [...]
    Error! One or more modules failed to install during autoinstall.
    Refer to previous errors for more information.
    dkms: autoinstall for kernel: 6.1.0-18-amd64 failed!
    run-parts: /etc/kernel/header_postinst.d/dkms exited with return code 11
    Failed to process /etc/kernel/header_postinst.d at /var/lib/dpkg/info/linux-headers-6.1.0-18-amd64.postinst line 11.
    dpkg: error processing package linux-headers-6.1.0-18-amd64 (--configure):
    installed linux-headers-6.1.0-18-amd64 package post-installation script subprocess returned error exit status 1
    dpkg: dependency problems prevent configuration of linux-image-amd64:
    linux-image-amd64 depends on linux-image-6.1.0-18-amd64 (= 6.1.76-1); however:
    Package linux-image-6.1.0-18-amd64 is not configured yet.

    dpkg: error processing package linux-image-amd64 (--configure):
    dependency problems - leaving unconfigured
    dpkg: dependency problems prevent configuration of linux-headers-amd64:
    linux-headers-amd64 depends on linux-headers-6.1.0-18-amd64 (= 6.1.76-1); however:
    Package linux-headers-6.1.0-18-amd64 is not configured yet.

    dpkg: error processing package linux-headers-amd64 (--configure):
    dependency problems - leaving unconfigured
    Errors were encountered while processing:
    linux-image-6.1.0-18-amd64
    linux-headers-6.1.0-18-amd64
    linux-image-amd64
    linux-headers-amd64
    Log ended: 2024-03-28 09:58:24
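
Failures like the ones in this log are easy to miss when an upgrade runs unattended. A small sketch for pulling the affected package names out of an apt terminal log follows; the function name is made up, and the default path is the term.log mentioned above.

```shell
# List the packages that dpkg reported as failing in an apt terminal log.
# Hypothetical helper; it matches the "dpkg: error processing package" lines.
failed_packages() {
    termlog="${1:-/var/log/apt/term.log}"
    sed -n 's/^[[:space:]]*dpkg: error processing package \([^ ]*\) (--[a-z-]*):.*/\1/p' "$termlog" \
        | sort -u
}
```

On the live system, `dpkg --audit` reports packages left half-configured, and `dpkg --configure -a` retries their configuration once the underlying failure (here, the dkms module build) has been dealt with.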

    Something's now apparent: the initrd hadn't been created for this new
    -18 kernel until after grub-mkconfig's execution. My backed-up erroneous
    grub.cfg confirms this. Maybe grub-mkconfig doesn't allow the use of
    UUID= absent an initrd? That would be enough to explain everything.
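
If that hypothesis holds, a kernel image sitting in /boot without a matching initrd is the condition to watch for before grub.cfg gets regenerated. A sketch of such a check, with a made-up function name and assuming Debian's vmlinuz-VERSION / initrd.img-VERSION naming:

```shell
# Print the version of every kernel image that has no matching initrd.
# Hypothetical helper; assumes vmlinuz-VERSION and initrd.img-VERSION naming.
kernels_without_initrd() {
    bootdir="${1:-/boot}"
    for kernel in "$bootdir"/vmlinuz-*; do
        [ -e "$kernel" ] || continue
        version="${kernel##*/vmlinuz-}"
        [ -e "$bootdir/initrd.img-$version" ] || echo "$version"
    done
}
```

With the state shown in the log above (both vmlinuz files present, initrd only for -17), this would print 6.1.0-18-amd64.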

    Anyway, this is not an easy thing to reproduce. I guess it just calls
    attention to the danger of unattended/automatic upgrades in odd cases
    like these.

    Thanks Henning, and thank you David for your help. (Apologies for not
    replying to your messages; I'd forgotten to subscribe to the ML.)

