• [gentoo-user] Uefi + uki stuck while booting (/dev/gpt-auto-root)

    From Alexander Puchmayr@21:1/5 to All on Sun Jun 16 10:10:01 2024
    Hi there,

    I just tried to prepare my new laptop for UEFI + Secure Boot by creating a single unified kernel image including kernel, initrd, microcode, etc.
    NB: The partition layout has a vfat/EFI partition and a LUKS-encrypted LVM container holding SYS (root), Data (home) and swap.

    I added the uki and ukify USE flags to installkernel and systemd, checked the configuration again and configured the kernel with emerge --config sys-kernel/gentoo-kernel.

    Building the kernel image seems to work fine; the log messages say it's creating an initrd using dracut, creating an EFI file, signing it properly, and even installing it under /boot/efi/EFI/Linux.

    When booting it, it loads the kernel and then seems to get stuck:

    Timed out waiting for device /dev/gpt-auto-root
    Dependency failed for File System Check in /dev/gpt-auto-root
    Dependency failed for Root Partition
    Dependency failed for Initrd Root File System
    Dependency failed for Initrd Mountpoints Configured in the Real Root
    Dependency failed for Initrd Root Device

    Then it ends up in an emergency shell.

    There's a log in /run/initramfs/rdsosreport.txt, which reveals that it does not find my encrypted LVM partition (LUKS-encrypted LVM container holding SYS, DATA, SWAP, etc.), which obviously needs to be set up first. It seems like some boot parameter is missing.

    Checking systemd's USE flags: the relevant flags (lvm, cryptsetup, boot, secureboot) are set.

    To me it looks as if it's missing the information about which partition to decrypt and mount, and which LVM volume to use as the real root.

    Is this a dracut configuration? A systemd configuration? An installkernel configuration? Something else?

    Thanks
    Alex

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael@21:1/5 to All on Sun Jun 16 11:59:54 2024
    I'm not the right person to comment reliably on this, because I don't use systemd and do not use LVM, but until someone else chimes in I'll give it a go ... :-)

    On Sunday, 16 June 2024 09:04:26 BST Alexander Puchmayr wrote:
    > Hi there,
    >
    > I just tried to prepare my new laptop for UEFI + Secure Boot by creating a single unified kernel image including kernel, initrd, microcode, etc.
    > NB: The partition layout has a vfat/EFI partition and a LUKS-encrypted LVM container holding SYS (root), Data (home) and swap.
    >
    > I added the uki and ukify USE flags to installkernel and systemd, checked the configuration again and configured the kernel with emerge --config sys-kernel/gentoo-kernel.
    >
    > Building the kernel image seems to work fine; the log messages say it's creating an initrd using dracut, creating an EFI file, signing it properly, and even installing it under /boot/efi/EFI/Linux.

    Why is the ESP mounted under /boot/efi, instead of /efi?

    https://wiki.gentoo.org/wiki/EFI_System_Partition#Mount_point


    > When booting it, it loads the kernel and then seems to get stuck:
    >
    > Timed out waiting for device /dev/gpt-auto-root
    > Dependency failed for File System Check in /dev/gpt-auto-root
    > Dependency failed for Root Partition
    > Dependency failed for Initrd Root File System
    > Dependency failed for Initrd Mountpoints Configured in the Real Root
    > Dependency failed for Initrd Root Device

    The gpt-auto-root is a script which tries to automatically detect and mount the root fs. Did you create your partition(s) with GPT, and did you select the correct partition type "Linux Root (x86-64)" to make sure the partition type GUID for LUKS is correct according to the Discoverable Partitions Specification? If you used fdisk, you'll probably need to add the partition type GUID manually, as advised in the Handbook. Press 'i' in fdisk to find out what it is currently set to.
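
    A minimal sketch of how the partition type check above could be scripted outside fdisk. The function names describe_parttype and list_parttypes are hypothetical, only a few well-known type GUIDs are covered, and list_parttypes assumes an lsblk from util-linux that supports the PARTTYPE column:

```shell
#!/bin/sh
# Hypothetical helper: map a GPT partition type GUID to a short description.
# Only a handful of well-known type GUIDs are listed here.
describe_parttype() {
    # Normalise to lower case so the comparison is case-insensitive.
    guid=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$guid" in
        c12a7328-f81f-11d2-ba4b-00a0c93ec93b) echo "EFI System Partition" ;;
        4f68bce3-e8cd-4db1-96e7-fbcaf984b709) echo "Linux root (x86-64)" ;;
        ca7d7ccb-63ed-4c53-861c-1742536059cc) echo "Linux LUKS" ;;
        e6d6d379-f507-44c2-a23c-238f2a3df928) echo "Linux LVM" ;;
        0fc63daf-8483-4772-8e79-3d69d8477de4) echo "Linux filesystem data" ;;
        *) echo "other/unknown" ;;
    esac
}

# Hypothetical helper: print every partition with its described type.
# Assumes lsblk (util-linux) with PARTTYPE support is installed.
list_parttypes() {
    lsblk -rno NAME,PARTTYPE | while read -r name guid; do
        # Whole disks and mapped devices have no partition type; skip them.
        [ -n "$guid" ] || continue
        printf '%s\t%s\n' "$name" "$(describe_parttype "$guid")"
    done
}
```

    Calling list_parttypes as root or as a normal user prints one line per partition. Only a partition reported as "Linux root (x86-64)" is a candidate for automatic root discovery.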


    > Then it ends up in an emergency shell.
    >
    > There's a log in /run/initramfs/rdsosreport.txt, which reveals that it does not find my encrypted LVM partition (LUKS-encrypted LVM container holding SYS, DATA, SWAP, etc.), which obviously needs to be set up first. It seems like some boot parameter is missing.

    Did you configure dracut to include the necessary modules and to add the corresponding LUKS and LVM UUIDs?

    https://wiki.gentoo.org/wiki/Full_Disk_Encryption_From_Scratch#Initramfs_configuration
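
    As a rough sketch of what such a dracut configuration might look like for LVM inside LUKS: the file name is hypothetical and every value in angle brackets is a placeholder that has to be replaced with the real UUID and volume names (for example from blkid and lvs). It assumes dracut's crypt and lvm modules are available on the system.

```shell
# /etc/dracut.conf.d/luks-lvm.conf  (hypothetical file name)
# Sketch only: every <...> value below is a placeholder, not a real ID.

# Pull in the dracut modules needed to unlock LUKS and activate LVM.
add_dracutmodules+=" crypt lvm "

# Tell the initramfs which LUKS container to unlock, which volume group
# to activate, and which logical volume holds the real root filesystem.
kernel_cmdline+=" rd.luks.uuid=<UUID-of-LUKS-partition> rd.lvm.vg=<vg-name> root=/dev/mapper/<vg-name>-<root-lv-name> "
```

    Depending on how the UKI is assembled, the same parameters may instead (or additionally) have to go into /etc/kernel/cmdline, since that is where kernel-install normally takes the command line from. With an explicit root= set, the boot should no longer depend on /dev/gpt-auto-root being discovered. As far as I understand, the automatic discovery does not look inside LVM anyway, but treat that as an assumption to verify.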


    > Checking systemd's USE flags: the relevant flags (lvm, cryptsetup, boot, secureboot) are set.
    >
    > To me it looks as if it's missing the information about which partition to decrypt and mount, and which LVM volume to use as the real root.
    >
    > Is this a dracut configuration? A systemd configuration? An installkernel configuration? Something else?
    >
    > Thanks
    > Alex

    I think this is a dracut configuration issue, because systemd's 'kernel-install' setup is relatively straightforward:

    https://wiki.gentoo.org/wiki/Installkernel#Systemd_kernel-install_.28USE.3D.2Bsystemd.29

    If the problem is with dracut as I suspect, you may find 'sys-kernel/ugrd' easier than dracut for your type of installation, but dracut should work too if correctly configured.

    HTH.
  • From Wols Lists@21:1/5 to Alan Mackenzie on Mon Jun 17 09:40:02 2024
    On 15/06/2024 21:10, Alan Mackenzie wrote:
    >> Why didn't you keep a copy of the old file?
    >
    > Because that's one of the itsy-bitsy routine things that ought to be automatic, not something that each user should have to think out for himself.
    >
    >> Dunno which update tool it is, but istr there is a tool that does exactly that ...
    >
    > It turns out I actually had the old file in /etc/config-archive all along. It's a shame dispatch-conf and friends don't do an automatic 3-way merge. This would make things so much simpler and less stressful.

    Until it gets it wrong. Which is exactly why it doesn't. Which is why git doesn't. Which is why pretty much all tools don't. If they can figure it out they do, but they don't charge ahead regardless.

    Cheers,
    Wol

  • From Michael@21:1/5 to All on Mon Dec 2 16:05:30 2024
    On Wednesday 29 November 2023 00:16:11 GMT you wrote:
    > On Tuesday, 28 November 2023 15:49:10 GMT Daniel Frey wrote:
    >> On 11/28/23 03:38, Michael wrote:
    >>> Over the last 8-9 months I noticed an old Lenovo G505s laptop is spending a long time in the POST process, before eventually the OEM logo shows up on the screen. Last time I timed it, it took 2.5-3.0 minutes. Normally it would only take ~20-30 seconds. Once the logo shows up the boot process proceeds without further delay.
    >>>
    >>> Initially, this delay to POST would happen randomly and rarely. Now it happens every time.
    >>>
    >>> Things I tried:
    >>>
    >>> 1. Reflashing the UEFI firmware - it didn't work because it already has the latest firmware.
    >>>
    >>> 2. Removing the main battery and holding down the power button for 15 seconds, hoping to reset the firmware.
    >>>
    >>> 3. Leaving the PSU cable connected overnight.
    >>>
    >>> 4. Testing the RAM and HDD.
    >>>
    >>> None of the above improved the situation, or indicated what might be wrong.
    >>>
    >>> I'll reseat the RAM sticks and the HDD next, in case a contact is oxidised, but what else could cause this noticeable delay to POST? A failing RTC CMOS battery?
    >>
    >> We have had a few of these at work and these symptoms were cured by a new CMOS battery. The voltage on the battery has likely dipped to 2.9-3.0 volts; they get unreliable then (i.e. it's dead.) If you leave it long enough you'll start getting RTC errors on POST.
    >>
    >> I'd try that first, assuming you can still get the CMOS battery for these.
    >>
    >> Dan
    >
    > Thanks Dan, will do. I was planning to take it apart soon to replace the HDD with an SSD, so this would be the first thing to check. I expect finding a replacement unit will be difficult. Every Lenovo RTC battery seems to have a different part number.

    Some things are worth waiting for, others not so much. :-(

    So, this laptop was taking longer and longer and longer to boot, until it eventually stopped booting 3-4 months ago:

    When the power button is pressed the cooling fan spins for a second or two, then it stops. A few minutes later the CPU overheats and eventually it goes into a thermal shutdown. Using an external fan to push air through merely delays this process, but the laptop still does not boot. I am getting a black screen and no POST for many minutes until it cuts out.

    I tried to reset the MoBo BIOS by pressing the power button with no battery or mains connected. I also removed the newly replaced CMOS/RTC battery and pressed the power button, but the same failure mode remains after I reassembled everything.

    Do I have:

    1. Corrupted MoBo UEFI firmware?
    2. A dying/dead chipset?
    3. Something else?

    Is there anything else I could possibly try?
  • From mad.scientist.at.large@tutanota.com@21:1/5 to All on Tue Dec 3 01:40:01 2024
    Usually in laptops which become dirty (they all do, and it's very difficult to take them apart, clean and reassemble them) the hard drive is the first thing to fail completely. A dying hard drive can easily slow or halt boot.

    The CPU/GPU are likely next, along with the power supplies for those chips. All you can really do is take it apart, clean it, and if possible test the hard drive etc. in a known good machine. Alternatively, try a different hard drive (either a replacement or a small, cheap used one for testing). Check that all the fans still spin freely after cleaning them; if possible test them on a power supply (note that some will be 5V and some 12V, so be careful to read the labels or start low).

    The CPU/GPU may or may not have become marginal; the same goes for the memory, all the other chips and other temperature-sensitive parts. Really, it's essential to fix a laptop soon after it starts acting up, because those higher temperatures age everything rapidly and make all the parts more likely to fail. You may or may not be able to get that laptop working again.

    If you can get it basically working, test the hell out of everything with utilities like stress so it doesn't fool you and die hard soon.

    I hate working on laptops and AIO desktops. They are always hard to take apart and put back together, and they both need regular cleaning before they act up (or at least immediately when they start acting up). Because of the dust I clean my desktops at least once a year, also a pain but much easier than a laptop or AIO. This keeps them from wearing out as quickly, and as someone on a small fixed income that's very important to me.

    Depending on your situation and what your time is worth, replacement might be the way to go, though you still probably want to recover what you can from the drive.



    --"Fascism begins the moment a ruling class, fearing the people may use their political democracy to gain economic democracy, begins to destroy political democracy in order to retain its power of exploitation and special privilege." Tommy Douglas






  • From Michael@21:1/5 to All on Tue Dec 3 10:56:44 2024
    On Tuesday 3 December 2024 00:34:13 GMT mad.scientist.at.large@tutanota.com wrote:
    > usually in laptops which become dirty (they all do, and it's very difficult to take them apart and clean and reassemble) the hard drive is the first thing to fail completely. A dying hard drive can easily slow/halt boot.

    The hard drive is good.


    > CPU/GPU are likely next along with the power supplies for those chips. All you can really do is take it apart, clean it, and if possible test the hard drive etc. in a known good machine. Alternately, try a different hard drive (either a replacement or a small, cheap used one for testing).

    The laptop has been used on a cooling pad away from dusty surfaces and cleaned regularly. When I took it apart there was just a little dust on the fan and exhaust, because it hadn't been cleaned for ~6 months or so.

    The drive is in good health according to smartctl and fully accessible over a USB cradle.


    > Check that all the fans still spin freely after cleaning them, if possible test them on a power supply (note that some will be 5V and some 12V, be careful to read the labels or start low).

    The fan spins freely when rotated by hand. It also spins initially when the power button is pressed. However, it stops within a couple of seconds and does not start again.

    This makes me think ... normally when the fan starts spinning at boot it soon climbs up to maximum speed while the BIOS runs through POST. Then it slows down as the BIOS hands over to the OS. It's connected to the MoBo via 4 wires which control its PWM. The wires are in good condition, but without some spare fan or plug I have no way to check what the voltage at its miniature socket is while I power it up. Anyway, I don't know if some pull-up resistor in the fan control circuit is damaged, but even if it had failed, wouldn't the fan continue spinning, just at a low speed? This one stops dead. Hence it makes me think of a corrupt/damaged MoBo chipset or firmware. :-/


    > I hate working on laptops and AIO desktops, always hard to take them apart and put them back together and they both need regular cleaning, before they act up (or at least immediately when they start acting up).

    Yes, I *really* don't like laptops for a number of reasons, no matter the convenience they offer. The inherent difficulty of cleaning, upgrading and repairing them, added to their relatively small screen size, makes me want to avoid them.


    > Because of the dust I clean my desktops at least once a year, also a pain but much easier than a laptop or AIO. This keeps them from wearing out as quickly and as some one on a small fixed income that's very important to me.

    The design compromises on a laptop compared to a desktop are many. I don't think I ever had a laptop lasting for more than 6 years of continuous usage without things breaking, no matter how careful I was with it. By the time its battery needs replacement something around the corner is usually about to fail on me.

    I recall seeing the same symptom on an HP laptop some years ago now. I replaced its fan, but it continued to fail to boot in the same manner. I never got to the bottom of it at the time.



