• Re: ext4 FS Crash

    From Jochen Spieker@21:1/5 to All on Wed Dec 4 13:50:01 2024
    Daniel Harris:

> Not sure if I can attach a picture of the error messages, but some of the
> errors are:
> ext4_find_entry
> ext4_journal_check_start
> ext4_setattr
> mounting filesystem read-only

    Please do not send any attachments here. I suggest you just copy and
    paste the log messages from journalctl or wherever you can find them. It
    is really hard to say anything specific without those messages.

> I have done a file system check and everything seems OK, the drive
> doesn't have any error messages, and journalctl doesn't seem to log any
> errors (not that I can find, anyway).

    Have you tried 'journalctl --dmesg'?
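For example, something like the following should show only kernel messages at error priority and above; the second form assumes persistent journaling is enabled (Storage=persistent in journald.conf), otherwise there is no previous boot to query:

journalctl --dmesg --priority=err             # kernel errors, current boot
journalctl --dmesg --boot=-1 --priority=err   # same, but for the previous boot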

> PS: my system is a desktop system, not a laptop, if that makes a difference.

    I do not think this matters here.

    J.
    --
    I wish I had been aware enough to enjoy my time as a toddler.
    [Agree] [Disagree]
    <http://archive.slowlydownward.com/NODATA/data_enter2.html>

  • From Daniel Harris@21:1/5 to All on Wed Dec 4 13:30:01 2024
    Hello

I have been using the stable branch but recently it has not been so
stable. I have experienced some unexpected behavior. I'm not sure if it's
related to this Ubuntu bug ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1805816 )

Seeing the similarity, especially that we are both using similar drives
(mine is a Samsung SSD 980 PRO 1TB), makes me think it might be a hardware rather than a software issue.

Anyway, it's interesting that it has only started recently. A different kernel, possibly?

Not sure if I can attach a picture of the error messages, but some of the
errors are:
    ext4_find_entry
    ext4_journal_check_start
    ext4_setattr
    mounting filesystem read-only

I have done a file system check and everything seems OK, the drive
doesn't have any error messages, and journalctl doesn't seem to log any
errors (not that I can find, anyway).

I will have to disable APST, but I thought this bug might have been fixed since 2018; maybe not.

PS: my system is a desktop system, not a laptop, if that makes a difference.

    Thanks Dan

    <div dir="ltr"><div>Hello</div><div><br></div><div>I have been using the stable branch but recently it has not been so stable.  I have experienced some unexpected behavior Not sure if its related to this ubuntu bug ( <a href="https://bugs.launchpad.net/
    ubuntu/+source/linux/+bug/1805816">https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1805816</a> )</div><div><br></div><div>Seeing the similarity especially that we are both using similar drives (mine i( Samsung SSD 980 PRO 1TB)  makes me think it
    might be a hardware instead of a software issue.</div><div><br></div><div>Anyway its interesting that it has only started recently.  Different kernel possibly?</div><div><br></div><div>Not sure if I can attach a picture of the error messages but some of
    the errors are <br></div><div>ext4_find_entry</div><div>ext4_journal_check_start</div><div>ext4_setattr</div><div>mounting filesystem read-only</div><div><br></div><div>I have done a file system check and everything seems ok and the drive doesn&#39;t
    have any error messages and journalctl doesn&#39;t seem to log any errors (not that i can find anyway).</div><div><br></div><div>I will have to disable the apst but i just thought this bug might have been fixed from 2018 but maybe not.</div><div><br></
    <div>ps my system is a desktop system not a laptop if that makes a difference.</div><div><br></div><div>Thanks Dan<br></div></div>

  • From Eike Lantzsch ZP5CGE / KY4PZ@21:1/5 to All on Wed Dec 4 15:40:02 2024
    On Wednesday, 4 December 2024 09:29:17 -03 Daniel Harris wrote:
> Hello
>
> I have been using the stable branch but recently it has not been so
> stable. I have experienced some unexpected behavior. I'm not sure if it's
> related to this Ubuntu bug ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1805816 )
>
> Seeing the similarity, especially that we are both using similar drives
> (mine is a Samsung SSD 980 PRO 1TB), makes me think it might be a
> hardware rather than a software issue.
>
> Anyway, it's interesting that it has only started recently. A different
> kernel, possibly?
>
> Not sure if I can attach a picture of the error messages, but some of
> the errors are:
> ext4_find_entry
> ext4_journal_check_start
> ext4_setattr
> mounting filesystem read-only
>
> I have done a file system check and everything seems OK, the drive
> doesn't have any error messages, and journalctl doesn't seem to log any
> errors (not that I can find, anyway).
>
> I will have to disable APST, but I thought this bug might have
> been fixed since 2018; maybe not.

Kernel 4.19? I'd be very surprised if that were related.

> PS: my system is a desktop system, not a laptop, if that makes a
> difference.
>
> Thanks Dan

Hello Dan,

I am using the exact same drive, a Samsung SSD 980 PRO 1TB, in my desktop
and in my laptop:
Desktop with Debian Sid
Laptop with Ubuntu 24 LTS (TUXEDO)
Neither one nor the other has experienced ext4 FS crashes.

I'd save my data a.s.a.p. and install a new NVMe drive if I were in your
shoes.

    have a nice day
    --
    Eike Lantzsch KY4PZ / ZP5CGE

  • From Klaus Singvogel@21:1/5 to Daniel Harris on Wed Dec 4 17:50:02 2024
    Daniel Harris wrote:
> Seeing the similarity, especially that we are both using similar drives
> (mine is a Samsung SSD 980 PRO 1TB), makes me think it might be a hardware rather than a software issue.

    The referenced Samsung SSD 980 PRO has a critical firmware bug.

This bug hit me too. I had to replace my broken SSD with a new one via Samsung support.
I use full-disk encryption, and therefore lost data back to my last backup (five days ago).

    Some more details here: https://www.pugetsystems.com/support/guides/critical-samsung-ssd-firmware-update/
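If anyone wants to check their own drive before worrying: a quick way to read the installed firmware revision (the affected versions are listed in the article above) is one of:

sudo nvme list                # FW Rev column
sudo smartctl -i /dev/nvme0   # "Firmware Version" line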


    Best regards,
    Klaus.
    --
    Klaus Singvogel
    GnuPG-Key-ID: 1024R/5068792D 1994-06-27

  • From Daniel Harris@21:1/5 to All on Wed Dec 4 18:20:01 2024
Thanks for all your replies.
As far as I can tell there are no errors reported using fsck, smartctl, or nvme,
and the firmware is the correct and newest version, so no problems there.

The following are the messages that appear, but only taken from my phone and copied from the photo (lots of scrolling errors repeating over).
I thought these new drives were supposed to last longer than older moving
HDDs, but obviously not.

I guess it's time to buy a new drive :(

EXT4-fs (nvme0n1p2): Remounting filesystem read-only
EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:83: comm systemd-journal: Detected aborted journal
[633582.907324] EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1683: inode #5898248: comm ntpd: reading directory lblock 0
[633582.908250] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
EXT4-fs (nvme0n1p2): Remounting filesystem read-only

EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:83: comm systemd-journal: Detected aborted journal
[633582.912099] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:83: comm systemd-journal: Detected aborted journal
[633582.916126] EXT4-fs (nvme0n1p2): Remounting filesystem read-only

[633583.797550] EXT4-fs error (device nvme0n1p2) in ext4_setattr:5628: Journal has aborted
[633583.798466] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:83: comm systemd-journal: Detected aborted journal
EXT4-fs (nvme0n1p2): Remounting filesystem read-only

EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1683: inode #1966081: comm cron: reading directory lblock 0
EXT4-fs (nvme0n1p2): Remounting filesystem read-only
EXT4-fs error (device nvme0n1p2) in ext4_setattr:5628: Journal has aborted
EXT4-fs (nvme0n1p2): Remounting filesystem read-only



On Wed, Dec 4, 2024 at 4:39 PM Klaus Singvogel <deb-user-ml@singvogel.net> wrote:

> Daniel Harris wrote:
> > Seeing the similarity, especially that we are both using similar drives
> > (mine is a Samsung SSD 980 PRO 1TB), makes me think it might be a hardware
> > rather than a software issue.
>
> The referenced Samsung SSD 980 PRO has a critical firmware bug.
>
> This bug hit me too. I had to replace my broken SSD with a new one via Samsung support.
> I use full-disk encryption, and therefore lost data back to my last backup (five days ago).
>
> Some more details here:
> https://www.pugetsystems.com/support/guides/critical-samsung-ssd-firmware-update/
>
> Best regards,
>         Klaus.
> --
> Klaus Singvogel
> GnuPG-Key-ID: 1024R/5068792D 1994-06-27


    <div dir="ltr"><div>Thanks for all your replies.</div><div>As far as I can tell there are no errors reported using fsck or smartctl or nvme<br></div><div> and the firmware is the correct and newest version so no problems there.</div><div><br></div><div>
    The following are the messages that appear but only taken from my phone and copied from the photo (lots of scrolling errors repeating over).</div><div>I thought these new drives were supposed to last longer than older moving HDD but obviously not <br></
    <div><br></div><div>I guess its time to buy a new drive : (<br></div><div><br></div><div>EXT4-fs  (nvme0n1p2): Remounting filesystem read-onty<br>EXT4-fs  error (device nvme01p2): ext4_journal_check_start:83: comm systemd-journal: Detected aborted
    Jounal<br>[633582.907324] EXT4-fs error (device nvme0n1p2): _ext4_find_entry:1683: inode #5898248: comm ntpd: reading directory iblock 0<br>[633582 908250] EXT4-fs (nvme0n1p2) : Remounting filesystem read-only<br>EXT4-fs (nvme0n1p2): Remounting
    filesystem read-only<br><br><br>EXT4-fs error (device nvme0n1p2) : ext4_journal.check_start:83: comm systemd-journal: Detected aborted journal<br>[633582.912099] EXT4-fs (nvme0n1p2): Remounting filesystem read-only<br>EXT4-fs error (device nvme0n1p2):
    ext4_journal-check_start:83: comm systemd-journal: Detected aborted journal<br>[633582.916126] EXT4-fs (nvme0n1p2): Remounting filesystem read-only<br><br>(633583.797550) EXT4-fs error (device nvme0n1p2) in ext4_setattr:5628: —— Journal has aborted<
    (633583. 798466) EXT4-fs (nvme0n1p2): Remounting filesystem read-only<br>EXT4-fs  error (device nvme01p2): ext4_journal_check_start:83: comm systemd-journal: Detected aborted Jounal<br>EXT4-fs (nvme@nip2): Remounting filesystem read-only<br><br><br>
    EXT4-fs error (device nvme0n1p2): _ext4_find_entry:1683: inode #1966081: comm cron: reading directory Iblock 0<br>EXT4-fs (nvme@nip2): Remounting filesystem read-only<br>EXT4-fs error (device nvme0n1p2) in ext4_setattr:5628: —— Journal has aborted<br>
    EXT4-fs (nvme@nip2): Remounting filesystem read-only<br><br><br></div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Wed, Dec 4, 2024 at 4:39 PM Klaus Singvogel &lt;<a href="mailto:deb-user-ml@singvogel.
    net">deb-user-ml@singvogel.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Daniel Harris wrote:<br>
    &gt; Seeing the similarity especially that we are both using similar drives<br> &gt; (mine i( Samsung SSD 980 PRO 1TB)  makes me think it might be a hardware<br>
    &gt; instead of a software issue.<br>

    The referenced Samsung SSD 980 PRO has a critical firmware bug.<br>

    This bug hit me too. I had to replace my broken SSD by a new one, via the Samsung support.<br>
    I use a full-disk encryption, and therefore had a data loss - up to my last backup (5 days ago).<br>

    Some more details here:<br>
    <a href="https://www.pugetsystems.com/support/guides/critical-samsung-ssd-firmware-update/" rel="noreferrer" target="_blank">https://www.pugetsystems.com/support/guides/critical-samsung-ssd-firmware-update/</a><br>


    Best regards,<br>
            Klaus.<br>
    -- <br>
    Klaus Singvogel<br>
    GnuPG-Key-ID: 1024R/5068792D  1994-06-27<br>
    </blockquote></div>

  • From Andrew M.A. Cater@21:1/5 to All on Wed Dec 4 19:20:01 2024
    On Wed, Dec 04, 2024 at 11:34:41AM -0300, Eike Lantzsch ZP5CGE / KY4PZ wrote:
> On Wednesday, 4 December 2024 09:29:17 -03 Daniel Harris wrote:
> > Hello
> >
> > I have been using the stable branch but recently it has not been so
> > stable. I have experienced some unexpected behavior. I'm not sure if it's
> > related to this Ubuntu bug ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1805816 )
> >
> > I have done a file system check and everything seems OK, the drive
> > doesn't have any error messages, and journalctl doesn't seem to log any
> > errors (not that I can find, anyway).
> >
> > I will have to disable APST, but I thought this bug might have
> > been fixed since 2018; maybe not.
>
> Kernel 4.19? I'd be very surprised if that were related.

Agreed: that's the Buster (Debian 10) kernel, not the current kernel.

> > Thanks Dan
>
> I'd save my data a.s.a.p. and install a new NVMe drive if I were in your shoes.
>
> have a nice day
> --
> Eike Lantzsch KY4PZ / ZP5CGE


    All best, as ever,

    Andy Cater
    (amacater@debian.org)



  • From Michael Stone@21:1/5 to Daniel Harris on Thu Dec 5 00:50:02 2024
    On Wed, Dec 04, 2024 at 05:11:47PM +0000, Daniel Harris wrote:
> Thanks for all your replies.
> As far as I can tell there are no errors reported using fsck, smartctl, or
> nvme, and the firmware is the correct and newest version, so no problems there.
>
> The following are the messages that appear, but only taken from my phone and
> copied from the photo (lots of scrolling errors repeating over).
> I thought these new drives were supposed to last longer than older moving
> HDDs, but obviously not.

    Is this during boot? The messages indicate a corrupted journal, which
    generally means a device error, or maybe a device which lost power while writing. It should be possible to mount read-only without replaying the
    journal for recovery purposes, but it's basically unfixable.
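A sketch of that recovery mount, assuming the filesystem is on nvme0n1p2 as in your logs (ext4's "noload" option skips journal replay):

mount -o ro,noload /dev/nvme0n1p2 /mnt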

> I guess it's time to buy a new drive :(

    Did you try "nvme smart-log /dev/nvme0" to look for issues?

  • From Klaus Singvogel@21:1/5 to Jeffrey Walton on Thu Dec 5 08:30:01 2024
Jeffrey Walton wrote:
> On Wed, Dec 4, 2024 at 2:47 PM Klaus Singvogel wrote:
> > Some more details here: https://www.pugetsystems.com/support/guides/critical-samsung-ssd-firmware-update/
>
> That's interesting (in a morbid sort of way).
>
> Do you know if fwupdmgr will detect out-of-date firmware on the drives?

    I'm sorry, but I don't know. I only became aware of fwupdmgr afterwards.

    At least the replacement Samsung SSD was detected by fwupdmgr on my last run.
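For anyone who wants to check their own system: as far as I know, the usual sequence is

fwupdmgr refresh       # fetch current firmware metadata from LVFS
fwupdmgr get-updates   # list devices with pending firmware updates

but whether a given drive shows up depends on the vendor publishing its firmware to LVFS.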

    Best regards,
    Klaus.
    --
    Klaus Singvogel
    GnuPG-Key-ID: 1024R/5068792D 1994-06-27

  • From =?UTF-8?Q?J=C3=B6rg-Volker_Peetz?=@21:1/5 to All on Thu Dec 5 10:30:01 2024
There are two things which could be tried with the SSD:

1. SSDs have some self-healing capabilities (discarding defective sectors) which are
performed when the drive is not mounted. Therefore, enter the BIOS of the computer and let it run for about an hour. Then restart the computer.

2. After making a backup, do a "secure erase" of the SSD. Of course, that requires
reformatting the drive and rebuilding the system. I was able to revive a Samsung
960 Pro this way (a sketch of the command follows below).
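For an NVMe drive, the secure erase can be issued with nvme-cli. A minimal sketch; double-check the device name first, since this destroys all data on the drive:

sudo nvme format /dev/nvme0n1 --ses=1   # ses=1 requests a user-data erase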

    Regards,
    Jörg.

  • From gene heskett@21:1/5 to Klaus Singvogel on Thu Dec 5 11:10:01 2024
    On 12/5/24 02:23, Klaus Singvogel wrote:
> Jeffrey Walton wrote:
> > On Wed, Dec 4, 2024 at 2:47 PM Klaus Singvogel wrote:
> > > Some more details here:
> > > https://www.pugetsystems.com/support/guides/critical-samsung-ssd-firmware-update/
> >
> > That's interesting (in a morbid sort of way).
> >
> > Do you know if fwupdmgr will detect out-of-date firmware on the drives?
>
> I'm sorry, but I don't know. I only became aware of fwupdmgr afterwards.
>
> At least the replacement Samsung SSD was detected by fwupdmgr on my last run.
>
> Best regards,
>         Klaus.
Interesting comments here. I've been using SSDs since 40G was the
biggest. The 256G spinning rust, now 15 years old, is the only spinning
rust left here. And I've a drawer full of Samsung 860/870 series drives
that have all gone wonky but not RO yet. I now have a mixture of stuff
from Taiwan in 2T and 4T sizes, all healthy. I guess I was not the only
one that got questionable drives from Samsung. This is the first time
I've seen them discussed in this context. Thank you for saying something
out loud.

    Cheers, Gene Heskett, CET.
    --
    "There are four boxes to be used in defense of liberty:
    soap, ballot, jury, and ammo. Please use in that order."
    -Ed Howdershelt (Author, 1940)
    If we desire respect for the law, we must first make the law respectable.
    - Louis D. Brandeis

  • From Daniel Harris@21:1/5 to mstone@debian.org on Thu Dec 5 12:00:01 2024
    On Wed, Dec 4, 2024 at 11:43 PM Michael Stone <mstone@debian.org> wrote:

> On Wed, Dec 04, 2024 at 05:11:47PM +0000, Daniel Harris wrote:
> > Thanks for all your replies.
> > As far as I can tell there are no errors reported using fsck, smartctl,
> > or nvme, and the firmware is the correct and newest version, so no
> > problems there.
> >
> > The following are the messages that appear, but only taken from my phone
> > and copied from the photo (lots of scrolling errors repeating over).
> > I thought these new drives were supposed to last longer than older
> > moving HDDs, but obviously not.
>
> Is this during boot? The messages indicate a corrupted journal, which
> generally means a device error, or maybe a device which lost power while
> writing. It should be possible to mount read-only without replaying the
> journal for recovery purposes, but it's basically unfixable.


So it's not actually a crash. On the two occasions it has happened, I have
been away from my computer for a while, and when I return and move the
mouse, I can see messages scrolling on a black screen (no X running). I
can move to a new VT but I cannot log in. When I try to log in I just get
the errors repeating on the screen. After I do a hard reset, everything
works perfectly. No errors anywhere.


> > I guess it's time to buy a new drive :(
>
> Did you try "nvme smart-log /dev/nvme0" to look for issues?


Seems normal to me:

    Smart Log for NVME device:nvme0 namespace-id:ffffffff
    critical_warning : 0
    temperature : 31°C (304 Kelvin)
    available_spare : 100%
    available_spare_threshold : 10%
    percentage_used : 0%
    endurance group critical warning summary: 0
Data Units Read : 807,634 (413.51 GB)
Data Units Written : 5,680,746 (2.91 TB)
host_read_commands : 6,573,734
host_write_commands : 75,990,191
    controller_busy_time : 1,145
    power_cycles : 618
    power_on_hours : 197
    unsafe_shutdowns : 21
    media_errors : 0
    num_err_log_entries : 0
    Warning Temperature Time : 0
    Critical Composite Temperature Time : 0
    Temperature Sensor 1 : 31°C (304 Kelvin)
    Temperature Sensor 2 : 38°C (311 Kelvin)
    Thermal Management T1 Trans Count : 0
    Thermal Management T2 Trans Count : 0
    Thermal Management T1 Total Time : 0
    Thermal Management T2 Total Time : 0


    Thanks Dan

    <div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Wed, Dec 4, 2024 at 11:43 PM Michael Stone &lt;<a href="mailto:mstone@debian.org">mstone@
    debian.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, Dec 04, 2024 at 05:11:47PM +0000, Daniel Harris wrote:<br>
    &gt;Thanks for all your replies.<br>
    &gt;As far as I can tell there are no errors reported using fsck or smartctl or<br>
    &gt;nvme<br>
    &gt; and the firmware is the correct and newest version so no problems there.<br>
    &gt;<br>
    &gt;The following are the messages that appear but only taken from my phone and<br>
    &gt;copied from the photo (lots of scrolling errors repeating over).<br>
    &gt;I thought these new drives were supposed to last longer than older moving HDD<br>
    &gt;but obviously not<br>

    Is this during boot? The messages indicate a corrupted journal, which <br> generally means a device error, or maybe a device which lost power while <br> writing. It should be possible to mount read-only without replaying the <br> journal for recovery purposes, but it&#39;s basically unfixable.<br></blockquote><div><br></div><div>So its not actually a crash.  On the 2 occasions it has happened, I have been away from my computer for a while, and when I return and move the mouse, I
    can see messages scrolling on a black screen (no X running).  I can move to a new vt but I cannot log in.  When I try to log in I just get the errors repeating on the screen.  After I do a hard reset everything works perfectly. No errors anywhere.</
    <div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
    &gt;I guess its time to buy a new drive : (<br>

    Did you try &quot;nvme smart-log /dev/nvme0&quot; to look for issues?<br> <br></blockquote><div><br></div><div>seems normal to me<br></div><div><br></div><div> Smart Log for NVME device:nvme0 namespace-id:ffffffff<br>critical_warning                        : 0<br>temperature                           Â
      : 31°C (304 Kelvin)<br>available_spare                         : 100%<br>available_spare_threshold               : 10%<br>percentage_used                         : 0%<br>endurance group critical warning summary: 0<br>
    Data Units Read                         : 807,634 (413.51 GB)<br>Data Units Written                      : 5,680,746 (2.91 TB)<br>host_read_commands                      : 6,573,734<br>host_write_commands           Â
              : 75,990,191<br>controller_busy_time                    : 1,145<br>power_cycles                            : 618<br>power_on_hours                          : 197<br>unsafe_shutdowns               Â
             : 21<br>media_errors                            : 0<br>num_err_log_entries                     : 0<br>Warning Temperature Time                : 0<br>Critical Composite Temperature Time     : 0<br>Temperature
    Sensor 1           : 31°C (304 Kelvin)<br>Temperature Sensor 2           : 38°C (311 Kelvin)<br>Thermal Management T1 Trans Count       : 0<br>Thermal Management T2 Trans Count       : 0<br>Thermal Management T1 Total Time        :
    0<br>Thermal Management T2 Total Time        : 0</div><div><br></div><div><br></div><div>Thanks Dan<br></div></div></div></div></div>

  • From Klaus Singvogel@21:1/5 to gene heskett on Thu Dec 5 13:10:01 2024
    Hi Gene,

gene heskett wrote:
> Interesting comments here. I've been using SSDs since 40G was the biggest.
> The 256G spinning rust, now 15 years old, is the only spinning rust left
> here. And I've a drawer full of Samsung 860/870 series drives that have all
> gone wonky but not RO yet. I now have a mixture of stuff from Taiwan in 2T
> and 4T sizes, all healthy. I guess I was not the only one that got
> questionable drives from Samsung. This is the first time I've seen them
> discussed in this context. Thank you for saying something out loud.

To point this out: it was only exactly this model from Samsung, the SSD 980 PRO, that wasn't working properly.
Repeat: only the Samsung SSD 980 PRO.

It can be fixed by a firmware upgrade, and more recent batches of the Samsung SSD 980 PRO are flashed/sold with good firmware out of the box.

    Best regards,
    Klaus.
    --
    Klaus Singvogel
    GnuPG-Key-ID: 1024R/5068792D 1994-06-27

  • From Michael Stone@21:1/5 to Daniel Harris on Thu Dec 5 13:40:01 2024
    On Thu, Dec 05, 2024 at 10:53:54AM +0000, Daniel Harris wrote:
> So it's not actually a crash. On the two occasions it has happened, I have
> been away from my computer for a while, and when I return and move the
> mouse, I can see messages scrolling on a black screen (no X running). I can
> move to a new VT but I cannot log in. When I try to log in I just get the
> errors repeating on the screen. After I do a hard reset, everything works
> perfectly. No errors anywhere.

    Have you tried a memory test? Those symptoms and the smart output make
    me think the problem is in hardware other than the drive itself. Memory
    is the easiest to check and the easiest to remedy.
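A boot-time memtest86+ run covers the most memory, but as a quick first check from a running system something like memtester also works:

sudo memtester 2048M 3   # lock and test 2 GiB of RAM, three passes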

  • From gene heskett@21:1/5 to Klaus Singvogel on Thu Dec 5 13:40:01 2024
    On 12/5/24 06:59, Klaus Singvogel wrote:
> Hi Gene,
>
> gene heskett wrote:
> > Interesting comments here. I've been using SSDs since 40G was the biggest.
> > The 256G spinning rust, now 15 years old, is the only spinning rust left
> > here. And I've a drawer full of Samsung 860/870 series drives that have all
> > gone wonky but not RO yet. I now have a mixture of stuff from Taiwan in 2T
> > and 4T sizes, all healthy. I guess I was not the only one that got
> > questionable drives from Samsung. This is the first time I've seen them
> > discussed in this context. Thank you for saying something out loud.
>
> To point this out: it was only exactly this model from Samsung, the SSD 980
> PRO, that wasn't working properly.
> Repeat: only the Samsung SSD 980 PRO.
>
> It can be fixed by a firmware upgrade, and more recent batches of the Samsung
> SSD 980 PRO are flashed/sold with good firmware out of the box.

While I am saying that, my results with earlier Samsungs have been less
than glorious; triple-layer NANDs turning into half capacity, for instance.

> Best regards,
>         Klaus.


    Cheers, Gene Heskett, CET.
    --
    "There are four boxes to be used in defense of liberty:
    soap, ballot, jury, and ammo. Please use in that order."
    -Ed Howdershelt (Author, 1940)
    If we desire respect for the law, we must first make the law respectable.
    - Louis D. Brandeis

  • From Michael Stone@21:1/5 to gene heskett on Thu Dec 5 14:00:01 2024
    On Thu, Dec 05, 2024 at 07:32:03AM -0500, gene heskett wrote:
> While I am saying that, my results with earlier Samsungs have been less
> than glorious; triple-layer NANDs turning into half capacity, for
> instance.

There's simply no real value in looking at historic bad models as a
guide to future performance (or the opposite). I can remember entire
lines of hard drives from reputable manufacturers which were plagued by
premature failures, to the point that I replaced some multiple times
under warranty before pulling them all (e.g., the IBM Deskstar 75GXP). I
can also remember SSDs which had problems with repeated file corruption
(OCZ Vertex, the only SSDs I ever saw reliably corrupt stored data).
Bottom line is that sometimes you'll get a dud, and it doesn't really
matter if you had a positive (or negative) experience with a
superficially similar product decades ago. The Samsung 980 Pros with the
bad firmware were a ticking time bomb, but they haven't been sold with
that version for years, and they haven't had issues since the fix. Other
Samsung SSDs have been fine. The 860s have relatively low write
endurance, but that's why they're as cheap as they are. You can either
avoid using them in write-intensive settings and get a drive advertised
for that role, or you can dramatically underprovision to lower the write
cycle of individual cells and create space for caching. That's true for
most low-cost drives, which is why they're low-cost, and why
high-write-cycle drives are fantastically expensive. The average
consumer will never write enough data to matter, but it is possible in
pathological cases if something on the system goes nuts and starts
sync-writing a really large number of small blocks.

  • From Klaus Singvogel@21:1/5 to gene heskett on Thu Dec 5 15:00:01 2024
    gene heskett wrote:
> On 12/5/24 06:59, Klaus Singvogel wrote:
> > It can be fixed by a firmware upgrade, and more recent batches of the
> > Samsung SSD 980 PRO are flashed/sold with good firmware out of the box.
>
> While I am saying that, my results with earlier Samsungs have been less than
> glorious; triple-layer NANDs turning into half capacity, for instance.

As I remember it, the PRO version of Samsung's SATA SSDs (not NVMe) survived the endurance test of a reputable website many years ago. All the other SSDs had died by the time the article was written; only the Samsung PRO SSD's test continued.

But, on the other hand, the regular version (no PRO in the name) of the Samsung SSD died first in that test.

I think the data written to the Samsung SSD in the test was several hundred TB, more than twice its rated write endurance.

But I have also noticed that the quality of Samsung SSDs has adapted to the quality of their competitors (not in a good way). I read a lot about the 980 PRO and the firmware debacle, but also heard that the 990 PRO shouldn't be any better.

    Best regards,
    Klaus.
    --
    Klaus Singvogel
    GnuPG-Key-ID: 1024R/5068792D 1994-06-27

  • From Erwan David@21:1/5 to Max Nikulin on Thu Dec 5 16:40:01 2024
    On Thu, Dec 05, 2024 at 04:26:18PM CET, Max Nikulin <manikulin@gmail.com> said:
> On 05/12/2024 16:19, Jörg-Volker Peetz wrote:
> > 1. SSDs have some self-healing capabilities (discarding defective sectors)
> > which are performed when the drive is not mounted. Therefore, enter the
> > BIOS of the computer and let it run for about an hour. Then restart
> > the computer.
>
> I am curious which way the OS notifies a drive that it is mounted. I believed
> that drivers read and write blocks, maybe switch power save states, but
> mount is performed on a higher level.


Why would the drive need to be notified?

    --
    Erwan David

  • From Daniel Harris@21:1/5 to mstone@debian.org on Thu Dec 5 16:40:01 2024
    On Thu, Dec 5, 2024 at 12:33 PM Michael Stone <mstone@debian.org> wrote:

> On Thu, Dec 05, 2024 at 10:53:54AM +0000, Daniel Harris wrote:
> > So it's not actually a crash. On the two occasions it has happened, I have
> > been away from my computer for a while, and when I return and move the
> > mouse, I can see messages scrolling on a black screen (no X running). I
> > can move to a new VT but I cannot log in. When I try to log in I just get
> > the errors repeating on the screen. After I do a hard reset, everything
> > works perfectly. No errors anywhere.
>
> Have you tried a memory test? Those symptoms and the smart output make
> me think the problem is in hardware other than the drive itself. Memory
> is the easiest to check and the easiest to remedy.

Memtest passed with no errors.

    Thanks
    Dan

    <div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Thu, Dec 5, 2024 at 12:33 PM Michael Stone &lt;<a href="mailto:mstone@debian.org">mstone@debian.org</a>&gt; wrote:<br></div><
    blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, Dec 05, 2024 at 10:53:54AM +0000, Daniel Harris wrote:<br>
    &gt;So its not actually a crash.  On the 2 occasions it has happened, I have been<br>
    &gt;away from my computer for a while, and when I return and move the mouse, I can<br>
    &gt;see messages scrolling on a black screen (no X running).  I can move to a new<br>
    &gt;vt but I cannot log in.  When I try to log in I just get the errors repeating<br>
    &gt;on the screen.  After I do a hard reset everything works perfectly. No errors<br>
    &gt;anywhere.<br>

    Have you tried a memory test? Those symptoms and the smart output make <br>
    me think the problem is in hardware other than the drive itself. Memory <br>
    is the easiest to check and the easiest to remedy.<br> <br></blockquote><div>Memtest passed with no errors</div><div><br></div><div>Thanks</div><div>Dan <br></div></div></div>

  • From Andy Smith@21:1/5 to Erwan David on Thu Dec 5 17:10:01 2024
    Hi,

    On Thu, Dec 05, 2024 at 04:32:30PM +0100, Erwan David wrote:
> On Thu, Dec 05, 2024 at 04:26:18PM CET, Max Nikulin <manikulin@gmail.com> said:
> > On 05/12/2024 16:19, Jörg-Volker Peetz wrote:
> > > 1. SSDs have some self-healing capabilities (discarding defective sectors)
> > > which are performed when the drive is not mounted. Therefore, enter the
> > > BIOS of the computer and let it run for about an hour. Then restart
> > > the computer.
> >
> > I am curious which way the OS notifies a drive that it is mounted. I
> > believed that drivers read and write blocks, maybe switch power save
> > states, but mount is performed on a higher level.
>
> Why would the drive need to be notified?

    Jörg-Volker claimed that there were self-healing routines that only
    happen when an SSD is not mounted.

    I am highly skeptical.

    Thanks,
    Andy

    --
    https://bitfolk.com/ -- No-nonsense VPS hosting

  • From Michael Stone@21:1/5 to Max Nikulin on Thu Dec 5 18:50:01 2024
    On Thu, Dec 05, 2024 at 10:26:18PM +0700, Max Nikulin wrote:
> On 05/12/2024 16:19, Jörg-Volker Peetz wrote:
> > 1. SSDs have some self-healing capabilities (discarding defective
> > sectors) which are performed when the drive is not mounted.
> > Therefore, enter the BIOS of the computer and let it run for about
> > an hour. Then restart the computer.
>
> I am curious which way the OS notifies a drive that it is mounted. I
> believed that drivers read and write blocks, maybe switch power save
> states, but mount is performed on a higher level.

It doesn't: leaving the system unmounted ensures that the drive is idle,
but in general that's not necessary--just leaving the system alone will
usually have the same result unless you've got a runaway process chewing
on the disk. The SSD will do maintenance tasks when it's idle, or under
pressure (has no other choice because there are no writable blocks
available).

The relevant limitation is that an SSD physical block can only be
written once and then needs to be erased before another write. Changing
a logical block means writing the logical block to a different physical
location. Physical blocks vary in size but are many times the size of a
512 byte logically-addressable block. Many logical blocks (or versions
of the same logical block) can be written to a physical block, and
logical blocks that change leave unused older copies on the physical
block. The entire physical block must be erased to write anything to the
now-unused portions. This means copying all of the in-use logical blocks
to a different physical block before erasing the original physical
block. The drive will try to keep a pool of writable physical locations,
and has a cache of faster storage to hold data pending a write to slower
storage. Ideally your writes fit in cache, and the drive can do the
erasing and moving when the drive is idle. If you write more data than
can be cached, and there are no erased blocks to move data into, the
drive needs to relocate existing logical blocks to free up and erase
physical blocks before writing the new data. This has a significant
performance impact if you're trying to write faster than the drive can
relocate/erase.

If you use fstrim/discard you'll notify the drive that certain logical
blocks are not in use, allowing the physical block to be erased without
the need to read and relocate those logical blocks. A block is marked
unavailable/bad if it fails, and won't be used again. This will happen
transparently if a block fails on erase/write (the data will simply be
written to a different physical block and the logical block is
unaffected). The drive will also notice if a physical block is readable
but degrading, and will stop using it once any logical blocks it
contains are written to a new physical block. If a block totally fails
on read (much less common) it can't be relocated and the OS will get
very non-transparent errors every time it tries to read that logical
block. If you have a logical block that can't be read, discarding it can
effectively make it disappear (i.e., the drive marks it as unused
without needing to read it, and it will be available after it is written
to again). You may be able to revitalize a drive with a troublesome bad
block (e.g., underneath a directory entry so it can't be deleted and
trimmed) by trimming the entire drive and restoring from backup. This is
rare; in hundreds of TB of SSD I've encountered that situation exactly
once. In that case it may be just a fluke that won't reoccur, but I
probably wouldn't use that drive again (though if it was just a fluke,
the drive is likely fine and not using it is overly paranoid; the right
course of action depends on budget and risk tolerance).
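A sketch of that last-resort sequence, assuming the data is already backed up, since this discards everything on the device:

sudo blkdiscard /dev/nvme0n1   # TRIM every block on the whole drive
# then repartition, mkfs, and restore from backup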

  • From gene heskett@21:1/5 to Klaus Singvogel on Thu Dec 5 22:30:01 2024
    On 12/5/24 08:55, Klaus Singvogel wrote:
> gene heskett wrote:
> > On 12/5/24 06:59, Klaus Singvogel wrote:
> > > It can be fixed by a firmware upgrade, and more recent batches of the
> > > Samsung SSD 980 PRO are flashed/sold with good firmware out of the box.
> >
> > While I am saying that, my results with earlier Samsungs have been less
> > than glorious; triple-layer NANDs turning into half capacity, for instance.
>
> As I remember it, the PRO version of Samsung's SATA SSDs (not NVMe) survived
> the endurance test of a reputable website many years ago. All the other SSDs
> had died by the time the article was written; only the Samsung PRO SSD's
> test continued.
>
> But, on the other hand, the regular version (no PRO in the name) of the
> Samsung SSD died first in that test.
>
> I think the data written to the Samsung SSD in the test was several hundred
> TB, more than twice its rated write endurance.
>
> But I have also noticed that the quality of Samsung SSDs has adapted to the
> quality of their competitors (not in a good way). I read a lot about the 980
> PRO and the firmware debacle, but also heard that the 990 PRO shouldn't be
> any better.

The memory business is very competitive, so this doesn't surprise me.
We've come a long way since the TV station I was working for paid $400
for a 4K static memory to populate an 1802-based S100 board. We call
today's slap-it-together-and-get-it-out-the-door attitude the bblb
syndrome. So my next experiment is to destroy





> Best regards,
>         Klaus.


    Cheers, Gene Heskett, CET.
    --
    "There are four boxes to be used in defense of liberty:
    soap, ballot, jury, and ammo. Please use in that order."
    -Ed Howdershelt (Author, 1940)
    If we desire respect for the law, we must first make the law respectable.
    - Louis D. Brandeis

  • From gene heskett@21:1/5 to Erwan David on Fri Dec 6 02:00:01 2024
    On 12/5/24 10:33, Erwan David wrote:
> On Thu, Dec 05, 2024 at 04:26:18PM CET, Max Nikulin <manikulin@gmail.com> said:
> > On 05/12/2024 16:19, Jörg-Volker Peetz wrote:
> > > 1. SSDs have some self-healing capabilities (discarding defective sectors)
> > > which are performed when the drive is not mounted. Therefore, enter the
> > > BIOS of the computer and let it run for about an hour. Then restart
> > > the computer.
> >
> > I am curious which way the OS notifies a drive that it is mounted. I
> > believed that drivers read and write blocks, maybe switch power save
> > states, but mount is performed on a higher level.
>
> Why would the drive need to be notified?
Wrong question, IMO. If, as you say (and that makes perfect sense), the
drive does this housekeeping while unmounted, there should be a mechanism
to advise the user that the boot mount will be delayed until the drive
reports its validation is complete. That would serve the purpose of
advising the user that its use-by date is rapidly approaching. As it is,
we are at the mercy of the drive maker until it goes RO, and that most
certainly is not a desirable situation.

The fact that the uSD cards used for nearly everything in the ARM arena
seem to be capable of doing this "housekeeping" while mounted is a good
thing, as I only power them down to do my mods, and my failure rate on
those is actually much better. A Kill A Watt says my rebuilt printers,
standing idle between jobs, draw 14 watts. The Pi clones have had zero
uSD card failures in over a decade.

    Cheers, Gene Heskett, CET.
    --
    "There are four boxes to be used in defense of liberty:
    soap, ballot, jury, and ammo. Please use in that order."
    -Ed Howdershelt (Author, 1940)
    If we desire respect for the law, we must first make the law respectable.
    - Louis D. Brandeis

  • From Michael Stone@21:1/5 to All on Fri Dec 6 15:00:02 2024
    On Fri, Dec 06, 2024 at 02:26:23PM +0100, Jörg-Volker Peetz wrote:
> Should have been more clear. The drive should be idle for a longer
> time. This is assured by not mounting any partition of the SSD.
> I was able to "repair" unreadable sectors on a built-in SSD of an
> HP ProBook laptop. As far as I remember I also deleted files which
> could not be read any more because of defective sectors and restored
> the files from backup. Such unreadable files can be found by
> performing, e.g., a checksum calculation of all files on the SSD.
> Then, leaving the SSD alone, it was able to "replace" the defective
> sectors with spare sectors.

    Sorry, I don't buy that. Whatever happened, it wasn't the drive
    pondering unreadable sectors and then regenerating them. I can believe
    that deleting unreadable files and restoring them made them readable
    again. (Overwriting a bad sector will cause the original block to be
    freed and potentially discarded; after rewriting, the data is not in the
    same physical location it was before.) As outlined in a previous post,
    trimming unused space may also let the drive discard bad blocks. None of
    that requires the drive to be unmounted.
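(E.g., a routine

sudo fstrim -av   # trim free space on all mounted filesystems that support discard

does exactly that, with everything still mounted.)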

  • From =?UTF-8?Q?J=C3=B6rg-Volker_Peetz?=@21:1/5 to Michael Stone on Fri Dec 6 14:30:02 2024
    Hi,

    Michael Stone wrote on 05/12/2024 18:41:
> On Thu, Dec 05, 2024 at 10:26:18PM +0700, Max Nikulin wrote:
> > On 05/12/2024 16:19, Jörg-Volker Peetz wrote:
> > > 1. SSDs have some self-healing capabilities (discarding defective
> > > sectors) which are performed when the drive is not mounted. Therefore,
> > > enter the BIOS of the computer and let it run for about an hour. Then
> > > restart the computer.
> >
> > I am curious which way the OS notifies a drive that it is mounted. I
> > believed that drivers read and write blocks, maybe switch power save
> > states, but mount is performed on a higher level.
>
> It doesn't: leaving the system unmounted ensures that the drive is idle,
> but in general that's not necessary--just leaving the system alone will
> usually have the same result unless you've got a runaway process chewing
> on the disk. The SSD will do maintenance tasks when it's idle, or under
> pressure (has no other choice because there are no writable blocks
> available).

I should have been clearer. The drive should be idle for a longer time. This is
assured by not mounting any partition of the SSD.
I was able to "repair" unreadable sectors on a built-in SSD of an HP ProBook
laptop. As far as I remember, I also deleted files which could not be read any
more because of defective sectors and restored the files from backup. Such
unreadable files can be found by performing, e.g., a checksum calculation of all
files on the SSD. Then, leaving the SSD alone, it was able to "replace" the
defective sectors with spare sectors.

    Regards,
    Jörg.

  • From =?UTF-8?Q?J=C3=B6rg-Volker_Peetz?=@21:1/5 to Michael Stone on Fri Dec 6 15:10:01 2024
    Hi,

    Michael Stone wrote on 06/12/2024 14:49:
> On Fri, Dec 06, 2024 at 02:26:23PM +0100, Jörg-Volker Peetz wrote:
> > Should have been more clear. The drive should be idle for a longer time.
> > This is assured by not mounting any partition of the SSD.
> > I was able to "repair" unreadable sectors on a built-in SSD of an HP ProBook
> > laptop. As far as I remember I also deleted files which could not be read
> > any more because of defective sectors and restored the files from backup.
> > Such unreadable files can be found by performing, e.g., a checksum
> > calculation of all files on the SSD. Then, leaving the SSD alone, it was
> > able to "replace" the defective sectors with spare sectors.
>
> Sorry, I don't buy that. Whatever happened, it wasn't the drive pondering
> unreadable sectors and then regenerating them. I can believe that deleting
> unreadable files and restoring them made them readable again. (Overwriting a
> bad sector will cause the original block to be freed and potentially
> discarded; after rewriting, the data is not in the same physical location it
> was before.)

Yes, you are right. I also called 'fstrim -a' before restarting the computer
into the BIOS.

> As outlined in a previous post, trimming unused space may also let the drive
> discard bad blocks. None of that requires the drive to be unmounted.

Maybe that would have worked if I had waited long enough. But I did it by
letting the computer stay in the BIOS for a while.

As a check that the defective sectors were all mapped out, I read all sectors
of the partitions:

    sudo dd if=/dev/sdaX of=/dev/null bs=8M status=progress

    Regards,
    Jörg.

  • From Michael Stone@21:1/5 to Max Nikulin on Fri Dec 6 17:40:02 2024
    On Fri, Dec 06, 2024 at 10:51:20PM +0700, Max Nikulin wrote:
> Michael, thank you for the long message. Actually I wonder what is
> "idle" that allows the drive to perform self-maintenance. I expect that
> the device should not be in some deep power saving state (I am yet to
> discover available tunables that allow the drive to "sleep"). Should it
> be some period of time (seconds? minutes?) completely without any IO,
> or is it enough if read/write speed is below some threshold and
> from/to another chip?

Basically it means that it isn't busy doing I/O; if the host is reading or
writing, the drive can't also be using the flash for its own housekeeping.
It doesn't need to be absolutely unused.

> As to erase block size, I am aware of it. On the other hand I am
> surprised that a drive does not allow the kernel to optimize writes on a
> higher level (as uSD does):
>
> grep '' /sys/block/*/queue/discard_granularity
> ...
> /sys/block/mmcblk0/queue/discard_granularity:4194304
> /sys/block/nvme0n1/queue/discard_granularity:4096
> /sys/block/sda/queue/discard_granularity:4096   # hdd (shingled)

    The discard_granularity *limits* how the kernel can tell the drive that
    there are free blocks--a granularity of 4M means that the kernel can
    only issue a TRIM command when it has at least 4M of empty space *and*
    that empty space is aligned on a 4M boundary. (That is, you can't
    discard locations 2-5M on the drive, only 0-3M, 4-7M, etc.) It's a big
    number on the sd card because sd cards are pretty much junk. On a decent
    NVMe drive it'll typically be 512 (i.e., you can discard any logical
    block) or maybe 4096 if you're in 4k mode.
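Incidentally, lsblk can show the same limits per device without grepping sysfs:

lsblk --discard   # DISC-GRAN and DISC-MAX columns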

  • From Michael Stone@21:1/5 to Max Nikulin on Sun Dec 8 18:20:01 2024
    On Sun, Dec 08, 2024 at 11:26:51PM +0700, Max Nikulin wrote:
> I switched this NVMe drive to 4k mode. However, I took your message as a
> statement that internally drives still use a larger erase block size.

    The erase block is going to be many megabytes, it has nothing to do with
    the logical blocks. The erase block isn't erased as each logical block
    is written, it is erased when it's empty. Many logical blocks can be
    written (sequentially) over time to the same erase block. Some drives
    work better with 4k logical blocks but in general I don't recommend
    using them--having a mix of 4k and 512b blocks on a system is a bit of a
    pain, and it makes replacing a drive more complicated. Not all drives
    support 4k, and many that do get no benefit from such a configuration.
    E.g.:

    # nvme id-ns -H /dev/nvme0n1 | grep Rel
    LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0 Best (in use)
    LBA Format 1 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best

    This drive supports either format, but both are "Best". Other drives
    will recommend one or the other:

    # nvme id-ns -H /dev/nvme0n1 | grep Rel
    LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good (in use)
    LBA Format 1 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0x1 Better

    Or support only one:

    # nvme id-ns -H /dev/nvme1n1 | grep Rel
    LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0 Best (in use)

    I'd rather just stick with 512b and not worry about it. Of the drives
    above, the one which doesn't care is the newest/fastest. It likely
    supports 4k format because it's a U.3 drive which could be used in a
    storage array already configured for 4k. The other two are in the same
    LVM VG, and if one of them were formatted 4k I wouldn't be able to
    migrate volumes between them (so any possible, likely not noticeable, performance benefit from a 4k format would be outweighed by the
    inconvenience.) The one that recommends a 4k format is the oldest,
    smallest, and slowest by far (Gen3 vs Gen4) and at best is half the
    speed of the others, regardless of format.
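For completeness, switching the LBA format is itself a (data-destroying) format operation, something like:

sudo nvme format /dev/nvme0n1 --lbaf=1   # reformat to LBA format index 1 (4096-byte blocks)

but as above, I wouldn't bother unless the drive explicitly rates 4k as better.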

  • From Daniel Harris@21:1/5 to manikulin@gmail.com on Mon Jan 13 16:30:01 2025
Just an update to this thread.

It was actually a software bug in desktop-portal (or something like that).
Once that was removed, my system has been rock solid.

    Thanks Dan


    On Tue, Dec 10, 2024 at 3:41 AM Max Nikulin <manikulin@gmail.com> wrote:

> On 09/12/2024 00:14, Michael Stone wrote:
> > Not all drives support 4k, and many that do get no benefit from such a
> > configuration.
> [...]
> > # nvme id-ns -H /dev/nvme0n1 | grep Rel
> > LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes -
> > Relative Performance: 0x2 Good (in use)
> > LBA Format 1 : Metadata Size: 0 bytes - Data Size: 4096 bytes -
> > Relative Performance: 0x1 Better
>
> It is my case. I decided that ext4 uses 4k blocks anyway, so it is
> better to be consistent with hardware and firmware developers.
>
> As to erase block size, my expectation is that some drivers might
> benefit if that size is known: flushing caches (especially in laptop
> mode), allocating space for new files. I have no evidence that it is
> implemented, though. Perhaps dedicated chips and caches inside drives may
> do it more efficiently (besides dumb cheap models).
>
> mkfs.* tools might use the erase block size to align filesystem structures.
>
> It is the reason why I was surprised that the erase block size is not
> exposed to the kernel.
>
> My real curiosity was caused by "not mounting" a drive to allow self
> healing. "Idle" is imprecise from my point of view, but I think we may
> stop here. There is a chance that I will accidentally notice a detailed
> article on this topic.




    <div dir="ltr"><div>Just an Update to this thread.</div><div><br></div><div>It was actually a software bug in desktop-portal (or something like that).  Once that was removed my system has been rock solid.</div><div><br></div><div>Thanks Dan<br></div></
    <div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Dec 10, 2024 at 3:41 AM Max Nikulin &lt;<a href="mailto:manikulin@gmail.com" target="_blank">manikulin@gmail.com</a>&gt; wrote:<br></div><
    blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 09/12/2024 00:14, Michael Stone wrote:<br>
    &gt; Not all drives <br>
    &gt; support 4k, and many that do get no benefit from such a configuration. <br>
    [...]<br>
    &gt; # nvme id-ns -H /dev/nvme0n1 | grep Rel<br>
    &gt; LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - <br> &gt; Relative Performance: 0x2 Good (in use)<br>
    &gt; LBA Format  1 : Metadata Size: 0   bytes - Data Size: 4096 bytes - <br> &gt; Relative Performance: 0x1 Better<br>

    It is my case. I decided that ext4 uses 4k blocks anyway, so it is <br>
    better to be consistent with hardware&amp;firmware developers.<br>

    As to erase block size, my expectation is that some drivers might <br>
    benefit if that size is known: flushing caches (especially in laptop <br> mode), allocating space for new files. I have no evidences that it is <br> implemented though. Perhaps dedicated chips and caches inside drives may <br> do it more efficiently (besides dumb cheap models).<br>

    mkfs.* tools might use erase block size to align filesystem structures.<br>

    It is the reason why I was surprised that erase block size is not <br>
    exposed to kernel.<br>

    My real curiosity was caused by &quot;not mounting&quot; a drive to allow self <br>
    healing. &quot;Idle&quot; is imprecise from my point of view, but I think we may <br>
    stop here. There is a chance that I will accidentally notice a detailed <br> article on this topic.<br>


    </blockquote></div></div>
