• [gentoo-user] e2fsck -c when bad blocks are in existing file?

    From Grant Edwards@21:1/5 to All on Tue Nov 8 04:40:01 2022
    I've got an SSD that's failing, and I'd like to know what files
    contain bad blocks so that I don't attempt to copy them to the
    replacement disk.

    According to e2fsck(8):

    -c This option causes e2fsck to use badblocks(8) program to do a
    read-only scan of the device in order to find any bad blocks. If
    any bad blocks are found, they are added to the bad block inode
    to prevent them from being allocated to a file or directory. If
    this option is specified twice, then the bad block scan will be
    done using a non-destructive read-write test.

    What happens when the bad block is _already_allocated_ to a file?
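    One way to answer the original question without going through the bad block inode at all is to mount the failing filesystem read-only and try to read every file end to end, noting which ones fail with an I/O error (EIO). The sketch below assumes that an unreadable sector surfaces as EIO from read(2), which is the usual Linux behaviour. The function names and the chunk size are illustrative and do not come from any existing tool.

```python
import errno
import os
from typing import List, Optional, Tuple


def first_unreadable_offset(path: str, chunk_size: int = 1 << 20) -> Optional[int]:
    """Read `path` sequentially; return the byte offset of the first read
    that fails with EIO, or None if the whole file reads cleanly."""
    offset = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            try:
                chunk = f.read(chunk_size)
            except OSError as exc:
                if exc.errno == errno.EIO:
                    return offset  # the failing read started at this offset
                raise  # other errors (e.g. permissions) are not media errors
            if not chunk:
                return None  # reached end of file without an I/O error
            offset += len(chunk)


def find_unreadable_files(root: str) -> List[Tuple[str, int]]:
    """Walk `root` and list (path, offset) for every regular file
    that hits a read error somewhere in its contents."""
    bad: List[Tuple[str, int]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, FIFOs and device nodes
            failed_at = first_unreadable_offset(path)
            if failed_at is not None:
                bad.append((path, failed_at))
    return bad
```

    For example, `find_unreadable_files("/mnt/old")` (a hypothetical mount point) returns (path, offset) pairs for the files to leave behind. This reads every byte once, which itself puts load on an ailing drive, and data already sitting in the page cache will read fine even if the medium underneath has gone bad.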

    --
    Grant

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael@21:1/5 to All on Tue Nov 8 13:20:40 2022
    On Tuesday, 8 November 2022 03:31:07 GMT Grant Edwards wrote:
    > What happens when the bad block is _already_allocated_ to a file?

    Whether or not the block was previously allocated to a file and has now been
    reallocated, my understanding is that with spinning disks the data in a bad
    block stays there unless you've dd'ed some zeros over it. Even then, read or
    write operations could fail if the block is too far gone.[1] Some data recovery
    applications will try to read data off a bad block in different patterns to
    retrieve what's there. Once a block has been marked bad, the filesystem won't
    write new data to it again.
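    A minimal sketch of the simplest form of that idea: re-read a marginal sector a few times before giving up, since a weak sector sometimes reads successfully on a later attempt. Dedicated tools such as GNU ddrescue go much further (multiple passes, reading in both directions, splitting failed areas). `read_with_retries` and its `reader` hook are hypothetical names; the hook exists only so the behaviour can be exercised without a failing disk.

```python
import errno
import os
from typing import Callable, Optional

Reader = Callable[[int, int, int], bytes]  # (fd, size, offset) -> data, like os.pread


def read_with_retries(fd: int, offset: int, size: int, attempts: int = 3,
                      reader: Reader = os.pread) -> Optional[bytes]:
    """Try to read `size` bytes at `offset` up to `attempts` times.
    Returns the data from the first successful read, or None if every
    attempt failed with EIO. Any other OSError is re-raised."""
    for _ in range(attempts):
        try:
            return reader(fd, size, offset)
        except OSError as exc:
            if exc.errno != errno.EIO:
                raise  # not a media error, so retrying won't help
    return None  # sector never read back; record it and move on
```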

    With SSDs the situation is less deterministic, because the drive's internal
    wear-levelling firmware moves data around according to its own algorithms and
    remaps bad blocks. This is all transparent to the filesystem; the block
    addresses the filesystem sees are virtual anyway. Bypassing the firmware
    controller to access individual cells on an SSD requires specialist equipment
    and your own lab, although things may have evolved since I last looked into this.
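    To make the "virtual block address" point concrete, here is a toy model of a flash translation layer (FTL), assuming a simple logical-to-physical page map, out-of-place writes and a free-page pool. Real SSD firmware is proprietary and far more complex; `ToyFTL` is purely illustrative. The property it shows is that retiring a physical page moves the data but leaves the logical address the host uses unchanged, which is why the filesystem never notices.

```python
from typing import Dict, List, Set


class ToyFTL:
    """Toy flash translation layer: the host only ever sees logical block
    addresses (LBAs); the firmware decides which physical page holds them."""

    def __init__(self, physical_pages: int) -> None:
        self.l2p: Dict[int, int] = {}                 # logical -> physical map
        self.free: List[int] = list(range(physical_pages))
        self.bad: Set[int] = set()                    # retired physical pages
        self.pages: Dict[int, bytes] = {}             # physical page contents

    def write(self, lba: int, data: bytes) -> None:
        # Flash is written out of place: take a fresh page, then repoint the map.
        new_page = self.free.pop(0)
        old_page = self.l2p.get(lba)
        self.pages[new_page] = data
        self.l2p[lba] = new_page
        if old_page is not None and old_page not in self.bad:
            self.pages.pop(old_page, None)
            self.free.append(old_page)  # reuse last: crude round-robin wear levelling

    def read(self, lba: int) -> bytes:
        return self.pages[self.l2p[lba]]

    def retire(self, physical: int) -> None:
        """Mark a physical page bad and relocate any live data it holds."""
        self.bad.add(physical)
        if physical in self.free:
            self.free.remove(physical)
            return
        for lba, page in list(self.l2p.items()):
            if page == physical:
                self.write(lba, self.pages[physical])  # same LBA, new page
                self.pages.pop(physical, None)
```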

    The general advice is to avoid powering down an SSD that is suspected of corruption until all the data has been copied/recovered off it. If you power it down, the data on it may never be accessible again without the aforementioned lab.

    BTW, running badblocks in read-write mode on an ailing/aged SSD may exacerbate the problem without much benefit, by accelerating wear and causing additional cells to fail. At the same time, you would be relying on the suspect drive firmware, via its virtual map, to reach the data in some of its cells. Data scrubbing (btrfs, zfs) and recent backups would probably be a better strategy with SSDs.
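    Scrubbing, in outline, means re-reading data, comparing it against checksums recorded when it was written, and repairing mismatches from a redundant copy. The sketch below assumes a two-way mirror and uses SHA-256 per 4 KiB block purely for illustration; btrfs and ZFS use their own on-disk checksum schemes (btrfs defaults to crc32c), and all names here are hypothetical.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # assumed block size for the sketch


def checksum_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Dict[int, str]:
    """Record a checksum per block, as a filesystem would at write time."""
    return {
        i // block_size: hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def find_bad_blocks(data: bytes, stored: Dict[int, str],
                    block_size: int = BLOCK_SIZE) -> List[int]:
    """Re-read everything and list blocks whose checksum no longer matches."""
    current = checksum_blocks(data, block_size)
    return sorted(blk for blk, digest in stored.items() if current.get(blk) != digest)


def scrub_mirror(copy_a: bytearray, copy_b: bytes, stored: Dict[int, str],
                 block_size: int = BLOCK_SIZE) -> List[int]:
    """Scrub copy_a; rewrite each bad block from copy_b if copy_b's version
    still matches the stored checksum. Returns the repaired block numbers."""
    repaired: List[int] = []
    for blk in find_bad_blocks(bytes(copy_a), stored, block_size):
        start = blk * block_size
        good = copy_b[start:start + block_size]
        if hashlib.sha256(good).hexdigest() == stored[blk]:
            copy_a[start:start + block_size] = good  # repair from the mirror
            repaired.append(blk)
    return repaired
```

    Unlike a badblocks read-write pass, a scrub only writes to blocks that actually failed verification, which is the wear argument made above.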


    [1] https://www.smartmontools.org/wiki/BadBlockHowto

  • From Wols Lists@21:1/5 to Michael on Tue Nov 8 19:30:01 2022
    On 08/11/2022 13:20, Michael wrote:
    > Previously allocated to a file and now re-allocated or not, my understanding
    > is with spinning disks the data in a bad block stays there unless you've dd'ed
    > some zeros over it. Even then read or write operations could fail if the
    > block is too far gone.[1] Some data recovery applications will try to read
    > data off a bad block in different patterns to retrieve what's there. Once the
    > bad block is categorized as such it won't be used by the filesystem to write
    > new data to it again.
    >
    > With SSDs the situation is less deterministic, because the disk's internal
    > wear levelling firmware moves things around according to its algorithms to
    > remap bad blocks. This is all transparent to the filesystem, block addresses
    > sent to the fs are virtual anyway. Bypassing the firmware controller to access
    > individual cells on an SSD requires specialist equipment and your own lab,
    > although things may have evolved since I last looked into this.

    Which is actually pretty much exactly the same as what happens with
    spinning rust.

    The primary aim of a hard drive - SSD or spinning rust - is to save the
    user's data. If the drive can't read the data it will do nothing save
    return a read error. Think about it - any other action would only make
    matters worse, because the drive would be actively destroying
    possibly-salvageable data.

    All being well, the user has raid or backups, and will be able to
    re-write the file, at which point the drive will attempt recovery, as it
    now has KNOWN GOOD data. If the write fails, the block will then be
    added to the *drive internal* badblock list, and will be remapped elsewhere.
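    The drive-side behaviour described above can be written as a toy state machine. This illustrates the description, not real firmware: the `pending` and `reallocated` sets are only named after the SMART attributes Current_Pending_Sector and Reallocated_Sector_Ct to show roughly where such events would be counted, and the class and its fields are hypothetical.

```python
import errno
from typing import Dict, Set


class ToyDrive:
    """Toy model: read errors are reported, never 'fixed'; a rewrite with
    known-good data either recovers the sector or remaps it to a spare."""

    def __init__(self, spares: int) -> None:
        self.data: Dict[int, bytes] = {}
        self.unreadable: Set[int] = set()    # LBAs whose stored data can't be read
        self.cannot_write: Set[int] = set()  # LBAs whose medium won't hold new data
        self.pending: Set[int] = set()       # cf. SMART Current_Pending_Sector
        self.reallocated: Set[int] = set()   # cf. SMART Reallocated_Sector_Ct
        self.spares_left = spares

    def read(self, lba: int) -> bytes:
        if lba in self.unreadable:
            # Never guess or zero-fill: report the error and leave the data alone.
            self.pending.add(lba)
            raise OSError(errno.EIO, f"read error at LBA {lba}")
        return self.data.get(lba, bytes(512))

    def write(self, lba: int, payload: bytes) -> None:
        # The host has supplied KNOWN GOOD data, so the drive can act now.
        if lba in self.cannot_write and lba not in self.reallocated:
            if self.spares_left == 0:
                raise OSError(errno.EIO, f"write error at LBA {lba}, no spares left")
            self.spares_left -= 1
            self.reallocated.add(lba)  # same LBA, now backed by a spare sector
        self.unreadable.discard(lba)
        self.pending.discard(lba)
        self.data[lba] = payload
```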

    MODERN DRIVES SHOULD NEVER HAVE AN OS-LEVEL BADBLOCKS LIST. If they do, something is seriously wrong, because the drive should be hiding it from
    the OS.

    > The general advice is to avoid powering down an SSD which is suspected of
    > corruption, until all the data is copied/recovered off it first. If you power
    > it down, data on it may never be accessible again without the aforementioned
    > lab.

    Seriously, this is EXTREMELY GOOD advice. I don't know whether it is
    still true, but there have been plenty of stories in the past about
    SSDs that, once they accumulate too many errors, self-destruct on power-down!!!

    This imho is a serious design fault - you can't recover data from an SSD
    that won't boot - but it appears to be a deliberate decision by the
    manufacturers.

    > BTW, running badblocks in read-write mode on an ailing/aged SSD may exacerbate
    > the problem without much benefit by accelerating wear and causing additional
    > cells to fail. At the same time you could be relying on the suspect disk
    > firmware to access via its virtual map the data on some of its cells. Data
    > scrubbing (btrfs, zfs) and recent backups would probably be a better strategy
    > with SSDs.

    Yup. If you suspect bad blocks have damaged your data, you need backups
    or RAID. And then don't worry about it, apart from making sure your
    drives look healthy and replacing any that are dodgy.

    Just make sure you interpret smartmontools data correctly - perfectly
    healthy drives can drop dead for no apparent reason, and drives that
    look to be at death's door will carry on for ever. In particular, read
    errors aren't serious unless they are accompanied by a growing number of
    relocated (reallocated) sectors. If the relocation count jumps, watch it.
    If it doesn't move while you're watching, it was probably a glitch and the
    drive is okay. But use your head and be sensible. Any sign of regular
    failed writes, BIN THE DRIVE.
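    The rule of thumb above can be expressed as a small triage function over two SMART attribute snapshots taken some time apart (for example, raw values copied from `smartctl -A` output). The attribute names are the ones smartctl commonly prints for ATA drives but vary between vendors; the `failed_writes_seen` count (e.g. gathered from kernel logs) and its threshold are assumptions, and none of this replaces using your head.

```python
from typing import Mapping

# Attribute names as smartctl prints them for many ATA drives; the exact set
# varies by vendor, so treat these keys as an assumption.
REALLOCATED = "Reallocated_Sector_Ct"
PENDING = "Current_Pending_Sector"


def assess(before: Mapping[str, int], after: Mapping[str, int],
           failed_writes_seen: int = 0) -> str:
    """Rule-of-thumb triage of two SMART raw-value snapshots."""
    if failed_writes_seen > 1:  # hypothetical threshold for "regular" failures
        return "replace"        # regular failed writes: bin the drive
    grown = after.get(REALLOCATED, 0) - before.get(REALLOCATED, 0)
    if grown > 0:
        return "watch"          # reallocations are growing; keep checking
    if after.get(PENDING, 0) > before.get(PENDING, 0):
        return "recheck"        # read trouble without reallocation (yet)
    return "ok"                 # nothing moved: likely a one-off glitch
```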

    (I think, for my 8TB drive, 1 read error per fewer than two end-to-end
    scans is well within spec...)
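    The arithmetic behind that remark, assuming a spec of one unrecoverable read error (URE) per 10^14 bits read (a common figure for consumer drives; the actual spec for Wol's drive isn't given) and a capacity of 8 x 10^12 bytes: one full scan reads 6.4 x 10^13 bits, so on average one error is expected every 10^14 / (6.4 x 10^13) ≈ 1.56 scans.

```python
def scans_per_error(capacity_bytes: float, bits_per_ure: float) -> float:
    """End-to-end scans expected per unrecoverable read error (URE),
    assuming errors occur at exactly the specified average rate."""
    bits_per_scan = capacity_bytes * 8
    return bits_per_ure / bits_per_scan


print(scans_per_error(8e12, 1e14))  # 1.5625
```

    With a 10^15 spec, the same drive would average roughly one error every 15.6 scans.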

    Cheers,
    Wol

  • From Michael@21:1/5 to All on Wed Nov 9 08:46:43 2022
    On Tuesday, 8 November 2022 18:24:41 GMT Wols Lists wrote:

    > MODERN DRIVES SHOULD NEVER HAVE AN OS-LEVEL BADBLOCKS LIST. If they do,
    > something is seriously wrong, because the drive should be hiding it from
    > the OS.

    If you run badblocks or e2fsck, you'll find the application asks to write data
    to the disk at the end of the run. Yes, the drive's firmware should manage bad
    blocks transparently to the filesystem, but I have observed in hdparm output
    that reallocations of bad blocks do not happen in real time. Perhaps the
    filesystem-level badblocks list, which is LBA-based, acts as an intermediate
    step until the hardware triggers a reallocation? Not sure. :-/
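    Whichever layer ends up keeping the list, drive and kernel error logs report absolute LBAs, while badblocks and e2fsck work in filesystem block numbers (the badblocks man page warns that the block size must match the filesystem's when its output is fed to e2fsck). The smartmontools BadBlockHowto walks through this conversion; a sketch of the arithmetic follows, assuming 512-byte logical sectors and a partition starting at sector `partition_start`. The function name is hypothetical. The resulting block number can be passed to debugfs's `icheck` command to find the owning inode, and `ncheck` turns that inode into a path.

```python
def lba_to_fs_block(lba: int, partition_start: int, fs_block_size: int = 4096,
                    sector_size: int = 512) -> int:
    """Map an absolute LBA (e.g. from a kernel or SMART error log) to the
    block number inside the filesystem that starts at `partition_start`."""
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    # Byte offset into the partition, divided down to whole filesystem blocks.
    return (lba - partition_start) * sector_size // fs_block_size
```

    For instance, LBA 1000000 on a partition starting at sector 2048 with 4 KiB blocks gives filesystem block 124744.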



  • From Michael@21:1/5 to All on Sat Nov 12 13:38:36 2022
    On Wednesday, 9 November 2022 16:53:13 GMT Laurence Perkins wrote:

    > -----Original Message-----
    > From: Michael <confabulate@kintzios.com>
    > Sent: Wednesday, November 9, 2022 12:47 AM
    > To: gentoo-user@lists.gentoo.org
    > Subject: Re: [gentoo-user] e2fsck -c when bad blocks are in existing file?
    >
    > > On Tuesday, 8 November 2022 18:24:41 GMT Wols Lists wrote:
    > >
    > > > MODERN DRIVES SHOULD NEVER HAVE AN OS-LEVEL BADBLOCKS LIST. If they
    > > > do, something is seriously wrong, because the drive should be hiding
    > > > it from the OS.
    > >
    > > If you run badblocks or e2fsck you'll find the application asks to write
    > > data to the disk, at the end of the run. Yes, the drive's firmware should
    > > manage badblocks transparently to the filesystem, but I have observed in
    > > hdparm output reallocations of badblocks do not happen in real time.
    > > Perhaps the filesystem level badblocks list which is LBA based, acts as an
    > > intermediate step until the hardware triggers a reallocation? Not sure. :-/
    >
    > Badblocks doesn't ask to write anything at the end of the run. You tell it
    > whether you want a read test, a write-read test or a
    > read-write-read-replace test at the beginning.
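    The "read-write-read-replace" mode is badblocks' non-destructive read-write test (`-n`), the same kind of test the e2fsck(8) excerpt quoted earlier describes for `-c` given twice. Roughly: save each block, write test patterns, read them back, then restore the original. The sketch below shows only that sequence; the patterns and names are assumptions, and unlike a real tool it does not bypass the page cache, so on an ordinary file the read-back simply comes from cache. It also makes plain why this mode adds wear: every block is rewritten once per pattern plus once more to restore it.

```python
import os
from typing import List


def nondestructive_rw_test(fd: int, num_blocks: int, block_size: int = 4096,
                           patterns: bytes = b"\xaa\x55\xff\x00") -> List[int]:
    """Return the block numbers whose test patterns did not read back intact.
    Original contents are restored after each block is tested."""
    bad: List[int] = []
    for blk in range(num_blocks):
        off = blk * block_size
        original = os.pread(fd, block_size, off)           # 1. save what is there
        for p in patterns:                                 # iterating bytes yields ints
            test = bytes([p]) * len(original)
            os.pwrite(fd, test, off)                       # 2. write a test pattern
            if os.pread(fd, len(original), off) != test:   # 3. read it back
                bad.append(blk)
                break
        os.pwrite(fd, original, off)                       # 4. put the data back
    return bad
```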

    Not to labour the point, but 'e2fsck -v -c' runs a read test, and at the end it tells me "... Updating bad block inode", even if it came across no read errors (0/0/0) and consequently does not prompt for a filesystem repair.

