• Re: ext4 file system corruption - again

    From Arno Lehmann@21:1/5 to All on Mon Sep 23 12:00:01 2024
    Hi Jesper,

    On 23.09.2024 at 11:20, Jesper Dybdal wrote:
    ...
    > ---------------- fsck log:
    > Log of fsck -C -a -T -t ext4 /dev/md0
    > Sun Sep 22 20:20:13 2024
    >
    > root has been mounted 13 times without being checked, check forced.
    > root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
    > root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
    > root: 223986/32759808 files (6.5% non-contiguous), 6827061/131038976 blocks


    I do not think it is likely to be a hardware problem. The bit patterns
    for the number of blocks you quote actually have all the high bits of
    the potentially used 48 bits of the block count set, for example

        111111111111111111111111111111111111111111100111

    and given that the 16 higher bits are not stored next to the lower 32
    bits, there should be many other values and flags set to all ones,
    which should result in other things to notice -- for example, the file
    should be considered to be part of the file system structure and
    compressed, which fsck should loudly complain about (not verified).

    The large values with peculiar bit patterns do look like some flag
    values to me. It might be worth asking among the ext4 developers if
    those values could be introduced by some particular condition, I think.
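
The Python sketch below is one quick way to look at the bit patterns Arno describes. It is not from the thread, and the helper name split_i_blocks is made up for illustration. It assumes the ext4 on-disk layout, in which the 48-bit i_blocks count is split into a 32-bit low field (i_blocks_lo) and a 16-bit high field (l_i_blocks_high) that is stored elsewhere in the inode:

```python
# Sketch (hypothetical helper): show the binary form of the fsck i_blocks
# values and split them the way ext4 stores them on disk: 16 high bits
# (l_i_blocks_high) and 32 low bits (i_blocks_lo), kept in separate places.

def split_i_blocks(value: int) -> tuple[int, int]:
    """Return (high16, low32) for a 48-bit i_blocks value."""
    if not 0 <= value < 1 << 48:
        raise ValueError("value does not fit in 48 bits")
    return value >> 32, value & 0xFFFFFFFF


if __name__ == "__main__":
    for value in (281474976710631, 281474976710647):
        high, low = split_i_blocks(value)
        print(f"{value}: {value:048b}")
        print(f"  high 16 bits = {high:#06x}, low 32 bits = {low:#010x}")
        print(f"  2**48 - value = {(1 << 48) - value}")
```

For 281474976710631 the binary form is 43 ones followed by 00111. The high part is 0xffff and the low part is 0xffffffe7, so the value is 2**48 - 25. Both halves are all ones except in their lowest few bits.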

    Cheers,

    Arno

    --
    Arno Lehmann

    IT-Service Lehmann
    Sandstr. 6, 49080 Osnabrück

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesper Dybdal@21:1/5 to All on Mon Sep 23 11:30:01 2024
    Some time ago I had a problem with the i_blocks field of a few inodes
    being corrupted (replaced by extremely large numbers).

    It now happened again, and the strange thing is that one of the two
    files affected was also affected the earlier time (or more precisely,
    the file that now has the same name as the old one had). That file was /etc/postfix/master.cf - a simple text configuration file that is
    modified only manually using emacs.

    I wrote about it here when it first happened - that thread began with:
       Message-ID: <bf46ee0b-8af1-43ce-a48f-e304ef850b3d@dybdal.dk>
       Date: Tue, 19 Mar 2024 15:43:30 +0100
       Subject: Filsystemkorruption i ext4? ("File system corruption in ext4?")

    I don't believe that it is a disk error - the file system is on a RAID1 partition and the RAID consistency is checked regularly.
    I also find it hard to believe that it is a RAM error - the machine has
    run memtest86+ overnight without finding anything.
    There was a power outage some time ago, but surely ext4 should be able
    to handle that without introducing errors.

    Fsck fixes it.
    The system is an up-to-date Bookworm.

    Any ideas as to how this can happen - twice, and affecting (among
    others) the same file?

    ---------------- fsck log:
    Log of fsck -C -a -T -t ext4 /dev/md0
    Sun Sep 22 20:20:13 2024

    root has been mounted 13 times without being checked, check forced.
    root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
    root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
    root: 223986/32759808 files (6.5% non-contiguous), 6827061/131038976 blocks
    fsck exited with status code 1

    ---------------- stat(1) for the two files before fsck:
      File: main.cf
      Size: 16959         Blocks: 2251799813685048 IO Block: 4096 regular file
    Device: 9,0    Inode: 10748715    Links: 1
    Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/ root)
    Access: 2024-09-22 20:31:58.081768853 +0200
    Modify: 2024-08-03 19:33:20.665350446 +0200
    Change: 2024-09-22 22:09:54.053359071 +0200
     Birth: 2024-08-03 11:20:14.671832520 +0200

      File: master.cf
      Size: 10782         Blocks: 2251799813685176 IO Block: 4096 regular file
    Device: 9,0    Inode: 10751288    Links: 1
    Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/ root)
    Access: 2024-09-22 19:20:07.216191493 +0200
    Modify: 2024-06-05 14:37:24.205001097 +0200
    Change: 2024-09-22 22:09:54.053359071 +0200
     Birth: 2024-03-19 13:36:38.971618859 +0100
    ----------------
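
The stat(1) numbers and the fsck numbers appear to describe the same bad value in different units. GNU stat reports Blocks in 512-byte units. The fsck messages here seem to count 4096-byte file system blocks, an assumption based on the "IO Block: 4096" line and on the "should be" values. The minimal Python sketch below checks that relation. It also computes how many 4096-byte blocks each file size needs, assuming no extra metadata blocks:

```python
# Sketch: compare stat(1) "Blocks" (512-byte units) with the fsck i_blocks
# values, assuming 4096-byte file system blocks. Data copied from the post.
FS_BLOCK = 4096
SECTOR = 512
RATIO = FS_BLOCK // SECTOR  # 512-byte units per file system block

# (name, Size in bytes, stat Blocks, fsck "i_blocks is", fsck "should be")
files = [
    ("main.cf", 16959, 2251799813685048, 281474976710631, 5),
    ("master.cf", 10782, 2251799813685176, 281474976710647, 3),
]

for name, size, stat_blocks, fsck_blocks, should_be in files:
    needed = -(-size // FS_BLOCK)  # ceiling division: blocks the data needs
    same_value = stat_blocks == fsck_blocks * RATIO
    print(f"{name}: stat Blocks == fsck value * {RATIO}: {same_value}; "
          f"blocks needed for {size} bytes: {needed} (fsck: {should_be})")
```

Both checks hold for both files. Each stat value is exactly eight times the corresponding fsck value, and the two file sizes need exactly the 5 and 3 blocks that fsck restored.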

    Thanks,
    Jesper

    --
    Jesper Dybdal
    https://www.dybdal.dk

  • From Stefan Monnier@21:1/5 to All on Mon Sep 23 16:20:01 2024
    > root has been mounted 13 times without being checked, check forced.
    > root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
                                            ^^^^^^^^^^^^^^^
    AKA -25

    > root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
                                            ^^^^^^^^^^^^^^^
    AKA -9

    It's odd that it looks like -N². Maybe it's just happenstance, but I'd
    be curious to see if you have other "data points".
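
Stefan's "AKA" figures come from reading the 48-bit field as a two's-complement signed number. The minimal Python sketch below shows that reinterpretation and checks whether the result has the -N² shape. Both helper names are hypothetical:

```python
# Sketch (hypothetical helpers): reinterpret a 48-bit unsigned value as a
# two's-complement signed number and test whether it equals -N**2.
import math
from typing import Optional

BITS = 48  # ext4 i_blocks width: 32 low bits + 16 high bits


def as_signed(value: int, bits: int = BITS) -> int:
    """Two's-complement reading of an unsigned `bits`-wide value."""
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def minus_square_root(n: int) -> Optional[int]:
    """Return N > 0 if n == -N**2, otherwise None."""
    if n >= 0:
        return None
    root = math.isqrt(-n)
    return root if root * root == -n else None


if __name__ == "__main__":
    # (fsck "i_blocks is", fsck "should be") from the log above
    for value, should_be in ((281474976710631, 5), (281474976710647, 3)):
        signed = as_signed(value)
        print(f"{value} -> {signed}, N = {minus_square_root(signed)}, "
              f"fsck 'should be' = {should_be}")
```

For these two values N comes out as 5 and 3. These happen to equal the block counts that fsck says are correct.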


    Stefan

  • From Jesper Dybdal@21:1/5 to Arno Lehmann on Tue Sep 24 11:10:01 2024
    On 2024-09-23 11:55, Arno Lehmann wrote:
    > On 23.09.2024 at 11:20, Jesper Dybdal wrote:
    > ...
    >> root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
    >> root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
    >> root: 223986/32759808 files (6.5% non-contiguous), 6827061/131038976 blocks
    >
    > I do not think it is likely to be a hardware problem. The bit patterns
    > for the number of blocks you quote actually have all the high bits of
    > the potentially used 48 bits of the block count set
    ...
    > The large values with peculiar bit patterns do look like some flag
    > values to me. It might be worth asking among the ext4 developers if
    > those values could be introduced by some particular condition, I think.

    I think I'll do that.


    On 2024-09-23 16:10, Stefan Monnier wrote:
    >> root has been mounted 13 times without being checked, check forced.
    >> root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
    >                                    ^^^^^^^^^^^^^^^
    > AKA -25
    >
    >> root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
    >                                    ^^^^^^^^^^^^^^^
    > AKA -9
    >
    > It's odd that it looks like -N². Maybe it's just happenstance, but I'd
    > be curious to see if you have other "data points".

    Unfortunately, I have no other "data points". The data from the first
    occurrence of this are lost.

    Many thanks to Arno and Stefan - I probably would not have noticed the
    interesting values without their help.

    Jesper

    --
    Jesper Dybdal
    https://www.dybdal.dk

  • From Jesper Dybdal@21:1/5 to Stefan Monnier on Thu Sep 26 15:30:01 2024
    On 2024-09-23 16:10, Stefan Monnier wrote:
    >> root has been mounted 13 times without being checked, check forced.
    >> root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
    >                                    ^^^^^^^^^^^^^^^
    > AKA -25
    >
    >> root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
    >                                    ^^^^^^^^^^^^^^^
    > AKA -9
    >
    > It's odd that it looks like -N². Maybe it's just happenstance, but I'd
    > be curious to see if you have other "data points".

    One more "data point" just appeared today. I had edited /etc/fstab and
    rebooted, and then fstab suddenly had the problem:

    root: Inode 10748542, i_blocks is 281474976710653, should be 1. FIXED.
    AKA -3, so not of the -N² form.

    From a stat() (after the fsck that repaired the file):
      File: fstab
      Size: 1855          Blocks: 8          IO Block: 4096   regular file

    I haven't yet had time to describe the problem on the ext4 mailing
    list.  I have remembered that there is one thing I do differently from
    the default: I use the ext4 option "nodelalloc" (because several years
    ago, there was a discussion about "delalloc or not" from which it seemed
    that nodelalloc was probably slightly safer - if the associated
    performance reduction is not a problem, which it is not for me).

    I've now turned nodelalloc off, just in case that changes something.
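
After a change like this, you can confirm which options are actually in effect by reading the kernel's list of mounted file systems in /proc/self/mounts. Each line there has the form "device mount-point type options dump pass". The minimal Python sketch below does this; the function name is hypothetical. It assumes Linux. It also assumes that ext4 lists nodelalloc among the options when it is active, while the default delalloc is normally not listed:

```python
# Sketch (hypothetical helper): list the mount options in effect for a
# mount point by parsing text in the /proc/self/mounts format (Linux).
from pathlib import Path


def mount_options(mountpoint: str, mounts_text: str) -> list[str]:
    """Return the options of the last entry for `mountpoint`, or [].

    Each line holds: device, mount point, type, options, dump, pass.
    Special characters in paths are octal-escaped by the kernel; this
    sketch does not decode them. For stacked mounts the last entry wins.
    """
    options: list[str] = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            options = fields[3].split(",")
    return options


if __name__ == "__main__":
    mounts = Path("/proc/self/mounts")
    if mounts.exists():
        opts = mount_options("/", mounts.read_text())
        print("options for /:", ",".join(opts))
        print("nodelalloc active:", "nodelalloc" in opts)
```

Editing /etc/fstab does not change an already-mounted file system. The options shown here reflect the most recent mount, such as the one done at boot.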

    --
    Jesper Dybdal
    https://www.dybdal.dk
