• Re: Debian 12.5.0 amd64 and OpenZFS bug #15526

    From Gareth Evans@21:1/5 to All on Fri Mar 22 22:10:01 2024


    On 27 Feb 2024, at 23:47, Gareth Evans <donotspam@fastmail.fm> wrote:
    > On Tue 27/02/2024 at 22:52, David Christensen <dpchrist@holgerdanske.com> wrote:
    >> ...
    >> These appear to be the ZFS packages for the available Debian releases:
    >>
    >> https://packages.debian.org/buster/zfs-dkms
    >>
    >> buster              zfs-dkms (0.7.12-2+deb10u2)
    >> buster-backports    zfs-dkms (2.0.3-9~bpo10+1)
    >> bullseye            zfs-dkms (2.0.3-9+deb11u1)
    >> bullseye-backports  zfs-dkms (2.1.11-1~bpo11+1)
    >> bookworm            zfs-dkms (2.1.11-1)
    >> bookworm-backports  zfs-dkms (2.2.2-4~bpo12+1)
    >> trixie              zfs-dkms (2.2.2-4)
    >>
    >> The question is, how far back to go? Is OpenZFS 2.1.x buggy? OpenZFS
    >> 2.0.x? What is 0.7.12 -- OpenZFS, ZFS-on-Linux, or something else --
    >> and is it buggy?
    >
    > This seems to be very "involved"! The discussion in #15526 suggests that a coreutils upgrade (particularly re. "cp"), in combination with the addition of the zpool block cloning feature, triggered the issue, which may have gone undetected for some time.
    >
    >>> After downgrading coreutils from 9.3 to 8.32, I am no longer able to reproduce this corruption.
    >> This seems to solve the corruption issue on my end too.
    > -- https://github.com/openzfs/zfs/issues/15526#issuecomment-1810472547
    >
    > See also https://www.reddit.com/r/zfs/comments/1826lgs/psa_its_not_block_cloning_its_a_data_corruption/
    >
    > Debian users can't follow the Gentoo/emerge-based reproduction/trigger steps (a build of golang) in
    > https://github.com/openzfs/zfs/issues/15526 (for zfs 2.2.0)
    > and
    > https://github.com/openzfs/zfs/issues/15933 (for 2.2.3)
    >
    > If anyone can recommend steps to debianise these (15933 seems most likely to be useful, and is slightly different), I would be happy to test openzfs 2.2.2-4 from bookworm-backports on Debian 12.5.
    >
    > Given that the original Gentoo reporter, who seems to have tested extensively, considered the issue closed after upgrading to openzfs 2.2.2
    >
    > https://bugs.gentoo.org/917224#c26
    >
    > I wonder if the 2.2.3 issue is similar/related, or perhaps there are multiple triggers.
    >
    > Watching with interest.
    >
    > Best wishes,
    > Gareth
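
    On debianising the reproduction steps: as a rough starting point that needs neither emerge nor a golang build, one could write files and copy them straight away with cp, then compare checksums. This is only a sketch under my own assumptions, not the reproducer from either issue, and I have not confirmed that it triggers either bug. The function names and default sizes are made up for illustration.

```python
import hashlib
import os
import subprocess


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_bad_copies(workdir, rounds=20, size=1 << 20):
    """Write a file, copy it immediately with cp, and count mismatching copies.

    workdir should be a directory on the ZFS dataset under test.
    """
    bad = 0
    for i in range(rounds):
        src = os.path.join(workdir, f"src-{i}")
        dst = os.path.join(workdir, f"dst-{i}")
        payload = os.urandom(size)
        with open(src, "wb") as handle:
            handle.write(payload)  # deliberately no fsync before the copy
        # Use the system cp, since the discussion points at coreutils cp.
        subprocess.run(["cp", src, dst], check=True)
        # Compare the copy against the data we wrote, not against a re-read of src.
        if sha256_of(dst) != hashlib.sha256(payload).hexdigest():
            bad += 1
    return bad
```

    Calling count_bad_copies() with a directory on the pool in question should return 0 on a healthy system. A non-zero result would only show that a copy differed from what was written, not which bug caused it.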


    As anyone interested can see from the ref to #15933 in the below, there seems to have been considerable effort in getting to grips with this bug (actually multiple bugs), and it looks like a fix may be forthcoming, though not sure at the time of writing if there may be some further polishing first

    https://github.com/openzfs/zfs/pull/16019

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Gareth Evans@21:1/5 to Gareth Evans on Mon Mar 25 23:10:02 2024
    On Fri 22/03/2024 at 21:01, Gareth Evans <donotspam@fastmail.fm> wrote:
    > As anyone interested can see from the ref to #15933 in the below, there seems to have been considerable effort in getting to grips with this bug (actually multiple bugs), and it looks like a fix may be forthcoming, though not sure at the time of writing if there may be some further polishing first
    >
    > https://github.com/openzfs/zfs/pull/16019

    https://github.com/openzfs/zfs/issues/15933

    is now closed as completed with fix

    https://github.com/openzfs/zfs/commit/102b468b5e190973fbaee6fe682727eb33079811

    which for the moment necessarily adds synchronous writes.

    FYI.
    Gareth

  • From David Christensen@21:1/5 to Gareth Evans on Tue Mar 26 00:50:01 2024
    On 3/25/24 15:05, Gareth Evans wrote:
    > On Fri 22/03/2024 at 21:01, Gareth Evans <donotspam@fastmail.fm> wrote:
    >> As anyone interested can see from the ref to #15933 in the below, there seems to have been considerable effort in getting to grips with this bug (actually multiple bugs), and it looks like a fix may be forthcoming, though not sure at the time of writing if there may be some further polishing first
    >>
    >> https://github.com/openzfs/zfs/pull/16019
    >
    > https://github.com/openzfs/zfs/issues/15933
    >
    > is now closed as completed with fix
    >
    > https://github.com/openzfs/zfs/commit/102b468b5e190973fbaee6fe682727eb33079811
    >
    > which for the moment necessarily adds synchronous writes.
    >
    > FYI.
    > Gareth


    Thank you for keeping an eye on this.


    Looking at the github commit, the C code makes me worry -- it does not
    appear to use traditional C/C++ thread-safe programming techniques such
    as I learned in CS and used when I did systems programming (e.g. guard functions, critical sections, locks, semaphores, etc.). Do I need to
    look at more enclosing code to see such, are those techniques missing,
    are there some newer techniques I do not understand, or something else?
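
    To be concrete about what I mean by a critical section guarded by a lock, here is a minimal sketch. It is in Python rather than C purely for brevity, the names (GuardedCounter, run_workers) are made up for illustration, and it has nothing to do with the code in the commit:

```python
import threading


class GuardedCounter:
    """A shared counter whose updates happen inside a lock-protected critical section."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        # The with-block is the critical section: only one thread at a time
        # may run it, and the lock is released on exit even if an error occurs.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(counter, workers=8, increments=10_000):
    """Increment the counter from several threads and return the final value."""

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

    With the lock in place, run_workers(GuardedCounter()) always returns workers * increments. That visible acquire/release pairing around shared state is what I was expecting to find in the diff.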


    David

  • From Gareth Evans@21:1/5 to David Christensen on Tue Mar 26 09:20:01 2024
    On Mon 25/03/2024 at 23:40, David Christensen <dpchrist@holgerdanske.com> wrote:
    > On 3/25/24 15:05, Gareth Evans wrote:
    >> On Fri 22/03/2024 at 21:01, Gareth Evans <donotspam@fastmail.fm> wrote:
    >>> As anyone interested can see from the ref to #15933 in the below, there seems to have been considerable effort in getting to grips with this bug (actually multiple bugs), and it looks like a fix may be forthcoming, though not sure at the time of writing if there may be some further polishing first
    >>>
    >>> https://github.com/openzfs/zfs/pull/16019
    >>
    >> https://github.com/openzfs/zfs/issues/15933
    >>
    >> is now closed as completed with fix
    >>
    >> https://github.com/openzfs/zfs/commit/102b468b5e190973fbaee6fe682727eb33079811
    >>
    >> which for the moment necessarily adds synchronous writes.
    >>
    >> FYI.
    >> Gareth
    >
    > Thank you for keeping an eye on this.
    >
    > Looking at the github commit, the C code makes me worry -- it does not
    > appear to use traditional C/C++ thread-safe programming techniques such
    > as I learned in CS and used when I did systems programming (e.g. guard
    > functions, critical sections, locks, semaphores, etc.). Do I need to
    > look at more enclosing code to see such, are those techniques missing,
    > are there some newer techniques I do not understand, or something else?

    I don't know, I will have a look too, though my C[++] is almost as rusty as my Rust :)

    G
