• Do no harm: Data loss from new (bookworm2trixie) discard=async default

    From Nicholas D Steeves@21:1/5 to All on Sat May 10 20:10:01 2025
    XPost: linux.debian.maint.boot

    Hi,

    Sorry for not preempting the backlog sooner.

    An ultra-brief history: many SSDs, including various Samsung models
    and, if I remember correctly, many drives with old SandForce
    controllers, have broken discard=async. This was a big issue back in
    2011-2014, and in some (many?) cases it was a data loss risk.

    Linux 6.2 started enabling discard=async by default (at least for
    btrfs), and by deduction this appears bound to harm many users of SSDs
    from the pre-2011 to 2014 era, at least. Does Linux 6.12.x, for trixie,
    have sufficient quirk coverage to make the new default safe, and fall
    back to discard=sync for affected hardware? Alternatively, has our
    kernel been patched to keep bookworm's 6.1.x behaviour of discard=sync?

    Security-conscious users maintain that it presents a security risk when
    a filesystem issues discards to the underlying LUKS layer. Are we going
    to start doing this by default for trixie, or are we still going to
    block it at the dm-crypt layer?

    I'm most concerned about the btrfs-specific case, where "mount -o
    discard" is significantly riskier than running fstrim; it was a major
    contributing factor in those old "btrfs ate my data" stories. Safe
    defaults for btrfs are the primary motivation for my Debian involvement.

    Or do we ship a default configuration that provides the best performance
    for recent systems (the last five years) and that is probably safe for
    most mainstream systems? It looks like that's where we are now. If so,
    are release notes really enough for what sounds like a data loss risk?

    Best,
    Nicholas


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Pascal Hambourg@21:1/5 to Nicholas D Steeves on Sun May 11 15:00:01 2025
    XPost: linux.debian.maint.boot

    On 10/05/2025 at 20:07, Nicholas D Steeves wrote:

    > An ultra-brief history: many SSDs, including various Samsung models
    > and, if I remember correctly, many drives with old SandForce
    > controllers, have broken discard=async.

    Correct me if I am wrong, but my understanding is that these SSDs have a
    broken *queued TRIM* feature. discard=sync and discard=async are btrfs
    features and both use queued TRIM if supported by the SSD (and not
    blacklisted by the kernel) or non-queued TRIM otherwise.
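    To make that point concrete, here is a minimal Python model. It is not
    kernel code and the enum and function names are hypothetical; it simply
    assumes, as described above, that the TRIM command depends only on drive
    support and the kernel's quirk list, so the btrfs discard mode changes
    *when* discards are sent, not *which* TRIM command carries them.

```python
from enum import Enum


class TrimCommand(Enum):
    QUEUED = "queued TRIM"          # NCQ TRIM, the feature broken on some SSDs
    NON_QUEUED = "non-queued TRIM"  # classic, non-NCQ TRIM


def select_trim_command(btrfs_discard_mode: str,
                        drive_supports_queued_trim: bool,
                        queued_trim_blacklisted: bool) -> TrimCommand:
    """Hypothetical model: pick the TRIM command used for a discard.

    btrfs_discard_mode ("sync" or "async") is validated but deliberately
    not used to choose the command: both modes end up on the same path.
    """
    if btrfs_discard_mode not in ("sync", "async"):
        raise ValueError(f"unexpected discard mode: {btrfs_discard_mode!r}")
    if drive_supports_queued_trim and not queued_trim_blacklisted:
        return TrimCommand.QUEUED
    return TrimCommand.NON_QUEUED


if __name__ == "__main__":
    # Same drive, both btrfs modes: the command does not change.
    for mode in ("sync", "async"):
        print(mode, select_trim_command(mode, True, False).value)
```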

    > Linux 6.2 started enabling discard=async by default (at least for
    > btrfs),

    To be clear for everyone: without an explicit discard mount option, the
    default for btrfs is now to enable discard in asynchronous mode
    (discard=async) instead of disabling discard (nodiscard). Explicit
    "discard" still enables discard in synchronous mode (discard=sync).
    Explicit "nodiscard" is now needed to disable discard.
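    These rules can be summarised in a short Python sketch. It is a
    simplified model with a hypothetical function name, not the kernel's
    option parser. It assumes that options are processed left to right with
    the last discard option winning, and that the automatic discard=async
    default only applies when the device supports discard (the sketch takes
    that as a flag rather than detecting it).

```python
def effective_btrfs_discard(mount_options: str,
                            kernel: tuple[int, int],
                            device_supports_discard: bool = True) -> str:
    """Return "sync", "async" or "off" for a btrfs mount (simplified model).

    Assumptions: options are processed left to right and the last discard
    option wins; the automatic default only applies on devices that
    support discard. Explicit options are returned as-is.
    """
    mode = None  # None: no explicit discard option seen
    for opt in mount_options.split(","):
        opt = opt.strip()
        if opt in ("discard", "discard=sync"):
            mode = "sync"
        elif opt == "discard=async":
            mode = "async"
        elif opt == "nodiscard":
            mode = "off"
    if mode is not None:
        return mode
    # No explicit option: Linux 6.2+ defaults to discard=async.
    if kernel >= (6, 2) and device_supports_discard:
        return "async"
    # Before 6.2 the default was nodiscard.
    return "off"


if __name__ == "__main__":
    print(effective_btrfs_discard("noatime,compress=zstd", (6, 1)))   # off
    print(effective_btrfs_discard("noatime,compress=zstd", (6, 12)))  # async
    print(effective_btrfs_discard("noatime,nodiscard", (6, 12)))      # off
```

    In other words, the same fstab line that meant "no discard" on a 6.1
    kernel means discard=async on 6.12 unless "nodiscard" is added.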

    > and by deduction this appears bound to harm many users of SSDs from
    > the pre-2011 to 2014 era, at least. Does Linux 6.12.x, for trixie,
    > have sufficient quirk coverage to make the new default safe, and fall
    > back to discard=sync for affected hardware? Alternatively, has our
    > kernel been patched to keep bookworm's 6.1.x behaviour of discard=sync?

    As I understand it, it is not the asynchronous mode which may cause data
    loss with non-blacklisted broken TRIM but rather the discard option as a
    whole.

    > Security-conscious users maintain that it presents a security risk when
    > a filesystem issues discards to the underlying LUKS layer. Are we going
    > to start doing this by default for trixie, or are we still going to
    > block it at the dm-crypt layer?

    The Debian installer has already enabled discard by default on encrypted
    devices since buster. The "discard" mount option is available for most
    filesystem types that support it (btrfs, ext4, FAT, HFS+, XFS) but is
    not enabled by default. JFS has supported it since Linux 3.7, but this
    is not mentioned in mount(8). It is not available for swap.
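    Whether discards actually reach the disk through a LUKS mapping can be
    checked in sysfs. Below is a minimal Python sketch with hypothetical
    function names. It assumes that a queue/discard_max_bytes value of 0
    means the block device (or the dm-crypt target stacked on it) does not
    pass discards down; `lsblk --discard` shows the same value in its
    DISC-MAX column. For dm-crypt, passthrough is enabled by the `discard`
    option in /etc/crypttab (or `cryptsetup --allow-discards`).

```python
from pathlib import Path


def discard_enabled(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if `device` (e.g. "sda", "nvme0n1", "dm-0") accepts discards.

    Assumption: a queue/discard_max_bytes value of 0 (or a missing or
    unreadable file) means discards are not passed down to the hardware.
    """
    path = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    try:
        return int(path.read_text().strip()) > 0
    except (FileNotFoundError, ValueError):
        return False


def discard_report(sysfs_root: str = "/sys/block") -> dict[str, bool]:
    """Map every block device under sysfs_root to its discard status."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    return {dev.name: discard_enabled(dev.name, sysfs_root)
            for dev in sorted(root.iterdir())}


if __name__ == "__main__":
    for name, enabled in discard_report().items():
        print(f"{name}: discards {'passed down' if enabled else 'not passed down'}")
```

    On a LUKS volume opened without the discard option, the dm-N entry
    would be expected to report False even when the underlying SSD
    reports True.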

    The Calamares installer may have different defaults. The package calamares-settings-debian has a file /etc/calamares/modules/fstab.conf
    which contains:

    ssdExtraMountOptions:
        ext4: discard
        jfs: discard
        xfs: discard
        swap: discard
        btrfs: discard,compress=lzo

    Also, the fstrim systemd service is enabled by default and triggered once
    a week (via fstrim.timer). Is that safer than online discard with broken
    TRIM? If so, can anyone explain why?
