Forum: >>> Magnum BBS <<<

Do no harm: Data loss from new (bookworm2trixie) discard=async default

From Nicholas D Steeves@21:1/5 to All on Sat May 10 20:10:01 2025

XPost: linux.debian.maint.boot

Hi,

Sorry for not preempting the backlog sooner.

An ultra-brief history: many SSDs including various Samsungs, and if I
remember correctly many drives with old SandForce controllers have
broken discard=async. This was a big issue back in 2011-2014, and in
some (many?) cases it was a data loss risk.

Linux-6.2 started enabling discard=async by default (at least for
btrfs), and deductively this appears to necessarily harm many users of
at least pre2011-to-2014 SSDs. Does Linux-6.12.x, for trixie, have
sufficient quirk coverage to make the new default safe, and fall back
to discard=sync for affected hardware? Alternatively, has our kernel
been patched to maintain bookworm's 6.1.x behaviour of discard=sync?

Security conscious users maintain that it presents a security risk when
a filesystem issues discards to the underlying LUKS layer. Are we going
to start doing this by default for trixie, or are we still going to
block it at the dm-crypt layer?

I'm most concerned about the btrfs-specific case, where "mount -o
discard" is significantly riskier than running fstrim; it's a major
contributing factor to those old "btrfs ate my data" stories, and the
primary motivation for my Debian involvement is safe defaults for btrfs.

Or do we ship a default configuration that provides the best performance
for recent (last five years) systems, and that is probably safe for most
mainstream systems? It looks like that's where we are now. In this
case, are release notes really enough for what sounds like a data loss
risk?

Best,
Nicholas


--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Pascal Hambourg@21:1/5 to Nicholas D Steeves on Sun May 11 15:00:01 2025

XPost: linux.debian.maint.boot

On 10/05/2025 at 20:07, Nicholas D Steeves wrote:

> An ultra-brief history: many SSDs including various Samsungs, and if I
> remember correctly many drives with old SandForce controllers have
> broken discard=async.

Correct me if I am wrong, but my understanding is that these SSDs have a
broken *queued TRIM* feature. discard=sync and discard=async are btrfs
features and both use queued TRIM if supported by the SSD (and not
blacklisted by the kernel) or non-queued TRIM otherwise.

> Linux-6.2 started enabling discard=async by default (at least for
> btrfs),

To be clear for everyone: without an explicit discard mount option, the
default for btrfs is now to enable discard in asynchronous mode
(discard=async) instead of disabling discard (nodiscard). Explicit
"discard" still enables discard in synchronous mode (discard=sync).
Explicit "nodiscard" is now needed to disable discard.
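
As a rough illustration of the rules just described, here is a minimal Python sketch. The function name effective_btrfs_discard and the async_by_default flag are made up for this example; it only models the option parsing described above, not whether the device actually supports discard, and it is not taken from the kernel source.

```python
def effective_btrfs_discard(options: str, async_by_default: bool = True) -> str:
    """Return 'async', 'sync' or 'off' for a comma-separated btrfs option string.

    async_by_default=True models the newer default described above
    (no explicit option means discard=async); False models the older
    default where no explicit option meant nodiscard.
    Hypothetical helper: it ignores hardware support entirely.
    """
    mode = "async" if async_by_default else "off"
    # Assumption: a later option overrides an earlier one.
    for opt in options.split(","):
        opt = opt.strip()
        if opt in ("discard", "discard=sync"):
            mode = "sync"
        elif opt == "discard=async":
            mode = "async"
        elif opt == "nodiscard":
            mode = "off"
    return mode


if __name__ == "__main__":
    for opts in ("defaults", "discard", "noatime,nodiscard"):
        print(opts, "->", effective_btrfs_discard(opts))
```

Calling it with async_by_default=False on "defaults" gives the old behaviour, which makes the difference between the two defaults easy to see side by side.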

> and deductively this appears to necessarily harm many users of
> at least pre2011-to-2014 SSDs. Does Linux-6.12.x, for trixie, have
> sufficient quirk coverage to make the new default safe, and fall back
> to discard=sync for affected hardware? Alternatively, has our kernel
> been patched to maintain bookworm's 6.1.x behaviour of discard=sync?

As I understand it, it is not the asynchronous mode which may cause data
loss with non-blacklisted broken TRIM but rather the discard option as a
whole.

> Security conscious users maintain that it presents a security risk when
> a filesystem issues discards to the underlying LUKS layer. Are we going
> to start doing this by default for trixie, or are we still going to
> block it at the dm-crypt layer?

The Debian installer already enables discard by default on encrypted
devices since buster. The "discard" option is available for most
filesystem types which support it (btrfs, ext4, FAT, HFS+, XFS) but is
not enabled by default. JFS has supported it since Linux 3.7, but this
is not mentioned in mount(8). It is not available for swap.
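
For anyone who wants to check whether their own encrypted devices pass discards through, here is a small Python sketch. It assumes the setting shows up as the "discard" keyword in the options (fourth) field of /etc/crypttab; the function name and the sample entries, including the UUID, are invented for illustration.

```python
def crypttab_discard_targets(crypttab_text: str) -> list[str]:
    """Return the names of dm-crypt mappings whose options include 'discard'."""
    targets = []
    for line in crypttab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # crypttab fields: target name, source device, key file, options
        if len(fields) < 4:
            continue  # no options field, so no discard option
        if "discard" in fields[3].split(","):
            targets.append(fields[0])
    return targets


# Hypothetical sample content, not copied from a real system.
SAMPLE_CRYPTTAB = """\
# <target> <source> <key file> <options>
sda5_crypt UUID=1111 none luks,discard
swap_crypt /dev/sda6 /dev/urandom swap,cipher=aes-xts-plain64
"""

if __name__ == "__main__":
    print(crypttab_discard_targets(SAMPLE_CRYPTTAB))
```

On a real system one would pass the contents of /etc/crypttab instead of the sample string.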

The Calamares installer may have different defaults. The package
calamares-settings-debian has a file /etc/calamares/modules/fstab.conf
which contains:

ssdExtraMountOptions:
    ext4: discard
    jfs: discard
    xfs: discard
    swap: discard
    btrfs: discard,compress=lzo
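
To see which entries on an installed system ended up with an explicit discard option, something like the following Python sketch could be used. The function name and the sample fstab lines (including the UUIDs) are hypothetical; it only looks for "discard" or "discard=..." in the options field and cannot see defaults that are not written in the file.

```python
def fstab_online_discard(fstab_text: str) -> list[tuple[str, str, str]]:
    """Return (mount point, fs type, option) for each explicit discard option."""
    hits = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4:
            continue
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        for opt in options.split(","):
            if opt == "discard" or opt.startswith("discard="):
                hits.append((mount_point, fs_type, opt))
    return hits


# Hypothetical sample content, not copied from a real system.
SAMPLE_FSTAB = """\
# <device> <mount point> <type> <options> <dump> <pass>
UUID=aaaa / btrfs discard,compress=lzo 0 0
UUID=bbbb /home ext4 defaults,noatime 0 2
UUID=cccc none swap discard 0 0
"""

if __name__ == "__main__":
    for hit in fstab_online_discard(SAMPLE_FSTAB):
        print(hit)
```

An explicit "nodiscard" is deliberately not reported, since it disables discard rather than enabling it.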

Also the fstrim systemd service is enabled and triggered once a week by
default. Is it safer than online discard with broken TRIM? If yes, can
anyone explain why?

