• Switching INN 2 storage format

    From Tanguy Ortolo@21:1/5 to All on Tue Feb 11 15:11:54 2025
    Hello all,

    My news server running INN 2 is storing all articles in timecaf. I am currently in the process of switching my file systems to btrfs (mainly
    to get bitrot detection, see below for more details about that).

    I do not expect timecaf and a CoW filesystem such as btrfs to play well
    together. Indeed, writing a new small article to a large file should
    inevitably fragment it. To avoid that, I disabled copy-on-write for the
    timecaf files, but that also disables file extent checksumming and
    therefore bitrot detection, defeating the main reason I am switching to
    btrfs in the first place.

    Therefore, I am considering switching to a news storage format better
    suited to a CoW filesystem. I think the best option would be
    timehash. Any thoughts on that?

    As I understand it, switching storage format for /new/ articles can
    simply be done by adding the new storage backend in first
    position in storage.conf, can it not?

    I do not plan on migrating existing articles, but simply to wait for
    them to expire, since there does not seem to be any simple migration
    procedure. But if someone knows a way to do so, I would be interested.
    ;-)

    Thanks for reading!


    For those interested, I have two SSDs that are set up in a software
    RAID1 with Linux LVM lvmraid(7). This is very flexible and will support
    a drive failure… but not bitrot. Indeed, bitrot is detected by the
    LVM /scrubbing/ process, but it will not know which drive has the
    unaltered data (if any).

    Linux LVM RAID has an optional integrity layer that can identify
    corrupted data, but while it does fix it on-the-fly by querying the
    other drive, it does not report which drive is altering data. And it
    disables volume snapshotting.

    After doing some research, it seems btrfs does fix all this, since it
    maintains data checksums, and its scrubbing process does update
    per-drive error counters.
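
    The per-drive counters mentioned above are what "btrfs device stats"
    prints. As a minimal sketch, the following parses that output and lists
    the devices with non-zero counters. The sample text and device names
    are made up for illustration; only the "[device].counter value" line
    layout is taken from the tool.

```python
import re
from collections import defaultdict

# Made-up sample in the layout printed by `btrfs device stats <mountpoint>`.
SAMPLE = """\
[/dev/sda2].write_io_errs    0
[/dev/sda2].read_io_errs     0
[/dev/sda2].flush_io_errs    0
[/dev/sda2].corruption_errs  3
[/dev/sda2].generation_errs  0
[/dev/sdb2].write_io_errs    0
[/dev/sdb2].read_io_errs     0
[/dev/sdb2].flush_io_errs    0
[/dev/sdb2].corruption_errs  0
[/dev/sdb2].generation_errs  0
"""

LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Return {device: {counter: value}} from `btrfs device stats` text."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            stats[match["dev"]][match["counter"]] = int(match["value"])
    return dict(stats)


def devices_with_errors(stats):
    """List the devices for which at least one counter is non-zero."""
    return sorted(dev for dev, counters in stats.items() if any(counters.values()))


if __name__ == "__main__":
    print(devices_with_errors(parse_device_stats(SAMPLE)))
```

    On a real system, the text would come from running "btrfs device stats"
    on the mount point after a scrub has completed, instead of from SAMPLE.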

    Btrfs' checksumming works with its copy-on-write design. In practice,
    disabling CoW on some files, or even on an entire filesystem, also
    disables checksumming. Therefore, I am looking for a way to store news
    that would work well with btrfs' copy-on-write. :-)

    --
    Tanguy

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tanguy Ortolo@21:1/5 to All on Thu Feb 13 15:14:04 2025
    Actually, while digging into INN 2 storage formats, I am more and more
    considering switching to tradspool. The idea is that I prefer something
    simple, easy to explain and understand, to something more complex.

    Indeed, timecaf, as documented in <https://github.com/InterNetNews/inn/blob/main/storage/timecaf/README.CAF>, really looks like some kind of filesystem. And I am a bit disturbed by
    the idea of stacking such a filesystem on top of my actual filesystem,
    because storing files, even if they are small and there are many of
    them, seems like a good job for a regular filesystem.

    (By comparison, CNFS is rightly described as a specialized filesystem.)

    timehash relies more on the actual filesystem, with articles as
    individual files, sorted in directories depending on their reception
    date and time. Compared to timecaf and CNFS, it is supposed to be slower because manipulating small files is slower than updating larger ones.
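
    To make the "sorted in directories depending on their reception date
    and time" part concrete, here is a rough sketch. It assumes the
    time-nn/bb/cc/yyyy-aadd layout that the storage.conf manual page
    describes for timehash, where the arrival time is read as the hex
    value 0xaabbccdd, nn is the storage class and yyyy a sequence number.
    This is worth checking against your INN version; the spool path, class
    and timestamp below are hypothetical.

```python
def timehash_path(patharts, storage_class, arrival, seqnum):
    """Build an article path from its arrival time (seconds since the epoch).

    Assumed layout: <patharts>/time-nn/bb/cc/yyyy-aadd, with the arrival
    time split into the four bytes aa, bb, cc and dd.
    """
    aa = (arrival >> 24) & 0xFF
    bb = (arrival >> 16) & 0xFF
    cc = (arrival >> 8) & 0xFF
    dd = arrival & 0xFF
    return (
        f"{patharts}/time-{storage_class:02x}/{bb:02x}/{cc:02x}/"
        f"{seqnum:04x}-{aa:02x}{dd:02x}"
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(timehash_path("/var/spool/news/articles", 1, 0x67AB5C2A, 0x1F))
```

    Under that assumption, each leaf directory only receives the articles
    that arrived within the same 256-second window, so directories stay
    small regardless of how busy a single newsgroup is.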

    As for tradspool, with articles as individual files in directories that
    replicate the newsgroup hierarchy, it is supposed to be very slow with
    large groups, since it means manipulating files in directories with many files.
    And the expiration process is supposed to be slow as well, though I am
    not sure why it would be so.
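
    For illustration, the tradspool naming scheme can be sketched in a few
    lines: each period in the group name becomes a directory level, and
    the file is named after the article number in that group. The spool
    root used below is a hypothetical value.

```python
from pathlib import PurePosixPath


def tradspool_path(patharts, newsgroup, article_number):
    """Map a newsgroup and article number to a tradspool-style file path."""
    return PurePosixPath(patharts, *newsgroup.split("."), str(article_number))


if __name__ == "__main__":
    # Hypothetical spool root, for illustration only.
    print(tradspool_path("/var/spool/news/articles", "comp.lang.python", 1234))
```

    This is also why large groups are the worry: every article of a group
    lands in the same directory.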

    What I am now wondering is how true the assumptions of slowness are
    with a modern filesystem such as btrfs. I just made a test, creating a
    million small files (between 500 and 3000 bytes each) with random
    content. Listing is slow, but existence checking, file creation and
    deletion are not.
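
    A test along these lines can be sketched as follows. This is not the
    script used for the test above, only a minimal reconstruction of the
    idea: the function name and the default count are arbitrary, nothing
    is synced to disk, and the page cache will flatter the numbers.

```python
import os
import random
import tempfile
import time


def bench_small_files(directory, count=1000, min_size=500, max_size=3000):
    """Time create, exists, list and delete for `count` small random files.

    Returns (timings, found, listed), where timings maps each step to
    seconds elapsed. `directory` is expected to be empty.
    """
    timings = {}
    names = [os.path.join(directory, f"{i:07d}") for i in range(count)]

    start = time.perf_counter()
    for name in names:
        with open(name, "wb") as handle:
            handle.write(os.urandom(random.randint(min_size, max_size)))
    timings["create"] = time.perf_counter() - start

    start = time.perf_counter()
    found = sum(os.path.exists(name) for name in names)
    timings["exists"] = time.perf_counter() - start

    start = time.perf_counter()
    listed = len(os.listdir(directory))
    timings["list"] = time.perf_counter() - start

    start = time.perf_counter()
    for name in names:
        os.remove(name)
    timings["delete"] = time.perf_counter() - start

    return timings, found, listed


if __name__ == "__main__":
    # Pass dir=... to TemporaryDirectory to run on the filesystem under test.
    with tempfile.TemporaryDirectory() as scratch:
        timings, found, listed = bench_small_files(scratch, count=1000)
        for step, seconds in timings.items():
            print(f"{step:>6}: {seconds:.3f} s")
```

    Raising count to 1000000 approximates the scale of the test described
    above.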

  • From Julien ÉLIE@21:1/5 to All on Fri Feb 14 20:55:49 2025
    Hi Tanguy,

    Therefore, I am considering switching to a news storage format more
    suitable with a CoW filesystem. I think the best option would be
    timehash, any thoughts on that?

    If you are looking for a news storage format writing one article per
    file, either timehash or tradspool could be used.


    As I understand it, switching storage format for /new/ articles can
    simply be done by simply adding the new storage backend in first
    position in storage.conf, can it not?

    Exactly. The first matching class found in the storage.conf file is
    used. You'll have to restart innd to take the modified file into account.
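
    The first-match rule can be modelled in a few lines. This is only a
    sketch: the entries and class numbers are hypothetical, the keys simply
    mirror the "newsgroups" and "class" keys of storage.conf, and Python's
    fnmatch stands in for INN's wildmat matching, which differs in its
    details. Other selection keys such as article size are left out.

```python
from fnmatch import fnmatchcase

# Hypothetical entries, in the order they would appear in storage.conf.
# The new method comes first; the old one stays in place below it.
ENTRIES = [
    {"method": "tradspool", "newsgroups": "*", "class": 1},
    {"method": "timecaf", "newsgroups": "*", "class": 0},
]


def select_entry(entries, newsgroup):
    """Return the first entry whose pattern matches the newsgroup, else None."""
    for entry in entries:
        if fnmatchcase(newsgroup, entry["newsgroups"]):
            return entry
    return None


if __name__ == "__main__":
    print(select_entry(ENTRIES, "comp.lang.python")["method"])
```

    With both entries matching every group, only their order decides which
    method receives new articles.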


    I do not plan on migrating existing articles, but simply to wait for
    them to expire since there does not seem to exist any simple migration procedure. But if someone knows a way to do so, I could be interested.

    There is a program named "respool" in the contrib directory <https://github.com/InterNetNews/inn/blob/main/contrib/respool.c> but I
    have never used it so I do not know whether it works fine. Use at your
    own risk! :-)

    --
    Julien ÉLIE

    « – Dis Astérix ! Quelle salade pour un peu d'huile !
    – Oui, et dépêchons-nous de trouver un guérisseur avant que ça ne
    tourne au vinaigre. » (Astérix)

  • From Tanguy Ortolo@21:1/5 to All on Mon Feb 17 15:14:56 2025
    Julien ÉLIE, 2025-02-14 20:55+0100:
    If you are looking for a news storage format writing one article per
    file, either timehash or tradspool could be used.

    I got that already. I was considering timehash for performance reasons,
    but your insight eventually convinced me to use tradspool. The simpler,
    the better. My time is more valuable than my CPU's time. ;-)

    Plus it will allow me to easily do some basic stats about article size
    and filesystem usage.
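
    Such stats could be gathered with something like the sketch below,
    which walks a one-article-per-file spool and sums file sizes per
    group. It assumes article files have purely numeric names and that
    crossposted articles show up as links to the same file, so each inode
    is counted once, under whichever group is visited first.

```python
import os
from collections import Counter


def spool_stats(root):
    """Return (article_count, total_bytes, bytes_per_group) for a spool tree."""
    seen = set()
    count = 0
    total = 0
    per_group = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        # Directory path below the root, with separators turned back into dots.
        group = os.path.relpath(dirpath, root).replace(os.sep, ".")
        for name in filenames:
            if not name.isdigit():
                continue  # not an article file
            try:
                info = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # dangling link, or file removed in the meantime
            key = (info.st_dev, info.st_ino)
            if key in seen:
                continue  # crosspost already counted
            seen.add(key)
            count += 1
            total += info.st_size
            per_group[group] += info.st_size
    return count, total, per_group


if __name__ == "__main__":
    import sys

    count, total, per_group = spool_stats(sys.argv[1])
    print(f"{count} articles, {total} bytes")
    for group, size in per_group.most_common(10):
        print(f"{size:>12}  {group}")
```

    The spool root is passed as the first command-line argument.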

    As I understand it, switching storage format for /new/ articles can
    simply be done by simply adding the new storage backend in first
    position in storage.conf, can it not?

    Exactly. The first matching class found in the storage.conf file is
    used. You'll have to restart innd to take the modified file into account.

    Thanks for confirming. Of course I restarted innd, I never imagined such
    a change could be applied without restarting.

    There is a program named "respool" in the contrib directory <https://github.com/InterNetNews/inn/blob/main/contrib/respool.c> but I
    have never used it so I do not know whether it works fine. Use at your
    own risk! :-)

    Or not use it at all. Keeping existing articles in timecaf is good enough,
    that worked for years after all. :-)

    --
    Tanguy

  • From Tanguy Ortolo@21:1/5 to All on Mon Feb 17 15:15:09 2025
    Thanks, Julien!

    Julien ÉLIE, 2025-02-14 20:55+0100:
    I don't believe the disadvantages mentioned in the storage.conf manual
    page for tradspool still apply today. They used to on older hardware
    and with a higher traffic than today.
    The expiration process is not that slow, especially when using the
    delayrm flag with news.daily.

    I thought so, thanks for confirming.

    I would just be inclined to change:

    "It takes a very fast file system and I/O system to keep up with current Usenet traffic volumes due to file system overhead. It requires a
    nightly expire program to delete old articles out of the news spool, a process that can slow down the server for several hours or more."

    to:

    "It needs a faster file system and I/O system than the cnfs and timecaf storage methods due to file system overhead. It also consumes more
    inodes and requires running a nightly expire program to delete old
    articles out of the news spool."

    By the way, timecaf also requires a nightly expire program, does it not?

    --
    Tanguy

  • From Julien ÉLIE@21:1/5 to All on Thu Feb 20 12:16:23 2025
    Hi Tanguy,

    [tradspool]
    "It needs a faster file system and I/O system than the cnfs and timecaf
    storage methods due to file system overhead. It also consumes more
    inodes and requires running a nightly expire program to delete old
    articles out of the news spool."

    By the way, timecaf also requires a nightly expire program, does it not?

    Yes, the expire program is useful for storage backends that are not
    self-expiring (unlike CNFS, which is). I'll homogenize the wording for
    timehash and timecaf to mention that. The nightly expire program deletes
    old articles by either compacting CAF files if they still contain
    available articles, or removing them.

    --
    Julien ÉLIE

    « – Par Poséidon ! Quel prodige !!!
    – Par Neptune ! Quel sans-gêne ! » (Astérix)
