• Compacting CNFS buffers

    From Nigel Reed@21:1/5 to All on Sat May 4 21:07:46 2024
    Has anyone investigated the feasibility of compacting or compressing
    the cnfs buffer files?

    Here are a couple of scenarios to consider, keeping in mind that,
    generally, articles are not expired.

    1. You are sent a bunch of articles but discover you've left some
    binary newsgroups in your active file. You put these groups in your
    expire list and remove them with rmgroup, but you're left with a lot of
    empty space, never to be used again unless the buffer recycles.

    2. You receive a bunch of Google Groups spam articles that are deleted
    via NoCeM; because there are so many of them, that leaves a lot of
    unused space.


    If you can find where an expired article is on disk and then find the
    next article, you can just move it on disk and update the pointers to
    the file. This could be a process that you just kick off or,
    preferably, something that runs when innd isn't fully occupied using
    spare cycles or something.
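
    A minimal sketch of that idea, as a toy model only: the buffer is a
    plain bytearray and the "pointers" are a hypothetical in-memory index
    (Entry and compact are illustrative names, not INN code or the real
    CNFS on-disk format). Live articles are slid down over the holes left
    by dead ones and their offsets are updated.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    msgid: str   # hypothetical article identifier
    offset: int  # byte offset of the article in the buffer
    length: int  # article size in bytes
    live: bool   # False once the article is expired or cancelled


def compact(buffer: bytearray, index: list) -> int:
    """Slide live articles toward the start of the buffer.

    Updates each surviving entry's offset, drops dead entries from the
    index, and returns the number of bytes reclaimed.
    """
    used_before = max((e.offset + e.length for e in index), default=0)
    write_pos = 0
    survivors = []
    for entry in sorted(index, key=lambda e: e.offset):
        if not entry.live:
            continue  # skip the dead article; later live ones slide over it
        if entry.offset != write_pos:
            # Copy first so overlapping source and target ranges are safe.
            data = bytes(buffer[entry.offset:entry.offset + entry.length])
            buffer[write_pos:write_pos + entry.length] = data
            entry.offset = write_pos  # the "pointer" update
        survivors.append(entry)
        write_pos += entry.length
    index[:] = survivors
    return used_before - write_pos
```

    In a real server every moved article would also need its reference
    updated wherever the old position is recorded, which is the hard part.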


    I know disk space is cheap these days but some people may be limited.
    It would be good not to waste space.

    --
    End Of The Line BBS - Plano, TX
    telnet endofthelinebbs.com 23

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Tue May 7 12:14:26 2024
    Hi Nigel,

    > Has anyone investigated the feasibility of compacting or compressing
    > the cnfs buffer files?

    Some people use ZFS to compress CNFS buffers (cancelled articles are
    still present though). I am not aware of a compaction feature like the
    one you want.


    > If you can find where an expired article is on disk and then find
    > the next article, you can just move it on disk and update the
    > pointers to the file. This could be a process that you just kick off
    > or, preferably, something that runs when innd isn't fully occupied
    > using spare cycles or something.

    I understand your point; I can add it to the wish list.

    FWIW, though technically this is not what you are asking for, some
    mechanisms may be used to mitigate your problems:


    > 1. You are sent a bunch of articles but discover you've left some
    > binary newsgroups in your active file. You put these groups in your
    > expire list and remove them with rmgroup, but you're left with a lot of
    > empty space, never to be used again unless the buffer recycles.

    You may want to configure Cleanfeed to reject binaries (including in
    binary groups) so as not to store them and waste space. For the past
    few weeks, NoCeM notices have also been sent for misplaced binaries (in
    non-binary groups).


    > 2. You receive a bunch of Google Groups spam articles that are deleted
    > via NoCeM; because there are so many of them, that leaves a lot of
    > unused space.

    Christoph Biedl implemented a new feature for INN 2.7.2 to store
    articles by their Path header field. It is a new "path" option in
    storage.conf. A typical use case is to store articles from a spammy
    site in a small CNFS buffer to avoid overall retention impacts.
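
    To illustrate the idea only (this is not INN's matching logic, and the
    exact syntax of the real option is described in the storage.conf
    documentation), here is a small sketch that picks a storage class
    depending on whether a known spammy site appears in the Path header
    field. The function names, class numbers and site names are all made
    up for the example.

```python
def path_hosts(path_header: str) -> list:
    """Split a Path header field body into its '!'-separated elements."""
    return [host for host in path_header.split("!") if host]


def pick_class(path_header: str, spammy_sites: set,
               spam_class: int = 2, default_class: int = 1) -> int:
    """Return the (hypothetical) storage class for an article.

    Articles that passed through any site listed in spammy_sites go to
    spam_class, which would be mapped to a small buffer; everything
    else goes to default_class.
    """
    hosts = set(path_hosts(path_header))
    return spam_class if hosts & spammy_sites else default_class
```

    Articles routed to the small class recycle quickly among themselves,
    so a flood from one site does not push out articles in the main
    buffers.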

    There's also the delayer program (in the contrib directory before INN
    2.7.2) that you can use to delay articles, and give cancel control
    articles and NoCeM messages time to arrive. For instance, you can have
    a frontend instance of innd receiving the articles from all your peers,
    and another local instance of innd fed by your frontend with a delay,
    except for cancels and NoCeM articles. The CNFS buffers of that second
    instance will be spam free.
    https://www.eyrie.org/~eagle/software/inn/docs/delayer.html
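
    The net effect of such a setup can be sketched as a hold queue: an
    article waits for a fixed delay, and if a cancel or NoCeM notice for it
    shows up in the meantime, it is never stored. This is only a model of
    the outcome, not how delayer or innd work internally; DelayQueue and
    its methods are hypothetical names.

```python
import heapq


class DelayQueue:
    """Hold articles for `delay` seconds; drop those cancelled meanwhile."""

    def __init__(self, delay: int):
        self.delay = delay
        self._heap = []          # (release_time, msgid), earliest first
        self._cancelled = set()  # message-IDs hit by a cancel or NoCeM

    def offer(self, msgid: str, now: int) -> None:
        """Queue an article that arrived at time `now`."""
        heapq.heappush(self._heap, (now + self.delay, msgid))

    def cancel(self, msgid: str) -> None:
        """Record a cancel; it takes effect immediately, with no delay."""
        self._cancelled.add(msgid)

    def release(self, now: int) -> list:
        """Return the articles whose delay has elapsed and that survived."""
        out = []
        while self._heap and self._heap[0][0] <= now:
            _, msgid = heapq.heappop(self._heap)
            if msgid in self._cancelled:
                self._cancelled.discard(msgid)
                continue  # cancelled while waiting: never stored
            out.append(msgid)
        return out
```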

    --
    Julien ÉLIE

    « Aequum est ut cuius participauit lucrum, participet et damnun. »

  • From Nigel Reed@21:1/5 to iulius@nom-de-mon-site.com.invalid on Wed May 8 00:43:09 2024
    On Tue, 7 May 2024 12:14:26 +0200
    Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:

    > Hi Nigel,

    >> Has anyone investigated the feasibility of compacting or compressing
    >> the cnfs buffer files?

    > Some people use ZFS to compress CNFS buffers (cancelled articles are
    > still present though). I am not aware of a compaction feature like
    > the one you want.

    I am using ZFS with CNFS and it does a good job. I also want to use the
    server for other purposes so reclaiming any space would be extremely
    useful.

    >> If you can find where an expired article is on disk and then find
    >> the next article, you can just move it on disk and update the
    >> pointers to the file. This could be a process that you just kick off
    >> or, preferably, something that runs when innd isn't fully occupied
    >> using spare cycles or something.

    > I understand your point; I can add it to the wish list.

    That would be good.

    >> 1. You are sent a bunch of articles but discover you've left some
    >> binary newsgroups in your active file. You put these groups in your
    >> expire list and remove them with rmgroup, but you're left with a lot
    >> of empty space, never to be used again unless the buffer recycles.

    > You may want to configure Cleanfeed to reject binaries (including in
    > binary groups) so as not to store them and waste space. For the past
    > few weeks, NoCeM notices have also been sent for misplaced binaries
    > (in non-binary groups).

    Unfortunately the articles are already in the CNFS buffers. My bad for
    forgetting to remove some binary groups from the active file. I did not
    have Cleanfeed running when importing since it's advised to turn off
    Perl and Python filtering.

    >> 2. You receive a bunch of Google Groups spam articles that are deleted
    >> via NoCeM; because there are so many of them, that leaves a lot of
    >> unused space.

    > Christoph Biedl implemented a new feature for INN 2.7.2 to store
    > articles by their Path header field. It is a new "path" option in
    > storage.conf. A typical use case is to store articles from a spammy
    > site in a small CNFS buffer to avoid overall retention impacts.

    I'll look into it, but again, the damage is already done.


    > There's also the delayer program (in the contrib directory before INN
    > 2.7.2) that you can use to delay articles, and give cancel control
    > articles and NoCeM messages time to arrive. For instance, you can have
    > a frontend instance of innd receiving the articles from all your
    > peers, and another local instance of innd fed by your frontend with a
    > delay, except for cancels and NoCeM articles. The CNFS buffers of
    > that second instance will be spam free.
    > https://www.eyrie.org/~eagle/software/inn/docs/delayer.html


    Sounds interesting but, again, I already have a lot of binary articles.
    I'm not sure I want to set up a second server. I have a hard enough
    time with one :)

    I'll hold out hope someone with more knowledge than I also sees the
    issue and decides to look into compacting CNFS buffers.

    Thanks,
    Nigel


    --
    End Of The Line BBS - Plano, TX
    telnet endofthelinebbs.com 23

  • From Julien ÉLIE@21:1/5 to All on Mon May 13 22:56:50 2024
    Hi Nigel,

    > I'll hold out hope someone with more knowledge than I also sees the
    > issue and decides to look into compacting CNFS buffers.

    It may as well be a new type of storage method, mixing the best of cnfs
    and timecaf.
    As far as I understand, the use case is to have large compacted buffers
    without wrapping (articles do not expire but cancelled articles should
    not be kept). It would correspond to timecaf except that a new CAF file
    is created when it is full instead of every 256 seconds. Expiring CAF
    files just compacts them if articles have been cancelled, releasing disk
    space.
    The feature may be implemented as an evolution of the current timecaf
    method with options to parameterize it in storage.conf (like cnfs has
    options). For instance with a maxart and a maxtime option to specify
    the number of articles per CAF file (currently hard-coded to 262144) and
    the number of seconds before creating a new CAF file (currently
    hard-coded to 256 seconds but it may easily be a multiple of 256 seconds
    so as to keep the current file naming). With maxtime set to 0, a new
    file is created when maxart is reached.
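
    The rollover rule described above could look roughly like this. It is
    only a sketch of the proposal: maxart and maxtime are the suggested
    (not yet existing) options, CafRoller is a made-up name, and files are
    simply numbered instead of following timecaf's real naming.

```python
from dataclasses import dataclass


@dataclass
class CafRoller:
    maxart: int = 262144  # articles per CAF file
    maxtime: int = 256    # seconds per CAF file; 0 = roll over on count only
    file_no: int = 0      # number of the current file (0 = none yet)
    count: int = 0        # articles stored in the current file
    bucket: int = -1      # current time slot when maxtime > 0

    def store(self, now: int):
        """Account for one article arriving at time `now`.

        Returns the number of the file it lands in, or None if the
        article is dropped because the file for this time slot is full.
        """
        if self.maxtime > 0:
            bucket = now // self.maxtime
            if bucket != self.bucket:
                # A new time slot starts a new file.
                self.bucket, self.count = bucket, 0
                self.file_no += 1
            elif self.count >= self.maxart:
                return None  # full, and the time slot is not over yet
        elif self.file_no == 0 or self.count >= self.maxart:
            # maxtime == 0: a new file as soon as the current one is full.
            self.count = 0
            self.file_no += 1
        self.count += 1
        return self.file_no
```

    With maxart=2 and maxtime=256, a third article arriving in the same
    slot is dropped; with maxtime=0 it simply opens the next file.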

    Naturally, though it is more work, a totally new storage method could
    also be created as timecaf is inherently linked to time and suffers from
    the limitation that you cannot store more than maxart articles received
    during maxtime seconds. They will just be dropped until a new CAF file
    is created. It is not what you expect from the storage method you're
    asking for. And re-using CNFS buffers may be tricky (to find and refill
    holes, or to totally rewrite them, changing the storage tokens of all
    articles).

    --
    Julien ÉLIE

    « Vinum bonum laetificat cor hominis. »
