• Improving trn: LISTGROUP with no arguments vs. ...?

    From Richard@21:1/5 to All on Mon Nov 4 23:16:17 2024
    [Please do not mail me a copy of your followup]

    Hi All,

    Currently trn4 does a LISTGROUP command with no arguments when
    entering a newsgroup. When entering a newsgroup with many messages
    (e.g. a large difference between low and high article numbers), this
    can be very slow. (See <https://github.com/LegalizeAdulthood/trn/issues/9>)

    LISTGROUP can take low and high article number arguments, and my
    initial thought was that trn should issue LISTGROUP with arguments
    based on the user's newsrc file to build a list of article numbers
    representing unread articles.
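
A minimal sketch of that idea, in Python rather than trn's C++, and not taken from trn itself. It assumes the usual newsrc line shape (`group: 1-1000,1005`) and the RFC 3977 range form `LISTGROUP group low-`; the function names are hypothetical, and an older server may not accept the range argument.

```python
import re

# "group: ranges" (subscribed) or "group! ranges" (unsubscribed)
_NEWSRC_RE = re.compile(r"^(?P<group>[^:!\s]+)(?P<mark>[:!])\s*(?P<ranges>.*)$")


def parse_newsrc_line(line):
    """Return (group, subscribed, sorted list of (lo, hi) read ranges)."""
    m = _NEWSRC_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a newsrc line: {line!r}")
    ranges = []
    for part in m.group("ranges").split(","):
        part = part.strip()
        if not part:
            continue
        lo, _, hi = part.partition("-")
        ranges.append((int(lo), int(hi or lo)))  # "1005" means 1005-1005
    return m.group("group"), m.group("mark") == ":", sorted(ranges)


def first_unread(read_ranges, server_low):
    """Lowest article number >= server_low not covered by the read ranges."""
    candidate = server_low
    for lo, hi in read_ranges:
        if lo > candidate:
            break  # gap found: candidate is unread
        candidate = max(candidate, hi + 1)
    return candidate


def listgroup_command(newsrc_line, server_low, server_high):
    """Build a ranged LISTGROUP command, or None if nothing is unread."""
    group, _, read_ranges = parse_newsrc_line(newsrc_line)
    start = first_unread(read_ranges, server_low)
    if start > server_high:
        return None
    return f"LISTGROUP {group} {start}-"
```

For example, `listgroup_command("comp.arch: 1-1000,1005", 1, 2000)` asks only for article 1001 onwards instead of the whole group.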

    Then I wondered about cross-reference headers, which mention articles
    by number, and I wondered if the reason trn was fetching all article
    numbers was in order to prefetch headers for every article in case it
    was mentioned in a cross-reference.

    Using strace, I see that trn will start fetching overview data for
    (seemingly all the listed) articles while paused for user input.

    trn is quite old code and doesn't even use the older NNTP extensions,
    never mind the newer extensions one could expect from a more modern
    server like inn.

    For reference, the existing code that takes a long time is here:
    <https://github.com/LegalizeAdulthood/trn/blob/a70daddff78d08e0d2c97739009376ad29094905/libtrn/bits.cpp#L252>
    It's not the algorithm that is slow per se, but asking inn for all the
    article numbers in a group with thousands of articles takes quite a
    long time. I suspect that inn doesn't have any sort of in-memory
    cache for this information and instead is directory scanning the
    spool directory and reporting back the article numbers one by one.

    What's the best alternative to LISTGROUP with no args?

    Thanks
    --
    "The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
    The Terminals Wiki <http://terminals-wiki.org>
    The Computer Graphics Museum <http://computergraphicsmuseum.org>
    Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Urs Janßen@21:1/5 to Richard on Tue Nov 5 00:53:27 2024
    Richard wrote:
    > What's the best alternative to LISTGROUP with no args?

    you could implement something like tin's -G option:

    | -G article-limit
    | Limit the number of articles/group to retrieve from the
    | server. If article-limit is > 0 not more than the last
    | article-limit articles/group are fetched from the server. If
    | article-limit is < 0 tin will start fetching articles from
    | your first unread minus absolute value of article-limit.
    | Default is 0, which means no limit.

    see art.c:setup_hard_base() ~240

    (or just the part with the negative limit).

    or just rely on the overview data and article numbers (if present).
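
The -G rule quoted above could be sketched as follows. This is only a reading of the man page text, not tin's actual code in art.c; the function name and parameters are hypothetical, and it counts by article number, so gaps left by expired or cancelled articles mean fewer than article-limit articles may actually come back.

```python
def fetch_start(article_limit, low, high, first_unread):
    """First article number to request, following the quoted -G description.

    low/high are the group's water marks as reported by the server;
    first_unread would come from the user's newsrc.
    """
    if article_limit > 0:
        # Not more than the last article_limit article numbers.
        start = high - article_limit + 1
    elif article_limit < 0:
        # Start abs(article_limit) numbers before the first unread article.
        start = first_unread - abs(article_limit)
    else:
        # 0 means no limit: fetch from the low water mark.
        start = low
    # Never ask for anything below the server's low water mark.
    return max(low, start)
```

The result can be used as the lower bound of a ranged request, e.g. `f"LISTGROUP {group} {fetch_start(-50, low, high, first_unread)}-"`.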

    both suggestions do have drawbacks (which partially undermine the actual
    purpose of listgroup - get the numbers of all available articles).

    with limits you do not see all articles in the group (which e.g. may
    affect threading) and relying on overview data is not fail-safe as that
    data may not be up to date (regarding expired or cancelled articles).

  • From Julien ÉLIE@21:1/5 to All on Tue Nov 5 21:10:55 2024
    Hi Richard,

    > It's not the algorithm that is slow per se, but asking inn for all the
    > article numbers in a group with thousands of articles takes quite a
    > long time.

    How much time does it take?

    Just tried on my own INN server:

    LIST COUNTS fr.soc.politique
    215 Newsgroups in form "group high low count status"
    fr.soc.politique 2904875 2698 1365681 y
    .

    LISTGROUP returns the 1,365,681 articles in 2.6 seconds. Not that slow...
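
To put the figures above in perspective, here is a small sketch that parses the LIST COUNTS data line shown in the session ("group high low count status") and derives how sparse the number range is and what the reported timing works out to per second. The names are hypothetical; the numbers are the ones from this post.

```python
from typing import NamedTuple


class GroupCounts(NamedTuple):
    group: str
    high: int
    low: int
    count: int
    status: str


def parse_list_counts(line):
    """Parse one LIST COUNTS data line: group high low count status."""
    group, high, low, count, status = line.split()
    return GroupCounts(group, int(high), int(low), int(count), status)


info = parse_list_counts("fr.soc.politique 2904875 2698 1365681 y")

# Width of the article-number range between the water marks.
span = info.high - info.low + 1          # 2902178

# Fraction of that range that still holds an article (a bit under half).
density = info.count / span

# Article numbers per second for the 2.6 s LISTGROUP reported above.
rate = info.count / 2.6
```

The density matters for a client like trn: the difference between high and low overstates how many lines an unranged LISTGROUP will actually return.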


    > I suspect that inn doesn't have any sort of in-memory
    > cache for this information and instead is directory scanning the
    > spool directory and reporting back the article numbers one by one.

    INN has the available articles and its overview database, and depending
    on the configuration of the server (nnrpdcheckart and groupexactcount
    parameters), either directly returns the information from overview or
    checks that the article is still there.
    The 2.6 seconds above include the check.

    --
    Julien ÉLIE

    « Ex nihilo nihil. » (Perse)

  • From Richard@21:1/5 to All on Wed Nov 6 15:47:37 2024

    Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> spake the secret code
    <vgdu4f$16ru$1@news.trigofacile.com> thusly:

    > Hi Richard,
    >
    >> It's not the algorithm that is slow per se, but asking inn for all the
    >> article numbers in a group with thousands of articles takes quite a
    >> long time.
    >
    > How much time does it take?

    On my server (admittedly, nntp isn't as important to my ISP as it was
    in the 90s, so they probably have it running on an old machine for the
    last few customers that actually read news -- like me), it takes
    multiple minutes to get all the articles for, say, comp.arch.

    When I watch it with strace it's about one article per second and the
    read buffer is so empty that it's literally reading just one response
    line per call to read.

    > LISTGROUP returns the 1,365,681 articles in 2.6 seconds. Not that slow...

    Yes, it all varies, depending on your server, etc.

    My long-term goal for trn is to move more towards asynchronous I/O
    instead of synchronous I/O. So I'm thinking of issuing LISTGROUP with
    low and high taken from your newsrc, to fetch the likely never seen
    before article numbers on the blocking path, and then asynchronously
    fetching whatever else is needed in the background once you've entered
    the group. There's nothing wrong with doing a LISTGROUP for all the
    remaining article numbers asynchronously in the background.

    The current algorithm has you blocked waiting for every article number
    in the group before it shows you anything about new articles.
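
A rough sketch of that two-phase plan, in Python for brevity rather than trn's C++. Everything here is hypothetical: `list_range(lo, hi)` stands in for whatever issues a ranged LISTGROUP and returns the article numbers, and `show` stands in for the UI. Since NNTP answers commands in order on one connection, the background request would presumably need its own connection or some serialization with the foreground one; that part is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor


def enter_group(list_range, low, high, first_unread, show):
    """Fetch unread numbers on the blocking path, the rest in the background.

    Returns (unread_numbers, future_for_older_numbers).
    """
    # Blocking path: only the part of the group the user has not seen yet.
    unread = list_range(first_unread, high)
    show(unread)

    # Background: everything older, fetched while the user is already
    # reading. list_range is expected to return [] for an empty range.
    executor = ThreadPoolExecutor(max_workers=1)
    older = executor.submit(list_range, low, first_unread - 1)
    executor.shutdown(wait=False)  # already-submitted work still runs
    return unread, older


if __name__ == "__main__":
    existing = [3, 5, 8, 13, 21, 34]  # pretend server contents

    def fake_list_range(lo, hi):
        return [n for n in existing if lo <= n <= hi]

    unread, older = enter_group(fake_list_range, 1, 40, 10, print)
    print(older.result())  # older article numbers, available later
```

The user sees the unread article numbers as soon as the first call returns; the older numbers arrive whenever the future completes.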

    >> I suspect that inn doesn't have any sort of in-memory
    >> cache for this information and instead is directory scanning the spool
    >> directory and reporting back the article numbers one by one.
    >
    > INN has the available articles and its overview database, and depending
    > on the configuration of the server (nnrpdcheckart and groupexactcount
    > parameters), either directly returns the information from overview or
    > checks that the article is still there.
    > The 2.6 seconds above include the check.

    Well, I'm not certain why it's so slow on my ISP. Watching it in
    strace reveals that reading all these article numbers back for a
    newsgroup with many articles is indeed the thing that takes a long
    time.
  • From Julien ÉLIE@21:1/5 to All on Wed Nov 6 20:18:20 2024
    Hi Richard,

    > On my server (admittedly, nntp isn't as important to my ISP as it was
    > in the 90s, so they probably have it running on an old machine for the
    > last few customers that actually read news -- like me), it takes
    > multiple minutes to get all the articles for, say, comp.arch.

    Could you try to send a LISTGROUP command to fr.soc.politique with trn
    connected to my news server? (news.trigofacile.com, available for reading)
    It would be interesting to know the result. There may also be a
    bandwidth issue?


    > When I watch it with strace it's about one article per second and the
    > read buffer is so empty that it's literally reading just one response
    > line per call to read.

    Very slow for just an article number (not the contents of a whole article).


    > My long-term goal for trn is to move more towards asynchronous I/O
    > instead of synchronous I/O. So I'm thinking LISTGROUP with low and
    > high used from your newsrc in order to fetch the likely never seen
    > before article numbers on the blocking path and then asynchronously
    > fetch whatever is needed in the background once you've entered the
    > group. There's nothing wrong with doing a LISTGROUP for all the
    > remaining article numbers asynchronously in the background.

    That looks like a good move, notably if you send an OVER command
    after the LISTGROUP and it takes the same amount of time to run...

    --
    Julien ÉLIE

    « I had some words with my wife, and she had some paragraphs with me. »
    (Sigmund Freud)

  • From Richard@21:1/5 to All on Mon Nov 18 19:44:38 2024

    Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> spake the secret code
    <vggfds$3bf0$1@news.trigofacile.com> thusly:

    > Could you try to send a LISTGROUP command to fr.soc.politique with trn
    > connected to my news server? (news.trigofacile.com, available in reading)
    > It would be interesting to know the result.

    Unfortunately trn (what's installed on my ISP, not my local build) core
    dumps:

    Entering fr.soc.politique:
    Getting overview file. [**************************** buffer overflow detected ***: terminated
    Aborted (core dumped)

    > There may also be a bandwidth issue?

    In my case the long delay in returning the result from LISTGROUP
    is not a bandwidth issue; both the client and the server are running
    within the same local network.

    >> When I watch it with strace it's about one article per second and the
    >> read buffer is so empty that it's literally reading just one response
    >> line per call to read.
    >
    > Very slow for just an article number (not the contents of a whole article).

    Yep.

    -- Richard
