• Generic 403 NNTP response code for TAKETHIS

    From Julien ÉLIE@21:1/5 to All on Mon Dec 12 23:40:02 2022
    Hi all,

    Unlike IHAVE, for which the 436 response code can be sent to defer
    an article (retry later), TAKETHIS has no such response code.

    From RFC 4644:

    Responses
    239 message-id Article transferred OK
    439 message-id Transfer rejected; do not retry

    The server MUST return either a 239 response,
    indicating that the article was successfully transferred, or a 439
    response, indicating that the article was rejected. If the server
    encounters a temporary error that prevents it from processing the
    article but does not want to reject the article, it MUST reply with a
    400 response to the client and close the connection.
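    The RFC 4644 rule quoted above leaves a server only three possible outcomes after TAKETHIS. As a minimal Python sketch (not INN's code), assuming a hypothetical `Outcome` classification produced by the server's article-handling path, the reply selection could look like this:

```python
from enum import Enum, auto
from typing import Tuple


class Outcome(Enum):
    """Hypothetical result of processing an article received via TAKETHIS."""
    STORED = auto()     # article accepted and stored
    REJECTED = auto()   # permanent refusal: policy, duplicate, spam, ...
    TEMPORARY = auto()  # transient failure the server does not want to turn into a rejection


def takethis_reply(message_id: str, outcome: Outcome) -> Tuple[str, bool]:
    """Return (response line, close_connection) following the RFC 4644 rule."""
    if outcome is Outcome.STORED:
        return f"239 {message_id} Article transferred OK", False
    if outcome is Outcome.REJECTED:
        return f"439 {message_id} Transfer rejected; do not retry", False
    # TAKETHIS has no "retry later" code: answer 400 and close the connection.
    return "400 Temporary error, closing connection", True
```

    The second element tells the caller whether to close the connection; in this sketch it is only true for the 400 case, since TAKETHIS has no deferral code.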

    Yet, RFC 3977 defines a generic response code (403) for temporary
    internal problems, as we can see in the example:

    Example of a temporary failure:

    [C] GROUP archive.local
    [S] 403 Archive server temporarily offline


    Is the generic code 403 really not allowed for temporary errors in
    TAKETHIS? (That seems to be the case, given the explicit requirement
    to use the 400 generic code.)

    It seems like a vestige of RFC 2980, which defined the following responses
    for TAKETHIS:

    239 article transferred ok
    400 not accepting articles
    439 article transfer failed
    480 Transfer permission denied
    500 Command not understood

    As far as INN is concerned, it currently returns 403 after TAKETHIS in
    only two cases: when innd can't store the article (empty token) or can't
    write to the history file.

    Do you think a generic code could be allowed in these cases (which would
    then require an erratum to RFC 4644 adding the possibility to use the
    403 generic code introduced by RFC 3977), or should the connection
    really be terminated, as was the case with the obsolete RFC 2980?


    FWIW, one of the cases that can trigger the "can't store article" error
    in innd is when INT_MAX is reached. The 403 (or 400) code should really
    be 439 in this specific case because it is not a temporary failure. The
    same goes for IHAVE, which should answer 437 (reject) instead of 436
    (defer) in this case. Otherwise we would end up in an infinite loop where
    the article is stuck and keeps being offered even though the maximum
    article number has been reached and INN won't be able to store more
    articles in the newsgroup. (A log entry is also written to news.notice.)
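    The INT_MAX case shows that the reply should follow from whether the failure is permanent, not from which internal call failed. A Python sketch of that classification for both commands; the `StoreError` type, its `permanent` flag and the two example errors are hypothetical and do not reflect how innd actually reports errors:

```python
from dataclasses import dataclass


@dataclass
class StoreError:
    """Hypothetical description of why an article could not be stored."""
    reason: str
    permanent: bool  # True if re-sending the same article can never succeed


# Hypothetical examples: a full newsgroup is permanent, a history write failure is not.
MAX_ARTNUM_REACHED = StoreError("maximum article number reached", permanent=True)
HISTORY_WRITE_FAILED = StoreError("cannot write history file", permanent=False)


def ihave_failure_code(err: StoreError) -> int:
    # 437 = rejected, do not retry; 436 = transfer failed, try again later.
    return 437 if err.permanent else 436


def takethis_failure_code(err: StoreError) -> int:
    # 439 = rejected; TAKETHIS has no deferral, so a transient error means 400 + close.
    return 439 if err.permanent else 400
```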

    --
    Julien ÉLIE

    « One must despise money, especially small change. »

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to iulius@nom-de-mon-site.com.invalid on Mon Dec 12 15:53:41 2022
    Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> writes:

    > Unlike IHAVE, for which the 436 response code can be sent to defer an
    > article (retry later), TAKETHIS has no such response code.
    >
    > From RFC 4644:
    >
    > Responses
    > 239 message-id Article transferred OK
    > 439 message-id Transfer rejected; do not retry
    >
    > The server MUST return either a 239 response,
    > indicating that the article was successfully transferred, or a 439
    > response, indicating that the article was rejected. If the server
    > encounters a temporary error that prevents it from processing the
    > article but does not want to reject the article, it MUST reply with a
    > 400 response to the client and close the connection.
    >
    > Yet, RFC 3977 defines a generic response code (403) for temporary
    > internal problems, as we can see in the example:
    >
    > Example of a temporary failure:
    >
    > [C] GROUP archive.local
    > [S] 403 Archive server temporarily offline
    >
    > Is the generic code 403 really not allowed for temporary errors in
    > TAKETHIS? (That seems to be the case, given the explicit requirement
    > to use the 400 generic code.)

    Yeah, I think that's the intent. I don't remember why, though. I wonder
    if someone tested with 403 and found it didn't work or something.

    At the time, we probably couldn't imagine a situation where storing the
    article fails but storing the next article is likely to succeed, so 400
    and closing the connection felt pretty reasonable.

    > FWIW, one of the cases that can trigger the "can't store article" error
    > in innd is when INT_MAX is reached. The 403 (or 400) code should really
    > be 439 in this specific case because it is not a temporary failure.

    Yes, and indeed I think 403 or 400 would be wrong here, because the peer
    would be entitled to just keep resending the same article, and it's never
    going to succeed.

    > The same goes for IHAVE, which should answer 437 (reject) instead of
    > 436 (defer) in this case. Otherwise we would end up in an infinite loop
    > where the article is stuck and keeps being offered even though the
    > maximum article number has been reached and INN won't be able to store
    > more articles in the newsgroup. (A log entry is also written to
    > news.notice.)

    Yup, exactly.

    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

  • From Russ Allbery@21:1/5 to iulius@nom-de-mon-site.com.invalid on Tue Dec 13 15:08:40 2022
    Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> writes:

    > Incidentally, though TAKETHIS has no "retry later" response code in
    > either RFC 2980 or RFC 4644, I've just checked what Diablo does with
    > TAKETHIS (it uses 431, the "retry later" of CHECK). And before NNTP
    > version 2 was implemented in INN, INN was using 436 (the "retry later"
    > of IHAVE), which I (wrongly) changed to 403. I'll fix it.

    My recollection of the history of a lot of this is that the expectation is
    that you use CHECK first, and servers really shouldn't be saying yes to
    the CHECK and then deferring the article on TAKETHIS. Usually deferrals
    are used because another peer has already sent CHECK for the same article
    and therefore we'll *probably* get the article from them, but we might
    not, so try again later just in case. After a TAKETHIS, well, you *have*
    the article, so why defer it? You're not going to save any bandwidth; you
    have already paid the transit cost and you have the whole article, so just
    do whatever you're going to do, accept it or reject it, don't put it off.
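    The CHECK-then-TAKETHIS sequence described here can be sketched from the feeding peer's side, using the RFC 4644 codes (CHECK: 238 send it, 431 try again later, 438 not wanted; TAKETHIS: 239 accepted, 439 rejected, 400 plus connection close). The function names and action strings in this Python sketch are made up for illustration:

```python
def after_check(code: int) -> str:
    """What a feeding peer does with an article after the receiver's CHECK reply."""
    if code == 238:
        return "send"      # receiver wants it: follow up with TAKETHIS
    if code == 431:
        return "requeue"   # deferral happens here, before any article bytes are sent
    if code == 438:
        return "drop"      # not wanted, do not retry
    raise ValueError(f"unexpected CHECK response {code}")


def after_takethis(code: int) -> str:
    """What a feeding peer does once the full article has already been sent."""
    if code == 239:
        return "done"      # accepted
    if code == 439:
        return "drop"      # rejected, do not retry
    if code == 400:
        return "requeue"   # receiver hit a serious problem and is closing the connection
    raise ValueError(f"unexpected TAKETHIS response {code}")
```

    In this sketch the only "requeue" after TAKETHIS comes with a dropped connection, which is the "400 and close" rule; ordinary deferrals live at the CHECK stage, where they still save bandwidth.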

    I think the problem with this logic is that it was insufficiently
    imaginative about why deferrals might happen. It was thinking of the
    case where you're proactively deduplicating feeds so that you don't
    consume more bandwidth than you need to, but not thinking about cases
    where, say, the news server is consulting some external spam
    classification oracle that's malfunctioning and thus wants you to send
    the message again later.

    That said, one could also make an argument that news servers should be
    prepared to spool those articles locally, rather than telling peers to
    spool articles for them, if the article has already been sent. Deferring
    after TAKETHIS runs the risk of multiplying the bandwidth consumed by
    that article unnecessarily, so I think that's the logic behind reserving
    it for really serious problems, the "my disk went away and I can't store
    any articles and I'm about to explode" sort of thing, via 400. Short of
    that, well, you have the article now, so I think the idea is that you
    should put it somewhere and deal with your problems locally rather than
    asking your peer to send it yet again.

    (That being said, I don't think any news server actually *does* that.)

  • From Julien ÉLIE@21:1/5 to All on Tue Dec 13 23:53:58 2022
    Hi Russ,

    >> Is the generic code 403 really not allowed for temporary errors in
    >> TAKETHIS? (That seems to be the case, given the explicit requirement
    >> to use the 400 generic code.)
    >
    > Yeah, I think that's the intent. I don't remember why, though. I wonder
    > if someone tested with 403 and found it didn't work or something.

    OK, I see the point. It is a bit like 501 (syntax error) and other
    generic responses, which are sometimes not understood by the remote
    peer; the peer then assumes the article should be offered again and
    again, whereas the error is permanent.


    > At the time, we probably couldn't imagine a situation where storing the
    > article fails but storing the next article is likely to succeed, so 400
    > and closing the connection felt pretty reasonable.

    Yes, that makes total sense.


    >> FWIW, one of the cases that can trigger the "can't store article" error
    >> in innd is when INT_MAX is reached. The 403 (or 400) code should really
    >> be 439 in this specific case because it is not a temporary failure.
    >
    > Yes, and indeed I think 403 or 400 would be wrong here, because the peer
    > would be entitled to just keep resending the same article, and it's never
    > going to succeed.

    I'll look into fixing it and using the expected 400 response code with
    connection closure.


    Incidentally, though TAKETHIS has no "retry later" response code in
    either RFC 2980 or RFC 4644, I've just checked what Diablo does with
    TAKETHIS (it uses 431, the "retry later" of CHECK). And before NNTP
    version 2 was implemented in INN, INN was using 436 (the "retry later"
    of IHAVE), which I (wrongly) changed to 403. I'll fix it.

    --
    Julien ÉLIE

    « A program should always respond to the user in the way that astonishes
    him least. » (Plauger's Law of Least Astonishment)

  • From Julien ÉLIE@21:1/5 to All on Mon Dec 19 22:57:12 2022
    Hi Russ,

    > After a TAKETHIS, well, you *have* the article, so why defer it?
    > You're not going to save any bandwidth; you have already paid the
    > transit cost and you have the whole article, so just do whatever
    > you're going to do, accept it or reject it, don't put it off.
    > [...]
    > not thinking about cases where, say, the news server is consulting
    > some external spam classification oracle that's malfunctioning and
    > thus wants you to send the message again later.
    > [...]
    > Short of that, well, you have the article now, so I think the idea is
    > that you should put it somewhere and deal with your problems locally
    > rather than asking your peer to send it yet again.
    >
    > (That being said, I don't think any news server actually *does* that.)

    All very wise thoughts, thanks for sharing them!

    --
    Julien ÉLIE

    « But why do they run so fast? To gain time! And since time is
    money… the faster they run, the more they gain. » (Raymond Devos)

  • From Julien ÉLIE@21:1/5 to All on Fri Jan 6 00:01:21 2023
    Responding to myself:

    > FWIW, one of the cases that can trigger the "can't store article" error
    > in innd is when INT_MAX is reached.

    I can't manage to reproduce it on my current VPS...
    A few years ago, with older hardware and an older OS, I got "436 can't
    store article".

    Now, on my setup, INN happily accepts articles numbered above 2^31. There
    is no storage error, and article numbers > 2^31 are present in NNTP
    responses.

    LIST ACTIVE returns large article numbers for all the storage methods:

    trigofacile.test.maxartnum.cnfs 2147483649 2147483646 y
    trigofacile.test.maxartnum.timecaf 2147483648 2147483646 y
    trigofacile.test.maxartnum.timehash 2147483651 2147483646 y
    trigofacile.test.maxartnum.trash 2147483648 2147483646 y
    trigofacile.test.maxartnum.tradspool 2147483650 2147483646 y

    Articles are stored on disk and are accessible via ARTICLE <mid>
    requests (though not by article number).

    The overall behaviour depends on the overview method used.

    With tradindexed, article numbers > 2^31 are not returned by LISTGROUP,
    OVER and similar commands because they are not present in the index
    ("cannot write index record for 18446744071562067968"; it seems the
    article number wraps around to -2^31 and is then logged as an unsigned
    long, i.e. 2^64 - 2^31).
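    The logged value fits a 32-bit signed overflow followed by a 64-bit unsigned conversion: article number 2^31 stored in a signed 32-bit integer becomes -2^31, and -2^31 reinterpreted as an unsigned 64-bit value is 2^64 - 2^31 = 18446744071562067968. A short Python sketch reproducing that arithmetic (it models two's-complement conversions and is not the tradindexed code):

```python
def as_int32(n: int) -> int:
    # Model storing n in a two's-complement signed 32-bit integer.
    return (n + 2**31) % 2**32 - 2**31


def as_uint64(n: int) -> int:
    # Model reinterpreting a sign-extended value as an unsigned 64-bit integer.
    return n % 2**64


for artnum in (2**31 - 1, 2**31, 2**31 + 1):
    print(artnum, as_int32(artnum), as_uint64(as_int32(artnum)))
# prints:
# 2147483647 2147483647 2147483647
# 2147483648 -2147483648 18446744071562067968
# 2147483649 -2147483647 18446744071562067969
```

    The same two conversions account for the negative article numbers ovdb reports in OVER and for the 1844674407156206796x values that buffindexed and ovsqlite return.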


    buffindexed returns the articles but not with the right numbers:

    LISTGROUP
    211 3 2147483646 18446744071562067969 trigofacile.test.maxartnum.timehash
    2147483647
    2147483648
    18446744071562067969
    .

    though the last one correctly has:

    Xref: news.trigofacile.com trigofacile.test.maxartnum.timehash:2147483649

    ovdb returns the right article numbers except for the reported high
    water mark:

    LISTGROUP
    211 3 2147483646 18446744071562067969 trigofacile.test.maxartnum.timehash
    2147483647
    2147483648
    2147483649
    .

    but OVER responses report negative article numbers starting from -2^31: -2147483648, -2147483647, etc.

    ovsqlite has a different and strange behaviour as the low and high water
    marks are inverted, and article numbers are also returned in reverse order:

    LISTGROUP
    211 4 18446744071562067968 2147483647 trigofacile.test.maxartnum.timehash
    18446744071562067968
    18446744071562067969
    18446744071562067970
    2147483647
    .

    It was just a quick test of the overview methods. (Expiry and other
    programs were not tested.)
    So there is still some work and testing to do to make INN compatible
    with 2^64 article numbers. At least the good news is that the storage
    methods cope with a bit more than 2^31 (maybe 2^32 and beyond), but there
    will be a problem with numbers > 10^10 because of the format of the
    active file.
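    A quick Python sketch of why 10^10 is the limit, assuming (as the remark above implies, though the exact layout is not spelled out here) that each article number occupies a fixed 10-digit, zero-padded field in the active file; `ACTIVE_FIELD_WIDTH` and `active_field` are hypothetical names:

```python
ACTIVE_FIELD_WIDTH = 10  # assumed fixed width of an article-number field


def active_field(artnum: int) -> str:
    """Format an article number into an assumed fixed-width, zero-padded field."""
    if artnum < 0:
        raise ValueError("article numbers are non-negative")
    field = f"{artnum:0{ACTIVE_FIELD_WIDTH}d}"
    if len(field) > ACTIVE_FIELD_WIDTH:
        # An 11th digit would overflow the fixed-width field.
        raise OverflowError(f"{artnum} does not fit in {ACTIVE_FIELD_WIDTH} digits")
    return field
```

    Under that assumption, 2^32 = 4294967296 still fits in 10 digits, and the first number that does not is 10^10 = 10000000000.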


    For the time being, I'll look into preventing new articles from being
    posted to newsgroups that have reached article number 2^31-1, with the
    correct rejection code (and not a deferral).
    It will fix the "can't store article" error I once had (which most likely
    still happens for some people) and, otherwise, the wrong article numbers
    returned when the article is accepted.
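    The planned check can be sketched as a guard that runs before the next article number is assigned. In this Python sketch, `assign_article_number` and its return convention are hypothetical; it only illustrates detecting the condition up front and answering with a rejection code (437 for IHAVE, 439 for TAKETHIS) rather than a deferral:

```python
from typing import Optional, Tuple

INT32_MAX = 2**31 - 1  # highest article number a signed 32-bit field can hold

# Permanent-rejection codes: retrying the same article can never succeed here.
REJECT_CODE = {"IHAVE": 437, "TAKETHIS": 439}


def assign_article_number(high_water_mark: int,
                          command: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (article number, None) if one is available, else (None, rejection code)."""
    if high_water_mark >= INT32_MAX:
        # The group is full: reject instead of deferring, so the peer stops re-offering.
        return None, REJECT_CODE[command]
    return high_water_mark + 1, None
```

    Rejecting up front gives the peer a permanent answer, which avoids the infinite loop of re-offered articles described earlier in the thread.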

    --
    Julien ÉLIE

    « It's called an insula. It's a house where people live on top of
    one another… » (Astérix)
