• I'm hacking on trn...want to join me?

    From Richard@21:1/5 to All on Mon Apr 3 21:19:45 2023
    [Please do not mail me a copy of your followup]

    As near as I can tell the current "maintainer" of trn isn't. The
    official home page is on sourceforge and I don't see any activity
    there for 10+ years. The trn-4.0-test77.tar.gz distribution -- the
    most recent -- was created on 02-Sep-2010. The test76 release was
    nearly ten years before that, on 02-Apr-2001.

    I may be the last user and it's somewhat of a pity/passion project :-).

    I forked the code from sourceforge into github and I'm hacking on the
    cmake branch. <https://github.com/LegalizeAdulthood/trn/tree/cmake>

    So far:
    - Added a CMake based build
    - Use vcpkg to get curses dependency (pdcurses on Windows)
    - Use CMake inspection of your environment and configure_file to
    generate the config.h instead of chatty bash Configure script.
    - Converted from C to C++
    - Applied various automated clean-ups to the code (yay ReSharper for C++!)
    - Gradually introducing more use of 'const'
    - Gradually introducing use of std::string instead of C style string
    - Unit tests with GTest
      - Working on adding tests before making any major changes.
      - A bunch of % interpolator tests added and some minor bugs fixed
      - Using CMake to generate test local news spool data and articles
        - Will use this to verify article/newsgroup related % interpolation
        - This part is current work-in-progress
    - Replace use of int with bool where appropriate
    - Replace use of int with strong enum types where appropriate
    - Legacy code considerations dropped; e.g. no special VMS code, no
    non-POSIX standard unix code, K&R style code modernized, no optional
    features to "save instruction and data space", etc.

    General target direction:
    - Replace low-level termios code with curses
    - Become more "event driven" instead of mingled read/write stdio
    - Decouple newsgroup/article logic from TUI to allow for GUI
    - Replace synchronous NNTP/newsgroup processing with asynchronous
    processing

    Vision of the final result (no particular order):
    - True STARTTLS/NNTPS support
    - Curses windows used to show all the various things:
      - newsgroup selection
      - article selection
      - thread selection
      - KILL files
      - etc.
    - A GUI for reading news that's as convenient as the trn TUI.
    - Let you manipulate the various windows like people do in vim/emacs
    allowing you to choose what is on-screen and where
    - Client should be asynchronously advancing through the unread articles
    at the "speed of light" to apply scoring/threading/KILL file processing
    as far ahead as possible while you read.
    - As easy to build on Windows as it is on *nix.

    I mostly develop on Windows, but try to keep the #ifdef'ed linux
    branches at least building by occasionally building in WSL. A github
    build workflow would help but not my immediate priority.

    If you're interested in joining me for the ride, pull requests are
    always welcome or drop an issue in the github repo.

    I'm surprised that after all this time there is still no decent C++
    library for NNTP. The fallout from this effort may be such a library
    using asynchronous I/O processing to get as close as possible to "the
    speed of light" for an NNTP client.
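    One small piece such a library would need is reading the dot-terminated
    multi-line responses that NNTP uses (RFC 3977). A minimal asyncio sketch
    in Python, not taken from trn or from either presentation; the function
    name read_multiline is made up for illustration:

```python
import asyncio


async def read_multiline(reader: asyncio.StreamReader) -> list[bytes]:
    """Read one dot-terminated multi-line block from an NNTP stream."""
    lines = []
    while True:
        line = await reader.readline()
        if not line:
            raise ConnectionError("stream closed before the terminating '.'")
        line = line.rstrip(b"\r\n")
        if line == b".":
            # A lone dot marks the end of the block.
            return lines
        if line.startswith(b"."):
            # Undo dot-stuffing: the sender doubled a leading dot.
            line = line[1:]
        lines.append(line)
```

    Because the function only awaits the reader, an event loop is free to
    service other connections or the user interface while an article is
    still arriving.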

    I did a presentation on doing basic NNTP with POCO last year:
    Writing a Network Client with POCO <https://www.youtube.com/watch?v=rRR9RTUEn4k>

    I'm doing another presentation next week on basic NNTP with boost.asio: <https://www.meetup.com/utah-cpp-programmers/events/zhljbtyfcgbqb/>

    -- Richard

    --
    "The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
    The Terminals Wiki <http://terminals-wiki.org>
    The Computer Graphics Museum <http://computergraphicsmuseum.org>
    Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chime Hart@21:1/5 to All on Fri Apr 7 16:38:56 2023
    Hi Richard: While I am only a user and not a programmer, I've probably
    been enjoying trn since at least 1997. Other than hoping trn would some
    day have yenc support, I have 1 quite specific happening which
    mysteriously comes up while saving certain articles to a file.
    In separate binary groups, eventually I will see an article number save
    to a file name, but I am back at a prompt on the same line, no error
    messages, nothing obvious when I examine an end of that file. I can
    seemingly grab that same article in Alpine. I asked a Debian maintainer
    about this, but he hasn't looked at this software in 10 years. However,
    we tried compiling trn for a debug option, so far no luck. I surely am
    not running a GUI, but hopefully, we will some day have improvements we
    can more consistently enjoy. Thanks in advance Chime

  • From Richard@21:1/5 to All on Sat Apr 8 16:56:30 2023
    Chime Hart <chime@hubert-humphrey.com> spake the secret code <4143c063-c260-2a66-fb34-70d5b8ab7d66@hubert-humphrey.com> thusly:

    [...] Other than hoping trn would some day have yenc support,

    I'll open a github issue for yenc support.
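    For context on what yEnc support involves: as I understand the yEnc
    draft, each data byte is sent as (byte + 42) mod 256, and the few results
    that would be troublesome on the wire (NUL, LF, CR, '=') are escaped as
    '=' followed by the value plus 64. A minimal Python sketch of that
    per-line transform, ignoring the =ybegin/=yend framing, multipart posts
    and CRC checks. It is not trn code and the function names are
    hypothetical:

```python
ESCAPE = 0x3D  # '='
CRITICAL = (0x00, 0x0A, 0x0D, ESCAPE)  # NUL, LF, CR, '='


def yenc_encode_line(data: bytes) -> bytes:
    """Encode raw bytes with the basic yEnc byte transform."""
    out = bytearray()
    for byte in data:
        enc = (byte + 42) % 256
        if enc in CRITICAL:
            out.append(ESCAPE)
            enc = (enc + 64) % 256
        out.append(enc)
    return bytes(out)


def yenc_decode_line(line: bytes) -> bytes:
    """Reverse yenc_encode_line for one line of encoded data."""
    out = bytearray()
    escaped = False
    for byte in line:
        if escaped:
            out.append((byte - 64 - 42) % 256)
            escaped = False
        elif byte == ESCAPE:
            escaped = True
        else:
            out.append((byte - 42) % 256)
    return bytes(out)
```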

    In separate binary groups, eventually I will see an article number save to a
    file name, but I am back at a prompt on the same line, no error messages,
    nothing obvious when I examine an end of that file. I can seemingly grab that
    same article in Alpine.

    The next time you encounter such an article, if you can email it to me
    I'll take a look.
  • From Michael Bäuerle@21:1/5 to Richard on Sat Apr 8 19:36:37 2023
    Richard wrote:

    [...]
    Vision of the final result (no particular order):
    - True STARTTLS/NNTPS support
    - Curses windows used to show all the various things:
      - newsgroup selection
      - article selection
      - thread selection
      - KILL files
      - etc.
    - A GUI for reading news that's as convenient as the trn TUI.
    - Let you manipulate the various windows like people do in vim/emacs
    allowing you to choose what is on-screen and where
    - Client should be asynchronously advancing through the unread articles
    at the "speed of light" to apply scoring/threading/KILL file processing
    as far ahead as possible while you read.
    - As easy to build on Windows as it is on *nix.

    I'm not using trn myself, but in the german hierarchy users sometimes
    break things with trn. Better support for MIME would be nice.

  • From Adam H. Kerman@21:1/5 to Michael Bauerle on Sat Apr 8 23:52:54 2023
    Michael Bauerle <michael.baeuerle@gmx.net> wrote:

    I'm not using trn myself, but in the german hierarchy users sometimes
    break things with trn. Better support for MIME would be nice.

    When I declare a character set, I just copy MIME headers into the
    article. I can do it with a macro. The headers are right there for the
    user to edit.

  • From Eli the Bearded@21:1/5 to michael.baeuerle@gmx.net on Sun Apr 9 20:27:20 2023
    In news.software.readers, Michael Bäuerle <michael.baeuerle@gmx.net> wrote:
    I'm not using trn myself, but in the german hierarchy users sometimes
    break things with trn. Better support for MIME would be nice.

    The acli fork of trn vastly improves charset support.

    https://github.com/acli/trn

    It's what I use now.

    Elijah
    ------
    notes the accent mark in the name

  • From Richard@21:1/5 to All on Mon Apr 10 19:44:19 2023
    Eli the Bearded <*@eli.users.panix.com> spake the secret code <eli$2304091625@qaz.wtf> thusly:

    In news.software.readers, Michael Bäuerle <michael.baeuerle@gmx.net> wrote:
    I'm not using trn myself, but in the german hierarchy users sometimes
    break things with trn. Better support for MIME would be nice.

    The acli fork of trn vastly improves charset support.

    I looked at his fork and there are more edits there than I thought
    based on the original patch ticket in sourceforge.

    It will take some time for me to integrate those edits over to my
    repository.
  • From Richard@21:1/5 to All on Mon Apr 10 19:43:19 2023
    "Adam H. Kerman" <ahk@chinet.com> spake the secret code <u0suom$1eoqb$1@dont-email.me> thusly:

    Michael Bauerle <michael.baeuerle@gmx.net> wrote:

    I'm not using trn myself, but in the german hierarchy users sometimes
    break things with trn. Better support for MIME would be nice.

    When I declare a character set, I just copy MIME headers into the
    article. I can do it with a macro. The headers are right there for the
    user to edit.

    Presumably the problem is that the reply doesn't contain the necessary
    MIME headers that are relevant for the quoted material or for the
    content of the reply?
  • From Adam H. Kerman@21:1/5 to Richard on Mon Apr 10 21:19:27 2023
    legalize+jeeves@mail.xmission.com (Richard) wrote:
    "Adam H. Kerman" <ahk@chinet.com> spake:
    Michael Bauerle <michael.baeuerle@gmx.net> wrote:

    I'm not using trn myself, but in the german hierarchy users sometimes
    break things with trn. Better support for MIME would be nice.

    When I declare a character set, I just copy MIME headers into the
    article. I can do it with a macro. The headers are right there for the
    user to edit.

    Presumably the problem is that the reply doesn't contain the necessary
    MIME headers that are relevant for the quoted material or for the
    content of the reply?

    I don't see why that matters. You're expecting the newsreader to do
    something it cannot do and no newsreader can.

    I try to substitute for or eliminate non-ASCII characters, and then not
    bother to declare a character set. All too often, I post in a thread
    with a string of precursor followups. The root article was copied and
    pasted from the Web and may not have preserved the original character
    set in pasting, or if it did, the MIME header doesn't match.

    The author of the next followup uses a newsreader that cannot parse for
    the character set in use and messes up all the non-ASCII characters
    because the author declared a mismatched character set.

    In my followup, I inherit the mess, which may be a mix of non-ASCII
    8-bit characters and UTF-8 characters or just bad translations.
    Sometimes I have to go back to the original Web article that was
    plagiarized to figure out what the hell the original character set was in
    a portion of the quote, then figure out what character set was in use in
    another portion of the quote.

    Then I perform ASCII substitutions so that when I'm quoted, there's no
    more mess.

    In followup, no newsreader should ever declare that the character set in
    use is the one used by the precursor article or the one that the user
    uses by default. It has to parse and the user has to fix the inherited
    mess at times anyway which will be unparseable.

    I don't expect trn to do any of that. That's why I use macros or just my
    own eyeballs. I really try to avoid using non-ASCII characters whenever possible.

    I have seen newsreaders that inherit encoded word on Subject, decode it,
    then put unencoded non-ASCII characters on Subject because it's not
    programmed to re-encode before injecting. Aargh.
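    The decode-then-re-encode step a newsreader would need here can be
    sketched with Python's standard email.header module. This only
    illustrates RFC 2047 encoded words, not how any particular newsreader
    does it; the helper names are hypothetical:

```python
from email.header import Header, decode_header, make_header


def decode_subject(raw: str) -> str:
    """Turn a header value containing encoded words into readable text."""
    return str(make_header(decode_header(raw)))


def encode_subject(text: str, charset: str = "utf-8") -> str:
    """Produce an ASCII-only header value, using encoded words if needed."""
    return Header(text, charset).encode()


# Decode for display or editing, then encode again before injecting.
shown = decode_subject("=?ISO-8859-1?Q?B=E4uerle?=")
wire = encode_subject(shown)
```

    The point is the second call: whatever the user sees or edits as decoded
    text has to go back through encoding so that only ASCII reaches the
    Subject header on the wire.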

    Use ASCII. It's universally readable.

  • From Richard@21:1/5 to All on Mon Apr 10 23:59:28 2023
    "Adam H. Kerman" <ahk@chinet.com> spake the secret code <u11ugv$2anur$1@dont-email.me> thusly:

    legalize+jeeves@mail.xmission.com (Richard) wrote:
    "Adam H. Kerman" <ahk@chinet.com> spake:
    Michael Bauerle <michael.baeuerle@gmx.net> wrote:

    I'm not using trn myself, but in the german hierarchy users sometimes
    break things with trn. Better support for MIME would be nice.

    When I declare a character set, I just copy MIME headers into the
    article. I can do it with a macro. The headers are right there for the
    user to edit.

    Presumably the problem is that the reply doesn't contain the necessary
    MIME headers that are relevant for the quoted material or for the
    content of the reply?

    I don't see why that matters. You're expecting the newsreader to do
    something it cannot do and no newsreader can.

    I see now, thanks for clarifying.

    So when you say "better support for MIME would be nice", what exactly
    is missing?
  • From Adam H. Kerman@21:1/5 to Richard on Tue Apr 11 00:25:21 2023
    legalize+jeeves@mail.xmission.com (Richard) wrote:
    Adam H. Kerman <ahk@chinet.com> spake:
    legalize+jeeves@mail.xmission.com (Richard) wrote:
    Adam H. Kerman <ahk@chinet.com> spake:
    Michael Bauerle <michael.baeuerle@gmx.net> wrote:

    I'm not using trn myself, but in the german hierarchy users sometimes
    break things with trn. Better support for MIME would be nice.

    When I declare a character set, I just copy MIME headers into the
    article. I can do it with a macro. The headers are right there for the
    user to edit.

    Presumably the problem is that the reply doesn't contain the necessary
    MIME headers that are relevant for the quoted material or for the
    content of the reply?

    I don't see why that matters. You're expecting the newsreader to do
    something it cannot do and no newsreader can.

    I see now, thanks for clarifying.

    So when you say "better support for MIME would be nice", what exactly
    is missing?

    I didn't say that; Michael did. I'm guessing that what's breaking
    articles is a mix of UTF-8 and 8-bit used by different users, then not declaring which one is in use in followup.

    You'd need to add a parsing mechanism, then add the appropriate MIME
    header to declare the character set. Still, the user has to clean up the
    mess of broken quotes himself.
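    A minimal Python sketch of the kind of parsing mechanism meant here,
    assuming it is enough to classify each line of the body as plain ASCII,
    valid UTF-8, or some other 8-bit encoding. The function names are
    hypothetical, and a line that happens to be valid UTF-8 is only assumed
    to be UTF-8:

```python
def classify_line(raw: bytes) -> str:
    """Guess how one line of an article body is encoded."""
    if raw.isascii():
        return "us-ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so some 8-bit charset we cannot identify.
        return "unknown-8bit"


def charsets_in_use(body: bytes) -> set[str]:
    """Collect the classifications of all lines in the body."""
    return {classify_line(line) for line in body.splitlines()}
```

    If the result contains both "utf-8" and "unknown-8bit", the body is the
    mixed mess described above, and no single MIME charset declaration would
    be correct until the user cleans it up.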

  • From Adam H. Kerman@21:1/5 to Richard on Tue Apr 11 16:58:13 2023
    legalize+jeeves@mail.xmission.com (Richard) wrote:
    "Adam H. Kerman" <ahk@chinet.com> spake:
    legalize+jeeves@mail.xmission.com (Richard) wrote:

    So when you say "better support for MIME would be nice", what exactly
    is missing?

    I didn't say that; Michael did.

    Oops, sorry for misattributing.

    I'm guessing that what's breaking
    articles is a mix of UTF-8 and 8-bit used by different users, then not
    declaring which one is in use in followup.

    Didn't we conclude on this thread that it's up to the user to set the
    headers correctly?

    Absolutely. trn expects the user to fix up his own headers!

    Technically, trn doesn't do the posting; inews does the posting as a
    separate program after having invoked your editor on the post.

    I suppose inews could inspect the content of the post more deeply and
    complain about use of 8-bit characters and/or detect UTF-8. It could
    offer to adjust the headers as best it can and let you re-edit.

    You'd need to add a parsing mechanism, then add the appropriate MIME
    header to declare the character set. Still, the user has to clean up the
    mess of broken quotes himself.

    You could improve this yourself by setting NEWSPOSTER to a different
    program/script that does this detecting and munging of the headers
    before invoking inews.

    You're right, of course. If an acceptable parsing mechanism already
    exists, why re-invent the wheel?

    Still, there's just no way to fix up multi-level quotes that used
    different character sets mis-translated without the user going back to
    the original text to figure out what it was supposed to be and then
    translating it consistently with the rest of the text.

  • From Richard@21:1/5 to All on Tue Apr 11 16:26:40 2023
    "Adam H. Kerman" <ahk@chinet.com> spake the secret code <u129dh$2c6tu$1@dont-email.me> thusly:

    legalize+jeeves@mail.xmission.com (Richard) wrote:
    So when you say "better support for MIME would be nice", what exactly
    is missing?

    I didn't say that; Michael did.

    Oops, sorry for misattributing.

    I'm guessing that what's breaking
    articles is a mix of UTF-8 and 8-bit used by different users, then not
    declaring which one is in use in followup.

    Didn't we conclude on this thread that it's up to the user to set the
    headers correctly?

    Technically, trn doesn't do the posting; inews does the posting as a
    separate program after having invoked your editor on the post.

    I suppose inews could inspect the content of the post more deeply and
    complain about use of 8-bit characters and/or detect UTF-8. It could
    offer to adjust the headers as best it can and let you re-edit.

    You'd need to add a parsing mechanism, then add the appropriate MIME
    header to declare the character set. Still, the user has to clean up the
    mess of broken quotes himself.

    You could improve this yourself by setting NEWSPOSTER to a different program/script that does this detecting and munging of the headers
    before invoking inews.
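    A rough Python sketch of the header-munging half of such a wrapper. It
    only shows the transformation on the article bytes; how trn hands the
    article to the NEWSPOSTER program and how the result is passed on to
    inews is left out. The function name and the ISO-8859-1 fallback are
    assumptions for illustration:

```python
def add_mime_headers(article: bytes) -> bytes:
    """Add MIME headers to an article whose body contains 8-bit bytes."""
    head, sep, body = article.partition(b"\n\n")
    if not sep or body.isascii():
        return article  # no body, or plain ASCII: nothing to declare
    if b"\ncontent-type:" in b"\n" + head.lower():
        return article  # the user already declared something; leave it
    try:
        body.decode("utf-8")
        charset = b"UTF-8"
    except UnicodeDecodeError:
        charset = b"ISO-8859-1"  # assumption: hypothetical site default
    extra = b"\n".join([
        b"MIME-Version: 1.0",
        b"Content-Type: text/plain; charset=" + charset,
        b"Content-Transfer-Encoding: 8bit",
    ])
    return head + b"\n" + extra + sep + body
```

    A real wrapper would read the article, apply something like this, ideally
    let the user re-edit, and only then invoke inews.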
  • From Michael Bäuerle@21:1/5 to Adam H. Kerman on Wed Apr 12 15:32:53 2023
    Adam H. Kerman wrote:

    [...]
    Still, there's just no way to fix up multi-level quotes that used
    different character sets mis-translated without the user going back to
    the original text to figure out what it was supposed to be and then translating it consistently with the rest of the text.

    If it already happened, this is something that is hard to impossible
    to repair for some encodings. Most newsreaders don't even try and
    such a repair algorithm is not what I had in mind.


    What I wanted to propose is that trn does not create and send such a
    mixture of encodings and correctly labels the encoding used according
    to MIME (the problem is that trn users produce such broken articles,
    not that they cannot read them).

    It looks like the users don't know what they are doing (not really the
    fault of trn in this sense) and their editor is not configured for the
    encoding of the content that is quoted. Maybe it is too inconvenient to
    change the encoding configuration.

    I think such mistakes would not occur if trn would automatically convert
    the content to quote into the encoding used by the editor.
    The source encoding is declared in the MIME header of the article.
    The target encoding should be the one the editor is using (manually
    configured, if trn cannot automatically detect it).
    The conversion itself can be done with iconv.
    This would preserve the users choice for the target encoding (would
    not enforce the usage of Unicode).

    If the user has configured e.g. US-ASCII to quote an article written in Unicode, replacement characters should be inserted if the encoding
    conversion is not possible (instead of simply copying the bytes that are
    then interpreted wrong by the editor and the recipients).
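    The conversion proposed here can be sketched in a few lines of Python.
    This uses Python's built-in codecs rather than iconv itself, and the
    function name is hypothetical; characters that the target encoding
    cannot represent become '?' instead of being copied as raw bytes:

```python
def transcode_for_editor(quoted: bytes, source_charset: str,
                         editor_charset: str) -> bytes:
    """Convert quoted text from the article's charset to the editor's."""
    # Undecodable input bytes become U+FFFD rather than raising.
    text = quoted.decode(source_charset, errors="replace")
    # Characters the editor's charset lacks become '?'.
    return text.encode(editor_charset, errors="replace")
```

    For example, UTF-8 input containing "ä" survives a conversion to
    ISO-8859-1 unchanged in meaning, while a conversion to US-ASCII puts a
    replacement character in its place.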

  • From Adam H. Kerman@21:1/5 to michael.baeuerle@gmx.net on Wed Apr 12 15:36:40 2023
    Michael Bauerle <michael.baeuerle@gmx.net> wrote:
    Adam H. Kerman wrote:

    [...]
    Still, there's just no way to fix up multi-level quotes that used
    different character sets mis-translated without the user going back to
    the original text to figure out what it was supposed to be and then
    translating it consistently with the rest of the text.

    If it already happened, this is something that is hard to impossible
    to repair for some encodings. Most newsreaders don't even try and
    such a repair algorithm is not what I had in mind.

    What I wanted to propose is that trn does not create and send such a
    mixture of encodings and correctly labels the encoding used according
    to MIME (the problem is that trn users produce such broken articles,
    not that they cannot read them).

    How is it trn's fault? trn isn't translating anything! It receives
    characters and passes them along to inews -h. It's up to the author to recognize the encoding.

    Take a typical situation I encounter: Root article has non-ASCII
    characters and declares UTF-8. The author of the followup has his
    default character set ISO-8859-1 and has no parsing mechanism. Because
    it's an Apple client, it does weird character substitutions, different
    from what a Windows client might do. Either way, it ignored the declared character set of the precursor article.

    If I do a further followup, I have to edit manually AND go back to the
    root article to see if I can figure out what the character was intended
    to be. If I can, I perform ASCII substitutions.

    There's no way to write a parsing mechanism to repair inherited
    mistranslated characters getting rid of the bad substitutions.

    At least with trn, given that the author is able to edit headers and is actually expected to add necessary headers, these things can be
    repaired. The author using the Apple client has no ability to fix the
    mess he created in the composer within the newsreader.

    In followup, if the declared character set truly describes the character
    set inherited from the precursor article, at least a decent parser
    called by trn could deal with it, or even translate to another character
    set if necessary and an outside process could add the matching MIME
    header.

    Since I'm a traditionalist, I'd like to have another chance to edit to
    verify that the parser caught everything, and I still want to get rid of characters that do not belong in plain text, like nonbreaking space.

    It looks like the users don't know what they are doing (not really the
    fault of trn in this sense) and their editor is not configured for the
    encoding of the content that is quoted. Maybe it is too inconvenient to
    change the encoding configuration.

    The editor is an outside process. It's not built into trn. It's not the
    job of the editor to parse, although there seems to be a process that
    decodes BASE64 or QP, neither of which belong in Usenet. That happens
    before the editor sees it. There'd have to be another process before it
    gets to the editor to test that the declared character set of the
    precursor article matches what's actually there.

    The editor is likely using a character set from an environment variable.

    I think such mistakes would not occur if trn would automatically convert
    the content to quote into the encoding used by the editor.
    The source encoding is declared in the MIME header of the article.

    Snarf

    When it actually matches!

    The target encoding should be the one the editor is using (manually
    configured, if trn cannot automatically detect it).

    Not what trn does. An outside process would have to be called.

    The conversion itself can be done with iconv.
    This would preserve the users choice for the target encoding (would
    not enforce the usage of Unicode).

    In my opinion, if there are non-ASCII characters but the lowest
    denomination character set is 8-bit, that's the character set to use.

    My opinion is shared with nearly no one.

    If the user has configured e.g. US-ASCII to quote an article written in
    Unicode, replacement characters should be inserted if the encoding
    conversion is not possible (instead of simply copying the bytes that are
    then interpreted wrong by the editor and the recipients).

    Absolutely, especially ASCII punctuation characters for 8-bit or UTF-8 punctuation characters. Typesetting characters do not belong in plain
    text communication.

  • From Richard@21:1/5 to All on Wed Apr 12 16:57:12 2023
    "Adam H. Kerman" <ahk@chinet.com> spake the secret code <u16j68$32uql$1@dont-email.me> thusly:

    Absolutely, especially ASCII punctuation characters for 8-bit or UTF-8
    punctuation characters. Typesetting characters do not belong in plain
    text communication.

    With MIME headers, usenet isn't necessarily restricted to text/plain
    style communication, but it has been the historical norm.
  • From Michael Bäuerle@21:1/5 to Adam H. Kerman on Wed Apr 12 19:28:23 2023
    Adam H. Kerman wrote:
    Michael Bäuerle wrote:
    Adam H. Kerman wrote:

    [...]
    Still, there's just no way to fix up multi-level quotes that used different character sets mis-translated without the user going back to the original text to figure out what it was supposed to be and then translating it consistently with the rest of the text.

    If it already happened, this is something that is hard to impossible
    to repair for some encodings. Most newsreaders don't even try and
    such a repair algorithm is not what I had in mind.

    What I wanted to propose is that trn does not create and send such a
    mixture of encodings and correctly labels the encoding used according
    to MIME (the problem is that trn users produce such broken articles,
    not that they cannot read them).

    How is it trn's fault? trn isn't translating anything! It receives
    characters and passes them along to inews -h. It's up to the author to recognize the encoding.

    Then read my proposal "make it easier for the users to avoid mistakes".

    Take a typical situation I encounter: Root article has non-ASCII
    characters and declares UTF-8. The author of the followup has his
    default character set ISO-8859-1 and has no parsing mechanism. Because
    it's an Apple client, it does weird character substitutions, different
    from what a Windows client might do. Either way, it ignored the declared character set of the precursor article.

    Then we are back to the case of malformed incoming data.
    This is not what I am talking about.

    [...]
    In followup, if the declared character set truly describes the character
    set inherited from the precursor article, at least a decent parser
    called by trn could deal with it, or even translate to another character
    set if necessary and an outside process could add the matching MIME
    header.

    This is what I propose. It should be easy for the user to do this.
    The code for it should be shipped with the newsreader, not every user
    is a programmer too.

    Since I'm a traditionalist, I'd like to have another chance to edit to
    verify that the parser caught everything, and I still want to get rid of characters that do not belong in plain text, like nonbreaking space.

    I think this would always be the case, because all the automatic
    processing should be finished before the editor is launched.

    It looks like the users don't know what they are doing (not really the fault of trn in this sense) and their editor is not configured for the encoding of the content that is quoted. Maybe it is too inconvenient to change the encoding configuration.

    The editor is an outside process. It's not built into trn. It's not the
    job of the editor to parse, although there seems to be a process that
    decodes BASE64 or QP, neither of which belong in Usenet.

    No longer true since RFC 5536: <https://www.rfc-editor.org/rfc/rfc5536#section-2.3>
    |
    | User agents MUST meet the definition of MIME conformance in [RFC2049]
    | and MUST also support [RFC2231]. This level of MIME conformance
    | provides support for internationalization and multimedia in message
    | bodies [RFC2045], [RFC2046], and [RFC2231], and support for
    | internationalization of header fields [RFC2047] and [RFC2231]. [...]

    That happens
    before the editor sees it. There'd have to be another process before it
    gets to the editor to test that the declared character set of the
    precursor article matches what's actually there.

    This part should be added.

    Decode transfer encoding first.
    Then convert the encoding to match the editor.

    Create a MIME declaration for the outgoing data, if not US-ASCII.
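    The three steps can be sketched with Python's standard email package.
    This is only an outline under simplifying assumptions (single-part
    text article, charset known to Python); prepare_quote is a hypothetical
    name, not something trn provides:

```python
import email


def prepare_quote(article: bytes,
                  editor_charset: str) -> tuple[bytes, dict[str, str]]:
    """Return the body ready for the editor plus MIME headers to declare."""
    msg = email.message_from_bytes(article)
    # Step 1: undo base64 / quoted-printable transfer encoding.
    raw = msg.get_payload(decode=True)
    # Step 2: convert from the declared charset to the editor's charset.
    source = msg.get_content_charset() or "us-ascii"
    text = raw.decode(source, errors="replace")
    body = text.encode(editor_charset, errors="replace")
    # Step 3: declare the outgoing charset only if it is not plain ASCII.
    headers: dict[str, str] = {}
    if not body.isascii():
        headers = {
            "MIME-Version": "1.0",
            "Content-Type": f"text/plain; charset={editor_charset}",
            "Content-Transfer-Encoding": "8bit",
        }
    return body, headers
```

    All of this runs before the editor is launched, so the user still gets
    to review the result.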

    The editor is likely using a character set from an environment variable.

    Maybe not generic enough to be used by the newsreader too.

    I think such mistakes would not occur if trn would automatically convert the content to quote into the encoding used by the editor.
    The source encoding is declared in the MIME header of the article.

    Snarf

    When it actually matches!

    It does not match in the articles I talk about (the user has not
    converted the encoding of the incoming data, but has written his reply
    with a different encoding). This is the case that should be easier to
    avoid.

    The target encoding should be the one the editor is using (manually configured, if trn cannot automatically detect it).

    Not what trn does. An outside process would have to be called.

    Nontrivial outside processes (that are required to correctly process
    RFC 5536 conformant articles) should be shipped with the newsreader.

    The conversion itself can be done with iconv.
    This would preserve the user's choice for the target encoding (would
    not enforce the usage of Unicode).
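
The two steps proposed above (decode the transfer encoding, then convert to the editor's encoding) can be sketched as follows. This is a minimal illustration, not trn code: it uses Python's built-in codecs in place of iconv, and the function name and parameters are hypothetical.

```python
import base64
import quopri


def to_editor_encoding(body: bytes, transfer_encoding: str,
                       source_charset: str, editor_charset: str) -> bytes:
    """Undo the Content-Transfer-Encoding, then re-encode for the editor."""
    cte = transfer_encoding.strip().lower()
    if cte == "quoted-printable":
        raw = quopri.decodestring(body)
    elif cte == "base64":
        raw = base64.b64decode(body)
    else:
        # 7bit, 8bit and binary need no transfer decoding.
        raw = body
    # Interpret the bytes using the charset declared in the MIME header.
    text = raw.decode(source_charset)
    # Characters the editor's charset cannot represent become "?".
    return text.encode(editor_charset, errors="replace")
```

The target charset stays a parameter, so the user's choice is preserved and nothing forces Unicode. A declared charset that does not match the actual bytes raises UnicodeDecodeError at the decode step, which is the mismatch case discussed in this thread.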

    In my opinion, if there are non-ASCII characters but the lowest
    denomination character set is 8-bit, that's the character set to use.

    My opinion is shared with nearly no one.

    I share your opinion (as you can see in my articles that use ISO 8859-1,
    if possible).

    From RFC 2046:
    <https://www.rfc-editor.org/rfc/rfc2046#section-4.1.2>
    |
    | [...]
    | In general, composition software should always use the "lowest common
    | denominator" character set possible. [...] More generally,
    | if a widely-used character set is a subset of another character set,
    | and a body contains only characters in the widely-used subset, it
    | should be labelled as being in that subset. [...]

    But this text is decades old. If a successor to this RFC would be
    written today, it likely would say something like "Always use Unicode
    with UTF-8 encoding".
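
The "lowest common denominator" rule quoted from RFC 2046 can be sketched as trying candidate charsets from narrowest to widest. The candidate list and function name here are assumptions chosen for illustration; a real poster would configure its own list.

```python
def pick_charset(text: str,
                 candidates=("us-ascii", "iso-8859-1", "utf-8")) -> str:
    """Return the first (narrowest) candidate that can encode all of text."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # text has characters outside this charset
        return charset
    raise ValueError("no candidate charset can represent the text")
```

With this ordering an article containing only ASCII is labelled us-ascii, one with German umlauts is labelled iso-8859-1, and anything wider falls through to utf-8. Putting utf-8 first in the list gives the "always use UTF-8" policy instead.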

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard@21:1/5 to All on Wed Apr 12 18:58:18 2023
    [Please do not mail me a copy of your followup]

    "Adam H. Kerman" <ahk@chinet.com> spake the secret code <u16sn5$o1d$1@dont-email.me> thusly:

    Michael Bauerle <michael.baeuerle@gmx.net> wrote:
    Adam H. Kerman wrote:
    Michael Bauerle wrote:
    Adam H. Kerman wrote:

    [...]

    [...]
    In followup, if the declared character set truly describes the character
    set inherited from the precursor article, at least a decent parser
    called by trn could deal with it, or even translate to another character
    set if necessary and an outside process could add the matching MIME
    header.

    This is what I propose. It should be easy for the user to do this.
    The code for it should be shipped with the newsreader, not every user
    is a programmer too.

    This isn't programming. The best I can do is write macros. Richard was
    talking about calling outside processes that already exist, not
    requiring the user to write his own parser.

    I think it's a reasonable request for trn's inews program to be
    smarter about encodings and analyze the input file and add the
    necessary headers to decorate the content to the best of its ability.

    I don't know if acli's fork of trn is already doing these things; I
    think that fork has been focused on proper presentation of UTF-8
    content in articles. It is my intention to merge those changes into
    my fork.
    --
    "The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
    The Terminals Wiki <http://terminals-wiki.org>
    The Computer Graphics Museum <http://computergraphicsmuseum.org>
    Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

  • From Richard@21:1/5 to All on Wed Apr 12 18:54:54 2023
    [Please do not mail me a copy of your followup]

    Michael Bäuerle <michael.baeuerle@gmx.net> spake the secret code
    <AABkNuo349EAAAdh.A3.flnews@WStation5.stz-e.de> thusly:

    But this text is decades old. If a successor to this RFC would be
    written today, it likely would say something like "Always use Unicode
    with UTF-8 encoding".

    The current NNTP RFC recommends using UTF-8 wherever possible (and
    annotating it as such in messages).
    --
    "The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
    The Terminals Wiki <http://terminals-wiki.org>
    The Computer Graphics Museum <http://computergraphicsmuseum.org>
    Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

  • From Adam H. Kerman@21:1/5 to michael.baeuerle@gmx.net on Wed Apr 12 18:19:17 2023
    Michael Bauerle <michael.baeuerle@gmx.net> wrote:
    Adam H. Kerman wrote:
    Michael Bauerle wrote:
    Adam H. Kerman wrote:

    [...]

    [...]
    In followup, if the declared character set truly describes the character
    set inherited from the precursor article, at least a decent parser
    called by trn could deal with it, or even translate to another character
    set if necessary and an outside process could add the matching MIME
    header.

    This is what I propose. It should be easy for the user to do this.
    The code for it should be shipped with the newsreader, not every user
    is a programmer too.

    This isn't programming. The best I can do is write macros. Richard was
    talking about calling outside processes that already exist, not
    requiring the user to write his own parser.

    Since I'm a traditionalist, I'd like to have another chance to edit to
    verify that the parser caught everything, and I still want to get rid of
    characters that do not belong in plain text, like nonbreaking space.

    I think this would always be the case, because all the automatic
    processing should be finished before the editor is launched.

    No, you need another parsing step after finishing in the editor to
    declare the character set in the MIME header. For instance, if I think
    I've used ASCII but there's still an invisible character like
    nonbreaking space that's undesirable, I'd want to figure out why the
    proto article failed parsing and get rid of the non-plain-text character.
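
A post-editor check of the kind described above could report where each non-ASCII character sits, so that something invisible like a nonbreaking space can be found and removed. This is a sketch only; the function name is hypothetical and it assumes the draft has already been decoded to text.

```python
import unicodedata


def find_non_ascii(text: str):
    """List (line, column, character name) for every non-ASCII character."""
    hits = []
    # Split on "\n" only, so unusual separators are reported, not consumed.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                hits.append((lineno, col, name))
    return hits
```

An empty result means the draft really is plain ASCII. Anything else points at the exact line and column to inspect before the character set is declared.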

    It looks like the users don't know what they are doing (not really the
    fault of trn in this sense) and their editor is not configured for the
    encoding of the content that is quoted. Maybe it is too inconvenient to
    change the encoding configuration.

    The editor is an outside process. It's not built into trn. It's not the
    job of the editor to parse, although there seems to be a process that
    decodes BASE64 or QP, neither of which belong in Usenet.

    No longer true since RFC 5536:
    <https://www.rfc-editor.org/rfc/rfc5536#section-2.3>

    | User agents MUST meet the definition of MIME conformance in [RFC2049]
    | and MUST also support [RFC2231]. This level of MIME conformance
    | provides support for internationalization and multimedia in message
    | bodies [RFC2045], [RFC2046], and [RFC2231], and support for
    | internationalization of header fields [RFC2047] and [RFC2231]. [...]

    The editor isn't the user agent for the purpose of complying with this standard. trn is noncompliant and it's up to the user to add his own
    MIME headers to declare the character set.

    That happens
    before the editor sees it. There'd have to be another process before it
    gets to the editor to test that the declared character set of the
    precursor article matches what's actually there.

    This part should be added.

    Decode transfer encoding first.
    Then convert the encoding to match the editor.

    The editor? No. Use the character set in the terminal emulation,
    otherwise you'll introduce yet another point of mismatch.

    The trouble is that the terminal emulation needs to switch character
    sets on the fly by somehow reading MIME headers and I have no idea how
    any of that would happen. It's really really really outside trn.

    When I follow up to certain articles, I deliberately create a character
    set mismatch as it makes nonbreaking space visible and I can get rid of
    that. No one but me would do that.

    Create a MIME declaration for the outgoing data, if not US-ASCII.

    AFTER parsing the output from the composer

    The editor is likely using a character set from an environment variable.

    Maybe not generic enough to be used by the newsreader too.

    trn just outputs to display. It's up to the display -- in this case, a
    terminal emulation or an XTERM-like environment -- to show the article. Sometimes I have to change the character set the terminal emulation
    displays to see the article as intended.

    trn doesn't communicate the MIME header to the display. It's up to the
    user to make sure there's a character set match between the displayed
    article and the terminal emulation.

    I think such mistakes would not occur if trn would automatically convert
    the content to quote into the encoding used by the editor.
    The source encoding is declared in the MIME header of the article.

    Snarf

    When it actually matches!

    It does not match in the articles I talk about (the user has not
    converted the encoding of the incoming data, but has written his reply
    with a different encoding). This is the case that should be easier to
    avoid.

    I agree with you here.

    . . .

  • From Adam H. Kerman@21:1/5 to Richard on Wed Apr 12 22:08:14 2023
    legalize+jeeves@mail.xmission.com (Richard) wrote:
    "Adam H. Kerman" <ahk@chinet.com> spake:
    Michael Bauerle <michael.baeuerle@gmx.net> wrote:
    Adam H. Kerman wrote:
    Michael Bauerle wrote:
    Adam H. Kerman wrote:

    [...]

    [...]
    In followup, if the declared character set truly describes the character
    set inherited from the precursor article, at least a decent parser
    called by trn could deal with it, or even translate to another character
    set if necessary and an outside process could add the matching MIME
    header.

    This is what I propose. It should be easy for the user to do this.
    The code for it should be shipped with the newsreader, not every user
    is a programmer too.

    This isn't programming. The best I can do is write macros. Richard was
    talking about calling outside processes that already exist, not
    requiring the user to write his own parser.

    I think it's a reasonable request for trn's inews program to be
    smarter about encodings and analyze the input file and add the
    necessary headers to decorate the content to the best of its ability.

    I thought we were using inews from INN. I remember decades ago inews had significant delays and timeouts but it got rewritten by Russ for
    background processing. I didn't recall we used a home-grown inews.

    The inews sanity checks are minimal and do not parse the body of the
    article to declare a character set.

    I don't know if acli's fork of trn is already doing these things; I
    think that fork has been focused on proper presentation of UTF-8
    content in articles. It is my intention to merge those changes into
    my fork.

    Ok.

  • From Richard@21:1/5 to All on Wed Apr 12 23:46:46 2023
    [Please do not mail me a copy of your followup]

    "Adam H. Kerman" <ahk@chinet.com> spake the secret code <u17a4d$2if3$1@dont-email.me> thusly:

    legalize+jeeves@mail.xmission.com (Richard) wrote:
    I think it's a reasonable request for trn's inews program to be
    smarter about encodings and analyze the input file and add the
    necessary headers to decorate the content to the best of its ability.

    I thought we were using inews from INN.

    My local ISP is providing inews from INN, but there is an inews
    executable built in the trn sources as well. I was proposing making
    the one from trn smarter; I don't control INN.

    Again, all of this can be intercepted by setting appropriate
    environment variables to cause trn to send messages through an
    arbitrary processor before handing them off to a program like inews
    for injecting into the news feed.

    Also, as has been mentioned on this thread, you can configure your
    editor to be smarter about non-ASCII content and add the appropriate
    headers.
    --
    "The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
    The Terminals Wiki <http://terminals-wiki.org>
    The Computer Graphics Museum <http://computergraphicsmuseum.org>
    Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

  • From Eli the Bearded@21:1/5 to Richard on Mon Apr 17 22:42:16 2023
    In news.software.readers, Richard <> wrote:
    "Adam H. Kerman" <ahk@chinet.com> spake the secret code
    I'm guessing that what's breaking articles is a mix of UTF-8 and
    8-bit used by different users, then not declaring which one is in use
    in followup.

    In standard trn version 4, UTF-8 gets hosed because trn tries to strip
    control characters, and includes the 32 octets starting at 128 in that. Disabling the control character squash with -j helps, or you can make a
    ~ one line patch to the code to not squash highbit "control" characters.
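
Why treating octets 128 through 159 as control characters damages UTF-8 can be shown with a small model. This is not trn's actual code: the squash is modelled here as simply dropping those bytes, which is an assumption made for illustration.

```python
def squash_c1(data: bytes) -> bytes:
    """Drop the 32 octets starting at 128 (0x80 to 0x9F)."""
    return bytes(b for b in data if not 0x80 <= b <= 0x9F)


def survives_squash(text: str, charset: str) -> bool:
    """True if text still decodes to itself after the squash."""
    damaged = squash_c1(text.encode(charset))
    try:
        return damaged.decode(charset) == text
    except UnicodeDecodeError:
        return False
```

In UTF-8 the bytes 0x80 to 0x9F are ordinary continuation bytes. "À" (U+00C0) is encoded as C3 80, so removing the 80 leaves a lone C3 that no longer decodes. The same letter in ISO 8859-1 is the single byte C0 and passes through untouched.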

    That's a display-for-the-user problem.

    Separately, there's a problem in display, if you have a utf-8 terminal
    (as is correct) and encounter a properly headered nonASCII, nonUTF8 post
    using highbits. Then trn just feeds raw wrong-charset stuff to your
    terminal. The acli trn fork fixes that.

    Thirdly there's a what-gets-sent issue if you don't post with correct
    MIME headers and people read with some tool expecting them. That's the
    not-declaring issue.

    Didn't we conclude on this thread that it's up to the user to set the
    headers correctly?

    Setting headers is tedious, tedium is best for computers.

    Technically, trn doesn't do the posting; inews does the posting as a
    separate program after having invoked your editor on the post.

    Kinda right, but not always. trn can act as inews for you. But it's true
    that trn always uses an external program as post editor, and that can
    correct headers for the user. Before acli trn existed, I modified my
    Pnews to lint check the highbit characters in my posts and warn me of mismatched headers.

    You could improve this yourself by setting NEWSPOSTER to a different program/script that does this detecting and munging of the headers
    before invoking inews.

    Depending on your install, NEWSPOSTER may just be looking for the first
    Pnews on the PATH. When I was new to shell scripting, I found Pnews and
    Rnmail to be excellent examples of how to do powerful stuff in lowest-common-denominator sh.
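
The "detecting and munging of the headers" step that such a wrapper would perform might look like the sketch below. It only covers the header rewrite. How the script is hooked into NEWSPOSTER and how inews is then invoked depend on the install and are left out. The function name and the ISO 8859-1 fallback are assumptions.

```python
def add_mime_headers(article: bytes,
                     fallback_charset: str = "iso-8859-1") -> bytes:
    """Add MIME headers to an 8-bit article that declares no Content-Type."""
    head, sep, body = article.partition(b"\n\n")
    if not sep or body.isascii():
        return article  # no body, or plain ASCII: nothing to declare
    names = set()
    for line in head.split(b"\n"):
        if line[:1] in (b" ", b"\t") or b":" not in line:
            continue  # folded continuation line, not a header name
        names.add(line.split(b":", 1)[0].strip().lower())
    if b"content-type" in names:
        return article  # the user already declared something; leave it
    try:
        body.decode("utf-8")
        charset = "utf-8"
    except UnicodeDecodeError:
        charset = fallback_charset  # a guess, not a detection
    extra = []
    if b"mime-version" not in names:
        extra.append(b"MIME-Version: 1.0")
    extra.append(b"Content-Type: text/plain; charset=" + charset.encode("ascii"))
    if b"content-transfer-encoding" not in names:
        extra.append(b"Content-Transfer-Encoding: 8bit")
    return head + b"\n" + b"\n".join(extra) + sep + body
```

An existing Content-Type header is never overwritten here. Checking whether an existing declaration actually matches the body is a separate lint step.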

    Elijah
    ------
    modified his mailx to use Rnmail, too, for header editing

  • From Richard@21:1/5 to All on Tue Apr 18 16:20:00 2023
    [Please do not mail me a copy of your followup]

    Eli the Bearded <*@eli.users.panix.com> spake the secret code <eli$2304171842@qaz.wtf> thusly:

    Thirdly there's a what-gets-sent issue if you don't post with correct
    MIME headers and people read with some tool expecting them. That's the
    not-declaring issue.

    I'm open to ideas about how trn can handle this issue better; for
    instance, it's certainly possible to scan the edited article before
    posting and suggest header additions/changes based on content.

    In news.software.readers, Richard <> wrote:
    Didn't we conclude on this thread that it's up to the user to set the
    headers correctly?

    Setting headers is tedious, tedium is best for computers.

    I agree, I just don't know how the computer is supposed to distinguish
    between, say, Big5 encoding or UTF-8. Certainly it can assume UTF-8
    and then reject that assumption if the non-ASCII bytes aren't valid
    UTF-8 encoded code points.
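
The "assume UTF-8, then reject" test described above is cheap to implement, because byte sequences from most legacy encodings are not well-formed UTF-8. A minimal sketch follows. The function names and the fallback parameter are hypothetical.

```python
def looks_like_utf8(data: bytes) -> bool:
    """True if data is well-formed UTF-8 (pure ASCII also qualifies)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def guess_charset(data: bytes, fallback: str) -> str:
    """Prefer us-ascii, then utf-8, else fall back to a configured charset."""
    if data.isascii():
        return "us-ascii"
    return "utf-8" if looks_like_utf8(data) else fallback
```

Big5 text fails the check because its lead bytes start at 0xA1, and bytes in the range 0x80 to 0xBF can never begin a UTF-8 sequence. What the check cannot do is say which legacy encoding the rejected bytes are in. That still needs a declared or configured fallback.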

    Technically, trn doesn't do the posting; inews does the posting as a
    separate program after having invoked your editor on the post.

    Kinda right, but not always. trn can act as inews for you.

    Yeah, there's an inews executable in the trn source tree, but I
    suspect that people don't include that in the trn package when they're packaging things up for linux distros.

    BTW, even though Wayne Davison isn't maintaining trn anymore, the trn
    related sourceforge mailing lists still work just fine and I've been
    posting updates to trn-workers mailing list: <https://sourceforge.net/p/trn/mailman/>
    --
    "The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
    The Terminals Wiki <http://terminals-wiki.org>
    The Computer Graphics Museum <http://computergraphicsmuseum.org>
    Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

  • From Eli the Bearded@21:1/5 to Richard on Tue Apr 18 22:37:01 2023
    In news.software.readers, Richard <> wrote:
    Eli the Bearded <*@eli.users.panix.com> spake
    Thirdly there's a what-gets-sent issue if you don't post with correct
    MIME headers and people read with some tool expecting them. That's the
    not-declaring issue.
    I'm open to ideas about how trn can handle this issue better; for
    instance, it's certainly possible to scan the edited article before
    posting and suggest header additions/changes based on content.

    I'd say scan for declared charset, see if it fits that, and object if
    not or if highbit and no declaration.
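
The check described above (read the declared charset, see whether the body fits it, and object otherwise) can be sketched with Python's standard email parser. The function name and message wording are hypothetical, and only single-part articles are handled.

```python
from email import message_from_bytes


def lint_charset(article: bytes) -> list:
    """Return a list of complaints about the article's charset declaration."""
    msg = message_from_bytes(article)
    declared = msg.get_content_charset()  # None when nothing is declared
    # decode=True undoes quoted-printable or base64; multipart gives None.
    body = msg.get_payload(decode=True) or b""
    problems = []
    if declared is None:
        if any(b >= 0x80 for b in body):
            problems.append("8-bit content but no charset declared")
        return problems
    try:
        body.decode(declared)
    except LookupError:
        problems.append(f"unknown charset {declared!r}")
    except UnicodeDecodeError:
        problems.append(f"body is not valid {declared}")
    return problems
```

One limitation is worth knowing. A single-byte charset such as ISO 8859-1 accepts every byte value, so UTF-8 text mislabelled as ISO 8859-1 passes this check. Catching that case needs the frequency heuristics mentioned later in this post.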

    For new articles, it can be tricky to be polite to the user about this,
    but a guess based on LANG or a similar environment variable probably gets you in
    the neighborhood.

    For follow-ups, you can start with the declared charset in the original
    post, and maybe provide a configuration for defaults by hierarchy.

    I agree, I just don't know how the computer is supposed to distinguish between, say, Big5 encoding or UTF-8. Certainly it can assume UTF-8
    and then reject that assumption if the non-ASCII bytes aren't valid
    UTF-8 encoded code points.

    Letter (octet) frequency heuristics, if LANG and hierarchy guided
    guesses are wrong.

    Yeah, there's an inews executable in the trn source tree, but I
    suspect that people don't include that in the trn package when they're packaging things up for linux distros.

    I've been using my own "mini inews" since the 1990s. I hacked it from nn sources to add some special configuration. The original readme dates to
    1989. I don't know that I've _ever_ used a distro package of trn. For a
    long time I was using the Panix locally compiled (for netbsd) version.
    But now I use my own build on Panix.

    posting updates to trn-workers mailing list: <https://sourceforge.net/p/trn/mailman/>

    I'm subscribed, but I've been a bit lax about reading it.

    Elijah
    ------
    panix.com offers Unix shell accounts with rich program selection

  • From Richard@21:1/5 to All on Wed Apr 19 21:22:14 2023
    [Please do not mail me a copy of your followup]

    Eli the Bearded <*@eli.users.panix.com> spake the secret code <eli$2304181836@qaz.wtf> thusly:

    In news.software.readers, Richard <> wrote:
    posting updates to trn-workers mailing list:
    <https://sourceforge.net/p/trn/mailman/>

    I'm subscribed, but I've been a bit lax about reading it.

    I saw that you'd submitted some PR's to acli's fork which I've merged
    into my fork.

    You're not missing much on trn-workers, just me posting progress
    updates :)
    --
    "The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
    The Terminals Wiki <http://terminals-wiki.org>
    The Computer Graphics Museum <http://computergraphicsmuseum.org>
    Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
