Just a general moan about the state of the newsgroups files that I am
finding on my peers.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
That's one sample group from 16 peers. First, there are so many different
encodings: I've got ASCII, UTF-8, ISO-8859-1, WINDOWS-1252, and even one identifying as GB18030.
Next, 8 servers agree on one description, 3 on another, 2 more on yet another, and finally 3 think the group is moderated.
How did things get into such a mixed-up state?
Thus spake Nigel Reed <sysop@endofthelinebbs.com>
On E-S (both reader and transit server):
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
I seem to remember that this started when the checkgroups messages for
fr.* were changed to UTF-8 (that doesn't account for the "Moderated"
flag, though). Julien Élie might shed some light on this, as he is the current issuer of control messages for fr.*.
PS: Please refer to
http://usenet.trigofacile.com/hierarchies/fr.html
for the change from "Moderated" to unmoderated and the current content
of the official checkgroups file.
What is even worse, when trying to automate this, is when the majority
of servers have the wrong description, or it's half and half.
I'm probably just going to get a script to pull the most popular of the descriptions for the list and ignore the moderated part, unless the
group has "moderated" in its name or a majority think it's moderated; I'll
do a manual check on those.
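A minimal Python sketch of that majority-vote approach, assuming each peer's newsgroups file has already been fetched and decoded to text, that each line is a group name followed by whitespace and a description, and that moderated groups are flagged by a trailing "(Moderated)". The function names (`parse_newsgroups`, `split_moderated`, `pick_descriptions`) are hypothetical. Ties, moderated majorities and groups with "moderated" in their name are returned for manual review instead of being resolved automatically.

```python
from collections import Counter, defaultdict

MODERATED_SUFFIX = "(Moderated)"


def parse_newsgroups(text):
    """Map group name -> description from newsgroups-style text."""
    entries = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # name, then everything after the tabs/spaces
        if parts:
            entries[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return entries


def split_moderated(desc):
    """Return (description without the suffix, whether the suffix was present)."""
    if desc.endswith(MODERATED_SUFFIX):
        return desc[: -len(MODERATED_SUFFIX)].rstrip(), True
    return desc, False


def pick_descriptions(peer_texts):
    """Majority vote per group; return (chosen descriptions, groups to check by hand)."""
    votes = defaultdict(Counter)      # group -> Counter of descriptions
    mod_votes = defaultdict(Counter)  # group -> Counter of True/False
    for text in peer_texts:
        for name, desc in parse_newsgroups(text).items():
            base, is_mod = split_moderated(desc)
            votes[name][base] += 1
            mod_votes[name][is_mod] += 1

    chosen, review = {}, []
    for name, counter in votes.items():
        ranked = counter.most_common()
        chosen[name] = ranked[0][0]
        tie = len(ranked) > 1 and ranked[1][1] == ranked[0][1]
        mod_majority = mod_votes[name][True] > mod_votes[name][False]
        if tie or mod_majority or "moderated" in name:
            review.append(name)
    return chosen, sorted(review)
```

Because the suffix is stripped before voting, the 16 fr.bienvenue lines from the first post give "Aider les..." 5 votes against 8 for "L'accueil...", so the latter wins, and with only 3 of 16 peers saying "moderated" the group is not flagged for review.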
I keep an updated list of the fr.* groups, with their status and
description:
<http://usenet-fr.yakakwatik.org/groupes.html>
Hi Nigel,
I'm probably just going to get a script to pull the most popular of
the descriptions for the list and ignore the moderated part, unless
the group has "moderated" in its name or a majority think it's
moderated; I'll do a manual check on those.
I would suggest instead just using the latest known descriptions
(from checkgroups messages, when they are sent).
I maintain the list encoded in UTF-8 (the standard according to RFCs)
here: https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.utf8
Also, FWIW, the same list in pure ASCII:
https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.ascii
Unfortunately, the usual master file for these descriptions mixes charsets (windows-1252 for some descriptions, UTF-8 for others, ISO-8859-xx variants, etc.):
https://ftp.isc.org/pub/usenet/CONFIG/newsgroups
That's why I generate the first two lists above :)
Feel free to use!
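Following that suggestion, a minimal sketch (assuming newsgroups.utf8 has been downloaded to a local file and both files are read with `encoding="utf-8"`) could overwrite local descriptions with the reference ones for the groups the local server already carries, and report which ones changed. The helper names are hypothetical, and writing one tab between name and description is an assumption about the desired output layout.

```python
def parse_newsgroups(text):
    """Map group name -> description from newsgroups-style text."""
    entries = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts:
            entries[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return entries


def merge_with_reference(local_text, reference_text):
    """Prefer the reference description for every group the local server carries.

    Returns (merged mapping, sorted list of groups whose description changed).
    Groups absent from the reference keep their local description.
    """
    local = parse_newsgroups(local_text)
    reference = parse_newsgroups(reference_text)
    merged, changed = {}, []
    for name, desc in local.items():
        new_desc = reference.get(name, desc)
        if new_desc != desc:
            changed.append(name)
        merged[name] = new_desc
    return merged, sorted(changed)


def format_newsgroups(entries):
    """Serialize back to one 'name<TAB>description' line per group, sorted by name."""
    return "".join(f"{name}\t{desc}\n" for name, desc in sorted(entries.items()))
```

Only groups already present locally are kept, so the reference list is used as a source of descriptions rather than as a list of groups to create.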
Nigel Reed wrote:
I'm probably just going to get a script to pull the most popular of
the descriptions for the list and ignore the moderated part, unless
the group has "moderated" in its name or a majority think it's
moderated; I'll do a manual check on those.
On Julien Élie's website, the following changes can be seen for fr.bienvenue:
2011-12-19 23:30:02 changegroup fr.bienvenue from m to y
2020-12-25 21:50:02 changedesc fr.bienvenue
2023-10-28 18:20:02 changedesc fr.bienvenue
The group is currently not moderated, and its description is as
follows:
L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
(That is: "Welcoming newcomers as they take their first steps on Usenet.")
The only moderated groups in the fr.* hierarchy are:
fr.misc.bavardages.dinosaures
fr.usenet.abus.rapports
fr.usenet.forums.annonces
fr.usenet.stats
Source: <http://usenet.trigofacile.com/hierarchies/fr.html>
Hi Nigel,
That's one sample group from 16 peers. First, there are so many different encodings: I've got ASCII, UTF-8, ISO-8859-1, WINDOWS-1252, and even one identifying as GB18030.
Next, 8 servers agree on one description, 3 on another, 2 more on
yet another, and finally 3 think the group is moderated.
How did things get into such a mixed-up state?
Because there originally wasn't any standard for the encoding of
control articles. Most of them did not declare anything (the usual
encoding locally used by the sender was assumed - like gb18030 for
cn.*, koi8-u for ukr.* [my sympathy to them!], big5 for tw.*,
iso-8859-15 for fr.*, cp1252 for most of the others, etc.).
Only "recently" a new version of the standard recommended the use of
UTF-8.
That's why you end up seeing mixed and inconsistent encodings on existing
news servers. Not all of them run a version that implements the new interoperable state of the art (UTF-8) for parsing control articles. And if
the descriptions pre-date the receipt of new control articles, not
all news administrators have manually converted the
descriptions to UTF-8. (No blame in my sentence, just a fact.)
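To illustrate, here is a hedged Python sketch of how a script might decode a description of unknown origin: try UTF-8 first, then fall back to the legacy charset typically used in that hierarchy. The mapping is only the handful of examples mentioned above, not an authoritative table; anything else falls back to cp1252. Trying UTF-8 first usually works because 8-bit legacy text rarely happens to be valid UTF-8, but it remains a heuristic.

```python
# Hypothetical hierarchy -> legacy charset table, built only from the examples
# in the message above; it is neither complete nor authoritative.
LEGACY_CHARSETS = {
    "cn.": "gb18030",
    "ukr.": "koi8-u",
    "tw.": "big5",
    "fr.": "iso-8859-15",
}
DEFAULT_LEGACY = "cp1252"


def decode_description(group, raw):
    """Decode description bytes: UTF-8 if valid, else a per-hierarchy guess."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for prefix, charset in LEGACY_CHARSETS.items():
        if group.startswith(prefix):
            # errors="replace" keeps the line usable even if the guess is wrong
            return raw.decode(charset, errors="replace")
    return raw.decode(DEFAULT_LEGACY, errors="replace")
```

Re-encoding the result as UTF-8 when writing the local newsgroups file would then give the homogenized file discussed in this thread.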
What is even worse, when trying to automate this, is when the
majority of servers have the wrong description, or it's half and
half.
Just use https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.utf8
:)
Yes, we've sort of had this discussion before about encoding. This one
is more about the inconsistency of the labeling of the groups.
In the newsgroups list above, pretty much every group whose description contains characters outside A-Z is garbled.
Probably because it's ISO-8859 when I'm using UTF-8. The cn.* groups
are definitely garbled.
I'll just do my best to make a valid UTF-8 file for my server.
Hi Nigel,
That's a good start but I still have 36,519 groups in my active file
that aren't in your list.
Not all newsgroups have a description. Amongst these 36,519 groups,
do you already have a description in your own news server, or do you
see a valid description in another server?
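One way to answer that is to list the groups in the active file that have no entry (or an empty one) in the newsgroups file, then look those names up on other peers. A sketch assuming INN-style files (active: `name high low flag` per line; newsgroups: name, whitespace, description); the function names are hypothetical.

```python
def active_groups(active_text):
    """Group names from an INN-style active file: 'name high low flag' per line."""
    names = set()
    for line in active_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            names.add(fields[0])
    return names


def described_groups(newsgroups_text):
    """Group names that have a non-empty description in a newsgroups file."""
    names = set()
    for line in newsgroups_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip():
            names.add(parts[0])
    return names


def missing_descriptions(active_text, newsgroups_text):
    """Sorted list of active groups with no description."""
    return sorted(active_groups(active_text) - described_groups(newsgroups_text))
```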
It would be interesting to know whether fr.bienvenue is still
declared moderated in the active file of the news servers that have "(Moderated)" at the end of its description. It may just be that
they processed the newgroup control article once sent to unmoderate
it, but did not update the description.
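On a server you control, this can be checked by comparing the status flag in the active file (`m` for moderated, `y` for unmoderated in INN-style files) with the "(Moderated)" suffix of the description. A sketch under those assumptions, with hypothetical function names; other flags (such as `n` or `x`) are simply treated as "not moderated".

```python
def active_flags(active_text):
    """Map group -> status flag from an INN-style active file ('name high low flag')."""
    flags = {}
    for line in active_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            flags[fields[0]] = fields[3]
    return flags


def moderation_mismatches(active_text, newsgroups_text):
    """List (group, active flag, description says moderated) where the two disagree."""
    flags = active_flags(active_text)
    mismatches = []
    for line in newsgroups_text.splitlines():
        parts = line.split(None, 1)
        if not parts or parts[0] not in flags:
            continue  # blank line, or group not carried locally
        name = parts[0]
        desc = parts[1].rstrip() if len(parts) > 1 else ""
        says_moderated = desc.endswith("(Moderated)")
        is_moderated = flags[name] == "m"
        if says_moderated != is_moderated:
            mismatches.append((name, flags[name], says_moderated))
    return mismatches
```

A result such as `("fr.bienvenue", "y", True)` would match the situation described above: the group was unmoderated but the old description was kept.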
In fact, the newsgroup list from GitHub was properly encoded in UTF-8,
but your browser did not use UTF-8 to render it, for a reason I do
not know. You might have to force the charset in your browser.
The HTTP headers correctly have:
Content-Type: text/plain; charset=utf-8
Does it appear better with this version?
http://usenet.trigofacile.com/hierarchies/data/newsgroups.utf8
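When scripting the download instead of using a browser, the bytes can be decoded with the charset declared in that header. A small sketch using only Python's standard library: `email.message.Message` is used here just to parse the header parameter (response headers from `urllib.request` expose the same `get_content_charset()` method). The helper names are hypothetical, and fetching the file is left out.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, lowercased, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_body(body, content_type):
    """Decode downloaded bytes with the charset the server declared."""
    return body.decode(charset_from_content_type(content_type), errors="replace")
```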
It would be interesting to know whether fr.bienvenue is still
declared moderated in the active file of the news servers that have
"(Moderated)" at the end of its description. It may just be that
they processed the newgroup control article once sent to unmoderate
it, but did not update the description.
2 out of the 3 still have it as moderated.
Do you happen to know whether they honour control articles?
Do they manage their newsgroup list by hand?
If you know how to contact these 2 news admins who still have
fr.bienvenue marked as moderated, could you ask them?
I bet this is not the only discrepancy in their servers... Did they
reflect the latest changes in the Big-8?
I expect there are many discrepancies. I couldn't tell you if they have [...]
I could try to send a "booster" for the unmoderation of fr.bienvenue
(dating back to 2011!), but I doubt they have the current PGP key for
fr.* (which changed in 2020, as the previous one, unused for
several years, was lost).