> You can download the articles from your NNTP provider of all groups and
I thought this wouldn't be too much work. But I'm now on hour 4 of
downloading, after some failed starts and restarts ... and that's with a
limit of 4000 groups and 200 articles per group.
Hopefully eternal-september.org doesn't think this is abuse.
> let a software create statistics.
"let" feels a bit ambitious, but I didn't ask ChatGPT, so maybe it could
have been simpler. Instead I hacked together some Perl and R.
Details and code are on
https://willforan.github.io/usenet/
Some highlights:
For 2023, I took the top 4 user-agents seen in each of the big8+sfnet
hierarchies and calculated each one's % of that hierarchy's total
articles (counting only articles that have a user-agent).
| top   | TOTAL |  G2 | Mozilla | Xnews | ForteAgent | slrn | Gnus | XanaNews |
|-------+-------+-----+---------+-------+------------+------+------+----------|
| alt   | 24382 | 45% |     20% |   11% |         4% |    0 |    0 |        0 |
| soc   |  4966 | 74% |     10% |    5% |         3% |    0 |    0 |        0 |
| comp  |  4693 | 46% |     26% |     0 |          0 |   5% |   4% |        0 |
| sci   |  2697 | 62% |     17% |     0 |         4% |   3% |    0 |        0 |
| misc  |   855 | 27% |     28% |   13% |         6% |    0 |    0 |        0 |
| talk  |   840 | 50% |     19% |    5% |         7% |    0 |    0 |        0 |
| sfnet |   405 | 42% |     38% |     0 |          0 |    0 |  12% |       4% |
| news  |    35 |  8% |     23% |     0 |         8% |    0 |  31% |        0 |
I think G2 is Google Groups. That and Mozilla easily account for the
bulk of newswriters.
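The share calculation behind a table like this can be sketched roughly as
follows. This is a minimal Python sketch under assumptions, not the Perl/R
from the page linked above: the function name top_agent_shares, the
(hierarchy, agent) input pairs and the sample data are all made up for
illustration, and articles without a user-agent are assumed to be dropped
before the totals are taken.

```python
from collections import Counter, defaultdict

def top_agent_shares(records, top_n=4):
    """records: iterable of (hierarchy, agent) pairs.

    agent is None when the article carried no user-agent header;
    those articles are skipped and do not count toward the total.
    Returns {hierarchy: {agent: rounded percent}} for the top_n agents.
    """
    counts = defaultdict(Counter)
    for hierarchy, agent in records:
        if agent is not None:
            counts[hierarchy][agent] += 1

    shares = {}
    for hierarchy, agent_counts in counts.items():
        total = sum(agent_counts.values())
        shares[hierarchy] = {
            agent: round(100 * n / total)
            for agent, n in agent_counts.most_common(top_n)
        }
    return shares

# Made-up sample, NOT the real data from the table.
sample = (
    [("comp", "G2")] * 5
    + [("comp", "Mozilla")] * 3
    + [("comp", "slrn")] * 1
    + [("comp", "Gnus")] * 1
    + [("comp", None)] * 4
)
print(top_agent_shares(sample))
```

Because each hierarchy keeps only its own top 4, an agent that misses the
cut in one hierarchy simply has no entry there.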
Because I have full text, I ran a fun but ultimately pointless
sentiment analysis per user-agent. The n_* columns might be useful
to someone else though!
* n_groups is the number of groups that user agent has posted to
* n_email is the count of unique emails using that client
* n_articles is the number of messages sent
* wrd_article is the average words per message sent with the client
* "afn" is the AFINN single-word score averaged within articles:
  a -5 to +5 sentiment score
| agent               |   afn | n_groups | n_email | n_articles | wrd_article |
|---------------------+-------+----------+---------+------------+-------------|
| NeoMutt             |  0.81 |       18 |      14 |        160 |         271 |
| K-9                 |  0.78 |       16 |      17 |         50 |         203 |
| Pluto               |  0.76 |        2 |      13 |         83 |        56.3 |
| 40tude_Dialog       |  0.67 |       47 |      27 |        656 |        70.5 |
| Messenger-Pro       |  0.67 |        3 |      13 |         88 |        58.5 |
| Evolution           |  0.64 |       37 |      54 |        447 |       203.4 |
| Turnpike            |  0.62 |       13 |      19 |        245 |        82.3 |
| Mutt                |   0.6 |       33 |      34 |        266 |       250.3 |
| Usenapp             |  0.52 |       37 |      24 |        266 |          59 |
| G2                  |  0.45 |      616 |    2634 |      29367 |      1151.5 |
| Roundcube           |  0.44 |       13 |      11 |         42 |       250.1 |
| Gnus                |  0.38 |       94 |      83 |        782 |       141.5 |
| Thoth               |  0.35 |       25 |      11 |        140 |        64.6 |
| Mozilla             |  0.33 |      512 |    1318 |      19797 |       225.3 |
| XanaNews            |  0.31 |       27 |      12 |        151 |        37.1 |
| Unison              |   0.3 |       36 |      25 |        147 |        59.5 |
| NONE                |  0.26 |      579 |    2313 |      25885 |       281.7 |
| NewsTap             |  0.25 |      116 |      62 |       1160 |       121.3 |
| MicroPlanet-Gravity |  0.15 |       61 |      33 |        641 |         104 |
| ForteAgent          |  0.12 |      177 |     145 |       2345 |        88.8 |
| Pan                 |   0.1 |      144 |      90 |       1132 |      1294.6 |
| Alpine              |  0.08 |       10 |      11 |         42 |        94.4 |
| slrn                |  0.08 |      135 |      95 |       1022 |        70.7 |
| tin                 |  0.08 |       90 |      32 |        631 |        90.7 |
| Hogwasher           |  0.03 |      121 |      21 |        812 |        83.3 |
| Mime                | -0.33 |       44 |      62 |         86 |         219 |
| Xnews               | -0.43 |      172 |     275 |       1724 |         216 |
| Nemo                | -0.54 |       26 |      24 |        352 |        92.2 |
| MacCafe             | -0.84 |       12 |      13 |        343 |       115.9 |
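For anyone unfamiliar with AFINN-style scoring, the idea can be sketched
in a few lines. This is only a Python illustration under assumptions, not
the code from the page linked above: TOY_LEXICON is a tiny placeholder
rather than the real AFINN word list, the function names are invented,
and it assumes an article's score is the mean over the words found in the
lexicon, with the per-agent figure being the mean of those article scores.

```python
import re
from statistics import mean

# Tiny placeholder lexicon; the real AFINN list has a few thousand
# words, each with an integer score from -5 to +5.
TOY_LEXICON = {"good": 3, "great": 3, "bad": -3, "awful": -3}

def article_score(text, lexicon=TOY_LEXICON):
    """Mean score of the words found in the lexicon, or None if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return mean(scores) if scores else None

def agent_scores(articles, lexicon=TOY_LEXICON):
    """articles: iterable of (agent, text) pairs.

    Returns {agent: mean of that agent's article scores}. Articles with
    no scored words are skipped (an assumption, see lead-in).
    """
    per_agent = {}
    for agent, text in articles:
        score = article_score(text, lexicon)
        if score is not None:
            per_agent.setdefault(agent, []).append(score)
    return {agent: mean(vals) for agent, vals in per_agent.items()}

# Made-up posts, NOT real articles.
posts = [
    ("slrn", "This is a good thread, great even."),
    ("slrn", "Bad idea."),
    ("tin", "No opinion here."),
]
print(agent_scores(posts))
```

Single-word scoring like this ignores negation and quoting, so "not good"
and a quoted insult both count at face value, which is one reason to treat
the afn column as a curiosity.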
And while I had the data, I did a quick pass at the most positive and
most negative groups. Most positive:
| folder                   |  afn | n_email | n_articles | wrd_article |
|--------------------------+------+---------+------------+-------------|
| alt/alien/visitors       | 1.99 |       8 |        200 |      2055.4 |
| alt/support/stop-smoking |  1.6 |      13 |         43 |        38.7 |
| it/sport/formula1        | 1.52 |      17 |        200 |        66.4 |
| it/comp/os/win/windows10 |  1.5 |      46 |        200 |        65.4 |
| alt/sewing               | 1.46 |      21 |         64 |       226.7 |
| it/hobby/elettronica     | 1.46 |      27 |        202 |        57.8 |
| it/sport/motociclismo    | 1.43 |      21 |        200 |        60.5 |
| alt/html                 | 1.42 |      13 |         32 |       107.4 |
| it/comp/console          | 1.39 |      32 |        200 |        53.1 |
| dc/jobs                  | 1.38 |       9 |         76 |       192.8 |
Most negative:
| folder                                |   afn | n_email | n_articles | wrd_article |
|---------------------------------------+-------+---------+------------+-------------|
| alt/games/microsoft/flight-sim        | -1.23 |       8 |        200 |       212.2 |
| alt/online-service/webtv              | -0.95 |      14 |         73 |       186.4 |
| alt/sports/football/pro/buffalo-bills | -0.95 |       9 |         19 |       171.9 |
| soc/culture/scottish                  | -0.94 |       9 |        139 |       993.1 |
| alt/crime                             |  -0.9 |      46 |        189 |       271.1 |