• newsgroup posting stats; User-Agent counts?

    From Will@21:1/5 to All on Sun Aug 13 17:04:27 2023
    I'm poking around the per-newsgroup article counts plotted at http://www.eternal-september.org/postingstats.php

    Is anyone aware of public summary statistics/datasets like this,
    but in text instead of images?

    In addition to hierarchy popularity, I'd love to see/generate histograms
    of the User-Agent header.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco Moock@21:1/5 to All on Tue Aug 15 17:54:23 2023
    On 13.08.2023 at 17:04:27, Will wrote:

    > In addition to hierarchy popularity, I'd love to see/generate histograms
    > of the User-Agent header.

    You can download the articles from your NNTP provider of all groups and
    let a software create statistics.
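
A minimal sketch of the "statistics" half of that suggestion, assuming the articles have already been fetched and are available as raw text (headers, blank line, body). The function names and the "NONE" label for articles without the header are illustrative choices, not anything the posters specified.

```python
from collections import Counter
from email import message_from_string


def count_user_agents(raw_articles):
    """Tally the User-Agent product name over an iterable of raw article texts."""
    counts = Counter()
    for raw in raw_articles:
        headers = message_from_string(raw)
        agent = headers.get("User-Agent") or ""
        # Keep only the leading product token, e.g. "slrn/1.0.3 (Linux)" -> "slrn".
        tokens = agent.split("/")[0].split()
        counts[tokens[0] if tokens else "NONE"] += 1
    return counts


def text_histogram(counts, width=40):
    """Render the counts as a plain-text bar chart, most common agent first."""
    top = max(counts.values(), default=0)
    lines = []
    for agent, n in counts.most_common():
        bar = "#" * max(1, round(width * n / top))
        lines.append(f"{agent:<12} {n:>6} {bar}")
    return "\n".join(lines)
```

Calling `print(text_histogram(count_user_agents(articles)))` would give the text-instead-of-images view asked for at the top of the thread. Splitting on the first "/" is crude: agent strings with spaces in the product name would need extra handling.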

  • From Will@21:1/5 to All on Sat Aug 19 22:03:45 2023
    > You can download the articles from your NNTP provider of all groups and

    I thought this wouldn't be too much work. But I'm now on hour 4 of
    downloading after some failed starts and restarts ... and I've limited
    myself to 4000 groups and 200 articles per group.

    Hopefully eternal-september.org doesn't think this is abuse.
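
One way to keep such a download bounded is to request only a fixed-size window of article numbers per group. The sketch below assumes the window is the newest 200 articles (the post does not say which 200 were taken) and that the first and last article numbers come from the server's GROUP response; `last_n_range` is a hypothetical helper name.

```python
def last_n_range(first, last, n=200):
    """Return (low, high) article numbers covering at most the newest n articles.

    first/last are the low and high article numbers reported for the group.
    Returns None when the group looks empty (last below first).
    """
    if last < first:
        return None
    return max(first, last - n + 1), last
```

Article numbers inside the range may have gaps (cancelled or expired articles), so the window yields at most n articles, not exactly n.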

    > let a software create statistics.

    "let" feels a bit ambitious, but I didn't ask ChatGPT so maybe it could
    have been simpler. Instead I hacked together some Perl and R.

    Details and code are on https://willforan.github.io/usenet/

    Some highlights:

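    For 2023, looking only at the top 4 user agents seen in each hierarchy
    (big8+sfnet), I calculated each agent's percentage of that hierarchy's
    total articles (counting only articles with a User-Agent header).
    A 0 means the agent was not among that hierarchy's top 4.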

    | top   | TOTAL | G2  | Mozilla | Xnews | ForteAgent | slrn | Gnus | XanaNews |
    |-------+-------+-----+---------+-------+------------+------+------+----------|
    | alt   | 24382 | 45% | 20%     | 11%   | 4%         | 0    | 0    | 0        |
    | soc   |  4966 | 74% | 10%     | 5%    | 3%         | 0    | 0    | 0        |
    | comp  |  4693 | 46% | 26%     | 0     | 0          | 5%   | 4%   | 0        |
    | sci   |  2697 | 62% | 17%     | 0     | 4%         | 3%   | 0    | 0        |
    | misc  |   855 | 27% | 28%     | 13%   | 6%         | 0    | 0    | 0        |
    | talk  |   840 | 50% | 19%     | 5%    | 7%         | 0    | 0    | 0        |
    | sfnet |   405 | 42% | 38%     | 0     | 0          | 0    | 12%  | 4%       |
    | news  |    35 | 8%  | 23%     | 0     | 8%         | 0    | 31%  | 0        |

    I think G2 is Google Groups. That and Mozilla easily account for the
    bulk of newswriters.
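
For anyone wanting to rebuild a table like the one above, here is a rough sketch of the per-hierarchy share calculation. It is not the Perl/R code from the linked page; `hierarchy` and `top_agent_shares` are hypothetical names, and the input is assumed to be one (hierarchy, agent) pair per article that carries a User-Agent header.

```python
import re
from collections import Counter, defaultdict


def hierarchy(group):
    """Top-level hierarchy of a group name written with dots or slashes."""
    return re.split(r"[./]", group)[0]


def top_agent_shares(records, top_n=4):
    """Percentage share of the top_n agents within each hierarchy.

    records: iterable of (hierarchy, agent) pairs, one per article.
    Returns {hierarchy: {agent: rounded percent}}.
    """
    per_hier = defaultdict(Counter)
    for hier, agent in records:
        per_hier[hier][agent] += 1
    shares = {}
    for hier, counts in per_hier.items():
        total = sum(counts.values())
        shares[hier] = {
            agent: round(100 * n / total)
            for agent, n in counts.most_common(top_n)
        }
    return shares
```

Agents outside a hierarchy's top N simply do not appear in its inner dict, which corresponds to the 0 cells in the table.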



    Because I have full text, I ran a fun but ultimately pointless
    sentiment analysis per user-agent. The n_* columns might be useful
    to someone else though!
    * n_groups is the number of groups that user agent has posted to
    * n_email is the count of unique email addresses using that client
    * n_articles is the number of messages sent
    * wrd_article is the average words per message sent with that client
    * afn is the AFINN single-word score averaged within articles,
      on a -5 to +5 sentiment scale

    | agent               | afn   | n_groups | n_email | n_articles | wrd_article |
    |---------------------+-------+----------+---------+------------+-------------|
    | NeoMutt             | 0.81  | 18       | 14      | 160        | 271         |
    | K-9                 | 0.78  | 16       | 17      | 50         | 203         |
    | Pluto               | 0.76  | 2        | 13      | 83         | 56.3        |
    | 40tude_Dialog       | 0.67  | 47       | 27      | 656        | 70.5        |
    | Messenger-Pro       | 0.67  | 3        | 13      | 88         | 58.5        |
    | Evolution           | 0.64  | 37       | 54      | 447        | 203.4       |
    | Turnpike            | 0.62  | 13       | 19      | 245        | 82.3        |
    | Mutt                | 0.6   | 33       | 34      | 266        | 250.3       |
    | Usenapp             | 0.52  | 37       | 24      | 266        | 59          |
    | G2                  | 0.45  | 616      | 2634    | 29367      | 1151.5      |
    | Roundcube           | 0.44  | 13       | 11      | 42         | 250.1       |
    | Gnus                | 0.38  | 94       | 83      | 782        | 141.5       |
    | Thoth               | 0.35  | 25       | 11      | 140        | 64.6        |
    | Mozilla             | 0.33  | 512      | 1318    | 19797      | 225.3       |
    | XanaNews            | 0.31  | 27       | 12      | 151        | 37.1        |
    | Unison              | 0.3   | 36       | 25      | 147        | 59.5        |
    | NONE                | 0.26  | 579      | 2313    | 25885      | 281.7       |
    | NewsTap             | 0.25  | 116      | 62      | 1160       | 121.3       |
    | MicroPlanet-Gravity | 0.15  | 61       | 33      | 641        | 104         |
    | ForteAgent          | 0.12  | 177      | 145     | 2345       | 88.8        |
    | Pan                 | 0.1   | 144      | 90      | 1132       | 1294.6      |
    | Alpine              | 0.08  | 10       | 11      | 42         | 94.4        |
    | slrn                | 0.08  | 135      | 95      | 1022       | 70.7        |
    | tin                 | 0.08  | 90       | 32      | 631        | 90.7        |
    | Hogwasher           | 0.03  | 121      | 21      | 812        | 83.3        |
    | Mime                | -0.33 | 44       | 62      | 86         | 219         |
    | Xnews               | -0.43 | 172      | 275     | 1724       | 216         |
    | Nemo                | -0.54 | 26       | 24      | 352        | 92.2        |
    | MacCafe             | -0.84 | 12       | 13      | 343        | 115.9       |
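
The afn column can be approximated along these lines. This is one plausible reading of "single word scores averaged within articles", not the original Perl/R code: each article's score is the mean of its scored words, and each agent's score is the mean over its scored articles. `TOY_LEXICON` is a made-up stand-in with illustrative values; a real run would load the published AFINN word list instead.

```python
import re
from statistics import mean

# Illustrative stand-in only; replace with the real AFINN list (word -> -5..+5).
TOY_LEXICON = {"good": 3, "love": 3, "bad": -3, "hate": -3}


def article_score(text, lexicon=TOY_LEXICON):
    """Mean lexicon score of the scored words in one article, or None if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return mean(scores) if scores else None


def agent_sentiment(articles, lexicon=TOY_LEXICON):
    """articles: iterable of (agent, body_text). Returns {agent: mean article score}."""
    per_agent = {}
    for agent, text in articles:
        score = article_score(text, lexicon)
        if score is not None:
            per_agent.setdefault(agent, []).append(score)
    return {agent: mean(scores) for agent, scores in per_agent.items()}
```

Articles with no scored words are skipped here instead of counted as 0; counting them as neutral would pull every average toward zero, so the choice matters when comparing against the table.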


    And while I had the data, I did a quick pass at the most positive and
    most negative groups

    | folder                   | afn  | n_email | n_articles | wrd_article |
    |--------------------------+------+---------+------------+-------------|
    | alt/alien/visitors       | 1.99 | 8       | 200        | 2055.4      |
    | alt/support/stop-smoking | 1.6  | 13      | 43         | 38.7        |
    | it/sport/formula1        | 1.52 | 17      | 200        | 66.4        |
    | it/comp/os/win/windows10 | 1.5  | 46      | 200        | 65.4        |
    | alt/sewing               | 1.46 | 21      | 64         | 226.7       |
    | it/hobby/elettronica     | 1.46 | 27      | 202        | 57.8        |
    | it/sport/motociclismo    | 1.43 | 21      | 200        | 60.5        |
    | alt/html                 | 1.42 | 13      | 32         | 107.4       |
    | it/comp/console          | 1.39 | 32      | 200        | 53.1        |
    | dc/jobs                  | 1.38 | 9       | 76         | 192.8       |

    negative
    | folder                                | afn   | n_email | n_articles | wrd_article |
    |---------------------------------------+-------+---------+------------+-------------|
    | alt/games/microsoft/flight-sim        | -1.23 | 8       | 200        | 212.2       |
    | alt/online-service/webtv              | -0.95 | 14      | 73         | 186.4       |
    | alt/sports/football/pro/buffalo-bills | -0.95 | 9       | 19         | 171.9       |
    | soc/culture/scottish                  | -0.94 | 9       | 139        | 993.1       |
    | alt/crime                             | -0.9  | 46      | 189        | 271.1       |

  • From Marco Moock@21:1/5 to All on Mon Aug 21 15:46:04 2023
    On 19.08.2023 at 22:03:45, Will wrote:

    > Hopefully eternal-september.org doesn't think this is abuse.

    You will be in the top 100 access clients in tomorrow's statistics.

  • From Marco Moock@21:1/5 to All on Tue Aug 22 18:24:01 2023
    On 21.08.2023 at 15:46:04, Marco Moock wrote:

    > On 19.08.2023 at 22:03:45, Will wrote:
    >
    > > Hopefully eternal-september.org doesn't think this is abuse.
    >
    > You will be in the top 100 access clients in tomorrow's statistics.

    153.100.174.20 https://www.eternal-september.org/stats/news-notice.2023.08.21-04.00.01.html#nnrpd_groups

  • From Will@21:1/5 to All on Tue Aug 22 20:50:43 2023
    > You will be in the top 100 access clients in tomorrow's statistics.
    > 153.100.174.20 https://www.eternal-september.org/stats/news-notice.2023.08.21-04.00.01.html#nnrpd_groups

    That's a bit disappointing! I barely show up in the logs so far!

    The biggest entry I can find is a whopping 411.4 KB.
    I'm 2600:4041:d0::/46 or 74.111.96.0/19, Verizon/Pittsburgh, PA.

    153.100.174.20 is German?

    [I don't have a good grip on how routing to/between nntp servers works,
    but I don't think I'm doing anything funny that'd obscure my ip.]
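
Whether a logged address belongs to the prefixes mentioned in this post can be checked mechanically. A small sketch using Python's standard `ipaddress` module; `MY_NETWORKS` and `in_my_networks` are illustrative names, and the prefixes are the ones quoted above.

```python
import ipaddress

# Prefixes quoted in the post above.
MY_NETWORKS = [
    ipaddress.ip_network("2600:4041:d0::/46"),
    ipaddress.ip_network("74.111.96.0/19"),
]


def in_my_networks(addr, networks=MY_NETWORKS):
    """True if addr (IPv4 or IPv6 string) falls inside any of the given networks."""
    ip = ipaddress.ip_address(addr)
    # Only compare addresses and networks of the same IP version.
    return any(ip.version == net.version and ip in net for net in networks)
```

With these prefixes, `in_my_networks("153.100.174.20")` is false, which matches the suspicion that the logged address belongs to someone else.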

  • From Marco Moock@21:1/5 to All on Wed Aug 23 07:42:26 2023
    On 22.08.2023 at 20:50:43, Will wrote:

    > > You will be in the top 100 access clients in tomorrow's statistics.
    > >
    > > 153.100.174.20 https://www.eternal-september.org/stats/news-notice.2023.08.21-04.00.01.html#nnrpd_groups
    >
    > That's a bit disappointing! I barely show up in the logs so far!
    >
    > The biggest entry I can find is a whopping 411.4 KB.
    > I'm 2600:4041:d0::/46 or 74.111.96.0/19, Verizon/Pittsburgh, PA.
    >
    > 153.100.174.20 is German?

    According to whois it is in Australia.

    > [I don't have a good grip on how routing to/between nntp servers
    > works, but I don't think I'm doing anything funny that'd obscure my
    > ip.]

    It is called peering and peers simply exchange the news articles they
    have.
