I have used Giganews for grabbing old articles, but their retention only
reaches back to 2004. Does anyone have older text retention available over
NNTP (i.e. not Google Groups or web archives)? I would love to slurp/archive
anything not stored on the major commercial providers.
If so, can you give a rough idea of the disk usage and storage backend?
I have seen people mention 50 MB/day recently, based on eternal-september
stats, so assuming the average daily volume has been static since 1980, it
should be under 1 TB.
If not, I am planning to inject articles from archive.org and anywhere else
I can find them.
Are there any issues with injecting posts from 30 years ago? I don't peer
with anyone, but if I can get everything imported and renumbered correctly
for my local reader to understand, I might consider peering or making a
public NNTP connection available.
-------------
ZMarkGC
ZMarkGC wrote:
I have used giganews for grabbing old articles, but they only reach
2004. Does anyone have older text retention available over NNTP (i.e not
google newsgroups or web archives). I would love to slurp/archive
anything not stored on the major commercial providers.
If so, can you give a rough disk usage and storage backend?
I have seen people mention 50mb/day recently based on eternal-september
stats, so assuming the average daily usage is static since 1980, it
should be under 1TB.
If not, I am planning to inject articles from archive.org and anywhere
else I can find them.
Are there any issues with injecting posts from 30 years ago? I don't
peer with anyone but if I can get everything imported and renumbered
correctly for my local reader to understand, I might consider peering or
making a public NNTP connection available.
It's been a while since I looked at them, but I grabbed some old archives
and took a look. The oldest ones I found (some were universities sending
their first test article) had some differences in headers.
I can't remember the specifics right now, but it would take some (probably
simple) scripting to modify them to work correctly with current news servers.
I found an example:
----------
Autzoo.101
test
utzoo!henry
Fri Feb 6 00:19:47 1981
first_test
This is the first U of T test of the Duke news program.
Here is some more text.
And some more.
----------
The newer the article, the less work its headers need in order to post
properly.
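For what it's worth, the munging described above can be sketched in a few lines of Python. The field layout (ID line, newsgroup, bang path, ctime-style date, subject, then body) is inferred from the example article; the Message-ID domain is a made-up placeholder, and a real conversion would need to handle many more variants:

```python
import calendar
import email.utils
import time

def anews_to_rfc5536(text, id_suffix="archive.invalid"):
    """Sketch: convert an A-news era article (layout as in the 1981
    example above) into a modern RFC 5536-style article."""
    lines = text.splitlines()
    # First line is "A" plus the article ID, e.g. "Autzoo.101".
    art_id = lines[0][1:] if lines[0].startswith("A") else lines[0]
    group, path, date_str, subject = lines[1], lines[2], lines[3], lines[4]
    body = "\n".join(lines[5:])
    # ctime-style dates carry no timezone; assume UTC.
    ts = calendar.timegm(time.strptime(date_str, "%a %b %d %H:%M:%S %Y"))
    hops = path.split("!")          # bang path: last hop is the user login
    user = hops[-1]
    site = hops[-2] if len(hops) > 1 else "unknown"
    headers = [
        ("Path", path),
        ("From", "%s@%s.UUCP" % (user, site)),
        ("Newsgroups", group),
        ("Subject", subject),
        ("Message-ID", "<%s@%s>" % (art_id, id_suffix)),
        ("Date", email.utils.formatdate(ts, usegmt=True)),
    ]
    return "".join("%s: %s\n" % kv for kv in headers) + "\n" + body + "\n"
```

Running it on the example article yields a header block that a modern server will at least parse, which matches the observation that older articles need more scripting than newer ones.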
On Sun, 14 May 2023 22:56:39 +0100
ZMarkGC <ZMarkGC@example.com> wrote:
I have used giganews for grabbing old articles, but they only reach
2004. Does anyone have older text retention available over NNTP (i.e not
google newsgroups or web archives). I would love to slurp/archive
anything not stored on the major commercial providers.
If so, can you give a rough disk usage and storage backend?
I have seen people mention 50mb/day recently based on eternal-september
stats, so assuming the average daily usage is static since 1980, it
should be under 1TB.
If not, I am planning to inject articles from archive.org and anywhere
else I can find them.
https://www.xach.com/naggum/articles/notes.html has a link to a
comp.lang.lisp archive, http://data.xach.com.s3.amazonaws.com/cll.txt.gz .
This, I think, is close to what you're asking, but specific to one newsgroup.
The earliest posts are from 1987. The moderator of comp.compilers also keeps
a comprehensive archive going back to the 1990s. You can find it with a bit
of googling.
Are there any issues with injecting posts from 30 years ago? I don't
peer with anyone but if I can get everything imported and renumbered
correctly for my local reader to understand, I might consider peering or
making a public NNTP connection available.
A public NNTP connection to such an archive would be amazing.
I removed the 'Relay-Version', 'Posting-Version' and 'Date-Received'
headers. Now they post, with one exception: I still get '441 Can't set
system Xref header field' on some articles, but it is a minority of them.
If anyone has suggestions on the above error (Xref), I'd be glad to try to
get those articles to post as well.
Hi Retro Guy,
Removed 'Relay-Version', 'Posting-Version' and 'Date-Received' headers.
Now they post except for one exception. I still get '441 Can't set
system Xref header field' on some articles, but it is a minority of
them.
If anyone has suggestions on the above error (Xref), I'd be glad to try
to get those articles to post also.
I would just suggest removing existing Xref header fields, like you did for
Relay-Version et al.
I bet you'll find that the more recent the articles are, the more header
fields you'll need to add to the removal list, as they are not supposed to
be present in posted articles.
Think of X-Trace, X-Complaints-To, NNTP-Posting-Host, Injection-Info, etc.
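A minimal sketch of that cleanup in Python, assuming each article is a plain string with LF line endings and a blank line between headers and body; the field list is just the ones named in this thread and would grow as the server rejects more:

```python
# Header fields the injecting server wants to generate itself, so they
# must not be present when re-posting archived articles via a reader
# (POST) connection.  Extend this set as needed.
STRIP = {
    "relay-version", "posting-version", "date-received", "xref",
    "x-trace", "x-complaints-to", "nntp-posting-host", "injection-info",
}

def clean_headers(article):
    head, sep, body = article.partition("\n\n")
    kept, skipping = [], False
    for line in head.splitlines():
        if line[:1] in (" ", "\t"):
            # Continuation line: keep it only if its field was kept.
            if not skipping:
                kept.append(line)
            continue
        name = line.split(":", 1)[0].strip().lower()
        skipping = name in STRIP
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + sep + body
```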
Spiros Bousbouras wrote:
On Sun, 14 May 2023 22:56:39 +0100
ZMarkGC <ZMarkGC@example.com> wrote:
I have used giganews for grabbing old articles, but they only reach
2004. Does anyone have older text retention available over NNTP (i.e not
google newsgroups or web archives). I would love to slurp/archive
anything not stored on the major commercial providers.
If so, can you give a rough disk usage and storage backend?
I have seen people mention 50mb/day recently based on eternal-september
stats, so assuming the average daily usage is static since 1980, it
should be under 1TB.
If not, I am planning to inject articles from archive.org and anywhere
else I can find them.
https://www.xach.com/naggum/articles/notes.html has a link to a
comp.lang.lisp archive, http://data.xach.com.s3.amazonaws.com/cll.txt.gz .
This I think is close to what you're asking but specific to one newsgroup.
Earliest posts are from 1987. The moderator of comp.compilers also keeps a
comprehensive archive going back to the 1990s. You can find it with a bit
of googling.
Are there any issues with injecting posts from 30 years ago? I don't
peer with anyone but if I can get everything imported and renumbered
correctly for my local reader to understand, I might consider peering or
making a public NNTP connection available.
A public NNTP connection to such an archive would be amazing.
I've taken some time to modify some articles so that inn2 will accept them.
These are all from the 1980s.
I needed to change the Date: format, so all the articles now carry my
timezone (MST); the dates and times are correct, just in the wrong timezone.
I also removed the 'Relay-Version', 'Posting-Version' and 'Date-Received'
headers.
Now they post, with one exception: I still get '441 Can't set system Xref
header field' on some articles, but it is a minority of them.
I've started with the can.* hierarchy, and will continue through the rest of
what I have (which is a lot), but it will take me a long time to complete.
You are free to view and/or pull the articles from news.novalink.us:119 if
you are interested. It will probably take me most of the summer to get it
all done, as I don't have a ton of free time to work on it, but I want to
complete it at some point.
If anyone has suggestions on the above error (Xref), I'd be glad to try to
get those articles to post as well.
No account required to read at news.novalink.us:119.
I bet you'll find out that the more recent the articles are, the more
header fields you'll need adding in the list to remove as they are not
supposed to be present in posted articles.
Like X-Trace, X-Complaints-To, NNTP-Posting-Host, Injection-Info, etc.
When I use suck/pullnews, articles with these headers come in with no issue.
Is this due to a difference in the way the message gets to INN?
Hi Jesse,
I bet you'll find out that the more recent the articles are, the more
header fields you'll need adding in the list to remove as they are not
supposed to be present in posted articles.
Like X-Trace, X-Complaints-To, NNTP-Posting-Host, Injection-Info, etc.
When I use suck/pullnews, articles with these headers come in with no issue,
is this due to a difference in the way the message gets to INN?
Yes: you've configured incoming.conf so that your suck/pullnews connections
are handled by innd.
Retro Guy uses nnrpd. He may want to try feeding innd instead; that's a good
idea (hoping it won't complain about missing headers).
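The distinction comes down to which daemon answers the connection: innd accepts feeds from hosts listed in incoming.conf, while connections from unknown hosts are handed to nnrpd as reader sessions. A minimal entry letting local suck/pullnews runs feed innd directly might look like the following sketch; check the incoming.conf man page for your INN version before relying on it:

```
# incoming.conf: treat connections from the local host as a peer feed,
# so articles arrive via innd (IHAVE) instead of nnrpd (POST).
peer localhost {
    hostname: "localhost, 127.0.0.1, ::1"
}
```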
Julien_ÉLIE wrote:
Hi Retro Guy,
Removed 'Relay-Version', 'Posting-Version' and 'Date-Received' headers.
Now they post except for one exception. I still get '441 Can't set
system Xref header field' on some articles, but it is a minority of
them.
If anyone has suggestions on the above error (Xref), I'd be glad to try
to get those articles to post also.
I would just suggest to remove existing Xref header fields, like you did
for Relay-Version & al.
I bet you'll find out that the more recent the articles are, the more
header fields you'll need adding in the list to remove as they are not
supposed to be present in posted articles.
Like X-Trace, X-Complaints-To, NNTP-Posting-Host, Injection-Info, etc.
Thank you for the hints. I will go ahead and add these headers to the
deletion list, as they don't need to be there anyway when posting as a
READER.
Let's see how it goes :)
On Jun 2, 2023 at 3:26:20 PM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:
Hi Jesse,
I bet you'll find out that the more recent the articles are, the more
header fields you'll need adding in the list to remove as they are not
supposed to be present in posted articles.
Like X-Trace, X-Complaints-To, NNTP-Posting-Host, Injection-Info, etc.
When I use suck/pullnews, articles with these headers come in with no issue,
is this due to a difference in the way the message gets to INN?
Yes, you've configured in incoming.conf your suck/pullnews connections
to be handled by innd.
Retro Guy uses nnrpd. He may want to try to feed innd, that's a good
idea (hoping it won't complain of missing headers).
I never added anything to incoming.conf, but I'm running the tools on the
same server as INN. To be honest, I never paid attention to how the tools
actually 'post' the articles.
On Jun 2, 2023 at 1:48:43 PM CDT, "Retro Guy" <Retro Guy> wrote:
Spiros Bousbouras wrote:
On Sun, 14 May 2023 22:56:39 +0100
ZMarkGC <ZMarkGC@example.com> wrote:
I have used giganews for grabbing old articles, but they only reach
2004. Does anyone have older text retention available over NNTP (i.e not
google newsgroups or web archives). I would love to slurp/archive
anything not stored on the major commercial providers.
If so, can you give a rough disk usage and storage backend?
I have seen people mention 50mb/day recently based on eternal-september
stats, so assuming the average daily usage is static since 1980, it
should be under 1TB.
If not, I am planning to inject articles from archive.org and anywhere
else I can find them.
https://www.xach.com/naggum/articles/notes.html has a link to a
comp.lang.lisp archive, http://data.xach.com.s3.amazonaws.com/cll.txt.gz .
This I think is close to what you're asking but specific to one newsgroup.
Earliest posts are from 1987. The moderator of comp.compilers also keeps a
comprehensive archive going back to the 1990s. You can find it with a bit
of googling.
Are there any issues with injecting posts from 30 years ago? I don't
peer with anyone but if I can get everything imported and renumbered
correctly for my local reader to understand, I might consider peering or
making a public NNTP connection available.
A public NNTP connection to such an archive would be amazing.
I've taken some time to modify some articles so that inn2 will accept them.
These are all from the 1980s.
I needed to change the Date: format, so all the articles now end up with
my timezone (MST), but the date/times are correct, just wrong timezone.
Removed 'Relay-Version', 'Posting-Version' and 'Date-Received' headers.
Now they post except for one exception. I still get '441 Can't set system
Xref header field' on some articles, but it is a minority of them.
I've started with the can.* hierarchy, and will continue through the rest
of what I have (which is a lot), but it will take me a long time to complete.
You are free to view and/or pull the articles from news.novalink.us:119 if
you are interested. It will probably take me most of the summer to get it
all done as I don't have a ton of free time to work on it, but I want to
complete at some point.
If anyone has suggestions on the above error (Xref), I'd be glad to try to
get those articles to post also.
No account required to read at news.novalink.us:119
Are you going to take a crack at the net.* stuff that's available in various
archives? If you do, I will definitely suck that off of your server. :)
Keep us updated as you progress. If you come up with a scriptable or easily
repeatable process and need another machine to help munge/inject articles,
let me know; I'd be happy to offer some assistance.
One other thing I forgot to mention: I needed to escape lines consisting of
just '.', so I converted them to '..', the same as a newsreader would.
One thing that would really make a difference is not needing to create the
groups by hand. Is it possible for inn2 to create groups on demand? That
would make all the difference.
Hi Retro Guy,
One thing that would really make a difference is not needing to create the
groups by hand. Is it possible for inn2 to create groups on demand? That
would make all the difference.
No, it does not create groups on the fly.
Note that the logtrash parameter in inn.conf can be used to obtain a list of
newsgroups that are not present on the server but which received an
attempted post.
As you're parsing all the articles before feeding them, why not parse the
Newsgroups header field, build a list of the newsgroups, make it unique, and
run "ctlinnd newgroup xxx" on each of them? (INN will then create the
missing newsgroups.)
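That suggestion is easy to script. A sketch in Python, assuming the articles are available as strings; the "y" posting flag and the creator name are illustrative defaults for ctlinnd newgroup:

```python
import re

def collect_groups(articles):
    """Gather every group named in the Newsgroups: header fields."""
    groups = set()
    for art in articles:
        head = art.partition("\n\n")[0]
        m = re.search(r"^Newsgroups:\s*(.+)$", head,
                      re.MULTILINE | re.IGNORECASE)
        if m:
            groups.update(g.strip() for g in m.group(1).split(",")
                          if g.strip())
    return sorted(groups)

def newgroup_commands(groups, creator="archive"):
    """Emit one 'ctlinnd newgroup <name> <flag> <creator>' per group."""
    return ["ctlinnd newgroup %s y %s" % (g, creator) for g in groups]
```

Pipe the output of newgroup_commands() to a shell once, before the injection run, and every group in the archive will exist, typos included.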
Julien_ÉLIE wrote:
Hi Retro Guy,
One thing that would really make a difference is not needing to create the
groups by hand. Is it possible for inn2 to create groups on demand? That
would make all the difference.
No, it does not create groups on-the-fly.
Note that the logtrash parameter in inn.conf can be used to have a list
of newsgroups not present on the server but which received an attempt of
post.
As you're parsing all the articles before feeding them, why not parse
the Newsgroups header field and create a list of newsgroups you then
make unique and run "ctlinnd newgroup xxx" on all of them? (INN will
then create missing newsgroups)
That's an excellent idea. My brain was getting a bit weak trying to come up
with a plan. That's when you miss the obvious :)
Julien ÉLIE wrote:
As you're parsing all the articles before feeding them, why not parse
the Newsgroups header field and create a list of newsgroups you then
make unique and run "ctlinnd newgroup xxx" on all of them? (INN will
then create missing newsgroups)
.... including all typos. :)
Thomas Hochstein wrote:
Julien ÉLIE wrote:
As you're parsing all the articles before feeding them, why not parse
the Newsgroups header field and create a list of newsgroups you then
make unique and run "ctlinnd newgroup xxx" on all of them? (INN will
then create missing newsgroups)
.... including all typos. :)
Very true! I'll try to clean those up later.
I'm currently uploading net.*; it's been running now for about 24 hours.
Let's see if inn2 recovers after this is done. It's throttling right now,
but accepting the posts.
Retro Guy wrote:
Thomas Hochstein wrote:
Julien ÉLIE wrote:
As you're parsing all the articles before feeding them, why not parse
the Newsgroups header field and create a list of newsgroups you then
make unique and run "ctlinnd newgroup xxx" on all of them? (INN will
then create missing newsgroups)
.... including all typos. :)
Very true! I'll try to clean those up later.
Currently uploading net.* and it's been running now for about 24 hours.
Let's see if inn2 recovers after this is done, it's throttling right now,
but accepting the posts.
I finally have net.* on the server. I needed to rebuild the history when it
completed, probably due to all the messing around I was doing with the
server.
I'll clean up the typo'd group names at some point, but for now I plan to
put can.* back on, then move on to some more hierarchies.
It's nice to see that it's working.
I'm having some trouble: after inn2 runs for a few hours, I get the error
'File exists writing SMstore file -- throttling'.
Hi Retro Guy,
I'm having some trouble where after inn2 runs for a few hours I
get the error 'File exists writing SMstore file -- throttling'
Do you happen to use tradspool and some newsgroup names have components
with only digits?
For instance, if you have a newsgroup named net.test.17 or
net.test.17.help and another named net.test, I believe this error will
come up when receiving article number 17 for net.test. INN will try to
write the article into the file <patharticles>/net/test/17 whereas it is
a directory (belonging to the net.test.17 newsgroup or net.test.17.help).
Or the inverse is possible: having net.test and trying to insert article
1 for the net.test.17 newsgroup when net.test already has 17 articles.
You should either remove the <patharticles>/net/test/17 file or the
net.test.17 newsgroup, or use another storage method.
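A pre-flight check for this clash can be scripted: given the full list of group names, flag every group whose name is another existing group plus a purely numeric component, since under tradspool that numeric directory will collide with an article file of the same number. A sketch (the function and its output format are illustrative):

```python
def tradspool_conflicts(groups):
    """Return (parent, clashing_group) pairs where clashing_group's
    name extends parent with a purely numeric component, e.g.
    ('net.test', 'net.test.17')."""
    gset = set(groups)
    conflicts = []
    for g in gset:
        parts = g.split(".")
        for i in range(1, len(parts)):
            if parts[i].isdigit() and ".".join(parts[:i]) in gset:
                conflicts.append((".".join(parts[:i]), g))
    return sorted(conflicts)
```

Run it over the active file before injecting a hierarchy and you can rename or drop the offending groups before INN throttles.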
Julien_ÉLIE wrote:
Hi Retro Guy,
I'm having some trouble where after inn2 runs for a few hours I
get the error 'File exists writing SMstore file -- throttling'
Do you happen to use tradspool and some newsgroup names have components
with only digits?
For instance, if you have a newsgroup named net.test.17 or
net.test.17.help and another named net.test, I believe this error will
come up when receiving article number 17 for net.test. INN will try to
write the article into the file <patharticles>/net/test/17 whereas it is
a directory (belonging to the net.test.17 newsgroup or net.test.17.help).
Or the inverse is possible: having net.test and trying to insert article
1 for the net.test.17 newsgroup whereas net.test already has 17 articles.
You should either remove the <patharticles>/net/test/17 file or the
net.test.17 newsgroup. Or use another storage method.
Thank you for the pointer. This appears to be exactly the problem: I found
net.micro, net.micro.432 and net.micro.6809. I followed your advice and the
problem appears to be resolved.
On Jun 8, 2023 at 11:13:44 AM CDT, "Retro Guy" <Retro Guy> wrote:
Julien_ÉLIE wrote:
Hi Retro Guy,
I'm having some trouble where after inn2 runs for a few hours I
get the error 'File exists writing SMstore file -- throttling'
Do you happen to use tradspool and some newsgroup names have components
with only digits?
For instance, if you have a newsgroup named net.test.17 or
net.test.17.help and another named net.test, I believe this error will
come up when receiving article number 17 for net.test. INN will try to
write the article into the file <patharticles>/net/test/17 whereas it is
a directory (belonging to the net.test.17 newsgroup or net.test.17.help).
Or the inverse is possible: having net.test and trying to insert article
1 for the net.test.17 newsgroup whereas net.test already has 17 articles.
You should either remove the <patharticles>/net/test/17 file or the
net.test.17 newsgroup. Or use another storage method.
Thank you for the pointer. This appears to be exactly the problem. I found
net.micro, net.micro432 and net.micro6809. Following your advice and the
problem appears to be resolved.
How's your effort coming along?
I currently have 1.49 million posts on novalink.us:119, some visible via
the web.
On 23.08.23 14:31, Retro Guy wrote:
I currently have 1.49 million posts on novalink.us:119. Some visible by web
Is this the utzoo archive?
I scanned novalink against my server and sucked down only the few missing
messages.
Actually, you need to add an additional dot to lines *beginning* with a
dot, not only to lines containing nothing but a dot.
On 02.06.23 23:36, Julien ÉLIE wrote:
Actually, you need adding an additional dot to lines *beginning* with a
dot, not only lines containing only a dot.
Are you sure?
If a line starts with a dot but has text after it, it can't be confused with
the terminating ".<CRLF>". Surely the leading dot in, for example, ".anytext"
or ". any text" does not need to be escaped?
3. If any line of the data block begins with the "termination octet"
("." or %x2E), that line MUST be "dot-stuffed" by prepending an
additional termination octet to that line of the block.
"Billy G. (go-while)" <no-reply@no.spam> writes:
On 02.06.23 23:36, Julien ÉLIE wrote:
Actually, you need adding an additional dot to lines *beginning* with a
dot, not only lines containing only a dot.
are you sure?
Yes. :)
But see the bit that you quoted:
3. If any line of the data block begins with the "termination octet"
("." or %x2E), that line MUST be "dot-stuffed" by prepending an
additional termination octet to that line of the block.
To send data, use a "dotwriter": add a dot before every leading dot.
To read data, use a "dotreader": strip the first dot from any line that is
not just the (closing) dot.
I hope this is correct, or did I miss something?
1) a) server sends article via IHAVE/TAKETHIS OR
  b) client sends article via POST to server OR
  c) server sends ARTICLE/BODY to a client:
  --> use "dotwriter"
2) server receives article via IHAVE/TAKETHIS/POST
  --> use "dotreader"
      + server writes data to storage
      + client requests ARTICLE/BODY: jump to 1) c).
3) client receives (reads) ARTICLE/BODY:
  --> use "dotreader" and print it.
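That reading of RFC 3977 section 3.1.1 can be sketched directly: dot_write stuffs every leading dot and appends the terminator, and dot_read undoes it. A simplified sketch working on lists of already-split lines (the function names follow the "dotwriter"/"dotreader" terms above):

```python
def dot_write(lines):
    """Dot-stuff a block for sending: prepend '.' to any line that
    begins with '.', then append the '.' terminator line."""
    out = [("." + l) if l.startswith(".") else l for l in lines]
    return "\r\n".join(out) + "\r\n.\r\n"

def dot_read(wire):
    """Undo dot-stuffing on receive: a bare '.' line terminates the
    block; otherwise strip one leading '.' if present."""
    lines = []
    for l in wire.split("\r\n"):
        if l == ".":
            break
        lines.append(l[1:] if l.startswith(".") else l)
    return lines
```

A round trip through both functions returns the original block unchanged, including lines like ".anytext" and a body line consisting of a single dot.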
On 12.09.23 19:39, Julien ÉLIE wrote:
Also for HEAD when you say ARTICLE/BODY.
Hm, but shouldn't the first character of a header line be either a space, to
indicate a continuation line, or [A-Z] (maybe [a-z] for some strange old
clients which, in theory, should not exist nowadays)?
Shouldn't any leading dot in the headers break the article when the server
receives it?
So it's allowed to have a header field name that starts with a period, as
well as all sorts of other exotic and fascinating stuff that you don't see
in practice.
On 12.09.23 20:49, Russ Allbery wrote:
So it's allowed to have a header field name that starts with a period,
as well as all sorts of other exotic and fascinating stuff that you
don't see in practice.
Great, thanks! Your help is priceless!
Just tried, but it looks like there's a bug in INN: headers are not
dot-stuffed when retrieved via, for instance, ARTICLE:
POST
[...]
..header-test: valid
Adding a dot-stuffed .header-test header field in headers.
.. as well as a dot-stuffed line in the body.
.
ARTICLE
[...]
.header-test: valid
Adding a dot-stuffed .header-test header field in headers.
.. as well as a dot-stuffed line in the body.
.
FWIW, Thunderbird correctly shows the .header-test header field.