me get back up and running by converting to CNFS.
If I'm looking for something in particular, I could just go into the
spool/articles directory and start grepping. This isn't possible with
CNFS. So the questions are:
1. How can I search the entire spool for a given phrase.
2. How can I search a given hierarchy recursively for a given phrase.
I'm sure I'm not the first person who wants to do this, so hopefully
someone has a solution.
Hi Nigel,
1. How can I search the entire spool for a given phrase.
2. How can I search a given hierarchy recursively for a given
phrase.
I'm sure I'm not the first person who wants to do this, so hopefully
someone has a solution.
I'm unfortunately not aware of such a tool for storage methods other
than tradspool. Even timehash won't help with the second point, as its
articles are not classified by hierarchy.
If someone has ever written a script to do that, I would happily add
it to INN.
I would otherwise just suggest to retrieve articles one by one from
the history file, and parse them.
To do that, take the last field of each line in the history file
(the storage token), and give its value to "sm -q".
Example for the first article in the history file in pathdb:
% head -n1 history | cut -f3 | sm -q
You'll get the article on standard output. You could then grep in it whatever you want.
You now need to iterate over each article.
Of course a more complex script should be written if your search has
several parameters (in a hierarchy, from someone, etc.).
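The iteration described above could be sketched roughly as follows. This is only an illustration under assumptions: history lines are taken to be tab-separated with an @...@ storage token as the last field, entries without a token are skipped, "sm" is assumed to be on the PATH, and the SM variable and both function names are made up here (SM just lets you point the script at another binary). The optional third argument is an extended regex tested against the Newsgroups header, which is one way to approach question 2; folded headers are not handled.

```shell
#!/bin/sh
# Sketch only: print the storage token of every article containing a phrase.
# Assumes tab-separated history lines whose last field is an @...@ token.
# SM is a hypothetical override for the sm binary (defaults to "sm").

# history_tokens HISTORY: list the storage tokens found in a history file.
history_tokens() {
    # Entries that have no stored article carry no token field; skip them.
    awk -F'\t' '$NF ~ /^@[0-9A-Fa-f]+@$/ { print $NF }' "$1"
}

# search_spool PHRASE HISTORY [NEWSGROUPS-REGEX]
search_spool() {
    phrase=$1
    history=$2
    groups=${3:-}
    history_tokens "$history" |
    while IFS= read -r token; do
        if [ -n "$groups" ]; then
            # Question 2: fetch only the headers (sm -H) and skip articles
            # whose Newsgroups header does not match the regex.
            ${SM:-sm} -q -H "$token" </dev/null |
                grep -i -E -q -e "^Newsgroups:.*($groups)" || continue
        fi
        # Question 1: fixed-string search of the whole article.
        if ${SM:-sm} -q "$token" </dev/null | grep -F -q -e "$phrase"; then
            printf '%s\n' "$token"
        fi
    done
}
```

For example, search_spool 'some phrase' history '[ ,]comp\.lang\.' would print the tokens of matching articles posted to comp.lang.*; each token can then be given to sm to read the article. It runs sm once or twice per article, so it will be slow on a large spool.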
Nigel Reed <sysop@endofthelinebbs.com> wrote:
1. How can I search the entire spool for a given phrase.
2. How can I search a given hierarchy recursively for a given
phrase.
I don't have a direct answer (other than what Julien said), but if
the search phrase is always the same, then you could add another
newsfeeds entry and feed all new articles to the program with "Tp".
This program (script) could do the grep and, for example, send you an
email, or post to a private group, or whatever (~15 years ago I used
to have a similar setup, reposting all replies to my posts to a
private group, accessible only by me -- it was more convenient to
reply to them this way).
Now I'm using it to count articles on Polish newsgroups to produce
results like these:
http://news.chmurka.net/top15.php
The line is:
chmurka.postprocessor.pl\
:!*,pl.*,alt.pl.*\
:Tp:/usr/local/news/local/bin/post-processor-pl.sh %s
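A script started by an entry like this one might look roughly like the sketch below. It is not the actual post-processor-pl.sh, just an illustration, and it assumes that innd replaces %s with the article's storage token, so that the article can be fetched with sm. The function name, its argument order and the SM override variable are made up for the example.

```shell
#!/bin/sh
# Sketch of a program-feed ("Tp") handler that greps each new article.
# Assumption: the feed passes the article's storage token as an argument.
# SM is a hypothetical override for the sm binary (defaults to "sm").

# check_article TOKEN PHRASE LOGFILE
# Append TOKEN to LOGFILE when the article contains PHRASE (case-insensitive).
check_article() {
    token=$1
    phrase=$2
    logfile=$3
    if ${SM:-sm} -q "$token" </dev/null | grep -i -F -q -e "$phrase"; then
        printf '%s\n' "$token" >> "$logfile"
    fi
}
```

Installed as the feed program, the script would end with something like check_article "$1" 'some phrase' /path/to/matches.log; sending mail or reposting to a private group would replace the printf line. This only helps when the phrase is known in advance.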
Now that I think of it, it would be doable to write such a grepping
script as a program that reads the CNFS file (the format has to be
described somewhere, or can be deduced from the source code) and
maybe dumps its contents to stdout (or searches for a phrase and
dumps the whole article), but I don't know of anything like that.
On the other hand, CNFS is a binary format, but posts are stored in a
text format, so maybe something like this will suffice?
strings cnfs-file | grep phrase
grep -a phrase cnfs-file
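To see quickly which buffers are worth a closer look, the second command can be wrapped in a small loop. This is a sketch only: the function name is made up, the buffer paths are whatever you pass in, and -a is a GNU/BSD grep option rather than a POSIX one.

```shell
#!/bin/sh
# Sketch: count the lines containing a phrase in each given buffer file.

# count_in_buffers PHRASE FILE...
# Prints "file<TAB>count" for every file.
count_in_buffers() {
    phrase=$1
    shift
    for buf in "$@"; do
        # -a: treat binary data as text, -c: count matching lines,
        # -F: take the phrase as a fixed string, not a regex.
        printf '%s\t%s\n' "$buf" "$(grep -a -c -F -e "$phrase" "$buf")"
    done
}
```

Keep in mind that this matches lines, not articles, so it tells you neither the token nor the Message-ID, and since the buffers are cyclic it may also hit leftover data that can no longer be retrieved as an article.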
At this point I have over 7 million articles. Do you have any idea how
long that is going to take? :)
Maybe I should have just got a new disk and formatted it with 3 times
the inodes!
Nigel Reed <sysop@endofthelinebbs.com> writes:
At this point I have over 7 million articles. Do you have any idea how
long that is going to take? :)
grep is probably somewhat more optimized than sm when reading files,
but sm retrieves the article in mostly the same way (mmap) as grep does.
Searching 7 million articles without an index is just going to be slow
no matter how you do it. This is why, when search is an anticipated
operation, people pre-create the search index.
(There have been multiple proposals for a search capability in NNTP and
ways to integrate search into INN over the years, but none of them have stuck, in part because the open source search tools keep changing and everyone stops using the old ones.)
Unfortunately the queries would be ad hoc. I looked at CPAN:
212,470 modules and nothing that works with CNFS.
Anyway, if you want to do a text search, it would be best to build
a dictionary first (for example, mapping each word to the tokens of
the articles that contain it), but it would be better to use a
ready-made indexer for that (there has to be one)...
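As a very rough illustration of the dictionary idea, the sketch below builds a flat word-to-token index once and then answers single-word lookups from it. Everything here is an assumption for the sake of the example: the function names, the index format (one "word<TAB>token" line per distinct word per article), the SM override for the sm binary, and the tab-separated history layout with the token in the last field. A real indexer would handle phrases, headers, encodings and updates far better.

```shell
#!/bin/sh
# Sketch: a naive inverted index from lowercased words to storage tokens.
# SM is a hypothetical override for the sm binary (defaults to "sm").

# build_index HISTORY: write "word<TAB>token" lines to standard output.
build_index() {
    awk -F'\t' '$NF ~ /^@[0-9A-Fa-f]+@$/ { print $NF }' "$1" |
    while IFS= read -r token; do
        # Split the article into words, lowercase them, keep each word once.
        ${SM:-sm} -q "$token" </dev/null |
            tr -cs '[:alnum:]' '\n' |
            tr '[:upper:]' '[:lower:]' |
            sort -u |
            awk -v t="$token" 'NF { print $0 "\t" t }'
    done
}

# lookup WORD INDEXFILE: print the tokens of the articles containing WORD.
lookup() {
    word=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    awk -F'\t' -v w="$word" '$1 == w { print $2 }' "$2"
}
```

Building the index (build_index history > words.idx) still costs one pass over every article, but afterwards an ad hoc query such as lookup phrase words.idx only reads the index file.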