Re: From JoyceUlysses.txt -- words occurring exactly once
On 5/30/2024, HenHanna wrote:
> could someone give me a pretty fast (and simple) program
> that'd give me a list of all words occurring exactly once?
> -- Also, a list of words occurring once, twice or 3 times
> re: hyphenated words (you can treat it anyway you like)
> ideally, i'd treat [editor-in-chief]
> [go-ahead] [pen-knife]
> [know-how] [far-fetched] ...
> as one unit.
Using SP-Forth:
REQUIRE /STRING lib/include/string.f
REQUIRE PLACE ~mak/place.f
REQUIRE PcreMatch ~ac/lib/string/regexp.f \ PCRE wrapper
REQUIRE new-hash ~pinka/lib/hash-table.f
REQUIRE {STR@LOCAL} ~ac/lib/str5.f \ to slurp file: FILE
REQUIRE CASE-INS lib/ext/caseins.f
\ Trim the string being searched so that the next
\ search starts just past the match that was found.
: advance-search-point ( cadr1 u1 cadr2 u2 -- cadr3 u3 )
  +                  ( cadr1 u1 end-of-match )
  2 pick - /string ; \ skip everything up to end-of-match
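
To make the stack arithmetic concrete, here is a small check that can be typed in after the definition above. It is only a sketch: the sample string and the hand-built match (address + 4, length 3, i.e. the word "two") are made up for illustration, and it assumes /string has the standard ( c-addr u n -- c-addr+n u-n ) behaviour.

```forth
\ Whole string on the stack, then a pretend match for "two",
\ which starts at offset 4 and is 3 characters long.
s" one two three"      ( cadr1 13 )
2dup drop 4 + 3        ( cadr1 13 cadr1+4 3 )
advance-search-point   ( cadr1+7 6 )
type cr                \ prints " three" (with the leading space)
```

The end of the match is at offset 4 + 3 = 7, so 7 characters are dropped from the front and 13 - 7 = 6 remain.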
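\ Call xt once for each match of the regex in the string.
\ Doesn't handle captures.
: PCRE-for-each-match ( str-adr str-len r-adr r-len xt -- )
  { r-adr r-len xt }
  begin
    2dup
    r-adr r-len PcreGetMatch
  while                  ( str-adr str-len m-adr m-len )
    2dup xt execute
    advance-search-point
  repeat
  2drop
;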
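\ Add 1 to the count stored under the key caddr/u,
\ starting at 1 if the key is not in the table yet.
: increment-ht-value ( caddr u ht -- )
  { caddr u ht }
  caddr u ht HASH@N
  if 1+ else 1 then
  caddr u ht HASH!N ;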
3000 new-hash value hash-table
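: process-word ( cadr u -- )
  hash-table increment-ht-value ;
: go
  s" Alice.txt" FILE   \ Slurp the file.
  \ A word is a letter, then letters or hyphens, then a letter,
  \ so one-letter words never match and runs of hyphens are
  \ kept inside a word.
  s" [a-zA-Z][-a-zA-Z]*[a-zA-Z]"
  ['] process-word
  PCRE-for-each-match
;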
go
s" Entries: " type hash-table hash-count . cr
\ Show words that were found 3 times and that
\ have a length > 8:
hash-table
:noname { n adr len }
  n 3 = len 8 > and
  if n . adr len type cr then ;
for-hash
===>
3 considering
3 considered
3 unfortunate
3 directions
3 rose-tree
3 listening
3 impossible
3 carefully
3 cautiously
3 spectacles
3 e--e--evening
3 particular
3 mentioned
3 forgetting
3 immediately
3 pattering
3 repeating
3 themselves
3 yesterday
3 hedgehogs
3 out-of-the-way
3 knowledge
3 expecting
3 attending
3 adventures
3 remarking
3 confusing
3 muttering
3 telescope
3 succeeded
3 advantage
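
The run above only shows words seen exactly 3 times. For the two lists asked about in the original question, the same for-hash idiom should work with a different test. A minimal sketch, assuming for-hash hands ( count adr len ) to the xt exactly as in the code above, and that hash-table is still populated (so this has to run before del-hash). The names show-once and show-up-to-3 are introduced here for illustration only:

```forth
\ Print each word that occurred exactly once.
: show-once { n adr len }
  n 1 = if adr len type cr then ;

\ Print count and word for words that occurred 1, 2 or 3 times.
\ Counts start at 1, so "less than 4" is enough.
: show-up-to-3 { n adr len }
  n 4 < if n . adr len type cr then ;

hash-table ' show-once for-hash
hash-table ' show-up-to-3 for-hash
```

Named words are used instead of :noname only so each filter can be rerun by name; a :noname definition in the style of the code above would do the same job.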
hash-table del-hash