I was introduced to TCL a few years ago but never gained fluency.
I didn't have any target problem to solve so interest waned.
Now there is one. A subset of it needs to take an mbox-formatted file
and determine word frequencies in the individual included message bodies.
I'd like to see examples of determining word frequencies in plain text
files (< 10 kbytes).
Looking at them will hopefully help me recognize the problems to be
solved in the real world.
Suggestions?
TIA
On Saturday, February 25, 2023 at 1:05:27 PM UTC+1, Richard Owlett wrote:
I was introduced to TCL a few years ago but never gained fluency.
I didn't have any target problem to solve so interest waned.
Now there is one. A subset of it needs to take an mbox-formatted file
and determine word frequencies in the individual included message bodies.
I'd like to see examples of determining word frequencies in plain text
files (< 10 kbytes).
Looking at them will hopefully help me recognize the problems to be
solved in the real world.
Suggestions?
TIA
You can look at the Wiki - wiki.tcl-lang.org - and the tutorial there.
If you want to examine words, perhaps:
- use [read] to read in a whole file into a single string
- use [string map] to get rid of non-letters like commas and periods
- use [foreach] and an array to loop over the words and to count them
- use [parray] to print the contents of the array
Well, this is merely a sketch of course; I leave the actual implementation as an exercise :).
Note: this presumes the string of characters in the file can implicitly be interpreted as a list. Not all strings can be so interpreted, notably when they contain braces ({}). But you should be able to get a feel for the algorithm with the above recipe; a rough sketch follows.
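For instance (untested, with a made-up file name, and using [split] so the brace caveat doesn't bite):

# read the whole file into one string
set fh [open "input.txt" r]   ;# hypothetical file name
set text [read $fh]
close $fh

# map common punctuation to spaces and fold case
set text [string map {. { } , { } ; { } : { } ! { } ? { }} [string tolower $text]]

# count each whitespace-separated word in an array
foreach word [split $text] {
    if {$word eq ""} continue
    incr count($word)
}

# print the contents of the array
parray count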
Regards,
Arjen
On 2/25/2023 4:05 AM, Richard Owlett wrote:
I was introduced to TCL a few years ago but never gained fluency.
I didn't have any target problem to solve so interest waned.
Now there is one. A subset of it needs to take an mbox-formatted file
and determine word frequencies in the individual included message bodies.
I'd like to see examples of determining word frequencies in plain text
files (< 10 kbytes).
Looking at them will hopefully help me recognize the problems to be
solved in the real world.
Suggestions?
TIA
As Arjen mentioned, the use of the [read] command is quite
useful here. You could then scan the single input string using
the [regexp] command to extract words.
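Something along these lines (untested; the file name is made up) might work:

# read the whole mbox file into one string
set fh [open "mail.mbox" r]   ;# hypothetical file name
set text [read $fh]
close $fh

# extract words with [regexp] and count them in a dict
set counts [dict create]
foreach word [regexp -all -inline {[A-Za-z']+} [string tolower $text]] {
    dict incr counts $word
}

# print the most frequent words first (lsort -stride needs Tcl 8.6)
foreach {word n} [lsort -stride 2 -index 1 -integer -decreasing $counts] {
    puts "$word: $n"
}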
Besides the wiki, I highly recommend Ashok's book, shown on
his main webpage, https://www.magicsplat.com/
The book has quite a few examples that are small and fully
explained. With the PDF format, it's easy to copy/paste.
You might want to search the internet for a dictionary file. I
have 2 I use. One is just a file of English words, about 400k
or so, one per line.
The other is lines of definitions, <word definition>. If a
word has more than 1 definition, it's then repeated on a
following line.
Using an actual dictionary and an array, it's easy to
determine whether some extracted string is in fact a word.
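For example (untested; the dictionary file name is made up):

# load a one-word-per-line dictionary file into an array for fast lookups
set fh [open "words.txt" r]   ;# hypothetical dictionary file
foreach w [split [read $fh] \n] {
    if {$w ne ""} { set isWord([string tolower $w]) 1 }
}
close $fh

# check whether an extracted string is a real word
set candidate "fortune"
if {[info exists isWord($candidate)]} {
    puts "$candidate is in the dictionary"
}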
I used both of these in a fun project I wrote during the
pandemic, to solve word puzzles found in the game show "Wheel
of fortune".
Using a tcl array, was perfect for this use.
One pleasant surprise was that when looking for a word that
matched some pattern of missing letters, the best choices were
the ones that had the most definitions, since the game show
doesn't normally use obscure words. The larger dictionary
would also have the word in plural, -ing, and -ed forms, so I use both.
Have fun!
On 25/02/2023 at 13:05, Richard Owlett wrote:
I was introduced to TCL a few years ago but never gained fluency.
I didn't have any target problem to solve so interest waned.
Now there is one. A subset of it needs to take an mbox-formatted file
and determine word frequencies in the individual included message bodies.
I'd like to see examples of determining word frequencies in plain text
files (< 10 kbytes).
The best tool for this kind of text-processing task is probably not
Tcl, but awk (or better, gawk, the GNU variant). It should be available
on most Unix/Linux platforms. A naive program to count word frequencies looks like the following:
{ for (k=1; k<=NF; k++) count[$k]++ }
END { for (k in count) print k ": " count[k] }
That's all.
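If you save the program to a file (say, wordfreq.awk, a name made up here), you should be able to run it as gawk -f wordfreq.awk yourfile.txt.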