I was introduced to TCL a few years ago but never gained fluency.
I didn't have any target problem to solve so interest waned.
Now there is one. A subset of it needs to take an mbox-formatted file
and determine word frequencies in the individual included message bodies.
I'd like to see examples of determining word frequencies in plain text
files (< 10 kbytes).
Looking at them will hopefully help me recognize the problems to be
solved in the real world.
Suggestions?
TIA
On Saturday, February 25, 2023 at 1:05:27 PM UTC+1, Richard Owlett wrote:
I was introduced to TCL a few years ago but never gained fluency.
I didn't have any target problem to solve so interest waned.
Now there is one. A subset of it needs to take an mbox-formatted file
and determine word frequencies in the individual included message bodies.
I'd like to see examples of determining word frequencies in plain text
files (< 10 kbytes).
Looking at them will hopefully help me recognize the problems to be
solved in the real world.
Suggestions?
TIA
You can look at the Wiki - wiki.tcl-lang.org - and the tutorial there.
If you want to examine words, perhaps:
- use [read] to read in a whole file into a single string
- use [string map] to get rid of non-letters like commas and periods
- use [foreach] and an array to loop over the words and to count them
- use [parray] to print the contents of the array
Well, this is merely a sketch of course; I leave the actual implementation as an exercise :).
Note: this presumes the string of characters in the file can implicitly be interpreted as a list. Not all strings can be so interpreted, notably when they contain braces ({}). But you should be able to get a feel for the algorithm with the above recipe; a rough sketch follows.
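For instance (untested, with a made-up file name, and using [split] so the brace caveat doesn't bite):

# read the whole file into one string
set fh [open "input.txt" r]   ;# hypothetical file name
set text [read $fh]
close $fh

# map common punctuation to spaces and fold case
set text [string map {. { } , { } ; { } : { } ! { } ? { }} [string tolower $text]]

# count each whitespace-separated word in an array
foreach word [split $text] {
    if {$word eq ""} continue
    incr count($word)
}

# print the contents of the array
parray count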
Regards,
Arjen
On 2/25/2023 4:05 AM, Richard Owlett wrote:
I was introduced to TCL a few years ago but never gained fluency.
I didn't have any target problem to solve so interest waned.
Now there is one. A subset of it needs to take an mbox-formatted file
and determine word frequencies in the individual included message bodies.
I'd like to see examples of determining word frequencies in plain text
files (< 10 kbytes).
Looking at them will hopefully help me recognize the problems to be
solved in the real world.
Suggestions?
TIA
As Arjen mentioned, the use of the [read] command is quite
useful here. You could then scan the single input string using
the [regexp] command to extract words.
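Something along these lines (untested; the file name is made up) might work:

# read the whole mbox file into one string
set fh [open "mail.mbox" r]   ;# hypothetical file name
set text [read $fh]
close $fh

# extract words with [regexp] and count them in a dict
set counts [dict create]
foreach word [regexp -all -inline {[A-Za-z']+} [string tolower $text]] {
    dict incr counts $word
}

# print the most frequent words first (lsort -stride needs Tcl 8.6)
foreach {word n} [lsort -stride 2 -index 1 -integer -decreasing $counts] {
    puts "$word: $n"
}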
Besides the wiki, I highly recommend Ashok's book, shown on
his main webpage, https://www.magicsplat.com/
The book has quite a few examples that are small and fully
explained. With the PDF format, it's easy to copy/paste.
You might want to search the internet for a dictionary file. I
have 2 I use. One is just a file of English words, about 400k
or so, one per line.
The other is lines of definitions, <word definition>. If a
word has more than 1 definition, it's then repeated on a
following line.
Using an actual dictionary and an array, it's easy to
determine whether some extracted string is in fact a word.
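For example (untested; the dictionary file name is made up):

# load a one-word-per-line dictionary file into an array for fast lookups
set fh [open "words.txt" r]   ;# hypothetical dictionary file
foreach w [split [read $fh] \n] {
    if {$w ne ""} { set isWord([string tolower $w]) 1 }
}
close $fh

# check whether an extracted string is a real word
set candidate "fortune"
if {[info exists isWord($candidate)]} {
    puts "$candidate is in the dictionary"
}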
I used both of these in a fun project I wrote during the
pandemic, to solve word puzzles found in the game show "Wheel
of fortune".
Using a tcl array, was perfect for this use.
One pleasant surprise was that when looking for a word that
matched some pattern of missing letters, the best choices were
the ones that had the most definitions, since the game show
doesn't normally use obscure words. The larger dictionary
would also have the word in plural, -ing, and -ed forms, so I use both.
Have fun!
On 25/02/2023 at 13:05, Richard Owlett wrote:
I was introduced to TCL a few years ago but never gained fluency.
I didn't have any target problem to solve so interest waned.
Now there is one. A subset of it needs to take an mbox-formatted file
and determine word frequencies in the individual included message bodies.
I'd like to see examples of determining word frequencies in plain text
files (< 10 kbytes).
The best tool for this kind of text-processing task is probably not
Tcl, but awk (or better, gawk, the GNU variant). It should be available
on most Unix/Linux platforms. A naive program to count word frequencies looks like the following:
{ for (k=1; k<=NF; k++) count[$k]++ }
END { for (k in count) print k ": " count[k] }
That's all.
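If you save the program to a file (say, wordfreq.awk, a name made up here), you should be able to run it as gawk -f wordfreq.awk yourfile.txt.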