• How/where to find relevant TCL code examples

    From Richard Owlett@21:1/5 to All on Sat Feb 25 06:05:12 2023
    I was introduced to TCL a few years ago but never gained fluency.
    I didn't have any target problem to solve so interest waned.
    Now there is one. A subset of it needs to take an mbox formatted file
    and determine word frequencies in the individual included message bodies.

    I'd like to see examples of determining word frequencies in plain text
    files (< 10 kbytes).

    Looking at them will hopefully aid recognition of problems to be solved
    in the real world.

    Suggestions?
    TIA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arjen Markus@21:1/5 to Richard Owlett on Sat Feb 25 04:30:47 2023
    You can look at the Wiki - wiki.tcl-lang.org - and the tutorial there.

    If you want to examine words, perhaps:
    - use [read] to read in a whole file into a single string
    - use [string map] to get rid of non-letters like commas and periods
    - use [foreach] and an array to loop over the words and to count them
    - use [parray] to print the contents of the array

    Well, this is merely a sketch of course, I leave the actual implementation as an exercise :).

    Note: this presumes the string of characters in the file can implicitly be interpreted as a list. Not all strings can be so interpreted, notably when they contain braces ({}). But you should be able to get a feel for the algorithm with the above recipe.
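
    The four steps above can be sketched roughly as follows. This is only one
    possible reading of the recipe, not a reference implementation: the proc
    names readFile and countWords are made up for illustration, the
    punctuation list is deliberately short, and Tcl 8.5 or later is assumed.
    To sidestep the brace problem noted above, the words are extracted with
    [regexp] instead of treating the string as a list.

```tcl
# Step 1: read a whole file into a single string.
proc readFile {path} {
    set chan [open $path r]
    set data [read $chan]
    close $chan
    return $data
}

# Steps 2 and 3: strip punctuation, then count words in an array.
proc countWords {text} {
    # [string map] takes a flat list of from/to pairs; each listed
    # punctuation character is replaced by a space.
    set cleaned [string map {, " " . " " ! " " ? " " ; " " : " "} $text]
    # Pull out runs of non-whitespace explicitly rather than treating the
    # string as a list, so unbalanced braces in the text cannot raise an error.
    foreach word [regexp -all -inline {\S+} $cleaned] {
        # [incr] creates a missing array element on first use (Tcl 8.5+).
        incr count([string tolower $word])
    }
    # Hand the counts back as a key/value list; empty if no words were found.
    return [array get count]
}

# Step 4: print the counts. A literal string stands in for [readFile ...] here.
array set freq [countWords "The cat saw the other cat. The {end"]
parray freq
```

    For a real file, the literal string would be replaced by
    [readFile somefile.txt], where the file name is of course a placeholder.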

    Regards,

    Arjen

  • From et99@21:1/5 to Richard Owlett on Sat Feb 25 13:05:37 2023
    As Arjen mentioned, the use of the [read] command is quite
    useful here. You could then scan the single input string using
    the [regexp] command to extract words.
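
    As a small illustration of the [regexp] approach, something like the
    following might do. The proc name extractWords is invented for this
    example, and "word" is assumed here to mean a run of ASCII letters, which
    may not be the definition you want.

```tcl
# Extract every run of ASCII letters from a string.
# -all finds every match; -inline returns the matches as a list.
proc extractWords {text} {
    return [regexp -all -inline {[A-Za-z]+} $text]
}

puts [extractWords "Hello, world! It's 2023."]
# prints: Hello world It s
```

    Note that the apostrophe splits "It's" into two pieces; the character
    class would need adjusting if contractions should stay whole.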

    Besides the wiki, I highly recommend Ashok's book, shown on
    his main webpage, https://www.magicsplat.com/

    The book has many examples that are small and fully
    explained. With the pdf format, it's easy to copy/paste.

    You might want to search the internet for a dictionary file. I
    have 2 I use. One is just a file of English words, about 400k
    or so, one per line.

    The other is lines of definitions, <word definition>. If a
    word has more than 1 definition, it's then repeated on a
    following line.

    Using an actual dictionary and an array, it's easy to
    determine if some string extraction is in fact a word.
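
    A rough sketch of that lookup, with an in-memory list standing in for the
    dictionary file. The names loadWords and isWord are hypothetical, and the
    three sample words are placeholders, not taken from any real word list.

```tcl
# Store each known word as an array key so membership tests are cheap.
# With a one-word-per-line file, the list could be built with something
# like: split [string trim [read $chan]] "\n"
proc loadWords {words arrayName} {
    upvar 1 $arrayName known
    foreach w $words {
        set known([string tolower $w]) 1
    }
}

# Return 1 if the candidate is a key in the array, 0 otherwise.
proc isWord {candidate arrayName} {
    upvar 1 $arrayName known
    return [info exists known([string tolower $candidate])]
}

loadWords {apple banana cherry} dictionary
puts [isWord "Banana" dictionary]   ;# 1
puts [isWord "xyzzy" dictionary]    ;# 0
```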

    I used both of these in a fun project I wrote during the
    pandemic, to solve word puzzles found in the game show "Wheel
    of Fortune".

    Using a Tcl array was perfect for this use.

    One pleasant surprise was that when looking for a word that
    matched some pattern of missing letters, the best choices were
    the ones that had the most definitions, since the game show
    doesn't normally use obscure words. The larger dictionary
    would have the word as plural, -ing, -ed, etc. I use both.
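
    The idea of ranking pattern matches by number of definitions might look
    something like this. It is a sketch only, not the actual puzzle solver:
    the proc name candidates is invented, the definition counts are made-up
    sample data, and "?" is assumed to stand for one missing letter
    (Tcl 8.5 or later).

```tcl
# defCounts: dict mapping word -> number of definitions (sample data).
# pattern:   glob pattern in which ? matches exactly one character.
proc candidates {defCounts pattern} {
    set hits {}
    dict for {word n} $defCounts {
        if {[string match $pattern $word]} {
            lappend hits [list $word $n]
        }
    }
    # Sort the {word count} pairs by count, most definitions first.
    return [lsort -integer -decreasing -index 1 $hits]
}

set defCounts {cat 5 cot 2 cut 9 coat 3}
foreach pair [candidates $defCounts "c?t"] {
    lassign $pair word n
    puts "$word ($n definitions)"
}
```

    With the sample data, "coat" is filtered out because it has four letters,
    and the three-letter matches come out with "cut" first.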

    Have fun!

  • From pd@21:1/5 to Richard Owlett on Sun Feb 26 19:27:47 2023
    Ashok has a simple example using dict incr (page 159)

    A simple example of maintaining word counts using a dictionary.
    % foreach word {Do what you can, ignore what you can't.} {
          dict incr word_counts $word
      }
    % puts $word_counts
    → Do 1 what 2 you 2 can, 1 ignore 1 can't. 1
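
    Building on that, here is a hedged sketch that also normalizes case,
    drops the punctuation that sticks to "can," and "can't." above, and
    prints the counts in descending order. The proc name wordFrequencies is
    invented, the choice to keep apostrophes inside words is an assumption,
    and [lsort -stride] needs Tcl 8.6.

```tcl
proc wordFrequencies {text} {
    set counts [dict create]
    # Match runs of letters and apostrophes so "can't" stays one word.
    foreach word [regexp -all -inline {[a-z']+} [string tolower $text]] {
        dict incr counts $word
    }
    return $counts
}

set counts [wordFrequencies "Do what you can, ignore what you can't."]
# Treat the dict as a flat key/value list and sort the pairs by value,
# highest count first.
foreach {word n} [lsort -stride 2 -index 1 -integer -decreasing $counts] {
    puts "$n $word"
}
```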

  • From Richard Owlett@21:1/5 to Arjen Markus on Sun Feb 26 06:07:09 2023
    On 02/25/2023 06:30 AM, Arjen Markus wrote:
    > You can look at the Wiki - wiki.tcl-lang.org - and the tutorial there.

    I like that reference. It will quickly bring me back up to where I was.


    > If you want to examine words, perhaps:
    > - use [read] to read in a whole file into a single string
    > - use [string map] to get rid of non-letters like commas and periods

    Some thinking while browsing suggests that [string first] should help
    grab the bodies of individual emails for examination, a problem I had
    no idea how to attack. Thank you.
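
    A first rough attempt at the [string first] idea might look like the
    sketch below. It rests on several assumptions that a real mbox file may
    not satisfy: every message starts with a line beginning "From ", the
    headers end at the first blank line, line endings are plain LF, and any
    body line starting with "From " has already been escaped by the mail
    software. The proc name mboxBodies is made up.

```tcl
# Return a list of message bodies from mbox-formatted text.
proc mboxBodies {mbox} {
    set bodies {}
    # Prefix a newline so the first "From " line is found like the others.
    set data "\n$mbox"
    set start [string first "\nFrom " $data]
    while {$start >= 0} {
        # The next separator line, if any, marks the end of this message.
        set next [string first "\nFrom " $data [expr {$start + 1}]]
        if {$next < 0} {
            set msg [string range $data $start end]
        } else {
            set msg [string range $data $start [expr {$next - 1}]]
        }
        # Headers end at the first blank line; the body follows it.
        set blank [string first "\n\n" $msg]
        if {$blank >= 0} {
            lappend bodies [string trim [string range $msg [expr {$blank + 2}] end]]
        }
        set start $next
    }
    return $bodies
}

set sample "From a@example.org Sat Feb 25 2023\nSubject: one\n\nhello world\n\nFrom b@example.org Sun Feb 26 2023\nSubject: two\n\nsecond body\n"
foreach body [mboxBodies $sample] {
    puts "BODY: $body"
}
```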

    > - use [foreach] and an array to loop over the words and to count them
    > - use [parray] to print the contents of the array
    >
    > Well, this is merely a sketch of course, I leave the actual implementation as an exercise :).

    You mean "student should do *homework*"? *GRIN*




  • From Richard Owlett@21:1/5 to All on Sun Feb 26 06:57:28 2023
    On 02/25/2023 03:05 PM, et99 wrote:
    > As Arjen mentioned, the use of the [read] command is quite
    > useful here. You could then scan the single input string using
    > the [regexp] command to extract words.

    We are using different definitions of "word".
    For *MY* purposes:
    "A word is a space-delimited string containing ONLY a->z or A->Z."
    I'll have to create a "word list" before performing any count.
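
    That definition could be tried out along these lines. This is only a
    sketch: the proc name pureWords is invented, and "space delimited" is
    taken to mean delimited by any whitespace (spaces, tabs, newlines).

```tcl
# Build a word list: whitespace-delimited tokens made up only of a-z or A-Z.
proc pureWords {text} {
    set words {}
    # Split on whitespace by pulling out runs of non-whitespace characters.
    foreach token [regexp -all -inline {\S+} $text] {
        # Keep the token only if every character is an ASCII letter.
        if {[regexp {^[A-Za-z]+$} $token]} {
            lappend words $token
        }
    }
    return $words
}

puts [pureWords "Plain words only, no can't or x86 here"]
# prints: Plain words no or here
```

    Taken literally, the definition rejects "only," because of the attached
    comma; stripping punctuation first would be a separate decision.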


    > Besides the wiki, I highly recommend Ashok's book, shown on
    > his main webpage, https://www.magicsplat.com/

    The site leads to an interesting chain of links. Thanks.




  • From Manuel Collado@21:1/5 to All on Sun Feb 26 14:41:03 2023
    The best tool for this kind of text processing task is probably not
    Tcl, but awk (or better gawk, the GNU variant). It should be available
    on most Unix/Linux platforms. A naive program to count word frequencies
    looks like the following:

    { for (k=1; k<=NF; k++) count[$k]++ }
    END { for (k in count) print k ": " count[k] }

    That's all.




  • From Richard Owlett@21:1/5 to Manuel Collado on Sun Feb 26 08:12:05 2023

    As I run Debian, I have both gawk and mawk available.
    For my "real world" problem, that's probably the way to go.

    I will still pursue a TCL solution as part of my motivation was gaining proficiency in TCL.
    Thanks

  • From Ralf Fassel@21:1/5 to All on Mon Feb 27 10:51:49 2023
    * Richard Owlett <rowlett@cloud85.net>
    | As I run Debian, I have both gawk and mawk available.
    | For my "real world" problem, that's probably the way to go.

    A word of caution regarding 'mawk': I had problems with it in the past
    due to 32/64 bit issues.

    $ echo 4294967295 | gawk '{printf "%lu\n", $1+1}'
    4294967296

    $ echo 4294967295 | mawk '{printf "%lu\n", $1+1}'
    4294967295

    I found that mawk was not suited for integer values > 4GB (imagine file
    sizes etc). Since gawk is available, I did not pursue this further.

    R'
