• regexp to locate instances of a specific character when not enclosed in

    From The Rickster@21:1/5 to All on Fri Nov 4 10:56:05 2022
    given the string 'Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}

    I am able to use a regexp negative lookahead to locate all commas that are not enclosed in braces:
    ,(?![^\{]*\})
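
    As a quick check of what this pattern does on the sample string, here is a minimal sketch (the variable names `s` and `hits` are illustrative, not from the post). It reports the comma at index 20, which sits inside the double quotes; that is the gap the question is about.

    ```tcl
    set s {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}

    # Match a comma unless it is followed by "non-{ characters, then }",
    # i.e. unless it appears to sit inside a brace pair.
    set hits [regexp -all -inline -indices {,(?![^\{]*\})} $s]
    puts $hits   ;# {20 20} {26 26} {38 38}
    ```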

    However, a regular expression that identifies the locations of commas outside both double quotes and braces would be appreciated.

    Thanks

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Luc@21:1/5 to The Rickster on Fri Nov 4 15:29:48 2022

    Those problems are better handled in multiple stages.

    For example, begin by removing everything in quotes or braces:

    regsub -all {"[^"]*"|{[^{}]*}} $string "" variable

    Then all remaining commas are what you're looking for. Case closed.
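
    A minimal sketch of this two-stage idea on the sample string, assuming no nested braces and no escaped quotes. The pattern is the one above with the literal braces backslash-escaped as a precaution; `stripped` and `count` are illustrative names. Note that it yields how many commas remain, not where they were in the original string.

    ```tcl
    set string {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}

    # Stage 1: delete every "..." and {...} span.
    regsub -all {"[^"]*"|\{[^{}]*\}} $string "" stripped

    # Stage 2: whatever commas are left were outside quotes and braces.
    set count [regexp -all {,} $stripped]

    puts "<$stripped>"   ;# <Let FN be ,MI be ,LN be >
    puts $count          ;# 2
    ```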


    But if you still need to know their exact positions within the original complete string, then I think you'll have to write your own parser.
    But regexp still can help you a lot:

    % set FN {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}
    % regexp -all -inline -indices {"} $FN
    {10 10} {25 25}

    The first double quote is in position 10, the second in position 25.
    There are no others.

    Let's confirm:

    % string first \" $FN 0
    10
    % string first \" $FN 11
    25

    OK. Let's continue:

    % regexp -all -inline -indices {,} $FN
    {20 20} {26 26} {38 38} {55 55}

    Four commas: 20, 26, 38, 55.

    The first comma (20) falls inside the 10 to 25 range of the double quotes,
    so it's out. The other three are outside the quotes.
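
    The quote check above can be sketched as a loop, assuming quotes always come in opening/closing pairs and are never escaped (names such as `quotes` and `outside` are illustrative). Braces are not considered yet, so index 55 is still in the result.

    ```tcl
    set FN {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}

    # Collect the position of every double quote; consecutive positions
    # are treated as an opening/closing pair.
    set quotes {}
    foreach pair [regexp -all -inline -indices {"} $FN] {
        lappend quotes [lindex $pair 0]
    }

    # Keep only the commas that do not fall between a pair of quotes.
    set outside {}
    foreach pair [regexp -all -inline -indices {,} $FN] {
        set pos [lindex $pair 0]
        set inside 0
        foreach {open close} $quotes {
            if {$close ne "" && $pos > $open && $pos < $close} { set inside 1 }
        }
        if {!$inside} { lappend outside $pos }
    }
    puts $outside   ;# 26 38 55
    ```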

    The same trick is harder with braces because they come as a pair of two
    different characters. So I recommend using string first for that:

    % string first \{ $FN 0
    33

    % string first \} $FN 33
    37

    So all commas within the 33 to 37 range are invalid.

    But there are more brace pairs in the string. You have to find
    them all.

    You probably can pick up from there.
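
    One possible way to pick up from there, as a sketch rather than a definitive solution: let a single pattern find every quoted or braced span, so the start and end of each pair come straight from the match indices. It assumes braces are not nested and quotes are not escaped; `spans` and `valid` are illustrative names, and `lassign` needs Tcl 8.5 or later.

    ```tcl
    set FN {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}

    # Every "..." or {...} span, as {start end} index pairs.
    set spans [regexp -all -inline -indices {"[^"]*"|\{[^{}]*\}} $FN]
    puts $spans   ;# {10 25} {33 37} {45 59}

    # A comma is valid only if it lies in none of those spans.
    set valid {}
    foreach pair [regexp -all -inline -indices {,} $FN] {
        set pos [lindex $pair 0]
        set inside 0
        foreach span $spans {
            lassign $span from to
            if {$pos >= $from && $pos <= $to} { set inside 1; break }
        }
        if {!$inside} { lappend valid $pos }
    }
    puts $valid   ;# 26 38
    ```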

    --
    Luc


  • From The Rickster@21:1/5 to Luc on Fri Nov 4 21:13:23 2022
    Thanks for confirming that I wasn't missing some trivial solution. Your suggestions are appreciated and will be followed.

  • From heinrichmartin@21:1/5 to The Rickster on Sat Nov 5 06:25:53 2022
    A few notes about precise problem statements:
    Where does the string end? As there is no closing single quote, is the single quote part of the string?
    There are currently two threads on c.l.t. that demonstrate the XY problem; everyone can learn from that, too.

    Are you looking for the comma or are you looking for the definitions "* be *"? What is the grammar? May quotation marks or braces be nested or escaped?
    Does repeated whitespace matter? Which whitespace? May the statement span lines?

    % set in {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}
    Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}
    % set kv [regexp -all -inline {(^Let\s|,)\s*(\w+)\s+be\s+([^,]+|"[^"]+"|\{[^\}]+\})\s*(?=,|$)} $in]
    {Let FN be "Billy Bob, jr."} {Let } FN {"Billy Bob, jr."} {,MI be {Bob}} , MI {{Bob}} {,LN be {Billy Bob, Jr}} , LN {{Billy Bob, Jr}}
    % dict get $kv FN
    "Billy Bob, jr."
    % dict get $kv MI
    {Bob}
    % dict get $kv LN
    {Billy Bob, Jr}
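
    The `dict get` calls above work because the flat result list happens to alternate in a usable way, but the dictionary also carries the whole-match and prefix elements, and the values keep their delimiters. A hedged sketch of a tidier variant, using the same regexp (the names `re` and `fields` are illustrative):

    ```tcl
    set in {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}
    set re {(^Let\s|,)\s*(\w+)\s+be\s+([^,]+|"[^"]+"|\{[^\}]+\})\s*(?=,|$)}

    set fields [dict create]
    # Each match contributes four elements: whole match, prefix, name, value.
    foreach {whole prefix name value} [regexp -all -inline $re $in] {
        # Drop the enclosing quotes or braces when the value starts with one.
        set first [string index $value 0]
        if {$first eq "\"" || $first eq "\{"} {
            set value [string range $value 1 end-1]
        }
        dict set fields $name $value
    }
    puts [dict get $fields FN]   ;# Billy Bob, jr.
    puts [dict get $fields LN]   ;# Billy Bob, Jr
    ```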

    Luc wrote:
    > Those problems are better handled in multiple stages.

    I disagree with the general statement (e.g. most parsers work by traversing the input only once), but I agree that regexp need not be the best solution.
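
    To illustrate the single-traversal idea, here is a minimal sketch of a scanner that walks the string once and records comma positions outside quotes and braces. The proc name `outerCommas` is hypothetical, and the rules it applies (braces may nest, quotes may not, nothing is escaped) are assumptions, since the grammar was never specified.

    ```tcl
    # Return the indices of commas that lie outside "..." and {...}.
    proc outerCommas {s} {
        set result {}
        set depth 0      ;# current brace nesting level
        set quoted 0     ;# 1 while inside double quotes
        set i 0
        foreach ch [split $s ""] {
            if {$quoted} {
                if {$ch eq "\""} { set quoted 0 }
            } elseif {$ch eq "\""} {
                # Quotes inside braces are treated as plain text.
                if {$depth == 0} { set quoted 1 }
            } elseif {$ch eq "\{"} {
                incr depth
            } elseif {$ch eq "\}"} {
                if {$depth > 0} { incr depth -1 }
            } elseif {$ch eq "," && $depth == 0} {
                lappend result $i
            }
            incr i
        }
        return $result
    }

    puts [outerCommas {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}]   ;# 26 38
    ```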

    The Rickster wrote:
    > Thanks for confirming that I wasn't missing some trivial solution. Your suggestions are appreciated and will be followed.

    A single negative answer does not mean that no trivial solution exists - especially in the setting of an XY-problem.

  • From The Rickster@21:1/5 to The Rickster on Thu Nov 10 21:58:29 2022
    Hey, lesson learned: a single negative answer does not mean that no trivial solution exists, especially in the setting of an XY problem.
    The regexp you supplied is what was needed. What is c.l.t.? How can I access it?
    Rick

  • From The Rickster@21:1/5 to heinrichmartin on Thu Nov 10 21:50:34 2022
    Answers:
    There is no closing quote. A string of text may be enclosed in double quotes or braces.
    One can ignore the "Let" portion of the string. The intent is to be able to evaluate each "?varname be ?textstring" clause, where the text string is delimited by braces or double quotes.
    The statement does not span lines, and repeated whitespace does not matter. My initial thought was to replace each comma with a non-printable character (e.g. \x03) and then split the string.
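
    That replace-then-split idea could look roughly like the sketch below. It assumes unnested braces and unescaped quotes, and that \x03 never occurs in the input; `sep`, `marked` and `parts` are illustrative names.

    ```tcl
    set in {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}
    set sep \x03
    set marked ""
    set last 0

    # Walk the quoted/braced spans; only text between them gets its commas replaced.
    foreach span [regexp -all -inline -indices {"[^"]*"|\{[^{}]*\}} $in] {
        lassign $span from to
        append marked [string map [list , $sep] [string range $in $last [expr {$from - 1}]]]
        append marked [string range $in $from $to]
        set last [expr {$to + 1}]
    }
    append marked [string map [list , $sep] [string range $in $last end]]

    set parts [split $marked $sep]
    foreach p $parts { puts $p }
    # Let FN be "Billy Bob, jr."
    # MI be {Bob}
    # LN be {Billy Bob, Jr}
    ```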

  • From Harald Oehlmann@21:1/5 to All on Fri Nov 11 08:12:30 2022
    On 11.11.2022 at 06:58, The Rickster wrote:
    > The regexp you supplied is what was needed. What is c.l.t.? How can I access it?
    > Rick

    Rick,
    Happy that it worked for you.
    clt (or c.l.t.) is the abbreviation for the comp.lang.tcl newsgroup.
    That is what this message uses.

    About the access: many people are quite unhappy with the access provided
    by Google, and a "real" newsreader may be helpful.

    You may consult the wiki page for information:
    wiki.tcl-lang.org/clt

    Enjoy,
    Harald

  • From heinrichmartin@21:1/5 to Rick on Thu Nov 10 23:14:00 2022
    On Friday, November 11, 2022 at 6:58:32 AM UTC+1, Rick wrote:
    > Hey, lesson learned - A single negative answer does not mean that no trivial solution exists - especially in the setting of an XY-problem.
    > The regexp you supplied is what was needed.

    lesson*s* ;-) You gave up too soon, and you had asked for Y while trying to solve X[1].

    > What is c.l.t. ? how can I access?

    It is an abbreviation for the usenet[2] group comp.lang.tcl. We are currently exchanging messages there.

    [1] https://en.wikipedia.org/wiki/XY_problem
    [2] https://en.wikipedia.org/wiki/Usenet

  • From Luc@21:1/5 to The Rickster on Fri Nov 11 08:44:29 2022
    On Thu, 10 Nov 2022 21:58:29 -0800 (PST), The Rickster wrote:

    > What is c.l.t. ? how can I access? Rick

    You're in it. :-)

    Whatever you're doing, you're doing it right.


    On Thu, 10 Nov 2022 23:14:00 -0800 (PST), heinrichmartin wrote:

    > > What is c.l.t. ? how can I access?
    >
    > It is an abbreviation for the usenet[2] group comp.lang.tcl. We are
    > currently exchanging messages there.


    Your use of "there" may confuse him. It's not "there." It's here.


    --
    Luc

