Forum: >>> Magnum BBS <<<

regexp to locate instances of a specific character when not enclosed in

From The Rickster@21:1/5 to All on Fri Nov 4 10:56:05 2022

Given the string 'Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}

I am able to use a regexp negative lookahead to locate all commas that are not enclosed in braces:
,(?![^\{]*\})
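
For reference, a minimal sketch of what that pattern reports on the sample string. The variable names s and hits are only illustrative. It should skip the comma inside the last pair of braces but still report the comma inside the double quotes, which is the gap the question is about:

```tcl
set s {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}

# A comma matches unless a closing brace follows it
# with no opening brace in between.
set hits [regexp -all -inline -indices {,(?![^\{]*\})} $s]
puts $hits   ;# expected: {20 20} {26 26} {38 38}
```

Index 20 is the comma in "Billy Bob, jr.", so braces are handled but quotes are not.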

However, a regular expression that identifies the locations of commas outside both double quotes and braces would be appreciated.

Thanks

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Luc@21:1/5 to The Rickster on Fri Nov 4 15:29:48 2022

Those problems are better handled in multiple stages.

For example, begin by removing everything in quotes or braces:

regsub -all {"[^"]*"|{[^{}]*}} $string "" variable

Then all remaining commas are what you're looking for. Case closed.
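
A small sketch of that step on the sample string, assuming quotes and braces are balanced and never nested. The variable names s, removed and stripped are illustrative, and the braces in the pattern are backslash-escaped here only to make the brace counting obvious:

```tcl
set s {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}

# Delete every "..." and {...} span. regsub returns the number of
# substitutions and stores the result in the variable named stripped.
set removed [regsub -all {"[^"]*"|\{[^{}]*\}} $s "" stripped]

puts $removed                       ;# 3 spans deleted
puts $stripped                      ;# Let FN be ,MI be ,LN be
puts [regexp -all {,} $stripped]    ;# 2 commas remain
```

Note that the two remaining commas no longer sit at their original indices, because the deleted spans shift everything after them.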

But if you still need to know their exact positions within the original complete string, then I think you'll have to write your own parser.
But regexp still can help you a lot:

% set FN {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}
% regexp -all -inline -indices {"} $FN
{10 10} {25 25}

The first double quote is in position 10, the second in position 25.
There are no others.

Let's confirm:

% string first \" $FN 0
10
% string first \" $FN 11
25

OK. Let's continue:

% regexp -all -inline -indices {,} $FN
{20 20} {26 26} {38 38} {55 55}

Four commas: 20, 26, 38, 55.

The first comma (20) falls within the 10 to 25 range of the double quotes,
so it's out. The other three are outside the quotes.

The same trick is harder with braces because the opening and closing
characters differ. So I recommend using string first for that:

% string first \{ $FN 0
33

% string first \} $FN 33
37

So all commas within the 33 to 37 range are invalid.

But there are more brace pairs in the string. You have to find
them all.

You probably can pick up from there.
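
One possible way to finish the steps above, as a minimal sketch and not Luc's own code. The proc name commasOutside is hypothetical, and it assumes quotes and braces are balanced, not nested and not escaped. It collects the index range of every protected span, then keeps only the commas that fall outside all of them:

```tcl
# Hypothetical helper: indices of commas outside "..." and {...} spans.
proc commasOutside {s} {
    # Each element is a {first last} index pair of one protected span.
    set spans [regexp -all -inline -indices {"[^"]*"|\{[^{}]*\}} $s]
    set result {}
    foreach pair [regexp -all -inline -indices {,} $s] {
        set pos [lindex $pair 0]
        set inside 0
        foreach span $spans {
            lassign $span first last
            if {$pos > $first && $pos < $last} {
                set inside 1
                break
            }
        }
        if {!$inside} {
            lappend result $pos
        }
    }
    return $result
}

set FN {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}
puts [commasOutside $FN]   ;# expected: 26 38
```

The protected spans found here are 10 to 25, 33 to 37 and 45 to 59, so the commas at 20 and 55 are dropped.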

--
Luc

From The Rickster@21:1/5 to Luc on Fri Nov 4 21:13:23 2022

Thanks for confirming that I wasn't missing some trivial solution. Your suggestions are appreciated and will be followed.

From heinrichmartin@21:1/5 to The Rickster on Sat Nov 5 06:25:53 2022

A few notes about precise problem statements:
Where does the string end? As there is no closing single quote, is the single quote part of the string?
We currently have two threads on c.l.t. that demonstrate the XY problem; everyone can learn from that, too.

Are you looking for the comma or are you looking for the definitions "* be *"? What is the grammar? May quotation marks or braces be nested or escaped?
Does repeated whitespace matter? Which whitespace? May the statement span lines?

% set in {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}
Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}
% set kv [regexp -all -inline {(^Let\s|,)\s*(\w+)\s+be\s+([^,]+|"[^"]+"|\{[^\}]+\})\s*(?=,|$)} $in]
{Let FN be "Billy Bob, jr."} {Let } FN {"Billy Bob, jr."} {,MI be {Bob}} , MI {{Bob}} {,LN be {Billy Bob, Jr}} , LN {{Billy Bob, Jr}}
% dict get $kv FN
"Billy Bob, jr."
% dict get $kv MI
{Bob}
% dict get $kv LN
{Billy Bob, Jr}
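
If the values are wanted without their delimiters, a sketch along these lines might do. It reuses the regexp from the session above unchanged; the names re, vars, q and b are illustrative. Instead of reading the flat match list as a dict directly, it walks it four elements at a time and strips one pair of surrounding quotes or braces:

```tcl
set in {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}
set re {(^Let\s|,)\s*(\w+)\s+be\s+([^,]+|"[^"]+"|\{[^\}]+\})\s*(?=,|$)}

set vars [dict create]
# regexp -all -inline yields four elements per match:
# the whole match, the separator, the name and the raw value.
foreach {whole sep name value} [regexp -all -inline $re $in] {
    # Strip one pair of surrounding quotes or braces, if present.
    # The group that did not take part in the match is set to "".
    if {[regexp {^"(.*)"$|^\{(.*)\}$} $value -> q b]} {
        set value $q$b
    }
    dict set vars $name $value
}

puts [dict get $vars FN]   ;# Billy Bob, jr.
puts [dict get $vars MI]   ;# Bob
puts [dict get $vars LN]   ;# Billy Bob, Jr
```

Like the original, this assumes no nested or escaped delimiters inside a value.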

Luc wrote: "Those problems are better handled in multiple stages."

I disagree with the general statement (e.g. most parsers work by traversing the input only once), but I agree that regexp need not be the best solution.
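
To illustrate the single-traversal idea, here is a minimal sketch of a hand-written scanner. The proc name scanCommas is hypothetical, and it assumes delimiters are balanced and neither nested nor escaped. It walks the string once and remembers which character, if any, would close the current protected span:

```tcl
# Hypothetical single-pass scanner: returns the indices of all commas
# that are outside double quotes and outside braces.
proc scanCommas {s} {
    set closer ""   ;# character that ends the current span, or "" when outside
    set result {}
    set i 0
    foreach ch [split $s ""] {
        if {$closer ne ""} {
            # Inside a protected span: only its closing character matters.
            if {$ch eq $closer} { set closer "" }
        } elseif {$ch eq "\""} {
            set closer "\""
        } elseif {$ch eq "\{"} {
            set closer "\}"
        } elseif {$ch eq ","} {
            lappend result $i
        }
        incr i
    }
    return $result
}

set in {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}
puts [scanCommas $in]   ;# expected: 26 38
```

Supporting nesting or escapes would mean replacing the single closer variable with a depth counter or a small stack.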

The Rickster wrote: "Thanks for confirming that I wasn't missing some trivial solution. Your suggestions are appreciated and will be followed."

A single negative answer does not mean that no trivial solution exists - especially in the setting of an XY-problem.

From The Rickster@21:1/5 to The Rickster on Thu Nov 10 21:58:29 2022

Hey, lesson learned: a single negative answer does not mean that no trivial solution exists, especially in the setting of an XY problem.
The regexp you supplied is what was needed. What is c.l.t.? How can I access it? Rick

From The Rickster@21:1/5 to heinrichmartin on Thu Nov 10 21:50:34 2022

Answers:
There is no closing quote. A string of text may be enclosed in double quotes or braces.
One can ignore the "Let" portion of the string. The intent is to be able to evaluate each ?varname be ?textstring, where textstring is delimited by braces or double quotes.
The statement does not span lines, and repeated white space does not matter. My initial thought was to replace each comma with a non-printable character (e.g. \x03) and then split the string.
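
That replace-then-split idea could be sketched roughly as follows, assuming the input never contains \x03 itself and that quotes and braces are balanced and not nested. All variable names are illustrative. Protected spans are blanked out with same-length filler first, so the comma indices found in the masked copy still line up with the original string:

```tcl
set s {Let FN be "Billy Bob, jr.",MI be {Bob},LN be {Billy Bob, Jr}}
set sep \x03

# Blank out every "..." and {...} span with filler of the same length.
set masked $s
foreach span [regexp -all -inline -indices {"[^"]*"|\{[^{}]*\}} $s] {
    lassign $span first last
    set filler [string repeat _ [expr {$last - $first + 1}]]
    set masked [string replace $masked $first $last $filler]
}

# Every comma left in the masked copy is a real separator.
# Replacing one character with one character keeps indices valid.
set marked $s
foreach pair [regexp -all -inline -indices {,} $masked] {
    set pos [lindex $pair 0]
    set marked [string replace $marked $pos $pos $sep]
}

set parts [split $marked $sep]
foreach part $parts { puts $part }
```

For the sample string this should yield three parts: Let FN be "Billy Bob, jr.", then MI be {Bob}, then LN be {Billy Bob, Jr}.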

From Harald Oehlmann@21:1/5 to All on Fri Nov 11 08:12:30 2022

On 11.11.2022 at 06:58, The Rickster wrote:

The regexp you supplied is what was needed. What is c.l.t.? How can I access it?
Rick

Rick,
happy that it worked for you.
clt (or c.l.t.) is the abbreviation of the comp.lang.tcl newsgroup.
That is what this message uses.

About the access: many people are quite unhappy with the access provided
by Google, and a "real" news reader may be helpful.

You may consult the wiki page for information:
wiki.tcl-lang.org/clt

Enjoy,
Harald

From heinrichmartin@21:1/5 to Rick on Thu Nov 10 23:14:00 2022

On Friday, November 11, 2022 at 6:58:32 AM UTC+1, Rick wrote:

Hey, lesson learned: a single negative answer does not mean that no trivial solution exists, especially in the setting of an XY problem.
The regexp you supplied is what was needed.

lesson*s* ;-) You gave up too soon, and you had asked for Y while trying to solve X[1].

What is c.l.t. ? how can I access?

It is an abbreviation for the usenet[2] group comp.lang.tcl. We are currently exchanging messages there.

[1] https://en.wikipedia.org/wiki/XY_problem
[2] https://en.wikipedia.org/wiki/Usenet

From Luc@21:1/5 to The Rickster on Fri Nov 11 08:44:29 2022

On Thu, 10 Nov 2022 21:58:29 -0800 (PST), The Rickster wrote:

What is c.l.t. ? how can I access? Rick

You're in it. :-)

Whatever you're doing, you're doing it right.

On Thu, 10 Nov 2022 23:14:00 -0800 (PST), heinrichmartin wrote:

What is c.l.t. ? how can I access?

It is an abbreviation for the usenet[2] group comp.lang.tcl. We are
currently exchanging messages there.

Your use of "there" may confuse him. It's not "there." It's here.

--
Luc
