• Who first invented dotted-item notation?

    From Christopher F Clark@21:1/5 to All on Sun May 22 12:32:29 2022
    I know the notation from LR(0) machine construction, but also know
    that Glushkov used it in his solution to NFA construction. Earley
    also used the notation to describe his method if I understand right.
    I'm presuming that there is some first use of the notation. Do we
    know who invented it?

    Thanks in advance,
    Chris

    --
    ******************************************************************************
    Chris Clark                email: christopher.f.clark@compiler-resources.com
    Compiler Resources, Inc.   Web Site: http://world.std.com/~compres
    23 Bailey Rd               voice: (508) 435-5016
    Berlin, MA 01503 USA       twitter: @intel_chris
    ------------------------------------------------------------------------------

  • From Kaz Kylheku@21:1/5 to Christopher F Clark on Sun May 22 18:29:01 2022
    On 2022-05-22, Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:
    I know the notation from LR(0) machine construction, but also know
    that Glushkov used it in his solution to NFA construction. Earley
    also used the notation to describe his method if I understand right.
    I'm presuming that there is some first use of the notation. Do we
    know who invented it?

    In Lisp, we can write (1 2 3 4 5 6) in any of these ways:

    (1 . (2 3 4 5 6))
    (1 2 . (3 4 5 6))
    (1 2 3 . (4 5 6))
    (1 2 3 4 . (5 6))
    (1 2 3 4 5 . (6))
    (1 2 3 4 5 6 . NIL)

    As well as:

    (1 . (2 . (3 4 5 6)))

    The dot doesn't actually exist; it isn't represented anywhere in the
    data structure, but in a hand-written expression it could be used to
    convey (purely as a comment) some semantically interesting division in
    the list.

    E.g. suppose we have some data structure which holds a list of symbols,
    and an integer indicating a position among those symbols.

    A custom printing routine for that structure could take advantage
    of this, rendering it as, say:

    #S(sym-structure
    :dot-position 2
    :syms (a b . (c d)))

    where the custom printer for sym-structure looks at dot-position
    and renders the syms slot accordingly, to provide a visual aid.

    The following is not possible in any mainstream Lisp I know of:
    (. (a b c)) as a way of denoting (a b c). In that case the custom
    printer would just omit the dot notation.

    (I know of a dialect in which that leading dot notation is allowed,
    with that semantics, but only starting in a git commit made in 2014,
    which makes it irrelevant here.)
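
    Purely as an illustration, here is the same idea sketched in Python
    rather than Lisp (the class and field names just mirror the example
    above and are hypothetical): nothing dot-like is stored with the
    symbols; the printer synthesizes the dot from the separate position
    field, and leaves it out when it would have to lead.

    class SymStructure:
        def __init__(self, dot_position, syms):
            self.dot_position = dot_position  # position among the symbols
            self.syms = list(syms)            # the symbols; no dot stored here

        def __repr__(self):
            head = " ".join(self.syms[:self.dot_position])
            tail = " ".join(self.syms[self.dot_position:])
            if not head:               # a leading dot isn't valid syntax,
                return f"({tail})"     # so just omit the dot notation
            return f"({head} . ({tail}))"

    # SymStructure(2, ["a", "b", "c", "d"]) prints as (a b . (c d))
    # SymStructure(0, ["a", "b", "c"])      prints as (a b c)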

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

  • From Christopher F Clark@21:1/5 to All on Mon May 23 13:23:41 2022
    Let me explain a little further.

    Dotted items are useful for building (and visualizing) state
    machines as translations of regular expressions or grammar rules.

    In Glushkov's construction one takes a regular expression /ab*(cb)+a/
    and makes copies of it with a dot before each "symbol" (character you
    can recognize) and one at the end. Thus, 6 dotted-items:
    /.ab*(cb)+a/
    /a.b*(cb)+a/
    /ab*(.cb)+a/
    /ab*(c.b)+a/
    /ab*(cb)+.a/
    /ab*(cb)+a./

    Each state in a Glushkov NFA is a set of these items, i.e. an
    element of their powerset (see below):

    state 0:
    /.ab*(cb)+a/

    state 1:
    /a.b*(cb)+a/
    /ab*(.cb)+a/

    state 2:
    /ab*(c.b)+a/

    state 3:
    /ab*(.cb)+a/
    /ab*(cb)+.a/

    state 4:
    /ab*(cb)+a./

    The algorithm explains how to calculate what the states and their
    transitions are.

    Yielding something like this:

    state 0:
    /.ab*(cb)+a/ -> state 1

    state 1:
    /a.b*(cb)+a/ -> state 1
    /ab*(.cb)+a/ -> state 2

    state 2:
    /ab*(c.b)+a/ -> state 3

    state 3:
    /ab*(.cb)+a/ -> state 2
    /ab*(cb)+.a/ -> state 4

    state 4:
    /ab*(cb)+a./ -> accept

    Below: Actually, in a Glushkov NFA the 6 dotted items might
    themselves be the states, and those subsets might be how you convert
    it to a DFA (the subset construction algorithm). In my mind, those
    two things have gotten merged, but that might be an error on my part.
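
    Just to make the machine above concrete, here is a rough sketch in
    Python (purely illustrative; the states and transitions are
    transcribed by hand from the listing above, not computed by a
    general Glushkov construction):

    ITEMS = {
        0: ["/.ab*(cb)+a/"],
        1: ["/a.b*(cb)+a/", "/ab*(.cb)+a/"],
        2: ["/ab*(c.b)+a/"],
        3: ["/ab*(.cb)+a/", "/ab*(cb)+.a/"],
        4: ["/ab*(cb)+a./"],              # dot at the end: accepting
    }

    TRANS = {
        (0, "a"): 1,
        (1, "b"): 1, (1, "c"): 2,
        (2, "b"): 3,
        (3, "c"): 2, (3, "a"): 4,
    }

    def run(word):
        state = 0
        for ch in word:
            state = TRANS.get((state, ch))
            if state is None:
                return False                # no item expects this symbol
            print(ch, "->", ITEMS[state])   # "I am here; I expect this next"
        return state == 4

    # run("acba") and run("abcba") accept; run("abca") rejects.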

    Anyway, when building the LR(0) machine, you build similar dotted
    items (and subsets) to define the states of the LR(0) DFA. And the
    Earley construction adds some context information to the dotted items.
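
    Again purely as an illustration (a sketch of the usual textbook
    LR(0) closure over a made-up toy grammar, not the actual Yacc++
    code), the dotted items for a context-free grammar look like this:

    # An item is (lhs, rhs, dot); ("E", ("E", "+", "T"), 0) reads E -> . E + T
    GRAMMAR = {
        "S": [("E",)],
        "E": [("E", "+", "T"), ("T",)],
        "T": [("id",)],
    }

    def closure(items):
        # keep adding X -> . gamma for every nonterminal X just after a dot
        items = set(items)
        changed = True
        while changed:
            changed = False
            for lhs, rhs, dot in list(items):
                if dot < len(rhs) and rhs[dot] in GRAMMAR:
                    for prod in GRAMMAR[rhs[dot]]:
                        if (rhs[dot], prod, 0) not in items:
                            items.add((rhs[dot], prod, 0))
                            changed = True
        return items

    def show(item):
        lhs, rhs, dot = item
        return lhs + " -> " + " ".join(rhs[:dot] + (".",) + rhs[dot:])

    # closure({("S", ("E",), 0)}) yields S -> . E, E -> . E + T,
    # E -> . T and T -> . id -- the contents of the LR(0) start state.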

    It's how I think about state machines. It's how they are documented
    in Yacc++ with the debug option. Yacc (and Bison) does something
    similar using _ instead of dot if I recall correctly.

    But, someone must have introduced the idea first. My question is who?
    Did Backus use dotted-items? Or did Knuth or Glushkov invent them?
    Or maybe Naur. Or someone else, maybe one unfamiliar to me? Maybe it
    goes all the way back to Kleene. Is the question lost to time?

    That's my question....

    Again, thanks,
    Chris

    [BNF was descriptive, don't think Backus ever did anything with
    formal parsers. I looked at Knuth's 1965 LR parsing paper which has
    a lot of notation but no dots. -John]

  • From gah4@21:1/5 to Christopher F Clark on Mon May 23 18:31:18 2022
    On Sunday, May 22, 2022 at 10:15:07 AM UTC-7, Christopher F Clark wrote:
    I know the notation from LR(0) machine construction, but also know
    that Glushkov used it in his solution to NFA construction. Earley
    also used the notation to describe his method if I understand right.
    I'm presuming that there is some first use of the notation. Do we
    know who invented it?

    This could be asked in general, for the origin of different notations
    used in programming languages and their documentation.

    The first use of the dot that I know of, as something like an
    operator, is for structure reference qualifiers in PL/I; it was
    later adopted by C and others from there.

    I always thought that PL/I adopted structures from COBOL, but never
    looked up to see how COBOL did it. It seems that COBOL uses AS
    for structure references.

    As far as I know, it was Fortran that first introduced variable
    names with more than one character, though mathematics still hasn't
    done that. (Fortran also uses the dot for operators like .AND. and
    .LT., since in the early days the character set was restricted.)

    In algebra, it is traditional that two variables written next to
    each other are multiplied, which is convenient for readers. There
    needs to be some separator to indicate where one name ends and the
    next begins. In some languages, that can just be a space or some
    non-operator character.
    [I believe Chris is asking about the specific use of a dot to show
    the position in a partially parsed BNF rule.
    PL/I copied its structure declarations close to verbatim from Cobol,
    with level numbers to show the nesting. It invented . and -> for
    structures, while Cobol used "OF" to refer to elements and doesn't
    have pointers. -John]

  • From Christopher F Clark@21:1/5 to Our wise moderator on Tue May 24 11:27:59 2022
    Our wise moderator wrote:
    I believe Chris is asking about the specific use of a dot to show
    the position in a partially parsed BNF rule.

    Yes, exactly.

    It gives a kind of "operational semantics" to BNF/Regexes. "I am
    here. This is what I expect to see next. Or this, or this."

    It's like the way many kids learn to write stories: "This happened.
    And then this. And then this. And then this." That's considered bad
    style, but it gives a very simple semantics.

    Someone must have invented the idea of specifically marking the
    rule/regular expression with an indication on where you are in it. It
    may be an obvious thing, but someone had to realize that it should be
    made explicit and that one can reason from it.
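
    Here is a small sketch of what I mean (Python, with hypothetical
    names; just an illustration, not anyone's published formulation):
    an item is a rule plus a marked position, and from it you can read
    off both "what do I expect next?" and "where am I after seeing it?".

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Item:
        lhs: str       # left-hand side of the rule
        rhs: tuple     # right-hand side symbols
        dot: int       # how much of rhs has been recognized so far

        def expects(self):
            # the symbol just to the right of the dot, or None at the end
            return self.rhs[self.dot] if self.dot < len(self.rhs) else None

        def advance(self):
            # move the dot one symbol to the right
            return Item(self.lhs, self.rhs, self.dot + 1)

        def __str__(self):
            syms = self.rhs[:self.dot] + (".",) + self.rhs[self.dot:]
            return f"{self.lhs} -> {' '.join(syms)}"

    # str(Item("E", ("E", "+", "T"), 1)) is "E -> E . + T"; it expects "+".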

    Glushkov NFAs have fewer states than Thompson NFAs and no epsilon
    transitions, although both represent the same regular expressions.

    It's something I think anyone interested in automata or parsing should
    learn. I actually started writing about it in a blog/book I was
    thinking of writing on the topic to explain some things like what we
    did at Intel making hardware accelerators or how we did ELR parsing
    (LR + regular expressions) in Yacc++. Or even how one should look at
    PEG grammars or GLR or Okhotin's work. It's a basic foundation and
    too many people don't seem to know it. But someone must have first
    thought of expressing it that way.

    Kind regards,
    Chris


  • From antispam@math.uni.wroc.pl@21:1/5 to Christopher F Clark on Sat May 28 04:51:45 2022
    Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:
    Our wise moderator wrote:
    I believe Chris is asking about the specific use of a dot to show
    the position in a partially parsed BNF rule.

    Yes, exactly.

    It gives a kind of "operational semantics" to BNF/Regexes. "I am
    here. This is what I expect to see next. Or this, or this."

    It's like the way many kids learn to write stories: "This happened.
    And then this. And then this. And then this." That's considered bad
    style, but it gives a very simple semantics.

    Someone must have invented the idea of specifically marking the
    rule/regular expression with an indication on where you are in it. It
    may be an obvious thing, but someone had to realize that it should be
    made explicit and one can reason from it.

    This idea is in Floyd's paper "A Descriptive Language for Symbol
    Manipulation", where he describes a system of rules for parsing.
    Instead of a dot he uses a triangle as a marker, and his rules
    are different from context-free rules.

    I think that to get a definite answer you need to be very precise
    about what you are asking. Marking positions is a very old
    technique. Implicitly it appears in mathematical work on Thue
    systems and word problems. In a sense, the position of the head in
    a Turing machine does the same work.

    The dot is used in Earley's paper from 1967. Knuth's 1965 paper
    seems to be hidden behind paywalls, so I cannot see what is in it,
    but a reasonable guess is that Knuth used the dot. Knuth's ideas are
    innovative enough that if you narrow your question to parsing using
    sets of items of context-free grammars, then I believe he is the
    first. OTOH, in a more general setting, marker tokens for the
    "current position" were crucial for the proof that systems of
    substitutions (rewrites) are Turing complete, and that seems to have
    been done in the early fifties.

    --
    Waldek Hebisch
    [Knuth didn't use dots. You can find a copy of the paper at https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.361.8867
    Click the "cached" link at the upper right. -John]
