• Who first invented dotted-item notation?

    From Christopher F Clark@21:1/5 to All on Sun May 22 12:32:29 2022
    I know the notation from LR(0) machine construction, but also know
    that Glushkov used it in his solution to NFA construction. Earley
    also used the notation to describe his method if I understand right.
    I'm presuming that there is some first use of the notation. Do we
    know who invented it?

    Thanks in advance,
    Chris

    --
    ******************************************************************************
    Chris Clark                email: christopher.f.clark@compiler-resources.com
    Compiler Resources, Inc.   Web Site: http://world.std.com/~compres
    23 Bailey Rd               voice: (508) 435-5016
    Berlin, MA 01503 USA       twitter: @intel_chris
    ------------------------------------------------------------------------------

  • From Kaz Kylheku@21:1/5 to Christopher F Clark on Sun May 22 18:29:01 2022
    On 2022-05-22, Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:
    I know the notation from LR(0) machine construction, but also know
    that Glushkov used it in his solution to NFA construction. Earley
    also used the notation to describe his method if I understand right.
    I'm presuming that there is some first use of the notation. Do we
    know who invented it?

    In Lisp, we can write (1 2 3 4 5 6) in any of these ways:

    (1 . (2 3 4 5 6))
    (1 2 . (3 4 5 6))
    (1 2 3 . (4 5 6))
    (1 2 3 4 . (5 6))
    (1 2 3 4 5 . (6))
    (1 2 3 4 5 6 . NIL)

    As well as:

    (1 . (2 . (3 4 5 6)))

    The dot doesn't actually exist; it isn't represented anywhere in the
    data structure, but in a hand-written expression it could be used to
    convey (purely as a comment) some semantically interesting division in
    the list.

    E.g. suppose we have some data structure which holds a list of symbols,
    and an integer indicating a position among those symbols.

    A custom printing routine for that structure could take advantage
    of this, rendering it as, say:

    #S(sym-structure
    :dot-position 2
    :syms (a b . (c d)))

    where the custom printer for sym-structure looks at dot-position
    and renders the syms slot accordingly, to provide a visual aid.

    The following is not possible in any mainstream Lisp I know of:
    (. (a b c)) as a way of denoting (a b c). In that case the custom
    printer would just omit the dot notation.

    (I know of a dialect in which that leading dot notation is allowed,
    with that semantics, but only starting in a git commit made in 2014,
    which makes it irrelevant here.)
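
    Purely as an illustration, here is the same idea sketched in Python
    rather than Lisp (the class and field names just mirror the example
    above and are hypothetical): nothing dot-like is stored with the
    symbols; the printer synthesizes the dot from the separate position
    field, and leaves it out when it would have to lead.

    class SymStructure:
        def __init__(self, dot_position, syms):
            self.dot_position = dot_position  # position among the symbols
            self.syms = list(syms)            # the symbols; no dot stored here

        def __repr__(self):
            head = " ".join(self.syms[:self.dot_position])
            tail = " ".join(self.syms[self.dot_position:])
            if not head:               # a leading dot isn't valid syntax,
                return f"({tail})"     # so just omit the dot notation
            return f"({head} . ({tail}))"

    # SymStructure(2, ["a", "b", "c", "d"]) prints as (a b . (c d))
    # SymStructure(0, ["a", "b", "c"])      prints as (a b c)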

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

  • From Christopher F Clark@21:1/5 to All on Mon May 23 13:23:41 2022
    Let me explain a little further.

    Dotted items are useful for building (and visualizing) state
    machines as translations of regular expressions or grammar rules.

    In Glushkov's construction one takes a regular expression /ab*(cb)+a/
    and makes copies of it with a dot before each "symbol" (character you
    can recognize) and one at the end. Thus, 6 dotted-items:
    /.ab*(cb)+a/
    /a.b*(cb)+a/
    /ab*(.cb)+a/
    /ab*(c.b)+a/
    /ab*(cb)+.a/
    /ab*(cb)+a./

    Each state in a Glushkov NFA is a set of these items, i.e. an
    element of their powerset (see below):

    state 0:
    /.ab*(cb)+a/

    state 1:
    /a.b*(cb)+a/
    /ab*(.cb)+a/

    state 2:
    /ab*(c.b)+a/

    state 3:
    /ab*(.cb)+a/
    /ab*(cb)+.a/

    state 4:
    /ab*(cb)+a./

    The algorithm explains how to calculate what the states and their
    transitions are.

    Yielding something like this:

    state 0:
    /.ab*(cb)+a/ -> state 1

    state 1:
    /a.b*(cb)+a/ -> state 1
    /ab*(.cb)+a/ -> state 2

    state 2:
    /ab*(c.b)+a/ -> state 3

    state 3:
    /ab*(.cb)+a/ -> state 2
    /ab*(cb)+.a/ -> state 4

    state 4:
    /ab*(cb)+a./ -> accept

    Below: Actually, in a Glushkov NFA the 6 dotted items might
    themselves be the states, and those subsets might be how you convert
    it to a DFA (the subset construction algorithm). In my mind, those
    two things have gotten merged, but that might be an error on my part.
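
    Just to make the machine above concrete, here is a rough sketch in
    Python (purely illustrative; the states and transitions are
    transcribed by hand from the listing above, not computed by a
    general Glushkov construction):

    ITEMS = {
        0: ["/.ab*(cb)+a/"],
        1: ["/a.b*(cb)+a/", "/ab*(.cb)+a/"],
        2: ["/ab*(c.b)+a/"],
        3: ["/ab*(.cb)+a/", "/ab*(cb)+.a/"],
        4: ["/ab*(cb)+a./"],              # dot at the end: accepting
    }

    TRANS = {
        (0, "a"): 1,
        (1, "b"): 1, (1, "c"): 2,
        (2, "b"): 3,
        (3, "c"): 2, (3, "a"): 4,
    }

    def run(word):
        state = 0
        for ch in word:
            state = TRANS.get((state, ch))
            if state is None:
                return False                # no item expects this symbol
            print(ch, "->", ITEMS[state])   # "I am here; I expect this next"
        return state == 4

    # run("acba") and run("abcba") accept; run("abca") rejects.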

    Anyway, when building the LR(0) machine, you build similar dotted
    items (and subsets) to define the states of the LR(0) DFA. And the
    Earley construction adds some context information to the dotted items.
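
    Again purely as an illustration (a sketch of the usual textbook
    LR(0) closure over a made-up toy grammar, not the actual Yacc++
    code), the dotted items for a context-free grammar look like this:

    # An item is (lhs, rhs, dot); ("E", ("E", "+", "T"), 0) reads E -> . E + T
    GRAMMAR = {
        "S": [("E",)],
        "E": [("E", "+", "T"), ("T",)],
        "T": [("id",)],
    }

    def closure(items):
        # keep adding X -> . gamma for every nonterminal X just after a dot
        items = set(items)
        changed = True
        while changed:
            changed = False
            for lhs, rhs, dot in list(items):
                if dot < len(rhs) and rhs[dot] in GRAMMAR:
                    for prod in GRAMMAR[rhs[dot]]:
                        if (rhs[dot], prod, 0) not in items:
                            items.add((rhs[dot], prod, 0))
                            changed = True
        return items

    def show(item):
        lhs, rhs, dot = item
        return lhs + " -> " + " ".join(rhs[:dot] + (".",) + rhs[dot:])

    # closure({("S", ("E",), 0)}) yields S -> . E, E -> . E + T,
    # E -> . T and T -> . id -- the contents of the LR(0) start state.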

    It's how I think about state machines. It's how they are documented
    in Yacc++ with the debug option. Yacc (and Bison) does something
    similar using _ instead of dot if I recall correctly.

    But, someone must have introduced the idea first. My question is who?
    Did Backus use dotted-items? Or did Knuth or Glushkov invent them?
    Or maybe Naur. Or someone else, maybe one unfamiliar to me? Maybe it
    goes all the way back to Kleene. Is the question lost to time?

    That's my question....

    Again, thanks,
    Chris

    [BNF was descriptive, don't think Backus ever did anything with
    formal parsers. I looked at Knuth's 1965 LR parsing paper which has
    a lot of notation but no dots. -John]

  • From gah4@21:1/5 to Christopher F Clark on Mon May 23 18:31:18 2022
    On Sunday, May 22, 2022 at 10:15:07 AM UTC-7, Christopher F Clark wrote:
    I know the notation from LR(0) machine construction, but also know
    that Glushkov used it in his solution to NFA construction. Earley
    also used the notation to describe his method if I understand right.
    I'm presuming that there is some first use of the notation. Do we
    know who invented it?

    This could be asked in general, for the origin of different notations
    used in programming languages and their documentation.

    The first use of the dot that I know of, as something like an
    operator, is for structure reference qualifiers in PL/I; it was
    later adopted by C and others from there.

    I always thought that PL/I adopted structures from COBOL, but never
    looked up to see how COBOL did it. It seems that COBOL uses AS
    for structure references.

    As far as I know, it was Fortran that first introduced variable
    names with more than one character, though mathematics still hasn't
    done that. (Fortran also uses the dot for operators like .AND. and
    .LT., since in the early days the character set was restricted.)

    In algebra, it is traditional that two variables written next to
    each other are multiplied, which is convenient for readers. There
    needs to be some separator to indicate where one name ends and the
    next begins. In some languages, that can just be a space or some
    non-operator character.
    [I believe Chris is asking about the specific use of a dot to show
    the position in a partially parsed BNF rule.
    PL/I copied its structure declarations close to verbatim from Cobol,
    with level numbers to show the nesting. It invented . and -> for
    structures, while Cobol used "OF" to refer to elements and doesn't
    have pointers. -John]

  • From Christopher F Clark@21:1/5 to Our wise moderator on Tue May 24 11:27:59 2022
    Our wise moderator wrote:
    I believe Chris is asking about the specific use of a dot to show
    the position in a partially parsed BNF rule.

    Yes, exactly.

    It gives a kind of "operational semantics" to BNF/Regexes. "I am
    here. This is what I expect to see next. Or this, or this."

    It's like the way many kids learn to write stories: "This happened.
    And then this. And then this. And then this." That's considered bad
    style, but it gives a very simple semantics.

    Someone must have invented the idea of specifically marking the
    rule/regular expression with an indication on where you are in it. It
    may be an obvious thing, but someone had to realize that it should be
    made explicit and that one can reason from it.
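
    Here is a small sketch of what I mean (Python, with hypothetical
    names; just an illustration, not anyone's published formulation):
    an item is a rule plus a marked position, and from it you can read
    off both "what do I expect next?" and "where am I after seeing it?".

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Item:
        lhs: str       # left-hand side of the rule
        rhs: tuple     # right-hand side symbols
        dot: int       # how much of rhs has been recognized so far

        def expects(self):
            # the symbol just to the right of the dot, or None at the end
            return self.rhs[self.dot] if self.dot < len(self.rhs) else None

        def advance(self):
            # move the dot one symbol to the right
            return Item(self.lhs, self.rhs, self.dot + 1)

        def __str__(self):
            syms = self.rhs[:self.dot] + (".",) + self.rhs[self.dot:]
            return f"{self.lhs} -> {' '.join(syms)}"

    # str(Item("E", ("E", "+", "T"), 1)) is "E -> E . + T"; it expects "+".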

    Glushkov NFAs have fewer states than Thompson NFAs and no epsilon
    transitions, although both represent the same regular expressions.

    It's something I think anyone interested in automata or parsing should
    learn. I actually started writing about it in a blog/book I was
    thinking of writing on the topic to explain some things like what we
    did at Intel making hardware accelerators or how we did ELR parsing
    (LR + regular expressions) in Yacc++. Or even how one should look at
    PEG grammars or GLR or Okhotin's work. It's a basic foundation and
    too many people don't seem to know it. But someone must have first
    thought of expressing it that way.

    Kind regards,
    Chris


  • From antispam@math.uni.wroc.pl@21:1/5 to Christopher F Clark on Sat May 28 04:51:45 2022
    Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:
    Our wise moderator wrote:
    I believe Chris is asking about the specific use of a dot to show
    the position in a partially parsed BNF rule.

    Yes, exactly.

    It gives a kind of "operational semantics" to BNF/Regexes. "I am
    here. This is what I expect to see next. Or this, or this."

    It's like the way many kids learn to write stories: "This happened.
    And then this. And then this. And then this." That's considered bad
    style, but it gives a very simple semantics.

    Someone must have invented the idea of specifically marking the
    rule/regular expression with an indication on where you are in it. It
    may be an obvious thing, but someone had to realize that it should be
    made explicit and one can reason from it.

    This idea is in Floyd's paper "A Descriptive Language for Symbol
    Manipulation", where he describes a system of rules for parsing.
    Instead of a dot he uses a triangle as a marker, and his rules
    are different from context-free rules.

    I think that to get a definite answer you need to be very precise
    about what you are asking. Marking positions is a very old
    technique. Implicitly it appears in mathematical work on Thue
    systems and word problems. In a sense, the position of the head in
    a Turing machine does the same work.

    The dot is used in Earley's paper from 1967. Knuth's 1965 paper
    seems to be hidden behind paywalls, so I cannot see what is in it,
    but a reasonable guess is that Knuth used the dot. Knuth's ideas are
    innovative enough that if you narrow your question to parsing using
    sets of items of context-free grammars, then I believe he is the
    first. OTOH, in a more general setting, marker tokens for the
    "current position" were crucial for the proof that systems of
    substitutions (rewrites) are Turing complete, and that seems to have
    been done in the early fifties.

    --
    Waldek Hebisch
    [Knuth didn't use dots. You can find a copy of the paper at https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.361.8867
    Click the "cached" link at the upper right. -John]
