• New parser rewrite appears to be working nicely

    From luser droog@21:1/5 to All on Mon May 30 10:43:21 2022
    I just spent the evening and morning writing a parser for BNF syntax.
    Pretty meager stuff, I know. But the debugging effort went rather
    smoothly thanks to the introduction of error message support.
    Even the bare-bones error messages told me which production was
    failing.

    (pc12.ps)run
    /to-string{ dup type /stringtype ne {
    dup length string exch 0 exch {3 copy putinterval length add} forall pop } if }
    def

    /space ( \n) anyof def
    /space? //space many def
    /alpha (a)(z) range (A)(Z) range alt def
    /identifier (-) char //alpha alt some
    {to-string} using
    def

    /terminal (") char (") noneof many
    xthen (") char thenx def
    /non-terminal (<) char //identifier
    xthen (>) char thenx def
    /Symbol //non-terminal def
    /Expression //terminal //non-terminal alt //space? thenx some def
    /BNF //space? //Symbol xthen //space? thenx
    (::=) str then //space? thenx
    //Expression then //space? thenx
    def

    /BNF-parse {
    0 0 3 2 roll string-input BNF
    dup first /OK eq { second first } if
    } def

    (
    <postal-address> ::= <name-part> <street-address> <zip-part>
    ) BNF-parse
    pq


    [Terminal transcript. 'Norah' is my laptop's name.]

    0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $ gsq pc12bnf.ps
    Error: /undefined in alpha
    Operand stack:
    identifier
    Execution stack:
    %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1
    3 %oparray_pop 1833 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push
    Dictionary stack:
    --dict:737/1123(ro)(G)-- --dict:0/20(G)-- --dict:75/200(L)-- --dict:37/58(L)-- --dict:2/2(L)-- --dict:62/120(L)--
    Current allocation mode is local
    GPL Ghostscript 9.54.0: Unrecoverable error, exit code 1
    0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $ gsq pc12bnf.ps
    Error: /undefined in BNF
    Operand stack:
    BNF --nostringval-- --nostringval--
    Execution stack:
    %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1
    3 %oparray_pop 1833 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval--
    Dictionary stack:
    --dict:737/1123(ro)(G)-- --dict:0/20(G)-- --dict:75/200(L)-- --dict:37/58(L)-- --dict:2/2(L)-- --dict:69/120(L)--
    Current allocation mode is local
    Current file position is 611
    GPL Ghostscript 9.54.0: Unrecoverable error, exit code 1
    0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $ gsq pc12bnf.ps
    Fail
    [[{(<) eq} (not satisfied)] [[(\n) [0 0]] {0 0 ( <postal-address> ::= <name-part> <street-address> <zip-part>\n) string-input}]]
    0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $ gsq pc12bnf.ps
    Fail
    [[( after ) (\n)] [[{(<) eq} (not satisfied)] [[( ) [0 0]] {0 1 ( <postal-address> ::= <name-part> <street-address> <zip-part>\n) string-input}]]]
    0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $ gsq pc12bnf.ps
    Fail
    [[( after ) [(<) (p)]] [[{(>) eq} (not satisfied)] [[(o) [0 4]] {0 5 (stal-address> ::= <name-part> <street-address> <zip-part>\n) string-input}]]]
    0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $ gsq pc12bnf.ps
    Fail
    [[( after ) [(<) (p) (o) (s) (t) (a) (l) (-) (a) (d) (d) (r) (e) (s) (s) (>) (:) (:) (=)]] [[{(-) eq} (not satisfied)] [[(<) [0 23]] {0 24 (name-part> <street-address> <zip-part>\n) string-input}]]]
    0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $ gsq pc12bnf.ps
    Fail
    [[( after ) [(<) (p) (o) (s) (t) (a) (l) (-) (a) (d) (d) (r) (e) (s) (s) (>) (:) (:) (=)]] [[{dup (A) ge exch (Z) le and} (not satisfied)] [[(<) [0 23]] {0 24 (name-part> <street-address> <zip-part>\n) string-input}]]]
    0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $ gsq pc12bnf.ps
    p
    (o)
    0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $ gsq pc12bnf.ps
    stack:
    [(p) (o) (s) (t) (a) (l) (-) (a) (d) (d) (r) (e) (s) (s) (:) (:) (=) (n) (a) (m) (e) (-) (p) (a) (r) (t)]
    0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $ gsq pc12bnf.ps
    stack:
    [(p) (o) (s) (t) (a) (l) (-) (a) (d) (d) (r) (e) (s) (s) (:) (:) (=) (n) (a) (m) (e) (-) (p) (a) (r) (t) (s) (t) (r) (e) (e) (t) (-) (a) (d) (d) (r) (e) (s) (s) (z) (i) (p) (-) (p) (a) (r) (t)]
    0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $ gsq pc12bnf.ps
    Error: /undefined in to-string
    Operand stack:
    --nostringval-- (<) --nostringval-- --nostringval--
    Execution stack:
    %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1
    3 %oparray_pop 1833 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- --nostringval-- --
    nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval--
    Dictionary stack:
    --dict:737/1123(ro)(G)-- --dict:0/20(G)-- --dict:75/200(L)-- --dict:37/58(L)-- --dict:2/2(L)-- --dict:70/120(L)--
    Current allocation mode is local
    Current file position is 779
    GPL Ghostscript 9.54.0: Unrecoverable error, exit code 1
    0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $ gsq pc12bnf.ps
    stack:
    [(postal-address) (:) (:) (=) (name-part) (street-address) (zip-part)] 0;~/pcomb/ps
    Norah@laptop ~/pcomb/ps
    $

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to luser droog on Fri Jun 3 19:11:55 2022
    On Monday, May 30, 2022 at 12:43:22 PM UTC-5, luser droog wrote:
    I just spent the evening and morning writing a parser for BNF syntax.
    Pretty meager stuff, I know. But the debugging effort went rather
    smoothly thanks to the introduction of error message support.
    Even the bare-bones error messages told me which production was
    failing.


    Ok. It's written now. Now I'm not sure if it's "good" or not. It all works.
    But down towards the bottom of the file, there's just so very, very much
    hidden business going on ... I'm just not so sure anymore. Maybe there's
    such a thing as "too much abstraction".

    Some commentary.

    First it loads the parser combinator library. Then it uses the combinators
    to build a parser for EBNF notation. The EBNF-parse function receives
    a string-input or file-input structure and produces a dictionary of parsers.

    Each of these parsers has been built as an indirect function, ie. wrapped
    in { ... exec }. This was done so that they could be composed for the
    right hand side expressions and then filled in later, to link each one to
    its symbol. Either this or some kind of dependency analysis or topographical sort was necessary to deal with all the potential mutual recursion and just general interdependency in an ENBF grammar. As a bonus, they can be
    extracted and wrapped individually with handlers, which is done at the bottom.

    So, the top half of the file describes a grammar which yields a parser for
    EBNF notation. A little further down the EBNF parser is applied to a
    grammar for a USA postal address, which yields parsers for each of the
    symbols in the grammar, notably postal-address itself, the start symbol.
    And then at the very bottom, the postal-address parser is applied to
    a string.

    It's working, but somehow I lack confidence in it. It's just so weird.


    (pc12.ps)run
    /to-string{ dup type /stringtype ne {
    dup 0 exch {length add} forall string exch
    0 exch {3 copy putinterval length add} forall pop } if }
    def

    /choice-symbol (|) char def
    /defining-symbol (=) str def
    /terminating-symbol (;) char def

    /space ( \n) anyof def
    /space? //space many def
    /digit (0)(9) range def
    /alpha (a)(z) range (A)(Z) range alt def
    /name (-_) anyof //alpha alt some def
    /identifier //name
    {to-string cvn} using
    def

    /terminal (") char (") noneof many xthen (") char thenx
    (') char (') noneof many xthen (') char thenx alt
    {to-string {str} curry} using
    def
    /non-terminal //identifier
    {{load executeonly} curry} using
    def
    /Expression {-777 exec} def
    /Symbol //identifier def
    /Factor //terminal
    //non-terminal alt
    ([) char //space? thenx //Expression xthen //space? thenx (]) char thenx
    {{maybe} compose} using alt
    ({) char //space? thenx //Expression xthen //space? thenx (}) char thenx
    {{many} compose} using alt
    def
    /Term //Factor //space? thenx some
    {dup xcheck not { {compose {then} compose} reduce } if} using
    def
    //Expression 0 //Term
    //choice-symbol //space? thenx
    //Term xthen many then
    {dup xcheck not { {compose {alt} compose} reduce } if} using
    put
    /BNF-definition //space? //Symbol xthen //space? thenx
    //defining-symbol thenx //space? thenx
    //Expression then //space? thenx
    //terminating-symbol thenx //space? thenx
    def
    /EBNF //BNF-definition some def

    /EBNF-parse {
    0 0 3 2 roll string-input EBNF
    dup first /OK eq { second first } if %%fixme what if it's not OK?

    1 dict begin
    aload length 2 idiv {def} repeat
    currentdict end
    dup ===

    1 dict begin
    dup
    {pop {-777 exec} ll def} forall
    {exch load exch 0 exch exec put} forall
    currentdict end
    } def

    /EOL (\n) char def
    /upper (A)(Z) range def
    /_ ( ) char maybe def

    (
    postal-address = name-part street-address zip-part ;
    name-part = personal-part _ last-name _ opt-suffix-part EOL
    | personal-part _ name-part ;
    personal-part = initial "." | first-name ;
    street-address = house-num _ street-name opt-apt-num EOL ;
    zip-part = town-name "," _ state-code _ zip-code EOL ;
    opt-suffix-part = "Sr." | "Jr." | roman-numeral | "" ;
    opt-apt-num = [ apt-num ] ;
    apt-num = digit { digit } ;
    town-name = name ;
    state-code = upper upper ;
    zip-code = digit digit digit digit digit ;
    initial = "Mr" | "Mrs" | "Ms" | "M" ;
    roman-numeral = "I" [ "V" ] { "I" } ;
    first-name = name ;
    last-name = name ;
    house-num = digit { digit } ;
    street-name = name ;
    ) EBNF-parse

    ps
    dup length =
    dup {pop =} forall
    dup /postal-address get ==

    begin
    <<
    /street-name {to-string one one}
    /first-name {to-string}
    /last-name {to-string}
    >>
    {{using}curry exch load exch 0 exch update}forall

    0 0
    (Mr. luser droog I\n2357 Streetname\nAnytown, ST 00000\n)
    string-input postal-address
    ps

    quit


    $ gsq pc12ebnf.ps
    << /apt-num {/digit load executeonly /digit load executeonly many then} /house-num {/digit load executeonly /digit load executeonly many then} /street-address {/house-num load executeonly /_ load executeonly then /street-name load executeonly then /opt-
    apt-num load executeonly then /EOL load executeonly then} /state-code {/upper load executeonly /upper load executeonly then} /personal-part {/initial load executeonly (.) str then /first-name load executeonly alt} /postal-address {/name-part load
    executeonly /street-address load executeonly then /zip-part load executeonly then} /street-name {/name load executeonly} /opt-suffix-part {(Sr.) str (Jr.) str alt /roman-numeral load executeonly alt () str alt} /zip-code {/digit load executeonly /digit
    load executeonly then /digit load executeonly then /digit load executeonly then /digit load executeonly then} /opt-apt-num {/apt-num load executeonly maybe} /initial {(Mr) str (Mrs) str alt (Ms) str alt (M) str alt} /zip-part {/town-name load executeonly
    (,) str then /_ load executeonly then /state-code load executeonly then /_ load executeonly then /zip-code load executeonly then /EOL load executeonly then} /roman-numeral {(I) str (V) str maybe then (I) str many then} /first-name {/name load executeonly}
    /last-name {/name load executeonly} /name-part {/personal-part load executeonly /_ load executeonly then /last-name load executeonly then /_ load executeonly then /opt-suffix-part load executeonly then /EOL load executeonly then /personal-part load
    executeonly /_ load executeonly then /name-part load executeonly then alt} /town-name {/name load executeonly} >>
    stack:
    -dict-
    17
    apt-num
    house-num
    street-address
    state-code
    personal-part
    postal-address
    street-name
    opt-suffix-part
    zip-code
    opt-apt-num
    initial
    zip-part
    roman-numeral
    first-name
    last-name
    name-part
    town-name
    {{{-array- exec +is-ok {second forcexs x-xs -array- exec +is-ok {second x-xs 3 1 roll {append} exec success} {x-xs 3 2 roll ( after ) exch cons exch cons cons} ifelse} if} exec +is-ok {second forcexs x-xs -array- exec +is-ok {second x-xs 3 1 roll {append}
    exec success} {x-xs 3 2 roll ( after ) exch cons exch cons cons} ifelse} if} exec}
    stack:
    [/OK [[(M) (r) (.) ( ) (luser) ( ) (droog) ( ) (I) (\n) (2) (3) (5) (7) ( ) [(Streetname)] (\n) (A) (n) (y) (t) (o) (w) (n) (,) ( ) (S) (T) ( ) (0) (0) (0) (0) (0) (\n)] {3 0 () string-input}]]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)