• Command line globber/tokenizer library for C?

    From Ted Nolan @21:1/5 to All on Tue Sep 10 19:01:37 2024
    I have the case where my C program is handed a string which is basically
    a command line.

    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    Note that I'm not asking for getopt(), that comes afterwards, and
    I'm not asking for any variable interpolation, but just that a string
    like, say

    hello -world "This is foo.*" foo.*

    becomes something like

    my_argv[0] "hello"
    my_argv[1] "-world"
    my_argv[2] "This is foo.*"
    my_argv[3] foo.h
    my_argv[4] foo.c
    my_argv[5] foo.txt

    my_argc = 6

    I could live without the globbing if that's a bridge too far.
    --
    columbiaclosings.com
    What's not in Columbia anymore..

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to All on Tue Sep 10 23:05:32 2024
    On 10.09.2024 21:01, Ted Nolan <tednolan> wrote:
    I have the case where my C program is handed a string which is basically
    a command line.

    IIUC you don't want the shell to do the expansion but, sort of,
    re-invent the wheel in your application (a'la DOS). - Okay.


    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    I also suppose that by "tokenizing" you don't mean something like
    strtok (3) - extract tokens from strings
    but a field separation as the Unix shell does using 'IFS'.

    I don't know of a C library but if I'd want to implement a function
    that all POSIX shells do then I'd look into the shell packages...

    For Kornshell (e.g. version 93u+m) I see these files in the package
    src/lib/libast/include/glob.h
    src/lib/libast/misc/glob.c
    that obviously care about the globbing function. (I suspect you'll
    need some more supporting files from the ksh package.)

    HTH

    Janis


    Note that I'm not asking for getopt(), that comes afterwards, and
    I'm not asking for any variable interpolation, but just that a string
    like, say

    hello -world "This is foo.*" foo.*

    becomes something like

    my_argv[0] "hello"
    my_argv[1] "-world"
    my_argv[2] "This is foo.*"
    my_argv[3] foo.h
    my_argv[4] foo.c
    my_argv[5] foo.txt

    my_argc = 6

    I could live without the globbing if that's a bridge too far.


  • From Ted Nolan @21:1/5 to Keith.S.Thompson+u@gmail.com on Tue Sep 10 22:13:06 2024
    In article <87ldzzyyus.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    I have the case where my C program is handed a string which is basically
    a command line.

    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    Note that I'm not asking for getopt(), that comes afterwards, and
    I'm not asking for any variable interpolation, but just that a string
    like, say

    hello -world "This is foo.*" foo.*

    becomes something like

    my_argv[0] "hello"
    my_argv[1] "-world"
    my_argv[2] "This is foo.*"
    my_argv[3] foo.h
    my_argv[4] foo.c
    my_argv[5] foo.txt

    my_argc = 6

    I could live without the globbing if that's a bridge too far.

    What environment(s) does this need to run in?

    I don't know of a standard(ish) function that does this. POSIX defines
    the glob() function, but it only does globbing, not word-splitting.

    If you're trying to emulate the way the shell (which one?) parses
    command lines, and *if* you're on a system that has a shell, you can
    invoke a shell to do the work for you. Here's a quick and dirty
    example:

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    int main(void) {
        const char *line = "hello -world \"This is foo.*\" foo.*";
        char *cmd = malloc(50 + strlen(line));
        sprintf(cmd, "printf '%%s\n' %s", line);
        system(cmd);
    }

    This prints the arguments to stdout, one per line (and doesn't handle
    arguments with embedded newlines very well). You could modify the
    command to write the output to a temporary file and then read that file,
    or you could use popen() if it's available.

    Of course this is portable only to systems that have a Unix-style shell,
    and it can even behave differently depending on how the default shell
    behaves. And invoking a new process is going to make this relatively
    slow, which may or may not matter depending on how many times you need
    to do it.

    There is no completely portable solution, since you need to be able to
    get directory listings to handle wildcards.

    Yeah, that's the kind of thing I was hoping to avoid, and probably more
    than I want to get into, but thanks!


    A quick Google search points to this question:

    https://stackoverflow.com/q/21335041/827263
    "How to split a string using shell-like rules in C++?"

    An answer refers to Boost.Program_options, which is specific to C++.
    Apparently boost::program_options::split_unix() does what you're looking
    for.
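
    The popen() variant suggested above could look roughly like the sketch
    below. shell_split() is a hypothetical helper name, not a library
    function. It assumes a POSIX system with /bin/sh, and because the whole
    string is handed to the shell, any command substitution or ';' in the
    input will be executed, so it only suits trusted input.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not a library call): let /bin/sh split and glob
 * `line`, then read the words back, one per output line.
 * Stores up to `max` malloc'd strings in argv_out and returns how many
 * were stored, or -1 on failure. Words containing newlines are split
 * incorrectly. */
int shell_split(const char *line, char **argv_out, int max)
{
    static const char prefix[] = "printf '%s\\n' ";
    size_t len = sizeof prefix + strlen(line);   /* sizeof counts the NUL */
    char *cmd = malloc(len);
    if (cmd == NULL)
        return -1;
    snprintf(cmd, len, "%s%s", prefix, line);

    FILE *fp = popen(cmd, "r");
    free(cmd);
    if (fp == NULL)
        return -1;

    char *buf = NULL;
    size_t cap = 0;
    ssize_t n;
    int count = 0;
    while (count < max && (n = getline(&buf, &cap, fp)) != -1) {
        if (n > 0 && buf[n - 1] == '\n')
            buf[n - 1] = '\0';               /* strip the line terminator */
        argv_out[count] = strdup(buf);
        if (argv_out[count] == NULL)
            break;
        count++;
    }
    free(buf);
    pclose(fp);
    return count;
}
```

    The caller owns the returned strings and should free() each one. The
    cost of starting a shell process per call, mentioned above, still
    applies.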

    --
    columbiaclosings.com
    What's not in Columbia anymore..

  • From Ted Nolan @21:1/5 to janis_papanagnou+ng@hotmail.com on Tue Sep 10 22:11:29 2024
    In article <vbqcat$35kjh$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 10.09.2024 21:01, Ted Nolan <tednolan> wrote:
    I have the case where my C program is handed a string which is basically
    a command line.

    IIUC you don't want the shell to do the expansion but, sort of,
    re-invent the wheel in your application (a'la DOS). - Okay.


    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    I also suppose that by "tokenizing" you don't mean something like
    strtok (3) - extract tokens from strings
    but a field separation as the Unix shell does using 'IFS'.


    More or less, and something that understands double and single quoting
    so that a token can have white space inside. Backslash handling
    would be nice too so

    'Who\'s a good boy?'

    would work as one token.

    I don't know of a C library but if I'd want to implement a function
    that all POSIX shells do then I'd look into the shell packages...

    For Kornshell (e.g. version 93u+m) I see these files in the package
    src/lib/libast/include/glob.h
    src/lib/libast/misc/glob.c
    that obviously care about the globbing function. (I suspect you'll
    need some more supporting files from the ksh package.)

    HTH

    Janis


    Thanks, fixing up something out of shell components is probably more
    than I want to take on here though.

    Ted
    --
    columbiaclosings.com
    What's not in Columbia anymore..

  • From Ted Nolan @21:1/5 to Kenny McCormack on Wed Sep 11 02:54:32 2024
    In article <vbqtcb$1r0i3$1@news.xmission.com>,
    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    In article <lkbjchFebk9U1@mid.individual.net>,
    Ted Nolan <tednolan> <tednolan> wrote:
    I have the case where my C program is handed a string which is basically
    a command line.

    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    Note that I'm not asking for getopt(), that comes afterwards, and
    I'm not asking for any variable interpolation, but just that a string
    like, say

    Have a look at wordexp(3).


    Very interesting, thanks!

    Something added since last time I paged through section 3...

    --
    columbiaclosings.com
    What's not in Columbia anymore..

  • From Lawrence D'Oliveiro@21:1/5 to All on Tue Sep 10 20:58:36 2024
    On 10 Sep 2024 19:01:37 GMT, Ted Nolan <tednolan> wrote:

    I have the case where my C program is handed a string which is basically
    a command line.

    If that’s what your OS is giving you, your OS is doing it wrong.

  • From Kenny McCormack@21:1/5 to All on Wed Sep 11 01:56:27 2024
    In article <lkbjchFebk9U1@mid.individual.net>,
    Ted Nolan <tednolan> <tednolan> wrote:
    I have the case where my C program is handed a string which is basically
    a command line.

    Is there a common open source C library for tokenizing and globbing
    this into an argc/argv as a shell would do? I've googled, but I get
    too much C++ & other language stuff.

    Note that I'm not asking for getopt(), that comes afterwards, and
    I'm not asking for any variable interpolation, but just that a string
    like, say

    Have a look at wordexp(3).
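
    For reference, a minimal sketch of wordexp(3) applied to a command-line
    string. split_like_shell() and print_words() are made-up names;
    wordexp(), wordfree(), wordexp_t and WRDE_NOCMD come from POSIX
    <wordexp.h>. Note that wordexp() also performs tilde and $VARIABLE
    expansion, which the original post did not ask for, and that it rejects
    unquoted characters such as | & ; < > with WRDE_BADCHAR.

```c
#include <stdio.h>
#include <wordexp.h>

/* Made-up wrapper: split and glob `line` the way wordexp(3) does.
 * WRDE_NOCMD makes $(...) and backquotes an error instead of running
 * them. Returns 0 on success; the caller then calls wordfree(out). */
int split_like_shell(const char *line, wordexp_t *out)
{
    return wordexp(line, out, WRDE_NOCMD);
}

/* Print the result in the my_argv[] layout used in the question. */
void print_words(const wordexp_t *we)
{
    for (size_t i = 0; i < we->we_wordc; i++)
        printf("my_argv[%zu] \"%s\"\n", i, we->we_wordv[i]);
    printf("my_argc = %zu\n", we->we_wordc);
}
```

    Typical use would be to declare a wordexp_t, call
    split_like_shell(line, &we), and on a zero return hand we.we_wordv and
    we.we_wordc to the rest of the program before calling wordfree(&we).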

    --
    Trump has normalized hate.

    The media has normalized Trump.

  • From Bart@21:1/5 to Bonita Montero on Wed Sep 11 13:42:00 2024
    On 11/09/2024 13:17, Bonita Montero wrote:
    #include <Windows.h>
    #include <iostream>
    #include <string_view>

    using namespace std;

    template<typename CharType, typename Consumer>
        requires requires( Consumer consumer, basic_string_view<CharType> sv ) { { consumer( sv ) }; }
    void Tokenize( basic_string_view<CharType> sv, Consumer consumer )
    {
        using sv_t = basic_string_view<CharType>;
        auto it = sv.begin();
        for( ; it != sv.end(); )
        {
            CharType end;
            typename sv_t::iterator tkBegin;
            if( *it == '\"' )
            {
                end = '\"';
                tkBegin = ++it;
            }
            else
            {
                end = ' ';
                tkBegin = it++;
            }
            for( ; it != sv.end() && *it != end; ++it );
            consumer( sv_t( tkBegin, it ) );
            if( it != sv.end() ) [[unlikely]]
            {
                while( ++it != sv.end() && *it == ' ' );
                continue;
            }
        }
    }

    int main()
    {
        LPWSTR pCmdLine = GetCommandLineW();
        size_t i = 1;
        Tokenize( wstring_view( pCmdLine ), [&]( wstring_view sv )
            {
                wcout << i++ << L": \"" << sv << L"\"" << endl;
            } );
    }


    This doesn't do globbing (expanding non-quoted wildcard filenames into
    lists of individual filenames).

    Neither is it clear if the OP is on Windows. (Otherwise I can supply
    something in C for the globbing part. Chopping up the line into
    separate items is fairly trivial.)
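
    As a rough illustration of how little code the splitting step needs,
    here is a sketch in C. split_words() is a hypothetical name, and the
    quoting rules are a simplification chosen for this example rather than
    any particular shell's: blanks separate words, single quotes are
    literal, and a backslash escapes the next character outside quotes and
    inside double quotes. It does no globbing.

```c
#include <stddef.h>

/* Hypothetical helper: split `buf` in place into words.
 * Pointers to the words are stored in argv[0..max-1]; the words live
 * inside `buf`, which is overwritten. Returns the word count, or -1 on
 * an unterminated quote or when there are more than `max` words. */
int split_words(char *buf, char **argv, int max)
{
    char *r = buf;   /* read position */
    char *w = buf;   /* write position, never ahead of r */
    int argc = 0;

    for (;;) {
        while (*r == ' ' || *r == '\t')
            r++;                          /* skip separators */
        if (*r == '\0')
            break;
        if (argc == max)
            return -1;
        argv[argc++] = w;

        char quote = 0;                   /* 0, '\'' or '"' */
        while (*r != '\0') {
            char c = *r++;
            if (quote == 0) {
                if (c == ' ' || c == '\t')
                    break;                /* end of this word */
                if (c == '\'' || c == '"') {
                    quote = c;            /* open a quoted section */
                    continue;
                }
                if (c == '\\' && *r != '\0')
                    c = *r++;             /* escaped character */
            } else if (c == quote) {
                quote = 0;                /* close the quoted section */
                continue;
            } else if (quote == '"' && c == '\\' && *r != '\0') {
                c = *r++;
            }
            *w++ = c;
        }
        if (quote != 0)
            return -1;                    /* unterminated quote */
        *w++ = '\0';
    }
    return argc;
}
```

    The sketch does not record which words were quoted. A caller that wants
    to glob only the unquoted ones would need to track that alongside
    argv[].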

  • From Ted Nolan @21:1/5 to Bonita.Montero@gmail.com on Wed Sep 11 12:44:19 2024
    In article <vbs2da$3jobe$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 14:22 schrieb Ted Nolan <tednolan>:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?


    No, not an option, thanks.


    I could write a C-bridge for you.


    No, thank you.
    --
    columbiaclosings.com
    What's not in Columbia anymore..

  • From Ted Nolan @21:1/5 to Bonita.Montero@gmail.com on Wed Sep 11 12:22:16 2024
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?


    No, not an option, thanks.

    --
    columbiaclosings.com
    What's not in Columbia anymore..

  • From Kenny McCormack@21:1/5 to Bonita.Montero@gmail.com on Wed Sep 11 14:59:48 2024
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?


    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/IceCream

  • From Ted Nolan @21:1/5 to Bonita.Montero@gmail.com on Wed Sep 11 18:49:19 2024
    In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.


    Thanks, I appreciate that, but it does have to be C.
    --
    columbiaclosings.com
    What's not in Columbia anymore..

  • From Kenny McCormack@21:1/5 to Bonita.Montero@gmail.com on Wed Sep 11 18:17:04 2024
    In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.

    You know the rules around here, just as well as I do.

    --
    The coronavirus is the first thing, in his 74 pathetic years of existence,
    that the orange menace has come into contact with, that he couldn't browbeat, bully, bullshit, bribe, sue, legally harrass, get Daddy to fix, get his siblings to bail him out of, or, if all else fails, simply wish it away.

  • From Bart@21:1/5 to Bonita Montero on Wed Sep 11 21:19:58 2024
    On 11/09/2024 19:14, Bonita Montero wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero  <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.


    C++ is a simpler language? You're having a laugh!

    I made a version of your code that was about 50 lines, so a higher line
    count, but was some 10% smaller in character count.

    It doesn't need 'templates', or 'basic-string-view', or 'Consumer',
    whatever that is, or iterators. This is a trivial exercise as I said.

    However, if working on Windows, there may be no need: there is already a CommandLineToArgvW function.

  • From Ted Nolan @21:1/5 to Keith.S.Thompson+u@gmail.com on Thu Sep 12 03:06:15 2024
    In article <87cyl9zx14.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.

    Thanks, I appreciate that, but it does have to be C.

    We could help you more effectively if we understood your requirements.

    Why exactly does it have to be C?

    What system or systems do you need to support? (I asked this before and
    you didn't answer.)

    If you only care about Windows, for example, that's going to affect what
    solutions we can offer; likewise if you only care about POSIX-based
    systems, or only about Linux-based systems.

    It might also be useful to know more about the context. If this is for
    some specific application, what is that application intended to do, and
    why does it need to do tokenization and globbing?


    This would be for work, so I am limited in what I can say about it, but
    it has to be in C because it would be a C callout from a GT.M mumps
    process. GT.M stores the command line tail (everything it doesn't need
    to get a program running) in the special variable $ZCMDLINE which can
    be passed to a callout. I would like to parse that string as the
    shell does a command line. Basically, if it isn't a C library that
    is commonly available through Linux package managers I probably can't
    use it. In the end this is a "nice to have" and I have a q&d approach
    that I will probably use.
    --
    columbiaclosings.com
    What's not in Columbia anymore..

  • From Ted Nolan @21:1/5 to Keith.S.Thompson+u@gmail.com on Thu Sep 12 03:56:09 2024
    In article <87r09py23h.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    In article <87cyl9zx14.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.

    Thanks, I appreciate that, but it does have to be C.

    We could help you more effectively if we understood your requirements.

    Why exactly does it have to be C?

    What system or systems do you need to support? (I asked this before and
    you didn't answer.)

    If you only care about Windows, for example, that's going to affect what
    solutions we can offer; likewise if you only care about POSIX-based
    systems, or only about Linux-based systems.

    It might also be useful to know more about the context. If this is for
    some specific application, what is that application intended to do, and
    why does it need to do tokenization and globbing?

    This would be for work, so I am limited in what I can say about it, but
    it has to be in C because it would be a C callout from a GT.M mumps
    process. GT.M stores the command line tail (everything it doesn't need
    to get a program running) in the special variable $ZCMDLINE which can
    be passed to a callout. I would like to parse that string as the
    shell does a command line. Basically, if it isn't a C library that
    is commonly available through Linux package managers I probably can't
    use it. In the end this is a "nice to have" and I have a q&d approach
    that I will probably use.

    Since you mentioned Linux package managers, I presume this only needs to
    work on Linux-based systems, which means you can use POSIX-specific
    functions. That could have been useful to know earlier.

    And you might consider posting to comp.unix.programmer for more
    system-specific solutions.

    Earlier I suggested using system() to pass the string to the shell.
    That wouldn't work on Windows, but it should be ok for your
    requirements. There are good reasons not to want to do that, but "there
    might not be a POSIX shell available" apparently isn't one of them.

    I'd also suggest nailing down your exact requirements; "as the
    shell does" is inexact, since different shells behave differently.

    Suggested reading:
    https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */

    Thank you. system() would not work as I don't want to execute
    anything, just parse into an argv-like array.

    I appreciate the responses, but it looks like I will be staying with
    my q&d approach for now.
    --
    columbiaclosings.com
    What's not in Columbia anymore..

  • From Lawrence D'Oliveiro@21:1/5 to All on Thu Sep 12 04:14:05 2024
    On 12 Sep 2024 03:06:15 GMT, Ted Nolan <tednolan> wrote:

    GT.M stores the command line tail (everything it doesn't need
    to get a program running) in the special variable $ZCMDLINE which can be passed to a callout.

    What, all the arguments smooshed together into a single string?

    That’s a dumb way to do it.

  • From Ben Bacarisse@21:1/5 to ted@loft.tnolan.com on Thu Sep 12 10:43:52 2024
    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:

    In article <87cyl9zx14.fsf@nosuchdomain.example.com>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ted@loft.tnolan.com (Ted Nolan <tednolan>) writes:
    In article <vbsmlb$3o6n2$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 11.09.2024 um 16:59 schrieb Kenny McCormack:
    In article <vbs1om$3jkch$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Do you think it would make sense to switch the language ?

    Do you think it would make sense to pay attention to the "Newsgroups" line
    in your header before clicking "Send"?

    I just wanted to suggest a simpler language.
    Compare that with a manual implementation of the same in C.

    Thanks, I appreciate that, but it does have to be C.

    We could help you more effectively if we understood your requirements.

    Why exactly does it have to be C?

    What system or systems do you need to support? (I asked this before and
    you didn't answer.)

    If you only care about Windows, for example, that's going to affect what
    solutions we can offer; likewise if you only care about POSIX-based
    systems, or only about Linux-based systems.

    It might also be useful to know more about the context. If this is for
    some specific application, what is that application intended to do, and
    why does it need to do tokenization and globbing?


    This would be for work, so I am limited in what I can say about it, but
    it has to be in C because it would be a C callout from a GT.M mumps
    process. GT.M stores the command line tail (everything it doesn't need
    to get a program running) in the special variable $ZCMDLINE which can
    be passed to a callout. I would like to parse that string as the
    shell does a command line. Basically, if it isn't a C library that
    is commonly available through Linux package managers I probably can't
    use it. In the end this is a "nice to have" and I have a q&d approach
    that I will probably use.

    If it were down to me I'd do the word splitting "by hand" and use POSIX
    glob(3) to do the file expansion.

    For the word splitting, the key would be to know where these strings
    come from and what is really needed. That would enable you to pick a
    syntax that makes sense for your particular use-case. For example, if
    the strings are typed by people, I wouldn't use the typical shell
    quoting. I would not want anyone (other than technical Unix users) to
    have to type

    'He said "you can'"'""t"

    You might get away with a very simple word splitting algorithm.
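
    The glob(3) half of that approach might be sketched as follows.
    expand_words() is a hypothetical helper; glob(), globfree(),
    GLOB_NOCHECK and GLOB_APPEND come from POSIX <glob.h>. It assumes the
    caller has already done the word splitting and passes in only the
    unquoted words, since quoted words should be kept exactly as typed.

```c
#include <glob.h>
#include <string.h>

/* Hypothetical helper: expand each word as a filename pattern and
 * collect all results in *out (out->gl_pathv / out->gl_pathc).
 * GLOB_NOCHECK keeps a word unchanged when nothing matches, as most
 * shells do. Returns 0 on success or the failing glob() return value;
 * the caller calls globfree(out) in either case. */
int expand_words(const char *const words[], int nwords, glob_t *out)
{
    int flags = GLOB_NOCHECK;

    memset(out, 0, sizeof *out);
    for (int i = 0; i < nwords; i++) {
        int rc = glob(words[i], flags, NULL, out);
        if (rc != 0)
            return rc;
        flags |= GLOB_APPEND;   /* later calls add to the same list */
    }
    return 0;
}
```

    Matches for each pattern come back sorted, and the patterns keep their
    original order, so out->gl_pathv can be used much like an argv array.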

    --
    Ben.

  • From Bart@21:1/5 to Bonita Montero on Thu Sep 12 12:29:26 2024
    On 12/09/2024 03:22, Bonita Montero wrote:
    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as in C.

    In this case, it actually needed somewhat more code, even if the line
    count was half.

    But your solutions are always incomprehensible because they strive for
    the most advanced features possible.

  • From Kenny McCormack@21:1/5 to bc@freeuk.com on Thu Sep 12 12:13:55 2024
    In article <vbujak$733i$3@dont-email.me>, Bart <bc@freeuk.com> wrote:
    On 12/09/2024 03:22, Bonita Montero wrote:
    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as in C.

    In this case, it actually needed somewhat more code, even if the line
    count was half.

    But your solutions are always incomprehensible because they strive for
    the most advanced features possible.

    And, of course, totally off-topic.

    Maybe I should start posting Fortran "solutions".

    Or maybe Haskell?

    Or Intercal?

    --
    Mike Huckabee has yet to consciously uncouple from Josh Duggar.

  • From Janis Papanagnou@21:1/5 to Kenny McCormack on Thu Sep 12 14:24:37 2024
    On 12.09.2024 14:13, Kenny McCormack wrote:

    Maybe I should start posting Fortran "solutions".

    Or maybe Haskell?

    Or Intercal?

    The latter might certainly be enlightening. I had always problems
    to write such code. And seeing functional code would help. - But
    it's off-topic as you say. Less off-topic are (IMO) C++ solutions
    in contrast to C; C++ has a C base and C appears to me to advance
    "with an eye on" C++.

    Janis

  • From Janis Papanagnou@21:1/5 to Bart on Thu Sep 12 14:20:01 2024
    On 12.09.2024 13:29, Bart wrote:
    On 12/09/2024 03:22, Bonita Montero wrote:
    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as in C.

    In this case, it actually needed somewhat more code, even if the line
    count was half.

    But your solutions are always incomprehensible because they strive for
    the most advanced features possible.

    I don't know of the other poster's solutions. But a quick browse seems
    to show nothing incomprehensible or anything that should be difficult
    to understand. (YMMV; especially if you're not familiar with C++ then
    I'm sure the code may look like noise to you.)

    In the given context of C and C++ I've always perceived the features
    of C++ to add to comprehensibility of source code where the respective
    C code required writing clumsy code and needed (unnecessary) syntactic
    ballast to implement similar functions and program constructs.

    Your undifferentiated complaint sounds more like someone not willing
    to understand the other concepts or have a reluctance or laziness to
    make yourself familiar with them.

    Janis

  • From Kenny McCormack@21:1/5 to All on Thu Sep 12 13:22:48 2024
    In article <lkf72pFd61U1@mid.individual.net>,
    Ted Nolan <tednolan> <tednolan> wrote:
    ...
    Thank you. system() would not work as I don't want to execute
    anything, just parse into an argv-like array.

    I appreciate the responses, but it looks like I will be staying with
    my q&d approach for now.

    This is a "solved problem". Or, to put it another way, if wordexp(3) is
    not the solution, then there is no general solution (and that means, yes, you'll have to "roll your own", as many here have suggested you do).

    columbiaclosings.com
    What's not in Columbia anymore..

    Which Columbia are we talking about here? And why?

    --
    Mike Huckabee has yet to consciously uncouple from Josh Duggar.

  • From Ted Nolan @21:1/5 to Kenny McCormack on Thu Sep 12 13:50:38 2024
    In article <vbupv8$1t2d8$2@news.xmission.com>,
    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    In article <lkf72pFd61U1@mid.individual.net>,
    Ted Nolan <tednolan> <tednolan> wrote:
    ...
    Thank you. system() would not work as I don't want to execute
    anything, just parse into an argv-like array.

    I appreciate the responses, but it looks like I will be staying with
    my q&d approach for now.

    This is a "solved problem". Or, to put it another way, if wordexp(3) is
    not the solution, then there is no general solution (and that means, yes,
    you'll have to "roll your own", as many here have suggested you do).

    columbiaclosings.com
    What's not in Columbia anymore..

    Which Columbia are we talking about here? And why?


    SC. It keeps me busy.
    --
    columbiaclosings.com
    What's not in Columbia anymore..

  • From Bart@21:1/5 to Janis Papanagnou on Thu Sep 12 14:44:03 2024
    On 12/09/2024 13:20, Janis Papanagnou wrote:
    On 12.09.2024 13:29, Bart wrote:
    On 12/09/2024 03:22, Bonita Montero wrote:
    Am 11.09.2024 um 22:19 schrieb Bart:

    C++ is a simpler language? You're having a laugh!

    The solutions are simpler because you've a fifth of the code as in C.

    In this case, it actually needed somewhat more code, even if the line
    count was half.

    But your solutions are always incomprehensible because they strive for
    the most advanced features possible.

    I don't know of the other poster's solutions. But a quick browse seems
    to show nothing incomprehensible or anything that should be difficult
    to understand. (YMMV; especially if you're not familiar with C++ then
    I'm sure the code may look like noise to you.)

    In the given context of C and C++ I've always perceived the features
    of C++ to add to comprehensibility of source code where the respective
    C code required writing clumsy code and needed (unnecessary) syntactic ballast to implement similar functions and program constructs.

    Your undifferentiated complaint sounds more like someone not willing
    to understand the other concepts or have a reluctance or laziness to
    make yourself familiar with them.

    I'm saying it's not necessary to use such advanced features to do some
    trivial parsing.

    I've given a C solution below. (To test outside of Windows, remove
    windows.h and set cmdline to any string containing a test input or use a
    local function to get the program's command line as one string.)

    It uses no special features. Anybody can understand such code. Anybody
    can port it to another language far more easily than the C++. (Actually
    I wrote it first in my language then ported it to C. I only needed to do
    1- to 0-based conversion.)

    There are two things missing compared with the C++ (other than it uses
    UTF8 strings):

    * Individual parameters are capped in length (to 1023 chars here). This
    can be solved by determining only the span of the item then working from
    that.

    * Handling an unknown number of parameters is not automatic:

    For the latter, the example uses a fixed array size. For a dynamic array
    size, call 'strtoargs' with a count of 0 to first determine the number
    of args, then allocate an array and call again to populate it.


    -------------------------------------------
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    /* Split cmd into items and store up to 'count' strdup'd copies in dest.
       Returns the total number of items found, which may exceed count. */
    int strtoargs(char* cmd, char** dest, int count) {
        enum {ilen=1024};
        char item[ilen];
        int n=0, length, c;
        char *p=cmd, *q, *end=&item[ilen-1];

        while (c=*p++) {
            if (c==' ' || c=='\t')
                continue;                       /* skip separators */
            else if (c=='"') {                  /* quoted item */
                length=0;
                q=item;

                while (c=*p++, c!='"') {
                    if (c==0) {                 /* unterminated quote */
                        --p;
                        break;
                    } else {
                        if (q<end) *q++ = c;    /* excess chars are dropped */
                    }
                }
                goto store;
            } else {                            /* unquoted item */
                length=0;
                q=item;
                --p;

                while (c=*p++, c!=' ' && c!='\t') {
                    if (c==0) {
                        --p;
                        break;
                    } else {
                        if (q<end) *q++ = c;
                    }
                }

        store:  *q=0;
                ++n;
                if (n<=count) dest[n-1]=strdup(item);
            }
        }
        return n;
    }

    int main(void) {
        char* cmdline;
        enum {cap=30};
        char* args[cap];
        int n;

        cmdline = GetCommandLineA();

        n=strtoargs(cmdline, args, cap);

        for (int i=0; i<n; ++i) {
            if (i<cap)
                printf("%d %s\n", i, args[i]);
            else
                printf("%d <overflow>\n", i);
        }
    }
    -------------------------------------------
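
    The two limitations listed before the code can be lifted roughly as
    follows. This is only a sketch under the same quoting rules as the code
    above: split_args and split_args_alloc are hypothetical names, each item
    is copied from its span so there is no length cap, and the array is sized
    by the count-then-fill approach described in the post.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical span-based variant: finds each item's start and length and
   copies exactly that many bytes. Stores at most `count` items in dest and
   returns the total number found. dest may be NULL when count is 0.
   A stored entry is NULL if its allocation failed. */
int split_args(const char *cmd, char **dest, int count) {
    int n = 0;
    const char *p = cmd;

    while (*p) {
        const char *start;
        size_t len;

        if (*p == ' ' || *p == '\t') { ++p; continue; }

        if (*p == '"') {                    /* quoted item: up to next quote */
            start = ++p;
            while (*p && *p != '"') ++p;
            len = (size_t)(p - start);
            if (*p == '"') ++p;             /* step over the closing quote */
        } else {                            /* bare item: up to next blank */
            start = p;
            while (*p && *p != ' ' && *p != '\t') ++p;
            len = (size_t)(p - start);
        }

        if (n < count) {
            char *s = malloc(len + 1);
            if (s) { memcpy(s, start, len); s[len] = '\0'; }
            dest[n] = s;
        }
        ++n;
    }
    return n;
}

/* First pass counts, second pass fills a NULL-terminated array.
   Returns NULL if the array itself cannot be allocated. */
char **split_args_alloc(const char *cmd, int *argc_out) {
    int n = split_args(cmd, NULL, 0);
    char **argv = calloc((size_t)n + 1, sizeof *argv);
    if (!argv) return NULL;
    split_args(cmd, argv, n);
    *argc_out = n;
    return argv;
}
```

    The caller owns the result and frees each string and then the array.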

  • From Bart@21:1/5 to Bart on Thu Sep 12 15:16:02 2024
    On 12/09/2024 14:44, Bart wrote:
    <snip quoted discussion>

    BTW here are the source sizes for the tokeniser function. (For C++ I've
    included the 'using' statement.)

             Spaces   Hard tabs

      C++    829      682   characters
      C      959      634
      M      785      548   (My original of the C version)

    So my C version is actually smaller than the C++ when using hard tabs.

    In any case, the C++ is not significantly smaller than the C, and
    certainly not a fifth the size.

    For proper higher level solutions in different languages, below is one
    of mine. That function is 107 bytes with hard tabs.

    (It's not possible to just split the string on white space because of
    quoted items with embedded spaces.)

    -------------------------------
    func strtoargs(cmdline)=
        args::=()
        sreadln(cmdline)

        while k:=sread("n") do
            args &:= k
        od
        args
    end

    println strtoargs(getcommandlinea())
    -------------------------------

  • From Janis Papanagnou@21:1/5 to Bonita Montero on Thu Sep 12 17:30:26 2024
    On 12.09.2024 16:04, Bonita Montero wrote:
    Am 12.09.2024 um 14:20 schrieb Janis Papanagnou:

    I don't know of the other poster's solutions. But a quick browse seems
    to show nothing incomprehensible or anything that should be difficult
    to understand. (YMMV; especially if you're not familiar with C++ then
    I'm sure the code may look like noise to you.)

    C++ shares a property with C: The language facilities are mostly so
    simple that it's easy to roughly imagine the resulting code. So C++
    can be written with the same mindset.

    Not only "roughly imagine"; I think the imperative languages have
    so many common basic concepts that you can have a quite good idea,
    especially if you know more than just two or three such languages.

    But there are features, even basic ones, that do not exist in "C",
    which evidently confuses folks who are focused on some specific
    restricted or poorer language(s).

    Yes, C++ can be written with a "C" mindset. But this is nothing
    I'd suggest. Better make yourself familiar with the new concepts
    (OO, genericity, or even simple things like references). - IMO.

    Janis

  • From Michael S@21:1/5 to Bart on Thu Sep 12 18:16:25 2024
    On Thu, 12 Sep 2024 14:44:03 +0100
    Bart <bc@freeuk.com> wrote:

    <snip quoted discussion and code>


    Apart from the unnecessary ilen limit, the unnecessary goto into a block
    (I have nothing against forward gotos out of blocks, but gotos into
    blocks make me nervous) and the variable 'length' that serves no purpose,
    your code simply does not fulfill the requirements of the OP.
    I can immediately see two gotchas: no handling of escaped double
    quotation marks \" and no handling of single quotation marks. Quite
    possibly there are additional omissions.
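
    To make the two gotchas concrete, here is a sketch of a word extractor
    that handles them. It assumes a simplified subset of POSIX shell quoting:
    single quotes are literal, double quotes honour only \" and \\, and a
    backslash outside quotes escapes the next character. next_word is a
    hypothetical name, and nothing here does globbing or variable expansion.

```c
#include <stdlib.h>
#include <string.h>

/* Extract the next word starting at *pp and advance *pp past it.
   Returns a malloc'd string, or NULL at end of input (*err == 0) or on an
   unterminated quote or allocation failure (*err == 1). */
char *next_word(const char **pp, int *err) {
    const char *p = *pp;
    char *out, *q;
    char quote = 0;                 /* 0, '\'' or '"': the open quote */

    *err = 0;
    while (*p == ' ' || *p == '\t') ++p;
    if (*p == '\0') { *pp = p; return NULL; }

    /* A word can never be longer than the remaining input. */
    out = malloc(strlen(p) + 1);
    if (!out) { *err = 1; return NULL; }
    q = out;

    for (; *p; ++p) {
        char c = *p;
        if (quote == '\'') {
            if (c == '\'') quote = 0;           /* everything else literal */
            else *q++ = c;
        } else if (quote == '"') {
            if (c == '"') quote = 0;
            else if (c == '\\' && (p[1] == '"' || p[1] == '\\'))
                *q++ = *++p;                    /* \" or \\ inside "..." */
            else *q++ = c;
        } else if (c == '\'' || c == '"') {
            quote = c;                          /* quotes may start mid-word */
        } else if (c == '\\' && p[1] != '\0') {
            *q++ = *++p;                        /* escaped char, e.g. "\ " */
        } else if (c == ' ' || c == '\t') {
            break;                              /* unquoted blank ends word */
        } else {
            *q++ = c;
        }
    }
    *pp = p;
    if (quote) { free(out); *err = 1; return NULL; }
    *q = '\0';
    return out;
}
```

    Calling it in a loop until it returns NULL yields the words in order;
    adjacent quoted pieces such as 'it''s' join into one word, as in a shell.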

  • From Scott Lurndal@21:1/5 to Michael S on Thu Sep 12 15:37:33 2024
    Michael S <already5chosen@yahoo.com> writes:
    On Thu, 12 Sep 2024 14:44:03 +0100
    Bart <bc@freeuk.com> wrote:

    <snip quoted code>


    Apart from the unnecessary ilen limit, the unnecessary goto into a block
    (I have nothing against forward gotos out of blocks, but gotos into
    blocks make me nervous) and the variable 'length' that serves no purpose,
    your code simply does not fulfill the requirements of the OP.
    I can immediately see two gotchas: no handling of escaped double
    quotation marks \" and no handling of single quotation marks. Quite
    possibly there are additional omissions.

    /*
     * For most commands, we'll split the rest of the line into
     * individual arguments, separated by whitespace. However,
     * some commands may wish to process the entire remainder of
     * the line as a single argument. Those commands will set the
     * ce_splitargs field to zero in the command table.
     */
    if (cep->ce_splitargs) {
        argcount = 0;
        cp = line;
        while (*cp != '\0') {
            if (argcount == MAX_ARGCOUNT) {
                fprintf(stdout,
                        "Error: More than %d arguments unsupported\n",
                        MAX_ARGCOUNT);
                return 1;
            }
            while (*cp != '\0' && isspace(*cp)) cp++;
            if (*cp == '\0') continue;
            if (*cp == '"') {
                in_quote = true;
                cp++;
            }
            arglist[argcount++] = cp;
            if (in_quote) {
                while (*cp != '\0' && *cp != '"') cp++;
                in_quote = false;
            } else {
                while (*cp != '\0' && !isspace(*cp)) cp++;
            }
            if (*cp == '\0') continue;
            *cp++ = '\0';
        }
    } else {
        arglist[0] = command;
        arglist[1] = line;
        argcount = 2;
    }

  • From Janis Papanagnou@21:1/5 to Bart on Thu Sep 12 17:46:44 2024
    On 12.09.2024 16:16, Bart wrote:
             Spaces   Hard tabs

      C++    829      682   characters

    You are counting spaces, tabs and characters to characterize programs'
    quality or legibility or what? - Abandon all hope ye who enter here.

    Janis

  • From Michael S@21:1/5 to All on Thu Sep 12 18:49:11 2024
    On Thu, 12 Sep 2024 15:37:33 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    <snip code from unidentified source>

    Huh?

  • From Janis Papanagnou@21:1/5 to Bonita Montero on Thu Sep 12 17:56:58 2024
    On 12.09.2024 17:47, Bonita Montero wrote:
    Am 12.09.2024 um 17:30 schrieb Janis Papanagnou:

    Not only "roughly imagine"; I think the imperative languages have
    so many common basic concepts that you can have a quite good idea,
    especially if you know more than just two or three such languages.

    Then tell me which language a) has this kind of mostly minimized
    language-facilities and b) you can layout data structures 1:1
    like they fit into memory (platform-dependent).

    Don't know what you're trying to say here or what it is you aim
    at. If you think it's worth discussing please elaborate.


    Yes, C++ can be written with a "C" mindset. But this is nothing
    I'd suggest. Better make yourself familiar with the new concepts
    (OO, genericity, or even simple things like references). - IMO.

    I'm using mostly all new features as you can see from my code.
    But the mindset is still the same.

    I don't know you or your background or much of your programming.
    So please understand that I'm not inclined to make any comments
    about you or your code; this would be all speculative and not
    contribute anything to the discussion. If you had the impression
    that what I said was referring to you personally you were wrong.

    Janis

  • From Bart@21:1/5 to Janis Papanagnou on Thu Sep 12 17:15:35 2024
    On 12/09/2024 16:46, Janis Papanagnou wrote:
    On 12.09.2024 16:16, Bart wrote:
             Spaces   Hard tabs

      C++    829      682   characters

    You are counting spaces, tabs and characters to characterize programs'
    quality or legibility or what? - Abandon all hope ye who enter here.

    I'm counting the number of characters needed to express the function,
    since one of BM's claims is that the C++ example was smaller than the C.

    The difference between the two columns is whether indentation uses hard
    tabs or spaces. The C version is more deeply indented so that makes a
    difference. (Also the width of the tabs, but everything was measured
    with tabs set to 4 characters.)

  • From Bart@21:1/5 to Michael S on Thu Sep 12 17:28:27 2024
    On 12/09/2024 16:16, Michael S wrote:
    On Thu, 12 Sep 2024 14:44:03 +0100
    Bart <bc@freeuk.com> wrote:

    Apart from the unnecessary ilen limit, the unnecessary goto into a block
    (I have nothing against forward gotos out of blocks, but gotos into
    blocks make me nervous) and the variable 'length' that serves no purpose,
    your code simply does not fulfill the requirements of the OP.
    I can immediately see two gotchas: no handling of escaped double
    quotation marks \" and no handling of single quotation marks. Quite
    possibly there are additional omissions.

    BM's C++ version doesn't handle embedded quotes or single quotes either.
    Nor does it expand wildcards into sequences of filename arguments.

    But you're right about 'length' which in the end was not used. It makes
    the C version even smaller without it.

    I wasn't trying to match the OP's requirements, as I don't know what
    they are.

    If this has to exactly match how the OS parses the command line into
    separate parameters, then that's likely to be a significantly more
    complex program, especially if it is to run on Linux.

    There's probably no point in trying to create such a program; you'd need
    to find a way of utilising the OS to do the work.

    Note that I wasn't posting to solve the OP's problem, but as a
    counter-example to that C++ code which literally hurt my eyes to look at.

  • From Kenny McCormack@21:1/5 to Bonita.Montero@gmail.com on Thu Sep 12 17:39:46 2024
    In article <vbv6r1$bhc9$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 12.09.2024 um 18:28 schrieb Bart:

    BM's C++ version doesn't handle embedded quotes or single quotes either.
    Nor does it expand wildcards into sequences of filename arguments.

    Ok, that must be impossible with C++.
    I just wanted to show how to do it basically and what are the
    advantages: no intermediate data structure through functional
    programming and debug iterators.

    All of which would have been fine - and they'd probably all be raving
    about what a clever boy you are - if you'd only posted it to an
    appropriate newsgroup.

    --
    Many (most?) Trump voters voted for him because they thought if they
    supported Trump enough, they'd get to *be* Trump.

    Similarly, Trump believes that if *he* praises Putin enough, he'll get to *be* Putin.

  • From Michael S@21:1/5 to Bonita Montero on Thu Sep 12 22:38:28 2024
    On Thu, 12 Sep 2024 19:02:35 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Am 12.09.2024 um 18:28 schrieb Bart:

    BM's C++ version doesn't handle embedded quotes or single quotes
    either.
    Nor does it expand wildcards into sequences of filename arguments.

    Ok, that must be impossible with C++.
    I just wanted to show how to do it basically and what are the
    advantages: no intermediate data structure through functional
    programming and debug iterators.



    A callback is as easy in C as in C++.
    Debug iterators are not needed in such a simple program. At least, I
    don't need them.
    Here is an equivalent of your parser written in C. It does not look 5
    times longer.

    Attention! That is an equivalent of Bonita's code, no more and
    hopefully no less. The routine does not fulfill requirements of OP!

    #include <stddef.h>

    void parse(const char* src,
               void (*OnToken)(const char* beg, size_t len, void* context),
               void* context) {
        char c0 = ' ', c1 = '\t';
        const char* beg = 0;
        for (;;src++) {
            char c = *src;
            if (c == c0 || c == c1 || c == 0) {
                if (beg) {
                    OnToken(beg, src-beg, context);
                    c0 = ' ', c1 = '\t';
                    beg = 0;
                }
                if (c == 0)
                    break;
            } else if (!beg) {
                beg = src;
                if (c == '"') {
                    c0 = c1 = c;
                    ++beg;
                }
            }
        }
    }

  • From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Thu Sep 12 22:09:52 2024
    On Thu, 12 Sep 2024 17:08:24 +0200, Bonita Montero wrote:

    I tried to experiment with that with /proc/<pid>/cmdline. The first
    problem was that the arguments aren't space delimited, but broken up
    with zeroes.

    That’s not a “problem”: it actually simplifies the parsing, because you can unambiguously extract the original command arguments without having to apply any complicated parsing/quoting/unquoting rules.
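
    A sketch of what that looks like in C, assuming Linux (the /proc
    filesystem is not portable). split_nul and read_own_cmdline are
    hypothetical helper names; the splitting itself is just a walk over
    NUL-terminated strings, with no quoting rules to apply.

```c
#include <stdio.h>
#include <string.h>

/* Collect pointers to each NUL-terminated argument in buf[0..len).
   Stores at most `max` pointers in out and returns the number found.
   Bytes after the last NUL (an unterminated tail) are ignored. */
size_t split_nul(const char *buf, size_t len, const char **out, size_t max) {
    size_t n = 0;
    const char *p = buf, *end = buf + len;

    while (p < end) {
        const char *nul = memchr(p, '\0', (size_t)(end - p));
        if (!nul) break;
        if (n < max) out[n] = p;
        ++n;
        p = nul + 1;
    }
    return n;
}

/* Linux-specific: read this process's own argument block into buf.
   Returns the number of bytes read, or 0 if the file cannot be opened. */
size_t read_own_cmdline(char *buf, size_t cap) {
    FILE *f = fopen("/proc/self/cmdline", "rb");
    size_t got;
    if (!f) return 0;
    got = fread(buf, 1, cap, f);
    fclose(f);
    return got;
}
```

    The pointers in out refer into buf, so buf has to outlive them.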

  • From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Fri Sep 13 02:43:56 2024
    On Thu, 12 Sep 2024 17:47:23 +0200, Bonita Montero wrote:

    Then tell me which language a) has this kind of mostly minimized
    language-facilities and b) you can layout data structures 1:1 like they
    fit into memory (platform-dependent).

    Python.

  • From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Fri Sep 13 06:49:20 2024
    On Fri, 13 Sep 2024 07:27:47 +0200, Bonita Montero wrote:

    Am 13.09.2024 um 04:43 schrieb Lawrence D'Oliveiro:

    On Thu, 12 Sep 2024 17:47:23 +0200, Bonita Montero wrote:

    Then tell me which language a) has this kind of mostly minimized
    language-facilities and b) you can layout data structures 1:1 like
    they fit into memory (platform-dependent).

    Python.

    Have a look at <https://gitlab.com/ldo/inotipy_examples/-/blob/master/fanotify_7_example?ref_type=heads>,
    and compare the C original from <https://manpages.debian.org/7/fanotify.7.en.html>. The Python code is
    half the size and can use high-level async calls.

  • From Michael S@21:1/5 to Bonita Montero on Fri Sep 13 11:38:15 2024
    On Fri, 13 Sep 2024 07:28:34 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Am 12.09.2024 um 21:38 schrieb Michael S:

    Callback is as easy in C as in C++.

    Absolutely not because callbacks can't have state in C.



    So what is 'context' parameter in my code?
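
    To illustrate the point, here is a sketch of a callback that keeps state
    through the context pointer. parse() is repeated from the earlier post so
    that the example compiles on its own; struct token_stats, count_token and
    measure are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>

/* parse() as posted earlier in the thread: calls OnToken once per word. */
void parse(const char* src,
           void (*OnToken)(const char* beg, size_t len, void* context),
           void* context) {
    char c0 = ' ', c1 = '\t';
    const char* beg = 0;
    for (;;src++) {
        char c = *src;
        if (c == c0 || c == c1 || c == 0) {
            if (beg) {
                OnToken(beg, src-beg, context);
                c0 = ' ', c1 = '\t';
                beg = 0;
            }
            if (c == 0)
                break;
        } else if (!beg) {
            beg = src;
            if (c == '"') {
                c0 = c1 = c;
                ++beg;
            }
        }
    }
}

/* Hypothetical per-call state, owned by the caller of parse(). */
struct token_stats {
    size_t count;    /* number of tokens seen so far */
    size_t longest;  /* length of the longest token  */
};

/* The callback recovers its state from the opaque context pointer. */
static void count_token(const char* beg, size_t len, void* context) {
    struct token_stats* st = context;
    (void)beg;
    st->count++;
    if (len > st->longest)
        st->longest = len;
}

struct token_stats measure(const char* line) {
    struct token_stats st = {0, 0};
    parse(line, count_token, &st);
    return st;
}
```

    Each call to measure() has its own token_stats object, so the state is
    neither global nor shared between calls.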

  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Fri Sep 13 11:49:35 2024
    On Fri, 13 Sep 2024 06:49:20 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Fri, 13 Sep 2024 07:27:47 +0200, Bonita Montero wrote:

    Am 13.09.2024 um 04:43 schrieb Lawrence D'Oliveiro:

    On Thu, 12 Sep 2024 17:47:23 +0200, Bonita Montero wrote:

    Then tell me which language a) has this kind of mostly minimized
    language-facilities and b) you can layout data structures 1:1 like
    they fit into memory (platform-dependent).

    Python.

    Have a look at <https://gitlab.com/ldo/inotipy_examples/-/blob/master/fanotify_7_example?ref_type=heads>,
    and compare the C original from <https://manpages.debian.org/7/fanotify.7.en.html>. The Python code is
    half the size and can use high-level async calls.

    What exactly does your response have to do with producing data
    structures with predefined layout?

  • From Michael S@21:1/5 to Bonita Montero on Fri Sep 13 15:25:00 2024
    On Fri, 13 Sep 2024 14:12:32 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Am 13.09.2024 um 10:38 schrieb Michael S:
    On Fri, 13 Sep 2024 07:28:34 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    Am 12.09.2024 um 21:38 schrieb Michael S:

    Callback is as easy in C as in C++.

    Absolutely not because callbacks can't have state in C.



    So what is 'context' parameter in my code?

    In C++ the state is an own internal "this"-like object and you don't
    need any explicit parameters.

    So, do you admit that callback in C can have state?

    Just a [&] and the lambda refers to the whole outer context.

    Bad software engineering practice that easily leads to incomprehensible
    code.
    When in C++ and not in the mood for C-style, I very much prefer functors.
    Ideologically they are the same as C-style context, but with a little
    syntactic sugar.

  • From Tim Rentsch@21:1/5 to Michael S on Fri Sep 13 09:05:04 2024
    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
        char c = *s;

        return
            is_space(c)         ? words_do( s+1, go )           :
            c                   ? collect_word( s, s, 1, go )   :
            /***************/     1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
        char c = *s;

        return
            c == 0              ? go->f( go, r, s ), w                      :
            is_space(c) && w    ? go->f( go, r, s ), words_do( s, go )      :
            /***************/     collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
        return c == ' ' || c == '\t';
    }

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Fri Sep 13 22:04:43 2024
    On Fri, 13 Sep 2024 11:49:35 +0300, Michael S wrote:

    What exactly your response has to do with producing data structures with predefined layout?

    Look at those structures: they have a specific predefined layout.

  • From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Fri Sep 13 22:24:00 2024
    On Fri, 13 Sep 2024 14:12:32 +0200, Bonita Montero wrote:

    In C++ the state is an own internal "this"-like object and you don't
    need any explicit parameters.

    But you need a calling convention that passes “this” explicitly.

  • From Bart@21:1/5 to Lawrence D'Oliveiro on Fri Sep 13 23:48:38 2024
    On 13/09/2024 23:04, Lawrence D'Oliveiro wrote:
    On Fri, 13 Sep 2024 11:49:35 +0300, Michael S wrote:

    What exactly your response has to do with producing data structures with
    predefined layout?

    Look at those structures: they have a specific predefined layout.

    Look at them where? One link is a man-page with several C structs
    defined (triple-spaced for some reason).

    But I can't see anything in the Python link that looks like it might be defining a struct layout.

    So I would also question what it has to do with it.

  • From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Sat Sep 14 01:41:32 2024
    On Sat, 14 Sep 2024 01:42:05 +0200, Bonita Montero wrote:

    Am 14.09.2024 um 00:24 schrieb Lawrence D'Oliveiro:

    On Fri, 13 Sep 2024 14:12:32 +0200, Bonita Montero wrote:

    In C++ the state is an own internal "this"-like object and you don't
    need any explicit parameters.

    But you need a calling convention that passes “this” explicitly.

    That's not part of the C++-language.

    If the implementation doesn’t do it, it doesn’t work.

  • From Lawrence D'Oliveiro@21:1/5 to Bart on Sat Sep 14 01:41:02 2024
    On Fri, 13 Sep 2024 23:48:38 +0100, Bart wrote:

    On 13/09/2024 23:04, Lawrence D'Oliveiro wrote:

    On Fri, 13 Sep 2024 11:49:35 +0300, Michael S wrote:

    What exactly your response has to do with producing data structures
    with predefined layout?

    Look at those structures: they have a specific predefined layout.

    One link is a man-page with several C structs defined ...

    Correct. Structures that the Python wrapper is able to map exactly.

    And the choice between which particular structure variants to use is
    dynamic, dependent on the event type. So the Python wrapper is able to
    dynamically generate a suitable type-safe wrapper -- something that a
    statically-typed language cannot do.

  • From Bart@21:1/5 to Lawrence D'Oliveiro on Sat Sep 14 10:58:34 2024
    On 14/09/2024 02:41, Lawrence D'Oliveiro wrote:
    On Fri, 13 Sep 2024 23:48:38 +0100, Bart wrote:

    On 13/09/2024 23:04, Lawrence D'Oliveiro wrote:

    On Fri, 13 Sep 2024 11:49:35 +0300, Michael S wrote:

    What exactly your response has to do with producing data structures
    with predefined layout?

    Look at those structures: they have a specific predefined layout.

    One link is a man-page with several C structs defined ...

    Correct. Structures that the Python wrapper is able to map exactly.

    And the choice between which particular structure variants to use is
    dynamic, dependent on the event type. So the Python wrapper is able to dynamically generate a suitable type-safe wrapper -- something that a statically-typed language cannot do.

    So, where IS the struct defined in that Python code? Which line number?

    If it is defined elsewhere, then that Python proves nothing.

    For example, where is this struct:

    struct fanotify_event_info_header {
        __u8 info_type;
        __u8 pad;
        __u16 len;
    };

    defined in that Python? I have an idea how this might be done using the
    ctypes module for example, but it's not pretty. However I'm not even
    seeing that.
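
    For comparison, this is roughly how a fixed layout is pinned down and
    checked on the C side. It is only a sketch: struct info_header is a
    stand-in that mirrors the fields above with <stdint.h> types instead of
    the kernel's __u8/__u16, and read_header is a hypothetical helper. The
    offsets asserted here hold on the common ABIs, and the build fails if
    they do not.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for fanotify_event_info_header (an assumption, not the
   kernel's own definition). */
struct info_header {
    uint8_t  info_type;
    uint8_t  pad;
    uint16_t len;
};

/* Compile-time checks that the compiler laid the struct out as expected. */
_Static_assert(offsetof(struct info_header, info_type) == 0, "info_type at 0");
_Static_assert(offsetof(struct info_header, pad) == 1, "pad at 1");
_Static_assert(offsetof(struct info_header, len) == 2, "len at 2");
_Static_assert(sizeof(struct info_header) == 4, "header is 4 bytes");

/* Copy a header out of a raw byte buffer. memcpy avoids alignment and
   aliasing problems; multi-byte fields keep the host's byte order. */
struct info_header read_header(const unsigned char *bytes) {
    struct info_header h;
    memcpy(&h, bytes, sizeof h);
    return h;
}
```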

  • From Lawrence D'Oliveiro@21:1/5 to Bart on Sat Sep 14 22:37:49 2024
    On Sat, 14 Sep 2024 10:58:34 +0100, Bart wrote:

    So, where IS the struct defined in that Python code?

    In the API wrapper module, of course.

    <https://gitlab.com/ldo/inotipy>

  • From Michael S@21:1/5 to Tim Rentsch on Sun Sep 15 12:22:11 2024
    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
        char c = *s;

        return
            is_space(c) ? words_do( s+1, go )
            : c ? collect_word( s, s, 1, go )
            : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
        char c = *s;

        return
            c == 0 ? go->f( go, r, s ), w
            : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
            : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
        return c == ' ' || c == '\t';
    }


    Can you give an example implementation of go->f() ?
    It seems to me that it would have to use CONTAINING_RECORD or
    container_of or analogous non-standard macro.

    Also, while the program is formally written in C, in spirit it's
    something else. Maybe Lisp.
    Lisp compilers are known to be very good at tail call elimination.
    C compilers can also do it, but not reliably. In this particular case I
    am afraid that common C compilers will implement it as written, i.e.
    without turning recursion into iteration.

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration along at least one
    code path.
    Latest MSVC implements it as written, 100% recursion.
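    For readers who do not want to depend on the optimizer, the recursion
    can also be unrolled by hand. What follows is only a sketch under the
    assumption that the quoted words_do()/collect_word() semantics are the
    goal; the names words_do_iter, counting_gopher and count_word are
    invented here and appear nowhere in the thread.

```c
#include <stddef.h>

typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)(Gopher, const char *, const char *); };

static _Bool is_space(char c) { return c == ' ' || c == '\t'; }

/* Hypothetical loop-based equivalent of words_do()/collect_word(). */
_Bool words_do_iter(const char *s, Gopher go) {
    for (;;) {
        while (is_space(*s))          /* skip separators between words */
            s++;
        if (*s == 0)
            return 1;                 /* clean end of input */

        const char *r = s;            /* start of the current word */
        _Bool w = 1;                  /* 1 while outside double quotes */
        while (*s != 0 && !(is_space(*s) && w)) {
            w ^= (*s == '"');         /* each quote toggles the flag */
            s++;
        }
        go->f(go, r, s);              /* report the word [r, s) */
        if (*s == 0)
            return w;                 /* 0 if input ended inside quotes */
    }
}

/* Example gopher that only counts words; the gopher struct comes first. */
struct counting_gopher { struct gopher_s go; unsigned words; };

static void count_word(Gopher go, const char *beg, const char *end) {
    (void)beg; (void)end;
    ((struct counting_gopher *)(void *)go)->words++;
}
```

    The loop uses constant stack space on any compiler, at the price of
    folding the two mutually recursive states into nested loops.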

  • From Tim Rentsch@21:1/5 to Michael S on Mon Sep 16 00:52:26 2024
    Michael S <already5chosen@yahoo.com> writes:

    [comments reordered]

    Also, while formally the program is written in C, by spirit it's
    something else. May be, Lisp.

    I would call it a functional style, but still C. Not a C style
    as most people are used to seeing it, I grant you that. I still
    think of it as C though.


    Lisp compilers are known to be very good at tail call elimination.
    C compilers also can do it, but not reliably. In this particular
    case I am afraid that common C compilers will implement it as
    written, i.e. without turning recursion into iteration.

    I routinely use gcc and clang, and both are good at turning
    this kind of mutual recursion into iteration (-Os or higher,
    although clang was able to eliminate all the recursion at -O1).
    I agree the recursion elimination is not as reliable as one
    would like; in practice though I find it quite usable.


    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0

    Both as expected.

    Latest icc still does not turn it into iteration at least along one
    code paths.

    That's disappointing, but good to know.

    Latest MSVC implements it as written, 100% recursion.

    I'm not surprised at all. In my admittedly very limited experience,
    MSVC is garbage.


    Can you give an example implementation of go->f() ?
    It seems to me that it would have to use CONTAINING_RECORD or
    container_of or analogous non-standard macro.

    You say that like you think such macros don't have well-defined
    behavior. If I needed such a macro probably I would just
    define it myself (and would be confident that it would
    work correctly).

    In this case I don't need a macro because I would put the gopher
    struct at the beginning of the containing struct. For example:

    #include <stdio.h>

    typedef struct {
        struct gopher_s go;
        unsigned words;
    } WordCounter;


    static void
    print_word( Gopher go, const char *s, const char *t ){
        WordCounter *context = (void*) go;
        int n = t-s;

        printf( " word: %.*s\n", n, s );
        context->words ++;
    }


    int
    main(){
        WordCounter wc = { { print_word }, 0 };
        char *words = "\tthe quick \"brown fox\" jumps over the lazy dog.";

        words_do( words, &wc.go );
        printf( "\n" );
        printf( " There were %u words found\n", wc.words );
        return 0;
    }
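    When the gopher struct cannot be the first member, a macro in the
    spirit of container_of can be written with offsetof from <stddef.h>.
    This is a sketch only: CONTAINER_OF, TrailingGopherCounter and
    count_word are hypothetical names, not anything posted in the thread.

```c
#include <stddef.h>

typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)(Gopher, const char *, const char *); };

/* Hypothetical helper: recover the enclosing struct from a member pointer. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct {
    unsigned words;          /* context state comes first here ...   */
    struct gopher_s go;      /* ... so the gopher is NOT at offset 0 */
} TrailingGopherCounter;

static void count_word(Gopher go, const char *s, const char *t) {
    /* Step back from the member to the start of the containing struct. */
    TrailingGopherCounter *ctx = CONTAINER_OF(go, TrailingGopherCounter, go);
    (void)s; (void)t;
    ctx->words++;
}
```

    Placing the gopher first, as in the example above, avoids the macro
    entirely; the macro only earns its keep when one struct embeds several
    callback members.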

  • From Michael S@21:1/5 to Tim Rentsch on Mon Sep 16 12:23:38 2024
    On Mon, 16 Sep 2024 00:52:26 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [comments reordered]

    Also, while formally the program is written in C, by spirit it's
    something else. May be, Lisp.

    I would call it a functional style, but still C. Not a C style
    as most people are used to seeing it, I grant you that. I still
    think of it as C though.


    Lisp compilers are known to be very good at tail call elimination.
    C compilers also can do it, but not reliably. In this particular
    case I am afraid that common C compilers will implement it as
    written, i.e. without turning recursion into iteration.

    I routinely use gcc and clang, and both are good at turning
    this kind of mutual recursion into iteration (-Os or higher,
    although clang was able to eliminate all the recursion at -O1).
    I agree the recursion elimination is not as reliable as one
    would like; in practice though I find it quite usable.


    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0

    Both as expected.


    So, only 15 years for gcc and only 7 years for clang.

    Latest icc still does not turn it into iteration at least along one
    code paths.

    That's disappointing, but good to know.

    Latest MSVC implements it as written, 100% recursion.

    I'm not surprised at all. In my admittedly very limited experience,
    MSVC is garbage.


    For the sort of code that is important to me, gcc, clang and MSVC tend
    to generate code of similar quality. Of the three, clang is the most
    likely to sometimes unexpectedly produce utter crap. On the other hand,
    it is sometimes the most brilliant.
    In the case of gcc, I hate that they recently put tree-slp-vectorize
    under the -O2 umbrella.


    Can you give an example implementation of go->f() ?
    It seems to me that it would have to use CONTAINING_RECORD or
    container_of or analogous non-standard macro.

    You say that like you think such macros don't have well-defined
    behavior. If I needed such a macro probably I would just
    define it myself (and would be confident that it would
    work correctly).

    In this case I don't need a macro because I would put the gopher
    struct at the beginning of the containing struct. For example:

    #include <stdio.h>

    typedef struct {
    struct gopher_s go;
    unsigned words;
    } WordCounter;


    static void
    print_word( Gopher go, const char *s, const char *t ){
    WordCounter *context = (void*) go;

    That's what I was missing. Simple and adequate.

    int n = t-s;

    printf( " word: %.*s\n", n, s );
    context->words ++;
    }



    int
    main(){
    WordCounter wc = { { print_word }, 0 };
    char *words = "\tthe quick \"brown fox\" jumps over the lazy dog.";

    words_do( words, &wc.go );
    printf( "\n" );
    printf( " There were %u words found\n", wc.words );
    return 0;
    }

    There are a couple of differences between your parsing and mine.
    1. "42""43"
    You parse it as a single word, I split it. It seems your behavior is
    closer to that of both bash and cmd.exe.
    2. I strip " characters from "-delimited words. You seem to leave them.
    In this case what I do is more similar to both bash and cmd.exe.

    Not that it matters.

  • From Tim Rentsch@21:1/5 to Michael S on Tue Sep 17 03:12:04 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 16 Sep 2024 00:52:26 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [comments reordered]

    Also, while formally the program is written in C, by spirit it's
    something else. May be, Lisp.

    I would call it a functional style, but still C. Not a C style
    as most people are used to seeing it, I grant you that. I still
    think of it as C though.


    Lisp compilers are known to be very good at tail call elimination.
    C compilers also can do it, but not reliably. In this particular
    case I am afraid that common C compilers will implement it as
    written, i.e. without turning recursion into iteration.

    I routinely use gcc and clang, and both are good at turning
    this kind of mutual recursion into iteration (-Os or higher,
    although clang was able to eliminate all the recursion at -O1).
    I agree the recursion elimination is not as reliable as one
    would like; in practice though I find it quite usable.


    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration at least along one
    code paths.

    That's disappointing, but good to know.

    Latest MSVC implements it as written, 100% recursion.

    I'm not surprised at all. In my admittedly very limited experience,
    MSVC is garbage.

    For sort of code that is important to me, gcc, clang and MSVC tend to
    generate code of similar quality.

    To clarify, my earlier comment about MSVC is about what it thinks
    the language is, not anything about quality of generated code. But
    the lack of tail call elimination fits in with what else I have
    seen.

    clang is most suspect of the three to sometimes unexpectedly
    produce utter crap. On the other hand, it is sometimes most
    brilliant.

    That's interesting. Recently I encountered a problem where clang
    did just fine but gcc generated bad code under -O3.

    In case of gcc, I hate that recently they put tree-slp-vectorize
    under -O2 umbrella.

    Yes, gcc is like a box of chocolates - you never know what you're
    going to get.

    Can you give an example implementation of go->f() ?
    It seems to me that it would have to use CONTAINING_RECORD or
    container_of or analogous non-standard macro.

    You say that like you think such macros don't have well-defined
    behavior. If I needed such a macro probably I would just
    define it myself (and would be confident that it would
    work correctly).

    In this case I don't need a macro because I would put the gopher
    struct at the beginning of the containing struct. For example:

    #include <stdio.h>

    typedef struct {
    struct gopher_s go;
    unsigned words;
    } WordCounter;


    static void
    print_word( Gopher go, const char *s, const char *t ){
    WordCounter *context = (void*) go;

    That's what I was missing. Simple and adequate.

    I now prefer this technique for callbacks. Cuts down on the
    number of parameters, safer than a (void*) parameter, and it puts
    the function pointer near the context state so it's easier to
    connect the two (and less worry about them getting out of sync).

    int n = t-s;

    printf( " word: %.*s\n", n, s );
    context->words ++;
    }

    int
    main(){
    WordCounter wc = { { print_word }, 0 };
    char *words = "\tthe quick \"brown fox\" jumps over the lazy dog.";

    words_do( words, &wc.go );
    printf( "\n" );
    printf( " There were %u words found\n", wc.words );
    return 0;
    }

    There are couple of differences between your and my parsing.
    1. "42""43"
    You parse it as a single word, I split. It seems, your behavior is
    closer to that of both bash and cmd.exe

    Yes. I chose that deliberately because I often use patterns like
    foo."$suffix" and it made sense to allow quoted subparts for that
    reason.

    2. I strip " characters from "-delimited words. You seem to leave them.
    In this case what I do is more similar to both bash and cmd.exe

    I do, both because it's easier, and in case the caller wants to
    know where the quotes are. If it's important to strip them out
    it's up to the caller to do that.

    Not that it matters.

    Yeah. These choices are only minor details; the general
    approach taken is the main thing.
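    If a caller does want the quotes removed, one way is to copy each
    reported word into a buffer while skipping the quote characters. The
    helper below is a hedged sketch; copy_without_quotes and its
    truncation policy are assumptions made for illustration, not part of
    the code discussed above.

```c
#include <stddef.h>

/* Hypothetical helper: copy [beg, end) into out, dropping '"' characters.
   Returns the number of characters stored (excluding the terminator) and
   silently truncates if out is too small. outsz must be at least 1. */
size_t copy_without_quotes(const char *beg, const char *end,
                           char *out, size_t outsz) {
    size_t n = 0;
    for (const char *p = beg; p < end; p++) {
        if (*p == '"')
            continue;                 /* quotes delimit, they are not data */
        if (n + 1 < outsz)
            out[n++] = *p;            /* always leave room for the '\0' */
    }
    out[n] = '\0';
    return n;
}
```

    Called from a gopher callback with the two pointers it receives, this
    turns "42""43" into 4243 and foo."bar baz" into foo.bar baz, which
    matches what a shell would hand to the program.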

  • From antispam@fricas.org@21:1/5 to Michael S on Tue Sep 17 22:34:33 2024
    Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }



    <snip>

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration at least along one
    code paths.
    Latest MSVC implements it as written, 100% recursion.

    I tested using gcc 12. AFAICS the calls to 'go->f' in 'collect_word'
    are not tail calls, and gcc 12 compiles them as normal calls.
    The other calls are compiled to jumps. But the call to 'collect_word'
    in 'words_do' is not a "sibcall", and depending on the calling
    convention the compiler may treat it as a normal call. The two other
    calls, that is, the call to 'words_do' in 'words_do' and the call to
    'collect_word' in 'collect_word', are clearly tail self-recursion, and
    the compiler should always optimize them to a jump.

    --
    Waldek Hebisch

  • From Bart@21:1/5 to Michael S on Wed Sep 18 01:07:17 2024
    On 18/09/2024 00:46, Michael S wrote:
    On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
    antispam@fricas.org wrote:

    Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }



    <snip>

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration at least along one
    code paths.
    Latest MSVC implements it as written, 100% recursion.

    I tested using gcc 12. AFAICS calls to 'go->f' in 'collect_word'
    are not tail calls and gcc 12 compiles them as normal call.

    Naturally.

    The other calls are compiled to jumps. But call to 'collect_word'
    in 'words_do' is not "sibcall" and depending on calling convention
    compiler may treat it as normal call. Two other calls, that is
    call to 'words_do' in 'words_do' and call to 'collect_word' in
    'collect_word' are clearly tail self recursion and compiler
    should always optimize them to a jump.


    "Should" or not, MSVC does not eliminate them.

    The funny thing is that it does eliminate all four calls after I rewrote
    the code in more boring style.

    static
    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;
    #if 1
    if (c == 0) {
    go->f( go, r, s );
    return w;
    }
    if (is_space(c) && w) {
    go->f( go, r, s );
    return words_do( s, go );
    }
    return collect_word( s+1, r, w ^ c == '"', go );
    #else
    return
    c == 0 ? go->f( go, r, s ), w :
    is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
    /***************/ collect_word( s+1, r, w ^ c == '"', go );
    #endif
    }

    I find such a coding style pretty much impossible to grasp and
    unpleasant to look at. I had to refactor it like this:

    ---------------

    static _Bool collect_word(char *s, char *r, _Bool w, Gopher go ) {
    char c = *s;
    #if 1
    if (c == 0) {
    go->f(go, r, s);
    return w;
    }
    if (is_space(c) && w) {
    go->f(go, r, s);
    return words_do(s, go);
    }
    return collect_word(s+1, r, (w ^ c) == '"', go);

    #else
    if (c == 0) {
    go->f(go, r, s);
    return w;
    }
    else if (is_space(c) && w) {
    go->f(go, r, s);
    return words_do(s, go);
    }
    else {
    return collect_word(s+1, r, (w ^ c) = '"', go);
    }

    #endif
    }

    ---------------

    When I'd finished, I realised that those two conditional blocks do more
    or less the same thing! If that's what you mean by 'boring', then I'm
    all for it.

  • From Tim Rentsch@21:1/5 to antispam@fricas.org on Tue Sep 17 16:33:16 2024
    antispam@fricas.org writes:

    Michael S <already5chosen@yahoo.com> wrote:

    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }



    <snip>

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration at least along one
    code paths.
    Latest MSVC implements it as written, 100% recursion.

    I tested using gcc 12. AFAICS calls to 'go->f' in 'collect_word'
    are not tail calls and gcc 12 compiles them as normal call.

    Right, they are not tail calls, simply ordinary calls (indirect
    calls, but still ordinary calls).

    The other calls are compiled to jumps. But call to 'collect_word'
    in 'words_do' is not "sibcall" and depending on calling convention
    compiler may treat it as normal call. Two other calls, that is
    call to 'words_do' in 'words_do' and call to 'collect_word' in
    'collect_word' are clearly tail self recursion and compiler
    should always optimize them to a jump.

    Yes, a different set of calling conventions could result in the
    call to collect_word from words_do being a normal call. It
    should be possible to correct that by adding two dummy parameters
    to words_do(), and wrapping the result in one outer function so
    that there is at most one extra call besides the call from outside.
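    One way to read the suggestion above is sketched below. It assumes the
    rewritten if/return form of the functions; skip_spaces, CountingGopher
    and count_word are names invented for this illustration, and whether a
    given compiler then emits jumps still has to be checked in its output.

```c
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)(Gopher, const char *, const char *); };

static _Bool is_space(char c) { return c == ' ' || c == '\t'; }

/* Both workers share one signature, so each call between them passes
   exactly the arguments the caller itself received. */
static _Bool skip_spaces(const char *, const char *, _Bool, Gopher);
static _Bool collect_word(const char *, const char *, _Bool, Gopher);

/* Outer wrapper: the only call that need not be a tail call. */
_Bool words_do(const char *s, Gopher go) {
    return skip_spaces(s, s, 1, go);
}

/* Same job as the original words_do(); r and w are dummy parameters. */
static _Bool skip_spaces(const char *s, const char *r, _Bool w, Gopher go) {
    char c = *s;
    (void)r; (void)w;
    if (is_space(c))
        return skip_spaces(s + 1, s + 1, 1, go);
    if (c)
        return collect_word(s, s, 1, go);
    return 1;
}

static _Bool collect_word(const char *s, const char *r, _Bool w, Gopher go) {
    char c = *s;
    if (c == 0) {
        go->f(go, r, s);
        return w;
    }
    if (is_space(c) && w) {
        go->f(go, r, s);
        return skip_spaces(s, s, 1, go);
    }
    return collect_word(s + 1, r, w ^ (c == '"'), go);
}

/* Small counting gopher so the sketch can be exercised. */
typedef struct { struct gopher_s go; unsigned words; } CountingGopher;

static void count_word(Gopher go, const char *s, const char *t) {
    (void)s; (void)t;
    ((CountingGopher *)(void *)go)->words++;
}
```

    The observable behaviour is unchanged; only the shape of the internal
    calls differs, which is the property the calling-convention argument
    depends on.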

  • From Michael S@21:1/5 to antispam@fricas.org on Wed Sep 18 02:46:11 2024
    On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
    antispam@fricas.org wrote:

    Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }



    <snip>

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration at least along one
    code paths.
    Latest MSVC implements it as written, 100% recursion.

    I tested using gcc 12. AFAICS calls to 'go->f' in 'collect_word'
    are not tail calls and gcc 12 compiles them as normal call.

    Naturally.

    The other calls are compiled to jumps. But call to 'collect_word'
    in 'words_do' is not "sibcall" and depending on calling convention
    compiler may treat it as normal call. Two other calls, that is
    call to 'words_do' in 'words_do' and call to 'collect_word' in
    'collect_word' are clearly tail self recursion and compiler
    should always optimize them to a jump.


    "Should" or not, MSVC does not eliminate them.

    The funny thing is that it does eliminate all four calls after I rewrote
    the code in a more boring style.

    _Bool
    words_do( const char *s, Gopher go ){
        char c = *s;
    #if 1
        if (is_space(c))
            return words_do( s+1, go );
        if (c)
            return collect_word( s, s, 1, go );
        return 1;
    #else
        return
            is_space(c) ? words_do( s+1, go ) :
            c ? collect_word( s, s, 1, go ):
            /***************/ 1;
    #endif
    }

    static
    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
        char c = *s;
    #if 1
        if (c == 0) {
            go->f( go, r, s );
            return w;
        }
        if (is_space(c) && w) {
            go->f( go, r, s );
            return words_do( s, go );
        }
        return collect_word( s+1, r, w ^ c == '"', go );
    #else
        return
            c == 0 ? go->f( go, r, s ), w :
            is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
            /***************/ collect_word( s+1, r, w ^ c == '"', go );
    #endif
    }

  • From Tim Rentsch@21:1/5 to Michael S on Tue Sep 17 18:31:10 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
    antispam@fricas.org wrote:

    Michael S <already5chosen@yahoo.com> wrote:

    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }



    <snip>

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration at least along one
    code paths.
    Latest MSVC implements it as written, 100% recursion.

    I tested using gcc 12. AFAICS calls to 'go->f' in 'collect_word'
    are not tail calls and gcc 12 compiles them as normal call.

    Naturally.

    The other calls are compiled to jumps. But call to 'collect_word'
    in 'words_do' is not "sibcall" and depending on calling convention
    compiler may treat it as normal call. Two other calls, that is
    call to 'words_do' in 'words_do' and call to 'collect_word' in
    'collect_word' are clearly tail self recursion and compiler
    should always optimize them to a jump.

    "Should" or not, MSVC does not eliminate them.

    The funny thing is that it does eliminate all four calls after I rewrote
    the code in more boring style.

    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;
    #if 1
    if (is_space(c))
    return words_do( s+1, go );
    if (c)
    return collect_word( s, s, 1, go );
    return 1;
    #else
    return
    is_space(c) ? words_do( s+1, go ) :
    c ? collect_word( s, s, 1, go ):
    /***************/ 1;
    #endif
    }

    static
    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;
    #if 1
    if (c == 0) {
    go->f( go, r, s );
    return w;
    }
    if (is_space(c) && w) {
    go->f( go, r, s );
    return words_do( s, go );
    }
    return collect_word( s+1, r, w ^ c == '"', go );
    #else
    return
    c == 0 ? go->f( go, r, s ), w :
    is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
    /***************/ collect_word( s+1, r, w ^ c == '"', go );
    #endif
    }

    That's amusing. :)

    Do you know if icc will do tail call elimination for
    the boring version of the code?

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Wed Sep 18 02:20:18 2024
    On Wed, 18 Sep 2024 02:46:11 +0300, Michael S wrote:

    "Should" or not, MSVC does not eliminate them.

    Another reason to stay away from MSVC?

  • From Michael S@21:1/5 to Tim Rentsch on Wed Sep 18 11:03:44 2024
    On Tue, 17 Sep 2024 18:31:10 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    That's amusing. :)

    Do you know if icc will do tail call elimination for
    the boring version of the code?

    Output of 'icc -O2' does recursive inlining to quite significant depth,
    so it is rather hard to follow.
    But it seems that the answer is "No".

    Anyway, by now icc is mostly of historical interest.
    They ceased independent compiler development 2-3 years ago and turned
    into yet another LLVM/clang distributor.

  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Wed Sep 18 11:05:24 2024
    On Wed, 18 Sep 2024 02:20:18 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Wed, 18 Sep 2024 02:46:11 +0300, Michael S wrote:

    "Should" or not, MSVC does not eliminate them.

    Another reason to stay away from MSVC?

    No, it isn't.

  • From Michael S@21:1/5 to Bart on Wed Sep 18 11:43:05 2024
    On Wed, 18 Sep 2024 01:07:17 +0100
    Bart <bc@freeuk.com> wrote:

    On 18/09/2024 00:46, Michael S wrote:
    On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
    antispam@fricas.org wrote:

    Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 13 Sep 2024 09:05:04 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..iterate over words in a string..]

    I couldn't resist writing some code along similar lines. The
    entry point is words_do(), which returns one on success and
    zero if the end of string is reached inside double quotes.


    typedef struct gopher_s *Gopher;
    struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

    static _Bool collect_word( const char *, const char *, _Bool, Gopher );
    static _Bool is_space( char );


    _Bool
    words_do( const char *s, Gopher go ){
    char c = *s;

    return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
    }

    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
    char c = *s;

    return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
    }

    _Bool
    is_space( char c ){
    return c == ' ' || c == '\t';
    }



    <snip>

    Tested on godbolt.
    gcc -O2 turns it into iteration starting from v.4.4
    clang -O2 turns it into iteration starting from v.4.0
    Latest icc still does not turn it into iteration at least along
    one code paths.
    Latest MSVC implements it as written, 100% recursion.

    I tested using gcc 12. AFAICS calls to 'go->f' in 'collect_word'
    are not tail calls and gcc 12 compiles them as normal call.

    Naturally.

    The other calls are compiled to jumps. But call to 'collect_word'
    in 'words_do' is not "sibcall" and depending on calling convention
    compiler may treat it as normal call. Two other calls, that is
    call to 'words_do' in 'words_do' and call to 'collect_word' in
    'collect_word' are clearly tail self recursion and compiler
    should always optimize them to a jump.


    "Should" or not, MSVC does not eliminate them.

    The funny thing is that it does eliminate all four calls after I
    rewrote the code in more boring style.

    static
    _Bool
    collect_word( const char *s, const char *r, _Bool w, Gopher go ){
        char c = *s;
    #if 1
        if (c == 0) {
            go->f( go, r, s );
            return w;
        }
        if (is_space(c) && w) {
            go->f( go, r, s );
            return words_do( s, go );
        }
        return collect_word( s+1, r, w ^ c == '"', go );
    #else
        return
            c == 0 ? go->f( go, r, s ), w :
            is_space(c) && w ? go->f( go, r, s ), words_do( s, go ) :
            /***************/ collect_word( s+1, r, w ^ c == '"', go );
    #endif
    }

    I find such a coding style pretty much impossible to grasp and
    unpleasant to look at. I had to refactor it like this:

    ---------------

    static_Bool collect_word(char *s, char *r, _Bool w, Gopher go ) {
    char c = *s;
    #if 1
    if (c == 0) {
    go->f(go, r, s);
    return w;
    }
    if (is_space(c) && w) {
    go->f(go, r, s);
    return words_do(s, go);
    }
    return collect_word(s+1, r, (w ^ c) == '"', go);

    That's not how it was written in original. Should be:
    return collect_word(s+1, r, w ^ c == '"', go);
    Not the same thing at all. https://en.cppreference.com/w/c/language/operator_precedence


    #else
    if (c == 0) {
    go->f(go, r, s);
    return w;
    }
    else if (is_space(c) && w) {
    go->f(go, r, s);
    return words_do(s, go);
    }
    else {
    return collect_word(s+1, r, (w ^ c) = '"', go);

    The same here.

    }

    #endif
    }

    ---------------

    When I'd finished, I realised that those two conditional blocks do
    more or less the same thing! If that's what you mean by 'boring',
    then I'm all for it.


    Since I am not accustomed to the functional programming style, for me
    even a boring variant is way too entertaining.
    I prefer mundane (untested, could be buggy):

    static
    const char* collect_word(const char *s) {
        _Bool w = 0;
        char c;
        while ((c = *s) != 0) {
            if (!w && is_space(c))
                break;
            if (c == '"')
                w = !w;
            ++s;
        }
        return s;
    }

    void words_do(const char *s, Gopher go ){
        char c;
        while ((c = *s) != 0) {
            if (is_space(c)) {
                ++s;
            } else {
                const char *r = s;
                s = collect_word(s);
                go->f(go, r, s);
            }
        }
    }

  • From Bart@21:1/5 to Michael S on Wed Sep 18 10:49:38 2024
    On 18/09/2024 09:43, Michael S wrote:
    On Wed, 18 Sep 2024 01:07:17 +0100
    Bart <bc@freeuk.com> wrote:

    I find such a coding style pretty much impossible to grasp and
    unpleasant to look at. I had to refactor it like this:

    ---------------

    static_Bool collect_word(char *s, char *r, _Bool w, Gopher go ) {
    char c = *s;
    #if 1
    if (c == 0) {
    go->f(go, r, s);
    return w;
    }
    if (is_space(c) && w) {
    go->f(go, r, s);
    return words_do(s, go);
    }
    return collect_word(s+1, r, (w ^ c) == '"', go);

    That's not how it was written in original. Should be:
    return collect_word(s+1, r, w ^ c == '"', go);
    Not the same thing at all.

    So, what you are saying is that it means 'w ^ (c == '"')'? Because
    there could be some ambiguity, I put in the brackets. I had to guess
    the precedence and chose the one that seemed more plausible, but I
    guessed wrong.

    My version then should be:

    return collect_word(s+1, r, w ^ (c == '"'), go);


    The same here.

    I'm surprised there weren't more typos, but that's not what my post
    was about, which was presentation and layout.

  • From David Brown@21:1/5 to Bart on Wed Sep 18 12:44:51 2024
    On 18/09/2024 11:49, Bart wrote:
    On 18/09/2024 09:43, Michael S wrote:
    On Wed, 18 Sep 2024 01:07:17 +0100
    Bart <bc@freeuk.com> wrote:


              return collect_word(s+1, r, (w ^ c) == '"', go);

    That's not how it was written in original. Should be:
              return collect_word(s+1, r, w ^ c == '"', go);
    Not the same thing at all.

    So, what you are saying is that it means 'w ^ (c == '"')'? Because
    there could be some ambiguity, I put in the brackets. I had to guess
    the precedence and chose the one that seemed more plausible, but I
    guessed wrong.


    There is no ambiguity in the C language - the equality operator has
    higher precedence than the bitwise operators and logical operators.

    However, I fully agree with you that code is clearer if parentheses are
    added. It makes the code easier to read, easier to write, and
    eliminates the risk of programmers (either those writing the code or
    those reading it) getting it wrong.
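
    As a small illustration of the two groupings (a sketch only; the
    function names are made up here and do not appear in the thread):
    the unparenthesized 'w ^ c == '"'' parses the same way as
    toggle_explicit below, while the mistaken guess corresponds to
    toggle_misgrouped. Here 'w' plays the role of the "outside quotes"
    flag from collect_word.

```c
/* What C's precedence rules give for  w ^ c == '"' :
   the comparison is evaluated first, then XORed with the flag. */
_Bool toggle_explicit(_Bool w, char c) {
    return w ^ (c == '"');
}

/* The mistaken grouping: XOR the flag into the character code,
   then compare the result against the quote character. */
_Bool toggle_misgrouped(_Bool w, char c) {
    return (w ^ c) == '"';
}
```

    With w = 1 and c = 'a' the first returns 1 (the flag is unchanged by
    an ordinary character) and the second returns 0, so the misgrouped
    form would drop the flag on the first ordinary character of a word.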

    My version then should be:

          return collect_word(s+1, r, w ^ (c == '"'), go);


    Put spaces around the "+" operator, and it would be perfect :-)


    The same here.

    I'm surprised there weren't more typos, but that's not what my post
    was about, which was presentation and layout.


  • From Tim Rentsch@21:1/5 to Michael S on Wed Sep 18 05:09:08 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Tue, 17 Sep 2024 18:31:10 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    That's amusing. :)

    Do you know if icc will do tail call elimination for
    the boring version of the code?

    Output of 'icc -O2' does recursive inlining to quite significant depth,
    so it is rather hard to follow.
    But it seems that the answer is "No".

    Anyway, by now icc is mostly of historical interest.
    They ceased independent compiler development 2-3 years ago and turned
    into yet another LLVM/clang distributor.

    Thank you, that is good to know.

  • From Tim Rentsch@21:1/5 to Michael S on Wed Sep 18 06:01:33 2024
    Michael S <already5chosen@yahoo.com> writes:

    [...]

    Since I am not accustomed to the functional programming style, for
    me even a boring variant [not shown] is way too entertaining. I
    prefer mundane (untested, could be buggy):

    static
    const char* collect_word(const char *s) {
        _Bool w = 0;
        char c;
        while ((c = *s) != 0) {
            if (!w && is_space(c))
                break;
            if (c == '"')
                w = !w;
            ++s;
        }
        return s;
    }

    void words_do(const char *s, Gopher go ){
        char c;
        while ((c = *s) != 0) {
            if (is_space(c)) {
                ++s;
            } else {
                const char *r = s;
                s = collect_word(s);
                go->f(go, r, s);
            }
        }
    }

    If writing in an imperative-rather-than-functional style, I would
    likely gravitate toward something like this:


    static const char *process_word( const char *, Gopher );

    void
    words_do( const char *s, Gopher go ){
        char c;

        while( c = *s++ ){
            if( ! is_space(c) ) s = process_word( s-1, go );
        }
    }

    const char *
    process_word( const char *r, Gopher go ){
        const char *s = r;
        _Bool q = 0;

        do q ^= *s++ == '"'; while( *s && (q || !is_space(*s)) );

        return go->f( go, r, s ), s;
    }


    which seems to result in slightly better generated code than my
    functional version, in a few spot checks using gcc or clang
    under various -O settings (-Os, -O2, -O3).
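
    For the original question, a Gopher callback could copy each word
    into an argv-style array. What follows is a minimal sketch, assuming
    the imperative words_do above (restated here with explicit
    comparisons and separate statements so that it compiles on its own
    without warnings). The names struct collector, collect, split_words
    and MAX_WORDS are hypothetical. Double quotes are kept as part of
    the word, and no globbing is attempted.

```c
#include <stdlib.h>
#include <string.h>

typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

static _Bool is_space( char c ){ return c == ' ' || c == '\t'; }

/* Imperative word scanner from the thread, lightly restated. */
static const char *
process_word( const char *r, Gopher go ){
    const char *s = r;
    _Bool q = 0;                        /* 1 while inside double quotes */

    do q ^= (*s++ == '"'); while( *s && (q || !is_space(*s)) );

    go->f( go, r, s );                  /* report the range [r, s) */
    return s;
}

void
words_do( const char *s, Gopher go ){
    char c;

    while( (c = *s++) != 0 ){
        if( ! is_space(c) ) s = process_word( s-1, go );
    }
}

/* Hypothetical collector. The gopher_s member comes first so that the
   Gopher handed to the callback can be cast back to the collector. */
enum { MAX_WORDS = 16 };
struct collector {
    struct gopher_s base;
    int argc;
    char *argv[MAX_WORDS];
};

static void
collect( Gopher go, const char *r, const char *s ){
    struct collector *c = (struct collector *) go;
    size_t n = (size_t)( s - r );
    char *w;

    if( c->argc >= MAX_WORDS ) return;  /* assumption: drop extra words */
    w = malloc( n + 1 );
    if( !w ) return;                    /* assumption: skip on failure */
    memcpy( w, r, n );
    w[n] = 0;
    c->argv[ c->argc++ ] = w;
}

/* Splits 'line' into c->argv and returns the word count.
   The caller is responsible for freeing each argv[i]. */
int
split_words( const char *line, struct collector *c ){
    c->base.f = collect;
    c->argc = 0;
    words_do( line, &c->base );
    return c->argc;
}
```

    Called on the example line from the first post, split_words yields
    four words, the third being "This is foo.*" with its quotes still
    attached. Stripping the quotes and expanding foo.* would be separate
    steps layered on top.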
