• Dereference relative to increment and decrement operators ++ --

    From James Harris@21:1/5 to All on Mon Nov 7 11:55:41 2022
    A piece of code Bart wrote in another thread happened to relate to
    something I've been working on but without me coming up with a clear
    answer so I thought I'd ask you guys what you think.

    The basic question is: If ^ is a postfix dereference operator then what
    should be the relative precedences of the following (where E is any subexpression)?

    ++E
    E++
    E^

    (The same goes for -- but to make description easier I'll mention only ++.)

    Taking a step back and considering general expression evaluation I have,
    so far, been defining the apparent order. And I'd like to continue with
    that. So it should be possible to combine multiple ++ operators
    arbitrarily. For example,

    ++E + E++
    ++E++
    V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so something a programmer can rely on.

    [[

As an aside, one thing I should point out is that while both pre- and post-increment require an lvalue it is easy for prefix ++ to also result
    in an lvalue whereas postfix ++ more naturally produces an rvalue.
    Prefix ++ can be translated to

    increment the value at a certain address
    use that /address/

    By contrast, postfix ++ more naturally translates to

    load into a register the /value/ at a certain address
    increment the value left at that address

    After postfix ++ the address may not be so usable because its value has
    already been changed and yet the code said to increment it /after/ the operation (for some definition of 'after').

    At any rate, that distinction between prefix and postfix ++ seems to be recognised at the following link where it says "Prefix versions of the
    built-in operators return references and postfix versions return values."

    https://en.cppreference.com/w/cpp/language/operator_incdec

    ]]

    Setting that aside aside ... and going back to the query, what should be
    the relative precedences of the three operators? For example, how should
    the following be evaluated?

    ++E++^
    ++E^++

    Or should some ways of combining ^ with either or both of the ++
    operators be prohibited because they make code too difficult to
    understand?!!

    I guess it boils down to what's most convenient and comprehensible for a programmer but I don't know if there is a clear answer. What do you guys
    think?

    I've been scratching my head over this for a while so other opinions
    would be most welcome!


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Mon Nov 7 12:52:23 2022
    On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
    On 2022-11-07 12:55, James Harris wrote:

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so something
    a programmer can rely on.

    One major contribution of PL/1 was clear understanding that "every
    garbage must mean something" was a bad language design principle.

    That's all very well but what specifically would you prohibit? While
    doing so be careful not to prohibit something that programmers have a legitimate reason to want.

    Perhaps another lesson from the PL/1 era was to avoid arbitrary rules.
    And yet an arbitrary prohibition of something can be such a rule.

    :-)


    --
    James Harris

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 7 13:22:41 2022
    On 2022-11-07 12:55, James Harris wrote:

      ++E + E++
      ++E++
      V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so something a programmer can rely on.

    One major contribution of PL/1 was clear understanding that "every
    garbage must mean something" was a bad language design principle.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 7 14:43:46 2022
    On 2022-11-07 13:52, James Harris wrote:
    On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
    On 2022-11-07 12:55, James Harris wrote:

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so something
    a programmer can rely on.

    One major contribution of PL/1 was clear understanding that "every
    garbage must mean something" was a bad language design principle.

    That's all very well but what specifically would you prohibit?

    Your very question was about some arbitrary sequence of operators you
    fail to give a meaning! Stop right here. (:-))

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From Bart@21:1/5 to Dmitry A. Kazakov on Mon Nov 7 15:06:54 2022
    On 07/11/2022 13:43, Dmitry A. Kazakov wrote:
    On 2022-11-07 13:52, James Harris wrote:
    On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
    On 2022-11-07 12:55, James Harris wrote:

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so
    something a programmer can rely on.

    One major contribution of PL/1 was clear understanding that "every
    garbage must mean something" was a bad language design principle.

    That's all very well but what specifically would you prohibit?

    Your very question was about some arbitrary sequence of operators you
    fail to give a meaning! Stop right here. (:-))

    + usually means numeric addition

    = here presumably means an assignment (and from right to left)

    ++ can also be assumed to mean in-place increment. Specifically:

    ++E is equivalent to: (E := E + 1; E)
    E++ is equivalent to: (T := E; E := E + 1; T)

    (When E can be harmlessly evaluated more than once; otherwise an extra temporary reference would need to be used.)

    But I'm sure you know this already.

  • From David Brown@21:1/5 to James Harris on Mon Nov 7 15:58:27 2022
    On 07/11/2022 12:55, James Harris wrote:
    A piece of code Bart wrote in another thread happened to relate to
    something I've been working on but without me coming up with a clear
    answer so I thought I'd ask you guys what you think.

    The basic question is: If ^ is a postfix dereference operator then what should be the relative precedences of the following (where E is any subexpression)?

      ++E
      E++
      E^

    (The same goes for -- but to make description easier I'll mention only ++.)

    Taking a step back and considering general expression evaluation I have,
    so far, been defining the apparent order. And I'd like to continue with
    that. So it should be possible to combine multiple ++ operators
    arbitrarily. For example,

      ++E + E++
      ++E++
      V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so something a programmer can rely on.

    I disagree entirely - unless you include giving an error message saying
    the programmer should be fired for writing gibberish as "well defined
    and something you can rely on". I can appreciate not wanting such
    things to be run-time undefined behaviour, but there is no reason at all
    to insist that it is acceptable by the compiler.


    Setting that aside aside ... and going back to the query, what should be
    the relative precedences of the three operators? For example, how should
    the following be evaluated?

      ++E++^
      ++E^++

    Or should some ways of combining ^ with either or both of the ++
    operators be prohibited because they make code too difficult to
    understand?!!


    Make it a syntax error. You are not trying to implement <https://en.wikipedia.org/wiki/Brainfuck>, and are not under any
    obligation to support people who want to code like that. On the other
    hand, you /do/ have an obligation to try to catch mistakes, typos, and accidental errors in code.

    I guess it boils down to what's most convenient and comprehensible for a programmer but I don't know if there is a clear answer. What do you guys think?

    I've been scratching my head over this for a while so other opinions
    would be most welcome!



  • From Bart@21:1/5 to James Harris on Mon Nov 7 14:23:33 2022
    On 07/11/2022 11:55, James Harris wrote:
    A piece of code Bart wrote in another thread happened to relate to
    something I've been working on but without me coming up with a clear
    answer so I thought I'd ask you guys what you think.

    The basic question is: If ^ is a postfix dereference operator then what should be the relative precedences of the following (where E is any subexpression)?

      ++E
      E++
      E^

    (The same goes for -- but to make description easier I'll mention only ++.)

    For unary operators, the evaluation order is rather peculiar yet seems
    to be used in quite a few languages without anyone questioning it. So if
    `a b c d` are unary operators, then the following:

    a b E c d

    is evaluated like this:

    a (b ((E c) d))

    That is, first all the post-fix operators in left-to-right order, then
    all the prefix ones in right-left order. It sounds bizarre when put like
    that!

    Taking a step back and considering general expression evaluation I have,
    so far, been defining the apparent order. And I'd like to continue with
    that. So it should be possible to combine multiple ++ operators
    arbitrarily. For example,

      ++E + E++


    This is well defined, as unary operators bind more tightly than binary
    ones. This is just (++E) + (++E).

    However the evaluation order for '+' is not usually well-defined, so you
    don't know which operand will be done first.

      ++E++

This may not work, or not work as expected. The binding using my above
    scheme means this is equivalent to ++(E++). But the second ++ requires
    an lvalue as input, and yields an rvalue, which would be an invalid
    input to the first ++.

      V = V++

    This one doesn't have any problems, but is probably not useful: you're modifying V then replacing its value anyway, and with its original
    value. That new V+1 value is discarded.

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so something a programmer can rely on.

    [[

As an aside, one thing I should point out is that while both pre- and post-increment require an lvalue it is easy for prefix ++ to also result
    in an lvalue whereas postfix ++ more naturally produces an rvalue.
    Prefix ++ can be translated to

      increment the value at a certain address
      use that /address/

    By contrast, postfix ++ more naturally translates to

      load into a register the /value/ at a certain address
      increment the value left at that address

    After postfix ++ the address may not be so usable because its value has already been changed and yet the code said to increment it /after/ the operation (for some definition of 'after').

    At any rate, that distinction between prefix and postfix ++ seems to be recognised at the following link where it says "Prefix versions of the built-in operators return references and postfix versions return values."

      https://en.cppreference.com/w/cpp/language/operator_incdec

    I tried to get ++E++ to work using a suitable type for E, but in my
    language it cannot work, as the first ++ still needs an lvalue; just an
    rvalue which has a pointer type won't cut it.

    However ++E++^ can work, where ^ is deref, and E is a pointer.

    I think this is because in my language, for something to be a valid
    lvalue, you need to be able to apply & address-of to it. The result of
    E++ doesn't have an address. But (E++)^ works because & and ^ cancel
    out. Or something...

    Setting that aside aside ... and going back to the query, what should be
    the relative precedences of the three operators? For example, how should
    the following be evaluated?

      ++E++^
      ++E^++

    Or should some ways of combining ^ with either or both of the ++
    operators be prohibited because they make code too difficult to
    understand?!!

    You have the same issues in C, but that's OK because people are so
    familiar with it. Also * deref is a prefix operator so you never have
    two distinct postfix operators, unless you write E++ --.

    But yes, parentheses are recommended when mixing certain prefix/postfix
    ops. I think this one is clear enough however:

    -E^

Dereference E then negate the result. As is this: -E[i]; you wouldn't
    assume that meant (-E)[i].

  • From Bart@21:1/5 to David Brown on Mon Nov 7 15:23:11 2022
    On 07/11/2022 14:58, David Brown wrote:
    On 07/11/2022 12:55, James Harris wrote:
    A piece of code Bart wrote in another thread happened to relate to
    something I've been working on but without me coming up with a clear
    answer so I thought I'd ask you guys what you think.

    The basic question is: If ^ is a postfix dereference operator then
    what should be the relative precedences of the following (where E is
    any subexpression)?

       ++E
       E++
       E^

    (The same goes for -- but to make description easier I'll mention only
    ++.)

    Taking a step back and considering general expression evaluation I
    have, so far, been defining the apparent order. And I'd like to
    continue with that. So it should be possible to combine multiple ++
    operators arbitrarily. For example,

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so something
    a programmer can rely on.

    I disagree entirely - unless you include giving an error message saying
    the programmer should be fired for writing gibberish as "well defined
    and something you can rely on".  I can appreciate not wanting such
    things to be run-time undefined behaviour, but there is no reason at all
    to insist that it is acceptable by the compiler.

    gcc accepts this C code (when E, V are both ints):


    ++E + E++;
    V = V++;

    It won't accept ++E++ because the first ++ expects an lvalue. Probably
    the same will happen when you try and implement it elsewhere. So no
    actual need to prohibit in the language - it just won't work.



    Setting that aside aside ... and going back to the query, what should
    be the relative precedences of the three operators? For example, how
    should the following be evaluated?

       ++E++^
       ++E^++


    Make it a syntax error.

    The equivalent in C syntax for the first is:

    ++*(P++);

    This compiles fine when P has type int* for example. It means this:

- Increment the pointer P
    - Increment the location that P now points to (using the * deref op)

    So no reason to prohibit anything; it is perfectly well-defined. The
    first example is equivalent to:

    ++((*P)++);

    This won't work for the same reason as above. This is hard to prohibit
via grammar rules, but it is not necessary as it fails on type-checking.

  • From Bart@21:1/5 to David Brown on Mon Nov 7 15:43:20 2022
    On 07/11/2022 15:18, David Brown wrote:
    On 07/11/2022 15:23, Bart wrote:

       V = V++

    This one doesn't have any problems, but is probably not useful: you're
    modifying V then replacing its value anyway, and with its original
    value. That new V+1 value is discarded.

    In C, it has /big/ problems - the side-effects on V are not sequenced,
    so the expression is undefined behaviour.  Other languages may differ - you'd have to read the specifications or standards for those languages.

    I'd suggest that in C it would be a compiler problem. For example if it
    did the assignment, and then decided to increment V.

    To me that would be bizarre: I'd expect to evaluate the RHS as a single
    term (V++), including any side-effects entailed, before writing the
    resulting value (the old value of V) into V.

    But in general you're right: I'm not keen on multiple things being
    changed inside one expression. I tolerate ++ and -- (and chained
    assignment) because they are so handy. But I don't allow augmented
    assignments inside an expression as C does.


    I think this is because in my language, for something to be a valid
    lvalue, you need to be able to apply & address-of to it. The result of
    E++ doesn't have an address. But (E++)^ works because & and ^ cancel
    out. Or something...


    It is a bad sign for a language when even the language author,
    implementer, and experienced user is not sure how it works.  As long as
    the language is only ever meant to be for a single person, you can get
    away with saying "I wouldn't write that, so it doesn't matter what it means".  But if the OP has hopes that more than one person will ever see
    his language, it should be specified well enough that these things are written down.

    The 'or something' refers to the mechanism within my compiler which
    determines what is a legal lvalue. I'd have to study 3700 lines of code
    to discover exactly how it worked.

    But it should be obvious (now that I've thought about it!) that a term
    of the form X^, which is all that `E++^` is, should be a legal lvalue as
    it can be used on either side of an assignment:

    X^ := X^

    (Although no doubt C will make that UB because that's what it likes to do.)

  • From David Brown@21:1/5 to Bart on Mon Nov 7 16:18:19 2022
    On 07/11/2022 15:23, Bart wrote:
    On 07/11/2022 11:55, James Harris wrote:
    A piece of code Bart wrote in another thread happened to relate to
    something I've been working on but without me coming up with a clear
    answer so I thought I'd ask you guys what you think.

    The basic question is: If ^ is a postfix dereference operator then
    what should be the relative precedences of the following (where E is
    any subexpression)?

       ++E
       E++
       E^

    (The same goes for -- but to make description easier I'll mention only
    ++.)

    For unary operators, the evaluation order is rather peculiar yet seems
    to be used in quite a few languages without anyone questioning it.

    People /do/ question it. Knowing the order for a given language, and
    using the operators, does not imply liking them or not questioning them.


    And note that "operator precedence" is about parsing the expression - it
    is /not/ the same as "order of evaluation". A language specification
    should be clear on this.

    So if
    `a b c d` are unary operators, then the following:

       a b E c d

    is evaluated like this:

          a (b ((E c) d))

    That is, first all the post-fix operators in left-to-right order, then
    all the prefix ones in right-left order. It sounds bizarre when put like that!

    Taking a step back and considering general expression evaluation I
    have, so far, been defining the apparent order. And I'd like to
    continue with that. So it should be possible to combine multiple ++
    operators arbitrarily. For example,

       ++E + E++


    This is well defined, as unary operators bind more tightly than binary
    ones. This is just (++E) + (++E).


    It is not remotely "well defined" in C, but it might be well defined in
    /your/ language. The /precedence/ of the operators and the parsing of
    the expression is well defined in C, but its /behaviour/ is undefined as
    the order of evaluation is unspecified so the side-effects are unsequenced.

    However the evaluation order for '+' is not usually well-defined, so you don't know which operand will be done first.

       ++E++

This may not work, or not work as expected. The binding using my above
    scheme means this is equivalent to ++(E++). But the second ++ requires
    an lvalue as input, and yields an rvalue, which would be an invalid
    input to the first ++.

       V = V++

    This one doesn't have any problems, but is probably not useful: you're modifying V then replacing its value anyway, and with its original
    value. That new V+1 value is discarded.

    In C, it has /big/ problems - the side-effects on V are not sequenced,
    so the expression is undefined behaviour. Other languages may differ -
    you'd have to read the specifications or standards for those languages.


       https://en.cppreference.com/w/cpp/language/operator_incdec

    I tried to get ++E++ to work using a suitable type for E, but in my
    language it cannot work, as the first ++ still needs an lvalue; just an rvalue which has a pointer type won't cut it.

    However ++E++^ can work, where ^ is deref, and E is a pointer.

    I think this is because in my language, for something to be a valid
    lvalue, you need to be able to apply & address-of to it. The result of
    E++ doesn't have an address. But (E++)^ works because & and ^ cancel
    out. Or something...


    It is a bad sign for a language when even the language author,
    implementer, and experienced user is not sure how it works. As long as
    the language is only ever meant to be for a single person, you can get
    away with saying "I wouldn't write that, so it doesn't matter what it
    means". But if the OP has hopes that more than one person will ever see
    his language, it should be specified well enough that these things are
    written down.

    Setting that aside aside ... and going back to the query, what should
    be the relative precedences of the three operators? For example, how
    should the following be evaluated?

       ++E++^
       ++E^++

    Or should some ways of combining ^ with either or both of the ++
    operators be prohibited because they make code too difficult to
    understand?!!

    You have the same issues in C, but that's OK because people are so
    familiar with it. Also * deref is a prefix operator so you never have
    two distinct postfix operators, unless you write E++ --.

    But yes, parentheses are recommended when mixing certain prefix/postfix
    ops. I think this one is clear enough however:

       -E^

Dereference E then negate the result. As is this: -E[i]; you wouldn't
    assume that meant (-E)[i].



    C was fixed and unchangeable long ago (at least for such fundamental
    things). A new language can be made better. If you think "parentheses
    are recommend here", change it to "parentheses are /required/ here". If
    you think "++E++" is confusing or questionable, make it a hard
    compile-time error.

  • From David Brown@21:1/5 to Bart on Mon Nov 7 17:43:15 2022
    On 07/11/2022 16:06, Bart wrote:
    On 07/11/2022 13:43, Dmitry A. Kazakov wrote:
    On 2022-11-07 13:52, James Harris wrote:
    On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
    On 2022-11-07 12:55, James Harris wrote:

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so
    something a programmer can rely on.

    One major contribution of PL/1 was clear understanding that "every
    garbage must mean something" was a bad language design principle.

    That's all very well but what specifically would you prohibit?

    Your very question was about some arbitrary sequence of operators you
    fail to give a meaning! Stop right here. (:-))

    + usually means numeric addition

    = here presumably means an assignment (and from right to left)

    ++ can also be assumed to mean in-place increment. Specifically:

     ++E is equivalent to:  (E := E + 1;  E)
     E++ is equivalent to:  (T := E;  E := E + 1;  T)

    (When E can be harmlessly evaluated more than once; otherwise an extra temporary reference would need to be used.)

    In C, if there are side-effects from evaluating E then you will have
    either undefined behaviour or at least implementation-dependent
    behaviour, depending on the exact expression.


    But I'm sure you know this already.


    What you have given are the interpretations for C and similar languages, operating on arithmetic operands. Other languages may have different
    meanings for the symbols. Even if the OP's language gives the same
    meaning to the operators for integers, it might mean something different
    for other types - including the possibility of operator overloads for
    user types.

    It might make sense for the language to define precedence order and
    other details for the operators independent of the semantics. Or it
    might make sense to have them depend on the types and the semantics -
    languages differ in how they work.

  • From David Brown@21:1/5 to Bart on Mon Nov 7 17:34:02 2022
    On 07/11/2022 16:23, Bart wrote:
    On 07/11/2022 14:58, David Brown wrote:
    On 07/11/2022 12:55, James Harris wrote:
    A piece of code Bart wrote in another thread happened to relate to
    something I've been working on but without me coming up with a clear
    answer so I thought I'd ask you guys what you think.

    The basic question is: If ^ is a postfix dereference operator then
    what should be the relative precedences of the following (where E is
    any subexpression)?

       ++E
       E++
       E^

    (The same goes for -- but to make description easier I'll mention
    only ++.)

    Taking a step back and considering general expression evaluation I
    have, so far, been defining the apparent order. And I'd like to
    continue with that. So it should be possible to combine multiple ++
    operators arbitrarily. For example,

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so something
    a programmer can rely on.

    I disagree entirely - unless you include giving an error message
    saying the programmer should be fired for writing gibberish as "well
    defined and something you can rely on".  I can appreciate not wanting
    such things to be run-time undefined behaviour, but there is no reason
    at all to insist that it is acceptable by the compiler.

    gcc accepts this C code (when E, V are both ints):


        ++E + E++;
        V = V++;


    That's like saying that you can hit a screw with a hammer. Use the tool properly, and you will see the complaints. gcc is a C compiler, not
    some kind of "official" guide to the language, and everyone knows that
    without flags it is far too accepting of code that has undefined
    behaviour or is otherwise clearly wrong even in cases that can be
    spotted easily. With even basic warning flags enabled, these are marked.

    (You've had this explained to you a few hundred times over the last
decade or so. I know you get some kind of perverse pleasure out of finding
    any way of making C and/or gcc look bad in your own eyes, but would you /please/ stop being such a petty child and stop writing things
    deliberately intended to confuse, mislead or annoy others?)

    For those that want to know the details, the most "official" C reference
    site shows similar expressions as examples of undefined behaviour:

    <https://en.cppreference.com/w/c/language/eval_order>

    And the C standards give some related examples in section 6.5.

    It won't accept ++E++ because the first ++ expects an lvalue. Probably
    the same will happen when you try and implement it elsewhere. So no
    actual need to prohibit in the language - it just won't work.


    That makes /no/ sense. If by "it just won't work" you mean the compiler
    won't accept it, then it is prohibited by the language - or your
    compiler fails to implement the language. If you mean the compiler
    accepts it but it "just won't work" at run-time, then that is exactly
    what we want to avoid.



    Setting that aside aside ... and going back to the query, what should
    be the relative precedences of the three operators? For example, how
    should the following be evaluated?

       ++E++^
       ++E^++


    Make it a syntax error.

    The equivalent in C syntax for the first is:

        ++*(P++);

    This compiles fine when P has type int* for example. It means this:

- Increment the pointer P
      - Increment the location that P now points to (using the * deref op)

    So no reason to prohibit anything; it is perfectly well-defined.

    There is good reason to prohibit it - you got it wrong, so despite being well-defined by the language, it is not clear code.

    The actual meaning of "++*(P++);" is :

    1. Remember the original value of P - call it P_orig
    2. Increment P (that is, add sizeof(*P) to it).
    3. Increment the int at the location pointed to by P_orig.
    4. The value of the expression is the new updated value pointed to by P_orig.

    No specific ordering of the two increments is implied here - they can be
    done in either order. The compiler can assume that P and *P do not
    overlap (something that could only happen using a union) - if they do,
    the behaviour is undefined.

    (Note that "++*(P++)" is the same as "++*P++" in C, but the extra
    parentheses make it slightly less unclear.)

    The
    first example is equivalent to:

        ++((*P)++);

    This won't work for the same reason as above. This is hard to prohibit
via grammar rules, but it is not necessary as it fails on type-checking.



    In C, prohibitions against such code come from "constraints", which are
    not part of the BNF grammar rules, but come before any kind of type
checking. Whether an expression is an "rvalue", a "modifiable lvalue",
    "a non-modifiable lvalue", or other classification, is not part of the
    type system.

    Other languages may handle this sort of thing differently - I can only
    say what C does here. I see no fundamental reason why it cannot be
    considered part of the grammar rules, but it might need a more advanced
    grammar than C has.

  • From David Brown@21:1/5 to Bart on Mon Nov 7 17:56:37 2022
    On 07/11/2022 16:43, Bart wrote:
    On 07/11/2022 15:18, David Brown wrote:
    On 07/11/2022 15:23, Bart wrote:

       V = V++

    This one doesn't have any problems, but is probably not useful:
    you're modifying V then replacing its value anyway, and with its
    original value. That new V+1 value is discarded.

    In C, it has /big/ problems - the side-effects on V are not sequenced,
    so the expression is undefined behaviour.  Other languages may differ
    - you'd have to read the specifications or standards for those languages.

    I'd suggest that in C it would be a compiler problem. For example if it
    did the assignment, and then decided to increment V.

    It is not a compiler problem - it is undefined behaviour in the
    language, and if someone writes that code and tries to compile it as C,
    they can have no reasonable expectation of any particular behaviour.
    The compiler /may/ increment V, it may not, it may reject the code with
    a compiler error (if it can prove beyond doubt that the code would
    actually be run - you are allowed to put code with undefined runtime
    behaviour in code that is never run). Other behaviour would, I think,
    be so surprising to a programmer that it would be considered a poor implementation, even though the compiler might still be conforming.


    To me that would be bizarre: I'd expect to evaluate the RHS as a single
    term (V++), including any side-effects entailed, before writing the
    resulting value (the old value of V) into V.

    The C language does not have a sequence point on assignment. So if you
    write "x = y++;", there is no sequencing between writing to "x" or
    writing the incremented value to "y". (The value of "y + 1", and the
    address of "x", must be evaluated before the assignment, obviously - but
    these are also not sequenced with regard to each other.) When you write
    "x = y++;", that is convenient - the compiler can generate the code in
    whatever order is most efficient. But if "x" and "y" refer to the same
    thing, you have two unsequenced side-effects to the same objects - that
    is clearly undefined behaviour.

    A language could certainly have a sequence point on assignment. But C
    does not do so.


    But in general you're right: I'm not keen on multiple things being
    changed inside one expression. I tolerate ++ and -- (and chained
    assignment) because they are so handy. But I don't allow augmented assignments inside an expression as C does.


    C allows multiple things to be changed in one expression - but it does
    not allow the /same/ thing to be changed multiple times without sequencing.

    (I too generally prefer to change only one thing at a time in an
    expression, regardless of what the language may allow.)


    I think this is because in my language, for something to be a valid
    lvalue, you need to be able to apply & address-of to it. The result
    of E++ doesn't have an address. But (E++)^ works because & and ^
    cancel out. Or something...


    It is a bad sign for a language when even the language author,
    implementer, and experienced user is not sure how it works.  As long
    as the language is only ever meant to be for a single person, you can
    get away with saying "I wouldn't write that, so it doesn't matter what
    it means".  But if the OP has hopes that more than one person will
    ever see his language, it should be specified well enough that these
    things are written down.

    The 'or something' refers to the mechanism within my compiler which determines what is a legal lvalue. I'd have to study 3700 lines of code
    to discover exactly how it worked.

    But it should be obvious (now that I've thought about it!) that a term
    of the form X^, which is all that `E++^` is, should be a legal lvalue as
    it can be used on either side of an assignment:

        X^ := X^

    (Although no doubt C will make that UB because that's what it likes to do.)



    "*p = *p" is fine and fully defined in C (unless you have a pointer to volatile, when it will be implementation dependent).

    "*p++ = *p++" is a different matter entirely.

  • From Dmitry A. Kazakov@21:1/5 to Bart on Mon Nov 7 17:16:56 2022
    On 2022-11-07 16:06, Bart wrote:
    On 07/11/2022 13:43, Dmitry A. Kazakov wrote:
    On 2022-11-07 13:52, James Harris wrote:
    On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
    On 2022-11-07 12:55, James Harris wrote:

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so
    something a programmer can rely on.

    One major contribution of PL/1 was clear understanding that "every
    garbage must mean something" was a bad language design principle.

    That's all very well but what specifically would you prohibit?

    Your very question was about some arbitrary sequence of operators you
    fail to give a meaning! Stop right here. (:-))

    + usually means numeric addition

    = here presumably means an assignment (and from right to left)

    = means equality.

    ++ can also be assumed to mean in-place increment. Specifically:

    ++ means cheap keyboard with broken keys or coffee spilled over it... (:-))

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From James Harris@21:1/5 to Dmitry A. Kazakov on Mon Nov 7 17:26:57 2022
    On 07/11/2022 13:43, Dmitry A. Kazakov wrote:
    On 2022-11-07 13:52, James Harris wrote:
    On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
    On 2022-11-07 12:55, James Harris wrote:

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so
    something a programmer can rely on.

    One major contribution of PL/1 was clear understanding that "every
    garbage must mean something" was a bad language design principle.

    That's all very well but what specifically would you prohibit?

    Your very question was about some arbitrary sequence of operators you
    fail to give a meaning! Stop right here. (:-))

    It's easy to dislike a certain sequence of operators. It's harder to
    define rules for their prohibition.

    A programmer has freedom to put in any sequence of operators which
    comply with the syntax and semantics of the language. A language
    designer, by contrast, if he wants to add a rule to prohibit certain permutations, has (a) to define what permutations are ruled out and (b)
    think of the consequences of such a rule on other expressions which may
    look much more legitimate.


    --
    James Harris

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 7 18:53:53 2022
    On 2022-11-07 18:26, James Harris wrote:
    On 07/11/2022 13:43, Dmitry A. Kazakov wrote:
    On 2022-11-07 13:52, James Harris wrote:
    On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
    On 2022-11-07 12:55, James Harris wrote:

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so
    something a programmer can rely on.

    One major contribution of PL/1 was clear understanding that "every
    garbage must mean something" was a bad language design principle.

    That's all very well but what specifically would you prohibit?

    Your very question was about some arbitrary sequence of operators you
    fail to give a meaning! Stop right here. (:-))

    It's easy to dislike a certain sequence of operators. It's harder to
    define rules for their prohibition.

    1. Reduce the number of precedence levels to logical, additive,
    multiplicative, highest order.

    2. Require parentheses for mixed operations at the same level (except
    for * and /)

    3. No side effects of operators.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From James Harris@21:1/5 to David Brown on Mon Nov 7 17:32:36 2022
    On 07/11/2022 16:43, David Brown wrote:
    On 07/11/2022 16:06, Bart wrote:

    ..

    But I'm sure you know this already.


    What you have given are the interpretations for C and similar languages, operating on arithmetic operands.  Other languages may have different meanings for the symbols.  Even if the OP's language gives the same
    meaning to the operators for integers, it might mean something different
    for other types - including the possibility of operator overloads for
    user types.

    If precedences were to vary with operand types then expressions would be
    very hard for programmers to read, so IMO it's important for program
    readability that precedences go with the operators and that they are independent of the types of the operands. If a programmer didn't know
    what order

    a + b * c

    would be evaluated in until he looked up the types then even simple
    programs would be very confusing.


    --
    James Harris

  • From James Harris@21:1/5 to Bart on Mon Nov 7 18:16:58 2022
    On 07/11/2022 14:23, Bart wrote:
    On 07/11/2022 11:55, James Harris wrote:

    ..

       ++E++

    This may not work, or not work as expected. The binding using my above
    scheme means this is equivalent to ++(E++). But the second ++ requires
    an lvalue as input, and yields an rvalue, which would be an invalid
    input to the first ++.

    Yes. If

    ++E++

    is going to be permitted then for programmer sanity wouldn't it be true
    to say that both ++ operators need to refer to the same lvalue? If so then

    ++p

    should probably have higher precedence than

    p++

    or perhaps their precedences could be the same but with them applied in left-to-right order.

    It may be worth looking at other operators which take in AND produce
    lvalues, most familiarly array indexing and field referencing, and hence
    they can be incremented. Isn't it true that for both ++ operators of

    ++points.x[1]
    points.x[1]++

    that a programmer would normally want points.x[1] incremented, i.e.
    field referencing and array indexing would take precedence over either
    ++ operator?

    But now what about dereference? Should it also take precedence over the
    ++ operators or should it come after one or both? For instance, what
    should the following mean?

    ++p^

    Should it be

    (++p)^

    or

    ++(p^)

    ?

    Which interpretation would programmers prefer? Frankly, I don't know
    which would be best. :(


    --
    James Harris

  • From James Harris@21:1/5 to David Brown on Mon Nov 7 19:24:07 2022
    On 07/11/2022 14:58, David Brown wrote:
    On 07/11/2022 12:55, James Harris wrote:
    A piece of code Bart wrote in another thread happened to relate to
    something I've been working on but without me coming up with a clear
    answer so I thought I'd ask you guys what you think.

    The basic question is: If ^ is a postfix dereference operator then
    what should be the relative precedences of the following (where E is
    any subexpression)?

       ++E
       E++
       E^

    (The same goes for -- but to make description easier I'll mention only
    ++.)

    Taking a step back and considering general expression evaluation I
    have, so far, been defining the apparent order. And I'd like to
    continue with that. So it should be possible to combine multiple ++
    operators arbitrarily. For example,

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so something
    a programmer can rely on.

    I disagree entirely

    Good. :)

    - unless you include giving an error message saying
    the programmer should be fired for writing gibberish as "well defined
    and something you can rely on".  I can appreciate not wanting such
    things to be run-time undefined behaviour, but there is no reason at all
    to insist that it is acceptable by the compiler.

    As I said to Dmitry, if one wants to prohibit the above then one has to
    define what exactly is being prohibited and to be careful not thereby to prohibit something else that may be more legitimate. Further, such a prohibition is an additional rule the programmer has to learn.

    All in all, ISTM better to define such expressions. The programmer is
    not forced to use them but at least if they are present in code and well defined then their meaning will be plain.

    Take the first one,

    ++E + E++

    It could be defined fairly easily. If operands to + are defined to
    appear as though they were evaluated left then right and the ++
    operators are set to be of higher precedence and defined to take effect
    as soon as they are evaluated, then

    ++E + E++

    would evaluate as though the operations were

    ++E; E++; +

    If E were a variable of value 5 then the result would be

    6; 6++; + ===> 12 with E ending as 7

    E&OE the expression is not actually all that hard to parse if the rules
    are simple.



    Setting that aside ... and going back to the query, what should
    be the relative precedences of the three operators? For example, how
    should the following be evaluated?

       ++E++^
       ++E^++

    Or should some ways of combining ^ with either or both of the ++
    operators be prohibited because they make code too difficult to
    understand?!!


    Make it a syntax error.

    Why? What's so wrong with it? AISI if all three operators have the
    requisite number of operands then how can it be an error in syntax?

    ..

    On the other
    hand, you /do/ have an obligation to try to catch mistakes, typos, and accidental errors in code.

    Is it at least partially true that C defines a bunch of expressions as
    UB because the rules were not clearly specified initially and different compilers chose different interpretations?

    With a new language I cannot see why you might be against clear
    definition. I am aware that it might make optimisation harder to achieve
    but that would only apply in some cases and is still, IMO, better than
    simply saying "that's not defined".

    IOW I welcome your disagreement but don't understand it!


    --
    James Harris

  • From David Brown@21:1/5 to James Harris on Mon Nov 7 20:45:19 2022
    On 07/11/2022 18:32, James Harris wrote:
    On 07/11/2022 16:43, David Brown wrote:
    On 07/11/2022 16:06, Bart wrote:

    ..

    But I'm sure you know this already.


    What you have given are the interpretations for C and similar
    languages, operating on arithmetic operands.  Other languages may have
    different meanings for the symbols.  Even if the OP's language gives
    the same meaning to the operators for integers, it might mean
    something different for other types - including the possibility of
    operator overloads for user types.

    If precedences were to vary with operand types then expressions would be
    very hard for programmers to read, so IMO it's important for program readability that precedences go with the operators and that they are independent of the types of the operands. If a programmer didn't know
    what order

      a + b * c

    would be evaluated in until he looked up the types then even simple
    programs would be very confusing.


    Agreed - but it doesn't make it impossible to use.

    And precedence is not the only feature of operators. For example, in C
    and C++, the && and || operators have the additional "short-circuit"
    property where the second operand is evaluated if and only if necessary, depending on the result of the first operand. But if you overload these operators for your own types in C++, this is not the case - they act
    like normal two-input functions and evaluate (without sequencing) both operands.

    Some languages also allow you to make your own operators, perhaps also
    using non-ASCII symbols. That will make some aspects of the language
    more complex, but would also allow neater and clearer user code in some
    cases.

    (I'm not arguing for or against such things in /your/ language, merely
    pointing out the possibilities.)

  • From David Brown@21:1/5 to James Harris on Mon Nov 7 21:59:40 2022
    On 07/11/2022 20:24, James Harris wrote:
    On 07/11/2022 14:58, David Brown wrote:
    On 07/11/2022 12:55, James Harris wrote:
    A piece of code Bart wrote in another thread happened to relate to
    something I've been working on but without me coming up with a clear
    answer so I thought I'd ask you guys what you think.

    The basic question is: If ^ is a postfix dereference operator then
    what should be the relative precedences of the following (where E is
    any subexpression)?

       ++E
       E++
       E^

    (The same goes for -- but to make description easier I'll mention
    only ++.)

    Taking a step back and considering general expression evaluation I
    have, so far, been defining the apparent order. And I'd like to
    continue with that. So it should be possible to combine multiple ++
    operators arbitrarily. For example,

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so something
    a programmer can rely on.

    I disagree entirely

    Good. :)

    - unless you include giving an error message saying the programmer
    should be fired for writing gibberish as "well defined and something
    you can rely on".  I can appreciate not wanting such things to be
    run-time undefined behaviour, but there is no reason at all to insist
    that it is acceptable by the compiler.

    As I said to Dmitry, if one wants to prohibit the above then one has to define what exactly is being prohibited and to be careful not thereby to prohibit something else that may be more legitimate. Further, such a prohibition is an additional rule the programmer has to learn.

    No one said this was easy! Though Dmitry had some suggestions of rules
    to try.

    These prohibitions aren't really additional rules for the programmer to
    learn - it is primarily about disallowing things that a good programmer
    is not going to write in the first place. No one should actually care
    if "++E++" is allowed or not, because they should never write it.
    Prohibiting it means you don't have to specify the order these operators
    are applied, or whether the expression must be evaluated for
    side-effects twice, or any of the rest of it. The only people that will
    have to learn something extra are the sort of programmers who think it
    is smart to write line noise.


    All in all, ISTM better to define such expressions. The programmer is
    not forced to use them but at least if they are present in code and well defined then their meaning will be plain.


    No, the meaning will /not/ be plain. That's the point. Ideally you
    should only allow constructs that do exactly what they appear to do,
    without the reader having to study the manuals to understand some indecipherable gibberish that is technically legal code but completely
    alien to them because no sane programmer would write it.

    Take the first one,

      ++E + E++

    It could be defined fairly easily. If operands to + are defined to
    appear as though they were evaluated left then right and the ++
    operators are set to be of higher precedence and defined to take effect
    as soon as they are evaluated, then

      ++E + E++

    would evaluate as though the operations were

      ++E; E++; +


    Then define it as "syntax error" and insist the programmer writes it
    sensibly.

    I cannot conceive of a reason to have a pre-increment operator in a
    modern language, nor would I want post-increment to return a value (nor
    any other kind of assignment). Ban side-effects in expressions -
    require a statement. "x = y + 1;" is a statement, so it can affect "x".
    "y++;" is a statement - a convenient abbreviation for "y = y + 1;".
    "++x" no longer exists, and "x + x++;" makes no sense because it mixes
    an expression and a statement.

    What is the cost? The programmer might have to split things into a few
    lines - but we have much bigger screens and vastly bigger disks than the
    days when C was born. The programmer might need a few extra temporary variables - these are free with modern compiler techniques.

    Ask yourself why "++x;" and the like exist in languages like C. The
    reason is that early compilers were weak - they were close to dumb
    translators into assembly, and if you wanted efficient results using the features of the target processor, you needed to write your code in a way
    that mimicked the actual processor instructions. "INC A" was faster
    than "ADD A, 1", so you write "x++" rather than "x = x + 1". This is no
    longer the case in the modern world.


    If E were a variable of value 5 then the result would be

      6; 6++; +  ===> 12 with E ending as 7

    E&OE the expression is not actually all that hard to parse if the rules
    are simple.



    Setting that aside ... and going back to the query, what should
    be the relative precedences of the three operators? For example, how
    should the following be evaluated?

       ++E++^
       ++E^++

    Or should some ways of combining ^ with either or both of the ++
    operators be prohibited because they make code too difficult to
    understand?!!


    Make it a syntax error.

    Why? What's so wrong with it? AISI if all three operators have the
    requisite number of operands then how can it be an error in syntax?


    /You/ are designing the syntax. You don't have to accept such
    meaningless drivel in the code - /you/ can choose to make it a syntax
    error. You can't pretend that it is a useful or intuitive code to human
    eyes, so why make it legal for the compiler?


    On the other hand, you /do/ have an obligation to try to catch
    mistakes, typos, and accidental errors in code.

    Is it at least partially true that C defines a bunch of expressions as
    UB because the rules were not clearly specified initially and different compilers chose different interpretations?


    Not really, no. Such cases are more often "implementation dependent".
    Things are more often "undefined behaviour" because there is no sensible
    way to define the behaviour, or no efficient way to implement defined behaviour, or where making it UB gives more benefits (such as
    optimisation opportunities or debugging/warnings/error checking/run-time checks) than giving it a definition that would mostly be wrong. There
    are a few cases of UB in C that would be better as compile-time errors or implementation-dependent behaviour.

    With a new language I cannot see why you might be against clear
    definition.

    I am /for/ a clear definition - I recommend it be defined as a
    compile-time error. That /is/ a clear definition, it is not undefined behaviour.

    (There are also situations where I think "undefined behaviour" is better
    than defined behaviour, but that would be a different thread.)

    I am aware that it might make optimisation harder to achieve
    but that would only apply in some cases and is still, IMO, better than
    simply saying "that's not defined".

    IOW I welcome your disagreement but don't understand it!


    I think it is great that you are happy to discuss this and I try my best
    to explain it.

  • From Bart@21:1/5 to David Brown on Tue Nov 8 00:15:37 2022
    On 07/11/2022 16:34, David Brown wrote:
    On 07/11/2022 16:23, Bart wrote:

    gcc accepts this C code (when E, V are both ints):


         ++E + E++;
         V = V++;


    That's like saying that you can hit a screw with a hammer.  Use the tool properly, and you will see the complaints.  gcc is a C compiler, not
    some kind of "official" guide to the language, and everyone knows that without flags it is far too accepting of code that has undefined
    behaviour or is otherwise clearly wrong even in cases that can be
    spotted easily.  With even basic warning flags enabled, these are marked.

    (You've had this explained to you a few hundred times over the last
    decade or so.  I know you get some kind of perverse pleasure out of finding
    any way of making C and/or gcc look bad in your own eyes,

    Well, isn't it? You recommended that a new language doesn't allow it,
    but C does anyway, or at least its implementations do so.

    (Unless you go out of /your/ way to ensure it doesn't pass. But you'd be
    better off avoiding such code. There are a million ways of writing
    nonsense code that cannot be prohibited by a compiler.)

    It won't accept ++E++ because the first ++ expects an lvalue. Probably
    the same will happen when you try and implement it elsewhere. So no
    actual need to prohibit in the language - it just won't work.


    That makes /no/ sense.  If by "it just won't work"

    I mean that you will not get any C compilers to get it to work: all
    report hard errors, and will not generate any code.

    All the errors mention that some operand is not an lvalue. You don't
    really need a special rule in grammar to prohibit certain combinations
    of expressions.

    For the same reasons, it won't work in other languages unless they have
    very different interpretations of what ++ means.

    Now compare this kind of unequivocal error report with the wishy-washy handling of C compilers of those other two lines:

    3 compilers pass them with no comment
    1 compiler reports only a warning (and an invisible one to boot: Clang
    shows certain messages in a light grey font, exactly the colour of my
    console background!)

    So no reason to prohibit anything; it is perfectly well-defined.

    There is good reason to prohibit it - you got it wrong, so despite being well-defined by the language, it is not clear code.

    The actual meaning of "++*(P++);" is :

        1. Remember the original value of P - call it P_orig
        2. Increment P (that is, add sizeof(*P) to it).
        3. Increment the int at the location pointed to by P_orig.
        4. The value of the expression is the new updated value pointed to by P_orig.

    So, the meaning is that. The point is, it's well-defined and makes
    sense. It may be confusing to look at, but look at ANY C source and you
    will see complex expressions that are much harder to grok, like:

    OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF;
    }

    (Is it even an expression? I /think/ it's a function definition.)

    So why single out increment operators? Because I got ++P confused with
    P++ for a second? Then let's ban those two varieties of increment op too!

    Note that ++*(P++) is equivalent to:

    *(P++) += 1;

    Do we ban this or not? (My language doesn't allow this, but again it's a
    type issue because `+:=` doesn't return a value.)



    No specific ordering of the two increments is implied here - they can be
    done in either order.

    As I said in another post, that would be perverse.
    In C, prohibitions against such code come from "constraints", which are
    not part of the BNF grammar rules, but come before any kind of type checking.  Whether an expression is an "rvalue", a "modifiable lvalue",
    "a non-modifiable lvalue", or other classification, is not part of the
    type system.

    That's up to the implementation. In my compilers including for C,
    validating lvalues is part of the type-checking.

  • From James Harris@21:1/5 to Dmitry A. Kazakov on Tue Nov 8 08:04:54 2022
    On 07/11/2022 17:53, Dmitry A. Kazakov wrote:
    On 2022-11-07 18:26, James Harris wrote:
    On 07/11/2022 13:43, Dmitry A. Kazakov wrote:
    On 2022-11-07 13:52, James Harris wrote:
    On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
    On 2022-11-07 12:55, James Harris wrote:

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual meaning is less important than it being well defined and so
    something a programmer can rely on.

    One major contribution of PL/1 was clear understanding that "every
    garbage must mean something" was a bad language design principle.

    That's all very well but what specifically would you prohibit?

    Your very question was about some arbitrary sequence of operators you
    fail to give a meaning! Stop right here. (:-))

    It's easy to dislike a certain sequence of operators. It's harder to
    define rules for their prohibition.

    1. Reduce the number of precedence levels to logical, additive,
    multiplicative, highest order.

    2. Require parentheses for mixed operations at the same level (except
    for * and /)

    3. No side effects of operators.

    Good suggestions, especially ruling out operators with side effects. You wouldn't believe how much trouble they've been giving me. (It would be
    alright if one was willing to make a language out of all kinds of odd
    features but not if one wants a language to be cohesive.)

    I like the simplicity of the language which would result from your
    suggestions but can't help but think they would make programming in it
    less comfortable, like the simplicity of a hair shirt. ;)


    --
    James Harris

  • From David Brown@21:1/5 to Bart on Tue Nov 8 09:33:49 2022
    On 08/11/2022 01:15, Bart wrote:
    On 07/11/2022 16:34, David Brown wrote:
    On 07/11/2022 16:23, Bart wrote:

    gcc accepts this C code (when E, V are both ints):


         ++E + E++;
         V = V++;


    That's like saying that you can hit a screw with a hammer.  Use the
    tool properly, and you will see the complaints.  gcc is a C compiler,
    not some kind of "official" guide to the language, and everyone knows
    that without flags it is far too accepting of code that has undefined
    behaviour or is otherwise clearly wrong even in cases that can be
    spotted easily.  With even basic warning flags enabled, these are marked.
    (You've had this explained to you a few hundred times over the last
    decade or so.  I know you get some kind of perverse pleasure out of
    finding any way of making C and/or gcc look bad in your own eyes,

    Well, isn't it? You recommended that a new language doesn't allow it,
    but C does anyway, or at least its implementations do so.

    C is a language from 5 decades ago, and has its flaws. Learn from them
    and avoid repeating them in new languages. gcc is a tool for compiling programs written in C that is essential for an enormous mass of existing
    code. A major feature is that it must remain compatible with existing
    usage and existing code, when used in the same way as before - this
    greatly restricts the features it can provide by default. So you need
    to use compiler flags to change default behaviour.

    How can this be difficult for you to understand? I don't believe it is.
    You are a smart guy - yet you insist on playing the fool, again and
    again. For years, you have posted to comp.lang.c with your hatred and misunderstandings about C and its tools, sometimes deliberately trying
    to confuse and mislead others who are newer to the language. Please
    don't do it here too.


    (Unless you go out of /your/ way to ensure it doesn't pass. But you'd be better off avoiding such code. There are a million ways of writing
    nonsense code that cannot be prohibited by a compiler.)


    Yes, because "gcc -Wall" is /so/ hard to write. I mean, it takes hours
    extra work, far out of your way. Write yourself a batch file with gcc
    flags - you could have done it 20 years ago and saved yourself and
    everyone else enormous effort.

    A major point of a good programming language - aided by good tools - is
    to reduce the amount of bad code that is accepted. Of course people
    could write perfect code without the help from tools, but people are
    usually imperfect and write bad code sometimes, knowingly or
    unknowingly. As you say, a compiler can't prohibit all nonsense code,
    but languages and tools can be designed to do their best.

    It won't accept ++E++ because the first ++ expects an lvalue.
    Probably the same will happen when you try and implement it
    elsewhere. So no actual need to prohibit in the language - it just
    won't work.


    That makes /no/ sense.  If by "it just won't work"

    I mean that you will not get any C compiler to accept it: all
    report hard errors, and will not generate any code.


    So it is prohibited by the language.

    All the errors mention that some operand is not an lvalue. You don't
    really need a special rule in grammar to prohibit certain combinations
    of expressions.

    No, indeed you don't - because it is prohibited by the language.


    For the same reasons, it won't work in other languages unless they have
    very different interpretations of what ++ means.

    Now compare this kind of unequivocal error report with the wishy-washy handling by C compilers of those other two lines:


    No, let's not. We are not talking about C - we are talking about how
    the OP might handle such code in /his/ language. And if you want to
    talk about C, grow up and use proper tools in a proper manner.


    So no reason to prohibit anything; it is perfectly well-defined.

    There is good reason to prohibit it - you got it wrong, so despite
    being well-defined by the language, it is not clear code.

    The actual meaning of "++*(P++);" is :

         1. Remember the original value of P - call it P_orig
         2. Increment P (that is, add sizeof(*P) to it).
         3. Increment the int at the location pointed to by P_orig.
         4. The value of the expression is the new updated value pointed
    to by P_orig.
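    Those four steps can be checked directly; this is a minimal sketch with made-up values, not code from the thread:

    ```c
    #include <assert.h>

    int main(void) {
        int a[2] = {10, 20};
        int *P = a;               /* P_orig will be &a[0] */

        int v = ++*(P++);         /* the expression under discussion */

        assert(P == &a[1]);       /* step 2: P was incremented         */
        assert(a[0] == 11);       /* step 3: *P_orig was incremented   */
        assert(v == 11);          /* step 4: value is the new *P_orig  */
        assert(a[1] == 20);       /* the element P now points to is untouched */
        return 0;
    }
    ```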

    So, the meaning is that. The point is, it's well-defined and makes
    sense.

    It makes sense to the language - it does not make sense to humans (the
    fact that /you/ got it wrong proves this, if there were any doubt).
    Therefore it is not good programming. Therefore, if it is practical for
    a language and/or tool to disallow it without too much other harm to the language, it should disallow it.

    It may be confusing to look at, but look at ANY C source and you
    will see complex expressions that are much harder to grok, like:

    OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF;
             }

    (Is it even an expression? I /think/ it's function definition.)

    Are you arguing that because some people write C code that is even
    harder to understand, the OP should allow these nonsense expressions in
    his language? That "logic" is like saying that because there are bank
    robbers, people should be allowed to drunk-drive.


    So why single out increment operators? Because I got ++P confused with
    P++ for a second? Then let's ban those two varieties of increment op too!


    No one is singling out these operators - it's just the example the OP gave.

    And yes, ban ++P - it is a pointless operator now. (See my other post discussing that.)

    Note that ++*(P++) is equivalent to:

      *(P += 1) += 1;

    No, it is not. Again, your mistakes show why it is a really bad idea to
    allow these kinds of expression.
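    The difference is easy to show concretely: ++*(p++) increments the element p pointed at *before* the step, while *(p += 1) += 1 increments the element it points at *after*. A minimal sketch with made-up values:

    ```c
    #include <assert.h>

    int main(void) {
        int a[2] = {10, 20};
        int *p = a, *q = a;

        ++*(p++);        /* increments a[0] (where p pointed BEFORE the step) */
        *(q += 1) += 1;  /* steps q first, then increments a[1] (AFTER)       */

        assert(a[0] == 11);               /* touched by ++*(p++)       */
        assert(a[1] == 21);               /* touched by *(q += 1) += 1 */
        assert(p == &a[1] && q == &a[1]); /* both pointers end up the same */
        return 0;
    }
    ```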


    Do we ban this or not? (My language doesn't allow this, but again it's a
    type issue because `+:=` doesn't return a value.)


    Good. Assignment should be a statement, not an expression, and should
    not return a value.



    No specific ordering of the two increments is implied here - they can
    be done in either order.

    As I said in another post, that would be perverse.
    In C, prohibitions against such code come from "constraints", which
    are not part of the BNF grammar rules, but come before any kind of
    type checking.  Whether an expression is an "rvalue", a "modifiable
    lvalue", a "non-modifiable lvalue", or other classification, is not
    part of the type system.

    That's up to the implementation. In my compilers including for C,
    validating lvalues is part of the type-checking.


    You can mix up phases of translation as much as you want, and you can
    happily check lvalue/rvalue constraints in the same part of the code as
    you do type checking. But that does not make lvalue/rvalue
    classification anything to do with types in C.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Tue Nov 8 09:23:26 2022
    On 2022-11-08 09:04, James Harris wrote:

    I like the simplicity of the language which would result from your suggestions but can't help but think they would make programming in it
    less comfortable, like the simplicity of a hair shirt. ;)

    If you think programmers are dying to write stuff like *p++=*q++; you
    are wrong. Actually that train left the station. Today C++ fun is
    templates. It is monstrous instantiations over instantiations barely
    resembling program code. Modern times is a glorious combination of
    Python performance with K&R C readability! (:-))

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From James Harris@21:1/5 to James Harris on Tue Nov 8 11:00:58 2022
    On 07/11/2022 18:16, James Harris wrote:

    ..

    But now what about dereference? Should it also take precedence over the
    ++ operators or should it come after one or both? For instance, what
    should the following mean?

      ++p^

    Should it be

      (++p)^

    or

      ++(p^)

    ?

    Lightbulb moment! It occurred to me last night that although dereference
    "is about" lvalues it doesn't actually take in an lvalue; it takes an
    rvalue (i.e. if supplied an lvalue it will be 'converted' to an rvalue
    before being input to the dereference operation). I had it wrongly
    classified in my operators spreadsheet. Yet that feature of dereference
    may help suggest where its precedence should sit relative to the ++
    operators, i.e. it should come after both of them.

    If so, that makes the order

    prefix ++ ;lvalue -> lvalue
    postfix ++ ;lvalue -> rvalue
    ^ (dereference) ;rvalue ->

    That may be the solution: to put those three operators in that order
    relative to each other. I'll have to see how it would work out in
    practice but it is certainly a decision with logical underpinnings.

    Feel free to tell me if that order of precedences is bad!

    I should say I omitted what dereference produces as the lvalue it
    produces is lexically unrelated to any variable in the expression.


    --
    James Harris

  • From Bart@21:1/5 to David Brown on Tue Nov 8 12:43:02 2022
    On 08/11/2022 08:33, David Brown wrote:
    On 08/11/2022 01:15, Bart wrote:

    (Unless you go out of /your/ way to ensure it doesn't pass. But you'd
    be better off avoiding such code. There are a million ways of writing
    nonsense code that cannot be prohibited by a compiler.)


    Yes, because "gcc -Wall" is /so/ hard to write.

    And it's SO hard for a compiler to just use that as a default! So it
    stays safe for EVERYONE no matter how they invoke the compiler.

    Take a function like this which I consider much more dangerous than
    anything we've been discussing:

    void fred() {}

    My bcc compiler gives a hard error: "() params are not allowed". But
    this works:

    gcc c.c -c

    OK, I'll have to write -Wall as you say:

    gcc -Wall c.c -c

    But, it still passes!

    (So much existing code wrongly uses () to mean no parameters - thanks no
    doubt to gcc's lax approach over decades - that I have to give bcc a
    special option to enable it when it comes up.)

      I mean, it takes hours
    extra work, far out of your way.  Write yourself a batch file with gcc
    flags - you could have done it 20 years ago and saved yourself and
    everyone else enormous effort.

    Why do you expect people to have to themselves implement a chunk of the compiler they're using?

    And have to do so for every compiler - at one time I was using 7 or 8.
    ALL of them should be doing their jobs properly without being told.



    A major point of a good programming language - aided by good tools - is
    to reduce the amount of bad code that is accepted.

    Yeah. In my language, A[i] only works when A is an array; P^ (pointer
    deref) only works when P is a pointer.

    Sounds obvious when put like that, but in C anything goes; allowing A^
    and P[i] enables a huge amount of dangerous nonsense.


    I mean that you will not get any C compiler to accept it: all
    report hard errors, and will not generate any code.


    So it is prohibited by the language.

    So is ++E + E++ prohibited by C or not? I'm none the wiser, but it
    sounds like it depends!


    It may be confusing to look at, but look at ANY C source and you will
    see complex expressions that are much harder to grok, like:

    OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF;
              }

    (Is it even an expression? I /think/ it's function definition.)

    Are you arguing that because some people write C code that is even
    harder to understand, the OP should allow these nonsense expressions in
    his language?  That "logic" is like saying that because there are bank robbers, people should be allowed to drunk-drive.

    YOU are arguing that you shouldn't be allowed to compose certain
    operators because the result might be confusing. But why single out
    these particular ones?

    That is, combinations of increment and deref. More importantly, how
    exactly do you expect the language to do so? If the result is sound, syntax-wise and type-wise, what criteria do you expect it to apply?

    To be clear, the expression we are talking about isn't a nonsense one at
    all and is perfectly well-behaved:

    a := (10, 20, 30)

    p:=^a[1] # p points to the 10

    ++(p++^) # step p to the 20 while incrementing the 10

    println a # displays (11, 20, 30)
    println p^ # displays 20

    (This is dynamic scripting code.)

    As for ++e + e++, while I would never write such a thing, I'm not going
    to lose sleep over it. If that was banned, there are 99 other ways to
    write code where behaviour depends on evaluation order:

    #include <stdio.h>

    int f1(void) {return puts("One");}
    int f2(void) {return puts("Two");}
    int f3(void) {return puts("Three");}

    void f(int a, int b, int c){}


    int main(void) {
        f(f1(), f2(), f3());
    }


    This displays:

    Three
    Two
    One

    with gcc and bcc. With tcc, it shows:

    One
    Two
    Three

    I know what, let's ban functions! This is what you are saying. You're
    throwing out the baby with the bathwater.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to David Brown on Tue Nov 8 13:24:28 2022
    On 07/11/2022 16:34, David Brown wrote:
    On 07/11/2022 16:23, Bart wrote:
    On 07/11/2022 14:58, David Brown wrote:
    On 07/11/2022 12:55, James Harris wrote:

    ..

    gcc accepts this C code (when E, V are both ints):


         ++E + E++;
         V = V++;

    ..

    (You've had this explained to you a few hundred times over the last
    decade or so.  I know you get some kind of perverse pleasure out of finding
    any way of making C and/or gcc look bad in your own eyes, but would you /please/ stop being such a petty child and stop writing things
    deliberately intended to confuse, mislead or annoy others?)

    Steady on, old boy! Surely we can make comments here without getting too personal.

    ..

       ++E++^
       ++E^++


    Make it a syntax error.

    The equivalent in C syntax for the first is:

         ++*(P++);

    This compiles fine when P has type int* for example. It means this:

       - Increment the pointer P
       - Increment the location that P now points to (using the * deref op)

    So no reason to prohibit anything; it is perfectly well-defined.

    There is good reason to prohibit it - you got it wrong, so despite being well-defined by the language, it is not clear code.

    The actual meaning of "++*(P++);" is :

        1. Remember the original value of P - call it P_orig
        2. Increment P (that is, add sizeof(*P) to it).
        3. Increment the int at the location pointed to by P_orig.
        4. The value of the expression is the new updated value pointed to by P_orig.

    That's a good example of how legitimate code can be made to look like
    gibberish by the evil programmer (tm). As I've mentioned elsewhere it's
    hard to invent rules to prohibit particular constructs simply because
    'we don't like the look of them' and it would make the language harder
    to implement and understand if the language design included rules on 'aesthetics'.

    It seems to have parallels with the free-speech debate. Free speech is
    easy when we get to choose what should be free and what should be banned
    - but then that's not free speech. In reality, free speech is hard
    because others may be free to say things we don't like (although IMO
    those who don't want to hear them should not have to listen to them ...
    but that's another topic and getting off the point of the simile). In a
    similar way, programming can be hard when other programmers write
    constructs we don't like. I agree that it's best for a language to help programmers write readable and comprehensible programs - and even to
    make them the easiest to write, if possible - but the very flexibility
    which may allow them to do so may also give them the freedom to write
    code we don't care for. I don't think one can legislate against that.


    --
    James Harris

  • From Andy Walker@21:1/5 to James Harris on Tue Nov 8 15:13:20 2022
    On 08/11/2022 08:04, James Harris wrote:
    On 07/11/2022 17:53, Dmitry A. Kazakov wrote:
    1. Reduce number of precedence level to logical, additive,
    multiplicative, highest order.

    Many people expect "and" to bind more tightly than "or", so you
    perhaps need [at least] two levels of logical. Somewhere between C and
    hair shirts, there is perhaps some more sensible number?

    2. Require parenthesis for mixed operations at the same level
    (except for * and /)

    Don't know why the exception?

    3. No side effects of operators.

    How is "side effect" defined for this purpose? But in any case
    you can do this if only language-defined operators are allowed, but if
    you want to allow users to [re-]define their own, it's much harder.
    Once you've used languages with proper operators, you'll never want to
    go back! Lots of things other than numbers can sensibly be added or multiplied, even subtracted or divided.

    Good suggestions, especially ruling out operators with side effects. You wouldn't believe how much trouble they've been giving me.

    You can get too paranoid about side-effects. They're like many
    aspects of programming; they can be used for good or evil, and on the
    whole you should let programmers use them that way. Good programmers
    will use them wisely, bad programmers will write bad programs no matter
    how hard you try to make them write good ones.

    [...]
    I like the simplicity of the language which would result from your suggestions but can't help but think they would make programming in
    it less comfortable, like the simplicity of a hair shirt. ;)

    If operator precedence and parentheses are a problem for you,
    you could always switch to Polish [or Reverse Polish] notation. Also
    solves the problem of operators that take three or more operands.

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Farnaby

  • From Dmitry A. Kazakov@21:1/5 to Andy Walker on Tue Nov 8 17:05:49 2022
    On 2022-11-08 16:13, Andy Walker wrote:
    On 08/11/2022 08:04, James Harris wrote:
    On 07/11/2022 17:53, Dmitry A. Kazakov wrote:
    1. Reduce number of precedence level to logical, additive,
    multiplicative, highest order.

        Many people expect "and" to bind more tightly than "or", so you perhaps need [at least] two levels of logical.  Somewhere between C and
    hair shirts, there is perhaps some more sensible number?

    There could be other operators like xor, implication etc.

    2. Require parenthesis for mixed operations at the same level
    (except for * and /)

        Don't know why the exception?

    Because traditionally * and / do not require them. Actually the rule
    should be: any same-level non-associative operator needs parentheses:

    a ^ b ^ c

    is illegal (assuming ^ is exponentiation).

    a + b + c

    is OK.
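    The rule is easy to motivate with a quick check; ipow below is a throwaway helper written for this sketch, not a standard function:

    ```c
    #include <assert.h>

    /* throwaway helper: b raised to the power e, for small non-negative e */
    static int ipow(int b, int e) {
        int r = 1;
        while (e-- > 0) r *= b;
        return r;
    }

    int main(void) {
        /* Exponentiation is non-associative: the two groupings of
           2 ^ 3 ^ 2 disagree, so the bare form is ambiguous. */
        assert(ipow(ipow(2, 3), 2) == 64);   /* (2^3)^2 */
        assert(ipow(2, ipow(3, 2)) == 512);  /* 2^(3^2) */

        /* Addition is associative, so a + b + c needs no parentheses. */
        assert((2 + 3) + 4 == 2 + (3 + 4));
        return 0;
    }
    ```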

    3. No side effects of operators.

        How is "side effect" defined for this purpose?

    I would have a stated contract on a subprogram not to have side effects.
    Only such subprograms may implement operators. One can partially enforce
    this contract by checking calls statically. Should the programmer
    violate the contract in some other way, the result would be a bounded
    run-time error.

        You can get too paranoid about side-effects.  They're like many aspects of programming;  they can be used for good or evil, and on the
    whole you should let programmers use them that way.  Good programmers
    will use them wisely, bad programmers will write bad programs no matter
    how hard you try to make them write good ones.

    One could consider further contracts on actual arguments in order to
    disallow expressions like:

    Read + Read / Read

    The general rule must be that evaluation order, if not explicit, must not
    change the result (within the margins of rounding errors and exception propagation). Though programmers would like to address exceptions too.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From Bart@21:1/5 to James Harris on Tue Nov 8 16:42:35 2022
    On 07/11/2022 18:16, James Harris wrote:
    On 07/11/2022 14:23, Bart wrote:
    On 07/11/2022 11:55, James Harris wrote:

    ..

       ++E++

    This may not work, or not work as expected. The binding using my above
    scheme means this is equivalent to ++(E++). But the second ++ requires
    an lvalue as input, and yields an rvalue, which would be an invalid
    input to the first ++.

    Yes. If

      ++E++

    is going to be permitted

    Then you need to define what it means. Here, suppose that in each case E
    starts off as 100:

    E++ # What value does E have afterwards?

    ++E # What value does E have afterwards?

    X := E++ # What is the value of X?

    ++E++ # What is the value of E after?

    X := ++E++ # What is the value of X? What is the type and value
    # of the E++ portion?

    I can't make ++E++ work in any of my languages because of type/lvalue discrepancies.
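    For the cases that C does define (a single increment of E per expression), the answers work out like this:

    ```c
    #include <assert.h>

    int main(void) {
        int E, X;

        E = 100; E++;     assert(E == 101);  /* postfix: E is 101 after */
        E = 100; ++E;     assert(E == 101);  /* prefix:  E is 101 after */
        E = 100; X = E++; assert(X == 100 && E == 101); /* X gets the old value */
        E = 100; X = ++E; assert(X == 101 && E == 101); /* X gets the new value */

        /* ++E++ is a hard error in C: E++ yields an rvalue (the old
           value), and prefix ++ requires a modifiable lvalue. */
        return 0;
    }
    ```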

    then for programmer sanity wouldn't it be true
    to say that both ++ operators need to refer to the same lvalue? If so then

      ++p

    should probably have higher precedence than

      p++

    or perhaps their precedences could be the same but they be applied in left-to-right order.

    It would already be a big deal, and a vast improvement over C, that "^"
    is a postfix op; don't push it!

    It may be worth looking at other operators which take in AND produce
    lvalues, most familiarly array indexing and field referencing, and hence
    they can be incremented. Isn't it true that for both ++ operators of

      ++points.x[1]
      points.x[1]++

    that a programmer would normally want points.x[1] incremented, i.e.
    field referencing and array indexing would take precedence over either
    ++ operator?


    I'm not sure what you're asking here or where producing lvalues comes
    into it. Those examples work as expected in my language:

    record R =
    var x
    end

    points:=R((10,20,30))
    println points # (10, 20, 30)

    ++points.x[1]
    println points # (11, 20, 30)

    points.x[1]++
    println points # (12, 20, 30)

    However, my syntax works a specific way:

    * "." is not considered a normal binary operator (because it isn't).

    * "[]" is not considered that either (this is more typical)

    So `points.x[1]` forms a single expression term. Unary ops like `++`
    work on a term. If "." was a normal binary op, then your example would
    be parsed as:

    (++points).x[1]

    unless you make special rules just for ++.
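    C behaves the same way: "." and "[]" are postfix and bind more tightly than a prefix ++, so no special rule is needed. A small check (the struct here is made up for illustration):

    ```c
    #include <assert.h>

    struct R { int x[3]; };

    int main(void) {
        struct R points = { {10, 20, 30} };

        /* "." and "[]" bind tighter than prefix "++", so this parses
           as ++(points.x[1]), never as (++points).x[1]. */
        ++points.x[1];
        assert(points.x[1] == 21);

        points.x[1]++;
        assert(points.x[1] == 22);

        assert(points.x[0] == 10 && points.x[2] == 30); /* neighbours untouched */
        return 0;
    }
    ```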

    Note, usually ++A and A++ are interchangeable. There is only different behaviour if you try to use the resulting value (the first then returns
    new A, the second returns old A).

    But now what about dereference? Should it also take precedence over the
    ++ operators or should it come after one or both? For instance, what
    should the following mean?

      ++p^

    Should it be

      (++p)^

    or

      ++(p^)

    Isn't it just up to unary op evaluation? I already said how it's
    typically done, so that ++p^ means ++(p^). If it's unclear, then just
    use parentheses.
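    In C's prefix-* notation the two groupings look like this (a minimal check with made-up values):

    ```c
    #include <assert.h>

    int main(void) {
        int a[2] = {10, 20};
        int *p = a;

        ++*p;                /* the ++(p^) reading: bump the pointed-to value */
        assert(a[0] == 11 && p == &a[0]);

        int v = *(++p);      /* the (++p)^ reading: step the pointer first */
        assert(v == 20 && p == &a[1]);
        return 0;
    }
    ```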

  • From David Brown@21:1/5 to James Harris on Tue Nov 8 17:29:50 2022
    On 08/11/2022 14:24, James Harris wrote:
    On 07/11/2022 16:34, David Brown wrote:
    On 07/11/2022 16:23, Bart wrote:

    The equivalent in C syntax for the first is:

         ++*(P++);

    This compiles fine when P has type int* for example. It means this:

       - Increment the pointer P
       - Increment the location that P now points to (using the * deref op)
    So no reason to prohibit anything; it is perfectly well-defined.

    There is good reason to prohibit it - you got it wrong, so despite
    being well-defined by the language, it is not clear code.

    The actual meaning of "++*(P++);" is :

         1. Remember the original value of P - call it P_orig
         2. Increment P (that is, add sizeof(*P) to it).
         3. Increment the int at the location pointed to by P_orig.
         4. The value of the expression is the new updated value pointed
    to by P_orig.

    That's a good example of how legitimate code can be made to look like gibberish by the evil programmer (tm). As I've mentioned elsewhere it's
    hard to invent rules to prohibit particular constructs simply because
    'we don't like the look of them' and it would make the language harder
    to implement and understand if the language design included rules on 'aesthetics'.

    You can't stop everything - those evil programmers have better
    imaginations than any well-meaning language designer. But you can try.
    Aim to make it harder to write convoluted code, and easier to write
    clearer code. And try to make the clearer code more efficient, to
    reduce the temptation to write evil code.


    It seems to have parallels with the free-speech debate. Free speech is
    easy when we get to choose what should be free and what should be banned
    - but then that's not free speech. In reality, free speech is hard
    because others may be free to say things we don't like (although IMO
    those who don't want to hear them should not have to listen to them ...
    but that's another topic and getting off the point of the simile).

    There are /always/ limits on free speech. People don't always
    understand that, but they exist. Failure to enforce appropriate limits
    is as bad for society as failure to allow appropriate freedom of speech.
    (But I agree that there is no easy answer as to where to draw the
    lines, or who should be making or enforcing these limits.)

    In a
    similar way, programming can be hard when other programmers write
    constructs we don't like. I agree that it's best for a language to help programmers write readable and comprehensible programs - and even to
    make them the easiest to write, if possible - but the very flexibility
    which may allow them to do so may also give them the freedom to write
    code we don't care for. I don't think one can legislate against that.


    I'm not sure it is the same - after all, if some one exercises their
    rights to speak gibberish, or to give long, convoluted and
    incomprehensible speeches, the listener has the right to go away, ignore
    them, or fall asleep. It's harder for a compiler to do that!

  • From David Brown@21:1/5 to Bart on Tue Nov 8 17:20:54 2022
    On 08/11/2022 13:43, Bart wrote:
    On 08/11/2022 08:33, David Brown wrote:
    On 08/11/2022 01:15, Bart wrote:

    (Unless you go out of /your/ way to ensure it doesn't pass. But you'd
    be better off avoiding such code. There are a million ways of writing
    nonsense code that cannot be prohibited by a compiler.)


    Yes, because "gcc -Wall" is /so/ hard to write.

    And it's SO hard for a compiler to just use that as a default!

    Yes, it /is/ hard to have it as the default. As I said, gcc is an
    essential tool for vast amounts of software. You can't change the
    behaviour of such a critical tool without massive consequences.

    I would be very happy to see the warnings of -Wall, and more, as hard
    errors by default in gcc. But it's not feasible - it will break endless
    amount of code and build scripts, some of which has not been looked at
    in decades. I have been supportive of moves in gcc towards better
    defaults, but it is inevitably a very slow and limited process.


    You are used to your own little world of your own tools, your own
    language, your own code. Perhaps you don't realise what it means to
    work with other people - certainly you don't understand what it means
    for a language and a toolchain to be vital for /millions/ of programs.

    When you start a /new/ tool, you get to make different decisions. You
    get to learn from the past. You can enable a range of warnings by
    default (and if it is a new language, rather than warnings you aim to
    prohibit the poor code in the language rules). You can enable
    optimisation by default in your new tool, so that halfwits will no
    longer compile without optimisation and complain about the quality of
    the code generation. You can make things like the choice of language
    standard a mandatory option, so that people can't get it wrong. (Even
    better, it should be mandatory in each code file.)

    Of course, anyone who has a clue how to use a computer, cares about the
    quality of their coding, understands the need to learn about their
    tools, can set up whatever scripts, makefiles, CFLAGS, batch files, or
    other conveniences to get their particular preferences for compiler
    flags simply and easily. That does mean you have to spend a few minutes effort, and it means you no longer have an excuse for endless rants.
    But I think most decent developers can manage it. Those that can't, or
    won't, are likely to write shite code no matter what tools and languages
    they are given because they simply don't care about what they are doing.

    So it
    stays safe for EVERYONE no matter how they invoke the compiler.

    Take a function like this which I consider much more dangerous than
    anything we've been discussing:

       void fred() {}

    My bcc compiler gives a hard error: "() params are not allowed". But
    this works:

    Listen carefully. No one gives a **** about your compiler. It's /your/
    toy, just like your own language. If you like it, great. If you can
    make a living from it, even better. But it does not compare to real
    toolchains in any sense. You live in a different world here. You can
    do /exactly/ what you like with your own tools, because they are for you
    alone. Other toolchains and languages are not, and do not have the
    luxury of the choices you can make.

    The world runs on bad defaults - they are everywhere, in all aspects of
    life and of society. We have them because they have been that way for
    so long that it is impossible, or nearly impossible, to change them.
    Often they were good choices when they were made, long ago. You can
    sometimes make gradual changes, and you can make better choices for new
    things, but you can't make big and sudden changes to things that lots of
    people rely upon.


       gcc c.c -c

    OK, I'll have to write -Wall as you say:

       gcc -Wall c.c -c

    But, it still passes!

    It is valid code - why would it not pass?


    (So much existing code wrongly uses () to mean no parameters - thanks no doubt to gcc's lax approach over decades - that I have to give bcc a
    special option to enable it when it comes up.)

    No, existing C code uses () to mean an unspecified number of parameters -
    anything from zero upwards.

    You claim to have made a C compiler - did you never actually look at the language standards or learn the language?

    Of course, since some people (myself included) see functions declared
    with empty parentheses as poor style (it is, after all, obsolescent
    since C99), gcc has an option to warn about it - "-Wstrict-prototypes".
    But it is legal code in a form used by a lot of old code, and not
    something you are likely to type accidentally, so it is not part of the
    group "-Wall" of warnings that are mostly uncontroversial.

    (In the next C standard, C23, "void foo()" will mean "foo" takes no
    parameters, just like in C++.)
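    The distinction, and the reason the C23 change matters, can be summarised in a few lines (assuming a pre-C23 compiler; the function names are made up):

    ```c
    #include <assert.h>

    /* Pre-C23, "()" in a declaration means "unspecified parameters",
       not "no parameters"; only "(void)" is a true prototype. */
    void fred();        /* callers are not checked at all       */
    void greg(void);    /* prototype: exactly zero arguments    */

    void fred() { }     /* definition style obsolescent since C99 */
    void greg(void) { }

    int main(void) {
        fred();
        greg();
        /* greg(1);  hard error: the prototype forbids arguments.
           fred(1);  accepted pre-C23 (no checking), though calling
                     with the wrong arguments is undefined behaviour. */
        assert(1);
        return 0;
    }
    ```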


      I mean, it takes hours extra work, far out of your way.  Write
    yourself a batch file with gcc flags - you could have done it 20 years
    ago and saved yourself and everyone else enormous effort.

    Why do you expect people to have to themselves implement a chunk of the compiler they're using?


    What? You complain that gcc's source is millions of lines long. How
    does a few command-line options count as "a chunk of the compiler" ?

    And have to do so for every compiler - at one time I was using 7 or 8.
    ALL of them should be doing their jobs properly without being told.


    They are. You just don't understand what their jobs are.



    A major point of a good programming language - aided by good tools -
    is to reduce the amount of bad code that is accepted.

    Yeah. In my language, A[i] only works when A is an array; P^ (pointer
    deref) only works when P is a pointer.

    Sounds obvious when put like that, but in C anything goes; allowing A^
    and P[i] enables a huge amount of dangerous nonsense.


    No, despite your continued exaggerations, C is not "anything goes". But
    it /does/ allow some constructs that other languages don't (and vice versa).

    You may not have noticed, in your eagerness to condemn everything C
    related, including anyone who actually understands and uses the
    language, that I have repeatedly recommended that the OP /not/ copy C in
    his new language. The design decision for C's subscript to be syntactic
    sugar for pointer dereferencing (you can't apply it to an array in C,
    despite appearances that confuse you) made sense when C was created, and
    for the expected uses of the language. That decision is no longer a
    good choice for a modern language.


    I mean that you will not get any C compilers to get it to work: all
    report hard errors, and will not generate any code.


    So it is prohibited by the language.

    So is ++E + E++ prohibited by C or not? I'm none the wiser, but it
    sounds like, it depends!


    Yes, it is prohibited by the language - as I said, and you bizarrely
    claimed otherwise (saying "no actual need to prohibit in the language -
    it just won't work"). It has nothing to do with types, it is in
    language constraint clauses.


    It may be confusing to look at, but look at ANY C source and you will
    see complex expressions that are much harder to grok, like:

    OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF; }

    (Is it even an expression? I /think/ it's a function definition.)

    Are you arguing that because some people write C code that is even
    harder to understand, the OP should allow these nonsense expressions
    in his language?  That "logic" is like saying that because there are
    bank robbers, people should be allowed to drunk-drive.

    YOU are arguing that you shouldn't be allowed to compose certain
    operators because the result might be confusing. But why single out
    these particular ones?

    I didn't. You are making things up.


    I recommend you take a step outside to your garden. Jump up and down
    and scream "I hate C" at the top of your voice, until you are hoarse and
    red in the face. Get it out your system. Then come back here, stop
    posting ludicrous anti-C drivel, and maybe you can go back to
    contributing usefully to the discussion. You have more experience in
    home-made languages than most people - try to give useful advice and
    leave anything about C to people who can talk about it rationally.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to All on Tue Nov 8 16:16:34 2022
    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Tue Nov 8 17:46:58 2022
    On 08/11/2022 16:20, David Brown wrote:
    On 08/11/2022 13:43, Bart wrote:
    On 08/11/2022 08:33, David Brown wrote:
    On 08/11/2022 01:15, Bart wrote:

    (Unless you go out of /your/ way to ensure it doesn't pass. But
    you'd be better off avoiding such code. There are a million ways of
    writing nonsense code that cannot be prohibited by a compiler.)


    Yes, because "gcc -Wall" is /so/ hard to write.

    And it's SO hard for a compiler to just use that as a default!

    Yes, it /is/ hard to have it as the default.

    No it isn't. And the consequences of allowing terrible, error prone
    legacy code are considerable.

    You need ONE new option in a compiler, example:

    gcc --classic

    (Or, more apt, --unsafe.)

    That means that the next generation of lazy and/or newbie C programmers
    who don't bother with options get used to a stricter and safer version
    of the language.

    But it's not feasible - it will break an endless
    amount of code and build scripts,

    They are likely to break anyway - don't you say that you archive
    complete compilers to ensure your projects can always be built?

    You are used to your own little world of your own tools, your own
    language, your own code.  Perhaps you don't realise what it means to
    work with other people - certainly you don't understand what it means
    for a language and a toolchain to be vital for /millions/ of programs.

    Here we are designing new languages and new tools. They don't
    necessarily need to be for the mass market.


        gcc c.c -c

    OK, I'll have to write -Wall as you say:

        gcc -Wall c.c -c

    But, it still passes!

    It is valid code - why would it not pass?

    Because it's fucking stupid code:

        #include <stdio.h>

        int fred() {return 0;}

        int main(void) {
            fred(1,2,3,4,5,6,7,8,9,10);
            fred("Hello, World!");
            fred(fred,fred,fred(fred(fred)));
        }

    On what planet could all those calls to fred() be correct? All of them,
    except at most one, will be wrong. And dangerous.

    Yet 'gcc -Wall -Wextra -Wpedantic etc etc` passes it quite happily.

    That is a fucking stupid compiler.



    (So much existing code wrongly uses () to mean no parameters - thanks
    no doubt to gcc's lax approach over decades - that I have to give bcc
    a special option to enable it when it comes up.)

    No, existing C code uses () to mean an unspecified number of parameters - anything from zero upwards.

    No, all the C code I've seen routinely uses () to mean zero parameters
    only. The problem with that is that any number of parameters of any
    types can be passed, clearly incorrectly, and it cannot be detected.

    Code that uses () correctly (normally associated with function pointers)
    needs to ensure that the call and the callee match in argument counts
    and types. That's why it is dangerous. But this use is unusual.

    (In my language, that is achieved with explicit function pointer casts.)


    You claim to have made a C compiler - did you never actually look at the language standards or learn the language?


    I made a compiler for a subset of C - minus some features. () parameters
    need to be enabled by a legacy switch like the one I mentioned, in my
    case called '-old'.

    (In the next C standard, C23, "void foo()" will mean "foo" takes no parameters, just like in C++.)

    Will my nonsense program still pass?


      I mean, it takes hours extra work, far out of your way.  Write
    yourself a batch file with gcc flags - you could have done it 20
    years ago and saved yourself and everyone else enormous effort.

    Why do you expect people to have to themselves implement a chunk of
    the compiler they're using?


    What?  You complain that gcc's source is millions of lines long.  How
    does a few command-line options count as "a chunk of the compiler" ?

    By using 1000s of options to control every aspect of the process. The
    options form a mini-DSL to build a custom dialect of a language.

    No, despite your continued exaggerations, C is not "anything goes".  But
    it /does/ allow some constructs that other languages don't (and vice
    versa).

    You may not have noticed, in your eagerness to condemn everything C
    related, including anyone who actually understands and uses the
    language, that I have repeatedly recommended that the OP /not/ copy C in
    his new language.  The design decision for C's subscript to be syntactic sugar for pointer dereferencing (you can't apply it to an array in C,
    despite appearances that confuse you)

    You mean appearances like this:

        int A[10];
        int x;

        x = *A;

    gcc is happy with this.

    So is ++E + E++ prohibited by C or not? I'm none the wiser, but it
    sounds like, it depends!


    Yes, it is prohibited by the language

    Yet no compiler stopped me creating an executable. So why is ++E++ a
    hard error and not ++E + E++?

    - as I said, and you bizarrely
    claimed otherwise (saying "no actual need to prohibit in the language -
    it just won't work").

    Yes, about ++E++. Not ++E + E++; I merely observed that gcc didn't take
    the latter seriously.

      It has nothing to do with types, it is in
    language constraint clauses.


    It may be confusing to look at, but look at ANY C source and you
    will see complex expressions that are much harder to grok, like:

    OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF; }

    (Is it even an expression? I /think/ it's a function definition.)

    Are you arguing that because some people write C code that is even
    harder to understand, the OP should allow these nonsense expressions
    in his language?  That "logic" is like saying that because there are
    bank robbers, people should be allowed to drunk-drive.

    YOU are arguing that you shouldn't be allowed to compose certain
    operators because the result might be confusing. But why single out
    these particular ones?

    I didn't.  You are making things up.

    You said:

    Make it a syntax error.

    about ++E++^ and ++E^++. Before going on to compare that syntax with
    Brainfuck.


    I recommend you take a step outside to your garden.  Jump up and down
    and scream "I hate C" at the top of your voice, until you are hoarse and
    red in the face.  Get it out your system.

    I suggest you do the same with "I hate Bart".

    I already know that the stuff I do is miles better than C, while still
    being simple, low-level, small footprint and easy to build fast. Thanks
    for reminding me what a quagmire it is.

    Then come back here, stop
    posting ludicrous anti-C drivel, and maybe you can go back to
    contributing usefully to the discussion.  You have more experience in home-made languages than most people - try to give useful advice and
    leave anything about C to people who can talk about it rationally.

    This is not the C group. C comes up tangentially from time to time. But
    I believe it was mostly you who wholesale dragged C and its compilers
    into the discussion.

    Please do not reply to this. I'm not interested in taking it any further.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to Bart on Tue Nov 8 19:18:14 2022
    On 2022-11-08 18:46, Bart wrote:

    No it isn't. And the consequences of allowing terrible, error prone
    legacy code are considerable.

    Wrong. Backward compatibility trumps everything, absolutely everything.
    Legacy C, FORTRAN, COBOL code is far more stable than whatever new garbage.

    Unless your new language supports strong typing, contracts and formal verification, I'd rather take old C code than newly introduced fancy bugs.

    From a new language I expect a new technological level. As long as you
    guys keep reinventing C, I'd rather stay with C.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Dmitry A. Kazakov on Tue Nov 8 19:15:50 2022
    On 08/11/2022 18:18, Dmitry A. Kazakov wrote:
    On 2022-11-08 18:46, Bart wrote:

    No it isn't. And the consequences of allowing terrible, error prone
    legacy code are considerable.

    Wrong. Backward compatibility trumps everything, absolutely everything. Legacy C, FORTRAN, COBOL code is far more stable than whatever new garbage.

    Unless your new language supports strong typing, contracts and formal verification, I'd rather take old C code than newly introduced fancy bugs.

    From a new language I expect a new technological level. As long as you
    guys keep reinventing C, I'd rather stay with C.

    There are plenty of newer, more ground-breaking languages around that
    might suit you: Rust, Zig, Odin, Dart, Julia... Or functional ones like Haskell, OCaml, F#.


    The ones I create are for personal use and occupy a particular niche in
    the range of possible languages. They are roughly represented by M and Q
    along this line:

    C--M-----Q-----------Python

    My codebase is small so backward compatibility (with my own languages)
    is not a great priority. I'm doing some innovative stuff with how the
    compilers work and how projects are managed and run, but the feature set
    of the languages themselves is not advanced, and actually less important.

    You yourself said to keep the set of operations minimal. I'm doing that
    with the features I have no interest in, such as over-strict type systems.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Bart on Tue Nov 8 21:38:59 2022
    On 08/11/2022 20:15, Bart wrote:
    On 08/11/2022 18:18, Dmitry A. Kazakov wrote:
    On 2022-11-08 18:46, Bart wrote:

    No it isn't. And the consequences of allowing terrible, error prone
    legacy code are considerable.

    Wrong. Backward compatibility trumps everything, absolutely
    everything. Legacy C, FORTRAN, COBOL code is far more stable than
    whatever new garbage.

    Unless your new language supports strong typing, contracts and formal
    verification, I'd rather take old C code than newly introduced fancy
    bugs.

     From a new language I expect a new technological level. As long as
    you guys keep reinventing C, I'd rather stay with C.

    There are plenty of newer, more ground-breaking languages around that
    might suit you: Rust, Zig, Odin, Dart, Julia... Or functional ones like Haskell, OCaml, F#.


    Most of these are something very different from C. Haskell and OCaml
    are, as you say, functional programming languages and a totally
    different concept. (They are a lot of fun and highly educational.) I
    don't know if anyone actually uses F# - it always struck me as another Microsoft me-too language, like C# but without the market popularity.

    Dart is dead, AFAIK, showing the risks of picking a new language. Zig
    is arguably a low-level C alternative, except it too is barely used.
    Odin is - well, you know it is not a mainstream choice when it doesn't
    even have a Wikipedia page. And Julia has its fans for particular uses,
    which do not overlap much with the kind of task you'd do in C.

    Which leaves Rust as the only new serious alternative to C these days
    (along with C++). Rust has its pros and cons, but is definitely worth considering. Certainly any budding low-level language designer should
    learn it and play with it, to see which features they want to copy and
    which to drop. One of the nice things about Rust is that the add-on
    tools for resource tracking are inspiring better analysis tools for C
    and C++ - such rivalry is often useful.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Bart on Tue Nov 8 21:28:24 2022
    On 08/11/2022 18:46, Bart wrote:
    On 08/11/2022 16:20, David Brown wrote:
    On 08/11/2022 13:43, Bart wrote:
    On 08/11/2022 08:33, David Brown wrote:
    On 08/11/2022 01:15, Bart wrote:

    (Unless you go out of /your/ way to ensure it doesn't pass. But
    you'd be better off avoiding such code. There are a million ways of
    writing nonsense code that cannot be prohibited by a compiler.)


    Yes, because "gcc -Wall" is /so/ hard to write.

    And it's SO hard for a compiler to just use that as a default!

    Yes, it /is/ hard to have it as the default.

    No it isn't. And the consequences of allowing terrible, error prone
    legacy code are considerable.

    You need ONE new option in a compiler, example:

       gcc --classic

    (Or, more apt, --unsafe.)

    That would be one way to break all existing build systems.

    It really is quite simple.

    (Note that I /agree/ with you entirely that it would be /better/ if gcc
    was far stricter by default. I am trying to explain to you why it cannot.)

    Also remember that gcc plays two significantly different roles. In one
    case, it is a development tool - you use it with appropriate warnings
    and flags (and other tools, such as debuggers and profilers) to help
    write correct code. Here warnings are vital feedback to the developer.

    The other role is as a systems compiler. This is not a concept familiar
    to Windows users, but in the *nix world (and most other OS's) software
    is often distributed as source. The source is, at least in theory,
    known to be correct - so the compiler should be as quiet as practically possible. The assumption is that any warnings that might be given are
    false positives. That has to be the default setting for compiler options.




        gcc c.c -c

    OK, I'll have to write -Wall as you say:

        gcc -Wall c.c -c

    But, it still passes!

    It is valid code - why would it not pass?

    Because it's fucking stupid code:

    I agree. But it is valid C code, so any C compiler has to accept it.


        #include <stdio.h>

        int fred() {return 0;}

        int main(void) {
            fred(1,2,3,4,5,6,7,8,9,10);
            fred("Hello, World!");
            fred(fred,fred,fred(fred(fred)));
        }

    On what planet could all those calls to fred() be correct? All of them, except at most one, will be wrong. And dangerous.

    Now you have code that /is/ undefined behaviour. And a good enough
    compiler /could/ spot that and reject it. Unfortunately, it is often
    very hard to do in real-life cases, and there is little point in making
    the effort to deal with pathological artificial cases.

    gcc has warnings that will complain about this code. It would have been
    better if these were enabled by default, or at least in -Wall.


    Yet 'gcc -Wall -Wextra -Wpedantic etc etc` passes it quite happily.

    That is a fucking stupid compiler.


    It is a compiler that values backwards compatibility above usability,
    for a language that puts backwards compatibility above new features or improvements to the language. Both the language and the toolchain
    expect users to take responsibility for their code and their use of their
    tools.

    When I program in C, using gcc, the above code would be marked as an
    error. If I can do it, so can you. It would be nice if it were easier
    or more automatic, but that's how it goes.



    (So much existing code wrongly uses () to mean no parameters - thanks
    no doubt to gcc's lax approach over decades - that I have to give bcc
    a special option to enable it when it comes up.)

    No, existing C code uses () to mean an unspecified number of parameters -
    anything from zero upwards.

    No, all the C code I've seen routinely uses () to mean zero parameters
    only.

    Existing code uses it to mean an unspecified number of parameters,
    anything from zero upwards.

    It's very likely that meaning /zero/ parameters is the most common usage (especially since that's what it means in C++, and what it will mean in
    C23). But it is not the only usage. In particular, it can be a K&R
    style declaration of an external function, regardless of the number of parameters. There is still K&R C code in use today - and there are
    still some people who write their C in K&R style (perhaps because some
    people think "The C Programming Language" is a good way to learn the
    language).

    The problem with that is that any number of parameters of any
    types can be passed, clearly incorrectly, and it cannot be detected.


    Yes. That's why function prototypes were added to the language in C90.
    But non-prototype declarations could not be removed.

    Code that uses () correctly (normally associated with function pointers) needs to ensure that the call and the callee match in argument counts
    and types. That's why it is dangerous. But this use is unusual.

    What do you mean? There is no non-obsolescent use of "()" in
    declarations of functions, types, pointers, or anything else. It exists
    only as a backwards compatibility feature for K&R style non-prototype
    function declarations. And of course you can /call/ a function with no parameters as "foo()", either by function name or via a pointer-to-function-with-no-parameters. When you use prototype function declarations, the compiler can check that the call and callee match in
    argument counts and types (assuming you use the same declarations in
    each source file).


    (In my language, that is achieved with explicit function pointer casts.)


    If you have function pointer casts, your risk of error is high and you
    need to check things manually. The same applies in C - there are no implicit conversions between function pointer types.


    You claim to have made a C compiler - did you never actually look at
    the language standards or learn the language?


    I made a compiler for a subset of C - minus some features. () parameters
    need to be enabled by a legacy switch like the one I mentioned, in my
    case called '-old'.


    If it is just a subset of C, with no attempt at conformity, then it is misleading to refer to it as a C compiler. (It can still be a useful
    tool for your own use.)

    (In the next C standard, C23, "void foo()" will mean "foo" takes no
    parameters, just like in C++.)

    Will my nonsense program still pass?


    No. It would be as though you had written "int fred(void) {return 0;}"
    as the first line.


      I mean, it takes hours extra work, far out of your way.  Write
    yourself a batch file with gcc flags - you could have done it 20
    years ago and saved yourself and everyone else enormous effort.

    Why do you expect people to have to themselves implement a chunk of
    the compiler they're using?


    What?  You complain that gcc's source is millions of lines long.  How
    does a few command-line options count as "a chunk of the compiler" ?

    By using 1000s of options to control every aspect of the process. The
    options form a mini-DSL to build a custom dialect of a language.


    Most people don't need more than a few options, you cannot control
    "every aspect", it is not a DSL and you are not making a "custom
    dialect". That is all wild exaggeration. For most programmers, and
    most uses, "-Wall -std=c11 -O2" will cover the basics. My work is more specialised, and I regularly need a fair number of extra flags (such as specifying details of the target processor, linking setup, etc.). I
    also like to have paranoid levels of warnings and some weird
    optimisation flags - but I am quite unusual in that respect.

    No, despite your continued exaggerations, C is not "anything goes".
    But it /does/ allow some constructs that other languages don't (and
    vice versa).

    You may not have noticed, in your eagerness to condemn everything C
    related, including anyone who actually understands and uses the
    language, that I have repeatedly recommended that the OP /not/ copy C in
    his new language.  The design decision for C's subscript to be
    syntactic sugar for pointer dereferencing (you can't apply it to an
    array in C, despite appearances that confuse you)

    You mean appearances like this:

        int A[10];
        int x;

        x = *A;

    gcc is happy with this.

    You haven't used the subscript operator here. My point is, even though
    "A" is an array, writing "x = A[2]" does not apply the subscript
    operator to an array. Many people find that surprising or confusing -
    but that is simply how arrays work in C.


    So is ++E + E++ prohibited by C or not? I'm none the wiser, but it
    sounds like, it depends!


    Yes, it is prohibited by the language

    Yet no compiler stopped me creating an executable. So why is ++E++ a
    hard error and not ++E + E++?


    My apologies - I have mixed things up. "++E++" is prohibited by the
    language. "++E + E++" is undefined behaviour - the language gives no requirements as to what it might mean. The compiler can generate code
    doing anything it likes.

    A complicating aspect is that most undefined behaviour is /runtime/
    undefined behaviour. That is, it is only a mistake if the code in
    question is actually run. This means that if the code with undefined
    behaviour is never encountered at runtime, it is not an error in the
    code - so the compiler is obliged to generate code for the rest of the
    program, and cannot (if it is conforming) stop with an error unless it
    can prove that the code will definitely be run. (In theory, it could do
    that if it can see code flow from main(), but in practice compilers
    don't do that - it would be a huge effort for little benefit.)


    - as I said, and you bizarrely claimed otherwise (saying "no actual
    need to prohibit in the language - it just won't work").

    Yes, about ++E++. Not ++E + E++; I merely observed that gcc didn't take
    the latter seriously.


    Again, I mixed them up. gcc /does/ warn about "++E + E++", if you
    enable warnings.


      It has nothing to do with types, it is in language constraint clauses.


    It may be confusing to look at, but look at ANY C source and you
    will see complex expressions that are much harder to grok, like:

    OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF; }

    (Is it even an expression? I /think/ it's a function definition.)

    Are you arguing that because some people write C code that is even
    harder to understand, the OP should allow these nonsense expressions
    in his language?  That "logic" is like saying that because there are
    bank robbers, people should be allowed to drunk-drive.

    YOU are arguing that you shouldn't be allowed to compose certain
    operators because the result might be confusing. But why single out
    these particular ones?

    I didn't.  You are making things up.

    You said:

    Make it a syntax error.

    about ++E++^ and ++E^++. Before going on to compare that syntax with Brainfuck.


    Yes, because that is how it reads.

    But at no point have I said anything to suggest I was talking about only
    these operators. The OP used postfix and prefix increment as an example
    - I didn't pick them. If the example had been "x ^= ~1<<-x*.1%!*y" I'd
    have treated it the same way.


    I recommend you take a step outside to your garden.  Jump up and down
    and scream "I hate C" at the top of your voice, until you are hoarse
    and red in the face.  Get it out your system.

    I suggest you do the same with "I hate Bart".

    I don't - this is not remotely personal, and I often enjoy our
    discussions. But I /do/ get frustrated about having to say the same
    things again and again.


    I already know that the stuff I do is miles better than C, while still
    being simple, low-level, small footprint and easy to build fast. Thanks
    for reminding me what a quagmire it is.

    The quagmire is of your own making.

    Look, no one is suggesting C is perfect, or that gcc is perfect. All
    along in this group, I have been arguing for people making new languages
    to avoid the mistakes of C.

    But it does not help anyone if you exaggerate the problems,
    misunderstand the language, and steadfastly insist on taking the worst
    possible view on it. I don't know your language - but I haven't the
    slightest doubt that I could write as ugly, confusing and bad code in it
    as you can in C. (As the saying goes, you can write FORTRAN in any
    language.) What you write is /pointless/. It helps no one - not you,
    not the OP.


    Then come back here, stop posting ludicrous anti-C drivel, and maybe
    you can go back to contributing usefully to the discussion.  You have
    more experience in home-made languages than most people - try to give
    useful advice and leave anything about C to people who can talk about
    it rationally.

    This is not the C group. C comes up tangentially from time to time. But
    I believe it was mostly you who wholesale dragged C and its compilers
    into the discussion.

    No, it was not. You were the first to mention C in this thread branch,
    and the first to mention gcc.

    But we are both to blame for digging further and re-running old
    discussions. I apologise for my part in it, especially for some of my
    less diplomatic wording. Blame it on frustration (which of course still
    means blame it on me).


    Please do not reply to this. I'm not interested in taking it any further.


    I generally write replies inline as I read posts, so I am seeing this
    request now.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Tue Nov 8 23:06:11 2022
    On 08/11/2022 20:28, David Brown wrote:
    On 08/11/2022 18:46, Bart wrote:

    I made a compiler for a subset of C - minus some features. ()
    parameters need to be enabled by a legacy switch like the one I
    mentioned, in my case called '-old'.


    If it is just a subset of C, with no attempt at conformity, then it is misleading to refer to it as a C compiler.  (It can still be a useful
    tool for your own use.)

    There must be 1000s of amateur C compiler projects, probably more than
    for any other language.

    Mine was able to build programs like Lua, Tiny C, Seed7 and SQLite, and
    run those programs to varying degrees. So more capable than most, but I
    usually refer to it in docs as a C-subset compiler.

    Whatever C coding I do these days will be in that subset, and mine will
    be my first choice of C compiler. Any machine-generated C will be in
    that same subset.


    (In the next C standard, C23, "void foo()" will mean "foo" takes no
    parameters, just like in C++.)

    Will my nonsense program still pass?


    No.  It would be as though you had written "int fred(void) {return 0;}"
    as the first line.

    So what happened to backwards compatibility? All those programs which
    call () functions with more arguments will no longer work.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Bart on Wed Nov 9 09:12:05 2022
    On 09/11/2022 00:06, Bart wrote:
    On 08/11/2022 20:28, David Brown wrote:
    On 08/11/2022 18:46, Bart wrote:

    I made a compiler for a subset of C - minus some features. ()
    parameters need to be enabled by a legacy switch like the one I
    mentioned, in my case called '-old'.


    If it is just a subset of C, with no attempt at conformity, then it is
    misleading to refer to it as a C compiler.  (It can still be a useful
    tool for your own use.)

    There must be 1000s of amateur C compiler projects, probably more than
    for any other language.

    Mine was able to build programs like Lua, Tiny C, Seed7 and SQLite, and
    run those programs to varying degrees. So more capable than most, but I usually refer to it in docs as a C-subset compiler.

    OK.


    Whatever C coding I do these days will be in that subset, and mine will
    be my first choice of C compiler. Any machine-generated C will be in
    that same subset.


    Almost everyone in practice uses a subset of the language of their choice. If
    you are releasing a compiler for general use, you have to support
    everything (even monstrosities like "longjmp"), but for your own use (or
    for specialised use, such as tied to a transcompiler for a different
    language), you only need the features you will use yourself.


    (In the next C standard, C23, "void foo()" will mean "foo" takes no
    parameters, just like in C++.)

    Will my nonsense program still pass?


    No.  It would be as though you had written "int fred(void) {return
    0;}" as the first line.

    So what happened to backwards compatibility? All those programs which
    call () functions with more arguments will no longer work.


    I still don't understand what you mean.

    int (*f)(void);
    int (*g)(int, int);

    void foo(void) {
        f();        // Good
        g(1, 2);    // Good
        f(1, 2);    // Error
        g();        // Error
    }

    Function pointers of different types - that is, different return types, different numbers or types of parameters - are incompatible in C. There
    are no implicit conversions between them (unlike object pointers and
    void*), there is no common base type, and any use of explicit casts will
    lead to undefined behaviour if you don't cast back to the right type
    before calling the function. Even the cast back and forth is
    implementation dependent - an implementation could use different sizes
    for different function pointer types.

    In practice, of course, most implementations have the same size of
    pointer for all function types. On some real-world targets it will be
    different from the size of object pointers (imagine a 16-bit system
    with "large" code model and "small" data model, or vice versa). But C
    still does not allow you to mess with the function pointer types
    without explicit "I know what I am doing" casts.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Wed Nov 9 11:24:14 2022
    On 09/11/2022 08:12, David Brown wrote:
    On 09/11/2022 00:06, Bart wrote:

    So what happened to backwards compatibility? All those programs which
    call () functions with more arguments will no longer work.


    I still don't understand what you mean.

    int (*f)(void);
    int (*g)(int, int);

    void foo(void) {
        f();        // Good
        g(1, 2);        // Good
        f(1, 2);        // Error
        g();        // Error
    }

    Function pointers of different types - that is, different return types,
    different numbers or types of parameters - are incompatible in C.  There
    are no implicit conversions between them (unlike object pointers and
    void*), there is no common base type, and any use of explicit casts will
    lead to undefined behaviour if you don't cast back to the right type
    before calling the function.  Even the cast back and forth is
    implementation dependent - an implementation could use different sizes
    for different function pointer types.

    In practice, of course, most implementations have the same size of
    pointer for all function types.  On some real-world targets it will be
    different from the size of object pointers (imagine a 16-bit system
    with "large" code model and "small" data model, or vice versa).  But C
    still does not allow you to mess with the function pointer types
    without explicit "I know what I am doing" casts.

    The following code is a legitimate use of () parameter lists:

    #include <stdio.h>

    int f1(int a) {return a;}
    int f2(int a, int b) {return a+b;}
    int f3(int a, int b, int c) {return a+b+c;}

    int (*fntable[])() = {NULL, f3, f1, f2};
    int args[] = {0, 3, 1, 2};

    int main(void) {
        int n = 3, x;

        switch (args[n]) {
        case 1: x = fntable[n](10); break;
        case 2: x = fntable[n](20,30); break;
        case 3: x = fntable[n](40,50,60); break;
        }

        printf("x=%d\n", x);
    }

    'fntable' is populated with functions of mixed signatures. When calling
    one of those functions, the user-code must ensure the function pointer
    is called with the right arguments for that specific function.

    That is done here with the switch statement. If the () in this line:

    int (*fntable[])() = {NULL, f3, f1, f2};

    is assumed to be (void) in C23, then initialising with f1, f2, f3 will
    be illegal, and all those calls will be too. This is why I said, what
    happened to backwards compatibility.

    In my language there is no equivalent to C's unchecked () parameter
    list. There I would likely use the equivalent of void* pointers and
    apply a cast at the point of call. The same could be done in C.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Wed Nov 9 05:53:47 2022
    On Tuesday, 8 November 2022 at 16:42:36 UTC, Bart wrote:
    On 07/11/2022 18:16, James Harris wrote:
    On 07/11/2022 14:23, Bart wrote:
    On 07/11/2022 11:55, James Harris wrote:

    ..

    ++E++

    This may not work, or not work as expected. The binding using my above
    scheme means this is equivalent to ++(E++). But the second ++ requires
    an lvalue as input, and yields an rvalue, which would be an invalid
    input to the first ++.

    Yes. If

    ++E++

    is going to be permitted
    Then you need to define what it means.

    Indeed. Given the relative precedences mentioned in another reply I'd have the operations in this order:

    prefix ++
    postfix ++

    Replies below based on that order (although I gather you have the opposite order).


    Here, suppose that in each case E
    starts off as 100:

    E++ # What value does E have afterwards?

    101


    ++E # What value does E have afterwards?

    Also 101


    X := E++ # What is the value of X?

    100


    ++E++ # What is the value of E after?

    102

    (++E)++


    X := ++E++ # What is the value of X? What is the type and value
    # of the E++ portion?

    X: 101
    (E would still end up holding 102)
    As a subexpression it would have the type and the intermediate value of
    E (101), not of E++. The trailing ++ would not contribute to the value
    of the subexpression; it would serve only to alter the stored value
    after the intermediate value of 101 had flown the nest.

    X := ((++E)++)


    I can't make ++E++ work in any of my languages because of type/lvalue discrepancies.

    Is that because you evaluate postfix ++ first?

    then for programmer sanity wouldn't it be true
    to say that both ++ operators need to refer to the same lvalue? If so then

    ++p

    should probably have higher precedence than

    p++

    or perhaps their precedences could be the same but they be applied in left-to-right order.
    It would already be a big deal, and a vast improvement over C, that "^"
    is a postfix op; don't push it!

    :)

    It may be worth looking at other operators which take in AND produce lvalues - most familiarly array indexing and field referencing - whose results can hence be incremented. Isn't it true that for both ++ operators of

    ++points.x[1]
    points.x[1]++

    that a programmer would normally want points.x[1] incremented, i.e.
    field referencing and array indexing would take precedence over either
    ++ operator?
    I'm not sure what you're asking here or where producing lvalues comes in
    to it.
    Those examples work as expected in my language:

    record R =
    var x
    end

    points:=R((10,20,30))
    println points # (10, 20, 30)

    ++points.x[1]
    println points # (11, 20, 30)

    points.x[1]++
    println points # (12, 20, 30)

    However, my syntax works a specific way:

    * "." is not considered a normal binary operator (because it isn't).

    * "[]" is not considered that either (this is more typical)

    So `points.x[1]` forms a single expression term. Unary ops like `++`
    work on a term. If "." was a normal binary op, then your example would
    be parsed as:

    (++points).x[1]

    unless you make special rules just for ++.

    I have "." and "[....]" as normal operators but of high precedence. They are prioritised before the others we have been talking about. Either way the effect is the same.


    Note, usually ++A and A++ are interchangeable. There is only different behaviour if you try to use the resulting value (the first then returns
    new A, the second returns old A).
    But now what about dereference? Should it also take precedence over the
    ++ operators or should it come after one or both? For instance, what should the following mean?

    ++p^

    Should it be

    (++p)^

    or

    ++(p^)
    Isn't it just up to unary op evaluation? I already said how it's
    typically done, so that ++p^ means ++(p^). If it's unclear, then just
    use parentheses.

    Yes, it's up to unary op evaluation order. As a result of this discussion I have gone for ++E before E++ before E^ and would evaluate the above as

    (++p)^

    --
    James

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Bart on Wed Nov 9 16:18:55 2022
    On 09/11/2022 12:24, Bart wrote:
    On 09/11/2022 08:12, David Brown wrote:
    On 09/11/2022 00:06, Bart wrote:

    So what happened to backwards compatibility? All those programs which
    call () functions with more arguments will no longer work.


    I still don't understand what you mean.

    int (*f)(void);
    int (*g)(int, int);

    void foo(void) {
         f();        // Good
         g(1, 2);        // Good
         f(1, 2);        // Error
         g();        // Error
    }

    Function pointers of different types - that is, different return
    types, different numbers or types of parameters - are incompatible in
    C.  There are no implicit conversions between them (unlike object
    pointers and void*), there is no common base type, and any use of
    explicit casts will lead to undefined behaviour if you don't cast back
    to the right type before calling the function.  Even the cast back and
    forth is implementation dependent - an implementation could use
    different sizes for different function pointer types.

    In practice, of course, most implementations have the same size of
    pointer for all function types.  On some real-world targets it will be
    different from the size of object pointers (imagine a 16-bit system
    with "large" code model and "small" data model, or vice versa).  But C
    still does not allow you to mess with the function pointer types
    without explicit "I know what I am doing" casts.

    The following code is a legitimate use of () parameter lists:

        #include <stdio.h>

        int f1(int a) {return a;}
        int f2(int a, int b) {return a+b;}
        int f3(int a, int b, int c) {return a+b+c;}

        int (*fntable[])() = {NULL, f3, f1, f2};

    That is using K&R-style non-prototyped function types.

        int args[] = {0, 3, 1, 2};

        int main(void) {
            int n =3, x;

            switch (args[n]) {
            case 1: x=fntable[n](10); break;
            case 2: x=fntable[n](20,30); break;
            case 3: x=fntable[n](40,50,60); break;
            }

            printf("x=%d\n",x);
        }

    'fntable' is populated with functions of mixed signatures. When calling
    one of those functions, the user-code must ensure the function pointer
    is called with the right arguments for that specific function.

    That is done here with the switch statement. If the () in this line:

        int (*fntable[])() = {NULL, f3, f1, f2};

    is assumed to be (void) in C23, then initialising with f1, f2, f3 will
    be illegal, and all those calls will be too. This is why I said, what happened to backwards compatibility.

    Yes, this could be a problem, and is an interesting example.

    I must admit I did not realise that pointers to "old-style" function
    type pointers are considered compatible to any function type pointer
    with the same return type - that was new to me. Old-style (K&R)
    function declarations have been obsolescent since C89/C90, and it is a
    source of annoyance to me that such obsolescent features have been
    allowed to remain in the language for /so/ long.

    I can appreciate that using "int (*)()" is convenient here because it
    avoids the need for casts. I would personally include the casts - using
    typedefs for clarity - because calling via old-style function types is
    more limited. If one of your functions has a parameter of type "float",
    an integer type smaller than "int", _Bool, or a character type, then
    the behaviour is undefined.


    In my language there is no equivalent to C's unchecked () parameter
    list. There I would likely use the equivalent of void* pointers and
    apply a cast at the point of call. The same could be done in C.

    Yes, though you would want to use "void (*)(void)" pointers in C, rather
    than "void *" pointers - function pointers and object pointers are very different beasts.

    I'm glad to see that you don't have an equivalent of C's long-outdated
    unchecked function types. And I'm glad it is being dropped in C23.
    (clang has warned about it by default for a while, and gcc is
    considering doing so - but wary due to compatibility with existing
    builds. I expect they will run a trial "rebuild everything in Debian
    and see if anything breaks" before deciding.)

    I would recommend that in your own code generator, you use "void
    (*)(void)" pointers and put in the casts in the generated C code. If
    they are already present in the source Bart-language files, it should
    not be difficult.

    Then you could add a hard error on any use of K&R declarations to your C compiler, and score a point over gcc :-)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Thu Nov 10 10:42:53 2022
    On 09/11/2022 13:53, James Harris wrote:
    On Tuesday, 8 November 2022 at 16:42:36 UTC, Bart wrote:

    ++E++ # What is the value of E after?

    102

    (++E)++


    X := ++E++ # What is the value of X? What is the type and value
    # of the E++ portion?

    X: 101
    (E would still end up holding 102)

    As a subexpression it would have the type and the intermediate value of E (101), not of E++.

    This is where your approach differs from mine. It sounds like you would
    also allow this:

    ++ ++ ++ E

    so that E becomes 103? It doesn't then matter whether ++ is prefix or
    postfix. If that isn't the case (yours only works with mixed
    prefix/postfix), then I can only explain it like this:

    ++E is the same as: (E:=E+1; E) # that final E is an lvalue

    E++ is the same as: (T:=E; E:=E+1; T)

    the final T /is/ an lvalue, but not the right one! You can't use it to
    modify E.

    That wouldn't work for me anyway because T is a transient value
    (typically stored on the stack, in a register, or in some inaccessible
    temporary - it's got to exist somewhere!) with no lvalue. It's similar
    to this:

    A+B is the same as: (T:=A+B; T)

    It's clear that ++(A+B) can't work unless you change what ++ means (eg.
    ++A now means (A+1) because whatever ++ modifies is not accessible).

    If that's how your approach works, then it would be unorthogonal:

    ++(E++) works
    (++E)++ doesn't

    even though you'd expect E to be 102 in both cases (and to deliver 101
    in both cases too). And:

    ++ ++ E works
    E ++ ++ doesn't

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Sun Nov 13 15:54:08 2022
    On 07/11/2022 16:16, Dmitry A. Kazakov wrote:
    On 2022-11-07 16:06, Bart wrote:
    On 07/11/2022 13:43, Dmitry A. Kazakov wrote:
    On 2022-11-07 13:52, James Harris wrote:
    On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
    On 2022-11-07 12:55, James Harris wrote:

       ++E + E++
       ++E++
       V = V++

    ...

    ++ means cheap keyboard with broken keys or coffee spilled over it... (:-))

    A bit like Ada's --, then. ;-)


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Sun Nov 13 17:26:06 2022
    On 2022-11-13 16:54, James Harris wrote:
    On 07/11/2022 16:16, Dmitry A. Kazakov wrote:

    ++ means cheap keyboard with broken keys or coffee spilled over it...
    (:-))

    A bit like Ada's --, then. ;-)

    In Ada -- is a comment, not operator.

    An interesting question regarding an operator's symbol is: stick to
    ASCII or go Unicode. Let's say we wanted an increment operator (I do
    not). Why ++ from the 60's? Take the increment (∆) or the upwards
    arrow ↑ etc.

    Note that all arguments against Unicode apply to operators. If the
    thing is difficult to type then it is difficult to remember: precedence
    level, associativity, semantics. If you can hold these in your head,
    you could remember the key combination as well. If you do not, then,
    maybe, having a subprogram Increment() would be a better choice?

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to David Brown on Sun Nov 13 17:11:45 2022
    On 08/11/2022 16:29, David Brown wrote:
    On 08/11/2022 14:24, James Harris wrote:

    ...

    You can't stop everything - those evil programmers have better
    imaginations than any well-meaning language designer.  But you can try.
     Aim to make it harder to write convoluted code, and easier to write clearer code.  And try to make the clearer code more efficient, to
    reduce the temptation to write evil code.

    I agree

    ...

    In a similar way, programming can be hard when other programmers write
    constructs we don't like. I agree that it's best for a language to
    help programmers write readable and comprehensible programs - and even
    to make them the easiest to write, if possible - but the very
    flexibility which may allow them to do so may also give them the
    freedom to write code we don't care for. I don't think one can
    legislate against that.


    I'm not sure it is the same - after all, if someone exercises their
    rights to speak gibberish, or to give long, convoluted and
    incomprehensible speeches, the listener has the right to go away,
    ignore them, or fall asleep.  It's harder for a compiler to do that!

    I'm not worried about the compiler. As long as it can make sense of a
    piece of code then it can compile it. It's the human I am concerned
    about, especially the poor old maintenance programmer!


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Sun Nov 13 16:55:59 2022
    On 10/11/2022 10:42, Bart wrote:
    On 09/11/2022 13:53, James Harris wrote:
    On Tuesday, 8 November 2022 at 16:42:36 UTC, Bart wrote:

    ++E++ # What is the value of E after?

    102

    (++E)++


    X := ++E++ # What is the value of X? What is the type and value
    # of the E++ portion?

    X: 101
    (E would still end up holding 102)

    As a subexpression it would have the type and the intermediate value
    of E (101), not of E++.

    This is where your approach differs from mine. It sounds like you would
    also allow this:

        ++ ++ ++ E

    so that E becomes 103?

    That sounds right. Prefix operators naturally have to be applied right
    to left.

    It looks, by the way, almost as mad as a programmer writing

    E = E + 1 + 1 + 1

    It doesn't then matter whether ++ is prefix or
    postfix.

    I don't follow. In general (e.g. if autoincrement were to be used in an expression) then it /would/ matter whether prefix or postfix were used.


    If that isn't the case (yours only works with mixed
    prefix/postfix), then I can only explain it like this:

       ++E  is the same as: (E:=E+1; E)        # that final E is an lvalue

       E++ is the same as: (T:=E; E:=E+1; T)

    Yes.


    the final T /is/ an lvalue, but not the right one! You can't use it to
    modify E.

    AISI that T would be (or, if you prefer, would be converted to) an rvalue.


    That wouldn't work for me anyway because T is a transient value
    (typically stored on the stack, register or unaccessible temporary -
    it's got to exist somewhere!) with no lvalue. It's similar to this:

       A+B is the same as: (T:=A+B; T)

    It's clear that ++(A+B) can't work unless you change what ++ means (eg.
    ++A now means (A+1) because whatever ++ modifies is not accessible).

    It's similar if A were a struct. You could have

    A.F

    but you could not have

    (A + 4).F

    The LHS of the . operation, in this case, has to be an lvalue.


    If that's how your approach works, then it would be unorthogonal:

         ++(E++)  works
         (++E)++  doesn't

    I have it the other way round.

    ++E would consume and produce an lvalue.
    E++ would consume an lvalue and produce an rvalue.

    This appears to be the same as in C/C++ as mentioned at

    https://en.cppreference.com/w/cpp/language/operator_incdec


    even though you'd expect E to be 102 in both cases (and to deliver 101
    in both cases too).

    I wouldn't expect ++(E++) to work at all. E++ produces an rvalue which
    would be no good as input to ++E as ++E requires an lvalue.

    Prefix and postfix ++ are different operators, despite the visual
    similarity.


    And:

         ++ ++ E  works
         E ++ ++  doesn't

    That I agree with, which makes your earlier comments more puzzling. I am
    not sure what you were driving at.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Sun Nov 13 19:13:04 2022
    On 13/11/2022 16:26, Dmitry A. Kazakov wrote:
    On 2022-11-13 16:54, James Harris wrote:
    On 07/11/2022 16:16, Dmitry A. Kazakov wrote:

    ++ means cheap keyboard with broken keys or coffee spilled over it...
    (:-))

    A bit like Ada's --, then. ;-)

    In Ada -- is a comment, not operator.

    Indeed.


    An interesting question regarding an operator's symbol is: stick to
    ASCII or go Unicode. Let's say we wanted an increment operator (I do
    not). Why ++ from the 60's? Take the increment (∆) or the upwards
    arrow ↑ etc.

    As you know, I don't like Unicode for program source. It can be hard to
    type, hard to read aloud, and hard to compare when two glyphs look the
    same but have different encodings. A small set of characters such as
    ASCII has none of those problems.

    As for replacing ++ with something else I have tried things like

    +> prefix increment
    <+ postfix increment

    or maybe ++> and <++.

    But I don't know if programmers in general would care for them.


    Note that all arguments against Unicode apply to operators. If the
    thing is difficult to type then it is difficult to remember: precedence
    level, associativity, semantics. If you can hold these in your head,
    you could remember the key combination as well. If you do not, then,
    maybe, having a subprogram Increment() would be a better choice?

    Agreed.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to David Brown on Sun Nov 13 19:05:25 2022
    On 07/11/2022 20:59, David Brown wrote:
    On 07/11/2022 20:24, James Harris wrote:
    On 07/11/2022 14:58, David Brown wrote:
    On 07/11/2022 12:55, James Harris wrote:

    ...

    So it should be possible to combine multiple ++
    operators arbitrarily. For example,

       ++E + E++
       ++E++
       V = V++

    Expressions such as those would have a defined meaning. The actual
    meaning is less important than it being well defined and so
    something a programmer can rely on.

    I disagree entirely

    Good. :)

    - unless you include giving an error message saying the programmer
    should be fired for writing gibberish as "well defined and something
    you can rely on".  I can appreciate not wanting such things to be
    run-time undefined behaviour, but there is no reason at all to insist
    that it is acceptable by the compiler.

    As I said to Dmitry, if one wants to prohibit the above then one has
    to define what exactly is being prohibited and to be careful not
    thereby to prohibit something else that may be more legitimate.
    Further, such a prohibition is an additional rule the programmer has
    to learn.

    No one said this was easy!  Though Dmitry had some suggestions of rules
    to try.

    These prohibitions aren't really additional rules for the programmer to
    learn - it is primarily about disallowing things that a good programmer
    is not going to write in the first place.  No one should actually care
    if "++E++" is allowed or not, because they should never write it.

    Yet a programmer may find such an expression in someone else's code.

    Prohibiting it means you don't have to specify the order these operators
    are applied, or whether the expression must be evaluated for
    side-effects twice, or any of the rest of it.  The only people that will have to learn something extra are the sort of programmers who think it
    is smart to write line noise.

    And the programmers who have to read such code.



    All in all, ISTM better to define such expressions. The programmer is
    not forced to use them but at least if they are present in code and
    well defined then their meaning will be plain.


    No, the meaning will /not/ be plain.  That's the point.  Ideally you
    should only allow constructs that do exactly what they appear to do,
    without the reader having to study the manuals to understand some indecipherable gibberish that is technically legal code but completely
    alien to them because no sane programmer would write it.

    There are tradeoffs. Make a language too simple and, while it will be
    easy to learn, programs written in it will be indecipherable. By
    contrast, make a language too complex and it becomes harder to learn,
    and programs written in it can still be indecipherable except to
    veterans.

    AISI somewhere between those two extremes is a sweet spot in which a
    language is reasonably easy to learn, and programs written in it would naturally be reasonably easy to understand.



    Take the first one,

       ++E + E++

    It could be defined fairly easily. If operands to + are defined to
    appear as though they were evaluated left then right and the ++
    operators are set to be of higher precedence and defined to take
    effect as soon as they are evaluated than

       ++E + E++

    would evaluate as though the operations were

       ++E; E++; +


    Then define it as "syntax error" and insist the programmer writes it sensibly.

    It's not a typical syntax error. Each operator would have the correct
    number of operands.

    Further, it may not be known whether the expression is well defined or
    not until run time. Consider two pointers, p0 and p1.

    ++(*p0) + (*p1)++

    AIUI in some languages if p0 and p1 point to different locations then
    the expression is well defined. If they point to the same location,
    however, then the effect is not well defined.

    There's no need for that complexity and uncertainty. Why not, instead,
    say that the expression is defined to function as though the operands
    were evaluated in a specific order? Wouldn't that be easier for a
    programmer to understand and rely on?


    I cannot conceive of a reason to have a pre-increment operator in a
    modern language, nor would I want post-increment to return a value (nor
    any other kind of assignment).  Ban side-effects in expressions -
    require a statement.  "x = y + 1;" is a statement, so it can affect "x".
     "y++;" is a statement - a convenient abbreviation for "y = y + 1;".
    "++x" no longer exists, and "x + x++;" makes no sense because it mixes
    an expression and a statement.

    What is the cost?  The programmer might have to split things into a few lines - but we have much bigger screens and vastly bigger disks than the
    days when C was born.  The programmer might need a few extra temporary variables - these are free with modern compiler techniques.

    Ask yourself why "++x;" and the like exist in languages like C.  The
    reason is that early compilers were weak - they were close to dumb translators into assembly, and if you wanted efficient results using the features of the target processor, you needed to write your code in a way
    that mimicked the actual processor instructions.  "INC A" was faster
    than "ADD A, 1", so you write "x++" rather than "x = x + 1".  This is no longer the case in the modern world.

    To be clear, my motivation for including ++x and x++ is not about any of
    those things but is all about readability and (to a lesser extent)
    writability. Some intentions are more naturally and more simply
    expressed with autoincrement or autodecrement. If they can be used to
    make code /clearer/ then ISTM it is worth doing the work to try to
    include them.

    Against that, however, is the concern that their inclusion may encourage programmers to write code which is unnecessarily cryptic. It /may/ be
    better to require programmers to code x = x + 1 or similar instead. ATM
    the jury is out.

    ...

    I am aware that it might make optimisation harder to achieve but that
    would only apply in some cases and is still, IMO, better than simply
    saying "that's not defined".

    IOW I welcome your disagreement but don't understand it!


    I think it is great that you are happy to discuss this and I try my
    best to explain it.

    Thanks, I appreciate the input!

    Have to say that each of these discussions that we in this group have
    over what would appear to be minutiae well illustrates how much time it
    can take to resolve even tiny decisions in the design of a language!


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Ram@21:1/5 to James Harris on Sun Nov 13 19:30:13 2022
    James Harris <james.harris.1@gmail.com> writes:
    As you know, I don't like Unicode for program source. It can be hard to
    type, hard to read aloud, and hard to compare when two glyphs look the
    same but have different encodings. A small set of characters such as
    ASCII has none of those problems.

    But it can be cumbersome to escape all quotation characters
    in a string, as in "\"\\" in C.

    Imagine one would use some obscure Unicode characters as
    string delimiters. For example,

    ᒪ CANADIAN SYLLABICS MA
    ᒧ CANADIAN SYLLABICS MO

    . Programmers will surely find a way to map them to their
    keyboards somehow. Then that string literal would be just
    ᒧ"\ᒪ!

    (One would just have to use escapes in the rare case that
    one really needs to have those Canadian syllabics ma or mo
    within a string literal.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Stefan Ram on Sun Nov 13 20:01:34 2022
    On 13/11/2022 19:30, Stefan Ram wrote:
    James Harris <james.harris.1@gmail.com> writes:
    As you know, I don't like Unicode for program source. It can be hard to
    type, hard to read aloud, and hard to compare when two glyphs look the
    same but have different encodings. A small set of characters such as
    ASCII has none of those problems.

    But it can be cumbersome to escape all quotation characters
    in a string, as in "\"\\" in C.

    Rather than allowing non-ASCII in source I came up with a scheme of what
    you might call 'named characters' extending the backslash idea of C to
    allow names instead of single characters after the backslash. It's off
    topic for this thread but it allows non-ASCII characters to be named
    (such that the names consist of ASCII characters and would thus be
    readable and universal).


    Imagine one would use some obscure Unicode characters as
    string delimiters. For example,

    ᒪ CANADIAN SYLLABICS MA
    ᒧ CANADIAN SYLLABICS MO

    . Programmers will surely find a way to map them to their
    keyboards somehow. Then that string literal would be just
    ᒧ"\ᒪ!

    I cannot read that. Nor could many other programmers.


    (One would just have to use escapes in the rare case that
    one really needs to have those Canadian syllabics ma or mo
    within a string literal.)



    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Sun Nov 13 19:18:12 2022
    On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
    On 2022-11-08 09:04, James Harris wrote:

    I like the simplicity of the language which would result from your
    suggestions but can't help but think they would make programming in it
    less comfortable, like the simplicity of a hair shirt. ;)

    If you think programmers are dying to write stuff like *p++=*q++; you
    are wrong. Actually that train left the station. Today C++ fun is
    templates. It is monstrous instantiations over instantiations barely resembling program code. Modern times is a glorious combination of
    Python performance with K&R C readability! (:-))

    Contrast

    *p := *q
    p := p + 1
    q := q + 1

    Perhaps

    *p++ := *q++

    expresses that part of the algorithm in a way which is more natural and
    easier to read...?


    --
    James Harris

  • From James Harris@21:1/5 to Bart on Sun Nov 13 19:41:56 2022
    On 07/11/2022 14:23, Bart wrote:
    On 07/11/2022 11:55, James Harris wrote:

    ...

    For unary operators, the evaluation order is rather peculiar yet seems
    to be used in quite a few languages without anyone questioning it. So if
    `a b c d` are unary operators, then the following:

       a b E c d

    is evaluated like this:

          a (b ((E c) d))

    That is, first all the post-fix operators in left-to-right order, then
    all the prefix ones in right-left order. It sounds bizarre when put like that!

    Indeed, right then left does sound unnatural, albeit that it's
    relatively easy to remember. Could it be the comparative simplicity of
    "right then left" is why C has postfix ++ of higher precedence than
    prefix ++, even though that would be an awkward way to order such
    operations?
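    C itself follows the postfix-first rule: *p++ parses as *(p++), so the
    old pointer value is dereferenced and the pointer then advances. A
    minimal sketch making that binding observable:

    ```c
    #include <assert.h>

    /* *p++ parses as *(p++) because postfix ++ binds tighter than
       prefix *: dereference the old pointer value, then advance the
       pointer. Returns the element stepped over. */
    static int deref_then_advance(int **pp) {
        return *(*pp)++;
    }
    ```

    If prefix * bound tighter instead, *p++ would increment the pointed-to
    object rather than the pointer.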


    Taking a step back and considering general expression evaluation I
    have, so far, been defining the apparent order. And I'd like to
    continue with that. So it should be possible to combine multiple ++
    operators arbitrarily. For example,

       ++E + E++


    This is well defined, as unary operators bind more tightly than binary
    ones. This is just (++E) + (++E).

    Isn't it, rather,

    (++E) + (E++)

    ?


    However the evaluation order for '+' is not usually well-defined, so you don't know which operand will be done first.
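    A sketch of how that plays out in C, using functions rather than ++ so
    the two possible orders stay defined behaviour (the hypothetical
    helpers set5 and get make the order observable):

    ```c
    #include <assert.h>

    /* C leaves the evaluation order of +'s operands unspecified.
       The sum below is 5 if set5() is evaluated first and 0 if
       get() is evaluated first. */
    static int counter;
    static int set5(void) { int old = counter; counter = 5; return old; }
    static int get(void)  { return counter; }

    static int order_dependent_sum(void) {
        counter = 0;
        return set5() + get();
    }
    ```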

       ++E++

    This may not work, or not work as expected. The binding using my above
    scheme means this is equivalent to ++(E++).

    Perhaps the scheme is wrong (for some definition of wrong)...?

    IMO (++E)++ makes more sense.


    But the second ++ requires
    an lvalue as input, and yields an rvalue, which would be an invalid
    input to the first ++.

    Better to give ++ (prefix) precedence over ++ (postfix), perhaps.
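    Spelled out with separate statements, the (++E)++ reading would mean
    the following (C rejects both ++e++ and (++e)++, since its ++
    operators yield rvalues, so this is only the hypothetical semantics):

    ```c
    #include <assert.h>

    /* The (++E)++ reading, expanded: increment first, take that
       value as the expression's result, then increment again. */
    static int pre_then_post(int *e) {
        ++*e;             /* prefix step */
        int value = *e;   /* the value the expression yields */
        ++*e;             /* postfix step, applied afterwards */
        return value;
    }
    ```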

    ...

    At any rate, that distinction between prefix and postfix ++ seems to
    be recognised at the following link where it says "Prefix versions of
    the built-in operators return references and postfix versions return
    values."

       https://en.cppreference.com/w/cpp/language/operator_incdec

    I tried to get ++E++ to work using a suitable type for E, but in my
    language it cannot work, as the first ++ still needs an lvalue; just an rvalue which has a pointer type won't cut it.

    However ++E++^ can work, where ^ is deref, and E is a pointer.

    I think this is because in my language, for something to be a valid
    lvalue, you need to be able to apply & address-of to it. The result of
    E++ doesn't have an address. But (E++)^ works because & and ^ cancel
    out. Or something...

    As a guess, could your compiler be making an optimisation (coalescing postincrement and dereference) in a way which contravenes the definition
    of the operators?
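    For comparison, C's closest analogue of ++E++^ is ++*p++, which needs
    no special optimisation: it parses as ++(*(p++)), so the prefix ++
    receives the lvalue *(p++) rather than the rvalue p++ itself. A
    sketch:

    ```c
    #include <assert.h>

    /* ++*p++ parses as ++(*(p++)): increment the object the old
       pointer value designates, while the pointer itself advances. */
    static void inc_target_and_advance(int **pp) {
        ++*(*pp)++;
    }
    ```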


    Setting that aside ... and going back to the query, what should
    be the relative precedences of the three operators? For example, how
    should the following be evaluated?

       ++E++^
       ++E^++

    Or should some ways of combining ^ with either or both of the ++
    operators be prohibited because they make code too difficult to
    understand?!!

    You have the same issues in C, but that's OK because people are so
    familiar with it. Also * deref is a prefix operator so you never have
    two distinct postfix operators, unless you write E++ --.

    But yes, parentheses are recommended when mixing certain prefix/postfix
    ops. I think this one is clear enough however:

       -E^

    Dereference E then negate the result. As is this: -E[i]; you wouldn't
    assume that meant (-E)[i].

    Yes, of the operators you mention I have the precedence order as

    array indexing
    dereference
    unary minus

    Thus my expression parser ought to do the sensible thing in each case.
    The example you give is interesting because IIRC you give all unary
    prefix operators a very high precedence. ISTM that such an approach
    could try to evaluate the above wrongly and that it may be better to
    order precedences by what the operators consume and produce than just to
    boost the precedences of prefix operators. A prime example is boolean
    not. In

    not a > b

    the "a > b" part produces a boolean, which is a natural input to "not".
    It is surely sensible to have all boolean consumers as of lower
    precedence than operations which produce booleans rather than saying
    that "not" has a higher precedence because it is a prefix operator. Your choice, of course, but I know you appreciate a challenge. ;-)
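    C is an example of the alternative: ! is simply given high prefix
    precedence, so !a > b parses as (!a) > b rather than !(a > b), a
    well-known source of bugs. A sketch with values where the two readings
    disagree:

    ```c
    #include <assert.h>

    /* In C, !a > b means (!a) > b: the comparison receives the
       already-negated a. With a = 0, b = -1 the two readings
       give different answers. */
    static int bang_binds_tight(int a, int b) {
        return !a > b;      /* (!a) > b */
    }
    static int probably_meant(int a, int b) {
        return !(a > b);
    }
    ```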


    --
    James Harris

  • From Bart@21:1/5 to Dmitry A. Kazakov on Sun Nov 13 20:49:09 2022
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:
    On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
    On 2022-11-08 09:04, James Harris wrote:

    I like the simplicity of the language which would result from your
    suggestions but can't help but think they would make programming in
    it less comfortable, like the simplicity of a hair shirt. ;)

    If you think programmers are dying to write stuff like *p++=*q++; you
    are wrong. Actually that train left the station. Today C++ fun is
    templates. It is monstrous instantiations over instantiations barely
    resembling program code. Modern times is a glorious combination of
    Python performance with K&R C readability! (:-))

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

       *p++ := *q++

    expresses that part of the algorithm in a way which is more natural
    and easier to read...?

    No algorithm requires you resort to pointers. With arrays it is plain assignment:

       p := q;

    You're assuming this is part of a loop. But perhaps other things are
    happening after each *p++ = *q++: the next source or dest might be
    different, or the next transfer might be of a different type and/or size.

    This is what makes such lower level operations so useful. A solution in
    a higher level but more limiting language might require ingenuity to get
    around the strictness.
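    The classic non-contrived use of the idiom is K&R's string copy, where
    *p++ = *q++ expresses "copy one element and advance both sides" and
    the copied character itself serves as the loop's terminator test (a
    hand-rolled sketch, not the library strcpy):

    ```c
    #include <assert.h>

    /* Copy the NUL-terminated string at src into dst, K&R style:
       each pass copies one char and advances both pointers; the
       loop stops when the copied char is '\0'. */
    static void kr_copy(char *dst, const char *src) {
        while ((*dst++ = *src++) != '\0')
            ;
    }
    ```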

    which BTW could be performed in parallel or by a single instruction on a
    CISC machine or with a bunch of optimizations.

    Well, perhaps this *p++=*q++ is part of the result of such a process,
    where the target happens to be C source code. Few languages higher level
    than ASM are suited for that job.

    But by all means continue to pour scorn on it.

  • From Dmitry A. Kazakov@21:1/5 to Bart on Sun Nov 13 22:14:28 2022
    On 2022-11-13 21:49, Bart wrote:
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:
    On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
    On 2022-11-08 09:04, James Harris wrote:

    I like the simplicity of the language which would result from your
    suggestions but can't help but think they would make programming in
    it less comfortable, like the simplicity of a hair shirt. ;)

    If you think programmers are dying to write stuff like *p++=*q++;
    you are wrong. Actually that train left the station. Today C++ fun
    is templates. It is monstrous instantiations over instantiations
    barely resembling program code. Modern times is a glorious
    combination of Python performance with K&R C readability! (:-))

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

       *p++ := *q++

    expresses that part of the algorithm in a way which is more natural
    and easier to read...?

    No algorithm requires you resort to pointers. With arrays it is plain
    assignment:

        p := q;

    You're assuming this is part of a loop. But perhaps other things are happening after each *p++ = *q++: the next source or dest might be
    different, or the next transfer might be of a different type and/or size.

    This is what makes such lower level operations so useful. A solution in
    a higher level but more limiting language might require ingenuity to get around the strictness.

    which BTW could be performed in parallel or by a single instruction on
    a CISC machine or with a bunch of optimizations.

    Well, perhaps this *p++=*q++ is part of the result of such a process,
    where the target happens to be C source code. Few languages higher level
    than ASM are suited for that job.

    But by all means continue to pour scorn on it.

    Sure, the point was about algorithms. Some artificially constructed
    cases do not count for programmers. This is why there is no need for such low-level languages. If something falls out of effective and safe
    techniques, the program gets redesigned.

    It is called engineering. If you have to design a new type of screw for
    your project, you are a bad engineer.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Sun Nov 13 21:19:11 2022
    On 2022-11-13 20:18, James Harris wrote:
    On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
    On 2022-11-08 09:04, James Harris wrote:

    I like the simplicity of the language which would result from your
    suggestions but can't help but think they would make programming in
    it less comfortable, like the simplicity of a hair shirt. ;)

    If you think programmers are dying to write stuff like *p++=*q++; you
    are wrong. Actually that train left the station. Today C++ fun is
    templates. It is monstrous instantiations over instantiations barely
    resembling program code. Modern times is a glorious combination of
    Python performance with K&R C readability! (:-))

    Contrast

      *p := *q
      p := p + 1
      q := q + 1

    Perhaps

      *p++ := *q++

    expresses that part of the algorithm in a way which is more natural and easier to read...?

    No algorithm requires you resort to pointers. With arrays it is plain assignment:

    p := q;

    which BTW could be performed in parallel or by a single instruction on a
    CISC machine or with a bunch of optimizations. Consider the case when the
    target object requires construction and destruction. Referencing would
    need to construct a new object, destruct target, construct new target,
    destruct copy. In Ada the compiler is allowed to skip the intermediates
    and perform bitwise copy with "adjusting" the result.

    The moral: low-level, overly specified stuff is not only uncomfortable
    and dangerous, it is greatly inefficient in large-scale programming.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From James Harris@21:1/5 to Bart on Sun Nov 13 21:06:34 2022
    On 13/11/2022 20:49, Bart wrote:
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:
    On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
    On 2022-11-08 09:04, James Harris wrote:

    I like the simplicity of the language which would result from your
    suggestions but can't help but think they would make programming in
    it less comfortable, like the simplicity of a hair shirt. ;)

    If you think programmers are dying to write stuff like *p++=*q++;
    you are wrong. Actually that train left the station. Today C++ fun
    is templates. It is monstrous instantiations over instantiations
    barely resembling program code. Modern times is a glorious
    combination of Python performance with K&R C readability! (:-))

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

       *p++ := *q++

    expresses that part of the algorithm in a way which is more natural
    and easier to read...?

    No algorithm requires you resort to pointers.

    This doesn't have to be about pointers, Dmitry. One could just as easily
    have

    a[i++] := b[j++]


    With arrays it is plain
    assignment:

        p := q;

    You're assuming this is part of a loop. But perhaps other things are happening after each *p++ = *q++: the next source or dest might be
    different, or the next transfer might be of a different type and/or size.

    Yes, and less than a whole array may be being copied, or part of an
    array may be being copied over another part of the same array, etc.
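    One concrete algorithm that is not a plain array assignment is the
    merge step of mergesort, where a[k++] = ... b[j++] advances the two
    source indexes at data-dependent rates (a sketch):

    ```c
    /* Merge sorted x[0..nx) and y[0..ny) into out[0..nx+ny).
       The post-incremented indexes i and j advance at rates that
       depend on the data, so no whole-array copy can replace it. */
    static void merge_sorted(const int *x, int nx,
                             const int *y, int ny, int *out) {
        int i = 0, j = 0, k = 0;
        while (i < nx && j < ny)
            out[k++] = (x[i] <= y[j]) ? x[i++] : y[j++];
        while (i < nx)
            out[k++] = x[i++];
        while (j < ny)
            out[k++] = y[j++];
    }
    ```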


    This is what makes such lower level operations so useful. A solution in
    a higher level but more limiting language might require ingenuity to get around the strictness.

    which BTW could be performed in parallel or by a single instruction on
    a CISC machine or with a bunch of optimizations.

    Well, perhaps this *p++=*q++ is part of the result of such a process,
    where the target happens to be C source code. Few languages higher level
    than ASM are suited for that job.

    Compilers are getting good at recognising code which can be turned into intrinsics, though IMO we shouldn't ask too much of them.

    For example, an entire loop to count the bits set in a word may be
    recognised by a compiler and on an x86 target turned into a single
    popcnt instruction but AISI it would be better for the language to
    supply pseudofunctions for operations such as that. Why? Primarily
    because they make the source code easier to read.
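    The contrast in question, sketched: the loop form a compiler may or
    may not pattern-match, versus an explicit call (__builtin_popcount is
    the GCC/Clang intrinsic, standing in here for the kind of
    language-supplied pseudofunction being proposed):

    ```c
    #include <assert.h>

    /* Count set bits the long way; an optimiser may recognise this
       loop and emit a single popcnt instruction, or it may not. */
    static int popcount_loop(unsigned v) {
        int n = 0;
        while (v) {
            n += v & 1u;
            v >>= 1;
        }
        return n;
    }
    ```

    With a pseudofunction, n = popcount(w) says at the source level
    exactly what the loop only implies.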


    --
    James Harris

  • From James Harris@21:1/5 to Dmitry A. Kazakov on Sun Nov 13 22:10:55 2022
    On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
    On 2022-11-13 22:06, James Harris wrote:
    On 13/11/2022 20:49, Bart wrote:
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:
    On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
    On 2022-11-08 09:04, James Harris wrote:

    I like the simplicity of the language which would result from
    your suggestions but can't help but think they would make
    programming in it less comfortable, like the simplicity of a hair shirt. ;)

    If you think programmers are dying to write stuff like *p++=*q++;
    you are wrong. Actually that train left the station. Today C++ fun is templates. It is monstrous instantiations over instantiations
    barely resembling program code. Modern times is a glorious
    combination of Python performance with K&R C readability! (:-))

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

       *p++ := *q++

    expresses that part of the algorithm in a way which is more natural
    and easier to read...?

    No algorithm requires you resort to pointers.

    This doesn't have to be about pointers, Dmitry. One could just as
    easily have

       a[i++] := b[j++]

    Same question. What for?

    What do you mean, "What for?"?


    Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square root,
    post exponentiation, post factorial when the humankind need them so bad?

    I cannot tell what point you are trying to make.


    You remind me of a salesman selling the toaster equipped with a toilet
    brush. Know what? I do not need this combination...

    I think you only need the toilet brush.


    --
    James Harris

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Sun Nov 13 22:29:13 2022
    On 2022-11-13 22:06, James Harris wrote:
    On 13/11/2022 20:49, Bart wrote:
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:
    On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
    On 2022-11-08 09:04, James Harris wrote:

    I like the simplicity of the language which would result from your suggestions but can't help but think they would make programming
    in it less comfortable, like the simplicity of a hair shirt. ;)

    If you think programmers are dying to write stuff like *p++=*q++;
    you are wrong. Actually that train left the station. Today C++ fun
    is templates. It is monstrous instantiations over instantiations
    barely resembling program code. Modern times is a glorious
    combination of Python performance with K&R C readability! (:-))

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

      *p++ := *q++

    expresses that part of the algorithm in a way which is more natural
    and easier to read...?

    No algorithm requires you resort to pointers.

    This doesn't have to be about pointers, Dmitry. One could just as easily
    have

      a[i++] := b[j++]

    Same question. What for?

    Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square root,
    post exponentiation, post factorial when the humankind need them so bad?

    You remind me of a salesman selling the toaster equipped with a toilet
    brush. Know what? I do not need this combination...

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From Bart@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 01:10:59 2022
    On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
    On 2022-11-13 22:06, James Harris wrote:
    On 13/11/2022 20:49, Bart wrote:
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:
    On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
    On 2022-11-08 09:04, James Harris wrote:

    I like the simplicity of the language which would result from
    your suggestions but can't help but think they would make
    programming in it less comfortable, like the simplicity of a hair shirt. ;)

    If you think programmers are dying to write stuff like *p++=*q++;
    you are wrong. Actually that train left the station. Today C++ fun is templates. It is monstrous instantiations over instantiations
    barely resembling program code. Modern times is a glorious
    combination of Python performance with K&R C readability! (:-))

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

       *p++ := *q++

    expresses that part of the algorithm in a way which is more natural
    and easier to read...?

    No algorithm requires you resort to pointers.

    This doesn't have to be about pointers, Dmitry. One could just as
    easily have

       a[i++] := b[j++]

    Same question. What for?

    Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square root,
    post exponentiation, post factorial when the humankind need them so bad?

    You remind me of a salesman selling the toaster equipped with a toilet
    brush. Know what? I do not need this combination...

    But I do!

    My largest program uses ++ nearly 1000 times (all varieties).

    It would use expexp, sqrtsqrt or !! exactly zero times each. So it would
    be very poor language design, and pointless.

    There is also no prior art; no one would have a clue what it meant.
    Neither do I; I think your sqrtsqrt means:

    (t:=x; x:=sqrt(x); t)

    But this is easy enough to implement within usercode; it doesn't need a dedicated language feature, as it's not something that a decent
    percentage of a language's community would commonly use.
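    Bart's expansion (t:=x; x:=sqrt(x); t) really is only a few lines of
    user code; a C sketch under the hypothetical name post_sqrt:

    ```c
    #include <math.h>

    /* Hypothetical "postfix sqrt": return the old value of *x and
       leave sqrt of it behind -- the same old-value/new-value split
       that postfix ++ has. */
    static double post_sqrt(double *x) {
        double old = *x;
        *x = sqrt(old);
        return old;
    }
    ```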

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 14 08:52:21 2022
    On 2022-11-13 23:10, James Harris wrote:
    On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
    On 2022-11-13 22:06, James Harris wrote:
    On 13/11/2022 20:49, Bart wrote:
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:
    On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
    On 2022-11-08 09:04, James Harris wrote:

    I like the simplicity of the language which would result from
    your suggestions but can't help but think they would make
    programming in it less comfortable, like the simplicity of a
    hair shirt. ;)

    If you think programmers are dying to write stuff like *p++=*q++; you are wrong. Actually that train left the station. Today C++
    fun is templates. It is monstrous instantiations over
    instantiations barely resembling program code. Modern times is a glorious combination of Python performance with K&R C
    readability! (:-))

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

       *p++ := *q++

    expresses that part of the algorithm in a way which is more
    natural and easier to read...?

    No algorithm requires you resort to pointers.

    This doesn't have to be about pointers, Dmitry. One could just as
    easily have

       a[i++] := b[j++]

    Same question. What for?

    What do you mean, "What for?"?

    Show me the algorithm.
    Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
    root, post exponentiation, post factorial when the humankind need them
    so bad?

    I cannot tell what point you are trying to make.

    There exists an infinite number of combinations that could be operators.
    E.g. divide by two and reboot the computer.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From David Brown@21:1/5 to James Harris on Mon Nov 14 10:26:46 2022
    On 13/11/2022 18:11, James Harris wrote:
    On 08/11/2022 16:29, David Brown wrote:
    On 08/11/2022 14:24, James Harris wrote:

    ...

    You can't stop everything - those evil programmers have better
    imaginations than any well-meaning language designer.  But you can
    try.   Aim to make it harder to write convoluted code, and easier to
    write clearer code.  And try to make the clearer code more efficient,
    to reduce the temptation to write evil code.

    I agree

    ...

    In a similar way, programming can be hard when other programmers
    write constructs we don't like. I agree that it's best for a language
    to help programmers write readable and comprehensible programs - and
    even to make them the easiest to write, if possible - but the very
    flexibility which may allow them to do so may also give them the
    freedom to write code we don't care for. I don't think one can
    legislate against that.


    I'm not sure it is the same - after all, if some one exercises their
    rights to speak gibberish, or to give long, convoluted and
    incomprehensible speeches, the listener has the right to go away,
    ignore them, or fall asleep.  It's harder for a compiler to do that!

    I'm not worried about the compiler. As long as it can make sense of a
    piece of code then it can compile it. It's the human I am concerned
    about, especially the poor old maintenance programmer!


    You /say/ that, but you don't appear to believe it or be interested in
    making it happen.

    On the one side, you claim you want a clear language that is
    understandable for programmers and maintenance. On the other side, you
    want to decide what "++E++" should mean, with random "^" characters
    thrown in for good measure.

    These two statements go together as well as Dmitry's toaster and toilet
    brush. It doesn't matter how precisely you define how the combination
    can be used and what it does, it is still not a good or useful thing.

  • From Stefan Ram@21:1/5 to James Harris on Mon Nov 14 10:24:18 2022
    James Harris <james.harris.1@gmail.com> writes:
    Rather than allowing non-ASCII in source I came up with a scheme of what
    you might call 'named characters' extending the backslash idea of C to
    allow names instead of single characters after the backslash. It's off
    topic for this thread but it allows non-ASCII characters to be named
    (such that the names consist of ASCII characters and would thus be
    readable and universal).

    Here's an example of a Python program.

    print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )

    It prints:

    \"

    . But in Python one could also write

    print( r'\"' )

    to get the same output.

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 14 11:32:46 2022
    On 2022-11-14 11:16, James Harris wrote:
    On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
    On 2022-11-13 23:10, James Harris wrote:
    On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
    On 2022-11-13 22:06, James Harris wrote:
    On 13/11/2022 20:49, Bart wrote:
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:

    ...

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

       *p++ := *q++

    expresses that part of the algorithm in a way which is more
    natural and easier to read...?

    No algorithm requires you resort to pointers.

    This doesn't have to be about pointers, Dmitry. One could just as
    easily have

       a[i++] := b[j++]

    Same question. What for?

    What do you mean, "What for?"?

    Show me the algorithm.

    There's no particular algorithm; the construct is a potential component
    of many algorithms.

    Show me one that is not array assignment.

    It's as though someone points out a brick and
    someone else says "what does it mean?"

    No, it is like building up a brick factory in your back garden for the
    purpose of cracking walnuts...

    Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
    root, post exponentiation, post factorial when the humankind need
    them so bad?

    I cannot tell what point you are trying to make.

    There exist an infinite number of combinations that could be
    operators. E.g. divide by two and reboot the computer.

    OK. ATM I have a generous **but limited** number of operators.

    Limited by which criteria?

    Again, IMO it's important for the language to provide such
    pseudofunctions so that a programmer's code can be made clearer,
    simpler, and more readable.

    Like this one?

    ++p+++

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From Bart@21:1/5 to David Brown on Mon Nov 14 10:47:29 2022
    On 14/11/2022 09:26, David Brown wrote:
    On 13/11/2022 18:11, James Harris wrote:
    On 08/11/2022 16:29, David Brown wrote:

    I'm not worried about the compiler. As long as it can make sense of a
    piece of code then it can compile it. It's the human I am concerned
    about, especially the poor old maintenance programmer!


    You /say/ that, but you don't appear to believe it or be interested in
    making it happen.

    On the one side, you claim you want a clear language that is
    understandable for programmers and maintenance.  On the other side, you
    want to decide what "++E++" should mean, with random "^" characters
    thrown in for good measure.

    In-place, value-returning increment ops written as ++ and -- are common
    in languages.

    So are pointer-dereference operators in lower-level languages, whether
    written as * or ^.

    Once you have those two possibilities in a language, why shouldn't you
    define what combinations of those operators might mean?

    (I just differ from James in thinking that successive *value-returning*
    ++ or -- operators, whether prefix or postfix, are not meaningful. I'd
    also think it would be bad form to chain them, but it is not practical
    to ban them at the syntax level.

    However, I have sometimes banned even `a+b` in some contexts, when the resulting value is unused.)

    Is your point that you shouldn't have either of those operators? ++ and
    -- can be replaced at some inconvenience. But getting rid of dereference
    is harder; if P is a pointer:

    print P

    will this display the value of the pointer, or the value of its target?
    If only there was a way to specify that precisely!

    Note that when p and q are byte pointers, then *p++ = *q++ (or p++^ :=
    q++^) corresponds to the one-byte Z80 LDI instruction.

    So it's something so meaningless that that tiny 8-bit processor decided
    to give it its own instruction.

  • From James Harris@21:1/5 to David Brown on Mon Nov 14 10:44:12 2022
    On 14/11/2022 09:26, David Brown wrote:
    On 13/11/2022 18:11, James Harris wrote:
    On 08/11/2022 16:29, David Brown wrote:

    ...

    You can't stop everything - those evil programmers have better
    imaginations than any well-meaning language designer.  But you can
    try.   Aim to make it harder to write convoluted code, and easier to
    write clearer code.  And try to make the clearer code more efficient,
    to reduce the temptation to write evil code.

    I agree

    ...

    I'm not worried about the compiler. As long as it can make sense of a
    piece of code then it can compile it. It's the human I am concerned
    about, especially the poor old maintenance programmer!


    You /say/ that, but you don't appear to believe it or be interested in
    making it happen.

    That comment surprises me a little. /The main point/ of this, AISI, is
    to make the job of the programmer simpler and to help him write code
    which is more readable. You said yourself that (paraphrasing) when
    there's a choice between clear and convoluted code it's important for a language to make the clearer code the easier one to write.


    On the one side, you claim you want a clear language that is
    understandable for programmers and maintenance.

    Yes.


    On the other side, you
    want to decide what "++E++" should mean, with random "^" characters
    thrown in for good measure.

    Not quite. As language designer I have to decide what facilities will be provided but I do not have absolute control over what a programmer may
    do with the facilities. Nor would a programmer want to work with a
    language which implemented unnecessary rules or rules which he may see
    as arbitrary.

    The expression you mention is just one of a myriad of what you might
    consider to be potential nasties. If I am going to prohibit that one
    then what about all the others?



    These two statements go together as well as Dmitry's toaster and toilet brush.  It doesn't matter how precisely you define how the combination
    can be used and what it does, it is still not a good or useful thing.

    OK, let's take the combination you mentioned:

    ++E++

    I wonder why you see a problem with it. As I see it, it increments E
    before evaluation and then increments E after evaluation. What is so
    complex about that? It does exactly what it says on the tin, and in the
    order that it says it. Remember that unlike C I define the apparent
    order of evaluation so the expression is perfectly well formed.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 11:00:02 2022
    On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
    On 2022-11-14 11:16, James Harris wrote:

       a[i++] := b[j++]

    Same question. What for?

    What do you mean, "What for?"?

    Show me the algorithm.

    There's no particular algorithm; the construct is a potential
    component of many algorithms.

    Show me one that is not array assignment.

    What exactly is your objection: that there shouldn't be an increment
    operator at all?

    Then that's the end of the discussion. But if you allow a value-returning
    increment operator, then someone could use it in multiple places in the
    same expression, together with other operators, and the language has to
    be able to deal with it.

    Note that any language that has reference parameters would allow this:

    a[postincr(i)] := b[postincr(j)]

    Or is that something else you're not keen on?
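
    For concreteness, Bart's point can be sketched in C++; the helper name
    `postincr` and the arrays are illustrative, not from any real library:

    ```cpp
    #include <cassert>

    // A hypothetical helper mimicking post-increment through a reference
    // parameter, as the post suggests any language with by-reference
    // parameters could express.
    int postincr(int &v) {
        int old = v;   // capture the current value
        v += 1;        // then bump the variable
        return old;    // the expression yields the *old* value
    }

    int main() {
        int a[4] = {0, 0, 0, 0};
        int b[4] = {7, 8, 9, 10};
        int i = 0, j = 1;
        a[postincr(i)] = b[postincr(j)];  // copies b[1] into a[0]
        assert(a[0] == 8 && i == 1 && j == 2);
        return 0;
    }
    ```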

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Mon Nov 14 11:11:57 2022
    On 14/11/2022 10:44, James Harris wrote:
    On 14/11/2022 09:26, David Brown wrote:
    On 13/11/2022 18:11, James Harris wrote:
    On 08/11/2022 16:29, David Brown wrote:

    ...

    You can't stop everything - those evil programmers have better
    imaginations than any well-meaning language designer.  But you can
    try.   Aim to make it harder to write convoluted code, and easier to
    write clearer code.  And try to make the clearer code more
    efficient, to reduce the temptation to write evil code.

    I agree

    ...

    I'm not worried about the compiler. As long as it can make sense of a
    piece of code then it can compile it. It's the human I am concerned
    about, especially the poor old maintenance programmer!


    You /say/ that, but you don't appear to believe it or be interested in
    making it happen.

    That comment surprises me a little. /The main point/ of this, AISI, is
    to make the job of the programmer simpler and to help him write code
    which is more readable. You said yourself that (paraphrasing) when
    there's a choice between clear and convoluted code it's important for a language to make the clearer code the easier one to write.


    On the one side, you claim you want a clear language that is
    understandable for programmers and maintenance.

    Yes.


    On the other side, you want to decide what "++E++" should mean, with
    random "^" characters thrown in for good measure.

    Not quite. As language designer I have to decide what facilities will be provided but I do not have absolute control over what a programmer may
    do with the facilities. Nor would a programmer want to work with a
    language which implemented unnecessary rules or rules which he may see
    as arbitrary.

    The expression you mention is just one of a myriad of what you might
    consider to be potential nasties. If I am going to prohibit that one
    then what about all the others?



    These two statements go together as well as Dmitry's toaster and
    toilet brush.  It doesn't matter how precisely you define how the
    combination can be used and what it does, it is still not a good or
    useful thing.

    OK, let's take the combination you mentioned:

      ++E++

    I wonder why you see a problem with it. As I see it, it increments E
    before evaluation and then increments E after evaluation. What is so
    complex about that? It does exactly what it says on the tin, and in the
    order that it says it. Remember that unlike C I define the apparent
    order of evaluation so the expression is perfectly well formed.

    But does this have the same priorities as:

    op1 E op2

    (where op2 is commonly done first) or does it have special rules, so
    that in:

    --E++

    the -- is done first? If it's different, then what is the ordering when
    mixed with other unary ops?

    You explained somewhere the circumstances where you think this is
    meaningful, but I can't remember what the rules are and I can't find the
    exact post.

    This is the problem. You shouldn't need to stop and think. I make the
    rules simple by stipulating that value-returning ++ and -- only ever
    return rvalues.

    Because if they ever start to return lvalues, then this becomes possible:

    ++E := 0
    E++ := 0

    (Whichever one is legal in your scheme.) So I think there is little
    useful expressivity to be gained.
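
    C++ happens to be a language where the built-in prefix ++ does yield an
    lvalue, so the first of those two forms is actually legal there. A small
    sketch (well-defined since C++11, though hardly advisable style):

    ```cpp
    #include <cassert>

    int main() {
        int e = 1;
        ++e = 0;        // legal C++: built-in prefix ++ yields an lvalue,
                        // so the increment is immediately overwritten
        assert(e == 0);
        // e++ = 0;     // ill-formed: built-in postfix ++ yields an rvalue
        return 0;
    }
    ```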

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to Bart on Mon Nov 14 12:12:52 2022
    On 2022-11-14 12:00, Bart wrote:
    On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
    On 2022-11-14 11:16, James Harris wrote:

       a[i++] := b[j++]

    Same question. What for?

    What do you mean, "What for?"?

    Show me the algorithm.

    There's no particular algorithm; the construct is a potential
    component of many algorithms.

    Show me one that is not array assignment.

    What exactly is your objection: that there shouldn't be an increment
    operator at all?

    There is no objection because, so far, there has been no argument.
    Note that any language that has reference parameters would allow this:

       a[postincr(i)] := b[postincr(j)]

    Or is that something else you're not keen on?

    That language shall not have by-value or by-reference objects unless
    explicitly contracted by the programmer. Forcing by-reference on scalars
    is the best way to turn a modern processor into an i386.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 11:03:57 2022
    On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
    On 2022-11-14 11:16, James Harris wrote:
    On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
    On 2022-11-13 23:10, James Harris wrote:
    On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
    On 2022-11-13 22:06, James Harris wrote:
    On 13/11/2022 20:49, Bart wrote:
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:

    ...

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

   *p++ := *q++

    expresses that part of the algorithm in a way which is more
    natural and easier to read...?

    No algorithm requires you resort to pointers.

    This doesn't have to be about pointers, Dmitry. One could just as
    easily have

       a[i++] := b[j++]

    Same question. What for?

    What do you mean, "What for?"?

    Show me the algorithm.

    There's no particular algorithm; the construct is a potential
    component of many algorithms.

    Show me one that is not array assignment.

      if is_name_first(b[j])
        a[i++] = b[j++]
        rep while is_name_follow(b[j])
          a[i++] = b[j++]
        end rep
        a[i] = 0
        return TOK_NAME
      end if

    Now, what don't you like about the ++ operators in that? How would you
    prefer to write it?


    It's as though someone points out a brick and someone else says "what
    does it mean?"

    No, it is like building up a brick factory in your back garden for the purpose of cracking walnuts...

    Says the man who likes Ada! ;-)

    Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
    root, post exponentiation, post factorial when the humankind need
    them so bad?

    I cannot tell what point you are trying to make.

    There exist an infinite number of combinations that could be
    operators. E.g. divide by two and reboot the computer.

    OK. ATM I have a generous **but limited** number of operators.

    Limited by which criteria?

    The set of operators is limited to what's reasonably necessary such as
    the usual stuff: function calls, array references, field selection,
    bitwise operations, arithmetic, comparison, boolean and assignment. Most
    are present in C; only a few are not such as bitwise combinations (e.g.
    nand and nor) and these two: concatenate and boolean (aka logical) xor.
    What's so bad about that?


    Again, IMO it's important for the language to provide such
    pseudofunctions so that a programmer's code can be made clearer,
    simpler, and more readable.

    Like this one?

       ++p+++

    I don't have a +++ operator so I am not sure what that is supposed to
    mean. It's not valid in my language.



    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 10:16:13 2022
    On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
    On 2022-11-13 23:10, James Harris wrote:
    On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
    On 2022-11-13 22:06, James Harris wrote:
    On 13/11/2022 20:49, Bart wrote:
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:

    ...

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

   *p++ := *q++

    expresses that part of the algorithm in a way which is more
    natural and easier to read...?

    No algorithm requires you resort to pointers.

    This doesn't have to be about pointers, Dmitry. One could just as
    easily have

       a[i++] := b[j++]

    Same question. What for?

    What do you mean, "What for?"?

    Show me the algorithm.

    There's no particular algorithm; the construct is a potential component
    of many algorithms. It's as though someone points out a brick and
    someone else says "what does it mean?" or "show me the house that would
    be built from it before I can judge whether a brick is a good idea or
    not". It's a potential component, nothing more.


    Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
    root, post exponentiation, post factorial when the humankind need
    them so bad?

    I cannot tell what point you are trying to make.

    There exist an infinite number of combinations that could be operators.
    E.g. divide by two and reboot the computer.

    OK. ATM I have a generous **but limited** number of operators. For
    anything else the programmer would have to call a function. For example,
    I mentioned here recently that some compilers are extensive enough that
    they can recognise a loop which counts the number of bits set in a word.
    I would not have that as an operator but as a function (or
    pseudofunction). It might be invoked on c as

    x := a + b + bitcount(c) + d

    Again, IMO it's important for the language to provide such
    pseudofunctions so that a programmer's code can be made clearer,
    simpler, and more readable.
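
    As an illustration, a loop of the kind such compilers recognise might
    look like this in C++; the helper name `bitcount` follows the post, and
    Kernighan's clear-lowest-bit trick is just one common shape of the
    pattern:

    ```cpp
    #include <cassert>
    #include <cstdint>

    // A plain loop that counts set bits; modern compilers (e.g. GCC/Clang
    // at -O2) often recognise this pattern and emit a single popcount
    // instruction where the target supports one.
    unsigned bitcount(std::uint32_t c) {
        unsigned n = 0;
        while (c != 0) {
            c &= c - 1;  // clear the lowest set bit
            ++n;
        }
        return n;
    }

    int main() {
        assert(bitcount(0) == 0);
        assert(bitcount(0xFFu) == 8);
        assert(bitcount(0x80000001u) == 2);
        return 0;
    }
    ```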


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Mon Nov 14 11:06:50 2022
    On 13/11/2022 16:55, James Harris wrote:
    On 10/11/2022 10:42, Bart wrote:

    It's clear that ++(A+B) can't work unless you change what ++ means
    (eg. ++A now means (A+1) because whatever ++ modifies is not accessible).

    It's similar if A were a struct. You could have

      A.F

    but you could not have

      (A + 4).F

    The LHS of the . operation, in this case, has to be an lvalue.

    That's not right. C allows functions to return structs; they are not
    lvalues, but you can apply ".":

    typedef struct { int x, y; } Point;

    Point F(void) {
        Point p = {0, 0};   /* initialised, to avoid returning an
                               indeterminate value */
        return p;
    }

    int main(void) {
        Point p;
        int a;

        // F() = p;   // Not valid: a function call is not an lvalue
        a = F().x;    // Valid: "." applies to the returned rvalue
        (void)p; (void)a;
    }

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 14 12:29:10 2022
    On 2022-11-14 12:03, James Harris wrote:
    On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
    On 2022-11-14 11:16, James Harris wrote:
    On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
    On 2022-11-13 23:10, James Harris wrote:
    On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
    On 2022-11-13 22:06, James Harris wrote:
    On 13/11/2022 20:49, Bart wrote:
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:

    ...

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

   *p++ := *q++

    expresses that part of the algorithm in a way which is more
    natural and easier to read...?

    No algorithm requires you resort to pointers.

    This doesn't have to be about pointers, Dmitry. One could just as
    easily have

       a[i++] := b[j++]

    Same question. What for?

    What do you mean, "What for?"?

    Show me the algorithm.

    There's no particular algorithm; the construct is a potential
    component of many algorithms.

    Show me one that is not array assignment.

      if is_name_first(b[j])
        a[i++] = b[j++]
        rep while is_name_follow(b[j])
          a[i++] = b[j++]
        end rep
        a[i] = 0
        return TOK_NAME
      end if

    Now, what don't you like about the ++ operators in that? How would you
    prefer to write it?

    From parser production code:

    procedure Get_Identifier
              (  Code     : in out Source'Class;
                 Line     : String;
                 Pointer  : Integer;
                 Argument : out Tokens.Argument_Token
              )  is
       Index     : Integer := Pointer + 1;
       Malformed : Boolean := False;
       Underline : Boolean := False;
       Symbol    : Character;
    begin
       while Index <= Line'Last loop
          Symbol := Line (Index);
          if Is_Alphanumeric (Symbol) then
             Underline := False;
          elsif '_' = Symbol then
             Malformed := Malformed or Underline;
             Underline := True;
          else
             exit;
          end if;
          Index := Index + 1;
       end loop;
       Malformed := Malformed or Underline;
       Set_Pointer (Code, Index);
       Argument.Location := Link (Code);
       Argument.Value := new Identifier (Index - Pointer);
       declare
          This : Identifier renames Identifier (Argument.Value.all);
       begin
          This.Location  := Argument.Location;
          This.Malformed := Malformed;
          This.Value     := Line (Pointer..Index - 1);
       end;
    end Get_Identifier;

    It's as though someone points out a brick and someone else says "what
    does it mean?"

    No, it is like building up a brick factory in your back garden for the
    purpose of cracking walnuts...

    Says the man who likes Ada! ;-)

    Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
    root, post exponentiation, post factorial when the humankind need
    them so bad?

    I cannot tell what point you are trying to make.

    There exist an infinite number of combinations that could be
    operators. E.g. divide by two and reboot the computer.

    OK. ATM I have a generous **but limited** number of operators.

    Limited by which criteria?

    The set of operators is limited to what's reasonably necessary such as
    the usual stuff: function calls, array references, field selection,
    bitwise operations, arithmetic, comparison, boolean and assignment. Most
    are present in C; only a few are not such as bitwise combinations (e.g.
    nand and nor) and these two: concatenate and boolean (aka logical) xor. What's so bad about that?

    You came to a conclusion without putting any objective criteria upfront.

    Again, IMO it's important for the language to provide such
    pseudofunctions so that a programmer's code can be made clearer,
    simpler, and more readable.

    Like this one?

        ++p+++

    I don't have a +++ operator so I am not sure what that is supposed to
    mean. It's not valid in my language.

    It could be part of something even simpler and brilliantly readable:

    ++p+++q+++r

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 12:45:25 2022
    On 14/11/2022 11:29, Dmitry A. Kazakov wrote:
    On 2022-11-14 12:03, James Harris wrote:
    On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
    On 2022-11-14 11:16, James Harris wrote:
    On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
    On 2022-11-13 23:10, James Harris wrote:
    On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
    On 2022-11-13 22:06, James Harris wrote:
    On 13/11/2022 20:49, Bart wrote:
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:

    ...

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

   *p++ := *q++

    expresses that part of the algorithm in a way which is more
    natural and easier to read...?

    No algorithm requires you resort to pointers.

    This doesn't have to be about pointers, Dmitry. One could just
    as easily have

       a[i++] := b[j++]

    Same question. What for?

    What do you mean, "What for?"?

    Show me the algorithm.

    There's no particular algorithm; the construct is a potential
    component of many algorithms.

    Show me one that is not array assignment.

       if is_name_first(b[j])
         a[i++] = b[j++]
         rep while is_name_follow(b[j])
           a[i++] = b[j++]
         end rep
         a[i] = 0
         return TOK_NAME
       end if

    Now, what don't you like about the ++ operators in that? How would you
    prefer to write it?

    From parser production code:

    procedure Get_Identifier
              (  Code     : in out Source'Class;
                 Line     : String;
                 Pointer  : Integer;
                 Argument : out Tokens.Argument_Token
              )  is
       Index     : Integer := Pointer + 1;
       Malformed : Boolean := False;
       Underline : Boolean := False;
       Symbol    : Character;
    begin
       while Index <= Line'Last loop
          Symbol := Line (Index);
          if Is_Alphanumeric (Symbol) then
             Underline := False;
          elsif '_' = Symbol then
             Malformed := Malformed or Underline;
             Underline := True;
          else
             exit;
          end if;
          Index := Index + 1;
       end loop;
       Malformed := Malformed or Underline;
       Set_Pointer (Code, Index);
       Argument.Location := Link (Code);
       Argument.Value := new Identifier (Index - Pointer);
       declare
          This : Identifier renames Identifier (Argument.Value.all);
       begin
          This.Location  := Argument.Location;
          This.Malformed := Malformed;
          This.Value     := Line (Pointer..Index - 1);
       end;
    end Get_Identifier;

    Clearly you get paid by the line. Even then, the code where a substring
    is copied into another location, which would require the double-stepping
    of the relevant pointer/indices of the earlier example, is missing here.
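
    The double-stepping being referred to is the classic C idiom in which a
    single expression copies one character and advances both pointers; a
    minimal sketch (the data here is illustrative):

    ```cpp
    #include <cassert>
    #include <cstring>

    int main() {
        // Copy the leading word of src into dst, advancing both
        // pointers in one expression per iteration.
        const char src[] = "token rest";
        char dst[16];
        const char *q = src;
        char *p = dst;
        while (*q != ' ' && *q != '\0')
            *p++ = *q++;   // copy one char, then bump both pointers
        *p = '\0';
        assert(std::strcmp(dst, "token") == 0);
        return 0;
    }
    ```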

    I don't have a +++ operator so I am not sure what that is supposed to
    mean. It's not valid in my language.

    It could be part of something even simpler and brilliantly readable:

      ++p+++q+++r

    This is legal in Ada:

    a := +b;

    But for some reason, you can't write ++b or + +b, it has to be `a := +
    (+b)`. So while you can't do +b++++c, you can write:

    a:=+b+(+(+(+(c))));

    Do the parentheses make this acceptable? I guess not, since you won't
    like those ++ ops no matter how many brackets are added.

    My point is that you can legally combine any number of operators to
    result in nonsense-looking code, and the language can do little about it.
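
    For what it's worth, C and C++ already permit such pile-ups; the
    following compiles and has a single well-defined reading under
    maximal-munch tokenisation:

    ```cpp
    #include <cassert>

    int main() {
        int b = 5, c = 3, a;
        a = - - + - b;  // three stacked unary signs: a == -5
        assert(a == -5);
        a = b--- -c;    // tokenises as (b--) - (-c): 5 - (-3) == 8
        assert(a == 8 && b == 4);
        return 0;
    }
    ```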

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to Bart on Mon Nov 14 14:17:29 2022
    On 2022-11-14 13:45, Bart wrote:
    On 14/11/2022 11:29, Dmitry A. Kazakov wrote:
    On 2022-11-14 12:03, James Harris wrote:
    On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
    On 2022-11-14 11:16, James Harris wrote:
    On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
    On 2022-11-13 23:10, James Harris wrote:
    On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
    On 2022-11-13 22:06, James Harris wrote:
    On 13/11/2022 20:49, Bart wrote:
    On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
    On 2022-11-13 20:18, James Harris wrote:

    ...

    Contrast

       *p := *q
       p := p + 1
       q := q + 1

    Perhaps

   *p++ := *q++

    expresses that part of the algorithm in a way which is more
    natural and easier to read...?

    No algorithm requires you resort to pointers.

    This doesn't have to be about pointers, Dmitry. One could just
    as easily have

       a[i++] := b[j++]

    Same question. What for?

    What do you mean, "What for?"?

    Show me the algorithm.

    There's no particular algorithm; the construct is a potential
    component of many algorithms.

    Show me one that is not array assignment.

       if is_name_first(b[j])
         a[i++] = b[j++]
         rep while is_name_follow(b[j])
           a[i++] = b[j++]
         end rep
         a[i] = 0
         return TOK_NAME
       end if

    Now, what don't you like about the ++ operators in that? How would
    you prefer to write it?

     From parser production code:

    procedure Get_Identifier
               (  Code     : in out Source'Class;
                  Line     : String;
                  Pointer  : Integer;
                  Argument : out Tokens.Argument_Token
               )  is
        Index     : Integer := Pointer + 1;
        Malformed : Boolean := False;
        Underline : Boolean := False;
        Symbol    : Character;
    begin
        while Index <= Line'Last loop
           Symbol := Line (Index);
           if Is_Alphanumeric (Symbol) then
              Underline := False;
           elsif '_' = Symbol then
              Malformed := Malformed or Underline;
              Underline := True;
           else
              exit;
           end if;
           Index := Index + 1;
        end loop;
        Malformed := Malformed or Underline;
        Set_Pointer (Code, Index);
        Argument.Location := Link (Code);
        Argument.Value := new Identifier (Index - Pointer);
        declare
           This : Identifier renames Identifier (Argument.Value.all);
        begin
           This.Location  := Argument.Location;
           This.Malformed := Malformed;
           This.Value     := Line (Pointer..Index - 1);
        end;
    end Get_Identifier;

    Clearly you get paid by the line. Even then, the code where a substring
    is copied into another location, which would require the double-stepping
    of the relevant pointer/indices of the earlier example, is missing here.

    It is there:

    This.Value := Line (Pointer..Index - 1);

    assigning array as a whole.

    I don't have a +++ operator so I am not sure what that is supposed to
    mean. It's not valid in my language.

    It could be part of something even simpler and brilliantly readable:

       ++p+++q+++r

    This is legal in Ada:

         a := +b;

    But for some reason, you can't write ++b or + +b, it has to be `a := +
    (+b)`. So while you can't do +b++++c, you can write:

         a:=+b+(+(+(+(c))));

    Do the parentheses make this acceptable?

    I don't understand the point. Why would you like to have two unary
    pluses in a row?

    My point is that you can legally combine any number of operators to
    result in nonsense-looking code, and the language can do little about it.

    No, the point is that no reasonable code should be nonsense-looking,
    and conversely no nonsense-looking code should be reasonable.

    Increments cross that line. They were perfectly acceptable in K&R C. C
    was a very large and quite complex language then. It was beautiful
    compared to FORTRAN IV, but I could not use it on a 64K machine. A
    5-pass C compiler took an eternity. Machines then were small and simple.
    Programs were tiny. *++p was reasonably complex code. When looking at a
    program containing it, it was perfectly clear what it did and why it was
    there. That was 40 years ago. Now, even a microcontroller is far more
    complex and powerful and programmed in a very different manner. We just
    do not need ++ anymore, anywhere.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to James Harris on Mon Nov 14 15:47:25 2022
    On 14/11/2022 11:44, James Harris wrote:
    On 14/11/2022 09:26, David Brown wrote:
    On 13/11/2022 18:11, James Harris wrote:
    On 08/11/2022 16:29, David Brown wrote:

    ...

    You can't stop everything - those evil programmers have better
    imaginations than any well-meaning language designer.  But you can
    try.   Aim to make it harder to write convoluted code, and easier to
    write clearer code.  And try to make the clearer code more
    efficient, to reduce the temptation to write evil code.

    I agree

    ...

    I'm not worried about the compiler. As long as it can make sense of a
    piece of code then it can compile it. It's the human I am concerned
    about, especially the poor old maintenance programmer!


    You /say/ that, but you don't appear to believe it or be interested in
    making it happen.

    That comment surprises me a little.

    It shouldn't, given what you have written and what /I/ have written in response.

    /The main point/ of this, AISI, is
    to make the job of the programmer simpler and to help him write code
    which is more readable. You said yourself that (paraphrasing) when
    there's a choice between clear and convoluted code it's important for a language to make the clearer code the easier one to write.


    We agree on this as a major point of a language.

    What we disagree on is whether attempting to assign meaning to
    random-looking collections of punctuation characters furthers that goal.


    On the one side, you claim you want a clear language that is
    understandable for programmers and maintenance.

    Yes.


    On the other side, you want to decide what "++E++" should mean, with
    random "^" characters thrown in for good measure.

    Not quite. As language designer I have to decide what facilities will be provided but I do not have absolute control over what a programmer may
    do with the facilities.

    Agreed.

    Nor would a programmer want to work with a
    language which implemented unnecessary rules or rules which he may see
    as arbitrary.

    Disagreed.

    Barring the Barts of the world, /every/ programmer works with languages
    that contain rules that he/she thinks are unnecessary, arbitrary, or at
    the very least, sub-optimal. I have never heard of anyone who thinks
    the language they use is perfect in every way.

    Your challenge is to make a language that makes it easy to write good
    code, and hard to write bad code. That second part is at least as
    important, arguably significantly more important, than the first part.
    It is much harder to achieve in language design. (And it is impossible
    to achieve completely - it is /always/ possible to write bad code.)


    The expression you mention is just one of a myriad of what you might
    consider to be potential nasties. If I am going to prohibit that one
    then what about all the others?


    Prohibit nasty ones.

    A big step in that direction is to say that assignment is a statement,
    not an expression, and that variables cannot be changed by side-effects.
    (How you relate this to function calls is a related and complex issue
    that I have been glossing over here. An idea would be to distinguish
    between "procedures" that may have side effects, and "functions" that do
    not.)

    That means there is no such thing as an "increment" operator - post or pre.

    It also /hugely/ simplifies the language - both for the programmer, and
    for the implementer. If expressions have no side-effects, they can be duplicated, split up, re-arranged, moved around in code, all without
    affecting the behaviour of the program.
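
    A tiny C++ sketch of why this matters: with a pure (side-effect-free)
    function, repeated subexpressions can be hoisted or duplicated without
    changing the result, something no compiler can safely do once
    expressions mutate state. The function `square` is illustrative:

    ```cpp
    #include <cassert>

    // A pure function: same inputs, same output, no side effects.
    int square(int x) { return x * x; }

    int main() {
        int x = 3;
        int a = square(x) + square(x);  // naive form
        int t = square(x);              // hoisted common subexpression
        int b = 2 * t;                  // equivalent rewrite
        assert(a == b);                 // rearrangement preserved behaviour
        return 0;
    }
    ```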


    /You/ are responsible for your language, and you /can/ make arbitrary
    decisions and limitations. (My suggestions have not been arbitrary -
    they have been backed up by reasoning, even if opinions on them differ.)
    If you want to say that functions with a cyclomatic complexity over 20
    cause a warning and those over 30 cause a hard error, that's /your/
    decision. Some programmers will complain that they can't translate
    their old C/Pascal/Fortran/APL/whatever code directly into your
    language. Other programmers will be glad because the code they are
    faced with is guaranteed not to have big, incomprehensible functions.



    These two statements go together as well as Dmitry's toaster and
    toilet brush.  It doesn't matter how precisely you define how the
    combination can be used and what it does, it is still not a good or
    useful thing.

    OK, let's take the combination you mentioned:

      ++E++

    I wonder why you see a problem with it. As I see it, it increments E
    before evaluation and then increments E after evaluation. What is so
    complex about that? It does exactly what it says on the tin, and in the
    order that it says it. Remember that unlike C I define the apparent
    order of evaluation so the expression is perfectly well formed.


    The very fact that you are discussing how to define it means it is not
    clear and obvious. It is not obvious which order the increments happen,
    or if the order is defined, or if the order matters. It is not obvious
    what the return value should be. It is not obvious where you have
    lvalues or rvalues (not that a language should necessarily have such
    concepts). It is not obvious what happens to E.

    The only thing that /is/ obvious about it, is that any use of it in real
    code would be an extraordinarily bad idea.

    Thus for a language targeting humans (some languages are for
    intermediate code, or very low-level implementations stuff) with an aim
    to help people write good, clear code and avoid bad and incomprehensible
    code, there is only one reaction a compiler could have to that
    expression - it /must/ be an error of some kind.

    It is up to /you/ to draw the lines between things that are acceptable
    and things that are not, with the understanding that such a line is
    never going to be perfect - it will have sub-optimal choices in both directions. I've given suggestions - you don't have to follow them.

    What you don't get to do is claim "I want to help programmers write good
    code" and then /allow/ this kind of shite with specific rules to define
    it. That's far worse than C saying "if you write ++E + E++, the
    behaviour is undefined - it's stupid, wrong, and you are on your own if
    you write such nonsense, but I can't stop you doing it".

    There's a very tempting myth in language design that /defining/
    behaviour is key - that gibberish and incorrect code can somehow be made "correct" by defining its behaviour. You are not alone in this - lots
    of languages try to achieve "no undefined behaviour" by defining the
    behaviour of everything instead of banning things that have no correct behaviour.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Stefan Ram on Mon Nov 14 15:14:40 2022
    On 14/11/2022 10:24, Stefan Ram wrote:
    James Harris <james.harris.1@gmail.com> writes:
    Rather than allowing non-ASCII in source I came up with a scheme of what
    you might call 'named characters' extending the backslash idea of C to
    allow names instead of single characters after the backslash. It's off
    topic for this thread but it allows non-ASCII characters to be named
    (such that the names consist of ASCII characters and would thus be
    readable and universal).

    Here's an example of a Python program.

    print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )

    It prints:

    \"

    I see they are Unicode names. If I were to support such names I would
    have your example as something like

    print "\U:reverse solidus/\U:quotation mark/"

    but I prefer shorter names such as

    print "\bksl/\q11/"

    At least with Unicode someone has already defined a name for every
    character, but Unicode includes a lot of nonsense such as a character called

    NORTH INDIC FRACTION THREE SIXTEENTHS

    and a character called

    BOX DRAWINGS VERTICAL HEAVY AND RIGHT LIGHT

    not forgetting

    DENTISTRY SYMBOL LIGHT DOWN AND HORIZONTAL WITH CIRCLE

    and fortunately they didn't omit

    UPWARDS TRIANGLE-HEADED ARROW LEFTWARDS OF DOWNWARDS TRIANGLE-HEADED
    ARROW

    so that's all right, then. Yes, that last one is the name of just one character!!!

    There are so many potential Unicode character names that the tables
    which would be needed just to convert names to codes would probably be
    larger than the rest of my compiler put together.



    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 15:24:17 2022
    On 14/11/2022 13:17, Dmitry A. Kazakov wrote:
    On 2022-11-14 13:45, Bart wrote:

    Clearly you get paid by the line. Even then, the code where a
    substring is copied into another location, which would require the
    double-stepping of the relevant pointer/indices of the earlier
    example, is missing here.

    It is there:

       This.Value := Line (Pointer..Index - 1);

    assigning array as a whole.

    OK, but this is then doing it in two passes. The original used only one
    pass. And the code when doing the transfer, whether as a loop or
    utilising a machine's block copy features, is not shown here.

    I don't have a +++ operator so I am not sure what that is supposed
to mean. It's not valid in my language.

It could be part of an even simpler and brilliantly readable:

       ++p+++q+++r

    This is legal in Ada:

          a := +b;

    But for some reason, you can't write ++b or + +b, it has to be `a := +
    (+b)`. So while you can't do +b++++c, you can write:

          a:=+b+(+(+(+(c))));

    Do the parentheses make this acceptable?

    I don't understand the point. Why would you like to have two unary
    pluses in a row?

    I wanted to write deliberately confusing code the same way you did with ++p+++q+++r. Here I don't know which you intend to be ++, which are
binary +, and which are unary plus. Although lexing rules usually mean tokenisation is like this:

    ++p++ +q++ +r

    while parsing rules would require that those +'s are binary adds, so
    better written like this:

    ++p++ + q++ + r

Here, while my language won't make sense of ++p++, James' will, so this expression is simply adding those three terms, but you've deliberately
chosen to write it so as to appear gobbledygook.






    My point is that you can legally combine any number of operators to
    result in nonsense-looking code, and the language can do little about it.

    No, the point is that no reasonable code should be nonsense-looking and conversely.

    Increments cross that line.

    But it performs a task that is needed, and in their absence, would
    simply be implemented, less efficiently and with more cluttery code,
    using other means.

    I have my own misgivings about it: there are in all 6 varieties of
    Increment (++x; --x; a:=++x; a:=--x; a:=x++; a:=x--), which are a pig to implement, and they spoil the lines of code like this:

    a[++n] := x
    b[n] := y
    c[n] := z

    Delete that first line, and you need to remember to transfer that ++n to
    the next line. And not to repeat that ++n as is easy to do.

    But I like the convenience too much. Also, in dynamic code, ++x can more
    easily map to a single byte code instruction, than x:=x+1 or even x+:=1.

    In my sources, it would be very unusual to have more than one ++ or --
    in any expression.




    They were perfectly acceptable in K&R C. C
    was a very large and quite complex language then. It was beautiful
compared to FORTRAN IV, but I could not use it on a 64K machine. A
    5-pass C compiler took an eternity.


    Did it? I never tried one. I only found out recently that they were that
    slow from reading reviews of C compilers of that era in Byte magazine.

    However that was also when I developed my own compilers on 64KB
    machines, and they never took more than a few seconds: I made sure of
    that. The only limiting factor would have been the speed of floppy disk transfer.

    Machines then were small and simple.
Programs were tiny. *++p was reasonably complex code.

    I can't remember if I had ++ then; I might have done. Then it would have
    given a significant advantage to write ++A[i] than A[i]:=A[i]+1. It
    still does now.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Bart on Mon Nov 14 16:23:24 2022
    On 14/11/2022 11:47, Bart wrote:
    On 14/11/2022 09:26, David Brown wrote:
    On 13/11/2022 18:11, James Harris wrote:
    On 08/11/2022 16:29, David Brown wrote:

    I'm not worried about the compiler. As long as it can make sense of a
    piece of code then it can compile it. It's the human I am concerned
    about, especially the poor old maintenance programmer!


    You /say/ that, but you don't appear to believe it or be interested in
    making it happen.

    On the one side, you claim you want a clear language that is
    understandable for programmers and maintenance.  On the other side,
    you want to decide what "++E++" should mean, with random "^"
    characters thrown in for good measure.

    In-place, value-returning increment ops written as ++ and -- are common
    in languages.


    Yes. And bugs are common in programs. Being common does not
    necessarily mean it's a good idea.

    (It doesn't necessarily mean it's a bad idea either - I am not implying
    that increment and decrement are themselves a major cause of bugs! But
    mixing side-effects inside expressions /is/ a cause of bugs.)

    So are pointer-dereference operators in lower-level languages, whether written as * or ^.


    Same again.

    Once you have those two possibilities in a language, why shouldn't you
    define what combinations of those operators might mean?

    If you don't have them, you don't have a problem.

    Pointer dereferencing like this is not a requirement for a language. If
    you have "proper" arrays (I write it like that because the concept of
    "array" can be defined in many ways), multiple return values for
    functions, and a way to define data structures such as trees and lists,
    where else do you actually need pointers?

    Pure functional programming languages don't have pointers, or increment operators - they don't even have assignment. Functional programming
    languages are usually considered quite high level, but some slightly
    impure functional programming languages - such as OCaml - are very
    efficient compiled languages that rival C, Pascal, Ada, Fortran, etc.,
    for speed. OCaml /does/, AFAIUI (I am no expert in that language) have variables and pointers or references, but they are very rarely seen
    explicitly, and are intentionally cumbersome to use.


    Maybe the OP is designing a language in which pointer dereferencing and increment are expected to turn up so often that it is useful to combine
    them. But I think it is at lot more likely that this is a mistaken
    assumption based on limited experience with different kinds of
    programming languages. The result will be like your own language - a re-implementation of C or Pascal, with some benefits and some new disadvantages, and nothing of real innovation or interest. I am trying
    to make suggestions to break that pattern.


(I just differ from James in thinking that successive *value-returning*
++ or -- operators, whether prefix or postfix, are not meaningful. I'd
also think it would be bad form to chain them, but it is not practical
to ban it at the syntax level.

    If you think it is "bad form", ban it. For any language that is going
    to be successful in a wider field, not just a plaything for one person,
    the man-hour effort in /using/ the language will far outweigh the effort /designing/ or /implementing/ it. Thus it does not matter if a good
    design choice is difficult to implement, as it will save effort in the
    long run.


    However, I have sometimes banned even `a+b` in some contexts, when the resulting value is unused.)

    Is your point that you shouldn't have either of those operators?

    Yes! What gave it away - the first three or four times I said as much?
    (And - since repeating myself seems helpful in this thread - I would
    also avoid other assignment operators or operators with side-effects. Assignment should be a statement, not an expression.)

    ++ and
    -- can be replaced at some inconvenience. But getting rid of dereference
    is harder; if P is a pointer:

         print P

    will this display the value of the pointer, or the value of its target?
    If only there was a way to specify that precisely!

Note that when p and q are byte pointers, then *p++ = *q++ (or p++^ :=
q++^) corresponds to the single Z80 LDI instruction.

    So it's something so meaningless that that tiny 8-bit processor decided
    to give it its own instruction.

    The instruction set for a processor is - or should be - completely
    different from a programming language, even one that is relatively
    low-level. The source code might handle an array called "xs" and loop
    through it with "for x in xs do ... ", while the generated assembly
    might use a "*p++" instruction.

    If I want to write in assembly, I can write assembly - I don't want to
    do that, so I don't. Even in C, I don't make heavy use of "*p++", and
    very rarely in the middle of complex expressions - usually array access
    is clearer. (Of course I use increment operator, especially in loops,
    because that's how C is written. But a new language can do better than
    that.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Mon Nov 14 15:46:03 2022
    On 14/11/2022 11:06, Bart wrote:
    On 13/11/2022 16:55, James Harris wrote:
    On 10/11/2022 10:42, Bart wrote:

    It's clear that ++(A+B) can't work unless you change what ++ means
    (eg. ++A now means (A+1) because whatever ++ modifies is not
    accessible).

    It's similar if A were a struct. You could have

       A.F

    but you could not have

       (A + 4).F

    The LHS of the . operation, in this case, has to be an lvalue.

    That's not right. C allows functions to return structs; they are not
    lvalues, but you can apply ".":

        typedef struct{int x,y;}Point;

    Point F(void) {
        Point p = {0, 0};   /* initialised, so F().x below is well defined */
        return p;
    }

        int main(void)
        {
            Point p;
            int a;

        //  F()=p;           // Not valid: not an lvalue
            a=F().x;         // Valid
        }

    Well, C structs, even those which are returned from a function, would
    naturally /have/ lvalues.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Mon Nov 14 15:29:50 2022
    On 14/11/2022 11:00, Bart wrote:
    On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
    On 2022-11-14 11:16, James Harris wrote:

       a[i++] := b[j++]

    Same question. What for?

    What do you mean, "What for?"?

    Show me the algorithm.

    There's no particular algorithm; the construct is a potential
    component of many algorithms.

    Show me one that is not array assignment.

    What exactly is your objection: that there shouldn't be an increment
    operator at all?

    Then it's end of discussion. But if you allow a value-returning
    increment operator, then someone could use it in multiple places in the
    same expresion, together with other operators, and the language has to
    be able to deal with it.

    Note that any language that has reference parameters would allow this:

       a[postincr(i)] := b[postincr(j)]

    I've seen similar in C++ but

    postincr(i)

    looks potentially misleading. Yes, there's the function name as a clue
    but a programmer could easily read that expression without realising
    that the function could change variable i. The action of the interface
    is not clear from the syntax.

    Worse, if a language can change i in the above expression then a
    programmer may find he has to check *every* function that gets called to
    find out what other actual parameters can be changed.

    Have you thought of making it evident in the syntax with something like

    postincr(&i)

    where & doesn't mean "address of", as in C, but simply flags up that i
    is an inout parameter and could be changed by the callee? The absence of
    & would provide assurance that the variable mentioned would not be
    treated by the callee as inout.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to Bart on Mon Nov 14 16:54:26 2022
    On 2022-11-14 16:24, Bart wrote:
    On 14/11/2022 13:17, Dmitry A. Kazakov wrote:
    On 2022-11-14 13:45, Bart wrote:

    Clearly you get paid by the line. Even then, the code where a
    substring is copied into another location, which would require the
    double-stepping of the relevant pointer/indices of the earlier
    example, is missing here.

    It is there:

        This.Value := Line (Pointer..Index - 1);

    assigning array as a whole.

    OK, but this is then doing it in two passes. The original used only one
    pass.

The original code must allocate the result, whose length is unknown in
advance, *upfront*. So it must use a pool followed by reallocs. The code
above allocates memory in the arena where the AST is kept, not in the
pool, and only once the length of the identifier is already determined.

    And the code when doing the transfer, whether as a loop or
    utilising a machine's block copy features, is not shown here.

    That is up to the compiler optimization, which is the point.

    My point is that you can legally combine any number of operators to
    result in nonsense-looking code, and the language can do little about
    it.

    No, the point is that no reasonable code should be nonsense-looking
    and conversely.

    Increments cross that line.

    But it performs a task that is needed,

The task is not needed.

    They were perfectly acceptable in K&R C. C was a very large and quite
    complex language then. It was beautiful comparing to FORTRAN IV, but I
    could not use it on a 64K machine. A 5-pass C compiler took an eternity.

    Did it? I never tried one. I only found out recently that they were that
    slow from reading reviews of C compilers of that era in Byte magazine.

    They were written in C using inferior tools like Lex and Yacc. Under
    memory constraint it was customary to do multiple passes and keep
    intermediate results on the disk. Even assembler took 2 passes, I
believe, and required up to 10 minutes, followed by 10 minutes of linking.

As I said, C was a complex language then. Its ill-thought-out syntax
    contributed to compiler complexity. It took decades before C compilers
    could produce meaningful error messages.

    C was reasonably good for a medium sized PDP machine. It is awful now.
    Why people continue to borrow its worst features is beyond me.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Mon Nov 14 15:59:24 2022
    On 14/11/2022 15:23, David Brown wrote:
    On 14/11/2022 11:47, Bart wrote:

    Once you have those two possibilities in a language, why shouldn't you
    define what combinations of those operators might mean?

    If you don't have them, you don't have a problem.

    Pointer dereferencing like this is not a requirement for a language.  If
    you have "proper" arrays (I write it like that because the concept of
    "array" can be defined in many ways), multiple return values for
    functions, and a way to define data structures such as trees and lists,
    where else do you actually need pointers?

    I have a dynamic language with proper, first class lists, trees,
    strings, records, which take care of 90% of the pointer uses in a
    C-class language. Yet they can still be useful. This is an extract from
    a program that dumps the contents of an EXE file:

    coffptr:=makeref(pedata+coffoffset,imagefileheader)

    genstrln("Coff header: "+tostr(coffptr^))

    genstrln("Machine: "+tostr(coffptr^.machine,"h2"))
    genstrln("Nsections: "+tostr(coffptr^.nsections,"h2"))
    genstrln("Timestamp: "+tostr(coffptr^.timedatestamp,"h4"))
    genstrln("Symtab offset: "+tostr(coffptr^.symtaboffset))
    genstrln("Nsymbols: "+tostr(coffptr^.nsymbols))
    genstrln("Opt Hdr size: "+tostr(coffptr^.optheadersize))
    genstrln("Characteristics: "+tostr(coffptr^.characteristics,"b"))
    genline()

    ('pedata' is a simple byte pointer; I've chosen to load the EXE into a
    simple memory block. 'makeref' takes an offset into that block and
    returns a pointer to that offset, interpreted as a pointer to a
    particular struct type. "^" is a deref op.

    'imagefileheader' is this type:

    type imagefileheader=struct
    wt_word machine # (u16)
    wt_word nsections
    wt_dword timedatestamp # (u32)
    wt_dword symtaboffset
    wt_dword nsymbols
    wt_word optheadersize
    wt_word characteristics
    end
    )



    Pure functional programming languages don't have pointers, or increment operators - they don't even have assignment.  Functional programming languages are usually considered quite high level, but some slightly
    impure functional programming languages - such as OCaml - are very
    efficient compiled languages that rival C, Pascal, Ada, Fortran, etc.,
    for speed.  OCaml /does/, AFAIUI (I am no expert in that language) have variables and pointers or references, but they are very rarely seen explicitly, and are intentionally cumbersome to use.

    That pure functional languages aren't used everywhere suggests they
    aren't great at the everyday tasks like the ones I deal with.

    (I would like to see Haskell's take on that task of decoding that EXE
    file, and dealing with that specific data layout. My example was the
    simplest part of it.

    For that matter, how would you do it in Python? Rather painfully I would imagine.)


    Maybe the OP is designing a language in which pointer dereferencing and increment are expected to turn up so often that it is useful to combine them.  But I think it is at lot more likely that this is a mistaken assumption based on limited experience with different kinds of
    programming languages.  The result will be like your own language - a re-implementation of C or Pascal, with some benefits and some new disadvantages, and nothing of real innovation or interest.

    Innovation these days seems to be:

    * To create incomprehensible languages that require several advanced
    degrees in mathematics, PL and type theory to understand

    * To make it as hard as possible to perform any tasks by removing
    features such as loops, mutable variables and functions with
    side-effects. (It's worth bearing in mind that most elements in a
computer system - display, file-system, and don't forget the memory - are necessarily mutable.)

* To tie you up in knots with strictly typed everything (or, in Rust,
with its 'borrow checker').

    No thanks. My innovation is keeping this stuff simple, accessible, fast,
    and at a human scale.

    I am trying
    to make suggestions to break that pattern.

Look at Reddit's PL forum. At least 90% of new languages there are
    FP-based. Yet when you look at the implementation languages, it tends to
    be a different story.


(I just differ from James in thinking that successive
*value-returning* ++ or -- operators, whether prefix or postfix, are
not meaningful. I'd also think it would be bad form to chain them, but
it is not practical to ban it at the syntax level.

    If you think it is "bad form", ban it.

    Obviously I can't ban `a + b`. Equally obviously, this code is pointless:

    a + b;

    Given an expression-based language, what would you do? In the past,
    after working with C, I would unthinkingly type:

    a = b

    in my syntax instead of a := b. This wasn't an error: it would compare a
    and b then discard the result. But it was a bug.

      For any language that is going
    to be successful in a wider field, not just a plaything for one person,
    the man-hour effort in /using/ the language will far outweigh the effort /designing/ or /implementing/ it.  Thus it does not matter if a good
    design choice is difficult to implement, as it will save effort in the
    long run.

    In my case, implementing a series of compilers over about 20 years took
    perhaps one year of /part-time/ work. The rest of it was using the
    language, even as an individual. So at least 20:1.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Ram@21:1/5 to James Harris on Mon Nov 14 16:18:00 2022
    James Harris <james.harris.1@gmail.com> writes:
    but I prefer shorter names such as
    print "\bksl/\q11/"

    I see. You could use Unicode names as a fallback for those
    characters for which you have not defined a name yet.

    The raw string literals in Python have a strange irregularity:
    One can use a backslash with no escape, except at the very end
    of the string, so one can write

    r"\abc" to get \abc, but one can not write

    r"abc\" to get abc\. Transcript:

    print( r"\abc" )
    |\abc
    print( r"abc\" )
    | File "<stdin>", line 1
    | print( r"abc\" )
    | ^
    |SyntaxError: EOL while scanning string literal


    I tried to design an escape mechanism for string literals in my
    language "Unotal" that has no irregularities.

    I already described this kind of string literals here on 2021-09-29,
    but in the meantime I have actually written an implementation.

    I also have written a tiny demo implementation in Python which
    can be used to experiment with the notation (see below).

    Here's a short summary of my notation:

    - a string literal is written using brackets, as in [abc],
    which means the string "abc" (3 characters: a, b, and c).

    - nested brackets are allowed: [abc[def]ghi] is "abc[def]ghi"
    (11 characters).

    - a single left bracket is written as "[`]". This is
    admittedly ugly, but it is very rare in most kinds of texts,
    so that conflicts with texts containing a literal "[`]"
    should be very rare.

    - a single right bracket is written as "[]`" for similar
    reasons.

    I tried to make sure that no other rules are needed and that
    every text can be encoded this way.

    Here's a small Python program with a tiny scanner.

    source code

def scan( source ):
    return source[ 1: -1 ].replace( '[`]', '[' ).replace( '[]`', ']' )

def demo( source ):
    print( f"{source:14}", scan( source ))

    print( f"{'literal':14}", 'meaning' )
    demo( '[def]' )
    demo( '[de[]f]' )
    demo( '[de[`]f]' )
    demo( '[[de[`]]f]' )
    demo( '[de[]`f]' )
    demo( '[de[`]`[]`f]' )
    demo( '[de[`][]``f]' )
    demo( '[`]]' )
    demo( '[[]`]' )

    output

literal        meaning
[def]          def
[de[]f]        de[]f
[de[`]f]       de[f
[[de[`]]f]     [de[]f
[de[]`f]       de]f
[de[`]`[]`f]   de[`]f
[de[`][]``f]   de[]`f
[`]]           `]
[[]`]          ]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Mon Nov 14 16:21:13 2022
    On 14/11/2022 11:11, Bart wrote:
    On 14/11/2022 10:44, James Harris wrote:
    On 14/11/2022 09:26, David Brown wrote:

    ...

    OK, let's take the combination you mentioned:

       ++E++

    I wonder why you see a problem with it. As I see it, it increments E
    before evaluation and then increments E after evaluation. What is so
    complex about that? It does exactly what it says on the tin, and in
    the order that it says it. Remember that unlike C I define the
    apparent order of evaluation so the expression is perfectly well formed.

    But does this have the same priorities as:

       op1 E op2

    (where op2 is commonly done first) or does it have special rules, so
    that in:

      --E++

    the -- is done first? If it's different, then what is the ordering when
    mixed with other unary ops?

    You explained somewhere the circumstances where you think this is
    meaningful, but I can't remember what the rules are and I can't find the exact post.

    The rules for ++ and -- are simple. *Prefix* ++ or -- happens before
    postfix ++ or --.

    Your example would probably be better expressed as

    E - 1

    !

    As for the bigger picture of operator precedences, it's based on the transformations which operators naturally make to their operands. For
    example, comparisons (such as less-than) naturally take numbers and
    produce booleans so their precedences put them after numeric operators
    (+, * etc) and before boolean operators (and, or, not, etc). The natural
    series is (simplified)

    locations (such as field selection and array indexing)
    numbers (retrieved from those locations)
    comparisons (of those numbers)
    booleans

    The latter take booleans and produce other booleans. IOW booleans are
    the bottom of the chain.

    In summary, the operators which work on locations (lvalues) come first,
    then those which work on numbers, then the comparisons and then those
    which work on booleans. There could not be a more natural order!!

    That said, autoincrement, dereference and bitwise operators are a bit
    (sic) anomalous - e.g. one could make a case for bitwise coming before
    or after operations on numbers. I chose to put them before so "bit
    patterns" come between locations and numbers.


    This is the problem. You shouldn't need to stop and think. I make the
    rules simple by stipulating that value-returning ++ and -- only ever
    return rvalues.

    Because if they ever start to return lvalues, then this becomes possible:

       ++E := 0
       E++ := 0

    (Whichever one is legal in your scheme.) So I think there is little
    useful expressivity to be gained.

    Indeed, neither of those is useful.

    Interestingly, I just tried

    ++E = 0;

    with cc and c++ compilers. The first rejected it (as ++E needing to be
    an lvalue); the second accepted it. That's not a reflection of either
    compiler, BTW, but without checking it's probably more to do with the language/dialect definition. FWIW I expect the second compiler could be persuaded to issue a warning about unreachable code or suchlike.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Stefan Ram on Mon Nov 14 16:44:15 2022
    On 14/11/2022 16:18, Stefan Ram wrote:
    James Harris <james.harris.1@gmail.com> writes:
    but I prefer shorter names such as
    print "\bksl/\q11/"

    I see. You could use Unicode names as a fallback for those
    characters for which you have not defined a name yet.

    That's a possibility if explicitly creating a Unicode string but I see
    Unicode as "for printing rather than for processing". IMO something else
    is needed for the latter.


    The raw string literals in Python have a strange irregularity:
    One can use a backslash with no escape, except at the very end
    of the string, so one can write

    r"\abc" to get \abc, but one can not write

    r"abc\" to get abc\. Transcript:

    print( r"\abc" )
    |\abc
    print( r"abc\" )
    | File "<stdin>", line 1
    | print( r"abc\" )
    | ^
    |SyntaxError: EOL while scanning string literal


    Yes, that's odd.


    I tried to design an escape mechanism for string literals in my
    language "Unotal" that has no irregularities.

    I already described this kind of string literals here on 2021-09-29,
    but in the meantime I have actually written an implementation.

    I also have written a tiny demo implementation in Python which
    can be used to experiment with the notation (see below).

    Here's a short summary of my notation:

    - a string literal is written using brackets, as in [abc],
    which means the string "abc" (3 characters: a, b, and c).

    - nested brackets are allowed: [abc[def]ghi] is "abc[def]ghi"
    (11 characters).

    - a single left bracket is written as "[`]". This is
    admittedly ugly, but it is very rare in most kinds of texts,
    so that conflicts with texts containing a literal "[`]"
    should be very rare.

    - a single right bracket is written as "[]`" for similar
    reasons.

    I tried to make sure that no other rules are needed and that
    every text can be encoded this way.

    Here's a small Python program with a tiny scanner.

    source code

def scan( source ):
    return source[ 1: -1 ].replace( '[`]', '[' ).replace( '[]`', ']' )

def demo( source ):
    print( f"{source:14}", scan( source ))

    print( f"{'literal':14}", 'meaning' )
    demo( '[def]' )
    demo( '[de[]f]' )
    demo( '[de[`]f]' )
    demo( '[[de[`]]f]' )
    demo( '[de[]`f]' )
    demo( '[de[`]`[]`f]' )
    demo( '[de[`][]``f]' )
    demo( '[`]]' )
    demo( '[[]`]' )

    output

literal        meaning
[def]          def
[de[]f]        de[]f
[de[`]f]       de[f
[[de[`]]f]     [de[]f
[de[]`f]       de]f
[de[`]`[]`f]   de[`]f
[de[`][]``f]   de[]`f
[`]]           `]
[[]`]          ]

    That's cool, especially the brevity of your Python code! I remember the discussion - and the hours of my life it took to try to come up with a
    scheme I was happy with. As I remember, the big problem was
    metacharacters - those which are used to delimit the string or a
    character name - as I guess is true of your left-bracket example.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to David Brown on Mon Nov 14 17:37:08 2022
    On 14/11/2022 14:47, David Brown wrote:
    On 14/11/2022 11:44, James Harris wrote:
    On 14/11/2022 09:26, David Brown wrote:
    On 13/11/2022 18:11, James Harris wrote:
    On 08/11/2022 16:29, David Brown wrote:

    ...

    The expression you mention is just one of a myriad of what you might
    consider to be potential nasties. If I am going to prohibit that one
    then what about all the others?


    Prohibit nasty ones.

    Enumerating the 'nasty ones' is the problem. If there are 20 dyadic
    operators then there are something like 400 ways of combining them. AIUI
    you want me to pick some of those 400 and tell the programmer "you
    cannot combine these two even where there's no type mismatch".



    A big step in that direction is to say that assignment is a statement,
    not an expression,

    Done that.

    and that variables cannot be changed by side-effects.

    I will not be doing that. I know you favour functional programming, and
    that's fine, but the language I am working on is unapologetically
    imperative.

     (How you relate this to function calls is a related and complex issue
    that I have been glossing over here.  An idea would be to distinguish between "procedures" that may have side effects, and "functions" that do not.)

    That means there is no such thing as an "increment" operator - post or pre.

    It also /hugely/ simplifies the language - both for the programmer, and
    for the implementer.  If expressions have no side-effects, they can be duplicated, split up, re-arranged, moved around in code, all without
    affecting the behaviour of the program.

    This needs a separate discussion, David. It is far too big a topic for
    this thread. (Feel free to start a new one; I have plenty to say!) All I
    can say here is as I said above, the language I am working on is
    imperative.

    ...

    These two statements go together as well as Dmitry's toaster and
    toilet brush.  It doesn't matter how precisely you define how the
    combination can be used and what it does, it is still not a good or
    useful thing.

    OK, let's take the combination you mentioned:

       ++E++

    I wonder why you see a problem with it. As I see it, it increments E
    before evaluation and then increments E after evaluation. What is so
    complex about that? It does exactly what it says on the tin, and in
    the order that it says it. Remember that unlike C I define the
    apparent order of evaluation so the expression is perfectly well formed.


    The very fact that you are discussing how to define it means it is not
    clear and obvious.

    On that I must disagree. One cannot expect to understand a language
    without at least learning the basics. Consider


    a * b + c

    I would parse that as (a*b)+c but not all languages would. As a reader
    of infix code you would have to know the order in which operators would
    be applied. You have to know the basics of a language in order to read
    code written in it.


    It is not obvious which order the increments happen,
    or if the order is defined, or if the order matters.  It is not obvious
    what the return value should be.  It is not obvious where you have
    lvalues or rvalues (not that a language should necessarily have such concepts).  It is not obvious what happens to E.

    Of course it's not obvious to someone who doesn't know the rules of the language. A language designer cannot produce a language with no rules.
    What a language designer /can/ do is to make the rules simple and understandable - but someone who reads the code still has to understand
    what the rules are.

    As for the rules we have been discussing here those I have come up with
    are, in the main, the ones you would be familiar with; even the new ones
    are logical and simple. Once you understand them it's incredibly easy to
    parse an expression, even of the kind which, to you, looks like gibberish.

    In fact, I have to say that neither of those you have objected to should
    really look like gibberish, even to the uninitiated. Would you find

    ++A + B++

    objectionable? If not, I cannot see why you would find

    ++E + E++

    so objectionable, either. Isn't the only new thing you, as a reader of
    such code, would need to know the order in which the operations are
    carried out? Otherwise it's just like the preceding expression.


    ...

    There's a very tempting myth in language design that /defining/
    behaviour is key - that gibberish and incorrect code can somehow be made "correct" by defining its behaviour.  You are not alone in this - lots
    of languages try to achieve "no undefined behaviour" by defining the behaviour of everything instead of banning things that have no correct behaviour.

    My reason for defining /apparent/ code behaviour is to ensure
    computational consistency on different platforms. Who wouldn't want that?


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to James Harris on Mon Nov 14 18:28:01 2022
    On 14/11/2022 16:14, James Harris wrote:
    On 14/11/2022 10:24, Stefan Ram wrote:
    James Harris <james.harris.1@gmail.com> writes:
    Rather than allowing non-ASCII in source I came up with a scheme of what you might call 'named characters' extending the backslash idea of C to
    allow names instead of single characters after the backslash. It's off
    topic for this thread but it allows non-ASCII characters to be named
    (such that the names consist of ASCII characters and would thus be
    readable and universal).

       Here's an example of a Python program.

    print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )

       It prints:

    \"

    I see they are Unicode names. If I were to support such names I would
    have your example as something like

      print "\U:reverse solidus/\U:quotation mark/"

    but I prefer shorter names such as

      print "\bksl/\q11/"

    At least with Unicode someone has already defined a name for every
    character, but Unicode includes a lot of nonsense such as a character
    called


    How about using the HTML character entity names? That would be
    "&bsol;&quot;". These are a good deal shorter than Unicode names but
    are vastly better than inventing your own names.

    <https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references>
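    (As an aside, the HTML5 entity table is already machine-readable from
    Python's standard library, so the suggestion is easy to try out; a
    minimal sketch using html.unescape:)

```python
import html

# "&bsol;&quot;" uses HTML5 named character references:
# &bsol; is REVERSE SOLIDUS (\) and &quot; is QUOTATION MARK (").
s = html.unescape("&bsol;&quot;")
print(repr(s))  # -> '\\"'
```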

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to David Brown on Mon Nov 14 18:02:58 2022
    On 14/11/2022 17:28, David Brown wrote:
    On 14/11/2022 16:14, James Harris wrote:
    On 14/11/2022 10:24, Stefan Ram wrote:
    James Harris <james.harris.1@gmail.com> writes:
    Rather than allowing non-ASCII in source I came up with a scheme of
    what
    you might call 'named characters' extending the backslash idea of C to allow names instead of single characters after the backslash. It's off topic for this thread but it allows non-ASCII characters to be named
    (such that the names consist of ASCII characters and would thus be
    readable and universal).

       Here's an example of a Python program.

    print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )

       It prints:

    \"

    I see they are Unicode names. If I were to support such names I would
    have your example as something like

       print "\U:reverse solidus/\U:quotation mark/"

    but I prefer shorter names such as

       print "\bksl/\q11/"

    At least with Unicode someone has already defined a name for every
    character, but Unicode includes a lot of nonsense such as a character
    called


    How about using the HTML character entity names?  That would be "&bsol;&quot;".  These are a good deal shorter than Unicode names but
    are vastly better than inventing your own names.

    <https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references>

    Thanks for the pointer. I /could/ include them with my syntax along the
    lines of

    "\H:Backslash/\H:quot/"

    where H: indicates HTML names. I did consider them before but had to
    reject them. I cannot remember all the reasons why, now, but from taking
    a quick look they appear, like Unicode, to be more for printing than for processing. For example, there is frac45 for 4/5 but such a scheme
    allows only the fractions which are predefined. Also, HTML names combine diacritics with characters (e.g. yacute) whereas AISI it's important for
    them to be kept separate.

    What's needed, IMO, is a set of names intended for /processing/ rather
    than for typesetting.
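    (Whatever naming scheme is chosen, the official Unicode names at least
    have the virtue of being machine-resolvable in both directions, which
    is arguably part of what "names for processing" requires; a quick
    Python illustration:)

```python
import unicodedata

# Name -> character and character -> name both work from the
# standard Unicode character database.
ch = unicodedata.lookup("REVERSE SOLIDUS")
print(ch)                      # \
print(unicodedata.name('"'))   # QUOTATION MARK
```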


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Bart on Mon Nov 14 18:46:20 2022
    On 14/11/2022 16:59, Bart wrote:
    On 14/11/2022 15:23, David Brown wrote:
    On 14/11/2022 11:47, Bart wrote:

    Once you have those two possibilities in a language, why shouldn't
    you define what combinations of those operators might mean?

    If you don't have them, you don't have a problem.

    Pointer dereferencing like this is not a requirement for a language.
    If you have "proper" arrays (I write it like that because the concept
    of "array" can be defined in many ways), multiple return values for
    functions, and a way to define data structures such as trees and
    lists, where else do you actually need pointers?

    I have a dynamic language with proper, first class lists, trees,
    strings, records, which take care of 90% of the pointer uses in a
    C-class language. Yet they can still be useful. This is an extract from
    a program that dumps the contents of an EXE file:

        coffptr:=makeref(pedata+coffoffset,imagefileheader)

        genstrln("Coff header:     "+tostr(coffptr^))

        genstrln("Machine:         "+tostr(coffptr^.machine,"h2"))
        genstrln("Nsections:       "+tostr(coffptr^.nsections,"h2"))
        genstrln("Timestamp:       "+tostr(coffptr^.timedatestamp,"h4"))
        genstrln("Symtab offset:   "+tostr(coffptr^.symtaboffset))
        genstrln("Nsymbols:        "+tostr(coffptr^.nsymbols))
        genstrln("Opt Hdr size:    "+tostr(coffptr^.optheadersize))
        genstrln("Characteristics: "+tostr(coffptr^.characteristics,"b"))
        genline()


    None of that needs pointers or references.

    Initialise a "coff_header" read-only immutable variable from a slice of
    the memory array holding the image. Then there is no reference or
    pointer in the source code - you are using local data. If the image is
    also held as immutable data, then the local "variable" can be
    /implemented/ as a pointer for efficiency - but /logically/ in the
    source code it is its own entity. (See the benefits of having
    everything immutable and read-only unless you can't avoid it?)


    Pure functional programming languages don't have pointers, or
    increment operators - they don't even have assignment.  Functional
    programming languages are usually considered quite high level, but
    some slightly impure functional programming languages - such as OCaml
    - are very efficient compiled languages that rival C, Pascal, Ada,
    Fortran, etc., for speed.  OCaml /does/, AFAIUI (I am no expert in
    that language) have variables and pointers or references, but they are
    very rarely seen explicitly, and are intentionally cumbersome to use.

    That pure functional languages aren't used everywhere suggests they
    aren't great at the everyday tasks like the ones I deal with.

    That they are used /somewhere/ suggests that they work fine for many
    tasks. For example, if you ever make a phone call, it is likely that it
    passes through equipment running code in Erlang - a functional
    programming language.

    Functional programming languages have a reputation for being difficult
    to learn and use. That's not entirely undeserved, but they have many advantages over imperative languages. You spend more time learning
    them, and less time fixing bugs in your code. (They are not a good
    match for small microcontrollers, however.)


    (I would like to see Haskell's take on that task of decoding that EXE
    file, and dealing with that specific data layout. My example was the
    simplest part of it.

    For that matter, how would you do it in Python? Rather painfully I would imagine.)


    Nah, it's easy enough in Python. A bytes array to hold the original
    image, and a "struct.unpack" on a slice of the array to pull out the
    contents.
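    (A sketch of that, using the standard 20-byte little-endian COFF file
    header layout and the field names from the dump above; the
    `read_coff_header` helper and its sample input are mine:)

```python
import struct

# COFF file header: machine, nsections, timedatestamp, symtaboffset,
# nsymbols, optheadersize, characteristics -> 20 bytes, little-endian.
COFF = struct.Struct("<HHIIIHH")

def read_coff_header(pedata: bytes, coffoffset: int) -> dict:
    """Unpack the COFF header found at coffoffset in the image bytes."""
    fields = COFF.unpack_from(pedata, coffoffset)
    names = ("machine", "nsections", "timedatestamp",
             "symtaboffset", "nsymbols", "optheadersize",
             "characteristics")
    return dict(zip(names, fields))

# Fabricated header for demonstration (machine 0x8664 is x86-64):
raw = struct.pack("<HHIIIHH", 0x8664, 3, 0, 0, 0, 240, 0x22)
hdr = read_coff_header(raw, 0)
print(hex(hdr["machine"]), hdr["nsections"])  # 0x8664 3
```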

    I'd have to look up some more for the Haskell syntax.


    Maybe the OP is designing a language in which pointer dereferencing
    and increment are expected to turn up so often that it is useful to
    combine them.  But I think it is at lot more likely that this is a
    mistaken assumption based on limited experience with different kinds
    of programming languages.  The result will be like your own language -
    a re-implementation of C or Pascal, with some benefits and some new
    disadvantages, and nothing of real innovation or interest.

    Innovation these days seems to be:

    * To create incomprehensible languages that require several advanced
    degrees in mathematics, PL and type theory to understand

    * To make it as hard as possible to perform any tasks by removing
    features such as loops, mutable variables and functions with
    side-effects. (It's worth bearing in mind that most elements in a
    computer system - the display, the file system, and not least the
    memory - are necessarily mutable.)

    * To tie you up in knots with strictly typed everything (or, in Rust,
    with its 'borrow checker').

    No thanks. My innovation is keeping this stuff simple, accessible, fast,
    and at a human scale.


    I disagree with your scepticism, but I agree that there are lots of
    languages with different paradigms for different purposes.

    However, making yet-another-C is IMHO a pointless exercise. It might be
    better in some ways, but not enough to make it worth the effort.

    I am trying to make suggestions to break that pattern.

    Look at Reddits PL forum. At least 90% of new languages there are
    FP-based. Yet when you look at the implementation languages, it tends to
    be a different story.


    (I just differ from James in thinking that successive
    *value-returning* ++ or -- operators, whether prefix or postfix, are
    not meaningful. I'd also think it would be bad form to chain them,
    but it is not practical to ban them at the syntax level.

    If you think it is "bad form", ban it.

    Obviously I can't ban `a + b`. Equally obviously, this code is pointless:

        a + b;


    You can ban that. Rule number 42 - the result of an expression must be assigned to a variable, used in another expression, or passed as the
    argument to a function call. No problem.

    Given an expression-based language, what would you do? In the past,
    after working with C, I would unthinkingly type:

        a = b

    in my syntax instead of a := b. This wasn't an error: it would compare a
    and b then discard the result. But it was a bug.


    Any decent C compiler (with the right options) will complain about the C equivalent, "a == b;", as a statement with no effect. It could just as
    easily be made an error in a language.

    And if you follow my suggestion that expressions can't have
    side-effects, then it's easy to distinguish between "statements" and "expressions" because you no longer have a C-style "expression statement".

      For any language that is going to be successful in a wider field,
    not just a plaything for one person, the man-hour effort in /using/
    the language will far outweigh the effort /designing/ or
    /implementing/ it.  Thus it does not matter if a good design choice is
    difficult to implement, as it will save effort in the long run.

    In my case, implementing a series of compilers over about 20 years took perhaps one year of /part-time/ work. The rest of it was using the
    language, even as an individual. So at least 20:1.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Mon Nov 14 18:43:20 2022
    On 14/11/2022 15:24, Bart wrote:
    On 14/11/2022 13:17, Dmitry A. Kazakov wrote:
    On 2022-11-14 13:45, Bart wrote:

    ...

    My point is that you can legally combine any number of operators to
    result in nonsense-looking code, and the language can do little about
    it.

    No, the point is that no reasonable code should be nonsense-looking
    and conversely.

    Increments cross that line.

    But it performs a task that is needed, and in their absence, would
    simply be implemented, less efficiently and with more cluttery code,
    using other means.

    I have my own misgivings about it: there are in all 6 varieties of
    Increment (++x; --x; a:=++x; a:=--x; a:=x++; a:=x--), which are a pig to implement, and they spoil the lines of code like this:

    I thought you correctly said before that ++x was (++x; x) and x++ was
    (t:=x; x++; rval(t)). Adding -- that's still only four varieties, IMO.
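    (That decomposition can be modelled directly; a Python sketch, where a
    one-element list stands in for an lvalue cell:)

```python
def pre_inc(cell):
    # ++x: increment first, then yield the new value
    cell[0] += 1
    return cell[0]

def post_inc(cell):
    # x++: yield the old value, then increment (t:=x; x:=x+1; t)
    old = cell[0]
    cell[0] += 1
    return old

x = [5]
print(pre_inc(x), x[0])   # 6 6
print(post_inc(x), x[0])  # 6 7
```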


        a[++n] := x
        b[n] := y
        c[n] := z

    Delete that first line, and you need to remember to transfer that ++n to
    the next line. And not to repeat that ++n as is easy to do.

    Interjecting somewhat, as a programmer I'd probably write that as

    ++n
    a[n] = x
    b[n] = y
    c[n] = z

    because to me it's clearer and would make code maintenance easier.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 14 19:41:14 2022
    On 2022-11-14 19:26, James Harris wrote:
    On 14/11/2022 11:29, Dmitry A. Kazakov wrote:
    On 2022-11-14 12:03, James Harris wrote:
    On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
    On 2022-11-14 11:16, James Harris wrote:
    On 14/11/2022 07:52, Dmitry A. Kazakov wrote:

    ...

    Show me the algorithm.

    There's no particular algorithm; the construct is a potential
    component of many algorithms.

    Show me one that is not array assignment.

       if is_name_first(b[j])
         a[i++] = b[j++]
         rep while is_name_follow(b[j])
           a[i++] = b[j++]
         end rep
         a[i] = 0
         return TOK_NAME
       end if

    Now, what don't you like about the ++ operators in that? How would
    you prefer to write it?

     From parser production code:

    procedure Get_Identifier
               (  Code     : in out Source'Class;
                  Line     : String;
                  Pointer  : Integer;
                  Argument : out Tokens.Argument_Token
               )  is
        Index     : Integer := Pointer + 1;
        Malformed : Boolean := False;
        Underline : Boolean := False;
        Symbol    : Character;
    begin
        while Index <= Line'Last loop
           Symbol := Line (Index);
           if Is_Alphanumeric (Symbol) then
              Underline := False;
           elsif '_' = Symbol then
              Malformed := Malformed or Underline;
              Underline := True;
           else
              exit;
           end if;
           Index := Index + 1;
        end loop;
        Malformed := Malformed or Underline;
        Set_Pointer (Code, Index);
        Argument.Location := Link (Code);
        Argument.Value := new Identifier (Index - Pointer);
        declare
           This : Identifier renames Identifier (Argument.Value.all);
        begin
           This.Location  := Argument.Location;
           This.Malformed := Malformed;
           This.Value     := Line (Pointer..Index - 1);
        end;
    end Get_Identifier;

    Well, that's an astonishingly long piece of code, Dmitry,

    Because it is a production code. It must deal with different types of
    sources, with error handling and syntax tree generation.

    and if I read
    it correctly it doesn't even check whether it begins on a name-first character: that has to be decided before the procedure starts!

    Exactly, because that is already established by the parser since the
    language grammar distinguishes identifiers by the first character.

    But I am not sure I do understand it. Even allowing for what I believe
    is meant to be double underscore detection (except at the start and
    end?) it takes significantly more study than the simple name-first, name-follow code which preceded it.

    That's how the language defines it. This example is from an Ada 95
    parser. Ada 95 RM 2.3:

    https://www.adahome.com/rm95/rm9x-02-03.html

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 18:26:11 2022
    On 14/11/2022 11:29, Dmitry A. Kazakov wrote:
    On 2022-11-14 12:03, James Harris wrote:
    On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
    On 2022-11-14 11:16, James Harris wrote:
    On 14/11/2022 07:52, Dmitry A. Kazakov wrote:

    ...

    Show me the algorithm.

    There's no particular algorithm; the construct is a potential
    component of many algorithms.

    Show me one that is not array assignment.

       if is_name_first(b[j])
         a[i++] = b[j++]
         rep while is_name_follow(b[j])
           a[i++] = b[j++]
         end rep
         a[i] = 0
         return TOK_NAME
       end if

    Now, what don't you like about the ++ operators in that? How would you
    prefer to write it?

    From parser production code:

    procedure Get_Identifier
              (  Code     : in out Source'Class;
                 Line     : String;
                 Pointer  : Integer;
                 Argument : out Tokens.Argument_Token
              )  is
       Index     : Integer := Pointer + 1;
       Malformed : Boolean := False;
       Underline : Boolean := False;
       Symbol    : Character;
    begin
       while Index <= Line'Last loop
          Symbol := Line (Index);
          if Is_Alphanumeric (Symbol) then
             Underline := False;
          elsif '_' = Symbol then
             Malformed := Malformed or Underline;
             Underline := True;
          else
             exit;
          end if;
          Index := Index + 1;
       end loop;
       Malformed := Malformed or Underline;
       Set_Pointer (Code, Index);
       Argument.Location := Link (Code);
       Argument.Value := new Identifier (Index - Pointer);
       declare
          This : Identifier renames Identifier (Argument.Value.all);
       begin
          This.Location  := Argument.Location;
          This.Malformed := Malformed;
          This.Value     := Line (Pointer..Index - 1);
       end;
    end Get_Identifier;

    Well, that's an astonishingly long piece of code, Dmitry, and if I read
    it correctly it doesn't even check whether it begins on a name-first
    character: that has to be decided before the procedure starts!

    But I am not sure I do understand it. Even allowing for what I believe
    is meant to be double underscore detection (except at the start and
    end?) it takes significantly more study than the simple name-first,
    name-follow code which preceded it.
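    (For the record, the underscore rule Dmitry's loop enforces - Ada 95
    RM 2.3 says an underscore must appear singly between alphanumerics -
    is compact to restate on its own; a Python sketch of just that
    malformed-underscore check, with the first character assumed already
    validated by the caller, as in the Ada code:)

```python
def malformed_underscores(name: str) -> bool:
    """True if name has a doubled or trailing underscore (Ada 95 RM 2.3)."""
    underline = False   # previous character was '_'
    malformed = False
    for ch in name:
        if ch == "_":
            malformed = malformed or underline
            underline = True
        else:
            underline = False
    return malformed or underline   # a trailing '_' is also malformed

print([malformed_underscores(s) for s in ("a_b", "a__b", "a_")])
# -> [False, True, True]
```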

    Nevertheless, I take your point about you preferring

    Index := Index + 1;

    over

    Index++

    and your preference for a separate step to transfer the characters.
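    (That preference can be honoured without ++ at all; a Python sketch of
    the same name scan, with `is_name_first`/`is_name_follow` approximated
    by `str` methods and the copy loop replaced by a slice - the function
    name and signature are mine:)

```python
def scan_name(b: str, j: int):
    """Scan an identifier starting at b[j]; return (token, new_j)."""
    if not (b[j].isalpha() or b[j] == "_"):          # is_name_first
        return None, j
    start = j
    j += 1
    while j < len(b) and (b[j].isalnum() or b[j] == "_"):  # is_name_follow
        j += 1
    return b[start:j], j                             # slice, not a[i++]=b[j++]

print(scan_name("count2 = 0", 0))  # ('count2', 6)
```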

    ...

    Again, IMO it's important for the language to provide such
    pseudofunctions so that a programmer's code can be made clearer,
    simpler, and more readable.

    Like this one?

        ++p+++

    I don't have a +++ operator so I am not sure what that is supposed to
    mean. It's not valid in my language.

    It could be part of the even simpler and brilliantly readable:

      ++p+++q+++r

    In my language that's also a syntax error. You'd have to separate the
    operators to make it legal.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Mon Nov 14 18:43:35 2022
    On 14/11/2022 17:46, David Brown wrote:
    On 14/11/2022 16:59, Bart wrote:

    I have a dynamic language with proper, first class lists, trees,
    strings, records, which take care of 90% of the pointer uses in a
    C-class language. Yet they can still be useful. This is an extract
    from a program that dumps the contents of an EXE file:

         coffptr:=makeref(pedata+coffoffset,imagefileheader)

         genstrln("Coff header:     "+tostr(coffptr^))

         genstrln("Machine:         "+tostr(coffptr^.machine,"h2"))
         genstrln("Nsections:       "+tostr(coffptr^.nsections,"h2"))
         genstrln("Timestamp:       "+tostr(coffptr^.timedatestamp,"h4"))
         genstrln("Symtab offset:   "+tostr(coffptr^.symtaboffset))
         genstrln("Nsymbols:        "+tostr(coffptr^.nsymbols))
         genstrln("Opt Hdr size:    "+tostr(coffptr^.optheadersize))
         genstrln("Characteristics: "+tostr(coffptr^.characteristics,"b"))
         genline()


    None of that needs pointers or references.

    Initialise a "coff_header" read-only immutable variable from a slice of
    the memory array holding the image.

    At minimum, this task needs the ability to take a block of bytes,
    and variously interpret parts of it as primitive numeric types of
    specific widths and signedness.

    That can be helped by having pointers to such types. It can be further
    helped by allowing a struct type which is a collection of such types in
    a particular layout. And a way to transfer data from arbitrary bytes to
    that struct object. Or to map the address of that struct into the middle
    of that block. (Or as I do it above, set a pointer to that struct to the
    middle of the block.)

    This is stuff which is meat-and-drink to a lower-level language like C,
    or like mine (even my scripting language).

    It requires some effort in Python and the result will be clunky (and
    probably require some add-on modules). While a functional language will struggle (to be accurate, the programmer will struggle because they've
    chosen the wrong language).

    Here's a more challenging record type that comes up in OBJ files:

    type imagesymbol=struct
        union
            stringz*8 shortname
            struct
                u32 short
                u32 long
            end
            u64 longname
        end
        u32 value
        u16 sectionno
        u16 symtype
        byte storageclass
        byte nauxsymbols
    end

    (Again, this is defined directly in my /dynamic/ scripting language.)
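    (David's `struct.unpack` answer does stretch to this harder case too,
    at the cost of decoding the union by hand - it is the standard 18-byte
    COFF symbol record; the helper below and its sample input are my own
    sketch:)

```python
import struct

# 18-byte COFF symbol record: 8-byte name union, value, sectionno,
# symtype, storageclass, nauxsymbols ("<" disables padding).
SYM = struct.Struct("<8sIHHBB")

def decode_symbol(raw: bytes):
    shortname, value, sectionno, symtype, storageclass, naux = SYM.unpack(raw)
    # The union: if the first 4 bytes are zero, the next 4 are an
    # offset into the string table; otherwise the name is inline.
    short, long_ = struct.unpack("<II", shortname)
    name = ("strtab", long_) if short == 0 else ("inline", shortname.rstrip(b"\0"))
    return name, value, sectionno, symtype, storageclass, naux

raw = struct.pack("<8sIHHBB", b"main\0\0\0\0", 0, 1, 0x20, 2, 0)
print(decode_symbol(raw)[0])  # ('inline', b'main')
```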

    No thanks. My innovation is keeping this stuff simple, accessible,
    fast, and at a human scale.


    I disagree with your scepticism, but I agree that there are lots of
    languages with different paradigms for different purposes.

    However, making yet-another-C is IMHO a pointless exercise.  It might be better in some ways, but not enough to make it worth the effort.

    If you're going to use a C-class language, then why not one with some
    modern refinements? That's what I do.

    (For example, default 64-bit everything; a module scheme; value arrays;
    sane type syntax; whole-program compilation; slices; expression-based
    (see below); a 'byte' type! )

    Obviously I can't ban `a + b`. Equally obviously, this code is pointless:

         a + b;


    You can ban that.  Rule number 42 - the result of an expression must be assigned to a variable, used in another expression, or passed as the
    argument to a function call.  No problem.

    This is effectively what I did. The only expressions allowed as
    standalone statements were assignments; function calls; increments.
    Anything else required an `eval` prefix to force evaluation.

    Given an expression-based language, what would you do? In the past,
    after working with C, I would unthinkingly type:

         a = b

    in my syntax instead of a := b. This wasn't an error: it would compare
    a and b then discard the result. But it was a bug.


    Any decent C compiler (with the right options) will complain about the C equivalent, "a == b;", as a statement with no effect.  It could just as easily be made an error in a language.

    And if you follow my suggestion that expressions can't have
    side-effects, then it's easy to distinguish between "statements" and "expressions" because you no longer have a C-style "expression statement".

    Because my early languages were loosely based on Algol68, not C, they
    were expression-based. Later I simplified to distinct statements and expressions, but now I've gone back.

    Now both my languages are expression-based. That is, statements and
    expressions are interchangeable. That's supposed to be good, right,
    because FP languages work the same way? I think expression-based are
    regarded as superior.

    But it does make some things harder.

    For a start, any expression can have side-effects, because an expression
    can be or can include what you might call a statement.

    So I can get rid of ++ here:

    A[i++] := 0

    but I could simply write it like this:

    A[t:=i; i:=i+1; t] := 0

    In which case I might as well keep the ++.
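    (For comparison, Python reached much the same accommodation: its
    walrus operator lets a post-incremented index be spelled as an
    assignment expression - a sketch:)

```python
A = [0] * 8
i = 3

# A[i++] := 0 in walrus form: increment i, then index with the old value.
A[(i := i + 1) - 1] = 99

print(i, A[3])  # 4 99
```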

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Mon Nov 14 19:10:04 2022
    On 14/11/2022 18:43, James Harris wrote:
    On 14/11/2022 15:24, Bart wrote:
    On 14/11/2022 13:17, Dmitry A. Kazakov wrote:
    On 2022-11-14 13:45, Bart wrote:

    ...

    My point is that you can legally combine any number of operators to
    result in nonsense-looking code, and the language can do little
    about it.

    No, the point is that no reasonable code should be nonsense-looking
    and conversely.

    Increments cross that line.

    But it performs a task that is needed, and in their absence, would
    simply be implemented, less efficiently and with more cluttery code,
    using other means.

    I have my own misgivings about it: there are in all 6 varieties of
    Increment (++x; --x; a:=++x; a:=--x; a:=x++; a:=x--), which are a pig
    to implement, and they spoil the lines of code like this:

    I thought you correctly said before that ++x was (++x; x) and x++ was
    (t:=x; x++; rval(t)). Adding -- that's still only four varieties, IMO.

    There are four value-returning versions, which are implemented
    differently from non-value-returning or standalone versions:

    ++x
    --x

    Here, x++ and x-- are treated as ++x and --x. If I write this in my
    dynamic language (using p^, as a simple variable results in a dedicated bytecode for the first two):

    ++(p^)
    (p^)++
    a:=++(p^)
    a:=(p^)++

    Then the generated bytecode is this (annotated):

    pushm p # ++(p^)
    incrptr

    pushm p # (p^)++
    incrptr

    pushm p # a:=++(p^)
    incrload
    popm a

    pushm p # a:=(p^)++
    loadincr
    popm a

    Three lots of code (plus three more for --)
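    (Those bytecode shapes can be mocked up in a few lines; a Python
    sketch with an explicit value stack, where a one-element list models
    p^ - the opcode names are taken from the listing above, and their
    semantics here are my reading of it:)

```python
stack = []
mem = {"p": [10], "a": None}   # p^ modelled as a one-element list

def pushm(name): stack.append(mem[name])
def popm(name):  mem[name] = stack.pop()
def incrptr():   cell = stack.pop(); cell[0] += 1                 # no value left
def incrload():  cell = stack.pop(); cell[0] += 1; stack.append(cell[0])
def loadincr():  cell = stack.pop(); stack.append(cell[0]); cell[0] += 1

pushm("p"); incrptr()               # ++(p^) standalone: p^ becomes 11
pushm("p"); incrload(); popm("a")   # a := ++(p^): p^ -> 12, a = 12
print(mem["a"], mem["p"][0])        # 12 12
pushm("p"); loadincr(); popm("a")   # a := (p^)++: a = 12, p^ -> 13
print(mem["a"], mem["p"][0])        # 12 13
```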

    (Another operator with separate value/non-value versions is assignment.
    With augmented assignment, I only support the non-value version, so a :=
    (x +:= y) is not allowed.

    I also split functions into non-value-returning procs, and
    value-returning functions.

    Calling a proc where a value is expected won't work. Calling a function
    then discarding its value is sometimes implemented differently - like a
    proc.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to James Harris on Mon Nov 14 20:51:22 2022
    On 14/11/2022 19:02, James Harris wrote:
    On 14/11/2022 17:28, David Brown wrote:
    On 14/11/2022 16:14, James Harris wrote:
    On 14/11/2022 10:24, Stefan Ram wrote:
    James Harris <james.harris.1@gmail.com> writes:
    Rather than allowing non-ASCII in source I came up with a scheme of
    what
    you might call 'named characters' extending the backslash idea of C to allow names instead of single characters after the backslash. It's off topic for this thread but it allows non-ASCII characters to be named (such that the names consist of ASCII characters and would thus be
    readable and universal).

       Here's an example of a Python program.

    print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )

       It prints:

    \"

    I see they are Unicode names. If I were to support such names I would
    have your example as something like

       print "\U:reverse solidus/\U:quotation mark/"

    but I prefer shorter names such as

       print "\bksl/\q11/"

    At least with Unicode someone has already defined a name for every
    character, but Unicode includes a lot of nonsense such as a character
    called


    How about using the HTML character entity names?  That would be
    "&bsol;&quot;".  These are a good deal shorter than Unicode names but
    are vastly better than inventing your own names.

    <https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references>
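    As a quick check of the suggestion, Python's standard library already
    carries the full HTML5 named-entity table:

```python
import html

# html.unescape knows the HTML5 named-entity table, including the short
# names mentioned above:
print(html.unescape("&bsol;&quot;"))  # \"
```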

    Thanks for the pointer. I /could/ include them with my syntax along the
    lines of

      "\H:Backslash/\H:quot/"

    "&Backslash;" (Unicode U+2216 Set minus) is a different character ∖ from
    \, which is "&bsol;" (Unicode U++005C Reverse solidus).


    where H: indicates HTML names. I did consider them before but had to
    reject them. I cannot remember all the reasons why, now, but from taking
    a quick look they appear, like Unicode, to be more for printing than for processing. For example, there is frac45 for 4/5 but such a scheme
    allows only the fractions which are predefined. Also, HTML names combine diacritics with characters (e.g. yacute) whereas AISI it's important for
    them to be kept separate.

    What's needed, IMO, is a set of names intended for /processing/ rather
    than for typesetting.

    That makes little sense to me. Are you intending to invent your own
    character encoding, or your own fonts here? Are you planning on making
    your own display or print system?

    The character "⅘" is easily typed on *nix keyboards with a compose key
    (with common setups), HTML has it as "&frac45;", Unicode has it as
    "U+2158 Vulgar fraction four fifths". They support fractions that are
    common enough to exist as characters in fonts. You can't add your own
    personal "twenty two sevenths" character and expect it to turn up when
    printed, nor will you ever come across it when reading files or
    documents from elsewhere. (Of course you can choose to support only a
    subset of the HTML or Unicode names.)
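    Incidentally, the Unicode database does expose some machine-readable
    properties for such characters, queryable from Python's unicodedata
    module (a small demonstration, not part of the original post):

```python
import unicodedata

frac = "\u2158"  # VULGAR FRACTION FOUR FIFTHS
print(unicodedata.name(frac))     # VULGAR FRACTION FOUR FIFTHS
print(unicodedata.numeric(frac))  # 0.8
print(unicodedata.category("4"))  # Nd, i.e. a decimal digit
```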

    And what do you mean by "processing", and what makes you think it is
    remotely relevant to separate diacritics from characters? In some
    languages, "ä" is a letter "a" with a diacritic, in others it is an
    entirely distinct letter of its own. The same applies to lots of
    characters. Unicode has a complex system of "normalisation" for
    relating combining diacritics and letters into single combined Unicode characters, which are often a better choice for display than you would
    get by displaying two individual graphemes.
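    The normalisation being described can be seen directly with Python's
    unicodedata module (a small demonstration, not from the original post):

```python
import unicodedata

composed = "\u00e4"     # ä as a single code point
decomposed = "a\u0308"  # "a" followed by COMBINING DIAERESIS

# NFC composes, NFD decomposes; both forms render as "ä".
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(len(unicodedata.normalize("NFD", composed)))           # 2
```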

    Are you going to try to split up Chinese or Korean characters into their components? What about Mongolian, or Arabic?

    If you try to mess with text or characters for "processing", you'll get
    it wrong. The least bad thing you can do is make it convenient to use -
    to input and output. That means UTF-8 and a way to type them in source
    code (like C's "\uNNNN" or HTML's "&xNNNN;") and optionally a way to
    name them (such as using HTML's names) when they are inconvenient to
    type directly, as the Unicode hex numbers are hard to remember.

  • From David Brown@21:1/5 to James Harris on Mon Nov 14 21:45:37 2022
    On 14/11/2022 18:37, James Harris wrote:
    On 14/11/2022 14:47, David Brown wrote:
    On 14/11/2022 11:44, James Harris wrote:
    On 14/11/2022 09:26, David Brown wrote:
    On 13/11/2022 18:11, James Harris wrote:
    On 08/11/2022 16:29, David Brown wrote:

    ...

    The expression you mention is just one of a myriad of what you might
    consider to be potential nasties. If I am going to prohibit that one
    then what about all the others?


    Prohibit nasty ones.

    Enumerating the 'nasty ones' is the problem. If there are 20 dyadic
    operators then there are something like 400 ways of combining them. AIUI
    you want me to pick some of those 400 and tell the programmer "you
    cannot combine these two even where there's no type mismatch".


    Then enable nice ones, and make everything else an error. (That's the
    usual way to do it.)



    A big step in that direction is to say that assignment is a statement,
    not an expression,

    Done that.

    and that variables cannot be changed by side-effects.

    I will not be doing that. I know you favour functional programming, and that's fine, but the language I am working on is unapologetically
    imperative.


    Many unapologetically imperative languages do not allow side-effects in expressions. It is a natural rule for functional programming languages,
    since pure functional programming does not have side-effects or
    modifiable variables at all. But there is absolutely /nothing/ about
    being an imperative language that suggests you need to allow
    side-effects or assignments /within/ expressions.

      (How you relate this to function calls is a related and complex
    issue that I have been glossing over here.  An idea would be to
    distinguish between "procedures" that may have side effects, and
    "functions" that do not.)

    That means there is no such thing as an "increment" operator - post or
    pre.

    It also /hugely/ simplifies the language - both for the programmer,
    and for the implementer.  If expressions have no side-effects, they
    can be duplicated, split up, re-arranged, moved around in code, all
    without affected the behaviour of the program.

    This needs a separate discussion, David. It is far too big a topic for
    this thread. (Feel free to start a new one; I have plenty to say!) All I
    can say here is as I said above, the language I am working on is
    imperative.


    And being imperative has nothing to do with it, as I said above.

    As for new threads, that's up to you - it's your language, and you can
    discuss the aspects that interest you. I'm just trying to make
    suggestions based on long experience, with suggestions for things that I
    think will make a "better" language (for some value of "better") than
    existing ones. I've used perhaps 30 programming languages. I don't
    claim to have used them all extensively, and I have forgotten a /lot/,
    but I think it is more than most people. And I've seen a lot of bad
    code in many languages (some of the bad code I've written myself).

    I'm hoping that you are trying to do something other than making a new
    C. I'm hoping you are not trying to make an "ultimate language for
    everyone and every use on every target". I'm hoping you are not trying
    to re-write everything related to language development. I'm hoping you
    are not trying to invent the "perfect" language after learning just one
    or two existing languages.


    These two statements go together as well as Dmitry's toaster and
    toilet brush.  It doesn't matter how precisely you define how the
    combination can be used and what it does, it is still not a good or
    useful thing.

    OK, let's take the combination you mentioned:

       ++E++

    I wonder why you see a problem with it. As I see it, it increments E
    before evaluation and then increments E after evaluation. What is so
    complex about that? It does exactly what it says on the tin, and in
    the order that it says it. Remember that unlike C I define the
    apparent order of evaluation so the expression is perfectly well formed.

    The very fact that you are discussing how to define it means it is not
    clear and obvious.

    On that I must disagree.

    You think it is /obvious/ what "++E++" means?

    One cannot expect to understand a language
    without at least learning the basics. Consider


      a * b + c

    I would parse that as (a*b)+c but not all languages would. As a reader
    of infix code you would have to know the order in which operators would
    be applied. You have to know the basics of a language in order to read
    code written in it.


    Agreed. But as soon as you say "infix operators with common
    mathematical precedence for common operations", it's done. You've distinguished it from post-fix notation of Forth (where you would write
    "a b * c + "), from strict left-to-write languages (where "a + b * c"
    would parse "(a + b) * c"), and from pre-fix notation languages (where
    you would perhaps write "+(*(a, b), c)" ).

    "++E++" remains meaningless to experienced programmers.


    It is not obvious which order the increments happen, or if the order
    is defined, or if the order matters.  It is not obvious what the
    return value should be.  It is not obvious where you have lvalues or
    rvalues (not that a language should necessarily have such concepts).
    It is not obvious what happens to E.

    Of course it's not obvious to someone who doesn't know the rules of the language. A language designer cannot produce a language with no rules.
    What a language designer /can/ do is to make the rules simple and understandable - but someone who reads the code still has to understand
    what the rules are.


    I do not think "++E++" will be clear and obvious to someone who /does/
    know the rules of the language. Remember, this is not something that
    will be commonly used and become idiomatic, like "*p++ = *q++;" is in C.
    Programmers will always need to look up the details - that's why it is
    not a good idea.

    As for the rules we have been discussing here those I have come up with
    are, in the main, the ones you would be familiar with; even the new ones
    are logical and simple. Once you understand them it's incredibly easy to parse an expression, even of the kind which, to you, looks like gibberish.

    In fact, I have to say that neither of those you have objected to should really look like gibberish, even to the uninitiated. Would you find

      ++A + B++

    objectionable?

    Yes.

    If not, I cannot see why you would find

      ++E + E++

    so objectionable, either.

    It is worse, because you are changing the same thing twice in an
    unordered manner.

    Isn't the only new thing you, as a reader of such code, would need to
    know the order in which the operations are carried out? Otherwise it's
    just like the preceding expression.


    ...

    There's a very tempting myth in language design that /defining/
    behaviour is key - that gibberish and incorrect code can somehow be
    made "correct" by defining its behaviour.  You are not alone in this -
    lots of languages try to achieve "no undefined behaviour" by defining
    the behaviour of everything instead of banning things that have no
    correct behaviour.

    My reason for defining /apparent/ code behaviour is to ensure
    computational consistency on different platforms. Who wouldn't want that?


    My reason for disallowing such expressions is to ensure computational consistency on different platforms, /including/ the human reader at a
    glance. Who wouldn't want /that/ ?

  • From Andy Walker@21:1/5 to David Brown on Mon Nov 14 23:28:44 2022
    On 14/11/2022 20:45, David Brown wrote:
    [To James:]
    [...] But there is absolutely
    /nothing/ about being an imperative language that suggests you need
    to allow side-effects or assignments /within/ expressions.

    If assignments within expressions are verboten, then you need to
    either forbid assignments within functions or have two classes of function, those with and those without [inc sub-procedures]. If assignments are
    allowed at all, then you cannot in general tell at compile time whether
    any assignment is reached at run time, leading to further complications.
    If you regard output as a side-effect, that too leads to problems. Yet
    during program development it is common to insert temporary diagnostic
    printing or variables. I can understand the concept of languages without assignments or other side effects; and of languages with them; I find it difficult to see the point of languages where such things are allowed in
    some places and not in others. If we need hair shirts [and I'm not sure
    that we do], they should be worn all the time, not put on and taken off
    in accordance with arcane rules that only high priests understand.

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Herold

  • From Bart@21:1/5 to James Harris on Tue Nov 15 01:14:42 2022
    On 14/11/2022 16:21, James Harris wrote:
    On 14/11/2022 11:11, Bart wrote:
    On 14/11/2022 10:44, James Harris wrote:
    On 14/11/2022 09:26, David Brown wrote:

    ...

    OK, let's take the combination you mentioned:

       ++E++

    I wonder why you see a problem with it. As I see it, it increments E
    before evaluation and then increments E after evaluation. What is so
    complex about that? It does exactly what it says on the tin, and in
    the order that it says it. Remember that unlike C I define the
    apparent order of evaluation so the expression is perfectly well formed.

    But does this have the same priorities as:

        op1 E op2

    (where op2 is commonly done first) or does it have special rules, so
    that in:

       --E++

    the -- is done first? If it's different, then what is the ordering
    when mixed with other unary ops?

    You explained somewhere the circumstances where you think this is
    meaningful, but I can't remember what the rules are and I can't find
    the exact post.

    The rules for ++ and -- are simple. *Prefix* ++ or -- happens before
    postfix ++ or --.
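    One way to picture the stated rule (a sketch only; E is modelled as a
    one-element list since Python has no ++ operator of its own):

```python
# Sketch of "prefix ++ happens before postfix ++" applied to ++E++.
def pre_inc(cell):    # ++E: increment first, then use the new value
    cell[0] += 1
    return cell[0]

def post_inc(cell):   # E++: use the old value, then increment
    old = cell[0]
    cell[0] += 1
    return old

E = [5]
pre_inc(E)            # prefix step: E becomes 6
value = post_inc(E)   # postfix step: yields 6, E becomes 7
print(value, E[0])    # 6 7  (one reading of ++E++ under the stated rule)
```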

    Is that for all unary operators or just ++ and --?

    If only for those, then those exceptions will make rules for all unary
    ops even more bizarre. If for all of them, then one consequence is that
    `-P^` is parsed as `(-P)^`, so that you try to negate a pointer, instead
    of what's at the pointer target.

    However, I have my own exceptions, which are casts:

    ref u16(P)^ := 0

    If P was a byte pointer, this will now write 16 bits not 8. Here, the
    cast is done first, differently from a unary operator.

    But I cover this in my list below: casts are syntax so trump everything.
    (Which comes first in `ref T (X)[i]`, I don't know, I'd have to test it.
    But this would probably be written to as (X[i]) to remove doubt.)


    Your example would probably be better expressed as

      E - 1

    !

    As for the bigger picture of operator precedences, it's based on the transformations which operators naturally make to their operands. For example, comparisons (such as less-than) naturally take numbers and
    produce booleans so their precedences put them after numeric operators
    (+, * etc) and before boolean operators (and, or, not, etc). The natural series is (simplified)

      locations (such as field selection and array indexing)
      numbers (retrieved from those locations)
      comparisons (of those numbers)
      booleans

    So this is about binary ops now? I make that even simpler by saying all
    binary ops come after unary ops. The binding order, starting with the
    tightest, is:

    Syntax-bound ("." and "[]" for example)
    Unary ops (postfix LTR then prefix RTL)
    Binary (**, Mul, Add, Compare, Logical)
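    Under that ordering, a toy recursive-descent fragment (my own sketch,
    using ^ for postfix dereference as elsewhere in this thread) parses
    -P^ as -(P^):

```python
# Toy parser fragment illustrating "postfix binds tighter than prefix":
# with these rules, -P^ parses as -(P^) rather than (-P)^.
def parse_unary(tokens):
    if tokens and tokens[0] == "-":       # prefix op, applied last
        tokens.pop(0)
        return ("neg", parse_unary(tokens))
    return parse_postfix(tokens)

def parse_postfix(tokens):
    node = tokens.pop(0)                  # a bare name
    while tokens and tokens[0] == "^":    # postfix deref, applied first
        tokens.pop(0)
        node = ("deref", node)
    return node

print(parse_unary(["-", "P", "^"]))  # ('neg', ('deref', 'P'))
```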


    Because if they ever start to return lvalues, then this becomes possible:

        ++E := 0
        E++ := 0

    (Whichever one is legal in your scheme.) So I think there is little
    useful expressivity to be gained.

    Indeed, neither of those is useful.

    Interestingly, I just tried

      ++E = 0;

    with cc and c++ compilers. The first rejected it (as ++E needing to be
    an lvalue); the second accepted it. That's not a reflection of either compiler, BTW, but without checking it's probably more to do with the language/dialect definition. FWIW I expect the second compiler could be persuaded to issue a warning about unreachable code or suchlike.

    What does C++ expect it to mean? I know it allows the result of ?: as an lvalue, which C doesn't. But the meaning of that is clear, and I have it
    too:

    (a | b | c) := x # assign to b or c depending on a

    (Actually you can have an arbitrarily complex expression on the LHS as
    an lvalue, including Switch and if-then-elsif chains. But whether they
    still work in my languages, I don't know as I rarely use the feature.

    Actually it comes up indirectly here:

    F((a | b | c))

    when F is a function taking a reference parameter, so the arg must be an Lvalue. So either b or c is modified in F. This one works, because I
    think it came up recently and it had to!)

  • From David Brown@21:1/5 to Bart on Tue Nov 15 09:07:00 2022
    On 14/11/2022 19:43, Bart wrote:
    On 14/11/2022 17:46, David Brown wrote:
    On 14/11/2022 16:59, Bart wrote:

    I have a dynamic language with proper, first class lists, trees,
    strings, records, which take care of 90% of the pointer uses in a
    C-class language. Yet they can still be useful. This is an extract
    from a program that dumps the contents of an EXE file:

    coffptr:=makeref(pedata+coffoffset,imagefileheader)

    genstrln("Coff header: "+tostr(coffptr^))

    genstrln("Machine: "+tostr(coffptr^.machine,"h2"))
    genstrln("Nsections: "+tostr(coffptr^.nsections,"h2"))
    genstrln("Timestamp: "+tostr(coffptr^.timedatestamp,"h4"))
    genstrln("Symtab offset: "+tostr(coffptr^.symtaboffset))
    genstrln("Nsymbols: "+tostr(coffptr^.nsymbols))
    genstrln("Opt Hdr size: "+tostr(coffptr^.optheadersize))
    genstrln("Characteristics: "+tostr(coffptr^.characteristics,"b"))
    genline()


    None of that needs pointers or references.

    Initialise a "coff_header" read-only immutable variable from a slice
    of the memory array holding the image.

    At minimum, this task needs the ability to take a block of bytes,
    and variously interpret parts of it as primitive numeric types of
    specific widths and signedness.

    Agreed. Still, pointers are unnecessary there.


    That can be helped by having pointers to such types. It can be further helped by allowing a struct type which is a collection of such types in
    a particular layout. And a way to transfer data from arbitrary bytes to
    that struct object. Or to map the address of that struct into the middle
    of that block. (Or as I do it above, set a pointer to that struct to the middle of the block.)
    It is certainly handy to have a way of interpreting the bytes of an
    image as a struct type with particular layout. It is not necessary, but
    it is handy.


    This is stuff which is meat-and-drink to a lower-level language like C,
    or like mine (even my scripting language).

    It requires some effort in Python and the result will be clunky (and probably require some add-on modules).
    import struct # Standard module
    bs = open("potato_c.cof", "rb").read()

    (machine, nsections, timestamp, symtaboffset, nsymbols,
     optheadersize, characteristics) = struct.unpack_from("<HHIIIHH", bs)

    That's it. Three lines. I would not think of C for this kind of thing
    - Python is /much/ better suited. I'd only start looking at C (or C++)
    if I needed speed so high that the Python code was not fast enough, even
    with PyPy.


    While a functional language will
    struggle (to be accurate, the programmer will struggle because they've chosen the wrong language).
    I don't believe that. I am not familiar enough with Haskell to be able
    to give the code, but I have no doubts at all that someone experienced
    with Haskell will manage it fine. IO is not hard in the language, and
    it has all the built-in modules needed for such interfaces.

    Haskell is apparently number 25 on the list of language popularities on
    Github, with about 0.4% usage. That's not huge, but not insignificant
    either. But then, it was never intended to be a major practical
    language - though some people and companies (Facebook uses it for
    content analysis) do use it for practical work. Its main motivations
    are for teaching people good software development, developing new
    techniques, algorithms and methods, and figuring out what "works" and
    could be incorporated in other languages.

    It is that last feature that is most noticeable. Most major modern
    languages are not pure functional languages in themselves, but contain
    aspects from functional programming. I can't think of any serious,
    popular language with significant development in the last decade that
    does not have lambdas and the ability to work with functions as objects.
    Most high-level languages have list comprehensions and higher-order
    functions (map, filter, etc.). Many support defining high-level data structures directly without the need to mess around with pointers
    manually. Many encourage immutable data as the norm.

    This is why I bring it up here - not because I think the OP should be
    making a functional programming language, but because I think he should
    be taking inspiration and incorporating ideas from that world.


    Here's a more challenging record type that comes up in OBJ files:

    type imagesymbol=struct
        union
            stringz*8 shortname
            struct
                u32 short
                u32 long
            end
            u64 longname
        end
        u32 value
        u16 sectionno
        u16 symtype
        byte storageclass
        byte nauxsymbols
    end

    (Again, this is defined directly in my /dynamic/ scripting language.)

    Again, peanuts in Python - and I expect also peanuts in Haskell.

    No thanks. My innovation is keeping this stuff simple, accessible,
    fast, and at a human scale.


    I disagree with your scepticism, but I agree that there are lots of
    languages with different paradigms for different purposes.

    However, making yet-another-C is IMHO a pointless exercise. It might
    be better in some ways, but not enough to make it worth the effort.

    If you're going to use a C-class language, then why not one with some
    modern refinements? That's what I do.
    (For example, default 64-bit everything; a module scheme; value arrays;
    sane type syntax; whole-program compilation; slices; expression-based
    (see below); a 'byte' type! )
    Nothing of that is /remotely/ worth making a new language and giving up
    on everything C - tools, compilers, developer familiarity, libraries,
    and all the rest.

    I'm not saying that these are not good things (though I might disagree
    with you on some of the details). I am saying that it is not worth it.

    This is why we still have C, and why it is so popular in practice - it
    is not because anyone thinks it is a "perfect" language, it is because
    the benefits of the C ecosystem outweigh the small benefits of minor
    variations of the language.

    And I think most of what you like could be achieved by using a subset of
    C++ along with a few template libraries. (To be fair, that was
    certainly not the case when you started your language.)



    Obviously I can't ban `a + b`. Equally obviously, this code is
    pointless:

    a + b;


    You can ban that. Rule number 42 - the result of an expression must
    be assigned to a variable, used in another expression, or passed as
    the argument to a function call. No problem.

    This is effectively what I did. The only expressions allowed as
    standalone statements were assignments; function calls; increments.
    Anything else required an `eval` prefix to force evaluation.

    Good.

    Given an expression-based language, what would you do? In the past,
    after working with C, I would unthinkingly type:

    a = b

    in my syntax instead of a := b. This wasn't an error: it would
    compare a and b then discard the result. But it was a bug.


    Any decent C compile (with the right options) will complain about the
    C equivalent, "a == b;", as a statement with no effect. It could just
    as easily be made an error in a language.

    And if you follow my suggestion that expressions can't have
    side-effects, then it's easy to distinguish between "statements" and
    "expressions" because you no longer have a C-style "expression
    statement".

    Because my early languages were loosely based on Algol68, not C, they
    were expression-based. Later I simplified to distinct statements and expressions, but now I've gone back.

    Now both my languages are expression-based. That is, statements and expressions are interchangeable. That's supposed to be good, right,
    because FP languages work the same way? I think expression-based
    languages are regarded as superior.

    I have nothing against expressions - I am against side-effects in
    expressions.

    But it does make some things harder.

    For a start, any expression can have side-effects, because an expression
    can be or can include what you might call a statement.

    So I can get rid of ++ here:

    A[i++] := 0

    but I could simply write it like this:

    A[t:=i; i:=i+1; t] := 0

    In which case I might as well keep the ++.
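    For comparison, Python's assignment expressions allow much the same
    trick (my translation of Bart's line, not from the post):

```python
# Rough Python rendering of Bart's A[t:=i; i:=i+1; t] := 0, using the
# walrus operator inside a throwaway tuple whose last element indexes A:
A = [9, 9, 9]
i = 0
A[((t := i), (i := i + 1), t)[-1]] = 0
print(A, i)  # [0, 9, 9] 1
```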


    Ban side-effects in expressions, and you have :

    A[i] := 0
    i = i + 1

    It is not hard.

    And of course, a large proportion of increments are in loops. So now
    you have (mixing syntaxes from different languages to avoid prejudice) :

    for i in range(10) {
    A[i] = 0
    }

    Or :

    for a& in A {
    a = 0
    }

    Or :
    A = [0 for a in A]

    Or :
    A = [0] * 10

    Or :
    A.set(0)

    Or :

    A = [0 .. ]

    Or :

    A = [0 .. ][range(A)]

    There are endless choices here, none of which need an increment
    operator, or pointers.

  • From David Brown@21:1/5 to Andy Walker on Tue Nov 15 09:16:07 2022
    On 15/11/2022 00:28, Andy Walker wrote:
    On 14/11/2022 20:45, David Brown wrote:
    [To James:]
    [...]  But there is absolutely
    /nothing/ about being an imperative language that suggests you need
    to allow side-effects or assignments /within/ expressions.

        If assignments within expressions are verboten, then you need to either forbid assignments within functions or have two classes of function, those with and those without [inc sub-procedures].

    Yes, I mentioned in one post that functions are a complicating factor
    that need to be discussed, but are probably best covered in a separate
    thread. Two classes of function was a possibility I mentioned.

    If assignments are
    allowed at all, then you cannot in general tell at compile time whether
    any assignment is reached at run time, leading to further complications.
    If you regard output as a side-effect, that too leads to problems.  Yet during program development it is common to insert temporary diagnostic printing or variables.  I can understand the concept of languages without assignments or other side effects;  and of languages with them;  I find it difficult to see the point of languages where such things are allowed in
    some places and not in others.  If we need hair shirts [and I'm not sure that we do], they should be worn all the time, not put on and taken off
    in accordance with arcane rules that only high priests understand.


    Rules need to be clear, certainly. But it is entirely possible to have a
    language that allows assignment in some places and not others. Impure functional programming languages (such as OCaml) have that.

    There is no easy answer to these kinds of design decisions, and in the
    end it is the OP who must decide. I am suggesting ways to make the
    language clear and perhaps easier for compiler analysis. There are
    always trade-offs going on - every choice makes some things easier and
    other things harder.

  • From James Harris@21:1/5 to Bart on Tue Nov 15 08:42:19 2022
    On 15/11/2022 01:14, Bart wrote:
    On 14/11/2022 16:21, James Harris wrote:
    On 14/11/2022 11:11, Bart wrote:

    ...

    The rules for ++ and -- are simple. *Prefix* ++ or -- happens before
    postfix ++ or --.

    Is that for all unary operators or just ++ and --?

    Just ++ and --, as stated.


    If only for those, then those exceptions will make rules for all unary
    ops even more bizarre. If for all of them, then one consequence is that
    `-P^` is parsed as `(-P)^`, so that you try to negate a pointer, instead
    of what's at the pointer target.

    However, I have my own exceptions, which are casts:

        ref u16(P)^ := 0

    Rather than casts I have conversions. Are they the same? I never really understood how people use the term 'cast'. Either way, you raise a good
    point: Where should type conversions come in the order of precedence?

    I currently have them as function calls so they come at the top. I would
    have your example as

    (ref u16)(P)* = 0



    If P was a byte pointer, this will now write 16 bits not 8. Here, the
    cast is done first, differently from a unary operator.

    But I cover this in my list below: casts are syntax so trump everything. (Which comes first in `ref T (X)[i]`, I don't know, I'd have to test it.
    But this would probably be written to as (X[i]) to remove doubt.)

    ...

    Because if they ever start to return lvalues, then this becomes
    possible:

        ++E := 0
        E++ := 0

    (Whichever one is legal in your scheme.) So I think there is little
    useful expressivity to be gained.

    Indeed, neither of those is useful.

    Interestingly, I just tried

       ++E = 0;

    with cc and c++ compilers. The first rejected it (as ++E needing to be
    an lvalue); the second accepted it. That's not a reflection of either
    compiler, BTW, but without checking it's probably more to do with the
    language/dialect definition. FWIW I expect the second compiler could
    be persuaded to issue a warning about unreachable code or suchlike.

    What does C++ expect it to mean?

    Probably the same as my language (since it retains the lvalue) but I
    don't know C++.


    --
    James Harris

  • From James Harris@21:1/5 to David Brown on Tue Nov 15 09:35:19 2022
    On 14/11/2022 19:51, David Brown wrote:
    On 14/11/2022 19:02, James Harris wrote:
    On 14/11/2022 17:28, David Brown wrote:
    On 14/11/2022 16:14, James Harris wrote:
    On 14/11/2022 10:24, Stefan Ram wrote:
    James Harris <james.harris.1@gmail.com> writes:

    Rather than allowing non-ASCII in source I came up with a scheme of what
    you might call 'named characters', extending the backslash idea of C to
    allow names instead of single characters after the backslash. It's off
    topic for this thread but it allows non-ASCII characters to be named
    (such that the names consist of ASCII characters and would thus be
    readable and universal).

    ...

    What's needed, IMO, is a set of names intended for /processing/ rather
    than for typesetting.

    That makes little sense to me.  Are you intending to invent your own character encoding, or your own fonts here?  Are you planning on making
    your own display or print system?

    As I've already said, Unicode and HTML are fine for output. Where
    programmers work with the semantics of characters, however, they need characters to be in semantic categories, you know: letters, arithmetic
    symbols, digits, different cases, etc. So far I've not come across
    anything to support that multilingually. AISI what's needed is a way to
    expand character encodings to bit fields such as

    <category><base character><variant><diacritics><appearance>

    where

    category = group (e.g. alphabetic letters, punctuation, etc)
    base character = main semantic identification (e.g. an 'a')
    variant (e.g. upper or lower case)
    diacritics (those applied to this character in this location)
    appearance (e.g. a round 'a' or a printer's 'a' or unspecified)

    Note that that's purely about semantics; it doesn't include typefaces or character sizes or bold or italic etc which are all for rendering.


    The character "⅘" is easily typed on *nix keyboards with a compose key (with common setups), HTML has it as "&frac45;", Unicode has it as
    "U+2158 Vulgar fraction four fifths".  They support fractions that are common enough to exist as characters in fonts.  You can't add your own personal "twenty two sevenths" character and expect it to turn up when printed, nor will you ever come across it when reading files or
    documents from elsewhere.  (Of course you can choose to support only a subset of the HTML or Unicode names.)

    Anything like that which doesn't scale should be classified as
    patronising garbage. I can give you my more-negative comments about such nonsense, if you like. >:-|

    If you want to discuss this further I'd appreciate it if you could start another thread; I would reply to it. This thread is already full!


    And what do you mean by "processing", and what makes you think it is
    remotely relevant to separate diacritics from characters?

    A standard is needed. Which to use? If there are 100 letters and 40
    diacritics then there would be 140 codes. If they were to be
    amalgamated, however, then you could have up to 4000 combined
    characters. And that's allowing each letter to have only one diacritic.
    Allow two and you have 160,000 potential characters. Etc.

    In some
    languages, "ä" is a letter "a" with a diacritic, in others it is an
    entirely distinct letter of its own.  The same applies to lots of characters.  Unicode has a complex system of "normalisation" for
    relating combining diacritics and letters into single combined Unicode characters, which are often a better choice for display than you would
    get by displaying two individual graphemes.

    See above.


    Are you going to try to split up Chinese or Korean characters into their components?  What about Mongolian, or Arabic?

    I'd consider it but I don't yet know enough about those languages or how
    they are typically processed in programs to comment.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Tue Nov 15 11:06:55 2022
    On 2022-11-15 10:35, James Harris wrote:

    As I've already said, Unicode and HTML are fine for output. Where
    programmers work with the semantics of characters, however, they need characters to be in semantic categories, you know: letters, arithmetic symbols, digits, different cases, etc. So far I've not come across
    anything to support that multilingually. AISI what's needed is a way to expand character encodings to bit fields such as

      <category><base character><variant><diacritics><appearance>

    where

      category = group (e.g. alphabetic letters, punctuation, etc)
      base character = main semantic identification (e.g. an 'a')
      variant (e.g. upper or lower case)
      diacritics (those applied to this character in this location)
      appearance (e.g. a round 'a' or a printer's 'a' or unspecified)

    Note that that's purely about semantics; it doesn't include typefaces or character sizes or bold or italic etc which are all for rendering.

    I am not sure what you are trying to say. The Unicode characterization
    is defined in the file:

    https://unicode.org/Public/UNIDATA/UnicodeData.txt

    Programming languages like Ada (since 2005) define blanks, punctuation,
    letters etc in terms of Unicode characterization in order to support multilingual programs. Not the best idea, IMO, but for what it is worth:

    https://docs.adacore.com/live/wave/arm12/html/arm12/arm12-2-3.html#S0002

    There is no problem with Unicode string literals whatsoever. You just
    place characters as they are. The only escape is "" for ". That is all.

    Surely programmers are advised to never ever use anything but ASCII in identifiers and literals. If you need something else, use the code point
    to string conversion and concatenation:

    der_Aerger := Wide_Character'Val (16#C4#) & "rger";

    (Diaeresis rhymes with diarrhea (:-))

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Tue Nov 15 13:14:44 2022
    On 2022-11-15 12:44, James Harris wrote:

    Do you also believe that the Unix

      bytes = read(fd, &buf[1], reqd);

    should be prohibited since it has the side effect within the expression
    of modifying the buffer? If so, what would you replace it with??

    That is simple. Ada's standard library has it:

    procedure Read
       (  Stream : in out Root_Stream_Type;
          Item   : out Stream_Element_Array;
          Last   : out Stream_Element_Offset
       )  is abstract;

    Item is an array:

    type Stream_Element_Array is
    array (Stream_Element_Offset range <>) of aliased Stream_Element;

    It is also a "virtual" operation in C++ terms to be overridden by new implementation of stream. Last is the index of the last element read.
    Notice non-sliding bounds, as you can do this:

    Last := Buff'First - 1;
    loop
       Read (S, Buff (Last + 1 .. Buff'Last), Last); -- Non-blocking chunk
       exit when Last = Buff'Last; -- Done
    end loop;

    Since bounds do not slide Last stays valid for all array slices.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to David Brown on Tue Nov 15 11:44:16 2022
    On 14/11/2022 20:45, David Brown wrote:
    On 14/11/2022 18:37, James Harris wrote:
    On 14/11/2022 14:47, David Brown wrote:
    On 14/11/2022 11:44, James Harris wrote:
    On 14/11/2022 09:26, David Brown wrote:

    ...

    A big step in that direction is to say that assignment is a
    statement, not an expression,

    Done that.

    and that variables cannot be changed by side-effects.

    I will not be doing that. I know you favour functional programming,
    and that's fine, but the language I am working on is unapologetically
    imperative.


    Many unapologetically imperative languages do not allow side-effects in expressions.  It is a natural rule for functional programming languages, since pure functional programming does not have side-effects or
    modifiable variables at all.  But there is absolutely /nothing/ about
    being an imperative language that suggests you need to allow
    side-effects or assignments /within/ expressions.

    Do you also believe that the Unix

    bytes = read(fd, &buf[1], reqd);

    should be prohibited since it has the side effect within the expression
    of modifying the buffer? If so, what would you replace it with??

    ...


    These two statements go together as well as Dmitry's toaster and
    toilet brush.  It doesn't matter how precisely you define how the
    combination can be used and what it does, it is still not a good or
    useful thing.

    OK, let's take the combination you mentioned:

       ++E++

    I wonder why you see a problem with it. As I see it, it increments E
    before evaluation and then increments E after evaluation. What is so
    complex about that? It does exactly what it says on the tin, and in
    the order that it says it. Remember that unlike C I define the
    apparent order of evaluation so the expression is perfectly well
    formed.


    The very fact that you are discussing how to define it means it is
    not clear and obvious.

    On that I must disagree.

    You think it is /obvious/ what "++E++" means?

    If you don't know the rules then it's not obvious.

    If you know the rules then it's *blindingly* obvious. What's more, the
    rules are easy to learn.

    ...

    "++E++" remains meaningless to experienced programmers.

    It may be meaningless to programmers who wrongly try to apply the rules
    of other languages to it but why would you do that? It's an invalid
    thing to do. Languages differ.



    It is not obvious which order the increments happen, or if the order
    is defined, or if the order matters.  It is not obvious what the
    return value should be.  It is not obvious where you have lvalues or
    rvalues (not that a language should necessarily have such concepts).
    It is not obvious what happens to E.

    Of course it's not obvious to someone who doesn't know the rules of
    the language. A language designer cannot produce a language with no
    rules. What a language designer /can/ do is to make the rules simple
    and understandable - but someone who reads the code still has to
    understand what the rules are.


    I do not think "++E++" will be clear and obvious to someone who /does/
    know the rules of the language.

    It parses as

    (++E)++

    Operations would appear to be applied in that order, prefix then
    postfix. It's not complicated. Take an example:

    E = 5
    A = (++E)++ ;if it helps you to see it that way but parens not needed
    print A, E

    result:

    6 7

    If you want a bit more formality:

    ++E ==> (E := E + 1; E)
    E++ ==> (T := E; E := E + 1; valof(T))


    Remember, this is not something that
    will be commonly used and become idiomatic, like "*p++ = *q++;" is in C.
     Programmers will always need to look up the details - that's why it is
    not a good idea.

    Maybe you find it hard to read because you are trying to look at it as a
    single operation - a bit like a spaceship symbol? It's not. It's two
    entirely separate operations that just happen to be adjacent. Think of
    them in that way and as long as you know the order in which they will be applied the overall effect is obvious.


    As for the rules we have been discussing here those I have come up
    with are, in the main, the ones you would be familiar with; even the
    new ones are logical and simple. Once you understand them it's
    incredibly easy to parse an expression, even of the kind which, to
    you, looks like gibberish.

    In fact, I have to say that neither of those you have objected to
    should really look like gibberish, even to the uninitiated. Would you
    find

       ++A + B++

    objectionable?

    Yes.

    If not, I cannot see why you would find

       ++E + E++

    so objectionable, either.

    It is worse, because you are changing the same thing twice in an
    unordered manner.

    That's not true. I've said many times that the apparent order would be
    defined. Did you not read what I wrote or is there some other reason you
    still think it would be unordered?

    Operands to binops like "+" would appear to be evaluated left-then-right.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Andy Walker on Tue Nov 15 12:16:18 2022
    On 14/11/2022 23:28, Andy Walker wrote:
    On 14/11/2022 20:45, David Brown wrote:
    [To James:]
    [...]  But there is absolutely
    /nothing/ about being an imperative language that suggests you need
    to allow side-effects or assignments /within/ expressions.

        If assignments within expressions are verboten, then you need to either forbid assignments within functions or have two classes of function, those with and those without [inc sub-procedures].  If assignments are allowed at all, then you cannot in general tell at compile time whether
    any assignment is reached at run time, leading to further complications.
    If you regard output as a side-effect, that too leads to problems.  Yet during program development it is common to insert temporary diagnostic printing or variables.  I can understand the concept of languages without assignments or other side effects;  and of languages with them;  I find it difficult to see the point of languages where such things are allowed in
    some places and not in others.  If we need hair shirts [and I'm not sure that we do], they should be worn all the time, not put on and taken off
    in accordance with arcane rules that only high priests understand.

    Largely agreed. I was going to reply to that to tackle the 'alarming
    wave' (tm) of recommendations in this group for functional programming
    but as it may snowball into a bigger discussion I've started a new
    thread. qv


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to James Harris on Tue Nov 15 15:11:26 2022
    On 15/11/2022 09:42, James Harris wrote:
    On 15/11/2022 01:14, Bart wrote:
    On 14/11/2022 16:21, James Harris wrote:
    On 14/11/2022 11:11, Bart wrote:

    ...

    The rules for ++ and -- are simple. *Prefix* ++ or -- happens before
    postfix ++ or --.

    Is that for all unary operators or just ++ and --?

    Just ++ and --, as stated.


    If only for those, then those exceptions will make rules for all unary
    ops even more bizarre. If for all of them, then one consequence is
    that `-P^` is parsed as `(-P)^`, so that you try to negate a pointer,
    instead of what's at the pointer target.

    However, I have my own exceptions, which are casts:

         ref u16(P)^ := 0

    Rather than casts I have conversions. Are they the same? I never really understood how people use the term 'cast'. Either way, you raise a good point: where should type conversions come in the order of precedence?


    In C terminology, a "cast" is an explicit conversion. So :

    int x = 123;
    double y = 12.3;

    x = y;

    is an implicit conversion - the value in "y", of type "double", is
    converted to type "int" automatically.

    x = (int) y;

    is a cast - it is an /explicit/ conversion. In this case, it does
    exactly the same thing to the value, of course.

    "Typecast", on the other hand, is something that happens to actors when
    they have played one kind of role for too long and people don't think they
    can change. It has no meaning in C terminology - though people use it, thinking it means "cast".

    Other languages can have slightly different terminology.


    Because if they ever start to return lvalues, then this becomes
    possible:

        ++E := 0
        E++ := 0

    (Whichever one is legal in your scheme.) So I think there is little
    useful expressivity to be gained.

    Indeed, neither of those is useful.

    Interestingly, I just tried

       ++E = 0;

    with cc and c++ compilers. The first rejected it (as ++E needing to
    be an lvalue); the second accepted it. That's not a reflection of
    either compiler, BTW, but without checking it's probably more to do
    with the language/dialect definition. FWIW I expect the second
    compiler could be persuaded to issue a warning about unreachable code
    or suchlike.

    What does C++ expect it to mean?

    Probably the same as my language (since it retains the lvalue) but I
    don't know C++.


    In both C and C++, "++E" means precisely the same as "E += 1", which
    again means precisely the same as "E = E + 1". (This is assuming no overloaded operators in C++.)

    In C, the result of "(E = E + 1)" is the /value/ of E after the
    addition, and is thus an rvalue, not an lvalue.

    In C++, the result is a /reference/ to E after the addition, and
    therefore an lvalue.

    I don't know the precise reasoning for the difference - perhaps it is
    simply that the addition of references to the language made it natural
    to have an lvalue in such cases, while C does not have references.

    ("References" in C++ terminology are just names for lvalues. They can
    be thought of as non-null pointers that are dereferenced automatically -
    except the compiler will not actually make pointers or put things in
    memory unless necessary.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to James Harris on Tue Nov 15 15:22:45 2022
    On 15/11/2022 12:44, James Harris wrote:
    On 14/11/2022 20:45, David Brown wrote:
    On 14/11/2022 18:37, James Harris wrote:
    On 14/11/2022 14:47, David Brown wrote:
    On 14/11/2022 11:44, James Harris wrote:
    On 14/11/2022 09:26, David Brown wrote:

    ...

    A big step in that direction is to say that assignment is a
    statement, not an expression,

    Done that.

    and that variables cannot be changed by side-effects.

    I will not be doing that. I know you favour functional programming,
    and that's fine, but the language I am working on is unapologetically
    imperative.


    Many unapologetically imperative languages do not allow side-effects
    in expressions.  It is a natural rule for functional programming
    languages, since pure functional programming does not have
    side-effects or modifiable variables at all.  But there is absolutely
    /nothing/ about being an imperative language that suggests you need to
    allow side-effects or assignments /within/ expressions.

    Do you also believe that the Unix

      bytes = read(fd, &buf[1], reqd);

    should be prohibited since it has the side effect within the expression
    of modifying the buffer? If so, what would you replace it with??


    As I said before (a couple of times at least), function calls are
    another matter and should be considered separately.

    One possibility is to distinguish between "functions" that have no side
    effects and can therefore be freely mixed, re-arranged, duplicated,
    omitted, etc., and "procedures" that have side-effects and must be
    called exactly as requested in the code. Such "procedures" would not be allowed in expressions - only as statements or part of assignment
    statements.

    In many cases where you have modification of parameters or passing by
    non-const address, a more advanced language could use multiple returns :

    bytes, data = read(fd, max_count)

    But that might require considerable compiler effort to generate
    efficient results in other cases.

    ...


    These two statements go together as well as Dmitry's toaster and
    toilet brush.  It doesn't matter how precisely you define how the
    combination can be used and what it does, it is still not a good
    or useful thing.

    OK, let's take the combination you mentioned:

       ++E++

    I wonder why you see a problem with it. As I see it, it increments
    E before evaluation and then increments E after evaluation. What is
    so complex about that? It does exactly what it says on the tin, and
    in the order that it says it. Remember that unlike C I define the
    apparent order of evaluation so the expression is perfectly well
    formed.


    The very fact that you are discussing how to define it means it is
    not clear and obvious.

    On that I must disagree.

    You think it is /obvious/ what "++E++" means?

    If you don't know the rules then it's not obvious.

    Yes.


    If you know the rules then it's *blindingly* obvious. What's more, the
    rules are easy to learn.

    No. If you see that written, it is blindingly obvious that the
    programmer is a smart-arse that thinks it is "cool" to write something
    that looks like a flourish or ornament at the end of a book chapter,
    instead of writing "E += 2" or "E = E + 2".

    I am not interested in pandering to smart-arse programmers. There are
    too many of them already, and they don't need encouragement.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Tue Nov 15 15:26:57 2022
    On 15/11/2022 08:07, David Brown wrote:

    On 14/11/2022 19:43, Bart wrote:


    It requires some effort in Python and the result will be clunky (and probably require some add-on modules).

    import struct        # Standard module
    bs = open("potato_c.cof").read()

    machine, nsections, timestamp, symtaboffset, nsymbols,
    optheadersize, characteristics = struct.unpack_from("<HHIIIHH")

    That's it.  Three lines.  I would not think of C for this kind of thing
    - Python is /much/ better suited.  I'd only start looking at C (or C++)
    if I need so high speed that the Python code was not fast enough, even
    with PyPy.

    I said it will be clunky and require add-on modules and it is and does.
    (BTW you might be missing an argument in that struct.unpack_from call.)

    Using that approach for the nested structs and unions of my other
    example is not so straightforward. You basically have to fight for every
    field.

    The result is a tuple of unnamed fields. You really want a proper
    record, which is yet another add-on, with a choice of several modules
    depending on which set of characteristics you need.

    The elements of the tuple are also normal Python variables. If you
    wanted to modify elements (which needs a mutable tuple anyway), they
    will not behave the same way as those packed types, and then you'd have
    to write the whole struct back (using .pack), and will need to know its provenance, which here has been lost. With a reference like mine, that
    is built-in.

    In short, it's a hack. But it's a typical approach used in scripting languages.

    'struct' is also not a true Python module; it's a front end for an
    internal one called `_struct`, likely implemented in C, and almost
    certainly using pointers.

    (Whereas, by having pointers as intrinsic features, I know I could
    implement such a module in my language, if I needed to. Think of it as a language building feature.)


    While a functional language will
    struggle (to be accurate, the programmer will struggle because they've chosen the wrong language).
    I don't believe that.  I am not familiar enough with Haskell to be able
    to give the code, but I have no doubts at all that someone experienced
    with Haskell will manage it fine.  IO is not hard in the language, and
    it has all the built-in modules needed for such interfaces.

    Haskell is apparently number 25 on the list of language popularities on Github, with about 0.4% usage.  That's not huge, but not insignificant either.  But then, it was never intended to be a major practical
    language - though some people and companies (Facebook uses it for
    content analysis) do use it for practical work.  It's main motivations
    are for teaching people good software development, developing new
    techniques, algorithms and methods, and figuring out what "works" and
    could be incorporated in other languages.

    I group them all together: ML, OCaml, Haskell, F#, sometimes even Lisp.

    Anything that makes a big deal out of closures, continuations, currying, lambdas and higher order functions. I have little use for such things otherwise.

    Haskell is great for elegantly defining certain kinds of types and
    algorithms, not so good for reams of boilerplate code or UI stuff which
    is what much of programming is.

    It doesn't even have loops (AIUI); one task of the EXE reader is to
    display lists of sections, imports, exports, base relocations...

    Loops are such a basic requirement, and yet a language designer decides
    they don't need them. After all you can emulate any loop using
    recursion, no matter that it makes for less readable, less intuitive
    (and less efficient) code.


    It is that last feature that is most noticeable.  Most major modern languages are not pure functional languages in themselves, but contain aspects from functional programming.

    Yeah. It turns out that pretty much every language you've heard of (except C)
    has higher-order functions. Too much pressure from academics, I reckon.

    Such features have some very subtle behaviours which I find incredibly
    hard to get my head around.

    (See the 'twice plus-three' example in https://en.wikipedia.org/wiki/Higher-order_function. It took me ages to
    figure out what was going on there, and what was needed in an
    implementation to make it work.)

    They make understanding whatever:

    E++^

    means child's play by comparison. And yet THIS is the feature you want
    to ban! I don't get it. Remember not everyone is a mathematician.

    E++^ is well defined in my code. Instead of:

    doswitch c:=p++^
    when 'A'..'Z' then...

    which involves modifying two things, I'd have to write ... actually I
    don't know what; there is nowhere to put them. I'd have to split that
    loop and switch, and add a new indent level:

    do
    c := S[p] # change to an indexed string
    p := p + 1
    switch c
    ...

    It's a bit like turning HLL code into assembly! Plus, with that extra S,
    two extra 'p's, and that '1' there are quite a few extra things to get
    wrong, and extra lines for somebody to grok and relate to each other.

      I can't think of any serious,
    popular language with significant development in the last decade that
    does not have lambdas and the ability to work with functions as objects.

    Exactly, and that is totally wrong. Too much attention is paid to
    academics who seem to know little about designing accessible languages.

    In Python, every function is really a variable initialised [effectively]
    to some anonymous function. Which means that with 100% of the functions,
    you can do this for any defined function F:

    F = 42

    Or, more subtly, setting it to some arbitrary function. That sounds
    incredibly unsafe.

    So Python has immutable tuples, but mutable functions! Every identifier
    is a variable that you can rebind to something else.

    With my scripting language, you can do that with exactly 0% of the
    defined functions. If you want mutable function references, /then/ you
    use a variable: G := F.

    This is why I bring it up here - not because I think the OP should be
    making a functional programming language, but because I think he should
    be taking inspiration and incorporating ideas from that world.

    Sure, I've taken lots of ideas from functional languages, ones that
    still work in an imperative style. For example I use functions like
    head() and tail() from Haskell (except mine aren't lazy). I once had
    list-comps too, but they fell into disuse.

    I don't have lambda functions, but I have a thing called deferred code,
    which I haven't yet gotten around to. The problem is figuring out
    exactly how in-depth the implementation should be, because no matter
    what you do, there will be yet another example from the FP world which
    reveals one more dimension you hadn't realised existed.

    Then I start to think, I don't really want the people who might use a
    language like mine to need to bother their heads about it. Code should
    be clear and obvious; relying on the incredibly subtle and obscure
    behaviours associated with lambdas, closures et al, isn't.


    Here's a more challenging record type that comes up in OBJ files:

          type imagesymbol=struct
              union
                  stringz*8 shortname
                  struct
                      u32   short
                      u32   long
                  end
                  u64      longname
              end
              u32  value
              u16  sectionno
              u16  symtype
              byte storageclass
              byte nauxsymbols
          end

    (Again, this is defined directly in my /dynamic/ scripting language.)

    Again, peanuts in Python - and I expect also peanuts in Haskell.


    I'm sure there is a way to do it with enough effort. But as effortlessly
    as this, as readable, and with the ability to just do P.shortname to get
    an actual string? You will get there eventually, but it's basically DIY.

    You don't believe amateurs can add value to what mainstream languages
    can do. The trouble is that you don't appreciate the things that add
    value, so long as there is some unreadable, unsafe way to hack your way
    around a task.

    Nothing of that is /remotely/ worth making a new language and giving up
    on everything C - tools, compilers, developer familiarity, libraries,
    and all the rest.

    With an army of people behind it, such tools can be created for a new
    language.

    In the other group, I mentioned how C code is still predominantly
    32-bit, even on 64-bit hardware. That is a big one just by itself.

    But, what do I care? MY language /is/ fully 64-bit, I can do 1<<60
    without remembering to do 1ULL<<60, it /has/ a module system, namespaces,
    the works, and that gives me a kick when I compare it to C.

    I'm not saying that these are not good things (though I might disagree
    with you on some of the details).  I am saying that it is not worth it.

    This is why we still have C, and why it is so popular in practice - it
    is not because anyone thinks it is a "perfect" language, it is because
    the benefits of the C ecosystem outweigh the small benefits of minor variations of the language.

    This is exactly why there can still be a place for a language like C,
    but tidied up and brought up-to-date without all its baggage. People
    want a language they feel they know inside-out and can be made to do
    anything; they want to feel in charge and confident.

    Except that the people creating alternatives usually try to do too much,
    and lose many of the attributes of C that make it attractive.

    And I think most of what you like could be achieved by using a subset of
    C++ along with a few template libraries.  (To be fair, that was
    certainly not the case when you started your language.)

    Templates have problems. Whatever problem they are a solution to needs
    to be solved another way if you want a language that is much, much simpler
    and faster to build than C++.

    Ban side-effects in expressions, and you have :

        A[i] := 0
        i = i + 1

    It is not hard.

    It IS hard. Did you miss the bit where I said that expressions and
    statements are interchangeable in an expression-based language? So you
    HAVE to allow anything within those square brackets.

    I use 'unit' to refer to any expression-or-statement. So the syntax for
    an array index is A[unit], with a single unit; however, my example used a sequence of three, which is not allowed. But that just means I'd have to write it
    like this:

    A[(t:=i; i:=i+1; t)] := 0

    And of course, a large proportion of increments are in loops.  So now
    you have (mixing syntaxes from different languages to avoid prejudice) :

        for i in range(10) {
            A[i] = 0
        }

    Or :

        for a& in A {
            a = 0
        }

    Or :
        A = [0 for a in A]

    Or :
        A = [0] * 10

    Or :
        A.set(0)

    Or :

        A = [0 .. ]

    Or :

        A = [0 .. ][range(A)]

    There are endless choices here, none of which need an increment
    operator, or pointers.

    And in FP, you don't have loops, or assignments. At this rate there
    won't be anything left! Why don't we all just code in lambda calculus, as
    that can apparently represent any program?

    There are still plenty of increments outside of loops (actually I don't
    use for-loops much in my programs, mainly in smaller contexts), as well
    as inside loops when you're incrementing something that doesn't happen
    to be the loop index.

    This discussion is about whether to have a shorter way of writing:

    <expr> := <the exact same expr> + 1

    And whether that is:

    <expr> +:= 1

    or either of:

    ++<expr>
    <expr>++

    (This example uses the non-value-returning variety.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Tue Nov 15 16:22:25 2022
    On 15/11/2022 15:26, Bart wrote:
    On 15/11/2022 08:07, David Brown wrote:

    ...

    Most major modern
    languages are not pure functional languages in themselves, but contain
    aspects from functional programming.

    Yeah. It turns out that pretty much every language you've heard of
    (except C) has higher-order functions. Too much pressure from academics,
    I reckon.

    Such features have some very subtle behaviours which I find incredibly
    hard to get my head around.

    (See the 'twice plus-three' example in
    https://en.wikipedia.org/wiki/Higher-order_function. It took me ages to
    figure out what was going on there, and what was needed in an
    implementation to make it work.)

    They make understanding whatever:

       E++^

    means child's play by comparison. And yet THIS is the feature you want
    to ban! I don't get it.

    Good point! Preferences definitely vary.

    ...

    This discussion is about whether to have a shorter way of writing:

      <expr> := <the exact same expr> + 1

    And whether that is:

      <expr> +:= 1

    or either of:

      ++<expr>
      <expr>++

    (This example uses the non-value-returning variety.)

    Yes, AISI the discussion was primarily about what such operations should
    mean and how they should be ordered relative to each other. The subtext
    was whether they should be included at all. At least now I've got a good
    way to include them that choice is still open. That's far better than
    just simply banning them and avoiding the challenge.


    --
    James Harris

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Tue Nov 15 17:35:44 2022
    On 2022-11-15 17:22, James Harris wrote:

    Yes, AISI the discussion was primarily about what such operations should
    mean and how they should be ordered relative to each other. The subtext
    was whether they should be included at all. At least now I've got a good
    way to include them that choice is still open. That's far better than
    just simply banning them and avoiding the challenge.

    That depends on your priorities. E.g. the Rubik's cube. You rotate a
    row, four sides change. That's fun. But me, a prosaic programmer, just
    pluck the slates one by one and put them back in order... (:-))

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From James Harris@21:1/5 to Dmitry A. Kazakov on Tue Nov 15 16:59:36 2022
    On 15/11/2022 16:35, Dmitry A. Kazakov wrote:
    On 2022-11-15 17:22, James Harris wrote:

    Yes, AISI the discussion was primarily about what such operations
    should mean and how they should be ordered relative to each other. The
    subtext was whether they should be included at all. At least now I've
    got a good way to include them that choice is still open. That's far
    better than just simply banning them and avoiding the challenge.

    That depends on your priorities. E.g. the Rubik's cube. You rotate a
    row, four sides change. That's fun. But me, a prosaic programmer, just
    pluck the slates one by one and put them back in order... (:-))

    :-)


    --
    James Harris

  • From James Harris@21:1/5 to David Brown on Tue Nov 15 16:58:05 2022
    On 15/11/2022 14:22, David Brown wrote:
    On 15/11/2022 12:44, James Harris wrote:

    ...

    Do you also believe that the Unix

       bytes = read(fd, &buf[1], reqd);

    should be prohibited since it has the side effect within the
    expression of modifying the buffer? If so, what would you replace it
    with??


    As I said before (a couple of times at least), function calls are
    another matter and should be considered separately.

    Then you wouldn't be able to prevent a programmer coding

    a = b + nudge_up(&c) + d;

    and therefore the programmer may query why ++c is not available in the
    first place.

    BTW, any time one thinks of 'treating X separately' it's good to be
    wary. Step-outs tend to make a language hard to learn and awkward to use.


    One possibility is to distinguish between "functions" that have no side effects and can therefore be freely mixed, re-arranged, duplicated,
    omitted, etc., and "procedures" that have side-effects and must be
    called exactly as requested in the code.  Such "procedures" would not be allowed in expressions - only as statements or part of assignment
    statements.
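    Existing C compilers already gesture at this function/procedure split:
    GCC and Clang accept attributes declaring a function free of side
    effects, letting calls be merged or reordered. A hedged sketch (the
    attribute is a GCC/Clang extension, not standard C, and the names are
    invented):

```c
#include <assert.h>

/* GCC/Clang extension; expands to nothing on other compilers. */
#if defined(__GNUC__)
#define NO_SIDE_EFFECTS __attribute__((const))
#else
#define NO_SIDE_EFFECTS
#endif

/* A "function" in the proposed sense: the result depends only on its
   arguments, so calls may be merged, reordered or dropped. */
NO_SIDE_EFFECTS
static int square(int x) {
    return x * x;
}

/* A "procedure": it mutates state through a pointer, so every call
   must happen exactly as written in the code. */
static void bump(int *counter) {
    *counter = *counter + 1;
}
```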

    Classifying functions by whether they have side effects or not is not as clear-cut as it may at first appear. Please see the thread I started
    today on functional programming.


    In many cases where you have modification of parameters or passing by non-const address, a more advanced language could use multiple returns :

        bytes, data = read(fd, max_count)

    But that might require considerable compiler effort to generate
    efficient results in other cases.
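    The multiple-return idea can be sketched in C by returning a small
    struct; read_from here is a hypothetical stand-in for the real read,
    not an actual API:

```c
#include <assert.h>
#include <string.h>

/* A pair of results returned together, instead of writing through an
   out-pointer parameter. */
struct read_result {
    int bytes;
    char data[64];
};

/* Hypothetical: reads up to max_count bytes from a string "source"
   purely for illustration of the calling convention. */
static struct read_result read_from(const char *src, int max_count) {
    struct read_result r;
    int n = (int)strlen(src);
    if (n > max_count) n = max_count;
    if (n > (int)sizeof r.data) n = (int)sizeof r.data;
    memcpy(r.data, src, (size_t)n);
    r.bytes = n;
    return r;
}
```

    The call site then has no side effects on its arguments: both the count
    and the data arrive as the function's value.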

    Thanks for the suggestion. I wondered about that and I like it in
    principle but couldn't see how one would then sensibly (i.e. efficiently
    and in keeping with the rest of the language) go on to write the
    data (whose length we would not know in advance, although we would know
    its maximum size) to the correct part of the buffer.

    ...

    You think it is /obvious/ what "++E++" means?

    If you don't know the rules then it's not obvious.

    Yes.


    If you know the rules then it's *blindingly* obvious. What's more, the
    rules are easy to learn.

    No.  If you see that written, it is blindingly obvious that the
    programmer is ...

    No, it is not. In languages in which 'nudge' operators are supported
    many programmers may write

    ++E

    as a subexpression if they want E to be incremented before it is
    evaluated. They may also write

    E++

    if they want E to be incremented after it is evaluated. And if the
    algorithm they are implementing calls for E to be incremented before and
    after then programmers should be able to code both. This is not about
    them being clever. It's about them being able to have the code naturally
    express the intent of the algorithm and to reflect the processing that's
    in the programmer's mind.

    All that's required, compared with C, is for the apparent evaluation
    order to be defined.


    --
    James Harris

  • From David Brown@21:1/5 to James Harris on Tue Nov 15 18:31:55 2022
    On 15/11/2022 17:58, James Harris wrote:
    On 15/11/2022 14:22, David Brown wrote:
    On 15/11/2022 12:44, James Harris wrote:

    ...

    Do you also believe that the Unix

       bytes = read(fd, &buf[1], reqd);

    should be prohibited since it has the side effect within the
    expression of modifying the buffer? If so, what would you replace it
    with??


    As I said before (a couple of times at least), function calls are
    another matter and should be considered separately.

    Then you wouldn't be able to prevent a programmer coding

      a = b + nudge_up(&c) + d;

    Why wouldn't I (as a language designer) be able to prevent that?

    We don't live in the days of assemblers where the only thing you know
    about a function is its name and a hope that you've got the parameters
    right. The declaration of a function (from module import, or however
    you want to handle it) can include all sorts of information about the
    function. Is it pure? Is it small, suitable for inlining? Can it
    throw a exceptions? What are its pre-conditions and post-conditions?
    (Some languages support "programming by contract" to make it hugely
    easier to check correctness.)


    and therefore the programmer may query why ++c is not available in the
    first place.


    You assume /so/ many limitations on what you can do as a language
    designer. You can do /anything/. If you want to allow something, allow
    it. If you want to prohibit it, prohibit it. Don't run away claiming
    you had no choice or hide behind assertions about what other programmers
    or other languages do.

    If want side-effects in operators, functions, etc., then that's fine -
    but it is /your/ choice to support that feature, and /your/ choice to
    drop the benefits achievable by /not/ allowing them. You are not forced
    to allow them, any more than you are forced to follow any of my
    suggestions. (And often my suggestions are not given as
    recommendations, but merely to air the options you have.)

    BTW, any time one thinks of 'treating X separately' it's good to be
    wary. Step-outs tend to make a language hard to learn and awkward to use.


    So do over-generalisations.


    One possibility is to distinguish between "functions" that have no
    side effects and can therefore be freely mixed, re-arranged,
    duplicated, omitted, etc., and "procedures" that have side-effects and
    must be called exactly as requested in the code.  Such "procedures"
    would not be allowed in expressions - only as statements or part of
    assignment statements.

    Classifying functions by whether they have side effects or not is not as clear-cut as it may at first appear. Please see the thread I started
    today on functional programming.


    You'll notice I've replied to it :-)

    The classification is not /entirely/ clear-cut, but not for the reasons
    you think or described. It is possible to allow side-effects that have
    no visible or logical effect, but exist only as hidden effects for
    implementation efficiency purposes. Such a feature has to be controlled
    carefully to ensure there are no logical effects.


    In many cases where you have modification of parameters or passing by
    non-const address, a more advanced language could use multiple returns :

         bytes, data = read(fd, max_count)

    But that might require considerable compiler effort to generate
    efficient results in other cases.

    Thanks for the suggestion. I wondered about that and I like it in
    principle but couldn't see how one would then sensibly (i.e. efficiently
    and in keeping with the rest of the language) go on to write the
    data (whose length we would not know in advance, although we would know
    its maximum size) to the correct part of the buffer.


    I never said this would be easy!

    ...

    You think it is /obvious/ what "++E++" means?

    If you don't know the rules then it's not obvious.

    Yes.


    If you know the rules then it's *blindingly* obvious. What's more,
    the rules are easy to learn.

    No.  If you see that written, it is blindingly obvious that the
    programmer is ...

    No, it is not. In languages in which 'nudge' operators are supported
    many programmers may write

      ++E

    as a subexpression if they want E to be incremented before it is
    evaluated. They may also write

      E++

    if they want E to be incremented after it is evaluated. And if the
    algorithm they are implementing calls for E to be incremented before and
    after then programmers should be able to code both. This is not about
    them being clever. It's about them being able to have the code naturally
    express the intent of the algorithm and to reflect the processing that's
    in the programmer's mind.

    All that's required, compared with C, is for the apparent evaluation
    order to be defined.



    I can appreciate that you want to give a meaning to "++E", that you want
    to give a meaning to "E++", and you expect programmers to use one or the
    other in different contexts. I can appreciate that you want to define
    order of evaluation within expressions.

    But I have yet to see any indication that "++E++" could ever be a
    sensible expression in any real code.

  • From James Harris@21:1/5 to David Brown on Tue Nov 15 17:32:13 2022
    On 14/11/2022 15:23, David Brown wrote:
    On 14/11/2022 11:47, Bart wrote:

    ...

    In-place, value-returning increment ops written as ++ and -- are
    common in languages.


    Yes.  And bugs are common in programs.  Being common does not
    necessarily mean it's a good idea.

    (It doesn't necessarily mean it's a bad idea either - I am not implying
    that increment and decrement are themselves a major cause of bugs!  But mixing side-effects inside expressions /is/ a cause of bugs.)

    The side effects of even something awkward such as

    *(++p) = *(q++);

    are little different from those of the longer version

    p = p + 1;
    *p = *q;
    q = q + 1;

    The former is clearer, however. That makes it easier to see the intent.
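    The claimed equivalence can be checked directly; a small C sketch (the
    wrapper functions and out-parameters are mine, added only so the
    resulting pointers can be inspected):

```c
#include <assert.h>

/* Copy one element using the compact form:  *(++p) = *(q++);  */
static void copy_compact(int *p, int *q, int **p_out, int **q_out) {
    *(++p) = *(q++);
    *p_out = p;
    *q_out = q;
}

/* The longer, statement-by-statement version with the same effect. */
static void copy_long(int *p, int *q, int **p_out, int **q_out) {
    p = p + 1;
    *p = *q;
    q = q + 1;
    *p_out = p;
    *q_out = q;
}
```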

    Just blaming operators you don't like is unsound - especially since, as
    you seem to suggest below, you use them in your own code!!!

    ...

    [discussion of ++ and -- operators]

    Is your point that you shouldn't have either of those operators?

    Yes!  What gave it away - the first three or four times I said as much?

    ...

    ... (Of course I use increment operator, especially in loops,
    because that's how C is written.  But a new language can do better than that.)

    If you think ++ and -- shouldn't exist then why not ban them from your
    own programming for a while before you try to get them banned from a new language?


    --
    James Harris

  • From David Brown@21:1/5 to Bart on Tue Nov 15 19:05:35 2022
    On 15/11/2022 16:26, Bart wrote:
    On 15/11/2022 08:07, David Brown wrote:

    On 14/11/2022 19:43, Bart wrote:


    It requires some effort in Python and the result will be clunky (and
    probably require some add-on modules).

    import struct        # Standard module
    bs = open("potato_c.cof", "rb").read()

    (machine, nsections, timestamp, symtaboffset, nsymbols,
     optheadersize, characteristics) = struct.unpack_from("<HHIIIHH", bs)

    That's it.  Three lines.  I would not think of C for this kind of
    thing - Python is /much/ better suited.  I'd only start looking at C
    (or C++) if I need so high speed that the Python code was not fast
    enough, even with PyPy.

    I said it will be clunky and require add-on modules and it is and does.

    It is not "clunky" by any sane view - certainly not compared to your
    code (or code written in C). And no, it does not require add-on modules
    - the "struct" module is part of Python.

    (BTW you might be missing an argument in that struct.unpack_from call.)

    No, I am not. There is an optional third argument, but it is optional.


    Using that approach for the nested structs and unions of my other
    example is not so straightforward. You basically have to fight for every field.

    You have to define every field in every language, or define the ones you
    want along with offsets to skip uninteresting data.
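    For comparison, a C version does indeed have to spell out every field.
    A sketch assuming the standard 20-byte COFF file header layout and a
    little-endian host (matching the "<" in the Python format string); the
    function name is mine:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The seven fields of a COFF file header, matching "<HHIIIHH". */
struct coff_header {
    uint16_t machine;
    uint16_t nsections;
    uint32_t timestamp;
    uint32_t symtaboffset;
    uint32_t nsymbols;
    uint16_t optheadersize;
    uint16_t characteristics;
};

/* Unpack from a raw byte buffer field by field, so no assumption is
   made about struct padding. Byte order is taken as the host's, so
   this sketch assumes a little-endian machine. */
static struct coff_header read_coff_header(const unsigned char *buf) {
    struct coff_header h;
    memcpy(&h.machine,         buf +  0, 2);
    memcpy(&h.nsections,       buf +  2, 2);
    memcpy(&h.timestamp,       buf +  4, 4);
    memcpy(&h.symtaboffset,    buf +  8, 4);
    memcpy(&h.nsymbols,        buf + 12, 4);
    memcpy(&h.optheadersize,   buf + 16, 2);
    memcpy(&h.characteristics, buf + 18, 2);
    return h;
}
```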


    The result is a tuple of unnamed fields. You really want a proper
    record, which is yet another add-on, with a choice of several modules
    depending on which set of characteristics you need.

    You can do that in Python.


    The elements of the tuple are also normal Python variables. If you
    wanted to modify elements (which needs a mutable tuple anyway), they
    will not behave the same way as those packed types, and then you'd have
    to write the whole struct back (using .pack), and will need to know its provenance, which here has been lost. With a reference like mine, that
    is built-in.

    In short, it's a hack. But it's a typical approach used in scripting
    languages.

    In short, you are making up shit in an attempt to make your own language
    look better than other languages, because you'd rather say something
    silly than admit that any other language could be better in any way for
    any task.


    'struct' is also not a true Python module; it's a front end for an
    internal one called `_struct`, likely implemented in C, and almost
    certainly using pointers.


    Please re-think what you wrote there. I hope you can realise how
    ridiculous you are being.

    (Whereas, by having pointers as intrinsic features, I know I could
    implement such a module in my language, if I needed to. Think of it as a language building feature.)


    While a functional language will
    struggle (to be accurate, the programmer will struggle because they've
    chosen the wrong language).
    I don't believe that.  I am not familiar enough with Haskell to be
    able to give the code, but I have no doubts at all that someone
    experienced with Haskell will manage it fine.  IO is not hard in the
    language, and it has all the built-in modules needed for such interfaces.

    Haskell is apparently number 25 on the list of language popularities
    on Github, with about 0.4% usage.  That's not huge, but not
    insignificant either.  But then, it was never intended to be a major
    practical language - though some people and companies (Facebook uses
    it for content analysis) do use it for practical work.  Its main
    motivations are teaching people good software development,
    developing new techniques, algorithms and methods, and figuring out
    what "works" and could be incorporated in other languages.

    I group them all together: ML, OCaml, Haskell, F#, sometimes even Lisp.

    Anything that makes a big deal out of closures, continuations, currying, lambdas and higher order functions. I have little use for such things otherwise.


    So because /you/ don't understand these things or how they are used, you
    assume that people who /do/ understand them can't write programs in
    functional programming languages?

    Haskell is great for elegantly defining certain kinds of types and algorithms, not so good for reams of boilerplate code or UI stuff which
    is what much of programming is.

    As I mentioned in another post, there are opinions, and there are
    /qualified/ opinions.

    It's /fine/ if functional programming doesn't interest you. No one can
    be interested in everything or learn about all kinds of programming.
    But wise people don't make categorical statements about topics they know nothing about.


    It doesn't even have loops (AIUI); one task of the EXE reader is to
    display lists of sections, imports, exports, base relocations...

    Loops are such a basic requirement, and yet a language designer decides
    they don't need them. After all you can emulate any loop using
    recursion, no matter that it makes for less readable, less intuitive
    (and less efficient) code.


    It is that last feature that is most noticeable.  Most major modern
    languages are not pure functional languages in themselves, but contain
    aspects from functional programming.

    Yeah. It turns out that pretty much every language you've heard of
    (except C) has higher-order functions. Too much pressure from academics,
    I reckon.

    Such features have some very subtle behaviours which I find incredibly
    hard to get my head around.


    You are a smart guy. You could get your head around it quite easily, if
    only you were willing. (I can try and help, if you want - but only if
    you promise not to give up before you start, claim that it is all
    useless, ugly, or clunky without justification, and not to go off on a
    tangent about how much better your own language is.)

    (See the 'twice plus-three' example in
    https://en.wikipedia.org/wiki/Higher-order_function. It took me ages to
    figure out what was going on there, and what was needed in an
    implementation to make it work.)

    Try pasting the C++ code into godbolt.org, and compile with -O2 :

    auto twice = [](auto f)
    {
        return [f](int x) {
            return f(f(x));
        };
    };

    auto plus_three = [](int i)
    {
        return i + 3;
    };

    int foo()
    {
        auto g = twice(plus_three);

        return g(7);
    }

    The compiler happily turns "foo" into "return 13".
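    For what it's worth, this particular example needs no captured state, so
    even C can express it with a plain function pointer; a sketch (the names
    mirror the C++ version):

```c
#include <assert.h>

static int plus_three(int i) {
    return i + 3;
}

/* The higher-order part, minus the closure: apply f twice to x. */
static int twice(int (*f)(int), int x) {
    return f(f(x));
}
```

    What C cannot do without extra machinery is return a new function that
    remembers f, which is exactly what the lambda-returning-lambda in the
    C++ version does.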

    But I agree that gcc has to work harder to implement this kind of thing
    than your own compiler for your own language.

      I can't think of any serious, popular language with significant
    development in the last decade that does not have lambdas and the
    ability to work with functions as objects.

    Exactly, and that is totally wrong. Too much attention is paid to
    academics who seem to know little about designing accessible languages.


    It's not academics that use these features - it is practical
    programmers. A big use of lambdas, for example, is in callbacks and
    event handlers - used all the time in GUI programs and Javascript.

    Academics may invent this kind of thing, and use languages like Haskell
    to play with them - but they are implemented in real languages because
    real programmers use them for real code. Rust, Go, C++ - these are not academics' languages.


    In Python, every function is really a variable initialised [effectively]
    to some anonymous function. Which means that with 100% of the functions,
    you can do this for any defined function F:

        F = 42

    Or, more subtly, setting it to any arbitrary function. That sounds
    incredibly unsafe.

    Python /is/ unsafe - it's a very dynamic language, with little
    compile-time checking. It is checked at run-time.

    But no, Python does not have variables at all. It has /names/ that are
    references to objects. A function is an object, usually (but not
    necessarily) given a name with a "def" statement. That name can be
    rebound to a different object, just like any other name.


    So Python has immutable tuples, but mutable functions! Every identifier
    is a variable that you can rebind to something else.

    Functions are not mutable in Python. You misunderstand.


    With my scripting language, you can do that with exactly 0% of the
    defined functions. If you want mutable function references, /then/ you
    use a variable: G := F.


    Function pointers are not function objects.
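    Bart's G := F distinction maps onto C's function pointers: the function
    itself is never mutable, only a pointer variable can be rebound. A
    minimal sketch (the names are invented for illustration):

```c
#include <assert.h>

static int add_one(int x) { return x + 1; }
static int dbl(int x)     { return x * 2; }

/* The names add_one and dbl can never be rebound, but the pointer g -
   an explicit, mutable reference - can. */
static int call_through(int use_double, int x) {
    int (*g)(int) = add_one;
    if (use_double)
        g = dbl;        /* rebinding the pointer, not the function */
    return g(x);
}
```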

    This is why I bring it up here - not because I think the OP should be
    making a functional programming language, but because I think he
    should be taking inspiration and incorporating ideas from that world.

    Sure, I've taken lots of ideas from functional languages, ones that
    still work in an imperative style. For example I use functions like
    head() and tail() from Haskell (except mine aren't lazy). I once had list-comps too, but they fell into disuse.

    OK.


    I don't have lambda functions, but I have a thing called deferred code,
    which I haven't yet gotten around to. The problem is figuring out
    exactly how in-depth the implementation should be, because no matter
    what you do, there will be yet another example from the FP world which reveals one more dimension you hadn't realised existed.

    There are always more possibilities.


    Then I start to think, I don't really want the people who might use a language like mine to need to bother their heads about it. Code should
    be clear and obvious; relying on the incredibly subtle and obscure
    behaviours associated with lambdas, closures et al, isn't.

    A major part of clarity is in the mind of the observer.


    You don't believe amateurs can add value to what mainstream languages
    can do. The trouble is that you don't appreciate the things that add
    value, so long as there is some unreadable, unsafe way to hack your way around a task.

    Amateurs can make good things. But the idea that a single amateur can revolutionise a particular field is almost, but not quite, a complete
    myth. It doesn't stop amateurs, and it doesn't stop them /occasionally/
    coming up with something great - whether it is a programming language, something in science, maths, or anything else. However, almost
    invariably, the successful ones are the ones that build a community
    around a good idea and work together.


    Nothing of that is /remotely/ worth making a new language and giving
    up on everything C - tools, compilers, developer familiarity,
    libraries, and all the rest.

    With an army of people behind it, such tools can be created for a new language.

    Yes. The main task is not pondering some insignificant variation on an existing language (even if that variation is, in some way, "better").
    The main task is coming up with something that inspires that army of
    people to get behind it.


    In the other group, I mentioned how C code is still predominantly
    32-bit, even on 64-bit hardware. That is a big one just by itself.

    But, what do I care? MY language /is/ fully 64-bit, I can do 1<<60
    without remembering to do 1ULL<<60, it /has/ a module system, namespaces,
    the works, and that gives me a kick when I compare it to C.

    I'm not saying that these are not good things (though I might disagree
    with you on some of the details).  I am saying that it is not worth it.

    This is why we still have C, and why it is so popular in practice - it
    is not because anyone thinks it is a "perfect" language, it is because
    the benefits of the C ecosystem outweigh the small benefits of minor
    variations of the language.

    This is exactly why there can still be a place for a language like C,
    but tidied up and brought up-to-date without all its baggage. People
    want a language they feel they know inside-out, and can be made to do
    anything; they want to feel in charge and confident.


    Where are all these people that want something almost, but not quite,
    exactly like C?

    Except that the people creating alternatives usually try to do too much,
    and lose many of the attributes of C that make it attractive.

    And I think most of what you like could be achieved by using a subset
    of C++ along with a few template libraries.  (To be fair, that was
    certainly not the case when you started your language.)

    Templates have problems. Whatever problem they are a solution to needs
    to be done another way if you want a language that is much, much simpler
    and faster to build than C++.

    But what if you don't care that C++ needs a complex compiler, because
    you are not a compiler writer? That applies to 99.999% of programmers.


    Ban side-effects in expressions, and you have :

         A[i] := 0
         i = i + 1

    It is not hard.

    It IS hard. Did you miss the bit where I said that expressions and
    statements are interchangeable in an expression-based language? So you
    HAVE to allow anything within those square brackets.

    You and James are forever seeing problems - you think you are /forced/
    into decisions. Take responsibility - you don't /have/ to allow
    anything you don't want to allow.


    I use 'unit' to refer to any expression-or-statement. So the syntax for
    an array index is A[unit], with a single unit; my example, however, used
    a sequence of three, which is not allowed. But that just means I'd have
    to write it like this:

        A[(t:=i; i:=i+1; t)] := 0

    And of course, a large proportion of increments are in loops.  So now
    you have (mixing syntaxes from different languages to avoid prejudice) :

         for i in range(10) {
             A[i] = 0
         }

    Or :

         for a& in A {
             a = 0
         }

    Or :
         A = [0 for a in A]

    Or :
         A = [0] * 10

    Or :
         A.set(0)

    Or :

         A = [0 .. ]

    Or :

         A = [0 .. ][range(A)]

    There are endless choices here, none of which need an increment
    operator, or pointers.

    And in FP, you don't have loops, or assignments.

    You have recursion, so you don't need loops. And you have assignment -
    you just don't have /re-assignment/.

    At this rate there
    won't be anything left! Why don't we all just code in lambda calculus, as
    that can apparently represent any program?


    Why don't you just make a Turing machine? It is an imperative language,
    and it's really quite simple.

    There are still plenty of increments outside of loops (actually I don't
    use for-loops much in my programs, mainly in smaller contexts), as well
    as inside loops when you're incrementing something that doesn't happen
    to be the loop index.

    This discussion is about whether to have a shorter way of writing:

      <expr> := <the exact same expr> + 1

    And whether that is:

      <expr> +:= 1

    or either of:

      ++<expr>
      <expr>++

    (This example uses the non-value-returning variety.)

  • From James Harris@21:1/5 to David Brown on Tue Nov 15 19:09:04 2022
    On 15/11/2022 17:31, David Brown wrote:
    On 15/11/2022 17:58, James Harris wrote:
    On 15/11/2022 14:22, David Brown wrote:
    On 15/11/2022 12:44, James Harris wrote:

    ...

    Do you also believe that the Unix

       bytes = read(fd, &buf[1], reqd);

    should be prohibited since it has the side effect within the
    expression of modifying the buffer? If so, what would you replace it
    with??


    As I said before (a couple of times at least), function calls are
    another matter and should be considered separately.

    Then you wouldn't be able to prevent a programmer coding

       a = b + nudge_up(&c) + d;

    Why wouldn't I (as a language designer) be able to prevent that?

    The question is not whether prevention would be possible but whether you
    (i.e. DB) would consider it /advisable/. If you prevented it then a lot
    of familiar programming patterns and a number of existing APIs would
    become unavailable to you so choose wisely...! :-)

    ...

    You assume /so/ many limitations on what you can do as a language
    designer.  You can do /anything/.  If you want to allow something, allow it.  If you want to prohibit it, prohibit it.

    Sorry, but it doesn't work like that. A language cannot be built on
    ad-hoc choices such as you have suggested. In this very thread you've
    suggested I prohibit certain operator combinations, and that I ban side
    effects in expressions but maybe not necessarily those from parameters
    in function calls. It's not that simple. If a language designer were to
    'pick and mix' like that the resultant language would be a nightmare to
    learn and use. There has to be a language 'ethos' - i.e. an overall
    approach it takes - and it has to follow consistent principles if it is
    going to be a good design rather than a bad one.

    ...

    BTW, any time one thinks of 'treating X separately' it's good to be
    wary. Step-outs tend to make a language hard to learn and awkward to use.


    So do over-generalisations.

    Not really. It's ad-hoc rules which become burdensome. By contrast,
    saying any operator can be 'adjacent' to any other as long as the types
    are honoured makes learning a language more logical. It may give the
    programmer freedoms you personally don't like but they make the language
    easier to learn and use.

    Seriously, try designing a language, yourself. You don't have to
    implement it. Just try coming up with a cohesive design of something you
    would like to program in.



    One possibility is to distinguish between "functions" that have no
    side effects and can therefore be freely mixed, re-arranged,
    duplicated, omitted, etc., and "procedures" that have side-effects
    and must be called exactly as requested in the code.  Such
    "procedures" would not be allowed in expressions - only as statements
    or part of assignment statements.

    Classifying functions by whether they have side effects or not is not
    as clear-cut as it may at first appear. Please see the thread I
    started today on functional programming.


    You'll notice I've replied to it :-)

    :-)

    ...

    All that's required, compared with C, is for the apparent evaluation
    order to be defined.



    I can appreciate that you want to give a meaning to "++E",

    No, I don't! You have this all wrong. The reason for considering the
    inclusion of the operators we have been discussing in this thread is to
    allow a more natural style of expression for algorithms that it suits.
    You seem to keep thinking the goal is to attribute meaning to symbols.
    That's not so.


    that you want
    to give a meaning to "E++", and you expect programmers to use one or the other in different contexts.  I can appreciate that you want to define
    order of evaluation within expressions.

    I don't /want/ to define the order of evaluation; I /do/ define the
    (apparent) order of evaluation. That's part of my language's ethos. If
    I, in addition, permit ++ etc and dereference then their apparent order
    /has/ to be defined, and it now has been.
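
    As an illustration of what a defined (apparent) evaluation order means in
    practice, Python is one existing language that fixes it; this sketch is
    only an analogy, not James's language:

```python
# Python specifies left-to-right evaluation of operands, so an
# expression containing side effects has exactly one defined meaning.
log = []

def tap(v):
    log.append(v)      # side effect records the evaluation order
    return v

result = tap(1) + tap(2) * tap(3)
print(result, log)     # 7 [1, 2, 3]
```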


    But I have yet to see any indication that "++E++" could ever be a
    sensible expression in any real code.

    Bart came up with an example something like

    +(+(+(+ x)))

    That's not at all sensible. You want that banned, too?


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Tue Nov 15 19:22:39 2022
    On 15/11/2022 18:05, David Brown wrote:
    On 15/11/2022 16:26, Bart wrote:

    I group them all together: ML, OCaml, Haskell, F#, sometimes even Lisp.

    Anything that makes a big deal out of closures, continuations,
    currying, lambdas and higher order functions. I have little use for
    such things otherwise.


    So because /you/ don't understand these things or how they are used, you assume that people who /do/ understand them can't write programs in functional programming languages?

    No. It's nothing I've ever used, and unlikely ever to use. I like my
    functions plain, and static, just like you prefer your expressions to
    use only simple operators with no side-effects.

    I still can't comprehend why YOU think this stuff is simple and obvious,
    yet you are stumped by an increment of a pointer followed by a dereference.

    On a list of must-haves for a programming language, not only would they
    not be at the top of my list, they wouldn't even be at the bottom!


    Haskell is great for elegantly defining certain kinds of types and
    algorithms, not so good for reams of boilerplate code or UI stuff
    which is what much of programming is.

    As I mentioned in another post, there are opinions, and there are /qualified/ opinions.

    My opinion comes from 20 years of writing code to /get things done/ in a working environment. Which includes developing the languages and
    choosing the features that best made that possible. Never once did
    I think that 'currying' was going to dramatically transform how I coded;
    never did I spend days working around the omissions of closures.

    Such features have some very subtle behaviours which I find incredibly
    hard to get my head around.


    You are a smart guy.  You could get your head around it quite easily, if only you were willing.

    No, it is hard, obscure, subtle. Take my word for it.

    Try pasting the C++ code into godbolt.org, and compile with -O2 :

    The compiler happily turns "foo" into "return 13".

    So what does that mean? C++ clearly has support for this, and some
    optimisations which can collapse the function calls into a constant
    value. That tells us nothing about how hard the task is, or how hard it
    is to understand exactly why the task is more difficult than it at
    first looks.

    I discussed on Reddit how such a thing would look in my dynamic language
    if I decided to implement it, which is like this:

    fun twice(f) = {x:f(f(x))} # {...} is my deferred code syntax
    fun plusthree(x) = x+3 # 'fun' is for one-liners

    g := twice(plusthree)
    println g(7)

    I instead tried with a mock-up, which had two components: the
    transformations my bytecode compiler would do, and the support code that
    would need to be supplied by the language. The mock-up within the
    working language looked like this:

    # Transformed user code
    fun af$1(x, f, g) = f(f(x))
    fun twice(f) = makecls(af$1, f)
    fun plusthree(x) = x+3

    g := twice(plusthree)
    println callcls(g, 7)

    # Emulating interpreter support
    record cls = (var fn, a, b)

    func makecls(f, ?a, ?b) =
        cls(f, a, b)
    end

    func callcls(c, x) =
        fn := c.fn
        fn(x, c.a, c.b)
    end

    This produced the correct result. Enough worked also so that `twice`
    could be called again with a different argument, while the original `g`
    still worked. (A cruder implementation could hardcode things so that,
    while it produced '13', it would only work with a one-time argument to `twice`.)
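
    For comparison, the same record-based closure emulation can be
    transcribed into Python; the names mirror the mock-up above (af$1
    becomes af1), and this is only an illustration of the idea, not the
    original implementation:

```python
# A "closure" is emulated as a record pairing a function with its
# captured values, exactly as in the mock-up above.
class Cls:
    def __init__(self, fn, a=None, b=None):
        self.fn, self.a, self.b = fn, a, b   # function plus captured values

def makecls(f, a=None, b=None):
    return Cls(f, a, b)

def callcls(c, x):
    return c.fn(x, c.a, c.b)

def af1(x, f, g):          # transformed body of twice: f(f(x))
    return f(f(x))

def twice(f):
    return makecls(af1, f)

def plusthree(x):
    return x + 3

g = twice(plusthree)
print(callcls(g, 7))       # 13
```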

    This is where it turned out that there were further refinements needed
    to make it work with more challenging examples.

    In the end I didn't do the necessary changes as, while intriguing to
    work on, mere box-ticking was not a worthwhile use of my time, nor a
    worthwhile complication in my product, since I was never going to use it.


    It's not academics that use these features - it is practical
    programmers.  A big use of lambdas, for example, is in callbacks and
    event handlers - used all the time in GUI programs and Javascript.

    Yes, that was my deferred code feature, itself deferred. (It means I
    have to instead define an explicit, named function.)


    Academics may invent this kind of thing, and use languages like Haskell
    to play with them - but they are implemented in real languages because
    real programmers use them for real code.  Rust, Go, C++ - these are not academics' languages.

    I find idiomatic Rust incomprehensible.


    In Python, every function is really a variable initialised
    [effectively] to some anonymous function. Which means that with 100%
    of the functions, you can do this for any defined function F:

         F = 42

    Or, more subtly, setting it to any arbitrary functions. That sounds
    incredibly unsafe.

    Python /is/ unsafe - it's a very dynamic language, with little
    compile-time checking.  It is checked at run-time.

    And my dynamic language is a lot less dynamic, so is safer, but of
    course Python is superior.

    But no, Python does not have variables at all.  It has /names/, which are references to objects.  A function is an object, usually (but not necessarily) given a name with a "def" statement.  That name can be
    rebound to a different object, just like any other name.


    So Python has immutable tuples, but mutable functions! Every
    identifier is a variable that you can rebind to something else.

    Functions are not mutable in Python.  You misunderstand.

    I've used 'mutable' to mean two things: in-place modification of an
    object, and being able to re-bind a name to something else. These are
    conflated everywhere, but I reckoned people would get the point.

    In Python, a function like this:

    def F():
        pass

    is more or less equivalent to:

    def __0001():
        pass
    F = __0001

    Effectively any function is just a variable to which has been assigned
    some anonymous function (although in practice, the function retains its
    'F' identity even if the user's 'F' variable has been assigned a
    different value).

    The end result is the same: you can never be sure that 'F' still refers
    to that static function.
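
    The rebinding being described can be demonstrated directly:

```python
# The name F initially refers to a function object; rebinding the name
# does not alter the object, but any code using the name sees the change.
def F():
    return "original"

G = F           # keep an independent reference to the function object
F = 42          # rebind the name; the function itself is untouched
print(G())      # original
print(F)        # 42
```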

    99.99% of the time you never want such functions to change, so why make
    it possible? I can understand that in Python, a bytecode compiler might
    not know in advance what F is, but that can be mitigated.

    When I once experimented with such a language, any such tentative
    functions were initialised at runtime, but once initialised, could not
    be changed. So whether an identifier was the name of a function, module,
    class or variable, was set at runtime, then fixed.

    If you want it to be dynamic, then use a 'variable' (the clue is in the
    name).


    With my scripting language, you can do that with exactly 0% of the
    defined functions. If you want mutable function references, /then/ you
    use a variable: G := F.


    Function pointers are not function objects.

    Is there any practical difference?

    You don't believe amateurs can add value to what mainstream languages
    can do. The trouble is that you don't appreciate the things that add
    value, so long as there is some unreadable, unsafe way to hack your
    way around a task.

    Amateurs can make good things.  But the idea that a single amateur can revolutionise a particular field is almost, but not quite, a complete
    myth.

    I don't want to revolutionise everything. I just hope someone else
    would, but the current state of PL design to me looks dire.

    However I can take my ideas and use them myself, and sod everyone else;
    it's their loss.

    You seem convinced that that incredibly hackish and unprofessional way
    of accessing the contents of that executable file is just as good as
    doing it properly. Well carry on thinking that if you want.

    (I don't know what it is about scripting languages, and the way they
    eschew a feature as straightforward as a record with fields defined at compile-time. Either it doesn't exist, or they try and emulate such a
    thing badly and inefficiently.)
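
    For what it's worth, Python's nearest approximation to a fixed-field
    record is a class with __slots__, though the check happens at run time
    rather than compile time; a minimal sketch:

```python
# __slots__ pins the field set down up front, so unknown attributes
# are rejected (at run time; Python has no compile-time field checking).
class Point:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(3, 4)
print(p.x, p.y)        # 3 4
try:
    p.z = 5            # not a declared field
except AttributeError:
    print("no field z")
```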

    This is exactly why there can still be a place for a language like C,
    but tidied up and brought up-to-date without all its baggage. People
    want a language they feel they know inside-out, and can be made to do
    anything; they want to feel in charge and confident.


    Where are all these people that want something almost, but not quite,
    exactly like C?

    There are loads that want to extend C, or import some favourite feature
    of C++ into C, or some that want to write Python-like code in C. This is
    apart from the ones implementing yet another new take on a
    functional language.

    What I'm talking about however is the popularity of C; why would they
    use C, rather than the next one up which is C++?

    To me the answer is clear, I guess to you it's less so.

    But what if you don't care that C++ needs a complex compiler, because
    you are not a compiler writer?  That applies to 99.999% of programmers.

    I think people care when their project requires a long edit-run cycle.

    It IS hard. Did you miss the bit where I said that expressions and
    statements are interchangeable in an expression-based language? So you
    HAVE to allow anything within those square brackets.

    You and James are forever seeing problems - you think you are /forced/
    into decisions.  Take responsibility - you don't /have/ to allow
    anything you don't want to allow.

    It's you who don't like it, not me! As I've tried to explain, in an expression-based language, you can have statements inside expressions
    inside statements. Everything can have a side-effect.

    Even gnu-C has that feature.

    At this rate there won't be anything left! Why won't we all just code
    in lambda calculus as that can apparently represent any program.


    Why don't you just make a Turing machine?  It is an imperative language,
    and it's really quite simple.

    You've missed my point. It's not me reducing everything down to a handful
    of features. Lambda-calculus is where you can easily end up.

  • From David Brown@21:1/5 to James Harris on Tue Nov 15 22:40:34 2022
    On 15/11/2022 20:09, James Harris wrote:
    On 15/11/2022 17:31, David Brown wrote:
    On 15/11/2022 17:58, James Harris wrote:
    On 15/11/2022 14:22, David Brown wrote:
    On 15/11/2022 12:44, James Harris wrote:

    ...

    Do you also believe that the Unix

       bytes = read(fd, &buf[1], reqd);

    should be prohibited since it has the side effect within the
    expression of modifying the buffer? If so, what would you replace
    it with??


    As I said before (a couple of times at least), function calls are
    another matter and should be considered separately.

    Then you wouldn't be able to prevent a programmer coding

       a = b + nudge_up(&c) + d;

    Why wouldn't I (as a language designer) be able to prevent that?

    The question is not whether prevention would be possible but whether you (i.e. DB) would consider it /advisable/. If you prevented it then a lot
    of familiar programming patterns and a number of existing APIs would
    become unavailable to you so choose wisely...! :-)


    I am not the language designer here - and I still don't really grok what
    kind of language /you/ want, what you understand from before, what uses
    it should have, or what you think is wrong with existing languages. (Or
    maybe this is all for fun and interest, which is always the best reason
    for doing anything.) That makes it hard to give recommendations.

    Some familiar programming patterns may not be usable, but that's always
    the case. And for some common patterns, I say good riddance! I can't
    see why there would be issues with API's, however - though you always
    need some kind of FFI wrapper.

    ...

    You assume /so/ many limitations on what you can do as a language
    designer.  You can do /anything/.  If you want to allow something,
    allow it.  If you want to prohibit it, prohibit it.

    Sorry, but it doesn't work like that.

    Yes, it does.

    A language cannot be built on
    ad-hoc choices such as you have suggested.

    I haven't suggested ad-hoc choices. I have tried to make reasoned
    suggestions. Being different from languages you have used before, or
    how you envision your new language, does not make them ad-hoc.

    In this very thread you've
    suggested I prohibit certain operator combinations, and that I ban side effects in expressions but maybe not necessarily those from parameters
    in function calls. It's not that simple. If a language designer were to
    'pick and mix' like that the resultant language would be a nightmare to
    learn and use. There has to be a language 'ethos' - i.e. an overall
    approach it takes - and it has to follow consistent principles if it is
    going to be a good design rather than a bad one.


    I agree with your philosophy here. I disagree that my suggestions don't
    fit with that - it's just that the "language ethos", as you call it
    (it's as good a term as any other) is different from what you imagined.

    ...

    BTW, any time one thinks of 'treating X separately' it's good to be
    wary. Step-outs tend to make a language hard to learn and awkward to
    use.


    So do over-generalisations.

    Not really.

    Yes, really.

    Imagine if you were to stop treating "letters", "digits" and
    "punctuation" separately, and say "They are all just characters. Let's
    treat them the same". Now people can name a function "123", or "2+2".
    It's conceivable that you'd work out a grammar and parsing rules that
    allow that (Forth, for example, has no problem with functions that are
    named by digits. You can redefine "2" to mean "1" if you like). Do you
    think that would make the language easier to learn and less awkward to use?

    It's ad-hoc rules which become burdensome.

    Agreed.

    By contrast,
    saying any operator can be 'adjacent' to any other as long as the types
    are honoured makes learning a language more logical. It may give the programmer freedoms you personally don't like but they make the language easier to learn and use.

    I don't see a need for operators like ++ and --, either as prefix or
    postfix. I don't see a need for assignment, either simple or complex
    (like "+=") as returning a value - neither an lvalue, or an rvalue.

    There's nothing arbitrary or ad-hoc about not having these in the
    language. Lots of languages have nothing like that.


    Seriously, try designing a language, yourself. You don't have to
    implement it. Just try coming up with a cohesive design of something you would like to program in.


    If I had the time... :-)

    I fully appreciate that this is not an easy task.



    One possibility is to distinguish between "functions" that have no
    side effects and can therefore be freely mixed, re-arranged,
    duplicated, omitted, etc., and "procedures" that have side-effects
    and must be called exactly as requested in the code.  Such
    "procedures" would not be allowed in expressions - only as
    statements or part of assignment statements.

    Classifying functions by whether they have side effects or not is not
    as clear-cut as it may at first appear. Please see the thread I
    started today on functional programming.


    You'll notice I've replied to it :-)

    :-)

    ...

    All that's required, compared with C, is for the apparent evaluation
    order to be defined.



    I can appreciate that you want to give a meaning to "++E",

    No, I don't! You have this all wrong. The reason for considering the inclusion of the operators we have been discussing in this thread is to
    allow a more natural style of expression for algorithms that it suits.
    You seem to keep thinking the goal is to attribute meaning to symbols.
    That's not so.


    that you want to give a meaning to "E++", and you expect programmers
    to use one or the other in different contexts.  I can appreciate that
    you want to define order of evaluation within expressions.

    I don't /want/ to define the order of evaluation; I /do/ define the (apparent) order of evaluation. That's part of my language's ethos. If
    I, in addition, permit ++ etc and dereference then their apparent order
    /has/ to be defined, and it now has been.


    But I have yet to see any indication that "++E++" could ever be a
    sensible expression in any real code.

    Bart came up with an example something like

      +(+(+(+ x)))

    That's not at all sensible. You want that banned, too?


    Yes :-) Seriously, I appreciate that there will always be compromises -
    trying to ban everything silly while allowing everything sensible would
    mean countless ad-hoc rules, and you are right to reject that. I am
    advocating drawing a line, just like you - the difference is merely a
    matter of where to draw that line. I'd draw the line so that it throws
    out the increment and decrement operators entirely. But if you really
    wanted to keep them, I'd make them postfix only and as statements, not
    in expressions - let "x++" mean "x += 1" which means "x = x + 1" which
    should, IMHO, be a statement and not allowed inside an expression.


    Of course, it might also be interesting to go in the other direction
    entirely and be very flexible with operators - let users define their
    own operators, using Unicode symbols, letters, and mixtures, both for
    their own types and existing ones. If someone wants to write code that involves a lot of squaring, then let them define operators so they can
    write "x = squareof y", or "x = y²". They'd be able to write more of a
    mess, but also be able to write some things very nicely.

  • From Bart@21:1/5 to David Brown on Tue Nov 15 23:11:08 2022
    On 15/11/2022 18:05, David Brown wrote:
    On 15/11/2022 16:26, Bart wrote:

    import struct        # Standard module
    bs = open("potato_c.cof").read()

    machine, nsections, timestamp, symtaboffset, nsymbols,
    optheadersize, characteristics = struct.unpack_from("<HHIIIHH")

    That's it.  Three lines.  I would not think of C for this kind of
    thing - Python is /much/ better suited.

    I don't believe you. The C equivalent of the big struct below will
    already exist; would you really waste time on composing a long string of
    H and I characters, hoping you don't make a mistake, and then have to
    spend time isolating the individual anonymous tuple elements by index?

    There are 45 fields here; 61 if you split 'imagedir' into its two
    components. A reminder: the code below is /already/ in my scripting
    language.

    type optionalheader = struct    !exe/dll only
        wt_word magic
        byte majorlv
        byte minorlv
        wt_dword codesize
        wt_dword idatasize
        wt_dword zdatasize
        wt_dword entrypoint
        wt_dword codebase
        word64 imagebase
        wt_dword sectionalignment
        wt_dword filealignment
        wt_word majorosv
        wt_word minorosv
        wt_word majorimagev
        wt_word minorimagev
        wt_word majorssv
        wt_word minorssv
        wt_dword win32version
        wt_dword imagesize
        wt_dword headerssize
        wt_dword checksum
        wt_word subsystem
        wt_word dllcharacteristics
        word64 stackreserve
        word64 stackcommit
        word64 heapreserve
        word64 heapcommit
        wt_dword loaderflags
        wt_dword rvadims
        imagedir exporttable
        imagedir importtable
        imagedir resourcetable
        imagedir exceptiontable
        imagedir certtable
        imagedir basereloctable
        imagedir debug
        imagedir architecture
        imagedir globalptr
        imagedir tlstable
        imagedir loadconfigtable
        imagedir boundimport
        imagedir iat
        imagedir delayimportdescr
        imagedir clrheader
        imagedir reserved
    end


    I said it will be clunky and require add-on modules and it is and does.

    It is not "clunky" by any sane view - certainly not compared to your
    code (or code written in C).

    Most of my code is formatting the output. You access fields using coffptr.nsections for example.

      And no, it does not require add-on modules
    - the "struct" module is part of Python.

    (BTW you might be missing an argument in that struct.unpack_from call.)

    No, I am not.  There is an optional third argument, but it is optional.

    What about the second argument? I don't understand how the function call
    knows to get the data from 'bs'.
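
    For reference, the quoted call does appear to drop the buffer argument
    (unpack_from takes the buffer second). A sketch with the buffer passed
    explicitly and the fields named rather than left as an anonymous tuple;
    the field names are taken from the surrounding discussion, and a
    synthetic header stands in for the file:

```python
import struct
from collections import namedtuple

# Named fields instead of an anonymous tuple, matching the COFF file
# header layout quoted above ("<HHIIIHH", 20 bytes).
CoffHeader = namedtuple(
    "CoffHeader",
    "machine nsections timestamp symtaboffset nsymbols "
    "optheadersize characteristics")

def read_coff_header(bs):
    # unpack_from needs the buffer as its second argument
    return CoffHeader._make(struct.unpack_from("<HHIIIHH", bs))

# Synthetic 20-byte header (0x8664 = x86-64) in place of reading a file:
sample = struct.pack("<HHIIIHH", 0x8664, 3, 0, 0, 0, 0, 0)
hdr = read_coff_header(sample)
print(hdr.machine, hdr.nsections)   # 34404 3
```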


    Using that approach for the nested structs and unions of my other
    example is not so straightforward. You basically have to fight for
    every field.

    You have to define every field in every language, or define the ones you
    want along with offsets to skip uninteresting data.

    When properly supported, you can define the fields of a struct just as
    you would in any static language (see above example), and you can write handling code just as conveniently.

    You don't have to manually write strings of anonymous letter codes and
    have to remember their ordering everywhere they are used. That is just
    crass.

    I went out of my way to add such facilities in my scripting language,
    because I felt it was important. So you can code just as you would in a
    static language but with the convenience of informal scripting.

    Clearly you don't care for such things and prefer a hack.


    The result is a tuple of unnamed fields. You really want a proper
    record, which is yet another add-on, with a choice of several modules
    depending on which set of characterics you need.

    You can do that in Python.

    Yeah, I know, you can do anything in Python, since there is an army of
    people who will create the necessary add-on modules to create ugly and cumbersome bolted-on solutions.

    I can list dozens of things that my scripting language does better than
    Python. (Here, such a list exists: https://github.com/sal55/langs/blob/master/QLang/QBasics.md.)



    In short, you are making up shit in an attempt to make your own language
    look better than other languages, because you'd rather say something
    silly than admit that any other language could be better in any way for
    any task.

    Not at all. Python is better for lots of things, mainly because there
    are a million libraries that people have written for it, armies of
    volunteers who have written suitable bindings or written all sorts of
    shit. And there is a huge community and lots of resources to help out.

    It is also full of as many advanced, esoteric features that you could
    wish for.

    But it is short of the more basic and primitive features of the kind I
    use and find invaluable.

    'struct' is also not a true Python module; it's a front end for an
    internal one called `_struct`, likely implemented in C, and almost
    certainly using pointers.

    Please re-think what you wrote there.  I hope you can realise how
    ridiculous you are being.

    Tell me. Maybe struct.py could be written in pure Python; I don't know.
    I'm saying I guarantee mine would have the necessary features to do so.

    But this started off being about pointers. Here's another challenge:
    this program is for Windows, and displays the first 2 bytes of the
    executable image of the interpreter, as loaded in memory:

    println peek(0x400000, u16):"m"

    fun peek(addr, t=byte) = makeref(addr, t)^

    This displays 'MZ' (the signature of PE files on Windows). But of
    interest is how Python would implement that peek() function.
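
    One possible answer is ctypes; a sketch, demonstrated on a buffer the
    program owns rather than on 0x400000 (which is only valid inside a
    Windows process image):

```python
import ctypes

# peek() via ctypes: interpret the memory at an address as a scalar type.
def peek(addr, ctype=ctypes.c_ubyte):
    return ctype.from_address(addr).value

buf = ctypes.create_string_buffer(b"MZ")
addr = ctypes.addressof(buf)
print(chr(peek(addr)))                      # M
print(hex(peek(addr, ctypes.c_uint16)))     # 0x5a4d on a little-endian machine
```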

  • From Bart@21:1/5 to David Brown on Tue Nov 15 23:30:25 2022
    On 15/11/2022 21:40, David Brown wrote:

    If someone wants to write code that
    involves a lot of squaring, then let them define operators so they can
    write "x = squareof y", or "x = y²".  They'd be able to write more of a mess, but also be able to write some things very nicely.

    I have such an operator, called `sqr`. And also briefly allowed the
    superscript version (as a postfix op), until Unicode came along and
    spoilt it all.

    One reason I had sqr was because it was in Pascal (iirc). But it
    genuinely comes in useful. Sure, I could also use x**2, but ** used to
    be only defined for floats, while `sqr` has been used for much longer.

    You could also ask why some languages have a dedicated `sqrt` function
    when they could just as easily do x**0.5 or pow(x, 0.5).

  • From James Harris@21:1/5 to Dmitry A. Kazakov on Wed Nov 16 11:44:21 2022
    On 15/11/2022 10:06, Dmitry A. Kazakov wrote:
    On 2022-11-15 10:35, James Harris wrote:

    As I've already said, Unicode and HTML are fine for output. Where
    programmers work with the semantics of characters, however, they need
    characters to be in semantic categories, you know: letters, arithmetic
    symbols, digits, different cases, etc. So far I've not come across
    anything to support that multilingually. AISI what's needed is a way
    to expand character encodings to bit fields such as

       <category><base character><variant><diacritics><appearance>

    where

       category = group (e.g. alphabetic letters, punctuation, etc)
       base character = main semantic identification (e.g. an 'a')
       variant (e.g. upper or lower case)
       diacritics (those applied to this character in this location)
       appearance (e.g. a round 'a' or a printer's 'a' or unspecified)

    Note that that's purely about semantics; it doesn't include typefaces
    or character sizes or bold or italic etc which are all for rendering.

    I am not sure what you are trying to say.

    I am suggesting that a modern language should define a multilingual
    model for text /processing/. As part of that, programs need to work with different aspects of characters. Hence the bitfields, above.

    The Unicode characterization
    is defined in the file:

       https://unicode.org/Public/UNIDATA/UnicodeData.txt

    Thanks for the link.
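
    That file is essentially what Python's unicodedata module is generated
    from, so the categorisation can already be queried directly; a small
    illustration:

```python
import unicodedata

# Unicode assigns every character a general category:
# Lu/Ll = upper/lower-case letter, Nd = decimal digit, Po = punctuation.
for ch in "aA1!é":
    print(ch, unicodedata.category(ch), unicodedata.name(ch))
```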

    ...

    There is no problem with Unicode string literals whatsoever. You just
    place characters as they are. The only escape is "" for ". That is all.

    Two problems with that, AISI:

    1. some of the characters look like others

    2. discussing unrecognised characters with someone else!


    --
    James Harris

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Wed Nov 16 13:02:43 2022
    On 2022-11-16 12:44, James Harris wrote:
    On 15/11/2022 10:06, Dmitry A. Kazakov wrote:

    There is no problem with Unicode string literals whatsoever. You just
    place characters as they are. The only escape is "" for ". That is all.

    Two problems with that, AISI:

    1. some of the characters look like others

    For the reader, not for the compiler. If you want Unicode you get the
    whole package, homoglyphs included.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From David Brown@21:1/5 to Bart on Wed Nov 16 17:02:43 2022
    On 16/11/2022 00:30, Bart wrote:
    On 15/11/2022 21:40, David Brown wrote:

    If someone wants to write code that involves a lot of squaring, then
    let them define operators so they can write "x = squareof y", or "x =
    y²".  They'd be able to write more of a mess, but also be able to
    write some things very nicely.

    I have such an operator, called `sqr`. And also briefly allowed the superscript version (as a postfix op), until Unicode came along and
    spoilt it all.


    Why would Unicode spoil it?

    One reason I had sqr was because it was in Pascal (iirc). But it
    genuinely comes in useful. Sure, I could also use x**2, but ** used to
    be only defined for floats, while `sqr` has been used for much longer.

    You could also ask why some languages have a dedicated `sqrt` function
    when they could just as easily do x**0.5 or pow(x, 0.5).


    It's more common to see such functions in a language's library than part
    of the core language. You'd have "sqrt" as a function because it is far
    and away the most common use of raising something to a fractional power,
    much more familiar from school mathematics, and much more efficient to implement than a general power function.

  • From David Brown@21:1/5 to Bart on Wed Nov 16 17:50:56 2022
    On 15/11/2022 20:22, Bart wrote:
    On 15/11/2022 18:05, David Brown wrote:
    On 15/11/2022 16:26, Bart wrote:

    I group them all together: ML, OCaml, Haskell, F#, sometimes even Lisp.

    Anything that makes a big deal out of closures, continuations,
    currying, lambdas and higher order functions. I have little use for
    such things otherwise.


    So because /you/ don't understand these things or how they are used,
    you assume that people who /do/ understand them can't write programs
    in functional programming languages?

    No. It's nothing I've ever used, and unlikely ever to use. I like my functions plain, and static, just like you prefer your expressions to
    use only simple operators with no side-effects.


    Of course you won't use something when you won't even consider trying to
    learn about it.

    I still can't comprehend why YOU think this stuff is simple and obvious,
    yet you are stumped by an increment of a pointer followed by a dereference.


    I haven't written anything to suggest that I am "stumped" by this. My
    point was to say it is unnecessary to support such expressions in a
    programming language, and a language may be better in some ways if it
    does not allow increment operators or even pointers.

    Now, it is undeniable /fact/ that programming languages do not need
    operators such as increment, or other operators that cause side-effects.
    It is undeniable fact that programming languages do not need pointers.
    There are countless languages that have neither of these things, and
    they have been used successfully for all sorts of purposes. You don't
    even have to look at functional programming languages or other "it's all
    too complicated for me" languages - neither Python nor Pascal has
    increment operators, Python has no pointers. BASIC has been a
    phenomenally successful language, especially for people who like simple languages, and has neither.

    It is also objective fact that not having these features has advantages
    and disadvantages (I've gone through these enough already).

    Whether a given language is /better overall/ with or without these
    features - that is an entirely different question. It is very
    subjective, and depends highly on the kind of language (its "ethos", as
    James puts it), the kind of use it will see, and the kind of people who
    will use it. That's why I raise suggestions here, rather than giving
    recommendations. (On other topics, such as James' - forgive the strong
    language - insane ideas about character encodings, I have given
    recommendations.)

    On a list of must-haves for a programming language, not only would they
    not be at the top of my list, they wouldn't even be at the bottom!


    Yes, but for you, a "must-have" list for a programming language would be
    mainly "must be roughly like ancient style C in functionality, but with
    enough change in syntax and appearance so that no one will think it is
    C". If that's what you like, and what pays for your daily bread, then
    that's absolutely fine.

    And there's no doubt that a large proportion of programmers go through
    their career without ever considering higher order functions (functions
    that operate on or return functions).

    But equally there's no doubt that they /are/ useful for many people in
    many types of coding. Sometimes higher order functions are used without
    people knowing about them - Python decorators are a fine example.

    Actually, Python declarators are such a good example that I recommend
    this link <https://realpython.com/primer-on-python-decorators/> that
    gives a number of useful examples.
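    For instance, a minimal decorator along those lines might look like
    this (my own sketch; the names are illustrative, not taken from that
    article):

```python
import functools

def debug(func):
    """Hypothetical tracing decorator: prints arguments and return value."""
    @functools.wraps(func)          # keep func's name and docstring
    def wrapper(*args, **kwargs):
        print(f"Starting {func.__name__} with {args} {kwargs}")
        result = func(*args, **kwargs)
        print(f"Returning from {func.__name__} with {result}")
        return result
    return wrapper

@debug                              # sugar for: foo = debug(foo)
def foo(x, y):
    return x + y

foo(2, 3)
```

    The `@debug` line is exactly the higher-order-function application -
    it rebinds `foo` to the wrapped version.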


    Think of this example. You have some code with functions "foo", "bar"
    and "foobar". Mostly you call them as they are in your code.

    But sometimes you want to "decorate" the calls for ease of debugging -
    you want to print out "Starting foo with parameters..." at the start,
    and "Returning from foo with result..." at the end. You don't want to
    change "foo" itself, because you only want this tracing sometimes.

    So you write :

    int debug_foo(int x, double y, char * z) {
        printf("Starting foo with parameters %i %f %p\n", x, y, z);
        int r = foo(x, y, z);
        printf("Returning from foo with result %i\n", r);
        return r;
    }

    That's fine - now you call "debug_foo" instead of "foo" in the cases you
    want. You can even use "#define foo debug_foo" to make it active
    throughout the rest of the file.

    Then you need to do it again for "bar".

    Then you need to do it again for "foobar".

    Then you decide to add timings to the traces, and have to re-do all
    three debug functions.

    Then you need to copy it again for "barfoo", and debug the typo you made
    in "debug_foobar".

    You start thinking, "I know macros are evil, but maybe they can be used
    to automate this somehow?". You get an ugly but workable solution
    letting you write:

    MAKE_DEBUG_FUNC(debug_foo, foo);
    MAKE_DEBUG_FUNC(debug_bar, bar);

    Then you realise that it doesn't work for "foobar", because that only
    takes two parameters. And now you have two macros "MAKE_DEBUG_FUNC_idp"
    and "MAKE_DEBUG_FUNC_id".

    Then you start wondering if you can make a macro that makes the macros,
    or if you should throw the computer out the window.


    Alternatively, you could make a higher order function in a language that
    supports it. This is in C++20 (it's slightly neater than what earlier
    C++ versions allow). I'm not claiming that this is at all clear or obvious
    to people with experience only in imperative languages - understanding
    how to make something like this takes effort and practice. And the
    syntax is C++ style, which will be unfamiliar to people used to
    functional programming languages. But I wanted to write it out, as a
    working function.

    #include <iostream>

    auto debug(auto const& f) {
        return [&f](auto... args) {
            std::cout << "Calling ";
            ((std::cout << " " << args), ...);
            std::cout << "\n";
            auto r = f(args...);
            std::cout << "Returning " << r << "\n";
            return r;
        };
    }

    Suppose your real functions are :

    int foo(int x);
    int bar(int x, double y);
    double foobar(int x, double y, const char * p);

    Your original code was:

    int a = foo(10);
    int b = bar(20, 3.14);
    double c = foobar(30, 2.71828, "Hello");


    Your debug version is :

    int a = debug(foo)(10);
    int b = debug(bar)(20, 3.14);
    double c = debug(foobar)(30, 2.71828, "Hello");

    Note that the "debug" function is applied to "foo", and returns a
    function that is then called with "10" as the parameter.

    You can also write (at file scope, or inside a function) :

    auto debug_foo = debug(foo);

    and use :
    int a = debug_foo(10);





    None of this gives you things you could not do by hand. But if you find yourself doing the same thing by hand many times, then it is natural to
    ask if it can be automated - if you can write a function to do that.
    You can, if you have higher order functions.

    (C++ is not perfect here by any means, and more could be added to the
    language. For example, Python lets you make functions that take classes
    as parameters or return classes. C++ does not (yet) have that level of metaprogramming.)



    Haskell is great for elegantly defining certain kinds of types and
    algorithms, not so good for reams of boilerplate code or UI stuff
    which is what much of programming is.

    As I mentioned in another post, there are opinions, and there are
    /qualified/ opinions.

    My opinion comes from 20 years of writing code to /get things done/ in a
    working environment. Which includes developing the languages and
    choosing the features that best made that possible. Never once did
    I think that 'currying' was going to dramatically transform how I coded;
    never did I spend days working around the omission of closures.


    That's like saying you have 20 years of experience as a taxi driver, and
    never once had to use "flaps" or "ailerons", or even think about the
    concept. You therefore can't understand why pilots want to use them all
    the time. You can give a qualified opinion on driving round roundabouts
    and may be an expert on gearing, but you have no basis for a qualified
    opinion on flying.

    So again - mocking and dismissing concepts that you know nothing about,
    makes you look foolish. (Your ignorance of the topic is not the issue -
    we are all ignorant of almost everything.)

    Such features have some very subtle behaviours which I find
    incredibly hard to get my head around.


    You are a smart guy.  You could get your head around it quite easily,
    if only you were willing.

    No, it is hard, obscure, subtle. Take my word for it.


    No, I will not take your word for it. You know nothing about it.


    Try pasting the C++ code into godbolt.org, and compile with -O2 :

    The compiler happily turns "foo" into "return 13".

    So what does that mean? C++ clearly has support for this, and some
    optimisations which can collapse the function calls into a constant
    value. That tells us nothing about how hard the task is, or how hard it
    is to understand exactly why the task is more difficult than it at first
    looks.

    True. I just wanted to show that an implementation can give efficient
    results from higher order functions.

    If you want a blow-by-blow explanation of the code, I can give it - but
    only if you want it.



    I discussed on Reddit how such a thing would look in my dynamic language
    if I decided to implement it, which is like this:

        fun twice(f) = {x:f(f(x))}       # {...} is my deferred code syntax
        fun plusthree(x) = x+3           # 'fun' is for one-liners

        g := twice(plusthree)
        println g(7)


    That looks okay to me (though it would look a /lot/ better - IMHO - if
    you used spaces more often. It would also stop newsreaders trying to
    turn your code into smilies :-) ).

    I instead tried with a mock-up, which had two components: the
    transformations my bytecode compiler would do, and the support code that would need to be supplied by the language. The mock-up within the
    working language looked like this:

        # Transformed user code
        fun af$1(x, f, g) = f(f(x))
        fun twice(f) = makecls(af$1, f)
        fun plusthree(x) = x+3

        g := twice(plusthree)
        println callcls(g, 7)

        # Emulating interpreter support
        record cls = (var fn, a, b)

        func makecls(f, ?a, ?b)=
            cls(f, a, b)
        end

        func callcls(c, x)=
            fn := c.fn
            fn(x, c.a, c.b)
        end

    This produced the correct result. Enough worked also so that `twice`
    could be called again with a different argument, while the original `g`
    still worked. (A cruder implementation could hardcode things so that,
    while it produced '13', it would only work with a one-time argument to `twice`.)


    One thing stands out here - it looks like you are trying to make "twice"
    into a stand-alone, run-time function. That can be done in interpreted languages, but in compiled languages it is, at best, inefficient. The
    normal method would be to consider "twice" as a compile-time
    metafunction, so that when the compiler sees "twice(plusthree)" it
    generates an anonymous function { x : plusthree(plusthree(x)) }, which
    can be compiled normally. "g" is effectively a function pointer
    initialised to this anonymous function.
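    In Python, say, the whole of `twice` collapses to a closure-returning
    one-liner, since the interpreter carries the captured `f` around for you
    (my sketch, not Bart's code):

```python
def twice(f):
    # Return a new function applying f twice; f is captured by the closure.
    return lambda x: f(f(x))

def plusthree(x):
    return x + 3

g = twice(plusthree)
print(g(7))          # 7 + 3 + 3 = 13
```

    Calling `twice` again with a different argument produces an independent
    closure; `g` keeps working unchanged.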


    This is where it turned out that there were further refinements needed
    to make it work with more challenging examples.

    In the end I didn't do the necessary changes as, while intriguing to
    work on, mere box-ticking was not a worthwhile use of my time, nor a worthwhile complication in my product, since I was never going to use it.


    It's not academics that use these features - it is practical
    programmers.  A big use of lambdas, for example, is in callbacks and
    event handlers - used all the time in GUI programs and Javascript.

    Yes, that was my deferred code feature, itself deferred. (It means I
    have to instead define an explicit, named function.)


    That's the impression I got. I don't know how you handle captures of
    local variables (if you do so at all).

    In some languages, such as Lua, /all/ functions are anonymous
    (lambdas). When you write "function foo(x) ..." in Lua, it is
    syntactic sugar for "foo = function (x) ...".


    Academics may invent this kind of thing, and use languages like
    Haskell to play with them - but they are implemented in real languages
    because real programmers use them for real code.  Rust, Go, C++ -
    these are not academics' languages.

    I find idiomatic Rust incomprehensible.


    I haven't tried Rust enough to comment. I'd want a minimum of a
    dedicated long weekend learning and trying it before I could say if I
    thought it was going to work out for me or not.


    In Python, every function is really a variable initialised
    [effectively] to some anonymous function. Which means that with 100%
    of the functions, you can do this for any defined function F:

         F = 42

    Or, more subtly, setting it to any arbitrary function. That sounds
    incredibly unsafe.

    Python /is/ unsafe - it's a very dynamic language, with little
    compile-time checking.  It is checked at run-time.

    And my dynamic language is a lot less dynamic, so is safer, but of
    course Python is superior.


    Let's just say that the ecosystem for Python is large enough to call it
    a successful language. Its choices of trade-offs between flexibility
    and static error checking seem to be fine for many uses. (Python is
    "safe" in other ways, of course - but the ability to re-bind names to
    different objects gives an easy way to make mistakes that are not found
    until run-time testing.)

    But no, Python does not have variables at all.  It has /names/, that
    are references to objects.  A function is an object, usually (but not
    necessarily) given a name with a "def" statement.  That name can be
    rebound to a different object, just like any other name.


    So Python has immutable tuples, but mutable functions! Every
    identifier is a variable that you can rebind to something else.

    Functions are not mutable in Python.  You misunderstand.

    I've used 'mutable' to mean two things: in-place modification of an
    object, and being able to re-bind a name to something else. These are conflated everywhere, but I reckoned people would get the point.


    They are not conflated by /me/ - they are totally different concepts.
    If you choose to refer to yourself as "Bartholomew" rather than "Bart",
    it is quite different from turning yourself into an octopus.

    But I'll assume you are just mixing up terms, rather than
    misunderstanding fundamental concepts of Python. (And again, ignorance
    is not a problem - it's possible to do a lot of practical programming in
    Python without really understanding that it does not have variables.)

    In Python, a function like this:

        def F():
            pass

    is more or less equivalent to:

       def __0001():
           pass
       F = __0001
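    And that rebinding is easy to demonstrate directly (a small runnable
    sketch):

```python
def F():
    return "original"

G = F          # the name F is just bound to a function object
F = 42         # rebinding the name does not touch the function object
print(G(), F)  # original 42
```

    The function object survives; only the binding of the name `F` changed.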


    If you like, yes.

    Effectively any function is just a variable to which has been assigned
    some anonymous function (although in practice, the function retains its
    'F' identity even if the user's 'F' variable has been assigned a
    different value).

    Python does not have variables. It has /identifiers/. Change
    "variable" for "identifier" in your description, and "assigned" to
    "bound", and you've got it right.


    The end result is the same: you can never be sure that 'F' still refers
    to that static function.

    99.99% of the time you never want such functions to change, so why make
    it possible? I can understand that in Python, a bytecode compiler might
    not know in advance what F is, but that can be mitigated.

    Python is flexible that way - identifiers can be bound and rebound to
    almost anything.

    As is apparent from my posts, I usually prefer a stricter language.
    Python makes it easy to write powerful code in relatively few lines. It
    also makes it easy to make mistakes that would be caught by a compiler
    of a stricter language. That's the balance it picks.


    When I once experimented with such a language, any such tentative
    functions were initialised at runtime, but once initialised, could not
    be changed. So whether an identifier was the name of a function, module, class or variable, was set at runtime, then fixed.

    If you want it to be dynamic, then use a 'variable' (the clue is in the name).


    With my scripting language, you can do that with exactly 0% of the
    defined functions. If you want mutable function references, /then/
    you use a variable: G := F.


    Function pointers are not function objects.

    Is there any practical difference?


    Yes, but there is a good bit of overlap in how you can use them. And if
    there is no way in a language to manipulate functions, then function
    pointers are all you need.

    You don't believe amateurs can add value to what mainstream languages
    can do. The trouble is that you don't appreciate the things that add
    value, so long as there is some unreadable, unsafe way to hack your
    way around a task.

    Amateurs can make good things.  But the idea that a single amateur can
    revolutionise a particular field is almost, but not quite, a complete
    myth.

    I don't want to revolutionise everything. I just hope someone else
    would, but the current state of PL design to me looks dire.

    However I can take my ideas and use them myself, and sod everyone else;
    it's their loss.

    You seem convinced that that incredibly hackish and unprofessional way
    of accessing the contents of that executable file is just as good as
    doing it properly. Well carry on thinking that if you want.


    And you seem convinced that the Python code I showed is "hackish" and "unprofessional". I honestly have no idea why anyone might think that,
    even if that person had never heard of programming other than in C or C variants.

    (I don't know what it is about scripting languages, and the way they
    eschew a feature as straightforward as a record with fields defined at compile-time. Either it doesn't exist, or they try and emulate such a
    thing badly and inefficiently.)

    The code works fine - it is clear and simple, shorter than in your
    language, and easy to modify and maintain.

    If you prefer to think of structures matching C struct definitions
    (which are /one/ way to describe a file format, but certainly not the
    only way), you can use the "ctypes" Python module and define a structure.
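    A rough sketch of that approach (the layout and field names here are
    invented for illustration, matching a hypothetical "IIHHIII"-style
    header):

```python
import ctypes

class Header(ctypes.LittleEndianStructure):
    # Hypothetical file header, equivalent to struct.unpack("<IIHHIII", ...)
    _pack_ = 1
    _fields_ = [
        ("magic",    ctypes.c_uint32),
        ("size",     ctypes.c_uint32),
        ("version",  ctypes.c_uint16),
        ("flags",    ctypes.c_uint16),
        ("offset",   ctypes.c_uint32),
        ("count",    ctypes.c_uint32),
        ("checksum", ctypes.c_uint32),
    ]

data = bytes(range(24))           # stand-in for bytes read from a file
hdr = Header.from_buffer_copy(data)
print(hdr.version, hdr.flags)     # fields accessed by name, not by position
```

    Swapping two fields then means editing the named `_fields_` list, not
    counting along a format string.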


    This is exactly why there can still be a place for a language like C,
    but tidied up and brought up-to-date without all its baggage. People
    want a language they feel they know inside-out, and can be made to do
    anyway; they want to feel in charge and confident.


    Where are all these people that want something almost, but not quite,
    exactly like C?

    There are loads that want to extend C, or import some favourite feature
    of C++ into C, or some that want to write Python-like code in C. This is
    apart from the ones implementing yet another new take on a
    functional language.

    Relatively speaking - compared to the people that just use C - there are
    very few. There are some, but the "extended C" languages rarely have
    any relevance or popularity (barring some extensions of major compiler
    suppliers). The languages that are successful are those that do
    something /significantly/ different - enough of a difference to make it
    worth leaving behind the tools, the experience, the colleagues, the code
    from their old choice of language. It's a high bar, and not one
    achieved by saying "I think "int * p;" is backwards - I'll make an
    alternative to C that has "ref int p" instead".


    What I'm talking about however is the popularity of C; why would they
    use C, rather than the next one up, which is C++?

    To me the answer is clear, I guess to you it's less so.

    There are /many/ reasons. Unwillingness to learn something new that
    takes a significant effort - your /real/ reason - is certainly one of
    them. But it is not the only one.


    But what if you don't care that C++ needs a complex compiler, because
    you are not a compiler writer?  That applies to 99.999% of programmers.

    I think people care when their project requires a long edit-run cycle.


    Yes. But as long as it is fast enough, they don't care if it is faster.
    And they'd rather have good than fast - I'd rather have a 10 minute
    compile time that found my bugs then, than a 10 second compile time and
    only see the bugs after 10 hours of run-time testing. Of course, I'd
    rather have a 10 second compile time. 1 second would be nice, but not
    more useful. Any speedup beyond 1 second is irrelevant.

    It IS hard. Did you miss the bit where I said that expressions and
    statements are interchangeable in an expression-based language? So
    you HAVE to allow anything within those square brackets.

    You and James are forever seeing problems - you think you are /forced/
    into decisions.  Take responsibility - you don't /have/ to allow
    anything you don't want to allow.

    It's you who don't like it, not me! As I've tried to explain, in an expression-based language, you can have statements inside expressions
    inside statements. Everything can have a side-effect.


    Again - you have the feature because you /choose/ to have it. That's
    fine - if that's what you want, great. I've no problem with that. But
    I /do/ have a problem when you (or James) say you /have/ to allow this,
    or are /forced/ to do that, or can't prohibit something else.

    Even gnu-C has that feature.

    At this rate there won't be anything left! Why don't we all just code
    in lambda calculus, as that can apparently represent any program?


    Why don't you just make a Turing machine?  It is an imperative
    language, and it's really quite simple.

    You've missed my point. It's not me reducing everything down to a
    handful of features. Lambda calculus is where you can easily end up.


    No one is reducing everything down to a handful of features. I am
    suggesting - not recommending - that it is possible to remove the
    feature of allowing side-effects in expressions, and gaining the
    features of clearer code that is easier to optimise and easier to
    confirm correctness.

  • From Bart@21:1/5 to David Brown on Wed Nov 16 20:04:01 2022
    On 16/11/2022 16:50, David Brown wrote:
    On 15/11/2022 20:22, Bart wrote:

    Of course you won't use something when you won't even consider trying to learn about it.

    I've thought about learning Chinese. Then I decided there was no point.

    I still can't comprehend why YOU think this stuff is simple and
    obvious, yet you are stumped by an increment of a pointer followed by
    a dereference.


    I haven't written anything to suggest that I am "stumped" by this.  My
    point was to say it is unnecessary to support such expressions in a programming language, and a language may be better in some ways if it
    does not allow increment operators or even pointers.

    It is something I value, but you don't. And higher order functions are something you value, but I don't.

    That's all it is.


    Now, it is undeniable /fact/ that programming languages do not need
    operators such as increment, or other operators that cause side-effects.

    Just `a := b` causes a side effect. Possibly quite a big one if 'b' is a substantial data structure and ':=' does a deep copy.

    There is usually a task to be done. In `A[++i] := 0`, I want two things
    to change, which is going to happen whether I write it like that, or as
    `i := i+1; A[i] := 0`. So why write `i` 3 times?

    It is not a big deal. Maybe in functional programming it might be, but
    here *I* am specifying the paradigm and I say it's OK.

    I'm not asking you or anyone else to use my language.


    Yes, but for you, a "must-have" list for a programming language would be mainly "must be roughly like ancient style C in functionality, but with enough change in syntax and appearance so that no one will think it is
    C".  If that's what you like, and what pays for your daily bread, then that's absolutely fine.

    Yes, I don't need a higher level language for what I use it for. But
    there are still dozens of things which make the experience superior to
    just using C. Ones either you genuinely don't appreciate, or are just
    pissing on for the sake of it.

    * Case-insensitive
    * 1-based and N-based
    * Algol-style syntax, line-oriented and largely semicolon-free, and sane
    type syntax
    * Module scheme (define everything in exactly one place)
    * Namespaces
    * Encapsulation (define functions inside records etc)
    * Out-of-order definitions including easy mutual record cross-references
    * Regular i8-i64 and u8-u64 type denotations, include 'byte' (u8)
    * Default 64-bit 'int', 'word' types, and 64-bit integer constants
    * Built-in print and read statements WITH NO STUPID FORMAT CODES
    * Keyword and default function parameters
    * Fewer, more intuitive operator precedences
    * Does not conflate arrays and pointers
    * 'Proper' for loops; for-in loops
    * Separate 'switch' and 'case' selection; the latter has no restrictions
    (and no stupid fallthrough on switch)
    * Proper named constants
    * Break out of nested loops
    * Embed strings and binary files
    * 'Tabledata' and 'enumdata' features (compare with X-macros)
    * Function reflection
    * Built-in, overloaded ops like abs, min, max
    * 'Properties' such as .len and .lwb
    * Built-in 'swap'
    * Bit/field extraction/insertion syntax
    * Multiple function return values
    * Multiple assignment
    * Slices (including slices of char arrays to give counted strings)
    * Doc strings
    * Whole-program compiler that does not need a separate build system
    * Pass-by-reference
    * Value arrays


    Yeah, just like C! If you think this lot is just C with a paint-job,
    then you're in denial.

    Of course, I fully expect you to be completely dismissive of all of
    this. I wouldn't swap any of these for higher-order functions.


    And there's no doubt that a large proportion of programmers go through
    their career without ever considering higher order functions (functions
    that operate on or return functions).

    Too right. To be able to use such things, they MUST be 100% intuitive
    and be usable with 100% confidence. But that's just the author; you need
    to consider other readers of your code too, and those who have to
    maintain it.

    To me they are a very long way from being 100% intuitive. So what do you
    think I should do: strive to be a 10th-rate programmer in a functional
    language I've no clue about; give up programming and tend to my garden;
    or carry on coding in a style that *I* understand 100% (and most others
    will too)?

    The stuff I do simply doesn't require a sophisticated language with
    advanced types and curried functions invented on-the-fly. Here is an
    actual example from an old app, a small function to keep it short:

    proc displaypoletotal =
        if not poleenabled then return fi
        print @poledev, chr(31), chr(14)     ! clear display
        print @poledev, "Total:", rightstr(strcash(total, paymentunit), 14)
    end

    (This is part of a POS and displays running totals, on an LED display
    mounted on a pole, driven from a serial port. It ran in a duty-free area
    and worked with multiple currencies.)

    What can higher-order-functions do for me here? Absolutely sod-all.

    But equally there's no doubt that they /are/ useful for many people in
    many types of coding.  Sometimes higher order functions are used without people knowing about them - Python decorators are a fine example.

    Actually, Python declarators are such a good example that I recommend

    Decorators?

    this link <https://realpython.com/primer-on-python-decorators/> that
    gives a number of useful examples.

    Decorators are a /very/ good example of a Python feature that I could
    never get my head around. 5 minutes later, I'd have to look them up again.

    Think of this example.  You have some code with functions "foo", "bar"
    and "foobar".  Mostly you call them as they are in your code.

    auto debug(auto const& f) {
        return [&f](auto... args) {
            std::cout << "Calling ";
            ((std::cout << " " << args), ...);
            std::cout << "\n";
            auto r = f(args...);
            std::cout << "Returning " << r << "\n";
            return r;
        };
    }

    Suppose your real functions are :

        int foo(int x);
        int bar(int x, double y);
        double foobar(int x, double y, const char * p);

    Your original code was:

        int a = foo(10);
        int b = bar(20, 3.14);
        double c = foobar(30, 2.71828, "Hello");


    None of this gives you things you could not do by hand.  But if you find yourself doing the same thing by hand many times, then it is natural to
    ask if it can be automated - if you can write a function to do that. You
    can, if you have higher order functions.

    I can't follow the C++ debug function at all. But I notice the user code changes from 'foo()' to 'debug()()'; I thought this could be done while
    leaving the foo() call unchanged.

    But no, my language doesn't deal with parameter lists as a first class
    entity at all. (At best it can access them as a list object, but it
    doesn't help here.)

    The best I can do here is to have a dedicated function for each number
    of arguments, and to use dynamic code to allow the same function for any
    types:

    func debug3(f, a,b,c)=
        println "Calling",f,"with",a,b,c
        f(a,b,c)
    end

    func foobar(a,b,c)=
        println "FooBar",a,b,c
        return a+b+c
    end

    x:=debug3(foobar, 5,6,7) # in place of foobar(5, 6, 7)

    println x

    This displays:

    Calling <procid:"foobar"> with 5 6 7
    FooBar 5 6 7
    18

    However this loses the ability to use any keyword or default arguments
    for foobar, since they are only available for direct calls (it's done
    at compile-time).

    So I can see that that C++ debug does some very hairy stuff, to make it
    work with static types and for any function, but I just can't understand it.

    However, given the requirement you outlined, I could probably come up
    with a custom feature to do just that. Although it might be in the form
    of a compiler option which injects the debug code at the start of the
    relevant functions. Then the user code does not need updating.

    See, when you have control of the language and implementation, there are
    more and better possibilities.

    That's like saying you have 20 years of experience as a taxi driver, and never once had to use "flaps" or "ailerons", or even think about the concept.  You therefore can't understand why pilots want to use them all
    the time.  You can give a qualified opinion on driving round roundabouts
    and may be an expert on gearing, but you have no basis for a qualified opinion on flying.

    I don't want to fly. (I was once in a small aircraft flying at 7000 ft.
    But I've also ridden a bike at 8000 ft, although over a mountain in that
    case. So who needs to fly?!)



    So again - mocking and dismissing concepts that you know nothing about,
    makes you look foolish.  (Your ignorance of the topic is not the issue -
    we are all ignorant of almost everything.)

    Have I ever called you ignorant? I don't care about these concepts; they
    are not for me. But I appreciate lots of things you don't care for.

    Look at this code; it is a silly task, but concentrate on the bit that
    does the input:

    real a,b,c

    print "Three numbers: "
    readln a, b, c

    println "Their sum is:", a+b+c

    The spec is that the three numbers are read /from the same line/, and
    can be separated with commas or spaces.

    Try to do that `readln` part in Python, and just as simply. Even in C
    it's an ordeal.

    (My code actually works on either of my languages, static or dynamic.
    That's a bonus feature. Imagine a solution in Python or C that works
    with both languages.)
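    For reference, one Python attempt (my sketch; judge for yourself whether
    it is 'just as simple'):

```python
import re

def read_reals(line, n=3):
    """Parse n real numbers separated by commas and/or spaces."""
    toks = re.split(r"[,\s]+", line.strip())
    return [float(t) for t in toks[:n]]

# In a real program the line would come from input("Three numbers: ")
a, b, c = read_reals("1.5, 2 3")
print("Their sum is:", a + b + c)   # 6.5
```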

    No, it is hard, obscure, subtle. Take my word for it.


    No, I will not take your word for it.  You know nothing about it.

    I implemented it, remember? Even if it was a mock-up to see if a
    proposed built-in approach would work.

    Yes, that was my deferred code feature, itself deferred. (It means I
    have to instead define an explicit, named function.)


    That's the impression I got.  I don't know how you handle captures of
    local variables (if you do so at all).

    When I had local functions for a while, they could access static
    variables, user types, named constants, macros, enums and other local
    functions within a containing function. Plus of course anything defined globally. But not parameters and stack-frame variables of the enclosing functions.

    Quite a lot could actually be done that way. So it could with my
    deferred code objects.


    Effectively any function is just a variable to which has been assigned
    some anonymous function (although in practice, the function retains
    its 'F' identity even if the user's 'F' variable has been assigned a
    different value).

    Python does not have variables.  It has /identifiers/.  Change
    "variable" for "identifier" in your description, and "assigned" to
    "bound", and you've got it right.


    Just call them variables that work in a particular way: they are
    references to objects, but can never be references to other variables.

    When you assign a value, you are copying a reference.

    And you seem convinced that the Python code I showed is "hackish" and "unprofessional".

    Defining a struct's layout as "IIHHIII" or whatever? Yeah, that's really professional!


    The code works fine - it is clear and simple, shorter than in your
    language, and easy to modify and maintain.

    Really? The struct changes: two fields are swapped. You have to count
    along the format string to work out which of those characters need to
    be exchanged. And that multiple assignment needs to be revised too.
    It's a bit hit and miss.


    If you prefer to think of structures matching C struct definitions
    (which are /one/ way to describe a file format, but certainly not the
    only way), you can use the "ctypes" Python module and define a structure.
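
    To make the trade-off concrete, here is a sketch contrasting the two
    approaches. The field names and the mapping of "IIHHIII" onto them are
    invented for illustration, not the actual format from the discussion:

```python
import ctypes
import struct

# Hypothetical record layout matching the format string "IIHHIII"
FMT = "<IIHHIII"

class Header(ctypes.LittleEndianStructure):
    _pack_ = 1
    _fields_ = [
        ("magic",   ctypes.c_uint32),  # I
        ("size",    ctypes.c_uint32),  # I
        ("version", ctypes.c_uint16),  # H
        ("flags",   ctypes.c_uint16),  # H
        ("offset",  ctypes.c_uint32),  # I
        ("count",   ctypes.c_uint32),  # I
        ("crc",     ctypes.c_uint32),  # I
    ]

data = struct.pack(FMT, 1, 2, 3, 4, 5, 6, 7)

# struct: purely positional - swapping two fields means counting
# characters in FMT and reordering the assignment targets
magic, size, version, flags, offset, count, crc = struct.unpack(FMT, data)

# ctypes: named fields - a swap is a local edit to _fields_
hdr = Header.from_buffer_copy(data)
```

    With the `ctypes` version, access is by name (`hdr.version`), so
    reordering fields cannot silently mis-assign values the way a
    miscounted format string can.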

    So why didn't you do that in the first place? I assume it can define
    pointers too? (Since structs can contain pointers and you might need to
    access what they point to.)

    But I guess that this was about you proving that pointers were
    unnecessary...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Wed Nov 16 21:45:18 2022
    On 16/11/2022 12:02, Dmitry A. Kazakov wrote:
    On 2022-11-16 12:44, James Harris wrote:
    On 15/11/2022 10:06, Dmitry A. Kazakov wrote:

    There is no problem with Unicode string literals whatsoever. You just
    place characters as they are. The only escape is "" for ". That is all.

    Two problems with that, AISI:

    1. some of the characters look like others

    For the reader, not for the compiler. If you want Unicode you get the
    whole package, homoglyphs included.


    Yes, Unicode has all kinds of problems for humans. Something else is
    needed for programming but I don't think it's been created yet.
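
    A concrete illustration of the homoglyph problem, using Python only to
    inspect the code points:

```python
import unicodedata

# Two visually near-identical letters with different code points:
latin_a = "A"           # U+0041
cyrillic_a = "\u0410"   # U+0410

print(unicodedata.name(latin_a))     # LATIN CAPITAL LETTER A
print(unicodedata.name(cyrillic_a))  # CYRILLIC CAPITAL LETTER A

# To a compiler they are simply different characters, so identifiers
# that look the same to the reader need not be the same:
assert latin_a != cyrillic_a
```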


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Wed Nov 16 22:01:58 2022
    On 16/11/2022 16:02, David Brown wrote:
    On 16/11/2022 00:30, Bart wrote:
    On 15/11/2022 21:40, David Brown wrote:

    If someone wants to write code that involves a lot of squaring, then
    let them define operators so they can write "x = squareof y", or "x =
    y²".  They'd be able to write more of a mess, but also be able to
    write some things very nicely.

    I have such an operator, called `sqr`. And also briefly allowed the
    superscript version (as a postfix op), until Unicode came along and
    spoilt it all.


    Why would Unicode spoil it?

    I was using 8-bit code pages for western European alphabets since
    probably from the end of the 80s. It was simple, I supported it and it
    worked well. (At that time, I was also responsible for vector fonts
    within my apps.)

    But Unicode makes everything harder, with characters taking up multiple
    bytes, and a lot of the time it just doesn't work. (I've seen Unicode
    errors on everything from TV subtitles to supermarket receipts, and that
    was a few weeks ago.)

    Then you have the choices of UCS2, UCS4, UTF8, with patchy support for
    UTF8 within Windows. Even if I get it working on my machine, how do I
    know that someone else running my program will have their machine set up properly?

    For me it's just not worth it.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Wed Nov 16 23:02:01 2022
    On 14/11/2022 18:41, Dmitry A. Kazakov wrote:
    On 2022-11-14 19:26, James Harris wrote:
    On 14/11/2022 11:29, Dmitry A. Kazakov wrote:
    On 2022-11-14 12:03, James Harris wrote:

    ...

       if is_name_first(b[j])
         a[i++] = b[j++]
         rep while is_name_follow(b[j])
           a[i++] = b[j++]
         end rep
         a[i] = 0
         return TOK_NAME
       end if

    Now, what don't you like about the ++ operators in that? How would
    you prefer to write it?

     From parser production code:

    procedure Get_Identifier
               (  Code     : in out Source'Class;
                  Line     : String;
                  Pointer  : Integer;
                  Argument : out Tokens.Argument_Token
               )  is
        Index     : Integer := Pointer + 1;
        Malformed : Boolean := False;
        Underline : Boolean := False;
        Symbol    : Character;
    begin
        while Index <= Line'Last loop
           Symbol := Line (Index);
           if Is_Alphanumeric (Symbol) then
              Underline := False;
           elsif '_' = Symbol then
              Malformed := Malformed or Underline;
              Underline := True;
           else
              exit;
           end if;
           Index := Index + 1;
        end loop;
        Malformed := Malformed or Underline;
        Set_Pointer (Code, Index);
        Argument.Location := Link (Code);
        Argument.Value := new Identifier (Index - Pointer);
        declare
           This : Identifier renames Identifier (Argument.Value.all);
        begin
           This.Location  := Argument.Location;
           This.Malformed := Malformed;
           This.Value     := Line (Pointer..Index - 1);
        end;
    end Get_Identifier;

    Well, that's an astonishingly long piece of code, Dmitry,

    Because it is production code.

    So was the code which preceded it.


    It must deal with different types of
    sources, with error handling and syntax tree generation.

    In fairness, detecting double and trailing underscores adds work to the
    Ada code so I've been thinking how I might write the Ada version. The
    following is my attempt. It is untested and in a somewhat experimental
    form but I think it's quite readable. The main part of the code would be

    errors = 0
    last_char = line(pointer)
    rep for i = pointer + 1, while i le line_last, ++i
        ch = line(i)
        if ch eq '_'
            if last_char eq '_' so ++errors ;Consecutive underscores
        on not is_alphanum(ch)
            break rep ;If neither underscore nor alphanum we are done
        end if
        last_char = ch
    end rep
    if last_char eq '_' so ++errors ;Trailing underscore
    ....
    this.Malformed = bool(errors)

    Some notes on the code. I found the Ada program's

    Malformed := Malformed or Underline;

    to be clever but it took a bit of thinking about to work out what it was intended to do in the context. So I changed it to an error count which
    is incremented with

    ++errors

    and used bool(errors) at the end.

    Rather than having a boolean called Underline I just kept a copy of the
    last character.

    Since I didn't need the Underline boolean I found I could also get rid
    of a branch of the if statement so the code is a little shorter. But
    more important, I think the code is clearer. YMMV but I bet you can
    understand it!

    In the context of this discussion, the code uses ++i to increment the
    index and ++errors to increment the error count. (There's no danger of
    either of them overflowing, given the line length.) Neither is embedded
    in an expression.
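
    For a neutral comparison, the same underscore rules (no consecutive
    underscores, no trailing underscore) can be sketched in Python. The
    function name `scan_identifier` is invented, and this follows the
    rules only as described above, not the full Ada 95 grammar:

```python
def scan_identifier(line, pointer):
    """Scan an identifier starting at line[pointer].

    Returns (end_index, malformed), where malformed is True if the
    identifier contains consecutive underscores or ends with one.
    """
    errors = 0
    last_char = line[pointer]
    i = pointer + 1
    while i < len(line):
        ch = line[i]
        if ch == "_":
            if last_char == "_":
                errors += 1   # consecutive underscores
        elif not ch.isalnum():
            break             # neither underscore nor alphanum: done
        last_char = ch
        i += 1
    if last_char == "_":
        errors += 1           # trailing underscore
    return i, bool(errors)
```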

    On the clarity of the "++" operator note that any of these would do the
    same:

    i = i + 1
    i += 1
    ++i

    I assert that the last one is the most readable. It makes the
    programmer's intent clear at a glance.

    ...

    But I am not sure I do understand it. Even allowing for what I believe
    is meant to be double underscore detection (except at the start and
    end?) it takes significantly more study than the simple name-first,
    name-follow code which preceded it.

    That's how the language defines it. This example is from an Ada 95
    parser. Ada 95 RM 2.3:

       https://www.adahome.com/rm95/rm9x-02-03.html



    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Bart on Thu Nov 17 11:34:30 2022
    On 16/11/2022 23:01, Bart wrote:
    On 16/11/2022 16:02, David Brown wrote:
    On 16/11/2022 00:30, Bart wrote:
    On 15/11/2022 21:40, David Brown wrote:

    If someone wants to write code that involves a lot of squaring, then
    let them define operators so they can write "x = squareof y", or "x
    = y²".  They'd be able to write more of a mess, but also be able to
    write some things very nicely.

    I have such an operator, called `sqr`. And also briefly allowed the
    superscript version (as a postfix op), until Unicode came along and
    spoilt it all.


    Why would Unicode spoil it?

    I was using 8-bit code pages for western European alphabets since
    probably from the end of the 80s. It was simple, I supported it and it
    worked well. (At that time, I was also responsible for vector fonts
    within my apps.)


    Such code pages did work, but were very limited. In the UK, code pages typically meant nothing worse than mixups between # and £. Go beyond
    the English speaking world, and code pages were a nightmare. If one non-English Western European language was enough, they were often not
    /too/ bad - but supporting multiple languages was often hugely
    complicated and fraught with errors.

    Unicode made some things more complex, but other things far easier - it
    is not a surprise to me that it has supplanted pretty much every usage
    where plain old 7-bit ASCII is insufficient. I understand how Unicode
    can be difficult, but it is solving a difficult problem.


    But back to your superscript square operator - does that mean you used
    an extended ASCII code in a specific code page for superscript 2 (I
    think it is 0xfb in Latin-9), but when Unicode came out you stopped
    using anything beyond 7-bit ASCII?

    But Unicode makes everything harder, with characters taking up multiple bytes, and a lot of the time it just doesn't work. (I've seen Unicode
    errors on everything from TV subtitles to supermarket receipts, and that
    was a few weeks ago.)

    That's not a Unicode problem - that's a software bug.


    Then you have the choices of UCS2, UCS4, UTF8, with patchy support for
    UTF8 within Windows. Even if I get it working on my machine, how do I
    know that someone else running my program will have their machine set up properly?

    For me it's just not worth it.



    In the early days of Unicode, there were different encodings. For the
    last couple of decades it's been clear that there is /one/ sensible
    encoding - UTF-8. Everything else exists only as long as it is needed
    for backwards compatibility and until software, OS's and API's are
    changed - you only need them on the boundaries of code (file I/O,
    calling external API's). And yes, I know that is an extra hassle.

    I do understand that it is /much/ easier to stick to 7-bit ASCII if that
    is all you need. But if 7-bit ASCII is not sufficient, the UTF-8 is
    much easier than anything else.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Bart on Thu Nov 17 11:40:29 2022
    On 17/11/2022 11:24, Bart wrote:

    Hmm, I just compiled it with both bcc and tcc, and they both correctly
    show €°£ when using code page 65001. So that's something, but what's up with gcc?

    BTW ° (degree symbol in case it doesn't display properly) was something
    I did use more extensively both in my scripting language and my apps' CLI.

    So sin(30°) would evaluate to 0.5, instead of having to do sin(pi/6)
    (typical of ordinary languages) or sin(30 deg) which was the fall-back
    version.

    Both ° and 'deg' applied a scaling factor to the number.
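
    The scaling-factor approach translates directly to any language with
    ordinary constants; a minimal Python equivalent might be:

```python
import math

deg = math.pi / 180  # the scaling factor applied by ° or 'deg'

# sin(30°) in the language above corresponds to:
result = math.sin(30 * deg)
print(round(result, 10))  # 0.5
```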

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to Bart on Thu Nov 17 13:12:06 2022
    On 2022-11-17 12:24, Bart wrote:

    If I wanted to display UTF8 right now on Windows, say from a C program
    even, I would have to fight it. If I write this (created with Notepad):

      #include <stdio.h>
      int main(void) {
          printf("€°£");
      }

    If you want to display UTF-8, you must obviously use UTF-8, no?

    #include <stdio.h>
    int main(void) {
        printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
    }

    In CMD:

    CHCP 65001
    Active code page: 65001

    main.exe
    €°£

    Of course, you could use the code you wrote under the condition that
    both the editor and the compiler use UTF-8.

    Which is why every programming guideline must require 7-bit ASCII
    source, like I provided.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Thu Nov 17 11:24:44 2022
    On 17/11/2022 10:34, David Brown wrote:
    On 16/11/2022 23:01, Bart wrote:
    On 16/11/2022 16:02, David Brown wrote:
    On 16/11/2022 00:30, Bart wrote:
    On 15/11/2022 21:40, David Brown wrote:

    If someone wants to write code that involves a lot of squaring,
    then let them define operators so they can write "x = squareof y",
    or "x = y²".  They'd be able to write more of a mess, but also be
    able to write some things very nicely.

    I have such an operator, called `sqr`. And also briefly allowed the
    superscript version (as a postfix op), until Unicode came along and
    spoilt it all.


    Why would Unicode spoil it?

    I was using 8-bit code pages for western European alphabets since
    probably from the end of the 80s. It was simple, I supported it and it
    worked well. (At that time, I was also responsible for vector fonts
    within my apps.)


    Such code pages did work, but were very limited.  In the UK, code pages typically meant nothing worse than mixups between # and £.  Go beyond
    the English speaking world, and code pages were a nightmare.  If one non-English Western European language was enough, they were often not
    /too/ bad - but supporting multiple languages was often hugely
    complicated and fraught with errors.

    Unicode made some things more complex, but other things far easier - it
    is not a surprise to me that it has supplanted pretty much every usage
    where plain old 7-bit ASCII is insufficient.  I understand how Unicode
    can be difficult, but it is solving a difficult problem.


    But back to your superscript square operator - does that mean you used
    an extended ASCII code in a specific code page for superscript 2 (I
    think it is 0xfb in Latin-9), but when Unicode came out you stopped
    using anything beyond 7-bit ASCII?

    I used one that I called 'ANSI' but which is actually Windows-1252 (https://en.wikipedia.org/wiki/Windows-1252). There, superscript 2 was
    code 0xB2.

    Before that I used an older one associated with MSDOS, I think code page
    850, where superscript 2 was code 0xFD.

    (IMO the best set of extended character codes was used by the Amstrad
    PCW from the 1980s; a Z80-based word processing machine. Very elegantly
    set out.)

    I provided support for French and German (mainly Swiss variations) and
    Dutch. This included providing special keyboard layouts used on
    digitising tablets, and supplying the vector fonts necessary for
    pen-plotters. (These were largely nicked from AutoCAD but I added
    support for accents, 'hats', cedillas etc, plus some special symbols.)


    But Unicode makes everything harder, with characters taking up
    multiple bytes, and a lot of the time it just doesn't work. (I've seen
    Unicode errors on everything from TV subtitles to supermarket
    receipts, and that was a few weeks ago.)

    That's not a Unicode problem - that's a software bug.

    It means even the big boys have issues with it.


    Then you have the choices of UCS2, UCS4, UTF8, with patchy support for
    UTF8 within Windows. Even if I get it working on my machine, how do I
    know that someone else running my program will have their machine set
    up properly?

    For me it's just not worth it.



    In the early days of Unicode, there were different encodings.  For the
    last couple of decades it's been clear that there is /one/ sensible
    encoding - UTF-8.

    If I wanted to display UTF8 right now on Windows, say from a C program
    even, I would have to fight it. If I write this (created with Notepad):

    #include <stdio.h>
    int main(void) {
        printf("€°£");
    }

    and compile with gcc, it shows:

    €°£

    I'm not sure what code page it's on, but if I switch to 65001 which is
    supposed to be UTF8, then it shows:

    �������

    (or equivalent in the terminal font). If I dump the C source, it does
    indeed contain the E2 82 AC sequence which is the UTF8 for the 20AC code
    for the Euro sign.
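
    Those byte sequences are easy to verify; for example, in Python:

```python
# UTF-8 encodings of the three characters in the test program:
for ch in "€°£":
    print(f"U+{ord(ch):04X} -> {ch.encode('utf-8').hex(' ')}")
# € (U+20AC) encodes as e2 82 ac, matching the dump of the C source
```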

    I'm sure that on Linux it works perfectly within a terminal window. But
    I'm on Windows and I can't be bothered to do battle. Even if /I/ get it
    to work, I can't guarantee it for anyone else.

    Hmm, I just compiled it with both bcc and tcc, and they both correctly
    show €°£ when using code page 65001. So that's something, but what's up with gcc?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Dmitry A. Kazakov on Thu Nov 17 12:35:17 2022
    On 17/11/2022 12:12, Dmitry A. Kazakov wrote:
    On 2022-11-17 12:24, Bart wrote:

    If I wanted to display UTF8 right now on Windows, say from a C program
    even, I would have to fight it. If I write this (created with Notepad):

       #include <stdio.h>
       int main(void) {
           printf("€°£");
       }

    If you want to display UTF-8, you must obviously use UTF-8, no?

        #include <stdio.h>
        int main(void) {
            printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
        }


    This wasn't the problem. I verified that the text file contained the
    correct UTF8 sequences, and the two other compilers worked. This was a
    problem with gcc, which also fails your version.



    In CMD:

    CHCP 65001
    Active code page: 65001

    main.exe
    €°£

    Of course, you could use the code you wrote under the condition that
    both the editor and the compiler use UTF-8.

    The point about UTF8 is that it doesn't matter. So the string contains 'character' E2; in C, this is just a byte array, it should just pass it
    as it is to the printf function.


    Which is why every programming guideline must require ASCII-7 source
    like I provided.

    That would work, but is also completely impractical for large amounts of non-ASCII content. Or even small amounts. You /need/ editor support. I
    don't have it and don't do enough with Unicode to make it worth the trouble.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to Bart on Thu Nov 17 14:20:21 2022
    On 2022-11-17 13:35, Bart wrote:
    On 17/11/2022 12:12, Dmitry A. Kazakov wrote:
    On 2022-11-17 12:24, Bart wrote:

    If I wanted to display UTF8 right now on Windows, say from a C
    program even, I would have to fight it. If I write this (created with
    Notepad):

       #include <stdio.h>
       int main(void) {
           printf("€°£");
       }

    If you want to display UTF-8, you must obviously use UTF-8, no?

         #include <stdio.h>
         int main(void) {
             printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
         }


    This wasn't the problem. I verified that the text file contained the
    correct UTF8 sequences, and the two other compilers worked. This was a problem with gcc, which also fails your version.

    The above was compiled with gcc version 10.3.1 20210520.

    In CMD:

    CHCP 65001
    Active code page: 65001

    main.exe
    €°£

    Of course, you could use the code you wrote under the condition that
    both the editor and the compiler use UTF-8.

    The point about UTF8 is that it doesn't matter. So the string contains 'character' E2; in C, this is just a byte array, it should just pass it
    as it is to the printf function.

    It does, but the terminal driver interprets octets you output there. You
    can verify the actual output by redirecting the standard output.

    That would work, but is also completely impractical for large amounts of non-ASCII content. Or even small amounts. You /need/ editor support. I
    don't have it and don't do enough with Unicode to make it worth the
    trouble.

    That is another guideline topic: you never, ever place localization
    stuff in the source code.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Dmitry A. Kazakov on Thu Nov 17 15:06:46 2022
    On 17/11/2022 13:20, Dmitry A. Kazakov wrote:
    On 2022-11-17 13:35, Bart wrote:
    On 17/11/2022 12:12, Dmitry A. Kazakov wrote:
    On 2022-11-17 12:24, Bart wrote:

    If I wanted to display UTF8 right now on Windows, say from a C
    program even, I would have to fight it. If I write this (created
    with Notepad):

       #include <stdio.h>
       int main(void) {
           printf("€°£");
       }

    If you want to display UTF-8, you must obviously use UTF-8, no?

         #include <stdio.h>
         int main(void) {
             printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
         }


    This wasn't the problem. I verified that the text file contained the
    correct UTF8 sequences, and the two other compilers worked. This was a
    problem with gcc, which also fails your version.

    The above was compiled with gcc version 10.3.1 20210520.

    And run on Windows?

    Further tests show that it works in every case, including using gcc with
    puts, and gcc+printf under WSL. It only fails with gcc + printf + Windows.

    Odd. But then my point is you can't rely on it. You still need the UTF8
    code page set.

    That's not all, because console display is different from graphical display.

    #include <stdio.h>
    #include <windows.h>
    int main(void) {
        MessageBox(0, "\xE2\x82\xAC\xC2\xB0\xC2\xA3",
                      "\xE2\x82\xAC\xC2\xB0\xC2\xA3", 0);
    }

    This displays only gobbledygook. Of course, this is set to use
    MessageBoxA, which expects an ASCII string; but why won't it take UTF8
    and show something sensible?

    Presumably it needs the correct code page set for the WinAPI, not the
    one I set for the console. But I can't find a way to do it; MS docs
    suggest setting this in a resource file or XML manifest file; WTF?

    The alternative is to use MessageBoxW, but that means switching
    everything to UCS2; it is a massive upheaval.

    Any wonder that I'm just not interested? Here I have to say that Linux
    seems to get it right.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to Bart on Thu Nov 17 16:20:10 2022
    On 2022-11-17 16:06, Bart wrote:
    On 17/11/2022 13:20, Dmitry A. Kazakov wrote:
    On 2022-11-17 13:35, Bart wrote:
    On 17/11/2022 12:12, Dmitry A. Kazakov wrote:
    On 2022-11-17 12:24, Bart wrote:

    If I wanted to display UTF8 right now on Windows, say from a C
    program even, I would have to fight it. If I write this (created
    with Notepad):

       #include <stdio.h>
       int main(void) {
           printf("€°£");
       }

    If you want to display UTF-8, you must obviously use UTF-8, no?

         #include <stdio.h>
         int main(void) {
             printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
         }


    This wasn't the problem. I verified that the text file contained the
    correct UTF8 sequences, and the two other compilers worked. This was
    a problem with gcc, which also fails your version.

    The above was compiled with gcc version 10.3.1 20210520.

    And run on Windows?

    Of course.

    Further tests show that it works in every case, including using gcc with puts, and gcc+printf under WSL. It only fails with gcc + printf + Windows.

    Odd. But then my point is you can't rely on it. You still need the UTF8
    code page set.

    That's not all, because console display is different from graphical
    display.

       #include <stdio.h>
       #include <windows.h>
       int main(void) {
           MessageBox(0,"\xE2\x82\xAC\xC2\xB0\xC2\xA3",
                        "\xE2\x82\xAC\xC2\xB0\xC2\xA3",0);
       }

    This displays only gobbledygook. Of course, this is set to use
    MessageBoxA, which expects an ASCII string; but why won't it take UTF8
    and show something sensible?

    Because Windows GDI is ASCII (MessageBoxA) or else UTF-16 (MessageBoxW).

    Presumably it needs the correct code page set for the WinAPI, not the
    one I set for the console. But I can't find a way to do it; MS docs
    suggest setting this in a resource file or XML manifest file; WTF?

    There are no code pages in Windows GDI.

    Any wonder that I'm just not interested? Here I have to say that Linux
    seems to get it right.

    Linux used code pages as well. It adopted Unicode very late and took
    UTF-8 straight away. For its part, Windows started early and took UCS-2, splitting all calls into xxxA and xxxW. Later on, when Unicode grew
    larger, Microsoft silently replaced UCS-2 with UTF-16. All xxxW calls
    are UTF-16 now.
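
    The distinction matters because UCS-2 cannot represent characters
    outside the Basic Multilingual Plane, which UTF-16 encodes as
    surrogate pairs. A quick Python illustration:

```python
# A BMP character fits in one 16-bit unit (UCS-2 and UTF-16 agree):
euro = "€"
print(len(euro.encode("utf-16-le")))   # 2 bytes, one code unit

# A character outside the BMP needs a UTF-16 surrogate pair,
# which UCS-2 has no way to represent:
emoji = "\U0001F600"
print(len(emoji.encode("utf-16-le")))  # 4 bytes, two code units
```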

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Bart on Fri Nov 18 08:12:43 2022
    On 16/11/2022 21:04, Bart wrote:
    On 16/11/2022 16:50, David Brown wrote:
    On 15/11/2022 20:22, Bart wrote:

    Of course you won't use something when you won't even consider trying
    to learn about it.

    I've thought about learning Chinese. Then I decided there was no point.


    That's fine. But you wouldn't enter a discussion with a linguist who
    has some experience with Chinese, and try to tell them that Chinese
    grammar is beyond human comprehension. You could say that /you/ think
    it looks like Chinese writing would be hard to learn - but you could
    /not/ say anything about how hard it is for Chinese speakers to learn.
    You could not even say that it really would be difficult for you to
    learn, because you haven't tried or investigated enough.

    I still can't comprehend why YOU think this stuff is simple and
    obvious, yet you are stumped by an increment of a pointer followed by
    a dereference.


    I haven't written anything to suggest that I am "stumped" by this.  My
    point was to say it is unnecessary to support such expressions in a
    programming language, and a language may be better in some ways if it
    does not allow increment operators or even pointers.

    It is something I value, but you don't.

    Again, I have written nothing to indicate that. You read so much
    between the lines that you miss the words I actually write. When I
    write "You could consider doing A in your /new/ language - it would give
    these advantages...", that is /exactly/ what I mean. It does not mean
    "I don't like B", or "I am stumped by C", or "You should not do D".

    And higher order functions are something you value, but I don't.


    That bit is true.

    That's all it is.



    I can fully respect your personal preferences - that's not the issue for
    me. I find it sad and disappointing that someone can have such strong
    opinions about something they have never really considered or tried to
    learn about, and consequently don't understand, but I guess that is
    human nature. It's ancient survival tactics, honed by evolution - when something is new, different and unknown, human instinct is to fear it
    and run away, rather than investigate and study it.

    The bit that really bugs me is how you (and James) can hold such strong opinions about how /other people/ might like and use these features and languages that support them. Is it so hard to accept that some people
    like using higher order functions? Or that some people write code in functional programming languages, because they find it a better choice
    for their needs? Is it so hard to accept that other people can write
    code for the same task in widely different languages, and /your/ code in
    /your/ language is not the "perfect" solution or the only "non-clunky" code?



    Now, it is undeniable /fact/ that programming languages do not need
    operators such as increment, or other operators that cause side-effects.

    Just `a := b` causes a side effect. Possibly quite a big one if 'b' is a substantial data structure and ':=' does a deep copy.

    It is not an expression in many languages, but a statement. Indeed, the
    symbol ":=" is not an "operator" in many languages - it's just part of
    the syntax of an assignment statement.


    There is usually a task to be done. In `A[++i] := 0`, I want two things
    to change, which is going to happen whether I write it like that, or as
    `i := i+1; A[i] := 0`. So why write `i` 3 times?



    If I want two things to change, why try to squeeze it into /one/
    expression or statement? Why not write two statements, each one doing a
    single clear and simple task?

    (As to writing "i" three times - again, these things are often found in
    loops, where a good syntax can mean "i" is never written at all.)


    It is not a big deal. Maybe in functional programming it might be, but
    here *I* am specifying the paradigm and I say it's OK.

    I'm not asking you or anyone else to use my language.


    I am not asking you to use functional programming either - I am merely
    asking you to appreciate that /others/ do so, and they will find your
    language as clunky, repetitive, ugly and inexpressive in comparison.


    Yes, but for you, a "must-have" list for a programming language would
    be mainly "must be roughly like ancient style C in functionality, but
    with enough change in syntax and appearance so that no one will think
    it is C".  If that's what you like, and what pays for your daily
    bread, then that's absolutely fine.

    Yes, I don't need a higher level language for what I use it for. But
    there are still dozens of things which make the experience superior to
    just using C. Ones either you genuinely don't appreciate, or are just
    pissing on for the sake of it.


    Again, I have written nothing to indicate that. You read so much
    between the lines that you miss the words I actually write. When I
    write "You could consider doing A in your /new/ language - it would give
    these advantages...", that is /exactly/ what I mean. It does not mean
    "I don't like B", or "I am stumped by C", or "You should not do D".


    * Case-insensitive

    Subjective, and I disagree.

    * 1-based and N-based

    (I assume you mean array indexing, or possibly loops?) Subjective, and
    I disagree if the starting 1 is implicit. If it is explicit - "array 1
    to 10 of int" - then I definitely like it.

    * Algol-style syntax, line-oriented and largely semicolon-free, and sane
    type syntax

    Subjective on all points. But these are mostly syntactic details - C
    and Pascal are both considered "ALGOL family" languages despite
    syntactic differences.

    * Module scheme (define everything in exactly one place)

    Objective - a good module system is always a good idea. /Defining/
    things in one place is good, but being able to /declare/ them elsewhere
    is useful as it lets you separate interfaces from implementations.

    * Namespaces

    Objective - good.

    * Encapsulation (define functions inside records etc)

    Objective - good. However, it is not so important that functions are
    defined /inside/ a record/struct/etc. The important part is that you
    can make a user-defined type containing data of other types, and that
    you can restrict access to the internals to be via specific functions or operations.

    * Out-of-order definitions including easy mutual record cross-references

    Subjective - convenient in some ways, less convenient in others. I
    personally don't mind for functions and variables. But it is definitely
    useful for defining recursive structure types.

    * Regular i8-i64 and u8-u64 type denotations, include 'byte' (u8)

    Objective - very good for low-level languages, unnecessary for higher
    level languages (except for FFI and other interfacing). It's a lot
    nicer if you can just use "integer" and let the compiler worry about
    sizes, but that's harder to implement efficiently.

    * Default 64-bit 'int', 'word' types, and 64-bit integer constants

    Subjective - it makes little difference in practice if default integer
    sizes are 32-bit or 64-bit, assuming big target systems. Each is big
    enough for "almost everything", and neither is big enough for
    "absolutely everything".

    * Built-in print and read statements WITH NO STUPID FORMAT CODES

    Subjective - though the modern trend is strongly towards avoiding
    special statement types and preferring standard library functions for
    this sort of thing. What constitutes "stupid" format codes is obviously
    highly subjective - but it is objectively better if they can be deduced automatically by the compiler.

    * Keyword and default function parameters

    Subjective. I like keyword or named parameters. Others feel they
    encourage having too complicated interfaces with too many parameters.

    * Fewer, more intuitive operator precedences

    Subjective. Some people think it is best to give a total order for
    operator precedence, which means a lot of levels. And "intuitive" is completely subjective after basic mathematical arithmetic operators. I
    am inclined to agree with you in general, however.

    * Does not conflate arrays and pointers

    Objective, and good.

    * 'Proper' for loops; for-in loops

    "Proper" is subjective. Good loop constructs are vital for an
    imperative language, however.

    * Separate 'switch' and 'case' selection; the latter has no restrictions
    (and no stupid fallthrough on switch)

    Subjective, I think. There are many ways to handle multiple choices or
    pattern matching, and I don't think there is any justification for
    claiming one particular way is "right" or "the best". One can do a lot
    better than C's "switch", and I agree about "fallthrough".

    * Proper named constants

    Subjective - again, "proper" is totally your own opinion. Good support
    for read-only (but set at run-time) and compile-time constant objects of
    all types is objectively good.

    * Break out of nested loops

    Objective - you can always do that in some way. /How/ you should be
    able to do it, is very subjective.

    * Embed strings and binary files

    Objective and subjective. Strings are too useful to require them to be
    in separate files (though the possibility of doing so is useful for international translations). Embedding binary files as part of the
    language is more subjective - some people think it is a good idea, some
    people do not. (I think it is nice.) Being able to include them via
    linking is definitely a useful feature.

    * 'Tabledata' and 'enumdata' features (compare with X-macros)

    I don't really know what you mean here. Most languages support filling
    a table or array with constant data. Some let you do so using
    compile-time functions, which is /very/ nice IMHO (though it can be
    costly in compile time). "X-macros" is a general technique for textual substitution macros, which can be used for a huge variety of things.
    Like many powerful techniques, it can be used to make code simpler,
    clearer and more maintainable - or abused to make it messier.

    * Function reflection

    Objective - reflection, at least of things known constant at compile
    time (for compiled languages), is a useful feature.

    * Built-in, overloaded ops like abs, min, max

    Subjective. There are no real advantages in being "built in" compared
    to library functions. The trend is that functions (and features) that
    can be in a library, /are/ in a library. But in languages with simpler
    and more limited tools, you can probably get more efficient results
    from built-in functions than from library functions.

    * 'Properties' such as .len and .lwb

    Subjective. Some like properties, some like functions. I personally
    think it's a good kind of syntax for type-related information (such as
    the size of a type), rather than for run-time information.

    * Built-in 'swap'

    It's difficult to call that one - I think it's hard to object to having
    such a feature, but it might also be important to be able to override it
    for your own types.

    * Bit/field extraction/insertion syntax

    Again, difficult to call. It's useful to be able to do bitfield
    manipulation, but it can be done by struct definition or by an operator
    or function taking the start and length as parameters. I'd say defined
    and named bitfields are most important, and ad-hoc accesses can always
    be done by masking and shifting when needed.

    * Multiple function return values

    Objective - that's a useful feature.

    * Multiple assignment

    Objective - also good, IMHO.

    * Slices (including slices of char arrays to give counted strings)

    Objective - some kind of slice syntax is nice on arrays.

    * Doc strings

    Objective - documentation is vital!

    * Whole-program compiler that does not need a separate build system

    Subjective, and I disagree. It does not matter how many separate parts
    the tools come in. Single "do everything" tools tend to be convenient but inflexible.
    All other things being equal, small and simpler tools are better - but
    all other things are seldom equal, and it's a low priority for me.

    * Pass-by-reference

    Subjective. Some people like it, others think it makes it harder to see
    the effects calls have or what they might change. Pass by const
    reference is less controversial.

    * Value arrays


    You mean arrays as value types, that can be assigned, passed as
    parameters or returned from functions? Objective - it's good.


    Yeah, just like C! If you think this lot is just C with a paint-job,
    then you're in denial.

    Yes, it is a lot like C. It has a number of changes, some that I think
    are good, some that I think are bad, but basically it is mostly like C.
    You can take an old-style C program, with some restrictions, and
    translate it fairly directly into your language. You can take a program
    in your language and translate it fairly directly into C, albeit some
    parts will be ugly or non-idiomatic. I suspect that it would be more challenging to translate modern C cleanly into your language, than
    vice-versa.

    I am not saying they are the same, any more than C and Pascal are the
    same - but they are very similar.

    In particular, it's quite clear to me that when you developed your
    language, you had the assembly level implementation heavily in mind when
    doing so. Why are your numbers 64-bit integers by default? It is not
    because it is a particularly useful size for integers, but because it
    fits the cpus you are targeting. Why do you want integer types whose
    sizes are 8, 16, 32 and 64 bits? They fit the cpu. Why don't you allow user-defined
    integer types that fit the sizes relevant to the application, such as a
    type for numbers 1 to 100? Because you think in terms of the cpu and implementation, not in terms of the application space and the
    programmer's task. Why do you not have inline functions, overloaded
    functions, etc.? Because you think a function name should correspond to
    a label of the same name in the assembly code. Why do you have two's complement overflow for integers? Because that's what the cpu does, not because it is useful to programmers.

    It's a low-level language. Even if it is not explicitly defined in
    terms of a particular cpu, that's your philosophy all the way. (And let
    me stress - there is nothing wrong with that.) The result is almost
    inevitably similar to C, which has a similar background philosophy
    (albeit with a wider range of cpus in mind).


    Of course, I fully expect you to be completely dismissive of all of
    this. I wouldn't swap any of these for higher-order functions.


    I can't imagine why you would think adding higher-order functions would
    mean dropping any of it.


    And there's no doubt that a large proportion of programmers go through
    their career without ever considering higher order functions
    (functions that operate on or return functions).

    Too right. To be able to use such things, they MUST be 100% intuitive
    and be usable with 100% confidence. But that's just the author; you need
    to consider other readers of your code too, and those who have to
    maintain it.

    "Intuitive" means you've used it often enough to use the feature without thinking about it. Nothing more. Stop imagining that everything you
    learned along your programming career is somehow easier than other
    methods seen by other people. It's so long since you learned to program
    that you've forgotten how it goes. When you have a long history of
    programming in ALGOL, assembly, and perhaps a spot of FORTRAN or BASIC,
    the step to C or your own language is minor. That makes it /seem/
    intuitive, but it is not - it's just what you are used to. For people
    with different backgrounds, there's no reason to suppose that other
    types of language are not easier to learn (more "intuitive" for them).

    COBOL was made to be easy for people with a background in business
    logic. dBase is for database developers who want more than they can get
    from SQL. A Forth programmer would think your language was gibberish,
    as would someone who thinks about relations between data and likes
    Prolog. Someone who has been good at mathematics at school will like
    Haskell's way of defining things, but think C is insane (how can "x" be
    equal to "x + 1" ?).

    I learned functional programming at university. When I started, I had programmed mainly in a number of types of BASIC (that's what home
    computers had), plus assembly and machine code on four different
    processors, and played a little with Forth, Pascal, C, Logo, and even a
    touch of Prolog and APL. During the first 8 week term, functional
    programming was one of the six (IIRC) courses we had. Intuition is
    quickly learned. (To be fair, we had a teacher and a planned path to
    learning - we did not just leap into random high-order functions.)


    To me they are a very long way from being 100% intuitive. So what do you think I should do: strive to be a 10th-rate programmer in a functional language I've no clue about; give up programming and tend to my garden;
    or carry on coding in a style that *I* understand 100% (and most others
    will too)?

    No. I think you should be happy to accept that you don't know anything
    about functional programming, and haven't the inclination or motivation
    to learn, and leave it at that. What you should /stop/ doing is
    claiming that it is not "intuitive", not useful, clunky, impractical,
    hard to understand, or any of the other unjustified and unjustifiable complaints you have about it just because /you/ didn't grok it at a
    quick glance.

    I think anyone trying to make an interesting and useful new programming language should learn some functional programming (as I think they
    should learn many other kinds of languages) to get a broader view of programming. You might not need that if you just want to make a variant
    of the languages you already know, just with syntax that suits your own preferences better.

    I also think that anyone interested in becoming a better /programmer/ or software developer, rather than just a better /coder/, should learn some functional programming. You'll be a better imperative programmer for it.

    But for you, personally, I think your prejudice and biases (or
    "intuition") are too fixed. You'll never look at something new with an
    open mind, so there is little point.


    The stuff I do simply doesn't require a sophisticated language with
    advanced types and curried functions invented on-the-fly. Here is an
    actual example from an old app, a small function to keep it short:

        proc displaypoletotal =
            if not poleenabled then return fi
            print @poledev, chr(31), chr(14)    ! clear display
            print @poledev, "Total:", rightstr(strcash(total, paymentunit),
    14)
        end

    (This is part of a POS and displays running totals, on an LED display
    mounted on a pole, driven from a serial port. It ran in a duty-free area
    and worked with multiple currencies.)

    What can higher-order-functions do for me here? Absolutely sod-all.


    So what? What do bitfield extraction operators give you here? Or
    multiple return values? Sod-all.

    Higher-order functions are a feature of functional programming
    techniques. No one expects you to write code in lambda calculus, any
    more than they expect imperative programmers to write Turing machines.

    But equally there's no doubt that they /are/ useful for many people in
    many types of coding.  Sometimes higher order functions are used
    without people knowing about them - Python decorators are a fine example.
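
    To make that concrete, here is a minimal sketch of a decorator in
    Python (the function names here are invented for illustration):

```python
def debug(f):
    # higher-order function: takes a function, returns a new function
    def wrapper(*args):
        print("Calling", f.__name__, "with", *args)
        r = f(*args)
        print("Returning", r)
        return r
    return wrapper

@debug              # decorator syntax - shorthand for foo = debug(foo)
def foo(x):
    return x + 1
```

    Calling foo(10) now prints the trace and still returns the original
    result; the caller's code does not change at all.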

    Actually, Python declarators are such a good example that I recommend

    Decorators?

    Yes. (I don't really know how much Python you have done.)


    this link <https://realpython.com/primer-on-python-decorators/> that
    gives a number of useful examples.

    Decorators are a /very/ good example of a Python feature that I could
    never get my head around. 5 minutes later, I'd have to look them up again.


    You have to try /using/ them to have a hope of learning about them.

    Think of this example.  You have some code with functions "foo", "bar"
    and "foobar".  Mostly you call them as they are in your code.

    #include <iostream>   // needed for std::cout

    auto debug(auto const& f) {
         return [&f](auto... args) {
             std::cout << "Calling ";
             ((std::cout << " " << args), ...);
             std::cout << "\n";
             auto r = f(args...);
             std::cout << "Returning " << r << "\n";
             return r;
         };
    }

    Suppose your real functions are :

         int foo(int x);
         int bar(int x, double y);
         double foobar(int x, double y, const char * p);

    Your original code was:

         int a = foo(10);
         int b = bar(20, 3.14);
         double c = foobar(30, 2.71828, "Hello");


    None of this gives you things you could not do by hand.  But if you
    find yourself doing the same thing by hand many times, then it is
    natural to ask if it can be automated - if you can write a function to
    do that. You can, if you have higher order functions.

    I can't follow the C++ debug function at all.

    "auto" just means "infer the type automatically". Since the compiler
    knows the type of everything here, there is no need to specify it
    explicitly - and indeed the type of lambdas can't be expressed
    explicitly (each anonymous function is its own type). And the "const&"
    bit just means it takes its parameter by constant reference - it's not
    actually needed here, just habit.

    The syntax "[&f](auto... args) { } " is declaring an anonymous function.
    The local variable "f" (from the parameters) can be seen inside the
    function, and the function will take a variable number of arguments
    whose number and types are determined automatically when called. The
    "return" means that this anonymous function is the return value of the
    "debug" function.

    The line "((std::cout << " " << args), ...);" basically means "do
    (std::cout << " " << arg)" for each "arg" in the list of arguments. The
    syntax can be fiddly until you are used to it, but it is handy for
    making functions that can work with many arguments (such as a "sum"
    function that can be used as "sum(1, 2, 3, 4)" with as many arguments
    as you want).

    "auto r = f(args...);" should be clear now :-) It declares a local
    variable whose type is determined automatically, initialised by calling
    the function "f" with all the arguments. This value "r" is returned
    after the debug printout.


    But I notice the user code
    changes from 'foo()' to 'debug()()'; I thought this could be done while leaving the foo() call unchanged.

    The call to "debug" with argument "foo" returns a new function, which is
    then callable.

    You could easily add local variables :

    auto debug_foo = debug(foo);
    auto foo = debug_foo;

    (It would have been nicer if "auto foo = debug(foo);" were allowed, but
    the grammar of C++ does not allow that. I have never claimed it was a
    perfect language!)

    In effect, the identifier "foo" is already a "const" object from before,
    and thus cannot be changed inside the same scope - it can only be
    overridden in a narrower scope.

    Or you could write "#define foo debug(foo)", but I suppose you'd call
    macros cheating!


    But no, my language doesn't deal with parameter lists as a first class
    entity at all. (At best it can access them as a list object, but it
    doesn't help here.)

    The best I can do here is to have a dedicated function for each number
    of arguments, and to use dynamic code to allow the same function for any types:

        func debug3(f, a,b,c)=
            println "Calling",f,"with",a,b,c
            f(a,b,c)
        end

        func foobar(a,b,c)=
            println "FooBar",a,b,c
            return a+b+c
        end

        x:=debug3(foobar, 5,6,7)     # in place of foobar(5, 6, 7)

        println x

    This displays:

      Calling <procid:"foobar"> with 5 6 7
      FooBar 5 6 7
      18

    However this loses the ability to use any keyword or default arguments
    for FooBar, since they are only available for direct calls (it's done at compile-time).

    (You are also missing the return types. I don't know if that was an
    oversight, or a complication you can't easily handle.)

    Dynamic typing reduces the amount of manual work, but it is all run-time effort. The C++ method is all compile-time work, and gives optimal
    code. (Well, the implementation of std::cout is always an inefficient
    mess...) You think of functions as single pieces of executable code, corresponding (in a compiled language) to an assembly label and a set of assembly instructions. More advanced languages think of them as a
    description of actions with a far looser connection between the source
    code and the generated assembly code. (That does make it much harder to understand the generated assembly, or for assembly-level debugging.) So
    the calls to "debug" do not result in assembly CALL instructions - they
    result in the std::cout statements being included alongside the calls to
    the original "foo" function.


    So I can see that that C++ debug does some very hairy stuff, to make it
    work with static types and for any function, but I just can't understand
    it.


    It would be unreasonable to expect anyone to understand this stuff from
    a simple Usenet post!

    However, given the requirement you outlined, I could probably come up
    with a custom feature to do just that. Although it might be in the form
    of a compiler option which injects the debug code at the start of the relevant functions. Then the user code does not need updating.

    See, when you have control of the language and implementation, there are
    more and better possibilities.


    When you have a flexible enough language, you don't /need/ to mess with
    the implementation. You can do what you want anyway. And note that
    almost nobody has their own language and implementation to change at
    will - it is not a scalable solution.

    There is a proposal to add metaclasses to C++. (If you google for this,
    be aware that the syntax is quite hairy, and likely to be used only by
    library programmers.) One of the things this gives you is enough power
    to define the concept of "struct", "class", "union", "enum", bitfields,
    etc., as metaclasses. That means you could take C++ with metaclasses,
    remove the keywords "struct", "class", etc., and then define them again
    within the language itself.


    That's like saying you have 20 years of experience as a taxi driver,
    and never once had to use "flaps" or "ailerons", or even think about
    the concept.  You therefore can't understand why pilots want to use
    them all the time.  You can give a qualified opinion on driving round
    roundabouts and may be an expert on gearing, but you have no basis for
    a qualified opinion on flying.

    I don't want to fly. (I was once in a small aircraft flying at 7000 ft.
    But I've also ridden a bike at 8000 ft, although over a mountain in that case. So who needs to fly?!)


    That's fine - but some other people (not me either) /do/ want to fly.



    So again - mocking and dismissing concepts that you know nothing
    about, makes you look foolish.  (Your ignorance of the topic is not
    the issue - we are all ignorant of almost everything.)

    Have I ever called you ignorant? I don't care about these concepts; they
    are not for me. But I appreciate lots of things you don't care for.

    Look at this code; it is a silly task, but concentrate on the bit that
    does the input:

        real a,b,c

        print "Three numbers: "
        readln a, b, c

        println "Their sum is:", a+b+c

    The spec is that the three numbers are read /from the same line/, and
    can be separated with commas or spaces.

    Try to do that `readln` part in Python, and just as simply. Even in C
    it's an ordeal.

    a, b, c = eval(input("Three numbers: "))
    print(a + b + c)

    Or just :

    print(sum(eval(input("Input some numbers: "))))

    (I will not try it in C - I fully agree it's a poor choice of language
    for that kind of thing.)
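
    For the record, the comma-or-space spec can also be met without eval;
    a sketch in Python (parsing a fixed string here rather than reading
    stdin, and `read_numbers` is an invented helper name):

```python
def read_numbers(line):
    # accept numbers separated by commas and/or spaces
    return [float(tok) for tok in line.replace(",", " ").split()]

a, b, c = read_numbers("1, 2.5 3")
print("Their sum is:", a + b + c)
```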


    (My code actually works on either of my languages, static or dynamic.
    That's a bonus feature. Imagine a solution in Python or C that works
    with both languages.)

    No, it is hard, obscure, subtle. Take my word for it.


    No, I will not take your word for it.  You know nothing about it.

    I implemented it, remember? Even if it was a mock-up to see if a
    proposed built-in approach would work.

    Yes, that was my deferred code feature, itself deferred. (It means I
    have to instead define an explicit, named function.)


    That's the impression I got.  I don't know how you handle captures of
    local variables (if you do so at all).

    When I had local functions for a while, they could access static
    variables, user types, named constants, macros, enums and other local functions within a containing function. Plus of course anything defined globally. But not parameters and stack-frame variables of the enclosing functions.

    Quite a lot could actually be done that way. So it could with my
    deferred code objects.


    Effectively any function is just a variable to which has been
    assigned some anonymous function (although in practice, the function
    retains its 'F' identify even if the user's 'F' variable has been
    assigned a different value).

    Python does not have variables.  It has /identifiers/.  Change
    "variable" for "identifier" in your description, and "assigned" to
    "bound", and you've got it right.


    Just call them variables that work in a particular way: they are
    references to objects, but can never be references to other variables.


    I was trying to be precise, so that you get the right idea.

    When you assign a value, you are copying a reference.

    When you bind an object to an identifier or other object, you take a new reference (increasing its reference counter), not just a copy.
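
    The difference is easy to demonstrate; a small illustration:

```python
a = [1, 2, 3]
b = a            # binds b to the same list object - nothing is copied
b.append(4)
print(a)         # the change is visible through both names
assert a is b    # one object, two names bound to it
```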


    And you seem convinced that the Python code I showed is "hackish" and
    "unprofessional".

    Defining a struct's layout as "IIHHIII" or whatever? Yeah, that's really professional!


    It's a nice, simple format string - "I" is a 32-bit unsigned integer,
    "H" is a 16-bit unsigned "half integer", lower case versions are signed.
    There's a full list of codes in the documentation. It is /really/ simple and efficient, and very
    flexible. You define the layout in a single string.
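
    As a sketch of what such a string describes (the layout here is
    invented for illustration):

```python
import struct

fmt = "<IIHHIII"   # little-endian: two u32, two u16, then three u32
packed = struct.pack(fmt, 1, 2, 3, 4, 5, 6, 7)
assert struct.calcsize(fmt) == 24   # 4+4+2+2+4+4+4 bytes, no padding
fields = struct.unpack(fmt, packed)
print(fields)
```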


    The code works fine - it is clear and simple, shorter than in your
    language, and easy to modify and maintain.

    Really? The struct changes: two fields are swapped. You have to count
    along the format string to work out which of those characters need to be exchanged. And
    that multiple assignment needs to be revised too. It's a bit hit and miss.


    In different layouts, some things will always be a little easier, other
    things a little harder. Yes, it's clear and simple and easily used.


    If you prefer to think of structures matching C struct definitions
    (which are /one/ way to describe a file format, but certainly not the
    only way), you can use the "ctypes" Python module and define a structure.

    So why didn't you do that in the first place? I assume it can define pointers too? (Since structs can contain pointers and you might need to access what they point to.)

    No, you don't have pointers like in C - they have to be related to
    Python types since you are interfacing between C types and Python types.
    But the ctypes module handles the details.

    I wrote it with "struct" because for a simple case like this, it is
    shorter and clearer. But if you prefer, use "ctypes" - it's nicer for
    some things. There is no need to limit yourself to one single solution.
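
    A sketch of the ctypes style, with an invented three-field layout:

```python
import ctypes

class Header(ctypes.LittleEndianStructure):
    _pack_ = 1                       # no padding, like struct's "<"
    _fields_ = [
        ("machine",   ctypes.c_uint16),
        ("nsections", ctypes.c_uint16),
        ("timestamp", ctypes.c_uint32),
    ]

# read the fields from raw bytes (synthetic data for illustration)
h = Header.from_buffer_copy(b"\x64\x86\x03\x00\x01\x00\x00\x00")
print(h.machine, h.nsections, h.timestamp)
```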


    But I guess that this was about you proving that pointers were
    unnecessary...


    It was about you thinking that your own code using pointers could not be implemented neatly in languages without pointers.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Bart on Fri Nov 18 11:03:04 2022
    On 17/11/2022 12:24, Bart wrote:
    On 17/11/2022 10:34, David Brown wrote:
    On 16/11/2022 23:01, Bart wrote:

    But Unicode makes everything harder, with characters taking up
    multiple bytes, and a lot of the time it just doesn't work. (I've
    seen Unicode errors on everything from TV subtitles to supermarket
    receipts, and that was a few weeks ago.)

    That's not a Unicode problem - that's a software bug.

    It means even the big boys have issues with it.

    What makes you think software for that kind of thing is made by "big
    boys"? It's as likely to be small companies or single developers, as
    anything else. /I/ have made software that printed out receipts in
    machines in supermarkets, and that was in assembly with no Unicode in
    sight. (It was a long time ago.)

    And the "big boys" make mistakes as often as the small folks, both in
    terms of understanding the problem, knowing about the solutions, and implementing the code.

  • From David Brown@21:1/5 to James Harris on Fri Nov 18 12:00:56 2022
    On 15/11/2022 18:32, James Harris wrote:
    On 14/11/2022 15:23, David Brown wrote:
    On 14/11/2022 11:47, Bart wrote:

    ...

    In-place, value-returning increment ops written as ++ and -- are
    common in languages.


    Yes.  And bugs are common in programs.  Being common does not
    necessarily mean it's a good idea.

    (It doesn't necessarily mean it's a bad idea either - I am not
    implying that increment and decrement are themselves a major cause of
    bugs!  But mixing side-effects inside expressions /is/ a cause of bugs.)

    The side effects of even something awkward such as

      *(++p) = *(q++);

    are little different from those of the longer version

      p = p + 1;
      *p = *q;
      q = q + 1;

    The former is clearer, however. That makes it easier to see the intent.

    Really? I have no idea what the programmer's intent was. "*p++ =
    *q++;" is common enough that the intent is clear there, but from your
    first code I can't see /why/ the programmer wanted to /pre/increment
    "p". Maybe he/she made a mistake? Maybe he/she doesn't really
    understand the difference between pre-increment and post-increment?
    It's a common beginner's misunderstanding.

    On the other hand, it is quite clear from the separate lines exactly
    what order the programmer intended.

    What would you say are the differences in side-effects of these two code snippets? (I'm assuming we are talking about C here.)


    Just blaming operators you don't like is unsound - especially since, as
    you seem to suggest below, you use them in your own code!!!


    All I am saying is that it's worth considering the advantages and
    disadvantages of making a decision about such operators. I'm not
    denying that the operators can be useful - I am questioning whether
    those uses are enough in comparison to the advantages of /not/ having them.

    When I program in C, I use the features of C as best I can in the
    clearest way. I don't change the language (though I limit myself to a
    subset of the possibilities, of course, as everyone does in every language).

    ...

    [discussion of ++ and -- operators]

    Is your point that you shouldn't have either of those operators?

    Yes!  What gave it away - the first three or four times I said as much?

    ...

    ... (Of course I use increment operator, especially in loops, because
    that's how C is written.  But a new language can do better than that.)

    If you think ++ and -- shouldn't exist then why not ban them from your
    own programming for a while before you try to get them banned from a new language?


    Banning them from C would have no benefits because C has side-effects in expressions, and avoiding the operators won't change that. Besides, the
    real problem is not careful and considered use, it is the /abuse/ that
    some people call "clever coding". It is not the "char c = *p++;" that
    is the problem, it is the "++E++".

  • From Dmitry A. Kazakov@21:1/5 to David Brown on Fri Nov 18 11:47:45 2022
    On 2022-11-18 11:03, David Brown wrote:

    And the "big boys" make mistakes as often as the small folks, both in
    terms of understanding the problem, knowing about the solutions, and implementing the code.

    Big boys make more mistakes due to mismanagement typical for big
    organizations.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Bart on Fri Nov 18 11:48:43 2022
    On 16/11/2022 00:11, Bart wrote:
    On 15/11/2022 18:05, David Brown wrote:
    On 15/11/2022 16:26, Bart wrote:

    import struct        # Standard module
    bs = open("potato_c.cof", "rb").read()

    machine, nsections, timestamp, symtaboffset, nsymbols,
    optheadersize, characteristics = struct.unpack_from("<HHIIIHH")

    That's it.  Three lines.  I would not think of C for this kind of
    thing - Python is /much/ better suited.

    I don't believe you.

    OK, I suppose. But I've done this in Python, I've done it in C, and I
    would never choose C for any kind of serious file handling. Reading a
    simple structure in C is okay, and of course if you already have the
    struct defined in a C header, using it will save effort.


    (BTW you might be missing an argument in that struct.unpack_from call.)

    No, I am not.  There is an optional third argument, but it is optional.

    What about the second argument? I don't understand how the function call knows to get the data from 'bs'.


    My apologies - I must have had a copy-and-paste screwup. I didn't see
    it, and assumed you were talking about the optional "offset" parameter,
    or perhaps the odd "/" character that appears in the documentation.
    Yes, I was missing "bs" as the second parameter.

    <https://docs.python.org/3/library/struct.html>


    Using that approach for the nested structs and unions of my other
    example is not so straightforward. You basically have to fight for
    every field.

    You have to define every field in every language, or define the ones
    you want along with offsets to skip uninteresting data.

    When properly supported, you can define the fields of a struct just as
    you would in any static language (see above example), and you can write handling code just as conveniently.

    You don't have to manually write strings of anonymous letter codes and
    have to remember their ordering everywhere they are used. That is just
    crass.

    You don't have to do that in Python either. But it is really convenient
    for small and simple cases.

    If you have something bigger or more complicated, you can do what you
    can always do in programming - divide and conquer. When I wrote network
    code for Modbus, I used one pack/unpack set for the common packet
    header, another for the Modbus function specific fields, and another for
    the data (using "H" * n for the format string of multiple 16-bit
    unsigned types, since that's what Modbus likes).

    Or you can use ctypes: <https://docs.python.org/3/library/ctypes.html#structures-and-unions>

    I've found struct pack/unpack to be very convenient and simple. It is particularly handy in combination with Python's array slicing, and the easy combination of arrays or strings by just "adding" them.
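    To get named fields rather than a bare tuple, the standard library's `collections.namedtuple` combines naturally with `struct`; a minimal sketch (the `CoffHeader` name and the sample values are mine, not from the posts above):

```python
import struct
from collections import namedtuple

# Field names mirror the COFF header example earlier in the thread.
CoffHeader = namedtuple(
    "CoffHeader",
    "machine nsections timestamp symtaboffset nsymbols optheadersize characteristics",
)

# Pack a little-endian header from illustrative values, then unpack it by name.
raw = struct.pack("<HHIIIHH", 0x8664, 3, 0, 0, 0, 0, 0)
hdr = CoffHeader._make(struct.unpack_from("<HHIIIHH", raw))
print(hex(hdr.machine))  # named access instead of positional tuple indexing
```

    The same format string drives both packing and unpacking, so the field order is stated only once.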


    I went out of my way to add such facilities in my scripting language,
    because I felt it was important. So you can code just as you would in a static language but with the convenience of informal scripting.

    Clearly you don't care for such things and prefer a hack.


    You say "hack" as though there exists "right" and "wrong" ways to handle particular tasks. I use a language and techniques that are quick,
    simple, and do the job. Other people can understand, modify and use the
    code as they need. (In that respect, pretty much everything beats your solution - except maybe Perl :-) ) It's a solution, not a "hack".


    The result is a tuple of unnamed fields. You really want a proper
    record, which is yet another add-on, with a choice of several modules
    depending on which set of characteristics you need.

    You can do that in Python.

    Yeah, I know, you can do anything in Python, since there is an army of
    people who will create the necessary add-on modules to create ugly and cumbersome bolted-on solutions.

    Do you actually know much about Python? Maybe I've been assuming too much.


    I can list dozens of things that my scripting language does better than Python. (Here, such a list exists: https://github.com/sal55/langs/blob/master/QLang/QBasics.md.)


    I'm not falling for that again. Suffice to say that you can assume any
    other programmer (not just me) will throw out at least half the list
    because they disagree with your opinions. (Which half will, of course,
    vary enormously.) And anyone with experience with Python and who is
    insanely bored could list hundreds or thousands of things that are
    better in Python - assuming they could find solid documentation for your language.



    In short, you are making up shit in an attempt to make your own
    language look better than other languages, because you'd rather say
    something silly than admit that any other language could be better in
    any way for any task.

    Not at all. Python is better for lots of things, mainly because there
    are a million libraries that people have written for it, armies of
    volunteers who have written suitable, bindings or written all sorts of
    shit. And there is huge community and lots of resources to help out.

    It is also full of as many advanced, esoteric features that you could
    wish for.

    But it is short of the more basic and primitive features of the kind I
    use and find invaluable.

    'struct' is also not a true Python module; it's a front end for an
    internal one called `_struct`, likely implemented in C, and almost
    certainly using pointers.

    Please re-think what you wrote there.  I hope you can realise how
    ridiculous you are being.

    Tell me. Maybe struct.py could be written in pure Python; I don't know.

    Your last point is quite telling - you /don't know/, yet you feel
    qualified to make claims about it. But you are missing the /real/ point
    - the implementation is /irrelevant/. It doesn't matter if "struct" is
    written in pure Python, or as a module written in C, or anything else.
    It is a standard Python library module (thus not an "add-on" or "some
    weird extra module"). When people write code in the Python language
    using the standard Python library to do tasks like the one you gave
    above, they do not use pointers.

    (In case you are curious, some implementations of Python - such as PyPy
    - have pure Python implementations of modules like "struct". Others,
    such as CPython, have C modules for that kind of thing.)

    I'm saying I guarantee mine would have the necessary features to do so.

    But this started off being about pointers. Here's another challenge:
    this program is for Windows, and displays the first 2 bytes of the
    executable image of the interpreter, as loaded in memory:

        println peek(0x400000, u16):"m"

        fun peek(addr, t=byte) = makeref(addr, t)^

    This displays 'MZ' (the signature of PE files on Windows). But of
    interest is how Python would implement that peek() function.


    You think it is a good thing that programs have direct access to the
    memory of their interpreter's executable? Really?

    import struct
    bs = open("/usr/bin/python", "rb").read()
    print(struct.unpack_from("H", bs, 0x40000)[0])

    That prints the 16-bit value at offset 0x40000 from the start of the
    "python" file. Was that what you wanted?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Fri Nov 18 12:22:32 2022
    On 18/11/2022 10:48, David Brown wrote:
    On 16/11/2022 00:11, Bart wrote:

    I can list dozens of things that my scripting language does better
    than Python. (Here, such a list exists:
    https://github.com/sal55/langs/blob/master/QLang/QBasics.md.)


    I'm not falling for that again.  Suffice to say that you can assume any other programmer (not just me) will throw out at least half the list
    because they disagree with your opinions.  (Which half will, of course,
    vary enormously.)  And anyone with experience with Python and who is insanely bored could list hundreds or thousands of things that are
    better in Python - assuming they could find solid documentation for your language.

    The point of that list was to show the basics that are either missing or
    that have clunky bolted-on implementations, or that have to be achieved
    by abusing advanced features (I'm thinking of people who have added even
    'goto' to Python).

    As one example, if I want the ASCII code for "A", I write 'A' as one
    might do in C.

    In Python it's ord("A") (which used to involve a global lookup then a
    function call; perhaps it still does, since 'ord' might have been
    reassigned).
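    For what it's worth, Python can also get the code without any function call by indexing a bytes literal; a quick sketch:

```python
print(ord("A"))  # 65: the usual spelling, via a global function lookup
print(b"A"[0])   # 65 as well: indexing a bytes literal yields the int directly
```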

    In Lua it's string.byte("A"). Just what you need in interpreted code:
    extra overheads!

    What I'm saying is that there are too many advanced features with less
    support for fundamental ones.

    This displays 'MZ' (the signature of PE files on Windows). But of
    interest is how Python would implement that peek() function.


    You think it is a good thing that programs have direct access to the
    memory of their interpreter's executable?  Really?

    I don't claim my language is safe. I made a decision that still allowing
    raw pointers in scripting languages can be useful in the right hands.
    (However these have some extra protections compared with their C
    equivalents.)

    But Python has so many possibilities with Cython or Ctypes or C
    extension modules, that I'm sure just as much mischief can be done if
    somebody wants, but the whole thing is so complicated that you can't
    audit that easily.



        import struct
        bs = open("/usr/bin/python", "rb").read()
        print(struct.unpack_from("H", bs, 0x40000)[0])

    That prints the 16-bit value at offset 0x40000 from the start of the
    "python" file.  Was that what you wanted?

    Not quite. Your code just reads 2 bytes from a file (at a 256K offset
    from the start of the file; the MZ signature is at offset 0).

    My code accesses my task's virtual memory space, where the executable
    image is placed at offset 0x400000 (4MB) from address 0x000000. Think of
    it as a form of reflection.
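    For the record, CPython can express something close to that peek() with the standard `ctypes` module, which can cast a raw address to a pointer and dereference it. A sketch (the `peek_u16` helper name is mine; it demonstrates the mechanism on memory the script itself owns, since dereferencing an arbitrary address like 0x400000 is platform-specific and will crash on most systems; the expected value assumes a little-endian machine):

```python
import ctypes

def peek_u16(addr):
    # Dereference a raw address as an unsigned 16-bit value; no safety checks.
    return ctypes.cast(ctypes.c_void_p(addr),
                       ctypes.POINTER(ctypes.c_uint16)).contents.value

buf = ctypes.create_string_buffer(b"MZ")     # memory we own, holding 'MZ'
print(hex(peek_u16(ctypes.addressof(buf))))  # 0x5a4d on little-endian machines
```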

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Walker@21:1/5 to David Brown on Fri Nov 18 14:09:03 2022
    On 18/11/2022 07:12, David Brown wrote:
    If I want two things to change, why try to squeeze it into /one/
    expression or statement? Why not write two statements, each one
    doing a single clear and simple task?

    I think there is a danger of over-egging this sort of case, partly caused by writing simple examples. It depends on context, but one reason
    to write one expression rather than two statements is to keep code short
    and structured:

    a[ i +:= 1 ] := x

    vs

    begin
      i +:= 1;
      a[i] := x
    end

    [on the assumption that this is the controlled clause of a conditional or
    loop, so needs to be turned into one statement]. Repeated several times
    in a few lines of code, it turns a half-page procedure you can comprehend
    "at a glance" into a page-and-a-half for which you have to keep scrolling
    up and down, the "begins" and "ends" [or moral equivalents] start to
    obtrude, and the whole structure of the code is less clear. You've
    replaced one task ["assign to the next element of the array"] by two,
    that may or may not be linked.

    Clearly this can be overdone, but there is a balance to be struck,
    and I would argue that Pascal and, judging by Dmitry's tokeniser example,
    Ada are too far one way and C[++] too far the other.

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Couperin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Fri Nov 18 16:52:33 2022
    On 18/11/2022 07:12, David Brown wrote:
    On 16/11/2022 21:04, Bart wrote:

    That's fine.  But you wouldn't enter a discussion with a linguist who
    has some experience with Chinese, and try to tell them that Chinese
    grammar is beyond human comprehension.  You could say that /you/ think
    it looks like Chinese writing would be hard to learn - but you could
    /not/ say anything about how hard it is for Chinese speakers to learn.
    You could not even say that it really would be difficult for you to
    learn, because you haven't tried or investigated enough.

    I mentioned Chinese because I and a friend did attend a short (and free)
    taster course.

    But I also mentioned it because it's not that useful in the UK, even if
    I'd mastered some of it.

    It's also interesting because if you look at the English Wikipedia entry
    on China, the web page source is something like 99% ASCII, with the rest Unicode. But even if you look at the /Chinese/ page, the page source is
    still 80% ASCII (mainly due to tags etc.).

    This ties in with the discussion on Unicode, and also my point about
    giving weight to more basic language features.

    Unicode understands the importance of ASCII, since ASCII occupies the
    first 128 Unicode codepoints.

    (Imagine if ASCII had been relegated to some obscure alphabet.)

    I can fully respect your personal preferences - that's not the issue for me.  I find it sad and disappointing that someone can have such strong opinions about something they have never really considered or tried to
    learn about,

    I've tried lots of times. I've even tried implementing some of it.

    and consequently don't understand, but I guess that is
    human nature.  It's ancient survival tactics, honed by evolution - when something is new, different and unknown, human instinct is to fear it
    and run away, rather than investigate and study it.

    There is a VAST amount of complexity about now, most of which is not
    justified. And that applies across everything - processors, computers, operating systems, languages, tools and applications.

    It all barely manages to work, mainly thanks to just throwing more
    resources at any problems instead of tackling it at root, by simplifying.

    The big thing in new languages now is not really advanced kinds of
    functions, it is even more advanced and esoteric type systems.

    I'm not interested in advanced functions and even less in types. You can
    do a huge amount with the basics; what you do can be understood by more
    people; and it is accessible to more people. It's probably also easier
    to make things efficient.

    Tell me what the latest thing is in English now; what's new? There are
    lots of new words about, but most people communicate using a much more
    basic vocabulary.

    The bit that really bugs me is how you (and James) can hold such strong opinions about how /other people/ might like and use these features and languages that support them.  Is it so hard to accept that some people
    like using higher order functions?

    Is it so hard to accept that the vast majority of functions in any
    application are VERY ordinary? So why make so much of the tiny minority
    that need those special features?

    Look at any library: OpenGL exports 500 ordinary functions; GTK and
    Windows perhaps the best part of 10,000 functions each.


    Or that some people write code in
    functional programming languages, because they find it a better choice
    for their needs?

    I said I don't care what other people use.

    I care when functional style is inflicted on me, but if I reject it,
    then I'm just being deliberately ignorant; a diehard; a Luddite; a stick-in-the-mud.

    All I see is something that makes ordinary coding much harder to do and
    harder to understand.


      Is it so hard to accept that other people can write
    code for the same task in widely different languages, and /your/ code in /your/ language is not the "perfect" solution or the only "non-clunky"
    code?

    I accept my 1980s-style code is clunky. But it still works extraordinarily
    well. Code in my scripting language is clear enough and conservative
    enough to act as pseudo-code.

    (There was a period in clc when Ben Bacarisse used to post short 10-line solutions to some task in Haskell. Except that they were rather
    difficult to understand.

    I had a habit then where I posted solutions in my scripting language
    which were far simpler for /anyone/ to understand. But mine might have
    been 15 lines instead of 10.)

    If I want two things to change, why try to squeeze it into /one/
    expression or statement?  Why not write two statements, each one doing a single clear and simple task?

    What about:

    swap(a, b) # from my stuff
    (a, b) = (b, a) # from Python
    (a, b) = f() # multiple function return values
    x := (a, b, c, d) # record/struct constructor assigns 4 fields
    A[3..6] := (1,2,3,4) # change 4 elements of A

    This stuff happens everywhere.
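    For comparison, most of those have direct Python spellings too; a brief sketch:

```python
a, b = 1, 2
a, b = b, a                  # swap without a temporary

def f():
    return 10, 20            # multiple return values as a tuple

a, b = f()

A = [0, 0, 0, 0, 0, 0, 0, 0]
A[3:7] = [1, 2, 3, 4]        # replace four elements via slice assignment
```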

    (As to writing "i" three times - again, these things are often found in loops, where a good syntax can mean "i" is never written at all.)

    If you're talking about C-style for-loops, that remains one of the great mysteries of the world: why so many languages would copy such a crappy
    feature:

    for (i=a; i<=b; ++i)

    instead of:

    for i in a..b # or countless variations

    But it's amusing that people who defend that C-style loop dismiss that
    superior version and say, Ah, but you shouldn't be using such iteration
    at all! Let's jump a language level or two and go straight into advanced
    array manipulation features, or straight to functional code.

    Here, show me this in any language:

    to n do
    println random()
    od

    Just repeat-N-times. In terms of low-hanging fruit - simple to add, simple to understand, and convenient - it is one of the lowest.
    But as a feature it is rare.
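    The nearest Python idiom needs a throwaway loop variable; wrapping it as a helper (the `repeat` name is mine) gives something close to repeat-N-times:

```python
import random

def repeat(n, action):
    # Repeat an action n times with no visible loop index.
    for _ in range(n):
        action()

repeat(3, lambda: print(random.random()))
```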

    Yeah, just like C! If you think this lot is just C with a paint-job,
    then you're in denial.

    Yes, it is a lot like C.

    Yes, it is that /level/ of language. But done properly (IMO).

      It has a number of changes, some that I think
    are good, some that I think are bad, but basically it is mostly like C.

    There are some massive changes. Even C++ has only just acquired modules.

    In particular, it's quite clear to me that when you developed your
    language, you had the assembly level implementation heavily in mind when doing so.  Why are your numbers 64-bit integers by default?

    What other size makes sense these days? C has to use i32 for
    compatibility reasons, although the language itself would allow i64.

      It is not
    because it is a particularly useful size for integers, but because it
    fits the CPUs you are targeting.  Why do you want integer types that
    are powers of 2?

    For the same reason that 10 other languages I can't be bothered to list,
    that have fixed-width integer types, do so too.

    Yes you can have the luxury of ignoring the CPU, but you're going to
    have a slower and/or more complex language and/or a more massive and
    slower compiler or JIT to get it up to speed.


    It's a low-level language.

    I've never said anything else. My two languages are marked as M and Q
    here, on a scale of language level:

    C--M-----Q---------------Python

    Plus other kinds of languages exist outside of that line. It's a
    particular niche I find useful for the things I want to do.

    Of course, I fully expect you to be completely dismissive of all of
    this. I wouldn't swap any of these for higher-order functions.


    I can't imagine why you would think adding higher-order functions would
    mean dropping any of it.

    Languages having such functions ahead of nearly everything else annoy
    me. One new language proposal over on Reddit described 3 kinds of functions.

    The first two were advanced ones; the last were ordinary 'named'
    functions, as are used for 99% of the functions in real applications.

    They were almost an afterthought!

    "Intuitive" means you've used it often enough to use the feature without thinking about it.  Nothing more.  Stop imagining that everything you learned along your programming career is somehow easier than other
    methods seen by other people.  It's so long since you learned to program that you've forgotten how it goes.  When you have a long history of programming in ALGOL, assembly, and perhaps a spot of FORTRAN or BASIC,
    the step to C or your own language is minor.  That makes it /seem/ intuitive, but it is not - it's just what you are used to.

    It was also what I could do within 16KB of memory on an 8-bit processor, written in an assembler written in hex machine code.

    Then I realised how much potential even that simple language had. With
    that language (and with an extra 16KB data memory and 8KB video memory),
    I worked on two kinds of programs I had an interest in, apart from the
    language stuff:

    * 3D vector graphics (which involved emulating floating point
    arithmetic, as well as line-drawing algorithms etc)

    * Video processing based on capturing image data from a frame grabber.

    The language really isn't that important. You don't need anything fancy
    or sophisticated, the basics will do. Plus some conveniences I've added.
    Yes, even in 2022.


    No.  I think you should be happy to accept that you don't know anything about functional programming, and haven't the inclination or motivation
    to learn, and leave it at that.

    Let's leave it then.


    I also think that anyone interested in becoming a better /programmer/ or software developer, rather than just a better /coder/, should learn some functional programming.  You'll be a better imperative programmer for it.

    It would have to be /my/ functional language, that's not going to happen.


    But for you, personally, I think your prejudice and biases (or
    "intuition") are too fixed.  You'll never look at something new with an
    open mind, so there is little point.

    You mean, like you are so sceptical of my bit operations; or
    'tabledata'; or whole-program compilation (which simply changes the
    granularity at which a compiler works).

    I've also long abolished 'linkers'; I've introduced run-from-source,
    even for compiled code; I eliminated the distinction between expressions
    and statements;...

    Quite a lot of innovative stuff. Just not the sort of thing that you
    think adds value.

    And then, you dismiss my whole-program compiler, but extol link-time-optimisation.

    (Examples of 'tabledata', now called 'enumdata' when enums are involved:

    https://github.com/sal55/langs/blob/master/MLang/Examples/aa_tables.m)


    So what?  What do bitfield extraction operators give you here?  Or
    multiple return values?  Sod-all.

    People can see the utility of bitfield extraction and multiple return
    values, even when writing reams of very dull code. You must have seen
    GETBIT and SETBIT macros in C.
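    For readers who haven't met them, such macros boil down to shift-and-mask operations; the same idea in Python, as a sketch:

```python
def getbit(x, n):
    # Extract bit n of x, as a C GETBIT(x, n) macro typically does.
    return (x >> n) & 1

def setbit(x, n):
    # Return x with bit n set (C SETBIT macros usually assign in place).
    return x | (1 << n)

print(getbit(0b1010, 1), setbit(0b1000, 0))  # 1 9
```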

    The benefits of currying are far more elusive.

    There is a proposal to add metaclasses to C++.

    Why not? It has everything else!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to James Harris on Fri Nov 18 18:24:20 2022
    On 16/11/2022 23:02, James Harris wrote:
    On 14/11/2022 18:41, Dmitry A. Kazakov wrote:
    On 2022-11-14 19:26, James Harris wrote:
    On 14/11/2022 11:29, Dmitry A. Kazakov wrote:

    ...

        Index     : Integer := Pointer + 1;
        Malformed : Boolean := False;
        Underline : Boolean := False;
        Symbol    : Character;
    begin
        while Index <= Line'Last loop
           Symbol := Line (Index);
           if Is_Alphanumeric (Symbol) then
              Underline := False;
           elsif '_' = Symbol then
              Malformed := Malformed or Underline;
              Underline := True;
           else
              exit;
           end if;
           Index := Index + 1;
        end loop;
        Malformed := Malformed or Underline;

    ...

        errors = 0
        last_char = line(pointer)
        rep for i = pointer + 1, while i le line_last, ++i
          ch = line(i)
          if ch eq '_'
            if last_char eq '_' so ++errors ;Consecutive underscores
          on not is_alphanum(ch)
            break rep ;If neither underscore nor alphanum we are done
          end if
          last_char = ch
        end rep
        if last_char eq '_' so ++errors ;Trailing underscore

    ...

    It occurred to me that the code could be made yet shorter, leading to

        errors = 0
        rep for i = pointer, while ++i le line_last
          if line(i) eq '_'
            if line(i - 1) eq '_' so ++errors
          on not alphanum(line(i))
            exit rep ;Neither underscore nor alphanum so we're done
          end if
        end rep
        if line(i - 1) eq '_' so ++errors ;Trailing underscore

    Overall, this bit of the code has gone from 17 lines to 12 to 9. That's
    an appreciable reduction. And no tricks were involved. This is basic
    refactored code.
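    For comparison, the same scan translates almost line-for-line into Python; a sketch (the function name and signature are mine):

```python
def underscore_errors(line, pointer):
    # Count consecutive-underscore and trailing-underscore errors in an
    # identifier that starts at index `pointer` (mirrors the loop above).
    errors = 0
    last = line[pointer]
    for i in range(pointer + 1, len(line)):
        ch = line[i]
        if ch == '_':
            if last == '_':
                errors += 1      # consecutive underscores
        elif not ch.isalnum():
            break                # neither underscore nor alphanumeric: done
        last = ch
    if last == '_':
        errors += 1              # trailing underscore
    return errors

print(underscore_errors("a__b_", 0))  # 2: one consecutive pair, one trailing
```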

    Don't get me wrong. I'm not saying that short code is in and of itself a benefit. But if the 'complexity density' remains the same then less code
    does help: it gives a programmer less to read, less to understand, less
    to keep in his head, and less to debug.

    As well as having fewer lines, some of the lines in the latter version
    are also shorter and simpler than those in the version which preceded it.

    The most recent changes:

    I stripped out the variables ch and last_char. Referring, instead, to
    line(i) and line(i - 1). IMO that makes clearer what's being referred
    to: the char at the current index and the one before.

    Another change is relevant to this thread: I made better use of one of
    the "++" operators. The prior code began its loop with

    i = pointer + 1, while i ...., ++i

    That code was an obvious candidate for changing to the simpler

    i = pointer, while ++i ....


    I return to my earlier assertion: allowing programmers to use the nudge operators ("++" etc) allows certain algorithms to be expressed more
    elegantly, more clearly, and more simply.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to David Brown on Fri Nov 18 19:01:20 2022
    On 15/11/2022 21:40, David Brown wrote:
    On 15/11/2022 20:09, James Harris wrote:
    On 15/11/2022 17:31, David Brown wrote:
    On 15/11/2022 17:58, James Harris wrote:

    ...

    The question is not whether prevention would be possible but whether
    you (i.e. DB) would consider it /advisable/. If you prevented it then
    a lot of familiar programming patterns and a number of existing APIs
    would become unavailable to you so choose wisely...! :-)


    I am not the language designer here

    Uh, huh.


    - and I still don't really grok what
    kind of language /you/ want, what you understand from before, what uses
    it should have, or what you think is wrong with existing languages.  (Or maybe this is all for fun and interest, which is always the best reason
    for doing anything.)  That makes it hard to give recommendations.

    ...

    You assume /so/ many limitations on what you can do as a language
    designer.  You can do /anything/.  If you want to allow something,
    allow it.  If you want to prohibit it, prohibit it.

    Sorry, but it doesn't work like that.

    Yes, it does.

    No, it does not. Your view of language design is far too simplistic.
    Note, also, that in a few paragraphs you say that you are not the
    language designer whereas I am, but then you go on to try to tell me how
    it works and how it doesn't and, previously, that anything can be done.
    You'd gain by /trying/ it yourself. Then you might see that it's not as straightforward as you suggest.


    A language cannot be built on ad-hoc choices such as you have suggested.

    I haven't suggested ad-hoc choices.  I have tried to make reasoned suggestions.  Being different from languages you have used before, or
    how you envision your new language, does not make them ad-hoc.

    Saying you'd like selected combinations of operators to be banned looks
    like an ad-hoc approach to me.

    ...

    BTW, any time one thinks of 'treating X separately' it's good to be
    wary. Step-outs tend to make a language hard to learn and awkward to
    use.


    So do over-generalisations.

    Not really.

    Yes, really.

    You simply repeating phrases back to me but in the negative does not
    make your assertions correct or mine wrong.


    Imagine if you were to stop treating "letters", "digits" and
    "punctuation" separately, and say "They are all just characters.  Let's treat them the same".  Now people can name a function "123", or "2+2".
    It's conceivable that you'd work out a grammar and parsing rules that
    allow that (Forth, for example, has no problem with functions that are
    named by digits.  You can redefine "2" to mean "1" if you like).  Do you think that would make the language easier to learn and less awkward to use?

    Certainly not. Why do you ask?


    It's ad-hoc rules which become burdensome.

    Agreed.

    Phew!

    ...

    Seriously, try designing a language, yourself. You don't have to
    implement it. Just try coming up with a cohesive design of something
    you would like to program in.


    If I had the time...  :-)

    I fully appreciate that this is not an easy task.

    I'm sure you do but your view of the details is superficial. You have
    ideas which are interesting in themselves but you don't appear to
    appreciate how decisions bear on each other when you have to bring
    hundreds of them together.

    ...

    Bart came up with an example something like

       +(+(+(+ x)))

    That's not at all sensible. You want that banned, too?


    Yes :-)  Seriously, I appreciate that there will always be compromises - trying to ban everything silly while allowing everything sensible would
    mean countless ad-hoc rules, and you are right to reject that.  I am advocating drawing a line, just like you - the difference is merely a
    matter of where to draw that line.  I'd draw the line so that it throws
    out the increment and decrement operators entirely.  But if you really wanted to keep them, I'd make them postfix only and as statements, not
    in expressions - let "x++" mean "x += 1" which means "x = x + 1" which
    should, IMHO, be a statement and not allowed inside an expression.

    This does, indeed, in a sense, come down to where the designer decides
    to draw the line. Unfortunately there is no simple line.

    For example, you spoke about banning side effects in expressions. For
    sure, you could do that. But then you thought side effects in function
    calls in expressions should possibly be treated differently and be left
    in! Making such rules is not as simple as it may appear. All the
    decisions a language designer makes have a tendency to bear on each
    other, even if only in the ethos of the final language: whether it's
    simple and cohesive or ad hoc and confusing.

    Further, remember that the decisions the language designer makes have to
    be communicated to the programmer. If a designer says "these side
    effects are allowed but these other ones are not" then that just gives
    the programmer more to learn and remember.

    As I say, you could try designing a language. You are a smart guy. You
    could work on a design in your head while walking to the shops, while
    waiting for a train, etc. As one of my books on language design says,
    "design repeatedly: it will make you a better designer".


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to David Brown on Fri Nov 18 20:14:52 2022
    On 18/11/2022 11:00, David Brown wrote:
    On 15/11/2022 18:32, James Harris wrote:

    ...

    The side effects of even something awkward such as

       *(++p) = *(q++);

    are little different from those of the longer version

       p = p + 1;
       *p = *q;
       q = q + 1;

    The former is clearer, however. That makes it easier to see the intent.

    Really?  I have no idea what the programmer's intent was.  "*p++ =
    *q++;" is common enough that the intent is clear there, but from your
    first code I can't see /why/ the programmer wanted to /pre/increment
    "p".  Maybe he/she made a mistake?  Maybe he/she doesn't really
    understand the difference between pre-increment and post-increment? It's
    a common beginner's misunderstanding.

    I don't think I know of any language which allows a programmer to say
    /why/ something is the case; that's what comments are for. Programs
    normally talk about /what/ to do, not why. The very fact that the
    assignment does something non-idiomatic is a sign that a comment could
    be useful. It's akin to

    for (i = 0; i <= n ....

    If the test really should be <= then a comment may be useful to explain why.


    On the other hand, it is quite clear from the separate lines exactly
    what order the programmer intended.

    What would you say are the differences in side-effects of these two code snippets?  (I'm assuming we are talking about C here.)

    That depends on whether the operations are ordered or not. In C they'd
    be different, potentially, from what they would be in my language. What
    would you say they are?
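    For the well-behaved case being discussed (p and q pointing into
    distinct buffers, each increment in its own full expression), the
    equivalence of the two forms can be checked mechanically. A minimal
    sketch with hypothetical buffers:

    ```c
    #include <assert.h>

    int main(void) {
        int src[3] = {10, 20, 30};
        int dst[3] = {0, 0, 0};

        /* Combined form: pre-increment p, post-increment q, one expression. */
        int *p = &dst[0], *q = &src[0];
        *(++p) = *(q++);                  /* copies src[0] into dst[1] */

        /* Expanded form: the same effect spelled out as three statements. */
        int *p2 = &dst[0], *q2 = &src[0];
        p2 = p2 + 1;
        *p2 = *q2;
        q2 = q2 + 1;

        /* Both forms leave the destination and the pointers identical. */
        assert(dst[1] == 10);
        assert(p == p2 && q == q2);
        return 0;
    }
    ```

    The divergence between C and a language with defined evaluation order
    only shows up when the incremented object is read elsewhere in the
    same expression, which neither form above does.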



    Just blaming operators you don't like is unsound - especially since,
    as you seem to suggest below, you use them in your own code!!!


    All I am saying is that it's worth considering the advantages and disadvantages of making a decision about such operators.  I'm not
    denying that the operators can be useful - I am questioning whether
    those uses are enough in comparison to the advantages of /not/ having them.

    It's always useful to have one's preferences challenged. :-)


    --
    James Harris

  • From James Harris@21:1/5 to David Brown on Fri Nov 18 19:31:55 2022
    On 18/11/2022 07:12, David Brown wrote:


    ...

    ... I find it sad and disappointing that someone can have such strong opinions about something they have never really considered or tried to
    learn about, ...

    A bit like those who have strong opinions on how a programming language
    should be designed but have never designed one themselves, you mean?!


    and consequently don't understand, but I guess that is
    human nature.  It's ancient survival tactics, honed by evolution - when something is new, different and unknown, human instinct is to fear it
    and run away, rather than investigate and study it.

    The bit that really bugs me is how you (and James) can hold such strong opinions about how /other people/ might like and use these features and languages that support them.

    There's an old maxim: Try and design a language for other people to use
    and you'll end up with PL/I or Cobol. Try and design a language for
    yourself and you might find that others of like mind also appreciate it.

    That's a reasonable maxim although I'd add to keep it simple: cut down
    on those ad-hoc rules or others will find it too hard to remember, even
    if you as the language designer can remember them all.


    --
    James Harris

  • From James Harris@21:1/5 to Dmitry A. Kazakov on Fri Nov 18 19:19:05 2022
    On 15/11/2022 12:14, Dmitry A. Kazakov wrote:
    On 2022-11-15 12:44, James Harris wrote:

    Do you also believe that the Unix

       bytes = read(fd, &buf[1], reqd);

    should be prohibited since it has the side effect within the
    expression of modifying the buffer? If so, what would you replace it
    with??

    That is simple. Ada's standard library has it:

       procedure Read
                 (  Stream : in out Root_Stream_Type;
                    Item   : out Stream_Element_Array;
                    Last   : out Stream_Element_Offset
                 )  is abstract;

    Item is an array:

    type Stream_Element_Array is
       array (Stream_Element_Offset range <>) of aliased Stream_Element;

    It is also a "virtual" operation in C++ terms, to be overridden by a new implementation of a stream. Last is the index of the last element read.
    Notice non-sliding bounds, as you can do this:

       Last := Buff'First  - 1;
       loop
          Read (S, Buff (Last + 1..Buff'Last), Last); -- Non-blocking chunk
          exit when Last = Buff'Last;                 -- Done
       end loop;

    Since bounds do not slide Last stays valid for all array slices.

    That's cool. So the call passes to Read a 'virtual' array, aka a view of
    part of an array, and Last is /output/ from the call? Presumably the
    array is made into a view (rather than an actual array) by means of the "aliased" keyword. Is that correct?

    Since Last is output from Read why do you set it before the loop starts?

    If there's no more data does Read throw an exception?

    It's interesting that the array is termed an /out/ parameter even though
    only part of it might be overwritten!
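    The Ada chunked-read loop above maps onto a familiar C idiom. A rough
    sketch assuming a POSIX read() and a hypothetical read_full helper,
    where "filled" plays the role of Ada's Last:

    ```c
    #include <unistd.h>
    #include <errno.h>

    /* Fill buf completely by reading in chunks; returns 0 on success,
     * -1 on error or on EOF before the buffer is full.  'filled' is the
     * count of bytes read so far, i.e. the index one past the last
     * byte written - the C analogue of Ada's Last. */
    static int read_full(int fd, unsigned char *buf, size_t len) {
        size_t filled = 0;                    /* Last := Buff'First - 1 */
        while (filled < len) {
            ssize_t n = read(fd, buf + filled, len - filled);
            if (n < 0) {
                if (errno == EINTR) continue; /* retry after a signal */
                return -1;
            }
            if (n == 0) return -1;            /* EOF: buffer not full */
            filled += (size_t)n;              /* exit when Last = Buff'Last */
        }
        return 0;
    }
    ```

    Unlike the Ada version, the index bookkeeping here is manual; the
    non-sliding slice bounds are what let the Ada loop pass Last straight
    back in as the start of the next slice.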


    --
    James Harris

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Fri Nov 18 21:46:44 2022
    On 2022-11-18 20:19, James Harris wrote:
    On 15/11/2022 12:14, Dmitry A. Kazakov wrote:
    On 2022-11-15 12:44, James Harris wrote:

    Do you also believe that the Unix

       bytes = read(fd, &buf[1], reqd);

    should be prohibited since it has the side effect within the
    expression of modifying the buffer? If so, what would you replace it
    with??

    That is simple. Ada's standard library has it:

        procedure Read
                  (  Stream : in out Root_Stream_Type;
                     Item   : out Stream_Element_Array;
                     Last   : out Stream_Element_Offset
                  )  is abstract;

    Item is an array:

    type Stream_Element_Array is
        array (Stream_Element_Offset range <>) of aliased Stream_Element;

    It is also a "virtual" operation in C++ terms to be overridden by new
    implementation of stream. Last is the index of the last element read.
    Notice non-sliding bounds, as you can do this:

        Last := Buff'First  - 1;
        loop
           Read (S, Buff (Last + 1..Buff'Last), Last); -- Non-blocking chunk
           exit when Last = Buff'Last;                 -- Done
        end loop;

    Since bounds do not slide Last stays valid for all array slices.

    That's cool. So the call passes to Read a 'virtual' array, aka a view of
    part of an array, and Last is /output/ from the call? Presumably the
    array is made into a view (rather than an actual array) by means of the "aliased" keyword. Is that correct?

    Aliased is only needed for pointers. Stream_Element_Array has aliased
    elements for interoperability with the OS. It means that you can take a
    pointer to any array element and pass it down to our beloved C's fread (:-))

    Since Last is output from Read why do you set it before the loop starts?

    Because it moves from Buff'First - 1 to Buff'Last as we read the stream
    into Buff.

    If there's no more data does Read throw an exception?

    No, it returns available data. So the above is busy polling when S is non-blocking. If you need a non-busy interface you must have an event you
    could wait for and reset (without falling into a race condition).
    Usually, this stuff is kept inside the implementation of the stream.
    E.g. S keeps an event, or creates one per each task/thread in the
    extreme case of shared I/O. Read returns available data or waits for the
    event which is reset atomically with getting buffered data.

    It's interesting that the array is termed an /out/ parameter even though
    only part of it might be overwritten!

    There is not much difference between out and in out arrays. Basically out
    means that the callee won't expect anything in the array. Note that the
    bounds (and other constraints) cannot be changed. That is the key
    difference between out-argument and a result. E.g. you cannot mutate an argument into something else.

    You could try to refine I/O modes (views) like in, out, in out. E.g. the
    file system and databases support blocking of portions of data. The corresponding I/O mode would be when some parts of the array are in the
    in mode other parts are in the out or in out mode. You could consider
    extending that on structures and other containers, of course.

    I have no idea how to express that on the language level, but you get
    the idea. Type algebra is an exciting subject. Alas, nobody pays any
    attention these days.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Fri Nov 18 21:57:31 2022
    On 2022-11-18 20:01, James Harris wrote:
    On 15/11/2022 21:40, David Brown wrote:

    Imagine if you were to stop treating "letters", "digits" and
    "punctuation" separately, and say "They are all just characters.
    Let's treat them the same".  Now people can name a function "123", or
    "2+2". It's conceivable that you'd work out a grammar and parsing
    rules that allow that (Forth, for example, has no problem with
    functions that are named by digits.  You can redefine "2" to mean "1"
    if you like).  Do you think that would make the language easier to
    learn and less awkward to use?

    Certainly not. Why do you ask?

    Well, let me intervene. Actually, early languages played with such ideas.
    It is worth mentioning Forth, Lisp, TeX, all sorts of preprocessors, etc.
    At that time the idea that you could "program" the language syntax was
    very popular.

    Then evolution of languages took a different turn. Polymorphism was
    achieved through decomposition (first procedural, then OO, relational, functional) keeping the language syntax stable.

    (Software reuse choked the ugly spawn. If each programmer created his
    own new language, reuse would not be possible.)

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From James Harris@21:1/5 to Bart on Sat Nov 19 16:05:15 2022
    On 17/11/2022 11:24, Bart wrote:

    ...

    If I wanted to display UTF8 right now on Windows, say from a C program
    even, I would have to fight it. If I write this (created with Notepad):

      #include <stdio.h>
      int main(void) {
          printf("€°£");
      }

    and compile with gcc, it shows:

    €°£

    I'm not sure what code page it's on, but if I switch to 65001 which is supposed to be UTF8, then it shows:

    �������

    (or equivalent in the terminal font). If I dump the C source, it does
    indeed contain the E2 82 AC sequence which is the UTF8 for the 20AC code
    for the Euro sign.

    I'm sure that on Linux it works perfectly within a terminal window. But
    I'm on Windows and I can't be bothered to do battle. Even if /I/ get it
    to work, I can't guarantee it for anyone else.

    I presume you piped the output into hd or xxd to see exactly what was
    being sent to the terminal - and hopefully prove it was the correct UTF-8.

    I can't comment on how to get Windows to display Unicode chars from
    UTF-8. I found Windows terminal drivers (VT-102 etc) often failed to
    produce the correct output whereas Unix terminal emulation for the same terminals looked perfect.


    --
    James Harris

  • From Bart@21:1/5 to James Harris on Sat Nov 19 16:20:47 2022
    On 19/11/2022 16:05, James Harris wrote:
    On 17/11/2022 11:24, Bart wrote:

    ...

    If I wanted to display UTF8 right now on Windows, say from a C program
    even, I would have to fight it. If I write this (created with Notepad):

       #include <stdio.h>
       int main(void) {
           printf("€°£");
       }

    and compile with gcc, it shows:

    €°£

    I'm not sure what code page it's on, but if I switch to 65001 which is
    supposed to be UTF8, then it shows:

    �������

    (or equivalent in the terminal font). If I dump the C source, it does
    indeed contain the E2 82 AC sequence which is the UTF8 for the 20AC
    code for the Euro sign.

    I'm sure that on Linux it works perfectly within a terminal window.
    But I'm on Windows and I can't be bothered to do battle. Even if /I/
    get it to work, I can't guarantee it for anyone else.

    I presume you piped the output into hd or xxd to see exactly what was
    being sent to the terminal - and hopefully prove it was the correct UTF-8.

    Well, gcc using puts, or bcc/tcc using puts or prints, work correctly.

    For some reason gcc+printf doesn't deal with it properly. I assumed
    those characters were raw UTF8 bytes, and redirecting it now to a file,
    that is exactly what I get.

    gcc+printf is bypassing something it shouldn't.

    My language supports it too (but needs external tools to get that
    Unicode text into my source files as UTF8), even when translated to C
    and passed through gcc. Here however, the C uses its own printf declaration.

    Going back to the C, replacing #include <stdio.h> with:

    extern int printf(const char*, ...);

    makes it work with gcc too.

    So the main failure is associated with gcc. But there are other issues,
    such as ensuring the correct code pages, making it work with graphical
    output, and other failures which I'm sure will come up. I don't want the headache.

  • From James Harris@21:1/5 to Bart on Sat Nov 19 19:47:05 2022
    On 19/11/2022 16:20, Bart wrote:
    On 19/11/2022 16:05, James Harris wrote:
    On 17/11/2022 11:24, Bart wrote:

    ...

    If I wanted to display UTF8 right now on Windows, say from a C
    program even, I would have to fight it. If I write this (created with
    Notepad):

       #include <stdio.h>
       int main(void) {
           printf("€°£");
       }

    and compile with gcc, it shows:

    €°£

    I'm not sure what code page it's on, but if I switch to 65001 which
    is supposed to be UTF8, then it shows:

    �������

    (or equivalent in the terminal font). If I dump the C source, it does
    indeed contain the E2 82 AC sequence which is the UTF8 for the 20AC
    code for the Euro sign.

    I'm sure that on Linux it works perfectly within a terminal window.
    But I'm on Windows and I can't be bothered to do battle. Even if /I/
    get it to work, I can't guarantee it for anyone else.

    I presume you piped the output into hd or xxd to see exactly what was
    being sent to the terminal - and hopefully prove it was the correct
    UTF-8.

    Well, gcc using puts, or bcc/tcc using puts or prints, work correctly.

    For some reason gcc+printf doesn't deal with it properly. I assumed
    those characters were raw UTF8 bytes, and redirecting it now to a file,
    that is exactly what I get.

    gcc+printf is bypassing something it shouldn't.

    Is printf supposed to handle non-ASCII strings? Remember that it is
    expected to interpret the first string whereas puts isn't.

    Actually, including UTF8 in any simple string sounds dodgy. As an
    example, imagine an embedded byte value of 0x80 on a 1s complement
    machine. It would likely terminate the string.

    IOW I wouldn't expect any of this stuff to work portably.

    And as I said before, source should be pure ASCII....!
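    One way to square "source should be pure ASCII" with emitting UTF-8 is
    to spell the bytes out as escape sequences. A sketch (the terminal
    must still be in a UTF-8 code page, e.g. 65001 on Windows, for the
    glyphs to display):

    ```c
    #include <stdio.h>

    int main(void) {
        /* UTF-8 for the euro, degree and pound signs (U+20AC, U+00B0,
         * U+00A3), written as explicit byte escapes so the source file
         * itself contains only ASCII characters. */
        const char s[] = "\xE2\x82\xAC\xC2\xB0\xC2\xA3";
        printf("%s\n", s);   /* printf forwards the bytes unchanged */
        return 0;
    }
    ```

    The string is seven bytes plus the terminating NUL; nothing in it is
    '%' or 0, so printf has no reason to treat any byte specially.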


    --
    James Harris

  • From James Harris@21:1/5 to Dmitry A. Kazakov on Sat Nov 19 20:23:04 2022
    On 17/11/2022 13:20, Dmitry A. Kazakov wrote:
    On 2022-11-17 13:35, Bart wrote:
    On 17/11/2022 12:12, Dmitry A. Kazakov wrote:
    On 2022-11-17 12:24, Bart wrote:

    If I wanted to display UTF8 right now on Windows, say from a C
    program even, I would have to fight it. If I write this (created
    with Notepad):

       #include <stdio.h>
       int main(void) {
           printf("€°£");
       }

    ...

    In CMD:

    ;CHCP 65001
    Active code page: 65001

    ;main.exe
    €°£

    Of course, you could use the code you wrote under the condition that
    both the editor and the compiler use UTF-8.

    The point about UTF8 is that it doesn't matter. So the string contains
    'character' E2; in C, this is just a byte array, it should just pass
    it as it is to the printf function.

    ...

    That would work, but is also completely impractical for large amounts
    of non-ASCII content. Or even small amounts. You /need/ editor
    support. I don't have it and don't do enough with Unicode to make it
    worth the trouble.

    That is another guideline topic: you never ever place localization
    stuff in the source code.

    I was going to say the same. Where Bart says that typing "large amounts
    of non-ASCII content" needs editor support I would go the other way. In
    source code use pure ASCII, and ship with files which map the source to different locales.


    --
    James Harris

  • From Bart@21:1/5 to James Harris on Sat Nov 19 20:20:33 2022
    On 19/11/2022 19:47, James Harris wrote:
    On 19/11/2022 16:20, Bart wrote:
    On 19/11/2022 16:05, James Harris wrote:
    On 17/11/2022 11:24, Bart wrote:

    ...

    If I wanted to display UTF8 right now on Windows, say from a C
    program even, I would have to fight it. If I write this (created
    with Notepad):

       #include <stdio.h>
       int main(void) {
           printf("€°£");
       }

    and compile with gcc, it shows:

    €°£

    I'm not sure what code page it's on, but if I switch to 65001 which
    is supposed to be UTF8, then it shows:

    �������

    (or equivalent in the terminal font). If I dump the C source, it
    does indeed contain the E2 82 AC sequence which is the UTF8 for the
    20AC code for the Euro sign.

    I'm sure that on Linux it works perfectly within a terminal window.
    But I'm on Windows and I can't be bothered to do battle. Even if /I/
    get it to work, I can't guarantee it for anyone else.

    I presume you piped the output into hd or xxd to see exactly what was
    being sent to the terminal - and hopefully prove it was the correct
    UTF-8.

    Well, gcc using puts, or bcc/tcc using puts or prints, work correctly.

    For some reason gcc+printf doesn't deal with it properly. I assumed
    those characters were raw UTF8 bytes, and redirecting it now to a
    file, that is exactly what I get.

    gcc+printf is bypassing something it shouldn't.

    Is printf supposed to handle non-ASCII strings? Remember that it is
    expected to interpret the first string whereas puts isn't.

    The only characters printf formats care about are '%', which indicates
    the start of a format sequence, and 0, which indicates the end of the
    string.

    Besides, all the other printf versions I tried worked (and it works on
    Linux).

    Window + gcc + printf didn't, but I don't think it got as far as calling
    a normal printf; doubtless it's doing something too clever. The relevant characters get to the output, but somehow bypass the part where the OS
    console driver converts the character stream to Unicode.


    Actually, including UTF8 in any simple string sounds dodgy. As an
    example, imagine an embedded byte value of 0x80 on a 1s complement
    machine. It would likely terminate the string.

    What machines use 1s complement these days? You might as well worry
    about those using 7-bit characters! Or EBCDIC.

    UTF8 was designed to be transparent to anything processing 8-bit
    strings. On ones complement it presumably wouldn't work, unless
    characters were wider than 8 bits.

    (Don't tell me you're avoiding the use of UTF8 for that reason. For
    anyone still using ones complement, probably ASCII would be too advanced
    as they're still using 5-bit telegraph codes!)


    IOW I wouldn't expect any of this stuff to work portably.

    And as I said before, source should be pure ASCII....!

    Sure. But this is mostly about data within programs which can be anything.

    Restricting source code means you can't have Unicode content in
    comments, or inside string constants.

    That means not being able to have "°"; you'd need an escape sequence, or convert any pasted Unicode string into such a string.

    This is unreasonable considering that providing such support within a
    compiler requires pretty much zero effort.

  • From James Harris@21:1/5 to Bart on Sat Nov 19 20:51:39 2022
    On 19/11/2022 20:20, Bart wrote:
    On 19/11/2022 19:47, James Harris wrote:
    On 19/11/2022 16:20, Bart wrote:

    ...

    For some reason gcc+printf doesn't deal with it properly. I assumed
    those characters were raw UTF8 bytes, and redirecting it now to a
    file, that is exactly what I get.

    gcc+printf is bypassing something it shouldn't.

    Is printf supposed to handle non-ASCII strings? Remember that it is
    expected to interpret the first string whereas puts isn't.

    The only characters printf formats care about are '%', which indicates
    the start of a format sequence, and 0, which indicates the end of the
    string.

    Well, is printf /specified/ to accept all other 8-bit codes? If not, you
    may find printf working in some cases but not in others.


    Besides, all the other printf versions I tried worked (and it works on Linux).

    I'm sure you know that "it works here" is irrelevant because C's
    specification includes much IB and UB. One could /very/ easily find some
    code doing what one wants in one environment and not doing what one
    wants in another. The effect of UB could even vary depending on whether
    it was Tuesday or not. Seriously, we need to keep well away from
    anything in C which is not defined.

    IOW the fact that something works when tried is *meaningless* if it's
    contrary to the specs.

    ...

    Actually, including UTF8 in any simple string sounds dodgy. As an
    example, imagine an embedded byte value of 0x80 on a 1s complement
    machine. It would likely terminate the string.

    What machines use 1s complement these days? You might as well worry
    about those using 7-bit characters! Or EBCDIC.

    As I say, if you write non-specified C code it may happen to work on one machine but that's no guarantee it will work on another.


    UTF8 was designed to be transparent to anything processing 8-bit
    strings. On ones complement it presumably wouldn't work, unless
    characters were wider than 8 bits.

    Well, UTF8 on a 1's complement machine embedded in a string which the
    runtime expects to be ASCII sounds like a recipe for trouble. Even
    characters having the top bit set may lead to problems with char
    signedness on a 2s complement machine.


    (Don't tell me you're avoiding the use of UTF8 for that reason. For
    anyone still using ones complement, probably ASCII would be too advanced
    as they're still using 5-bit telegraph codes!)

    Baudot rules :-)

    In fact, I'd use UTF8 in auxiliary files but not in the source.




    IOW I wouldn't expect any of this stuff to work portably.

    And as I said before, source should be pure ASCII....!

    Sure. But this is mostly about data within programs which can be anything.

    That's OK. Data read would be pure bits and octets. It's when something
    tries to interpret those bits that problems can arise - e.g. to_upper
    ... and the printf control string.


    Restricting source code means you can't have Unicode content in
    comments, or inside string constants.

    Exactly.


    That means not being able to have "°"; you'd need an escape sequence, or convert any pasted Unicode string into such a string.

    Not quite. If that char (which appears to be a degree character) varied
    between locales it could go in an aux file. If it was universal,
    however, the source could include its name with such as

    "\degree/"


    This is unreasonable considering that providing such support within a compiler requires pretty much zero effort.

    This is not to help the compiler or the compiler writer. The idea of restricting source to a lingua franca is to help teams of humans
    collaborate on the source even if they are from different locales.


    --
    James Harris

  • From James Harris@21:1/5 to Bart on Sat Nov 19 21:13:50 2022
    On 19/11/2022 20:20, Bart wrote:
    On 19/11/2022 19:47, James Harris wrote:
    On 19/11/2022 16:20, Bart wrote:
    On 19/11/2022 16:05, James Harris wrote:
    On 17/11/2022 11:24, Bart wrote:

    ...

    If I wanted to display UTF8 right now on Windows, say from a C
    program even, I would have to fight it. If I write this (created
    with Notepad):

       #include <stdio.h>
       int main(void) {
           printf("€°£");
       }

    and compile with gcc, it shows:

    €°£

    ...

    gcc+printf is bypassing something it shouldn't.

    Is printf supposed to handle non-ASCII strings? Remember that it is
    expected to interpret the first string whereas puts isn't.

    The only characters printf formats care about are '%', which indicates
    the start of a format sequence, and 0, which indicates the end of the
    string.

    Besides, all the other printf versions I tried worked (and it works on Linux).

    As I said in the other reply, if this is contrary to the specification
    then this stuff is poisonous. But, curious to see what would happen in my particular environment, I tried your code. The source file had your
    string between the quotes as

    e2 82 ac c2 b0 c2 a3

    What is that? UTF8?

    When run, also on Unix, it outputs the same bytes.

    $ ./a.out | hd
    00000000 e2 82 ac c2 b0 c2 a3

    Hence no change for this specific test.

    It's 7 bytes, though, and when you ran it on Windows you said you got

    €°£

    which is also 7 characters. At a guess, were those chars from the
    codepage your Windows terminal was using (rather than it interpreting UTF8)?


    --
    James Harris

  • From luserdroog@21:1/5 to James Harris on Sat Nov 19 19:46:19 2022
    On Saturday, November 19, 2022 at 3:13:52 PM UTC-6, James Harris wrote:
    On 19/11/2022 20:20, Bart wrote:

    Besides, all the other printf versions I tried worked (and it works on Linux).
    As I said in the other reply, if this is contrary to the specification
    then this stuff is poisonous. But curious to see what would happen in my particular environment I tried your code. The source file had your
    string between the quotes as

    e2 82 ac c2 b0 c2 a3

    What is that? UTF8?

    Yep. That's UTF-8. With a little practice it's pretty easy to parse, or
    at least to find the character code boundaries. The first byte of a
    sequence starts with some number of ones in the most significant
    positions: zero for a single-byte (ASCII) character, or two to four for
    a multi-byte sequence (up to six in the original, pre-2003 scheme).
    That count of leading ones tells you the length in bytes of the
    encoding for that character; a lone leading one marks a continuation
    byte, not a start byte.

    So e2 is the start of a 3 byte code. And c2 is the start of a 2 byte code.
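    That leading-ones rule can be written down as a tiny classifier (the
    function name is mine, not from the thread):

    ```c
    #include <stdio.h>

    /* Return the total length in bytes of the UTF-8 sequence whose lead
     * byte is 'b', or 0 if 'b' is a continuation byte (10xxxxxx) or not
     * valid as a lead byte at all. */
    static int utf8_seq_len(unsigned char b) {
        if (b < 0x80) return 1;             /* 0xxxxxxx: ASCII          */
        if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx                 */
        if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx                 */
        if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx                 */
        return 0;                           /* continuation or invalid  */
    }

    int main(void) {
        /* The bytes from the thread: e2 starts a 3-byte code and
         * c2 starts a 2-byte code. */
        printf("%d %d\n", utf8_seq_len(0xE2), utf8_seq_len(0xC2)); /* 3 2 */
        return 0;
    }
    ```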

  • From David Brown@21:1/5 to James Harris on Sun Nov 20 13:28:56 2022
    On 18/11/2022 20:01, James Harris wrote:
    On 15/11/2022 21:40, David Brown wrote:
    On 15/11/2022 20:09, James Harris wrote:
    On 15/11/2022 17:31, David Brown wrote:
    On 15/11/2022 17:58, James Harris wrote:

    ...

    The question is not whether prevention would be possible but whether
    you (i.e. DB) would consider it /advisable/. If you prevented it then
    a lot of familiar programming patterns and a number of existing APIs
    would become unavailable to you so choose wisely...! :-)


    I am not the language designer here

    Uh, huh.


    - and I still don't really grok what kind of language /you/ want, what
    you understand from before, what uses it should have, or what you
    think is wrong with existing languages.  (Or maybe this is all for fun
    and interest, which is always the best reason for doing anything.)
    That makes it hard to give recommendations.

    ...

    You assume /so/ many limitations on what you can do as a language
    designer.  You can do /anything/.  If you want to allow something,
    allow it.  If you want to prohibit it, prohibit it.

    Sorry, but it doesn't work like that.

    Yes, it does.

    No, it does not. Your view of language design is far too simplistic.
    Note, also, that in a few paragraphs you say that you are not the
    language designer whereas I am, but then you go on to try to tell me how
    it works and how it doesn't and, previously, that anything can be done.
    You'd gain by /trying/ it yourself. Then you might see that it's not as straightforward as you suggest.


    That is a fair point. But I challenge you to show me where there are
    rules written for language designs. Explain to me exactly why you are
    not allowed to, say, provide an operator "-" without a corresponding
    operator "+". Tell me who is banning you from deciding that source code
    lines must be limited to 40 characters, or that every assignment
    statement shall be preceded by the keyword "please". I'm not saying any
    of these things are a good idea (though something similar has been done
    in other cases), I am saying it is /your/ choice to do that or not.


    You can say "I can't have feature A and feature B and maintain the
    consistency I want." You /cannot/ say "I can't have feature A". It is
    /your/ decision not have feature A. Choosing to have it may mean
    changing or removing feature B, or losing some consistency that you had
    hoped to maintain. But it is your language, your choices, your
    responsibility - saying "I can't do that" is abdicating that responsibility.


    A language cannot be built on ad-hoc choices such as you have suggested.


    It most certainly can. Every language is a collection of design
    decisions, and most of them are at least somewhat ad-hoc.

    However, my suggestions were certainly /not/ ad-hoc - they argued for a
    particular way of thinking about operators and expressions, with
    justification and an explanation of the benefits. Whether you choose to
    follow those suggestions or not, is a matter of your personal choices
    for how you want your language to work - and /that/ choice is therefore somewhat ad-hoc. They only appear ad-hoc if you don't understand what I
    wrote justifying them or giving their advantages.

    Of course you want a language to follow a certain theme or style (or
    "ethos", as you called it). But that does not mean you can't make
    ad-hoc decisions if you want - it is inevitable that you will do so.
    And it certainly does not mean you can't make the choices you want for
    your language.

    Too many ad-hoc choices mean you lose the logic and consistency in the language. Too few, and your language has nothing to it. Excessive
    consistency is great for some theoretical work - Turing machines,
    lambda calculus, infinite register machines, and the like. It is
    useless in a real language.

    Look at C as an example. Not everyone likes the language, and the only
    people who find nothing to dislike in it are people who haven't used it
    enough. But it is undoubtedly a highly successful language. All binary operators require the evaluation of both operands before evaluating the operator. (And before you start thinking that is unavoidable, it is
    not, and does not apply to all languages.) Except && and ||, where the
    second operand is not evaluated if it is not needed - that's an ad-hoc decision, different from the general rule. All access to objects must
    be through lvalues of compatible types - except for the ad-hoc rule that character type pointers can also be used.
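    As an illustration, both of those ad-hoc exceptions in C can be seen in a few lines (a minimal sketch; the `side_effect` and `count_calls` helpers are invented for the example):

    ```c
    #include <stdio.h>

    static int calls = 0;
    static int side_effect(void) { calls++; return 1; }

    /* Short-circuit &&: when the left operand is false, the right
       operand is never evaluated, so this returns 0. */
    int count_calls(void) {
        int x = 0;
        if (x != 0 && side_effect()) { /* not reached */ }
        return calls;
    }

    int main(void) {
        printf("side_effect ran %d times\n", count_calls());

        /* The ad-hoc aliasing exception: any object's bytes may be
           examined through an lvalue of character type. */
        unsigned u = 1u;
        unsigned char *bytes = (unsigned char *)&u;
        printf("first byte: %u\n", (unsigned)bytes[0]); /* endianness-dependent */
        return 0;
    }
    ```

    Without the character-type exception, the `bytes[0]` access would violate the general compatible-lvalue rule.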

    To be successful at anything - programming language design or anything else
    - you always need to aim for a balance. Consistency is vital - too much consistency is bad. Generalisation is good - over-generalisation is
    bad. Too much ad-hoc is bad, so is too little.


    I haven't suggested ad-hoc choices.  I have tried to make reasoned
    suggestions.  Being different from languages you have used before, or
    how you envision your new language, does not make them ad-hoc.

    Saying you'd like selected combinations of operators to be banned looks
    like an ad-hoc approach to me.


    Then you misunderstand what I wrote. I don't know if that was my fault
    in poor explanations, or your fault in misreading or misunderstanding -
    no doubt, it was a combination.

    ...

    BTW, any time one thinks of 'treating X separately' it's good to be
    wary. Step-outs tend to make a language hard to learn and awkward
    to use.


    So do over-generalisations.

    Not really.

    Yes, really.

    You simply repeating phrases back to me but in the negative does not
    make your assertions correct or mine wrong.

    That's why I gave the example below...



    Imagine if you were to stop treating "letters", "digits" and
    "punctuation" separately, and say "They are all just characters.
    Let's treat them the same".  Now people can name a function "123", or
    "2+2". It's conceivable that you'd work out a grammar and parsing
    rules that allow that (Forth, for example, has no problem with
    functions that are named by digits.  You can redefine "2" to mean "1"
    if you like).  Do you think that would make the language easier to
    learn and less awkward to use?

    Certainly not. Why do you ask?

    I ask, because it is an example of over-generalisation that makes a
    language harder to learn and potentially a lot more confusing to understand.



    It's ad-hoc rules which become burdensome.

    Agreed.

    Phew!

    ...

    Seriously, try designing a language, yourself. You don't have to
    implement it. Just try coming up with a cohesive design of something
    you would like to program in.


    If I had the time...  :-)

    I fully appreciate that this is not an easy task.

    I'm sure you do but your view of the details is superficial. You have
    ideas which are interesting in themselves but you don't appear to
    appreciate how decisions bear on each other when you have to bring
    hundreds of them together.


    Oh, I do appreciate that. As I have said all along, I am not giving recommendations or claiming that my suggestions are the only way to do
    things. It is all up to /you/.



    ...

    Bart came up with an example something like

       +(+(+(+ x)))

    That's not at all sensible. You want that banned, too?


    Yes :-)  Seriously, I appreciate that there will always be compromises
    - trying to ban everything silly while allowing everything sensible
    would mean countless ad-hoc rules, and you are right to reject that.
    I am advocating drawing a line, just like you - the difference is
    merely a matter of where to draw that line.  I'd draw the line so that
    it throws out the increment and decrement operators entirely.  But if
    you really wanted to keep them, I'd make them postfix only and as
    statements, not in expressions - let "x++" mean "x += 1" which means
    "x = 1" which should, IMHO, be a statement and not allowed inside an
    expression.

    This does, indeed, in a sense, come down to where the designer decides
    to draw the line. Unfortunately there is no simple line.


    Agreed.

    For example, you spoke about banning side effects in expressions. For
    sure, you could do that. But then you thought side effects in function
    calls in expressions should possibly be treated differently and be left
    in! Making such rules is not as simple as it may appear.

    Agreed. That does not mean rules cannot be made.

    I am making suggestions and throwing up ideas - I am not designing the language. There are many ways you can make this all work. Banning side-effects from expressions has many positive benefits in a language -
    I've discussed several. (I haven't even mentioned one of the real big
    ones, which is how much easier and safer it makes parallel computation
    and multi-threading.) It also has complications, and it also limits
    what programmers can do. It's a trade-off, like most choices, and you
    have to decide if the benefits are worth the cost.

    (It's worth noting again that the real power of a programming language
    comes from what you cannot do, not from what you /can/ do.)

    All the
    decisions a language designer makes have a tendency to bear on each
    other, even if only in the ethos of the final language: whether it's
    simple and cohesive or ad hoc and confusing.


    They all have a bearing on each other, yes. And significant changes in
    one place may encourage changes in other places. But you are setting up
    a false dichotomy here - perhaps because you are not willing to consider
    making these other changes. (And if you don't want to, that's fine - I
    don't have any stake in how your language turns out. I just don't want
    you to miss out on possibilities because you think you /can't/ have
    them, rather than because you don't /want/ them.)

    Further, remember that the decisions the language designer makes have to
    be communicated to the programmer. If a designer says "these side
    effects are allowed but these other ones are not" then that just gives
    the programmer more to learn and remember.


    Sure. But programmers are not stupid (or at least, you are not catering
    for stupid programmers). They can learn more than one rule.

    As I say, you could try designing a language. You are a smart guy. You
    could work on a design in your head while walking to the shops, while
    waiting for a train, etc. As one of my books on language design says,
    "design repeatedly: it will make you a better designer".


    Oh, I have plenty of ideas for a language - I have no end to the number
    of languages, OS's, processors, and whatever that I have "designed" in
    my head :-) The devil's in the details, however, and I haven't taken
    the time for that!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Sun Nov 20 16:08:27 2022
    On 20/11/2022 12:28, David Brown wrote:

    Look at C as an example.  Not everyone likes the language, and the only people who find nothing to dislike in it are people who haven't used it enough.

    I disliked C at first glance having never used it. But I loved Algol68,
    having never used that either.

    But given a task now and the choice was between those two languages, I
    would choose C, mainly because some design choices of Algol68 syntax
    make writing code more painful than in C. (Ahead of both would be one of
    my two, by a mile.)

    However I can admire Algol68 for its design, even if it needed tweaking
    IMO, but I would never be able to do that for C, since a lot of it looks
    like it was thrown together with no thought at all, or under the
    influence of some substance.


    But it is undoubtedly a highly successful language.

    On the back of Unix inflicting it on everybody (can anyone prise Unix
    and C apart?), and the lack of viable alternatives.

    Successful languages, then, needed to be able to bend the rules a
    little, do underhand stuff, which C could do in spades (so could mine!).
    You can't really do that with a Wirth language or ones like Algol68.
    Ones like PL/M disappeared.

    Now people look askance at such practices, but C already had its foot in
    the door.

  • From David Brown@21:1/5 to James Harris on Mon Nov 21 16:01:40 2022
    On 18/11/2022 21:14, James Harris wrote:
    On 18/11/2022 11:00, David Brown wrote:
    On 15/11/2022 18:32, James Harris wrote:

    ...

    The side effects of even something awkward such as

       *(++p) = *(q++);

    are little different from those of the longer version

       p = p + 1;
       *p = *q;
       q = q + 1;

    The former is clearer, however. That makes it easier to see the intent.

    Really?  I have no idea what the programmer's intent was.  "*p++ =
    *q++;" is common enough that the intent is clear there, but from your
    first code I can't see /why/ the programmer wanted to /pre/increment
    "p".  Maybe he/she made a mistake?  Maybe he/she doesn't really
    understand the difference between pre-increment and post-increment?
    It's a common beginner's misunderstanding.
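    For the record, the distinction is easy to pin down in C (a small, well-defined sketch on a plain array; the helper names are invented):

    ```c
    #include <assert.h>

    /* Pre-increment: advance the pointer first, then dereference. */
    static int read_pre(int *p) { return *(++p); }

    /* Post-increment: dereference first, then advance the pointer. */
    static int read_post(int *p) { return *(p++); }

    int main(void) {
        int a[3] = {10, 20, 30};
        assert(read_pre(a) == 20);   /* reads a[1] */
        assert(read_post(a) == 10);  /* reads a[0] */
        return 0;
    }
    ```

    Both calls leave the local pointer one past `a[0]`; only the value read differs.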

    I don't think I know of any language which allows a programmer to say
    /why/ something is the case; that's what comments are for. Programs
    normally talk about /what/ to do, not why. The very fact that the
    assignment does something non-idiomatic is a sign that a comment could
    be useful. It's akin to

      for (i = 0; i <= n ....

    If the test really should be <= then a comment may be useful to explain
    why.

    Ideally there should be no need for a comment, because the code makes it
    clear - for example via the names of the identifiers, or from the rest
    of the context. That rarely happens in out-of-context snippets.



    On the other hand, it is quite clear from the separate lines exactly
    what order the programmer intended.

    What would you say are the differences in side-effects of these two
    code snippets?  (I'm assuming we are talking about C here.)

    That depends on whether the operations are ordered or not. In C they'd
    be different, potentially, from what they would be in my language. What
    would you say they are?


    You said the side-effects are "a little different", so I wanted to hear
    what you meant.

    In C, there is no pre-determined sequencing between the two increments -
    they can occur in any order, or can be interleaved. As far as the C
    abstract machine is concerned (and that's what determines what
    side-effects mean), unsequenced events are not ordered and it doesn't
    make sense to say which happened first. You can consider them as
    happening at the same time - and if that affects the outcome of the
    program, then it is at least unspecified behaviour if not undefined
    behaviour. (It would be undefined behaviour if "p" and "q" referred to
    the same object, for example.)

    So I don't think it really makes sense to say that the order is
    different. If the original "*(++p) = *(q++);" makes sense at all, and
    is defined behaviour, then its behaviour is not distinguishable from
    within the C language from the expanded version.
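    That equivalence, when the pointers refer to distinct objects, can be checked directly (a minimal sketch; the `compact` and `expanded` helper names are invented):

    ```c
    #include <assert.h>

    /* The one-line form under discussion. */
    static void compact(int *p, int *q, int **pend, int **qend) {
        *(++p) = *(q++);
        *pend = p; *qend = q;
    }

    /* The expanded form with explicit ordering. */
    static void expanded(int *p, int *q, int **pend, int **qend) {
        p = p + 1;
        *p = *q;
        q = q + 1;
        *pend = p; *qend = q;
    }

    int main(void) {
        int src[2] = {7, 8};
        int d1[2] = {0, 0}, d2[2] = {0, 0};
        int *pe, *qe, *pe2, *qe2;
        compact(d1, src, &pe, &qe);
        expanded(d2, src, &pe2, &qe2);
        assert(d1[1] == 7 && d2[1] == 7);        /* same store performed */
        assert(pe == &d1[1] && pe2 == &d2[1]);   /* same final pointers */
        assert(qe == &src[1] && qe2 == &src[1]);
        return 0;
    }
    ```

    With distinct, non-overlapping arrays the two forms are observably identical; only when `p` and `q` alias does the unsequenced version become undefined.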

  • From James Harris@21:1/5 to David Brown on Wed Nov 23 16:59:53 2022
    On 21/11/2022 15:01, David Brown wrote:
    On 18/11/2022 21:14, James Harris wrote:
    On 18/11/2022 11:00, David Brown wrote:
    On 15/11/2022 18:32, James Harris wrote:

    ...

    The side effects of even something awkward such as

       *(++p) = *(q++);

    are little different from those of the longer version

       p = p + 1;
       *p = *q;
       q = q + 1;

    The former is clearer, however. That makes it easier to see the
    intent.

    Really?  I have no idea what the programmer's intent was.  "*p++ =
    *q++;" is common enough that the intent is clear there, but from your
    first code I can't see /why/ the programmer wanted to /pre/increment
    "p".  Maybe he/she made a mistake?  Maybe he/she doesn't really
    understand the difference between pre-increment and post-increment?
    It's a common beginner's misunderstanding.

    I don't think I know of any language which allows a programmer to say
    /why/ something is the case; that's what comments are for. Programs
    normally talk about /what/ to do, not why. The very fact that the
    assignment does something non-idiomatic is a sign that a comment could
    be useful. It's akin to

       for (i = 0; i <= n ....

    If the test really should be <= then a comment may be useful to
    explain why.

    Ideally there should be no need for a comment, because the code makes it clear - for example via the names of the identifiers, or from the rest
    of the context.  That rarely happens in out-of-context snippets.

    Either way, non-idiomatic code is a flag. And in that it's useful -
    especially if it's easy to read.




    On the other hand, it is quite clear from the separate lines exactly
    what order the programmer intended.

    What would you say are the differences in side-effects of these two
    code snippets?  (I'm assuming we are talking about C here.)

    That depends on whether the operations are ordered or not. In C they'd
    be different, potentially, from what they would be in my language.
    What would you say they are?


    You said the side-effects are "a little different", so I wanted to hear
    what you meant.

    I said they were "little different", not "a little different". In other
    words, focus on the main point rather than minutiae such as what could
    happen if the pointers were identical or overlapped, much as you go on
    to mention:


    In C, there is no pre-determined sequencing between the two increments -
    they can occur in any order, or can be interleaved.  As far as the C abstract machine is concerned (and that's what determines what
    side-effects mean), unsequenced events are not ordered and it doesn't
    make sense to say which happened first.  You can consider them as
    happening at the same time - and if that affects the outcome of the
    program, then it is at least unspecified behaviour if not undefined behaviour.  (It would be undefined behaviour if "p" and "q" referred to
    the same object, for example.)

    So I don't think it really makes sense to say that the order is
    different.  If the original "*(++p) = *(q++);" makes sense at all, and
    is defined behaviour, then its behaviour is not distinguishable from
    within the C language from the expanded version.


    --
    James Harris

  • From David Brown@21:1/5 to James Harris on Wed Nov 23 19:06:06 2022
    On 23/11/2022 17:59, James Harris wrote:
    On 21/11/2022 15:01, David Brown wrote:
    On 18/11/2022 21:14, James Harris wrote:

    Either way, non-idiomatic code is a flag. And in that it's useful - especially if it's easy to read.


    Yes.




    On the other hand, it is quite clear from the separate lines exactly
    what order the programmer intended.

    What would you say are the differences in side-effects of these two
    code snippets?  (I'm assuming we are talking about C here.)

    That depends on whether the operations are ordered or not. In C
    they'd be different, potentially, from what they would be in my
    language. What would you say they are?


    You said the side-effects are "a little different", so I wanted to
    hear what you meant.

    I said they were "little different", not "a little different".

    Ah, my mistake. Still, it implies you think there is /some/ difference.

    In other
    words, focus on the main point rather than minutiae such as what could
    happen if the pointers were identical or overlapped, much as you go on
    to mention:

    OK, so you don't think there is any difference in side-effects other
    than the possible issue I mentioned of undefined behaviour in very
    particular circumstances. That's fine - I just wanted to know if you
    were thinking of something else.

    (Note that the freedom for compilers to re-arrange code from the
    "compact" form to the "expanded" form is one of the reasons why such unsequenced accesses to the same object are undefined behaviour in C.)



    In C, there is no pre-determined sequencing between the two increments
    - they can occur in any order, or can be interleaved.  As far as the C
    abstract machine is concerned (and that's what determines what
    side-effects mean), unsequenced events are not ordered and it doesn't
    make sense to say which happened first.  You can consider them as
    happening at the same time - and if that affects the outcome of the
    program, then it is at least unspecified behaviour if not undefined
    behaviour.  (It would be undefined behaviour if "p" and "q" referred
    to the same object, for example.)

    So I don't think it really makes sense to say that the order is
    different.  If the original "*(++p) = *(q++);" makes sense at all, and
    is defined behaviour, then its behaviour is not distinguishable from
    within the C language from the expanded version.



  • From James Harris@21:1/5 to David Brown on Wed Nov 23 18:31:27 2022
    On 20/11/2022 12:28, David Brown wrote:
    On 18/11/2022 20:01, James Harris wrote:
    On 15/11/2022 21:40, David Brown wrote:
    On 15/11/2022 20:09, James Harris wrote:
    On 15/11/2022 17:31, David Brown wrote:

    ...

    You assume /so/ many limitations on what you can do as a language
    designer.  You can do /anything/.  If you want to allow something, allow it.  If you want to prohibit it, prohibit it.

    Sorry, but it doesn't work like that.

    Yes, it does.

    No, it does not. Your view of language design is far too simplistic.
    Note, also, that in a few paragraphs you say that you are not the
    language designer whereas I am, but then you go on to try to tell me
    how it works and how it doesn't and, previously, that anything can be
    done. You'd gain by /trying/ it yourself. Then you might see that it's
    not as straightforward as you suggest.


    That is a fair point.  But I challenge you to show me where there are
    rules written for language designs.  Explain to me exactly why you are
    not allowed to, say, provide an operator "-" without a corresponding
    operator "+".  Tell me who is banning you from deciding that source code lines must be limited to 40 characters, or that every assignment
    statement shall be preceded by the keyword "please".  I'm not saying any
    of these things are a good idea (though something similar has been done
    in other cases), I am saying it is /your/ choice to do that or not.


    You can say "I can't have feature A and feature B and maintain the consistency I want."  You /cannot/ say "I can't have feature A".  It is /your/ decision not to have feature A.  Choosing to have it may mean
    changing or removing feature B, or losing some consistency that you had
    hoped to maintain.  But it is your language, your choices, your responsibility - saying "I can't do that" is abdicating that
    responsibility.

    Well, your comments have let me know what you mean, at least, but when I
    say "it doesn't work like that" I mean that language design is not as
    simple as you suggest. In absolute terms I agree with you: you are right
    that a designer can make any decisions he wants. But in reality certain
    things are /infeasible/. You might as well say you could get from your
    house to the nearest supermarket by flying to another country first. In absolute terms you probably could do that and eventually get where you
    want to go but in reality it's so absurd a suggestion that it's infeasible.



    A language cannot be built on ad-hoc choices such as you have
    suggested.


    It most certainly can.  Every language is a collection of design
    decisions, and most of them are at least somewhat ad-hoc.

    However, my suggestions were certainly /not/ ad-hoc

    Hmm, you suggested banning side effects, except in function calls, and
    banning successive prefix "+" operators. Those suggestions seem rather
    ad hoc to me.

    - it was for a
    particular way of thinking about operators and expressions, with justification and an explanation of the benefits.  Whether you choose to follow those suggestions or not, is a matter of your personal choices
    for how you want your language to work - and /that/ choice is therefore somewhat ad-hoc.  They only appear ad-hoc if you don't understand what I wrote justifying them or giving their advantages.

    True, if there is a legitimate and useful reason for a rule then that
    rule will seem less ad hoc than if the reasons for it are unknown.


    Of course you want a language to follow a certain theme or style (or
    "ethos", as you called it).  But that does not mean you can't make
    ad-hoc decisions if you want - it is inevitable that you will do so. And
    it certainly does not mean you can't make the choices you want for your language.

    Too many ad-hoc choices mean you lose the logic and consistency in the language.  Too few, and your language has nothing to it.  Excessive consistency is great for some theoretical work  - Turing machines,
    lambda calculus, infinite register machines, and the like.  It is
    useless in a real language.

    Look at C as an example.  Not everyone likes the language, and the only people who find nothing to dislike in it are people who haven't used it enough.  But it is undoubtedly a highly successful language.  All binary operators require the evaluation of both operands before evaluating the operator.  (And before you start thinking that is unavoidable, it is
    not, and does not apply to all languages.)  Except && and ||, where the second operand is not evaluated if it is not needed - that's an ad-hoc decision, different from the general rule.  All access to objects must
    be through lvalues of compatible types - except for the ad-hoc rule that character type pointers can also be used.

    To be successful at anything - programming language design or anything else
    - you always need to aim for a balance.  Consistency is vital - too much consistency is bad.  Generalisation is good - over-generalisation is
    bad.  Too much ad-hoc is bad, so is too little.

    Fair enough. Short-circuit evaluation is a good example of what you have
    been saying, although it effects a semantic change. By contrast, banning
    prefix "+" operators because you don't like them does not effect any
    useful change in the semantics of a program.



    I haven't suggested ad-hoc choices.  I have tried to make reasoned
    suggestions.  Being different from languages you have used before, or
    how you envision your new language, does not make them ad-hoc.

    Saying you'd like selected combinations of operators to be banned
    looks like an ad-hoc approach to me.


    Then you misunderstand what I wrote.  I don't know if that was my fault
    in poor explanations, or your fault in misreading or misunderstanding -
    no doubt, it was a combination.

    Maybe. I thought you wanted ++E++ banned because it had successive ++
    operators but perhaps I misunderstood. Was what you actually wanted
    banned /any/ use of ++ operators? If the language /is/ to have ++
    operators after all, though, would you still want ++E++ banned?

    ...

    Imagine if you were to stop treating "letters", "digits" and
    "punctuation" separately, and say "They are all just characters.
    Let's treat them the same".  Now people can name a function "123", or
    "2+2". It's conceivable that you'd work out a grammar and parsing
    rules that allow that (Forth, for example, has no problem with
    functions that are named by digits.  You can redefine "2" to mean "1"
    if you like).  Do you think that would make the language easier to
    learn and less awkward to use?

    Certainly not. Why do you ask?

    I ask, because it is an example of over-generalisation that makes a
    language harder to learn and potentially a lot more confusing to
    understand.

    I don't see any lack of generalisation in setting out rules for
    identifier names.

    ...

    [Snipped a bunch of points on which we agree.]


    Further, remember that the decisions the language designer makes have
    to be communicated to the programmer. If a designer says "these side
    effects are allowed but these other ones are not" then that just gives
    the programmer more to learn and remember.


    Sure.  But programmers are not stupid (or at least, you are not catering
    for stupid programmers).  They can learn more than one rule.

    You are rather changing your tune, there. Earlier you were concerned
    about programmers failing to understand the difference between
    pre-increment and post-increment!


    As I say, you could try designing a language. You are a smart guy. You
    could work on a design in your head while walking to the shops, while
    waiting for a train, etc. As one of my books on language design says,
    "design repeatedly: it will make you a better designer".


    Oh, I have plenty of ideas for a language - I have no end to the number
    of languages, OS's, processors, and whatever that I have "designed" in
    my head :-)  The devil's in the details, however, and I haven't taken
    the time for that!

    Yes, the devil is indeed in the details. It's one thing to have some
    good ideas. It's quite another to bring them together into a single product.


    --
    James Harris

  • From James Harris@21:1/5 to David Brown on Wed Nov 23 18:50:50 2022
    On 23/11/2022 18:06, David Brown wrote:
    On 23/11/2022 17:59, James Harris wrote:
    On 21/11/2022 15:01, David Brown wrote:
    On 18/11/2022 21:14, James Harris wrote:


    Previously ===>


    The side effects of even something awkward such as

    *(++p) = *(q++);

    are little different from those of the longer version

    p = p + 1;
    *p = *q;
    q = q + 1;

    The former is clearer, however. That makes it easier to see the intent.


    What would you say are the differences in side-effects of these two
    code snippets?  (I'm assuming we are talking about C here.)

    That depends on whether the operations are ordered or not. In C
    they'd be different, potentially, from what they would be in my
    language. What would you say they are?


    You said the side-effects are "a little different", so I wanted to
    hear what you meant.

    I said they were "little different", not "a little different".

    Ah, my mistake.  Still, it implies you think there is /some/ difference.

    I thought there was the /potential/ for a difference (and I suspect that
    there is in C) but that that would distract from the point being made.

    The point remains: I was saying that the former is significantly clearer
    (as long as its effects are defined).


    In other words, focus on the main point rather than minutiae such as
    what could happen if the pointers were identical or overlapped, much
    as you go on to mention:

    OK, so you don't think there is any difference in side-effects other
    than the possible issue I mentioned of undefined behaviour in very
    particular circumstances.  That's fine - I just wanted to know if you
    were thinking of something else.

    (Note that the freedom for compilers to re-arrange code from the
    "compact" form to the "expanded" form is one of the reasons why such unsequenced accesses to the same object are undefined behaviour in C.)

    Understood.


    --
    James Harris

  • From David Brown@21:1/5 to James Harris on Wed Nov 23 21:33:53 2022
    On 23/11/2022 19:31, James Harris wrote:
    On 20/11/2022 12:28, David Brown wrote:
    On 18/11/2022 20:01, James Harris wrote:
    On 15/11/2022 21:40, David Brown wrote:


    Well, your comments have let me know what you mean, at least, but when I
    say "it doesn't work like that" I mean that language design is not as
    simple as you suggest. In absolute terms I agree with you: you are right
    that a designer can make any decisions he wants. But in reality certain things are /infeasible/. You might as well say you could get from your
    house to the nearest supermarket by flying to another country first. In absolute terms you probably could do that and eventually get where you
    want to go but in reality it's so absurd a suggestion that it's infeasible.


    Sure. Not all suggestions are good in practice, and not all things that
    are possible are easy or a good trade-off. I am merely saying that the decisions are yours to make, even if you feel there is only one sane way
    to pick. You are still free to make the hard choice even if that means
    big knock-on effects.



    A language cannot be built on ad-hoc choices such as you have
    suggested.


    It most certainly can.  Every language is a collection of design
    decisions, and most of them are at least somewhat ad-hoc.

    However, my suggestions were certainly /not/ ad-hoc

    Hmm, you suggested banning side effects, except in function calls, and banning successive prefix "+" operators. Those suggestions seem rather
    ad hoc to me.


    They are not necessarily all /good/ suggestions! My point was merely
    that if you don't want people to be able to write +(+(+(+x))) in your
    language, you have the power to ban them if you want.

    - it was for a particular way of thinking about operators and
    expressions, with justification and an explanation of the benefits.
    Whether you choose to follow those suggestions or not, is a matter of
    your personal choices for how you want your language to work - and
    /that/ choice is therefore somewhat ad-hoc.  They only appear ad-hoc
    if you don't understand what I wrote justifying them or giving their
    advantages.

    True, if there is a legitimate and useful reason for a rule then that
    rule will seem less ad hoc than if the reasons for it are unknown.


    Indeed. I am not recommending a chaotic language!


    Of course you want a language to follow a certain theme or style (or
    "ethos", as you called it).  But that does not mean you can't make
    ad-hoc decisions if you want - it is inevitable that you will do so.
    And it certainly does not mean you can't make the choices you want for
    your language.

    Too many ad-hoc choices mean you lose the logic and consistency in
    the language.  Too few, and your language has nothing to it.
    Excessive consistency is great for some theoretical work  - Turing
    machines, lambda calculus, infinite register machines, and the like.
    It is useless in a real language.

    Look at C as an example.  Not everyone likes the language, and the
    only people who find nothing to dislike in it are people who haven't
    used it enough.  But it is undoubtedly a highly successful language.
    All binary operators require the evaluation of both operands before
    evaluating the operator.  (And before you start thinking that is
    unavoidable, it is not, and does not apply to all languages.)  Except
    && and ||, where the second operand is not evaluated if it is not
    needed - that's an ad-hoc decision, different from the general rule.
    All access to objects must be through lvalues of compatible types -
    except for the ad-hoc rule that character type pointers can also be used.
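    For what it's worth, the && exception described above is easy to
    demonstrate in a few lines of C. The probe() helper below is purely
    for illustration - it just counts how many times an operand actually
    gets evaluated:

    ```c
    /* Sketch of C's short-circuit rule: the right operand of && is
       evaluated only when the left operand is nonzero. */
    #include <stdio.h>

    static int calls = 0;

    static int probe(int v) {
        calls++;            /* count each evaluation of an operand */
        return v;
    }

    int main(void) {
        calls = 0;
        int r1 = probe(0) && probe(1);  /* right side skipped */
        printf("%d %d\n", r1, calls);   /* 0 1 */

        calls = 0;
        int r2 = probe(1) && probe(1);  /* both sides evaluated */
        printf("%d %d\n", r2, calls);   /* 1 2 */
        return 0;
    }
    ```

    Every other binary operator in C would have bumped the counter both
    times - that is what makes && and || the ad-hoc exception.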

    To be successful at anything - program language design or anything
    else - you always need to aim for a balance.  Consistency is vital -
    too much consistency is bad.  Generalisation is good -
    over-generalisation is bad.  Too much ad-hoc is bad, so is too little.

    Fair enough. Short-circuit evaluation is a good example of what you have
    been saying, although it effects a semantic change. By contrast, banning prefix "+" operators because you don't like them does not effect any
    useful change in the semantics of a program.



    I haven't suggested ad-hoc choices.  I have tried to make reasoned
    suggestions.  Being different from languages you have used before,
    or how you envision your new language, does not make them ad-hoc.

    Saying you'd like selected combinations of operators to be banned
    looks like an ad-hoc approach to me.


    Then you misunderstand what I wrote.  I don't know if that was my
    fault in poor explanations, or your fault in misreading or
    misunderstanding - no doubt, it was a combination.

    Maybe. I thought you wanted ++E++ banned because it had successive ++ operators but perhaps I misunderstood. Was what you actually wanted
    banned /any/ use of ++ operators? If the language /is/ to have ++
    operators after all, though, would you still want ++E++ banned?
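    As a point of comparison, C itself already rejects ++E++: postfix
    binds tighter than prefix, so it parses as ++(E++), and E++ is an
    rvalue, not the lvalue that prefix ++ requires. A small sketch of
    what C does and doesn't allow here (the names are arbitrary):

    ```c
    /* Pre- vs post-increment in C, and why ++e++ is rejected. */
    #include <stdio.h>

    int main(void) {
        int e = 5;
        /* ++e++;  -- would not compile: parses as ++(e++),
                      and e++ is not an lvalue */
        int pre  = ++e;   /* e becomes 6, pre gets the new value 6   */
        int post = e++;   /* post gets the old value 6, e becomes 7  */
        printf("%d %d %d\n", pre, post, e);  /* 6 6 7 */
        return 0;
    }
    ```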


    I was suggesting banning any use of pre- and post- increment and
    decrement operators. They are unnecessary in a language, and (along
    with assignment operators that return values, rather than being strictly statements) they are a way of having side-effects in the middle of
    expressions that otherwise look like calculations or reading data.

    Unless you are aiming for a pure functional language, "side-effects" are necessary - it's how you get things done in the code. But IMHO they
    should be as clear as possible, not hidden away as extras. Changes to
    any object should be the main purpose of a statement or function call,
    rather than a little extra feature.
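    A quick sketch of what "the mutation as the whole statement" looks
    like in practice, using C for illustration. The commented-out line
    is exactly the kind of hidden side effect being argued against - in
    C it is in fact undefined behaviour, since i is both modified and
    read without sequencing:

    ```c
    /* Rewriting an embedded increment as an explicit statement. */
    #include <stdio.h>

    int main(void) {
        int src[] = {10, 20, 30};
        int dst[3];

        /* Side effect buried inside an expression:
               dst[i] = src[i++];    -- undefined behaviour in C
           The same intent, with the change to i as its own statement: */
        int i = 0;
        while (i < 3) {
            dst[i] = src[i];
            i = i + 1;    /* the mutation is the entire statement */
        }
        printf("%d %d %d\n", dst[0], dst[1], dst[2]);  /* 10 20 30 */
        return 0;
    }
    ```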

    Remember, the fewer places you can have side-effects - changes to an
    object, or IO functionality - the more freedom the compiler has to
    manipulate and optimise the code, the clearer the code is to the reader,
    the safer it is from accidentally changing things, the easier it is to
    be sure the code is correct, and the more the code can be thread-safe, re-entrant or run in parallel. Make every object immutable unless the programmer goes out of their way to insist that it is mutable - you can
    do so much more with it if its value cannot change!
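    C only lets you opt /in/ to immutability with const, rather than
    making it the default as suggested above, but even that opt-in form
    shows the idea - the names below are just for illustration:

    ```c
    /* Immutability by annotation: const in C. The thread's suggestion
       is the reverse default - immutable unless declared mutable. */
    #include <stdio.h>

    int main(void) {
        const int limit = 100;   /* cannot change after initialisation */
        int total = 0;           /* mutable, and visibly so */

        for (int i = 1; i <= 4; i++)
            total += i;

        /* limit = 200;  -- would be a compile-time error */
        printf("%d of %d\n", total, limit);  /* 10 of 100 */
        return 0;
    }
    ```

    A compiler that knows limit can never change is free to fold it into
    the code directly - the same freedom the paragraph above describes.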



    Imagine if you were to stop treating "letters", "digits" and
    "punctuation" separately, and say "They are all just characters.
    Let's treat them the same".  Now people can name a function "123",
    or "2+2". It's conceivable that you'd work out a grammar and parsing
    rules that allow that (Forth, for example, has no problem with
    functions that are named by digits.  You can redefine "2" to mean
    "1" if you like).  Do you think that would make the language easier
    to learn and less awkward to use?

    Certainly not. Why do you ask?

    I ask, because it is an example of over-generalisation that makes a
    language harder to learn and potentially a lot more confusing to
    understand.

    I don't see any lack of generalisation in setting out rules for
    identifier names.


    You can give nice general rules for an identifier - you can say
    identifiers must start with a letter, and consist of letters, digits and underscore characters. (That's a common choice for many languages, but
    not the only choice.) If you /over-generalise/, you might allow
    identifiers consisting solely of digits. And that can lead to allowing confusing code - such as this example from Forth:

    $ gforth
    Gforth 0.7.3, Copyright (C) 1995-2008 Free Software Foundation, Inc.
    Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
    Type `bye' to exit
    ok
    2 2 + . 4 ok
    : 2 3 ; ok
    2 2 + . 6 ok

    Forth is so general and free that you can redefine the meaning of "2"
    (or pretty much anything else). This is not a good idea.


    [Snipped a bunch of points on which we agree.]


    Further, remember that the decisions the language designer makes have
    to be communicated to the programmer. If a designer says "these side
    effects are allowed but these other ones are not" then that just
    gives the programmer more to learn and remember.


    Sure.  But programmers are not stupid (or at least, you are not
    catering for stupid programmers).  They can learn more than one rule.

    You are rather changing your tune, there. Earlier you were concerned
    about programmers failing to understand the difference between
    pre-increment and post-increment!


    Sometimes smart programmers get mixed up too - especially when trying to
    read code that is symbol-heavy and uses code that appears to be a common
    idiom, but is subtly different.


    As I say, you could try designing a language. You are a smart guy.
    You could work on a design in your head while walking to the shops,
    while waiting for a train, etc. As one of my books on language design
    says, "design repeatedly: it will make you a better designer".


    Oh, I have plenty of ideas for a language - I have no end to the
    number of languages, OS's, processors, and whatever that I have
    "designed" in my head :-)  The devil's in the details, however, and I
    haven't taken the time for that!

    Yes, the devil is indeed in the details. It's one thing to have some
    good ideas. It's quite another to bring them together into a single
    product.


    And that's assuming you can figure out which ideas are good!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)