Forum: >>> Magnum BBS <<<

Dereference relative to increment and decrement operators ++ --

From James Harris@21:1/5 to All on Mon Nov 7 11:55:41 2022

A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.

The basic question is: If ^ is a postfix dereference operator then what
should be the relative precedences of the following (where E is any subexpression)?

++E
E++
E^

(The same goes for -- but to make description easier I'll mention only ++.)

Taking a step back and considering general expression evaluation I have,
so far, been defining the apparent order. And I'd like to continue with
that. So it should be possible to combine multiple ++ operators
arbitrarily. For example,

++E + E++
++E++
V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something a programmer can rely on.

[[

As an aside, one thing I should point out is that while both pre- and post-increment require an lvalue it is easy for prefix ++ to also result
in an lvalue whereas postfix ++ more naturally produces an rvalue.
Prefix ++ can be translated to

increment the value at a certain address
use that /address/

By contrast, postfix ++ more naturally translates to

load into a register the /value/ at a certain address
increment the value left at that address

After postfix ++ the address may not be so usable because its value has
already been changed and yet the code said to increment it /after/ the operation (for some definition of 'after').

At any rate, that distinction between prefix and postfix ++ seems to be recognised at the following link where it says "Prefix versions of the
built-in operators return references and postfix versions return values."

https://en.cppreference.com/w/cpp/language/operator_incdec

]]

Setting that aside aside ... and going back to the query, what should be
the relative precedences of the three operators? For example, how should
the following be evaluated?

++E++^
++E^++

Or should some ways of combining ^ with either or both of the ++
operators be prohibited because they make code too difficult to
understand?!!

I guess it boils down to what's most convenient and comprehensible for a programmer but I don't know if there is a clear answer. What do you guys
think?

I've been scratching my head over this for a while so other opinions
would be most welcome!

--
James Harris

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From James Harris@21:1/5 to Dmitry A. Kazakov on Mon Nov 7 12:52:23 2022

On 07/11/2022 12:22, Dmitry A. Kazakov wrote:

On 2022-11-07 12:55, James Harris wrote:

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something
a programmer can rely on.

One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.

That's all very well but what specifically would you prohibit? While
doing so be careful not to prohibit something that programmers have a legitimate reason to want.

Perhaps another lesson from the PL/1 era was to avoid arbitrary rules.
And yet an arbitrary prohibition of something can be such a rule.

:-)

--
James Harris

From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 7 13:22:41 2022

On 2022-11-07 12:55, James Harris wrote:

++E + E++
++E++
V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something a programmer can rely on.

One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 7 14:43:46 2022

On 2022-11-07 13:52, James Harris wrote:

On 07/11/2022 12:22, Dmitry A. Kazakov wrote:

On 2022-11-07 12:55, James Harris wrote:

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something
a programmer can rely on.

One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.

That's all very well but what specifically would you prohibit?

Your very question was about some arbitrary sequence of operators you
fail to give a meaning! Stop right here. (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From Bart@21:1/5 to Dmitry A. Kazakov on Mon Nov 7 15:06:54 2022

On 07/11/2022 13:43, Dmitry A. Kazakov wrote:

On 2022-11-07 13:52, James Harris wrote:

On 07/11/2022 12:22, Dmitry A. Kazakov wrote:

On 2022-11-07 12:55, James Harris wrote:

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so
something a programmer can rely on.

One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.

That's all very well but what specifically would you prohibit?

Your very question was about some arbitrary sequence of operators you
fail to give a meaning! Stop right here. (:-))

+ usually means numeric addition

= here presumably means an assignment (and from right to left)

++ can also be assumed to mean in-place increment. Specifically:

++E is equivalent to: (E := E + 1; E)
E++ is equivalent to: (T := E; E := E + 1; T)

(When E can be harmlessly evaluated more than once; otherwise an extra temporary reference would need to be used.)

But I'm sure you know this already.

From David Brown@21:1/5 to James Harris on Mon Nov 7 15:58:27 2022

On 07/11/2022 12:55, James Harris wrote:

A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.

The basic question is: If ^ is a postfix dereference operator then what should be the relative precedences of the following (where E is any subexpression)?

++E
E++
E^

(The same goes for -- but to make description easier I'll mention only ++.)

Taking a step back and considering general expression evaluation I have,
so far, been defining the apparent order. And I'd like to continue with
that. So it should be possible to combine multiple ++ operators
arbitrarily. For example,

++E + E++
++E++
V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something a programmer can rely on.

I disagree entirely - unless you include giving an error message saying
the programmer should be fired for writing gibberish as "well defined
and something you can rely on". I can appreciate not wanting such
things to be run-time undefined behaviour, but there is no reason at all
to insist that it is acceptable by the compiler.

Setting that aside aside ... and going back to the query, what should be
the relative precedences of the three operators? For example, how should
the following be evaluated?

++E++^
++E^++

Or should some ways of combining ^ with either or both of the ++
operators be prohibited because they make code too difficult to
understand?!!

Make it a syntax error. You are not trying to implement <https://en.wikipedia.org/wiki/Brainfuck>, and are not under any
obligation to support people who want to code like that. On the other
hand, you /do/ have an obligation to try to catch mistakes, typos, and accidental errors in code.

I guess it boils down to what's most convenient and comprehensible for a programmer but I don't know if there is a clear answer. What do you guys think?

I've been scratching my head over this for a while so other opinions
would be most welcome!

From Bart@21:1/5 to James Harris on Mon Nov 7 14:23:33 2022

On 07/11/2022 11:55, James Harris wrote:

A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.

The basic question is: If ^ is a postfix dereference operator then what should be the relative precedences of the following (where E is any subexpression)?

++E
E++
E^

(The same goes for -- but to make description easier I'll mention only ++.)

For unary operators, the evaluation order is rather peculiar yet seems
to be used in quite a few languages without anyone questioning it. So if
`a b c d` are unary operators, then the following:

a b E c d

is evaluated like this:

a (b ((E c) d))

That is, first all the post-fix operators in left-to-right order, then
all the prefix ones in right-left order. It sounds bizarre when put like
that!

Taking a step back and considering general expression evaluation I have,
so far, been defining the apparent order. And I'd like to continue with
that. So it should be possible to combine multiple ++ operators
arbitrarily. For example,

++E + E++

This is well defined, as unary operators bind more tightly than binary
ones. This is just (++E) + (E++).

However the evaluation order for '+' is not usually well-defined, so you
don't know which operand will be done first.

++E++

This may not work, or not work as expected. The binding using my above
scheme means this is equivalent to ++(E++). But the second ++ requires
an lvalue as input, and yields an rvalue, which would be an invalid
input to the first ++.

V = V++

This one doesn't have any problems, but is probably not useful: you're modifying V then replacing its value anyway, and with its original
value. That new V+1 value is discarded.

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something a programmer can rely on.

[[

As an aside, one thing I should point out is that while both pre- and post-increment require an lvalue it is easy for prefix ++ to also result
in an lvalue whereas postfix ++ more naturally produces an rvalue.
Prefix ++ can be translated to

increment the value at a certain address
use that /address/

By contrast, postfix ++ more naturally translates to

load into a register the /value/ at a certain address
increment the value left at that address

After postfix ++ the address may not be so usable because its value has already been changed and yet the code said to increment it /after/ the operation (for some definition of 'after').

At any rate, that distinction between prefix and postfix ++ seems to be recognised at the following link where it says "Prefix versions of the built-in operators return references and postfix versions return values."

https://en.cppreference.com/w/cpp/language/operator_incdec

I tried to get ++E++ to work using a suitable type for E, but in my
language it cannot work, as the first ++ still needs an lvalue; just an
rvalue which has a pointer type won't cut it.

However ++E++^ can work, where ^ is deref, and E is a pointer.

I think this is because in my language, for something to be a valid
lvalue, you need to be able to apply & address-of to it. The result of
E++ doesn't have an address. But (E++)^ works because & and ^ cancel
out. Or something...

Setting that aside aside ... and going back to the query, what should be
the relative precedences of the three operators? For example, how should
the following be evaluated?

++E++^
++E^++

Or should some ways of combining ^ with either or both of the ++
operators be prohibited because they make code too difficult to
understand?!!

You have the same issues in C, but that's OK because people are so
familiar with it. Also * deref is a prefix operator so you never have
two distinct postfix operators, unless you write E++ --.

But yes, parentheses are recommended when mixing certain prefix/postfix
ops. I think this one is clear enough however:

-E^

Dereference E then negate the result. As is this: -E[i]; you wouldn't
assume that meant (-E)[i].

From Bart@21:1/5 to David Brown on Mon Nov 7 15:23:11 2022

On 07/11/2022 14:58, David Brown wrote:

On 07/11/2022 12:55, James Harris wrote:

A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.

The basic question is: If ^ is a postfix dereference operator then
what should be the relative precedences of the following (where E is
any subexpression)?

   ++E
   E++
   E^

(The same goes for -- but to make description easier I'll mention only
++.)

Taking a step back and considering general expression evaluation I
have, so far, been defining the apparent order. And I'd like to
continue with that. So it should be possible to combine multiple ++
operators arbitrarily. For example,

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something
a programmer can rely on.

I disagree entirely - unless you include giving an error message saying
the programmer should be fired for writing gibberish as "well defined
and something you can rely on". I can appreciate not wanting such
things to be run-time undefined behaviour, but there is no reason at all
to insist that it is acceptable by the compiler.

gcc accepts this C code (when E, V are both ints):

++E + E++;
V = V++;

It won't accept ++E++ because the first ++ expects an lvalue. Probably
the same will happen when you try and implement it elsewhere. So no
actual need to prohibit in the language - it just won't work.

Setting that aside aside ... and going back to the query, what should
be the relative precedences of the three operators? For example, how
should the following be evaluated?

   ++E++^
   ++E^++

Make it a syntax error.

The equivalent in C syntax for the first is:

++*(P++);

This compiles fine when P has type int* for example. It means this:

- Increment the pointer P
- Increment the location that P now points to (using the * deref op)

So no reason to prohibit anything; it is perfectly well-defined. The
second example is equivalent to:

++((*P)++);

This won't work for the same reason as above. This is hard to prohibit
via grammar rules, but it is not necessary as it fails on type-checking.

From Bart@21:1/5 to David Brown on Mon Nov 7 15:43:20 2022

On 07/11/2022 15:18, David Brown wrote:

On 07/11/2022 15:23, Bart wrote:

V = V++

This one doesn't have any problems, but is probably not useful: you're
modifying V then replacing its value anyway, and with its original
value. That new V+1 value is discarded.

In C, it has /big/ problems - the side-effects on V are not sequenced,
so the expression is undefined behaviour. Other languages may differ - you'd have to read the specifications or standards for those languages.

I'd suggest that in C it would be a compiler problem. For example if it
did the assignment, and then decided to increment V.

To me that would be bizarre: I'd expect to evaluate the RHS as a single
term (V++), including any side-effects entailed, before writing the
resulting value (the old value of V) into V.

But in general you're right: I'm not keen on multiple things being
changed inside one expression. I tolerate ++ and -- (and chained
assignment) because they are so handy. But I don't allow augmented
assignments inside an expression as C does.

I think this is because in my language, for something to be a valid
lvalue, you need to be able to apply & address-of to it. The result of
E++ doesn't have an address. But (E++)^ works because & and ^ cancel
out. Or something...

It is a bad sign for a language when even the language author,
implementer, and experienced user is not sure how it works. As long as
the language is only ever meant to be for a single person, you can get
away with saying "I wouldn't write that, so it doesn't matter what it means". But if the OP has hopes that more than one person will ever see
his language, it should be specified well enough that these things are written down.

The 'or something' refers to the mechanism within my compiler which
determines what is a legal lvalue. I'd have to study 3700 lines of code
to discover exactly how it worked.

But it should be obvious (now that I've thought about it!) that a term
of the form X^, which is all that `E++^` is, should be a legal lvalue as
it can be used on either side of an assignment:

X^ := X^

(Although no doubt C will make that UB because that's what it likes to do.)

From David Brown@21:1/5 to Bart on Mon Nov 7 16:18:19 2022

On 07/11/2022 15:23, Bart wrote:

On 07/11/2022 11:55, James Harris wrote:

A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.

The basic question is: If ^ is a postfix dereference operator then
what should be the relative precedences of the following (where E is
any subexpression)?

   ++E
   E++
   E^

(The same goes for -- but to make description easier I'll mention only
++.)

For unary operators, the evaluation order is rather peculiar yet seems
to be used in quite a few languages without anyone questioning it.

People /do/ question it. Knowing the order for a given language, and
using the operators, does not imply liking them or not questioning them.

And note that "operator precedence" is about parsing the expression - it
is /not/ the same as "order of evaluation". A language specification
should be clear on this.

So if
`a b c d` are unary operators, then the following:

   a b E c d

is evaluated like this:

      a (b ((E c) d))

That is, first all the post-fix operators in left-to-right order, then
all the prefix ones in right-left order. It sounds bizarre when put like that!

Taking a step back and considering general expression evaluation I
have, so far, been defining the apparent order. And I'd like to
continue with that. So it should be possible to combine multiple ++
operators arbitrarily. For example,

   ++E + E++

This is well defined, as unary operators bind more tightly than binary
ones. This is just (++E) + (E++).

It is not remotely "well defined" in C, but it might be well defined in
/your/ language. The /precedence/ of the operators and the parsing of
the expression is well defined in C, but its /behaviour/ is undefined as
the order of evaluation is unspecified so the side-effects are unsequenced.

However the evaluation order for '+' is not usually well-defined, so you don't know which operand will be done first.

   ++E++

This may not work, or not work as expected. The binding using my above
scheme means this is equivalent to ++(E++). But the second ++ requires
an lvalue as input, and yields an rvalue, which would be an invalid
input to the first ++.

   V = V++

This one doesn't have any problems, but is probably not useful: you're modifying V then replacing its value anyway, and with its original
value. That new V+1 value is discarded.

In C, it has /big/ problems - the side-effects on V are not sequenced,
so the expression is undefined behaviour. Other languages may differ -
you'd have to read the specifications or standards for those languages.

   https://en.cppreference.com/w/cpp/language/operator_incdec

I tried to get ++E++ to work using a suitable type for E, but in my
language it cannot work, as the first ++ still needs an lvalue; just an rvalue which has a pointer type won't cut it.

However ++E++^ can work, where ^ is deref, and E is a pointer.

I think this is because in my language, for something to be a valid
lvalue, you need to be able to apply & address-of to it. The result of
E++ doesn't have an address. But (E++)^ works because & and ^ cancel
out. Or something...

It is a bad sign for a language when even the language author,
implementer, and experienced user is not sure how it works. As long as
the language is only ever meant to be for a single person, you can get
away with saying "I wouldn't write that, so it doesn't matter what it
means". But if the OP has hopes that more than one person will ever see
his language, it should be specified well enough that these things are
written down.

Setting that aside aside ... and going back to the query, what should
be the relative precedences of the three operators? For example, how
should the following be evaluated?

   ++E++^
   ++E^++

Or should some ways of combining ^ with either or both of the ++
operators be prohibited because they make code too difficult to
understand?!!

You have the same issues in C, but that's OK because people are so
familiar with it. Also * deref is a prefix operator so you never have
two distinct postfix operators, unless you write E++ --.

But yes, parentheses are recommended when mixing certain prefix/postfix
ops. I think this one is clear enough however:

   -E^

Dereference E then negate the result. As is this: -E[i]; you wouldn't
assume that meant (-E)[i].

C was fixed and unchangeable long ago (at least for such fundamental
things). A new language can be made better. If you think "parentheses
are recommended here", change it to "parentheses are /required/ here". If
you think "++E++" is confusing or questionable, make it a hard
compile-time error.

From David Brown@21:1/5 to Bart on Mon Nov 7 17:43:15 2022

On 07/11/2022 16:06, Bart wrote:

On 07/11/2022 13:43, Dmitry A. Kazakov wrote:

On 2022-11-07 13:52, James Harris wrote:

On 07/11/2022 12:22, Dmitry A. Kazakov wrote:

On 2022-11-07 12:55, James Harris wrote:

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so
something a programmer can rely on.

One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.

That's all very well but what specifically would you prohibit?

Your very question was about some arbitrary sequence of operators you
fail to give a meaning! Stop right here. (:-))

+ usually means numeric addition

= here presumably means an assignment (and from right to left)

++ can also be assumed to mean in-place increment. Specifically:

++E is equivalent to: (E := E + 1; E)
E++ is equivalent to: (T := E; E := E + 1; T)

(When E can be harmlessly evaluated more than once; otherwise an extra temporary reference would need to be used.)

In C, if there are side-effects from evaluating E then you will have
either undefined behaviour or at least implementation-dependent
behaviour, depending on the exact expression.

But I'm sure you know this already.

What you have given are the interpretations for C and similar languages, operating on arithmetic operands. Other languages may have different
meanings for the symbols. Even if the OP's language gives the same
meaning to the operators for integers, it might mean something different
for other types - including the possibility of operator overloads for
user types.

It might make sense for the language to define precedence order and
other details for the operators independent of the semantics. Or it
might make sense to have them depend on the types and the semantics -
languages differ in how they work.

From David Brown@21:1/5 to Bart on Mon Nov 7 17:34:02 2022

On 07/11/2022 16:23, Bart wrote:

On 07/11/2022 14:58, David Brown wrote:

On 07/11/2022 12:55, James Harris wrote:

A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.

The basic question is: If ^ is a postfix dereference operator then
what should be the relative precedences of the following (where E is
any subexpression)?

   ++E
   E++
   E^

(The same goes for -- but to make description easier I'll mention
only ++.)

Taking a step back and considering general expression evaluation I
have, so far, been defining the apparent order. And I'd like to
continue with that. So it should be possible to combine multiple ++
operators arbitrarily. For example,

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something
a programmer can rely on.

I disagree entirely - unless you include giving an error message
saying the programmer should be fired for writing gibberish as "well
defined and something you can rely on". I can appreciate not wanting
such things to be run-time undefined behaviour, but there is no reason
at all to insist that it is acceptable by the compiler.

gcc accepts this C code (when E, V are both ints):

    ++E + E++;
    V = V++;

That's like saying that you can hit a screw with a hammer. Use the tool properly, and you will see the complaints. gcc is a C compiler, not
some kind of "official" guide to the language, and everyone knows that
without flags it is far too accepting of code that has undefined
behaviour or is otherwise clearly wrong even in cases that can be
spotted easily. With even basic warning flags enabled, these are marked.

(You've had this explained to you a few hundred times over the last
decade or so. I know you get some kind of perverse pleasure out of finding
any way of making C and/or gcc look bad in your own eyes, but would you /please/ stop being such a petty child and stop writing things
deliberately intended to confuse, mislead or annoy others?)

For those that want to know the details, the most "official" C reference
site shows similar expressions as examples of undefined behaviour:

<https://en.cppreference.com/w/c/language/eval_order>

And the C standards give some related examples in section 6.5.

It won't accept ++E++ because the first ++ expects an lvalue. Probably
the same will happen when you try and implement it elsewhere. So no
actual need to prohibit in the language - it just won't work.

That makes /no/ sense. If by "it just won't work" you mean the compiler
won't accept it, then it is prohibited by the language - or your
compiler fails to implement the language. If you mean the compiler
accepts it but it "just won't work" at run-time, then that is exactly
what we want to avoid.

Setting that aside aside ... and going back to the query, what should
be the relative precedences of the three operators? For example, how
should the following be evaluated?

   ++E++^
   ++E^++

Make it a syntax error.

The equivalent in C syntax for the first is:

    ++*(P++);

This compiles fine when P has type int* for example. It means this:

- Increment the pointer P
- Increment the location that P now points to (using the * deref op)

So no reason to prohibit anything; it is perfectly well-defined.

There is good reason to prohibit it - you got it wrong, so despite being well-defined by the language, it is not clear code.

The actual meaning of "++*(P++);" is :

1. Remember the original value of P - call it P_orig
2. Increment P (that is, add sizeof(*P) to it).
3. Increment the int at the location pointed to by P_orig.
4. The value of the expression is the new updated value pointed to by P_orig.

No specific ordering of the two increments is implied here - they can be
done in either order. The compiler can assume that P and *P do not
overlap (something that could only happen using a union) - if they do,
the behaviour is undefined.

(Note that "++*(P++)" is the same as "++*P++" in C, but the extra
parentheses make it slightly less unclear.)

The
second example is equivalent to:

    ++((*P)++);

This won't work for the same reason as above. This is hard to prohibit
via grammar rules, but it is not necessary as it fails on type-checking.

In C, prohibitions against such code come from "constraints", which are
not part of the BNF grammar rules, but come before any kind of type
checking. Whether an expression is an "rvalue", a "modifiable lvalue",
"a non-modifiable lvalue", or other classification, is not part of the
type system.

Other languages may handle this sort of thing differently - I can only
say what C does here. I see no fundamental reason why it cannot be
considered part of the grammar rules, but it might need a more advanced
grammar than C has.

From David Brown@21:1/5 to Bart on Mon Nov 7 17:56:37 2022

On 07/11/2022 16:43, Bart wrote:

On 07/11/2022 15:18, David Brown wrote:

On 07/11/2022 15:23, Bart wrote:

V = V++

This one doesn't have any problems, but is probably not useful:
you're modifying V then replacing its value anyway, and with its
original value. That new V+1 value is discarded.

In C, it has /big/ problems - the side-effects on V are not sequenced,
so the expression is undefined behaviour. Other languages may differ
- you'd have to read the specifications or standards for those languages.

I'd suggest that in C it would be a compiler problem. For example if it
did the assignment, and then decided to increment V.

It is not a compiler problem - it is undefined behaviour in the
language, and if someone writes that code and tries to compile it as C,
they can have no reasonable expectation of any particular behaviour.
The compiler /may/ increment V, it may not, it may reject the code with
a compiler error (if it can prove beyond doubt that the code would
actually be run - you are allowed to put code with undefined runtime
behaviour in code that is never run). Other behaviour would, I think,
be so surprising to a programmer that it would be considered a poor implementation, even though the compiler might still be conforming.

To me that would be bizarre: I'd expect to evaluate the RHS as a single
term (V++), including any side-effects entailed, before writing the
resulting value (the old value of V) into V.

The C language does not have a sequence point on assignment. So if you
write "x = y++;", there is no sequencing between writing to "x" or
writing the incremented value to "y". (The value of "y + 1", and the
address of "x", must be evaluated before the assignment, obviously - but
these are also not sequenced with regard to each other.) When you write
"x = y++;", that is convenient - the compiler can generate the code in
whatever order is most efficient. But if "x" and "y" refer to the same
thing, you have two unsequenced side-effects to the same objects - that
is clearly undefined behaviour.

A language could certainly have a sequence point on assignment. But C
does not do so.

But in general you're right: I'm not keen on multiple things being
changed inside one expression. I tolerate ++ and -- (and chained
assignment) because they are so handy. But I don't allow augmented assignments inside an expression as C does.

C allows multiple things to be changed in one expression - but it does
not allow the /same/ thing to be changed multiple times without sequencing.

(I too generally prefer to change only one thing at a time in an
expression, regardless of what the language may allow.)

I think this is because in my language, for something to be a valid
lvalue, you need to be able to apply & address-of to it. The result
of E++ doesn't have an address. But (E++)^ works because & and ^
cancel out. Or something...

It is a bad sign for a language when even the language author,
implementer, and experienced user is not sure how it works. As long
as the language is only ever meant to be for a single person, you can
get away with saying "I wouldn't write that, so it doesn't matter what
it means". But if the OP has hopes that more than one person will
ever see his language, it should be specified well enough that these
things are written down.

The 'or something' refers to the mechanism within my compiler which determines what is a legal lvalue. I'd have to study 3700 lines of code
to discover exactly how it worked.

But it should be obvious (now that I've thought about it!) that a term
of the form X^, which is all that `E++^` is, should be a legal lvalue as
it can be used on either side of an assignment:

X^ := X^

(Although no doubt C will make that UB because that's what it likes to do.)

"*p = *p" is fine and fully defined in C (unless you have a pointer to volatile, when it will be implementation dependent).

"*p++ = *p++" is a different matter entirely.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dmitry A. Kazakov@21:1/5 to Bart on Mon Nov 7 17:16:56 2022

On 2022-11-07 16:06, Bart wrote:

On 07/11/2022 13:43, Dmitry A. Kazakov wrote:

On 2022-11-07 13:52, James Harris wrote:

On 07/11/2022 12:22, Dmitry A. Kazakov wrote:

On 2022-11-07 12:55, James Harris wrote:

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so
something a programmer can rely on.

One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.

That's all very well but what specifically would you prohibit?

Your very question was about some arbitrary sequence of operators you
fail to give a meaning! Stop right here. (:-))

+ usually means numeric addition

= here presumably means an assignment (and from right to left)

= means equality.

++ can also be assumed to mean in-place increment. Specifically:

++ means cheap keyboard with broken keys or coffee spilled over it... (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From James Harris@21:1/5 to Dmitry A. Kazakov on Mon Nov 7 17:26:57 2022

On 07/11/2022 13:43, Dmitry A. Kazakov wrote:

On 2022-11-07 13:52, James Harris wrote:

On 07/11/2022 12:22, Dmitry A. Kazakov wrote:

On 2022-11-07 12:55, James Harris wrote:

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so
something a programmer can rely on.

One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.

That's all very well but what specifically would you prohibit?

Your very question was about some arbitrary sequence of operators you
fail to give a meaning! Stop right here. (:-))

It's easy to dislike a certain sequence of operators. It's harder to
define rules for their prohibition.

A programmer has freedom to put in any sequence of operators which
comply with the syntax and semantics of the language. A language
designer, by contrast, if he wants to add a rule to prohibit certain permutations has (a) to define what permutations are ruled out and (b)
think of the consequences of such a rule on other expressions which may
look much more legitimate.

--
James Harris


From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 7 18:53:53 2022

On 2022-11-07 18:26, James Harris wrote:

On 07/11/2022 13:43, Dmitry A. Kazakov wrote:

On 2022-11-07 13:52, James Harris wrote:

On 07/11/2022 12:22, Dmitry A. Kazakov wrote:

On 2022-11-07 12:55, James Harris wrote:

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so
something a programmer can rely on.

One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.

That's all very well but what specifically would you prohibit?

Your very question was about some arbitrary sequence of operators you
fail to give a meaning! Stop right here. (:-))

It's easy to dislike a certain sequence of operators. It's harder to
define rules for their prohibition.

1. Reduce the number of precedence levels to logical, additive,
multiplicative, highest order.

2. Require parentheses for mixed operations at the same level (except
for * and /)

3. No side effects of operators.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From James Harris@21:1/5 to David Brown on Mon Nov 7 17:32:36 2022

On 07/11/2022 16:43, David Brown wrote:

On 07/11/2022 16:06, Bart wrote:

..

But I'm sure you know this already.

What you have given are the interpretations for C and similar languages, operating on arithmetic operands. Other languages may have different meanings for the symbols. Even if the OP's language gives the same
meaning to the operators for integers, it might mean something different
for other types - including the possibility of operator overloads for
user types.

If precedences were to vary with operand types then expressions would be
very hard for programmers to read so IMO it's important for program
readability that precedences go with the operators and that they are independent of the types of the operands. If a programmer didn't know
what order

a + b * c

would be evaluated in until he looked up the types then even simple
programs would be very confusing.

--
James Harris


From James Harris@21:1/5 to Bart on Mon Nov 7 18:16:58 2022

On 07/11/2022 14:23, Bart wrote:

On 07/11/2022 11:55, James Harris wrote:

..

++E++

This may not work, or not work as expected. The binding using my above
scheme means this is equivalent to ++(E++). But the second ++ requires
an lvalue as input, and yields an rvalue, which would be an invalid
input to the first ++.

Yes. If

++E++

is going to be permitted then for programmer sanity wouldn't it be true
to say that both ++ operators need to refer to the same lvalue? If so then

++p

should probably have higher precedence than

p++

or perhaps their precedences could be the same but they be applied in left-to-right order.
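
As a rough illustration of why applying the prefix operator first keeps both operators on the same lvalue, here is a small C sketch. C itself rejects `(++E)++`, so the two operators are modelled as hypothetical functions: `pre_inc` takes an address and yields the same address (lvalue in, lvalue out), while `post_inc` takes an address and yields the old value (lvalue in, rvalue out). The names and the model are assumptions for illustration only.

```c
/* Prefix ++ modelled as lvalue -> lvalue: increment, then yield the address. */
int *pre_inc(int *lv)
{
    *lv += 1;
    return lv;
}

/* Postfix ++ modelled as lvalue -> rvalue: yield the old value, then increment. */
int post_inc(int *lv)
{
    int old = *lv;
    *lv += 1;
    return old;
}

/* "++E++" read as (++E)++ : both operators act on the same object. */
int pre_then_post(int *e)
{
    return post_inc(pre_inc(e));
}
```

Under this model, with E starting at 5 the expression yields 6 and leaves E at 7. The opposite grouping, ++(E++), has no counterpart here because `post_inc` returns a plain value that `pre_inc` cannot accept.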

It may be worth looking at other operators which take in AND produce
lvalues, most familiarly array indexing and field referencing, and hence
they can be incremented. Isn't it true that for both ++ operators of

++points.x[1]
points.x[1]++

that a programmer would normally want points.x[1] incremented, i.e.
field referencing and array indexing would take precedence over either
++ operator?

But now what about dereference? Should it also take precedence over the
++ operators or should it come after one or both? For instance, what
should the following mean?

++p^

Should it be

(++p)^

or

++(p^)

?

Which interpretation would programmers prefer? Frankly, I don't know
which would be best. :(
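
For comparison, the two readings can be written in C, assuming `^` behaves like C's unary `*`: `(++p)^` corresponds to `*++p` and `++(p^)` corresponds to `++*p`. The function names below are hypothetical.

```c
/* (++p)^ : advance the pointer, then read what it now points to. */
int advance_then_deref(int **pp)
{
    int *p = *pp;
    int v = *++p;   /* pointer moves; the pointed-to data is unchanged */
    *pp = p;        /* write the advanced pointer back to the caller   */
    return v;
}

/* ++(p^) : leave the pointer alone, increment the object it points to. */
int incr_target(int **pp)
{
    int *p = *pp;
    int v = ++*p;   /* data changes; the pointer is unchanged */
    *pp = p;
    return v;
}
```

Given an array holding 10 and 20 with p at the first element, the first reading yields 20 and moves p, while the second yields 11 and leaves p where it was.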

--
James Harris


From James Harris@21:1/5 to David Brown on Mon Nov 7 19:24:07 2022

On 07/11/2022 14:58, David Brown wrote:

On 07/11/2022 12:55, James Harris wrote:

A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.

The basic question is: If ^ is a postfix dereference operator then
what should be the relative precedences of the following (where E is
any subexpression)?

   ++E
   E++
   E^

(The same goes for -- but to make description easier I'll mention only
++.)

Taking a step back and considering general expression evaluation I
have, so far, been defining the apparent order. And I'd like to
continue with that. So it should be possible to combine multiple ++
operators arbitrarily. For example,

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something
a programmer can rely on.

I disagree entirely

Good. :)

- unless you include giving an error message saying
the programmer should be fired for writing gibberish as "well defined
and something you can rely on". I can appreciate not wanting such
things to be run-time undefined behaviour, but there is no reason at all
to insist that it is acceptable by the compiler.

As I said to Dmitry, if one wants to prohibit the above then one has to
define what exactly is being prohibited and to be careful not thereby to prohibit something else that may be more legitimate. Further, such a prohibition is an additional rule the programmer has to learn.

All in all, ISTM better to define such expressions. The programmer is
not forced to use them but at least if they are present in code and well defined then their meaning will be plain.

Take the first one,

++E + E++

It could be defined fairly easily. If operands to + are defined to
appear as though they were evaluated left then right and the ++
operators are set to be of higher precedence and defined to take effect
as soon as they are evaluated then

++E + E++

would evaluate as though the operations were

++E; E++; +

If E were a variable of value 5 then the result would be

6; 6++; + ===> 12 with E ending as 7

E&OE the expression is not actually all that hard to parse if the rules
are simple.
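
The proposed left-to-right rule can be sketched in C by making each step a separate statement. This is only a model of the rule described above, with hypothetical helper names; it is not how C evaluates `++E + E++`, which C leaves undefined.

```c
/* Prefix ++ : increment, then yield the new value. */
int pre_inc(int *e)
{
    *e += 1;
    return *e;
}

/* Postfix ++ : yield the old value, then increment. */
int post_inc(int *e)
{
    int old = *e;
    *e += 1;
    return old;
}

/* "++E + E++" with operands evaluated strictly left then right,
   each ++ taking effect as soon as it is evaluated. */
int sum_pre_post(int *e)
{
    int left = pre_inc(e);    /* E: 5 -> 6, left operand is 6  */
    int right = post_inc(e);  /* right operand is 6, E: 6 -> 7 */
    return left + right;
}
```

Starting from E = 5 this returns 12 and leaves E at 7, matching the worked figures above.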

Setting that aside aside ... and going back to the query, what should
be the relative precedences of the three operators? For example, how
should the following be evaluated?

   ++E++^
   ++E^++

Or should some ways of combining ^ with either or both of the ++
operators be prohibited because they make code too difficult to
understand?!!

Make it a syntax error.

Why? What's so wrong with it? AISI if all three operators have the
requisite number of operands then how can it be an error in syntax?

..

On the other
hand, you /do/ have an obligation to try to catch mistakes, typos, and accidental errors in code.

Is it at least partially true that C defines a bunch of expressions as
UB because the rules were not clearly specified initially and different compilers chose different interpretations?

With a new language I cannot see why you might be against clear
definition. I am aware that it might make optimisation harder to achieve
but that would only apply in some cases and is still, IMO, better than
simply saying "that's not defined".

IOW I welcome your disagreement but don't understand it!

--
James Harris


From David Brown@21:1/5 to James Harris on Mon Nov 7 20:45:19 2022

On 07/11/2022 18:32, James Harris wrote:

On 07/11/2022 16:43, David Brown wrote:

On 07/11/2022 16:06, Bart wrote:

..

But I'm sure you know this already.

What you have given are the interpretations for C and similar
languages, operating on arithmetic operands. Other languages may have
different meanings for the symbols. Even if the OP's language gives
the same meaning to the operators for integers, it might mean
something different for other types - including the possibility of
operator overloads for user types.

If precedences were to vary with operand types then expressions would be
very hard for programmers to read so IMO it's important for program readability that precedences go with the operators and that they are independent of the types of the operands. If a programmer didn't know
what order

a + b * c

would be evaluated in until he looked up the types then even simple
programs would be very confusing.

Agreed - but it doesn't make it impossible to use.

And precedence is not the only feature of operators. For example, in C
and C++, the && and || operators have the additional "short-circuit"
property where the second operand is evaluated if and only if necessary, depending on the result of the first operand. But if you overload these operators for your own types in C++, this is not the case - they act
like normal two-input functions and evaluate (without sequencing) both operands.
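
The difference can be seen even in plain C. C has no operator overloading, so in this sketch an ordinary function `and_fn` stands in for an overloaded `&&`; that substitution, and all the names, are assumptions made for illustration.

```c
int calls = 0;  /* counts how many operands were actually evaluated */

/* Records that it was evaluated, then passes its value through. */
int probe(int v)
{
    calls++;
    return v;
}

/* Stand-in for an overloaded &&: an ordinary two-argument function. */
int and_fn(int a, int b)
{
    return a && b;
}

/* Built-in &&: the right operand is skipped when the left is false. */
int count_builtin(void)
{
    calls = 0;
    (void)(probe(0) && probe(1));
    return calls;
}

/* Function form: both arguments are evaluated before the call. */
int count_function(void)
{
    calls = 0;
    (void)and_fn(probe(0), probe(1));
    return calls;
}
```

`count_builtin` reports one evaluation and `count_function` reports two, even though both forms compute the same truth value.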

Some languages also allow you to make your own operators, perhaps also
using non-ASCII symbols. That will make some aspects of the language
more complex, but would also allow neater and clearer user code in some
cases.

(I'm not arguing for or against such things in /your/ language, merely
pointing out the possibilities.)


From David Brown@21:1/5 to James Harris on Mon Nov 7 21:59:40 2022

On 07/11/2022 20:24, James Harris wrote:

On 07/11/2022 14:58, David Brown wrote:

On 07/11/2022 12:55, James Harris wrote:

A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.

The basic question is: If ^ is a postfix dereference operator then
what should be the relative precedences of the following (where E is
any subexpression)?

   ++E
   E++
   E^

(The same goes for -- but to make description easier I'll mention
only ++.)

Taking a step back and considering general expression evaluation I
have, so far, been defining the apparent order. And I'd like to
continue with that. So it should be possible to combine multiple ++
operators arbitrarily. For example,

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something
a programmer can rely on.

I disagree entirely

Good. :)

- unless you include giving an error message saying the programmer
should be fired for writing gibberish as "well defined and something
you can rely on". I can appreciate not wanting such things to be
run-time undefined behaviour, but there is no reason at all to insist
that it is acceptable by the compiler.

As I said to Dmitry, if one wants to prohibit the above then one has to define what exactly is being prohibited and to be careful not thereby to prohibit something else that may be more legitimate. Further, such a prohibition is an additional rule the programmer has to learn.

No one said this was easy! Though Dmitry had some suggestions of rules
to try.

These prohibitions aren't really additional rules for the programmer to
learn - it is primarily about disallowing things that a good programmer
is not going to write in the first place. No one should actually care
if "++E++" is allowed or not, because they should never write it.
Prohibiting it means you don't have to specify the order these operators
are applied, or whether the expression must be evaluated for
side-effects twice, or any of the rest of it. The only people that will
have to learn something extra are the sort of programmers who think it
is smart to write line noise.

All in all, ISTM better to define such expressions. The programmer is
not forced to use them but at least if they are present in code and well defined then their meaning will be plain.

No, the meaning will /not/ be plain. That's the point. Ideally you
should only allow constructs that do exactly what they appear to do,
without the reader having to study the manuals to understand some indecipherable gibberish that is technically legal code but completely
alien to them because no sane programmer would write it.

Take the first one,

++E + E++

It could be defined fairly easily. If operands to + are defined to
appear as though they were evaluated left then right and the ++
operators are set to be of higher precedence and defined to take effect
as soon as they are evaluated then

++E + E++

would evaluate as though the operations were

++E; E++; +

Then define it as "syntax error" and insist the programmer writes it
sensibly.

I cannot conceive of a reason to have a pre-increment operator in a
modern language, nor would I want post-increment to return a value (nor
any other kind of assignment). Ban side-effects in expressions -
require a statement. "x = y + 1;" is a statement, so it can affect "x".
"y++;" is a statement - a convenient abbreviation for "y = y + 1;".
"++x" no longer exists, and "x + x++;" makes no sense because it mixes
an expression and a statement.

What is the cost? The programmer might have to split things into a few
lines - but we have much bigger screens and vastly bigger disks than the
days when C was born. The programmer might need a few extra temporary variables - these are free with modern compiler techniques.
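
As a small sketch of that style in C, here is a copy loop written without relying on the value of any increment or assignment. The function name `copy_ints` is hypothetical; the terse form being avoided is the familiar `*dst++ = *src++`.

```c
#include <stddef.h>

/* Copies n ints using one side effect per statement. */
void copy_ints(int *dst, const int *src, size_t n)
{
    size_t i = 0;
    while (i < n) {
        int tmp = src[i];  /* expression only reads         */
        dst[i] = tmp;      /* one store, as a statement     */
        i++;               /* one increment, value not used */
    }
}
```

The temporary costs nothing with an optimising compiler, and each line changes exactly one thing.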

Ask yourself why "++x;" and the like exist in languages like C. The
reason is that early compilers were weak - they were close to dumb
translators into assembly, and if you wanted efficient results using the features of the target processor, you needed to write your code in a way
that mimicked the actual processor instructions. "INC A" was faster
than "ADD A, 1", so you write "x++" rather than "x = x + 1". This is no
longer the case in the modern world.

If E were a variable of value 5 then the result would be

6; 6++; + ===> 12 with E ending as 7

E&OE the expression is not actually all that hard to parse if the rules
are simple.

Setting that aside aside ... and going back to the query, what should
be the relative precedences of the three operators? For example, how
should the following be evaluated?

   ++E++^
   ++E^++

Or should some ways of combining ^ with either or both of the ++
operators be prohibited because they make code too difficult to
understand?!!

Make it a syntax error.

Why? What's so wrong with it? AISI if all three operators have the
requisite number of operands then how can it be an error in syntax?

/You/ are designing the syntax. You don't have to accept such
meaningless drivel in the code - /you/ can choose to make it a syntax
error. You can't pretend that it is a useful or intuitive code to human
eyes, so why make it legal for the compiler?

On the other hand, you /do/ have an obligation to try to catch
mistakes, typos, and accidental errors in code.

Is it at least partially true that C defines a bunch of expressions as
UB because the rules were not clearly specified initially and different compilers chose different interpretations?

Not really, no. Such cases are more often "implementation dependent".
Things are more often "undefined behaviour" because there is no sensible
way to define the behaviour, or no efficient way to implement defined behaviour, or where making it UB gives more benefits (such as
optimisation opportunities or debugging/warnings/error checking/run-time checks) than giving it a definition that would mostly be wrong. There
are a few cases of UB in C that could be better as compile-time errors or implementation-dependent behaviour.

With a new language I cannot see why you might be against clear
definition.

I am /for/ a clear definition - I recommend it be defined as a
compile-time error. That /is/ a clear definition, it is not undefined behaviour.

(There are also situations where I think "undefined behaviour" is better
than defined behaviour, but that would be a different thread.)

I am aware that it might make optimisation harder to achieve
but that would only apply in some cases and is still, IMO, better than
simply saying "that's not defined".

IOW I welcome your disagreement but don't understand it!

I think it is great that you are happy to discuss this, and I try my best
to explain it.


From Bart@21:1/5 to David Brown on Tue Nov 8 00:15:37 2022

On 07/11/2022 16:34, David Brown wrote:

On 07/11/2022 16:23, Bart wrote:

gcc accepts this C code (when E, V are both ints):

     ++E + E++;
     V = V++;

That's like saying that you can hit a screw with a hammer. Use the tool properly, and you will see the complaints. gcc is a C compiler, not
some kind of "official" guide to the language, and everyone knows that without flags it is far too accepting of code that has undefined
behaviour or is otherwise clearly wrong even in cases that can be
spotted easily. With even basic warning flags enabled, these are marked.

(You've had this explained to you a few hundred times over the last
decade or so. I know you get some kind of perverse pleasure out of finding
any way of making C and/or gcc look bad in your own eyes,

Well, isn't it? You recommended that a new language doesn't allow it,
but C does anyway, or at least its implementations do so.

(Unless you go out of /your/ way to ensure it doesn't pass. But you'd be
better off avoiding such code. There are a million ways of writing
nonsense code that cannot be prohibited by a compiler.)

It won't accept ++E++ because the first ++ expects an lvalue. Probably
the same will happen when you try and implement it elsewhere. So no
actual need to prohibit in the language - it just won't work.

That makes /no/ sense. If by "it just won't work"

I mean that you will not get any C compilers to get it to work: all
report hard errors, and will not generate any code.

All the errors mention that some operand is not an lvalue. You don't
really need a special rule in grammar to prohibit certain combinations
of expressions.

For the same reasons, it won't work in other languages unless they have
very different interpretations of what ++ means.

Now compare this kind of unequivocal error report with the wishy-washy handling of C compilers of those other two lines:

3 compilers pass them with no comment
1 compiler reports only a warning (and an invisible one to boot: Clang
shows certain messages in a light grey font, exactly the colour of my
console background!)

So no reason to prohibit anything; it is perfectly well-defined.

There is good reason to prohibit it - you got it wrong, so despite being well-defined by the language, it is not clear code.

The actual meaning of "++*(P++);" is :

    1. Remember the original value of P - call it P_orig
    2. Increment P (that is, add sizeof(*P) to it).
    3. Increment the int at the location pointed to by P_orig.
    4. The value of the expression is the new updated value pointed to by P_orig.
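
The four steps can be written out in C next to the terse form. Here P is passed by address so the functions can update it; the function names are hypothetical.

```c
/* The four steps of "++*(P++)" spelled out one statement at a time. */
int inc_via_post(int **pp)
{
    int *p_orig = *pp;   /* 1. remember the original value of P      */
    *pp = p_orig + 1;    /* 2. increment P by one element            */
    *p_orig += 1;        /* 3. increment the int at P_orig           */
    return *p_orig;      /* 4. the result is the updated int's value */
}

/* The same operation written as the single expression. */
int inc_via_post_terse(int **pp)
{
    return ++*((*pp)++);
}
```

For an array holding 10 and 20 with P at the first element, both versions return 11, leave the array as 11 and 20, and leave P at the second element.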

So, the meaning is that. The point is, it's well-defined and makes
sense. It may be confusing to look at, but look at ANY C source and you
will see complex expressions that are much harder to grok, like:

OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF;
}

(Is it even an expression? I /think/ it's a function definition.)

So why single out increment operators? Because I got ++P confused with
P++ for a second? Then let's ban those two varieties of increment op too!

Note that ++*(P++) is equivalent to:

*(P += 1) += 1;

Do we ban this or not? (My language doesn't allow this, but again it's a
type issue because `+:=` doesn't return a value.)

No specific ordering of the two increments is implied here - they can be
done in either order.

As I said in another post, that would be perverse.

In C, prohibitions against such code come from "constraints", which are
not part of the BNF grammar rules, but come before any kind of type checking. Whether an expression is an "rvalue", a "modifiable lvalue",
"a non-modifiable lvalue", or other classification, is not part of the
type system.

That's up to the implementation. In my compilers including for C,
validating lvalues is part of the type-checking.


From James Harris@21:1/5 to Dmitry A. Kazakov on Tue Nov 8 08:04:54 2022

On 07/11/2022 17:53, Dmitry A. Kazakov wrote:

On 2022-11-07 18:26, James Harris wrote:

On 07/11/2022 13:43, Dmitry A. Kazakov wrote:

On 2022-11-07 13:52, James Harris wrote:

On 07/11/2022 12:22, Dmitry A. Kazakov wrote:

On 2022-11-07 12:55, James Harris wrote:

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so
something a programmer can rely on.

One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.

That's all very well but what specifically would you prohibit?

Your very question was about some arbitrary sequence of operators you
fail to give a meaning! Stop right here. (:-))

It's easy to dislike a certain sequence of operators. It's harder to
define rules for their prohibition.

1. Reduce the number of precedence levels to logical, additive,
multiplicative, highest order.

2. Require parentheses for mixed operations at the same level (except
for * and /)

3. No side effects of operators.

Good suggestions, especially ruling out operators with side effects. You wouldn't believe how much trouble they've been giving me. (It would be
alright if one was willing to make a language out of all kinds of odd
features but not if one wants a language to be cohesive.)

I like the simplicity of the language which would result from your
suggestions but can't help but think they would make programming in it
less comfortable, like the simplicity of a hair shirt. ;)

--
James Harris


From David Brown@21:1/5 to Bart on Tue Nov 8 09:33:49 2022

On 08/11/2022 01:15, Bart wrote:

On 07/11/2022 16:34, David Brown wrote:

On 07/11/2022 16:23, Bart wrote:

gcc accepts this C code (when E, V are both ints):

     ++E + E++;
     V = V++;

That's like saying that you can hit a screw with a hammer. Use the
tool properly, and you will see the complaints. gcc is a C compiler,
not some kind of "official" guide to the language, and everyone knows
that without flags it is far too accepting of code that has undefined
behaviour or is otherwise clearly wrong even in cases that can be
spotted easily. With even basic warning flags enabled, these are marked.

(You've had this explained to you a few hundred times over the last
decade or so. I know you get some kind of perverse pleasure out of
finding any way of making C and/or gcc look bad in your own eyes,

Well, isn't it? You recommended that a new language doesn't allow it,
but C does anyway, or at least its implementations do so.

C is a language from 5 decades ago, and has its flaws. Learn from them
and avoid repeating them in new languages. gcc is a tool for compiling programs written in C that is essential for an enormous mass of existing
code. A major feature is that it must remain compatible with existing
usage and existing code, when used in the same way as before - this
greatly restricts the features it can provide by default. So you need
to use compiler flags to change default behaviour.

How can this be difficult for you to understand? I don't believe it is.
You are a smart guy - yet you insist on playing the fool, again and
again. For years, you have posted to comp.lang.c with your hatred and misunderstandings about C and its tools, sometimes deliberately trying
to confuse and mislead others who are newer to the language. Please
don't do it here too.

(Unless you go out of /your/ way to ensure it doesn't pass. But you'd be better off avoiding such code. There are a million ways of writing
nonsense code that cannot be prohibited by a compiler.)

Yes, because "gcc -Wall" is /so/ hard to write. I mean, it takes hours
extra work, far out of your way. Write yourself a batch file with gcc
flags - you could have done it 20 years ago and saved yourself and
everyone else enormous effort.

A major point of a good programming language - aided by good tools - is
to reduce the amount of bad code that is accepted. Of course people
could write perfect code without the help from tools, but people are
usually imperfect and write bad code sometimes, knowingly or
unknowingly. As you say, a compiler can't prohibit all nonsense code,
but languages and tools can be designed to do their best.

It won't accept ++E++ because the first ++ expects an lvalue.
Probably the same will happen when you try and implement it
elsewhere. So no actual need to prohibit in the language - it just
won't work.

That makes /no/ sense. If by "it just won't work"

I mean that you will not get any C compilers to get it to work: all
report hard errors, and will not generate any code.

So it is prohibited by the language.

All the errors mention that some operand is not an lvalue. You don't
really need a special rule in grammar to prohibit certain combinations
of expressions.

No, indeed you don't - because it is prohibited by the language.

For the same reasons, it won't work in other languages unless they have
very different interpretations of what ++ means.

Now compare this kind of unequivocal error report with the wishy-washy handling of C compilers of those other two lines:

No, let's not. We are not talking about C - we are talking about how
the OP might handle such code in /his/ language. And if you want to
talk about C, grow up and use proper tools in a proper manner.

So no reason to prohibit anything; it is perfectly well-defined.

There is good reason to prohibit it - you got it wrong, so despite
being well-defined by the language, it is not clear code.

The actual meaning of "++*(P++);" is :

     1. Remember the original value of P - call it P_orig
     2. Increment P (that is, add sizeof(*P) to it).
     3. Increment the int at the location pointed to by P_orig.
     4. The value of the expression is the new updated value pointed
to by P_orig.

So, the meaning is that. The point is, it's well-defined and makes
sense.

It makes sense to the language - it does not make sense to humans (the
fact that /you/ got it wrong proves this, if there were any doubt).
Therefore it is not good programming. Therefore, if it is practical for
a language and/or tool to disallow it without too much other harm to the language, it should disallow it.

It may be confusing to look at, but look at ANY C source and you
will see complex expressions that are much harder to grok, like:

OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF;
         }

(Is it even an expression? I /think/ it's a function definition.)

Are you arguing that because some people write C code that is even
harder to understand, the OP should allow these nonsense expressions in
his language? That "logic" is like saying that because there are bank
robbers, people should be allowed to drunk-drive.

So why single out increment operators? Because I got ++P confused with
P++ for a second? Then let's ban those two varieties of increment op too!

No one is singling out these operators - it's just the example the OP gave.

And yes, ban ++P - it is a pointless operator now. (See my other post discussing that.)

Note that ++*(P++) is equivalent to:

*(P += 1) += 1;

No, it is not. Again, your mistakes show why it is a really bad idea to
allow these kinds of expression.
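
A short C sketch shows where the two forms part company. P is passed by address so each function can update it; the function names are hypothetical.

```c
/* ++*(P++) : increments the int P pointed to BEFORE P advanced. */
int post_form(int **pp)
{
    return ++*((*pp)++);
}

/* *(P += 1) += 1 : advances P first, then increments the int at the NEW P. */
int compound_form(int **pp)
{
    return *(*pp += 1) += 1;
}
```

With an array holding 10 and 20 and P at the first element, `post_form` returns 11 and changes the first element, whereas `compound_form` returns 21 and changes the second. Both leave P at the second element, which is the only thing they have in common.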

Do we ban this or not? (My language doesn't allow this, but again it's a
type issue because `+:=` doesn't return a value.)

Good. Assignment should be a statement, not an expression, and should
not return a value.

No specific ordering of the two increments is implied here - they can
be done in either order.

As I said in another post, that would be perverse.

In C, prohibitions against such code come from "constraints", which
are not part of the BNF grammar rules, but come before any kind of
type checking. Whether an expression is an "rvalue", a "modifiable
lvalue", "a non-modifiable lvalue", or other classification, is not
part of the type system.

That's up to the implementation. In my compilers including for C,
validating lvalues is part of the type-checking.

You can mix up phases of translation as much as you want, and you can
happily check lvalue/rvalue constraints in the same part of the code as
you do type checking. But that does not make lvalue/rvalue
classification anything to do with types in C.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dmitry A. Kazakov@21:1/5 to James Harris on Tue Nov 8 09:23:26 2022

On 2022-11-08 09:04, James Harris wrote:

I like the simplicity of the language which would result from your suggestions but can't help but think they would make programming in it
less comfortable, like the simplicity of a hair shirt. ;)

If you think programmers are dying to write stuff like *p++=*q++; you
are wrong. Actually that train left the station. Today C++ fun is
templates. It is monstrous instantiations over instantiations barely
resembling program code. Modern times is a glorious combination of
Python performance with K&R C readability! (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From James Harris@21:1/5 to James Harris on Tue Nov 8 11:00:58 2022

On 07/11/2022 18:16, James Harris wrote:

..

But now what about dereference? Should it also take precedence over the
++ operators or should it come after one or both? For instance, what
should the following mean?

++p^

Should it be

(++p)^

or

++(p^)

?

Lightbulb moment! It occurred to me last night that although dereference
"is about" lvalues it doesn't actually take in an lvalue; it takes an
rvalue (i.e. if supplied an lvalue it will be 'converted' to an rvalue
before being input to the dereference operation). I had it wrongly
classified in my operators spreadsheet. Yet that feature of dereference
may help suggest where its precedence should be put relative to the ++
operators, i.e. it should come after both of them.

If so, that makes the order

prefix ++ ;lvalue -> lvalue
postfix ++ ;lvalue -> rvalue
^ (dereference) ;rvalue ->

That may be the solution: to put those three operators in that order
relative to each other. I'll have to see how it would work out in
practice but it is certainly a decision with logical underpinnings.

Feel free to tell me if that order of precedences is bad!

I should say I omitted what dereference produces as the lvalue it
produces is lexically unrelated to any variable in the expression.

--
James Harris

From Bart@21:1/5 to David Brown on Tue Nov 8 12:43:02 2022

On 08/11/2022 08:33, David Brown wrote:

On 08/11/2022 01:15, Bart wrote:

(Unless you go out of /your/ way to ensure it doesn't pass. But you'd
be better off avoiding such code. There are a million ways of writing
nonsense code that cannot be prohibited by a compiler.)

Yes, because "gcc -Wall" is /so/ hard to write.

And it's SO hard for a compiler to just use that as a default! So it
stays safe for EVERYONE no matter how they invoke the compiler.

Take a function like this which I consider much more dangerous than
anything we've been discussing:

void fred() {}

My bcc compiler gives a hard error: "() params are not allowed". But
this works:

gcc c.c -c

OK, I'll have to write -Wall as you say:

gcc -Wall c.c -c

But, it still passes!

(So much existing code wrongly uses () to mean no parameters - thanks no
doubt to gcc's lax approach over decades - that I have to give bcc a
special option to enable it when it comes up.)

I mean, it takes hours
extra work, far out of your way. Write yourself a batch file with gcc
flags - you could have done it 20 years ago and saved yourself and
everyone else enormous effort.

Why do you expect people to have to themselves implement a chunk of the compiler they're using?

And have to do so for every compiler - at one time I was using 7 or 8.
ALL of them should be doing their jobs properly without being told.

A major point of a good programming language - aided by good tools - is
to reduce the amount of bad code that is accepted.

Yeah. In my language, A[i] only works when A is an array; P^ (pointer
deref) only works when P is a pointer.

Sounds obvious when put like that, but in C anything goes; allowing A^
and P[i] enables a huge amount of dangerous nonsense.

I mean that you will not get any C compilers to get it to work: all
report hard errors, and will not generate any code.

So it is prohibited by the language.

So is ++E + E++ prohibited by C or not? I'm none the wiser, but it
sounds like, it depends!

It may be confusing to look at, but look at ANY C source and you will
see complex expressions that are much harder to grok, like:

OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF;
}

(Is it even an expression? I /think/ it's a function definition.)

Are you arguing that because some people write C code that is even
harder to understand, the OP should allow these nonsense expressions in
his language? That "logic" is like saying that because there are bank robbers, people should be allowed to drunk-drive.

YOU are arguing that you shouldn't be allowed to compose certain
operators because the result might be confusing. But why single out
these particular ones?

That is, combinations of increment and deref. More importantly, how
exactly do you expect the language to do so? If the result is sound, syntax-wise and type-wise, what criteria do you expect it to apply?

To be clear, the expression we are talking about isn't a nonsense one at
all and is perfectly well-behaved:

a := (10, 20, 30)

p:=^a[1] # p points to the 10

++(p++^) # step p to the 20 while incrementing the 10

println a # displays (11, 20, 30)
println p^ # displays 20

(This is dynamic scripting code.)

As for ++e + e++, while I would never write such a thing, I'm not going
to lose sleep over it. If that was banned, there are 99 other ways to
write code where behaviour depends on evaluation order:

#include <stdio.h>

int f1(void) {return puts("One");}
int f2(void) {return puts("Two");}
int f3(void) {return puts("Three");}

void f(int a, int b, int c){}

int main(void) {
f(f1(), f2(), f3());
}

This displays:

Three
Two
One

with gcc and bcc. With tcc, it shows:

One
Two
Three

I know what, let's ban functions! This is what you are saying. You're
throwing out the baby with the bathwater.

From James Harris@21:1/5 to David Brown on Tue Nov 8 13:24:28 2022

On 07/11/2022 16:34, David Brown wrote:

On 07/11/2022 16:23, Bart wrote:

On 07/11/2022 14:58, David Brown wrote:

On 07/11/2022 12:55, James Harris wrote:

..

gcc accepts this C code (when E, V are both ints):

     ++E + E++;
     V = V++;

..

(You've had this explained to you a few hundred times over the last
decade or so. I know you get some kind of perverse pleasure out of finding
any way of making C and/or gcc look bad in your own eyes, but would you /please/ stop being such a petty child and stop writing things
deliberately intended to confuse, mislead or annoy others?)

Steady on, old boy! Surely we can make comments here without getting too personal.

..

   ++E++^
   ++E^++

Make it a syntax error.

The equivalent in C syntax for the first is:

     ++*(P++);

This compiles fine when P has type int* for example. It means this:

   - Increment the pointer P
   - Increment the location that P now points to (using the * deref op)

So no reason to prohibit anything; it is perfectly well-defined.

There is good reason to prohibit it - you got it wrong, so despite being well-defined by the language, it is not clear code.

The actual meaning of "++*(P++);" is :

    1. Remember the original value of P - call it P_orig
    2. Increment P (that is, add sizeof(*P) to it).
    3. Increment the int at the location pointed to by P_orig.
    4. The value of the expression is the new updated value pointed to by P_orig.

That's a good example of how legitimate code can be made to look like
gibberish by the evil programmer (tm). As I've mentioned elsewhere it's
hard to invent rules to prohibit particular constructs simply because
'we don't like the look of them' and it would make the language harder
to implement and understand if the language design included rules on 'aesthetics'.

It seems to have parallels with the free-speech debate. Free speech is
easy when we get to choose what should be free and what should be banned
- but then that's not free speech. In reality, free speech is hard
because others may be free to say things we don't like (although IMO
those who don't want to hear them should not have to listen to them ...
but that's another topic and getting off the point of the simile). In a
similar way, programming can be hard when other programmers write
constructs we don't like. I agree that it's best for a language to help programmers write readable and comprehensible programs - and even to
make them the easiest to write, if possible - but the very flexibility
which may allow them to do so may also give them the freedom to write
code we don't care for. I don't think one can legislate against that.

--
James Harris

From Andy Walker@21:1/5 to James Harris on Tue Nov 8 15:13:20 2022

On 08/11/2022 08:04, James Harris wrote:

On 07/11/2022 17:53, Dmitry A. Kazakov wrote:

1. Reduce the number of precedence levels to logical, additive,
multiplicative, highest order.

Many people expect "and" to bind more tightly than "or", so you
perhaps need [at least] two levels of logical. Somewhere between C and
hair shirts, there is perhaps some more sensible number?

2. Require parentheses for mixed operations at the same level
(except for * and /)

Don't know why the exception?

3. No side effects of operators.

How is "side effect" defined for this purpose? But in any case
you can do this if only language-defined operators are allowed, but if
you want to allow users to [re-]define their own, it's much harder.
Once you've used languages with proper operators, you'll never want to
go back! Lots of things other than numbers can sensibly be added or multiplied, even subtracted or divided.

Good suggestions, especially ruling out operators with side effects. You wouldn't believe how much trouble they've been giving me.

You can get too paranoid about side-effects. They're like many
aspects of programming; they can be used for good or evil, and on the
whole you should let programmers use them that way. Good programmers
will use them wisely, bad programmers will write bad programs no matter
how hard you try to make them write good ones.

[...]

I like the simplicity of the language which would result from your suggestions but can't help but think they would make programming in
it less comfortable, like the simplicity of a hair shirt. ;)

If operator precedence and parentheses are a problem for you,
you could always switch to Polish [or Reverse Polish] notation. Also
solves the problem of operators that take three or more operands.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Farnaby

From Dmitry A. Kazakov@21:1/5 to Andy Walker on Tue Nov 8 17:05:49 2022

On 2022-11-08 16:13, Andy Walker wrote:

On 08/11/2022 08:04, James Harris wrote:

On 07/11/2022 17:53, Dmitry A. Kazakov wrote:

1. Reduce the number of precedence levels to logical, additive,
multiplicative, highest order.

    Many people expect "and" to bind more tightly than "or", so you perhaps need [at least] two levels of logical. Somewhere between C and
hair shirts, there is perhaps some more sensible number?

There could be other operators like xor, implication etc.

2. Require parentheses for mixed operations at the same level
(except for * and /)

    Don't know why the exception?

Because traditionally - and / do not require them. Actually the rule
should be: any same-valued non-associative operator needs parentheses:

a ^ b ^ c

is illegal (assuming ^ is exponentiation).

a + b + c

is OK.

3. No side effects of operators.

    How is "side effect" defined for this purpose?

I would have a stated contract on a subprogram not to have side effects.
Only such subprograms may implement operators. One can partially enforce
this contract by checking calls statically. Should the programmer
violate the contract in some other way, the result would be a bounded
run-time error.

    You can get too paranoid about side-effects. They're like many aspects of programming; they can be used for good or evil, and on the
whole you should let programmers use them that way. Good programmers
will use them wisely, bad programmers will write bad programs no matter
how hard you try to make them write good ones.

One could consider further contracts on actual arguments in order to
disallow expressions like:

Read + Read / Read

The general rule must be that evaluation order if not explicit must not
change the result (within the margins of rounding errors and exception propagation). Though programmers would like to address exceptions too.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From Bart@21:1/5 to James Harris on Tue Nov 8 16:42:35 2022

On 07/11/2022 18:16, James Harris wrote:

On 07/11/2022 14:23, Bart wrote:

On 07/11/2022 11:55, James Harris wrote:

..

++E++

This may not work, or not work as expected. The binding using my above
scheme means this is equivalent to ++(E++). But the second ++ requires
an lvalue as input, and yields an rvalue, which would be an invalid
input to the first ++.

Yes. If

++E++

is going to be permitted

Then you need to define what it means. Here, suppose that in each case E
starts off as 100:

E++ # What value does E have afterwards?

++E # What value does E have afterwards?

X := E++ # What is the value of X?

++E++ # What is the value of E after?

X := ++E++ # What is the value of X? What is the type and value
# of the E++ portion?

I can't make ++E++ work in any of my languages because of type/lvalue discrepancies.

then for programmer sanity wouldn't it be true

to say that both ++ operators need to refer to the same lvalue? If so then

++p

should probably have higher precedence than

p++

or perhaps their precedences could be the same but they be applied in left-to-right order.

It would already be a big deal, and a vast improvement over C, that "^"
is a postfix op; don't push it!

It may be worth looking at other operators which take in AND produce
lvalues, most familiarly array indexing and field referencing, and hence
they can be incremented. Isn't it true that for both ++ operators of

++points.x[1]
points.x[1]++

that a programmer would normally want points.x[1] incremented, i.e.
field referencing and array indexing would take precedence over either
++ operator?

I'm not sure what you're asking here or where producing lvalues comes in
to it. Those examples work as expected in my language:

record R =
var x
end

points:=R((10,20,30))
println points # (10, 20, 30)

++points.x[1]
println points # (11, 20, 30)

points.x[1]++
println points # (12, 20, 30)

However, my syntax works a specific way:

* "." is not considered a normal binary operator (because it isn't).

* "[]" is not considered that either (this is more typical)

So `points.x[1]` forms a single expression term. Unary ops like `++`
work on a term. If "." was a normal binary op, then your example would
be parsed as:

(++points).x[1]

unless you make special rules just for ++.

Note, usually ++A and A++ are interchangeable. There is only different behaviour if you try to use the resulting value (the first then returns
new A, the second returns old A).

But now what about dereference? Should it also take precedence over the
++ operators or should it come after one or both? For instance, what
should the following mean?

++p^

Should it be

(++p)^

or

++(p^)

Isn't it just up to unary op evaluation? I already said how it's
typically done, so that ++p^ means ++(p^). If it's unclear, then just
use parentheses.

From David Brown@21:1/5 to James Harris on Tue Nov 8 17:29:50 2022

On 08/11/2022 14:24, James Harris wrote:

On 07/11/2022 16:34, David Brown wrote:

On 07/11/2022 16:23, Bart wrote:

The equivalent in C syntax for the first is:

     ++*(P++);

This compiles fine when P has type int* for example. It means this:

   - Increment the pointer P
   - Increment the location that P now points to (using the * deref op)

So no reason to prohibit anything; it is perfectly well-defined.

There is good reason to prohibit it - you got it wrong, so despite
being well-defined by the language, it is not clear code.

The actual meaning of "++*(P++);" is :

     1. Remember the original value of P - call it P_orig
     2. Increment P (that is, add sizeof(*P) to it).
     3. Increment the int at the location pointed to by P_orig.
     4. The value of the expression is the new updated value pointed
to by P_orig.

That's a good example of how legitimate code can be made to look like gibberish by the evil programmer (tm). As I've mentioned elsewhere it's
hard to invent rules to prohibit particular constructs simply because
'we don't like the look of them' and it would make the language harder
to implement and understand if the language design included rules on 'aesthetics'.

You can't stop everything - those evil programmers have better
imaginations than any well-meaning language designer. But you can try.
Aim to make it harder to write convoluted code, and easier to write
clearer code. And try to make the clearer code more efficient, to
reduce the temptation to write evil code.

It seems to have parallels with the free-speech debate. Free speech is
easy when we get to choose what should be free and what should be banned
- but then that's not free speech. In reality, free speech is hard
because others may be free to say things we don't like (although IMO
those who don't want to hear them should not have to listen to them ...
but that's another topic and getting off the point of the simile).

There are /always/ limits on free speech. People don't always
understand that, but they exist. Failure to enforce appropriate limits
is as bad for society as failure to allow appropriate freedom of speech.
(But I agree that there is no easy answer as to where to draw the
lines, or who should be making or enforcing these limits.)

In a
similar way, programming can be hard when other programmers write
constructs we don't like. I agree that it's best for a language to help programmers write readable and comprehensible programs - and even to
make them the easiest to write, if possible - but the very flexibility
which may allow them to do so may also give them the freedom to write
code we don't care for. I don't think one can legislate against that.

I'm not sure it is the same - after all, if some one exercises their
rights to speak gibberish, or to give long, convoluted and
incomprehensible speeches, the listener has the right to go away, ignore
them, or fall asleep. It's harder for a compiler to do that!

From David Brown@21:1/5 to Bart on Tue Nov 8 17:20:54 2022

On 08/11/2022 13:43, Bart wrote:

On 08/11/2022 08:33, David Brown wrote:

On 08/11/2022 01:15, Bart wrote:

(Unless you go out of /your/ way to ensure it doesn't pass. But you'd
be better off avoiding such code. There are a million ways of writing
nonsense code that cannot be prohibited by a compiler.)

Yes, because "gcc -Wall" is /so/ hard to write.

And it's SO hard for a compiler to just use that as a default!

Yes, it /is/ hard to have it as the default. As I said, gcc is an
essential tool for vast amounts of software. You can't change the
behaviour of such a critical tool without massive consequences.

I would be very happy to see the warnings of -Wall, and more, as hard
errors by default in gcc. But it's not feasible - it will break endless
amount of code and build scripts, some of which has not been looked at
in decades. I have been supportive of moves in gcc towards better
defaults, but it is inevitably a very slow and limited process.

You are used to your own little world of your own tools, your own
language, your own code. Perhaps you don't realise what it means to
work with other people - certainly you don't understand what it means
for a language and a toolchain to be vital for /millions/ of programs.

When you start a /new/ tool, you get to make different decisions. You
get to learn from the past. You can enable a range of warnings by
default (and if it is a new language, rather than warnings you aim to
prohibit the poor code in the language rules). You can enable
optimisation by default in your new tool, so that halfwits will no
longer compile without optimisation and complain about the quality of
the code generation. You can make things like the choice of language
standard a mandatory option, so that people can't get it wrong. (Even
better, it should be mandatory in each code file.)

Of course, anyone who has a clue how to use a computer, cares about the
quality of their coding, understands the need to learn about their
tools, can set up whatever scripts, makefiles, CFLAGS, batch files, or
other conveniences to get their particular preferences for compiler
flags simply and easily. That does mean you have to spend a few minutes effort, and it means you no longer have an excuse for endless rants.
But I think most decent developers can manage it. Those that can't, or
won't, are likely to write shite code no matter what tools and languages
they are given because they simply don't care about what they are doing.

So it
stays safe for EVERYONE no matter how they invoke the compiler.

Take a function like this which I consider much more dangerous than
anything we've been discussing:

   void fred() {}

My bcc compiler gives a hard error: "() params are not allowed". But
this works:

Listen carefully. No one gives a **** about your compiler. It's /your/
toy, just like your own language. If you like it, great. If you can
make a living from it, even better. But it does not compare to real
toolchains in any sense. You live in a different world here. You can
do /exactly/ what you like with your own tools, because they are for you
alone. Other toolchains and languages are not, and do not have the
luxury of the choices you can make.

The world runs on bad defaults - they are everywhere, in all aspects of
life and of society. We have them because they have been that way for
so long that it is impossible, or nearly impossible, to change them.
Often they were good choices when they were made, long ago. You can
sometimes make gradual changes, and you can make better choices for new
things, but you can't make big and sudden changes to things that lots of
people rely upon.

   gcc c.c -c

OK, I'll have to write -Wall as you say:

   gcc -Wall c.c -c

But, it still passes!

It is valid code - why would it not pass?

(So much existing code wrongly uses () to mean no parameters - thanks no doubt to gcc's lax approach over decades - that I have to give bcc a
special option to enable it when it comes up.)

No, existing C code uses () to mean unspecified number of parameters -
anything from zero upwards.

You claim to have made a C compiler - did you never actually look at the language standards or learn the language?

Of course, since some people (myself included) see functions declared
with empty parentheses as poor style (it is, after all, obsolescent
since C99), gcc has an option to warn about it - "-Wstrict-prototypes".
But it is legal code in a form used by a lot of old code, and not
something you are likely to type accidentally, so it is not part of the
group "-Wall" of warnings that are mostly uncontroversial.

(In the next C standard, C23, "void foo()" will mean "foo" takes no
parameters, just like in C++.)

I mean, it takes hours extra work, far out of your way. Write
yourself a batch file with gcc flags - you could have done it 20 years
ago and saved yourself and everyone else enormous effort.

Why do you expect people to have to themselves implement a chunk of the compiler they're using?

What? You complain that gcc's source is millions of lines long. How
does a few command-line options count as "a chunk of the compiler" ?

And have to do so for every compiler - at one time I was using 7 or 8.
ALL of them should be doing their jobs properly without being told.

They are. You just don't understand what their jobs are.

A major point of a good programming language - aided by good tools -
is to reduce the amount of bad code that is accepted.

Yeah. In my language, A[i] only works when A is an array; P^ (pointer
deref) only works when P is a pointer.

Sounds obvious when put like that, but in C anything goes; allowing A^
and P[i] enables a huge amount of dangerous nonsense.

No, despite your continued exaggerations, C is not "anything goes". But
it /does/ allow some constructs that other languages don't (and vice versa).

You may not have noticed, in your eagerness to condemn everything C
related, including anyone who actually understands and uses the
language, that I have repeatedly recommended that the OP /not/ copy C in
his new language. The design decision for C's subscript to be syntactic
sugar for pointer dereferencing (you can't apply it to an array in C,
despite appearances that confuse you) made sense when C was created, and
for the expected uses of the language. That decision is no longer a
good choice for a modern language.

I mean that you will not get any C compilers to get it to work: all
report hard errors, and will not generate any code.

So it is prohibited by the language.

So is ++E + E++ prohibited by C or not? I'm none the wiser, but it
sounds like, it depends!

Yes, it is prohibited by the language - as I said, and you bizarrely
claimed otherwise (saying "no actual need to prohibit in the language -
it just won't work"). It has nothing to do with types, it is in
language constraint clauses.

It may be confusing to look at, but look at ANY C source and you will
see complex expressions that are much harder to grok, like:

OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF;
          }

(Is it even an expression? I /think/ it's a function definition.)

Are you arguing that because some people write C code that is even
harder to understand, the OP should allow these nonsense expressions
in his language? That "logic" is like saying that because there are
bank robbers, people should be allowed to drunk-drive.

YOU are arguing that you shouldn't be allowed to compose certain
operators because the result might be confusing. But why single out
these particular ones?

I didn't. You are making things up.

I recommend you take a step outside to your garden. Jump up and down
and scream "I hate C" at the top of your voice, until you are hoarse and
red in the face. Get it out your system. Then come back here, stop
posting ludicrous anti-C drivel, and maybe you can go back to
contributing usefully to the discussion. You have more experience in
home-made languages than most people - try to give useful advice and
leave anything about C to people who can talk about it rationally.

From Bart@21:1/5 to All on Tue Nov 8 16:16:34 2022

From Bart@21:1/5 to David Brown on Tue Nov 8 17:46:58 2022

On 08/11/2022 16:20, David Brown wrote:

On 08/11/2022 13:43, Bart wrote:

On 08/11/2022 08:33, David Brown wrote:

On 08/11/2022 01:15, Bart wrote:

(Unless you go out of /your/ way to ensure it doesn't pass. But
you'd be better off avoiding such code. There are a million ways of
writing nonsense code that cannot be prohibited by a compiler.)

Yes, because "gcc -Wall" is /so/ hard to write.

And it's SO hard for a compiler to just use that as a default!

Yes, it /is/ hard to have it as the default.

No it isn't. And the consequences of allowing terrible, error prone
legacy code are considerable.

You need ONE new option in a compiler, example:

gcc --classic

(Or, more apt, --unsafe.)

That means that the next generation of lazy and/or newbie C programmers
who don't bother with options get used to a stricter and safer version
of the language.

But it's not feasible - it will break endless
amount of code and build scripts,

They are likely to break anyway - don't you say that you archive
complete compilers to ensure your projects can always be built?

You are used to your own little world of your own tools, your own
language, your own code. Perhaps you don't realise what it means to
work with other people - certainly you don't understand what it means
for a language and a toolchain to be vital for /millions/ of programs.

Here we are designing new languages and new tools. They don't
necessarily need to be for the mass market.

    gcc c.c -c

OK, I'll have to write -Wall as you say:

    gcc -Wall c.c -c

But, it still passes!

It is valid code - why would it not pass?

Because it's fucking stupid code:

#include <stdio.h>

int fred() {return 0;}

int main(void) {
fred(1,2,3,4,5,6,7,8,9,10);
fred("Hello, World!");
fred(fred,fred,fred(fred(fred)));
}

On what planet could all those calls to fred() be correct? All of them,
except at most one, will be wrong. And dangerous.

Yet `gcc -Wall -Wextra -Wpedantic etc etc` passes it quite happily.

That is a fucking stupid compiler.

(So much existing code wrongly uses () to mean no parameters - thanks
no doubt to gcc's lax approach over decades - that I have to give bcc
a special option to enable it when it comes up.)

No, existing C code uses () to mean unspecified number of parameters - anything from zero upwards.

No, all the C code I've seen routinely uses () to mean zero parameters
only. The problem with that is that any number of parameters of any
types can be passed, clearly incorrectly, and it cannot be detected.

Code that uses () correctly (normally associated with function pointers)
needs to ensure that the call and the callee match in argument counts
and types. That's why it is dangerous. But this use is unusual.

(In my language, that is achieved with explicit function pointer casts.)

You claim to have made a C compiler - did you never actually look at the language standards or learn the language?

I made a compiler for a subset of C - minus some features. () parameters
need to be enabled by a legacy switch like the one I mentioned, in my
case called '-old'.

(In the next C standard, C23, "void foo()" will mean "foo" takes no parameters, just like in C++.)

Will my nonsense program still pass?

I mean, it takes hours extra work, far out of your way. Write
yourself a batch file with gcc flags - you could have done it 20
years ago and saved yourself and everyone else enormous effort.

Why do you expect people to have to themselves implement a chunk of
the compiler they're using?

What? You complain that gcc's source is millions of lines long. How
does a few command-line options count as "a chunk of the compiler" ?

By using 1000s of options to control every aspect of the process. The
options form a mini-DSL to build a custom dialect of a language.

No, despite your continued exaggerations, C is not "anything goes". But
it /does/ allow some constructs that other languages don't (and vice
versa).

You may not have noticed, in your eagerness to condemn everything C
related, including anyone who actually understands and uses the
language, that I have repeatedly recommended that the OP /not/ copy C in
his new language. The design decision for C's subscript to be syntactic sugar for pointer dereferencing (you can't apply it to an array in C,
despite appearances that confuse you)

You mean appearances like this:

int A[10];
int x;

x = *A;

gcc is happy with this.

So is ++E + E++ prohibited by C or not? I'm none the wiser, but it
sounds like, it depends!

Yes, it is prohibited by the language

Yet no compiler stopped me creating an executable. So why is ++E++ a
hard error and not ++E + E++?

- as I said, and you bizarrely
claimed otherwise (saying "no actual need to prohibit in the language -
it just won't work").

Yes, about ++E++. Not ++E + E++; I merely observed that gcc didn't take
the latter seriously.

It has nothing to do with types, it is in
language constraint clauses.

It may be confusing to look at, but look at ANY C source and you
will see complex expressions that are much harder to grok, like:

OP(op,3f) { F =
((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF;           }

(Is it even an expression? I /think/ it's function definition.)

Are you arguing that because some people write C code that is even
harder to understand, the OP should allow these nonsense expressions
in his language? That "logic" is like saying that because there are
bank robbers, people should be allowed to drunk-drive.

YOU are arguing that you shouldn't be allowed to compose certain
operators because the result might be confusing. But why single out
these particular ones?

I didn't. You are making things up.

You said:

Make it a syntax error.

about ++E++^ and ++E^++. Before going on to compare that syntax with
Brainfuck.

I recommend you take a step outside to your garden. Jump up and down
and scream "I hate C" at the top of your voice, until you are hoarse and
red in the face. Get it out your system.

I suggest you do the same with "I hate Bart".

I already know that the stuff I do is miles better than C, while still
being simple, low-level, small footprint and easy to build fast. Thanks
for reminding me what a quagmire it is.

Then come back here, stop
posting ludicrous anti-C drivel, and maybe you can go back to
contributing usefully to the discussion. You have more experience in home-made languages than most people - try to give useful advice and
leave anything about C to people who can talk about it rationally.

This is not the C group. C comes up tangentially from time to time. But
I believe it was mostly you who wholesale dragged C and its compilers
into the discussion.

Please do not reply to this. I'm not interested in taking it any further.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dmitry A. Kazakov@21:1/5 to Bart on Tue Nov 8 19:18:14 2022

On 2022-11-08 18:46, Bart wrote:

No it isn't. And the consequences of allowing terrible, error prone
legacy code are considerable.

Wrong. Backward compatibility trumps everything, absolutely everything.
Legacy C, FORTRAN, COBOL code is far more stable than whatever new garbage.

Unless your new language supports strong typing, contracts and formal verification, I'd better take old C code, than newly introduced fancy bugs.

From a new language I expect new technological level. So long you guys
are keeping on reinventing C, I'd better stay with C.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From Bart@21:1/5 to Dmitry A. Kazakov on Tue Nov 8 19:15:50 2022

On 08/11/2022 18:18, Dmitry A. Kazakov wrote:

On 2022-11-08 18:46, Bart wrote:

No it isn't. And the consequences of allowing terrible, error prone
legacy code are considerable.

Wrong. Backward compatibility trumps everything, absolutely everything. Legacy C, FORTRAN, COBOL code is far more stable than whatever new garbage.

Unless your new language supports strong typing, contracts and formal verification, I'd better take old C code, than newly introduced fancy bugs.

From a new language I expect new technological level. So long you guys
are keeping on reinventing C, I'd better stay with C.

There are plenty of newer, more ground-breaking languages around that
might suit you: Rust, Zig, Odin, Dart, Julia... Or functional ones like Haskell, OCaml, F#.

The ones I create are for personal use and occupy a particular niche in
the range of possible languages. They are roughly represented by M and Q
along this line:

C--M-----Q-----------Python

My codebase is small so backward compatibility (with my own languages)
is not a great priority. I'm doing some innovative stuff with how the
compilers work and how projects are managed and run, but the feature set
of the languages themselves is not advanced, and actually less important.

You yourself said to keep the set of operations minimal. I'm doing that
with the features I have no interest in, such as over-strict type systems.


From David Brown@21:1/5 to Bart on Tue Nov 8 21:38:59 2022

On 08/11/2022 20:15, Bart wrote:

On 08/11/2022 18:18, Dmitry A. Kazakov wrote:

On 2022-11-08 18:46, Bart wrote:

No it isn't. And the consequences of allowing terrible, error prone
legacy code are considerable.

Wrong. Backward compatibility trumps everything, absolutely
everything. Legacy C, FORTRAN, COBOL code is far more stable than
whatever new garbage.

Unless your new language supports strong typing, contracts and formal
verification, I'd better take old C code, than newly introduced fancy
bugs.

From a new language I expect new technological level. So long you
guys are keeping on reinventing C, I'd better stay with C.

There are plenty of newer, more ground-breaking languages around that
might suit you: Rust, Zig, Odin, Dart, Julia... Or functional ones like Haskell, OCaml, F#.

Most of these are something very different from C. Haskell and OCaml
are, as you say, functional programming languages and a totally
different concept. (They are a lot of fun and highly educational.) I
don't know if anyone actually uses F# - it always struck me as another Microsoft me-too language, like C# but without the market popularity.

Dart is dead, AFAIK, showing the risks of picking a new language. Zig
is arguably a low-level C alternative, except it too is barely used.
Odin is - well, you know it is not a mainstream choice when it doesn't
even have a Wikipedia page. And Julia has its fans for particular uses,
which do not overlap much with the kind of task you'd do in C.

Which leaves Rust as the only new serious alternative to C these days
(along with C++). Rust has its pros and cons, but is definitely worth considering. Certainly any budding low-level language designer should
learn it and play with it, to see which features they want to copy and
which to drop. One of the nice things about Rust is that the add-on
tools for resource tracking are inspiring better analysis tools for C
and C++ - such rivalry is often useful.


From David Brown@21:1/5 to Bart on Tue Nov 8 21:28:24 2022

On 08/11/2022 18:46, Bart wrote:

On 08/11/2022 16:20, David Brown wrote:

On 08/11/2022 13:43, Bart wrote:

On 08/11/2022 08:33, David Brown wrote:

On 08/11/2022 01:15, Bart wrote:

(Unless you go out of /your/ way to ensure it doesn't pass. But
you'd be better off avoiding such code. There are a million ways of
writing nonsense code that cannot be prohibited by a compiler.)

Yes, because "gcc -Wall" is /so/ hard to write.

And it's SO hard for a compiler to just use that as a default!

Yes, it /is/ hard to have it as the default.

No it isn't. And the consequences of allowing terrible, error prone
legacy code are considerable.

You need ONE new option in a compiler, example:

   gcc --classic

(Or, more apt, --unsafe.)

That would be one way to break all existing build systems.

It really is quite simple.

(Note that I /agree/ with you entirely that it would be /better/ if gcc
was far stricter by default. I am trying to explain to you why it cannot.)

Also remember that gcc plays two significantly different roles. In one
case, it is a development tool - you use it with appropriate warnings
and flags (and other tools, such as debuggers and profilers) to help
write correct code. Here warnings are vital feedback to the developer.

The other role is as a systems compiler. This is not a concept familiar
to Windows users, but in the *nix world (and most other OS's) software
is often distributed as source. The source is, at least in theory,
known to be correct - so the compiler should be as quiet as practically possible. The assumption is that any warnings that might be given are
false positives. That has to be the default setting for compiler options.

    gcc c.c -c

OK, I'll have to write -Wall as you say:

    gcc -Wall c.c -c

But, it still passes!

It is valid code - why would it not pass?

Because it's fucking stupid code:

I agree. But it is valid C code, so any C compiler has to accept it.

    #include <stdio.h>

    int fred() {return 0;}

    int main(void) {
        fred(1,2,3,4,5,6,7,8,9,10);
        fred("Hello, World!");
        fred(fred,fred,fred(fred(fred)));
    }

On what planet could all those calls to fred() be correct? All of them, except at most one, will be wrong. And dangerous.

Now you have code that /is/ undefined behaviour. And a good enough
compiler /could/ spot that and reject it. Unfortunately, it is often
very hard to do in real-life cases, and there is little point in making
the effort to deal with pathological artificial cases.

gcc has warnings that will complain about this code. It would have been
better if these were enabled by default, or at least in -Wall.

Yet 'gcc -Wall -Wextra -Wpedantic etc etc` passes it quite happily.

That is a fucking stupid compiler.

It is a compiler that values backwards compatibility above usability,
for a language that puts backwards compatibility above new features or improvements to the language. Both the language and the toolchain
expect users to take responsibility of their code and their use of their
tools.

When I program in C, using gcc, the above code would be marked as an
error. If I can do it, so can you. It would be nice if it were easier
or more automatic, but that's how it goes.

(So much existing code wrongly uses () to mean no parameters - thanks
no doubt to gcc's lax approach over decades - that I have to give bcc
a special option to enable it when it comes up.)

No, existing C code uses () to mean unspecified number of parameters -
anything from zero upwards.

No, all the C code I've seen routinely uses () to mean zero parameters
only.

Existing code uses it to mean an unspecified number of parameters,
anything from zero upwards.

It's very likely that meaning /zero/ parameters is the most common usage (especially since that's what it means in C++, and what it will mean in
C23). But it is not the only usage. In particular, it can be a K&R
style declaration of an external function, regardless of the number of parameters. There is still K&R C code in use today - and there are
still some people who write their C in K&R style (perhaps because some
people think "The C Programming Language" is a good way to learn the
language).

The problem with that is that any number of parameters of any
types can be passed, clearly incorrectly, and it cannot be detected.

Yes. That's why function prototypes were added to the language in C90.
But non-prototype declarations could not be removed.

Code that uses () correctly (normally associated with function pointers) needs to ensure that the call and the callee match in argument counts
and types. That's why it is dangerous. But this use is unusual.

What do you mean? There is no non-obsolescent use of "()" in
declarations of functions, types, pointers, or anything else. It exists
only as a backwards compatibility feature for K&R style non-prototype
function declarations. And of course you can /call/ a function with no parameters as "foo()", either by function name or via a pointer-to-function-with-no-parameters. When you use prototype function declarations, the compiler can check that the call and callee match in
argument counts and types (assuming you use the same declarations in
each source file).
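To make the prototype point concrete, here is a minimal sketch in C. The wrapper name call_fred is made up for illustration; fred is the function from the example program, but defined with a prototype.

```c
/* Prototyped definition: "(void)" states that fred takes no arguments. */
int fred(void) { return 0; }

/* Hypothetical wrapper, for illustration only. */
int call_fred(void)
{
    /* fred(1, 2, 3);  with the prototype in scope, a conforming compiler
                       must diagnose this call (too many arguments) */
    return fred();
}
```

With "int fred()" in place of "int fred(void)", compilers for C17 and earlier have nothing to check the commented-out call against, which is the hole described above.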

(In my language, that is achieved with explicit function pointer casts.)

If you have function pointer casts, your risk of error is high and you
need to check things manually. The same applies in C - there are no implicit casts of function pointer types.

You claim to have made a C compiler - did you never actually look at
the language standards or learn the language?

I made a compiler for a subset of C - minus some features. () parameters
need to be enabled by a legacy switch like the one I mentioned, in my
case called '-old'.

If it is just a subset of C, with no attempt at conformity, then it is misleading to refer to it as a C compiler. (It can still be a useful
tool for your own use.)

(In the next C standard, C23, "void foo()" will mean "foo" takes no
parameters, just like in C++.)

Will my nonsense program still pass?

No. It would be as though you had written "int fred(void) {return 0;}"
as the first line.

I mean, it takes hours extra work, far out of your way. Write
yourself a batch file with gcc flags - you could have done it 20
years ago and saved yourself and everyone else enormous effort.

Why do you expect people to have to themselves implement a chunk of
the compiler they're using?

What? You complain that gcc's source is millions of lines long. How
does a few command-line options count as "a chunk of the compiler" ?

By using 1000s of options to control every aspect of the process. The
options form a mini-DSL to build a custom dialect of a language.

Most people don't need more than a few options, you cannot control
"every aspect", it is not a DSL and you are not making a "custom
dialect". That is all wild exaggeration. For most programmers, and
most uses, "-Wall -std=c11 -O2" will cover the basics. My work is more specialised, and I regularly need a fair number of extra flags (such as specifying details of the target processor, linking setup, etc.). I
also like to have paranoid levels of warnings and some weird
optimisation flags - but I am quite unusual in that respect.

No, despite your continued exaggerations, C is not "anything goes".
But it /does/ allow some constructs that other languages don't (and
vice versa).

You may not have noticed, in your eagerness to condemn everything C
related, including anyone who actually understands and uses the
language, that I have repeatedly recommended that the OP /not/ copy C in
his new language. The design decision for C's subscript to be
syntactic sugar for pointer dereferencing (you can't apply it to an
array in C, despite appearances that confuse you)

You mean appearances like this:

    int A[10];
    int x;

    x = *A;

gcc is happy with this.

You haven't used the subscript operator here. My point is, even though
"A" is an array, writing "x = A[2]" does not apply the subscript
operator to an array. Many people find that surprising or confusing -
but that is simply how arrays work in C.
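A small sketch of that point, assuming nothing beyond standard C. The helper name subscript_forms_agree is hypothetical. In each expression the array name is first converted to a pointer to its first element, and the subscript is then defined in terms of pointer arithmetic.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if all spellings of "element i" agree. */
int subscript_forms_agree(size_t i)
{
    int A[10];
    for (size_t k = 0; k < 10; k++)
        A[k] = (int)(k * k);

    int *p = A;                 /* A decays to a pointer to A[0] */

    /* A[i] is defined as *(A + i); since + commutes, i[A] is also valid. */
    return i < 10
        && A[i] == *(A + i)
        && A[i] == p[i]
        && A[i] == i[A];
}
```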

So is ++E + E++ prohibited by C or not? I'm none the wiser, but it
sounds like, it depends!

Yes, it is prohibited by the language

Yet no compiler stopped me creating an executable. So why is ++E++ a
hard error and not ++E + E++?

My apologies - I have mixed things up. "++E++" is prohibited by the
language. "++E + E++" is undefined behaviour - the language gives no requirements as to what it might mean. The compiler can generate code
doing anything it likes.

A complicating aspect is that most undefined behaviour is /runtime/
undefined behaviour. That is, it is only a mistake if the code in
question is actually run. This means that if the code with undefined
behaviour is never encountered at runtime, it is not an error in the
code - so the compiler is obliged to generate code for the rest of the
program, and cannot (if it is conforming) stop with an error unless it
can prove that the code will definitely be run. (In theory, it could do
that if it can see code flow from main(), but in practice compilers
don't do that - it would be a huge effort for little benefit.)
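For comparison, here is a sketch of how the intent behind "++E + E++" could be written in C with a defined result. The function name and the choice of left-to-right order are assumptions made for illustration; C itself does not pick an order for the original expression.

```c
/* Hypothetical helper: a left-to-right reading of "++E + E++",
 * with each side effect completed in its own statement. */
int sum_pre_then_post(int *e)
{
    int first  = ++*e;      /* increment, then use the new value   */
    int second = (*e)++;    /* use the current value, then increment */
    return first + second;
}
```

Starting from E = 100 this returns 101 + 101 and leaves E at 102. A language that defines the apparent order of evaluation could give the one-line form this same meaning.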

- as I said, and you bizarrely claimed otherwise (saying "no actual
need to prohibit in the language - it just won't work").

Yes, about ++E++. Not ++E + E++; I merely observed that gcc didn't take
the latter seriously.

Again, I mixed them up. gcc /does/ warn about "++E + E++", if you
enable warnings.

It has nothing to do with types, it is in language constraint clauses.

It may be confusing to look at, but look at ANY C source and you
will see complex expressions that are much harder to grok, like:

OP(op,3f) { F =
((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF;           }

(Is it even an expression? I /think/ it's function definition.)

Are you arguing that because some people write C code that is even
harder to understand, the OP should allow these nonsense expressions
in his language? That "logic" is like saying that because there are
bank robbers, people should be allowed to drunk-drive.

YOU are arguing that you shouldn't be allowed to compose certain
operators because the result might be confusing. But why single out
these particular ones?

I didn't. You are making things up.

You said:

Make it a syntax error.

about ++E++^ and ++E^++. Before going on to compare that syntax with Brainfuck.

Yes, because that is how it reads.

But at no point have I said anything to suggest I was talking about only
these operators. The OP used postfix and prefix increment as an example
- I didn't pick them. If the example had been "x ^= ~1<<-x*.1%!*y" I'd
have treated it the same way.

I recommend you take a step outside to your garden. Jump up and down
and scream "I hate C" at the top of your voice, until you are hoarse
and red in the face. Get it out your system.

I suggest you do the same with "I hate Bart".

I don't - this is not remotely personal, and I often enjoy our
discussions. But I /do/ get frustrated about having to say the same
things again and again.

I already know that the stuff I do is miles better than C, while still
being simple, low-level, small footprint and easy to build fast. Thanks
for reminding me what a quagmire it is.

The quagmire is of your own making.

Look, no one is suggesting C is perfect, or that gcc is perfect. All
along in this group, I have been arguing for people making new languages
to avoid the mistakes of C.

But it does not help anyone if you exaggerate the problems,
misunderstand the language, and steadfastly insist on taking the worst
possible view on it. I don't know your language - but I haven't the
slightest doubt that I could write as ugly, confusing and bad code in it
as you can in C. (As the saying goes, you can write FORTRAN in any
language.) What you write is /pointless/. It helps no one - not you,
not the OP.

Then come back here, stop posting ludicrous anti-C drivel, and maybe
you can go back to contributing usefully to the discussion. You have
more experience in home-made languages than most people - try to give
useful advice and leave anything about C to people who can talk about
it rationally.

This is not the C group. C comes up tangentially from time to time. But
I believe it was mostly you who wholesale dragged C and its compilers
into the discussion.

No, it was not. You were the first to mention C in this thread branch,
and the first to mention gcc.

But we are both to blame for digging further and re-running old
discussions. I apologise for my part in it, especially for some of my
less diplomatic wording. Blame it on frustration (which of course still
means blame it on me).

Please do not reply to this. I'm not interested in taking it any further.

I generally write replies inline as I read posts, so I am seeing this
request now.


From Bart@21:1/5 to David Brown on Tue Nov 8 23:06:11 2022

On 08/11/2022 20:28, David Brown wrote:

On 08/11/2022 18:46, Bart wrote:

I made a compiler for a subset of C - minus some features. ()
parameters need to be enabled by a legacy switch like the one I
mentioned, in my case called '-old'.

If it is just a subset of C, with no attempt at conformity, then it is misleading to refer to it as a C compiler. (It can still be a useful
tool for your own use.)

There must be 1000s of amateur C compiler projects, probably more than
for any other language.

Mine was able to build programs like Lua, Tiny C, Seed7 and SQLite, and
run those programs to varying degrees. So more capable than most, but I
usually refer to it in docs as a C-subset compiler.

Whatever C coding I do these days will be in that subset, and mine will
be my first choice of C compiler. Any machine-generated C will be in
that same subset.

(In the next C standard, C23, "void foo()" will mean "foo" takes no
parameters, just like in C++.)

Will my nonsense program still pass?

No. It would be as though you had written "int fred(void) {return 0;}"
as the first line.

So what happened to backwards compatibility? All those programs which
call () functions with more arguments will no longer work.


From David Brown@21:1/5 to Bart on Wed Nov 9 09:12:05 2022

On 09/11/2022 00:06, Bart wrote:

On 08/11/2022 20:28, David Brown wrote:

On 08/11/2022 18:46, Bart wrote:

I made a compiler for a subset of C - minus some features. ()
parameters need to be enabled by a legacy switch like the one I
mentioned, in my case called '-old'.

If it is just a subset of C, with no attempt at conformity, then it is
misleading to refer to it as a C compiler. (It can still be a useful
tool for your own use.)

There must be 1000s of amateur C compiler projects, probably more than
for any other language.

Mine was able to build programs like Lua, Tiny C, Seed7 and SQLite, and
run those programs to varying degrees. So more capable than most, but I usually refer to it in docs as a C-subset compiler.

OK.

Whatever C coding I do these days will be in that subset, and mine will
be my first choice of C compiler. Any machine-generated C will be in
that same subset.

Almost everyone in practice uses a subset of the language in choice. If
you are releasing a compiler for general use, you have to support
everything (even monstrosities like "longjmp"), but for your own use (or
for specialised use, such as tied to a transcompiler for a different
language), you only need the features you will use yourself.

(In the next C standard, C23, "void foo()" will mean "foo" takes no
parameters, just like in C++.)

Will my nonsense program still pass?

No. It would be as though you had written "int fred(void) {return
0;}" as the first line.

So what happened to backwards compatibility? All those programs which
call () functions with more arguments will no longer work.

I still don't understand what you mean.

int (*f)(void);
int (*g)(int, int);

void foo(void) {
    f();        // Good
    g(1, 2);    // Good
    f(1, 2);    // Error
    g();        // Error
}

Function pointers of different types - that is, different return types, different numbers or types of parameters - are incompatible in C. There
are no implicit conversions between them (unlike object pointers and
void*), there is no common base type, and any use of explicit casts will
lead to undefined behaviour if you don't cast back to the right type
before calling the function. Even the cast back and forth is
implementation dependent - an implementation could use different sizes
for different function pointer types.

In practice, of course, most implementations have the same size of
pointer for all function types. On some real-world targets it will be different from the size of object pointers (imagine a 16-bit system with "large" code model and "small" data model, or vice versa). But C still
does not allow you to mess with the function pointer types without
explicit "I know what I am doing" casts.
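A minimal sketch of the "cast back before calling" rule. The typedef names and the add function are invented for the example. It relies on the standard's rule that converting a function pointer to another function pointer type and back yields a pointer equal to the original.

```c
typedef void (*generic_fn)(void);     /* hypothetical "holder" type */
typedef int  (*binary_fn)(int, int);  /* the function's real type   */

static int add(int a, int b) { return a + b; }

int call_via_round_trip(int a, int b)
{
    generic_fn stored   = (generic_fn)add;    /* explicit cast is required */
    binary_fn  restored = (binary_fn)stored;  /* cast back to the true type */

    /* Calling stored() directly would be undefined behaviour;
     * only the restored pointer may be used for the call. */
    return restored(a, b);
}
```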


From Bart@21:1/5 to David Brown on Wed Nov 9 11:24:14 2022

On 09/11/2022 08:12, David Brown wrote:

On 09/11/2022 00:06, Bart wrote:

So what happened to backwards compatibility? All those programs which
call () functions with more arguments will no longer work.

I still don't understand what you mean.

int (*f)(void);
int (*g)(int, int);

void foo(void) {
    f();        // Good
    g(1, 2);        // Good
    f(1, 2);        // Error
    g();        // Error
}

Function pointers of different types - that is, different return types, different numbers or types of parameters - are incompatible in C. There
are no implicit conversions between them (unlike object pointers and
void*), there is no common base type, and any use of explicit casts will
lead to undefined behaviour if you don't cast back to the right type
before calling the function. Even the cast back and forth is
implementation dependent - an implementation could use different sizes
for different function pointer types.

In practice, of course, most implementations have the same size of
pointer for all function types. On some real-world targets it will be different from the size of object pointers (imagine a 16-bit system with "large" code model and "small" data model, or vice versa). But C still
does not allow you to mess with the function pointer types without
explicit "I know what I am doing" casts.

The following code is a legitimate use of () parameter lists:

#include <stdio.h>

int f1(int a) {return a;}
int f2(int a, int b) {return a+b;}
int f3(int a, int b, int c) {return a+b+c;}

int (*fntable[])() = {NULL, f3, f1, f2};
int args[] = {0, 3, 1, 2};

int main(void) {
    int n =3, x;

    switch (args[n]) {
    case 1: x=fntable[n](10); break;
    case 2: x=fntable[n](20,30); break;
    case 3: x=fntable[n](40,50,60); break;
    }

    printf("x=%d\n",x);
}

'fntable' is populated with functions of mixed signatures. When calling
one of those functions, the user-code must ensure the function pointer
is called with the right arguments for that specific function.

That is done here with the switch statement. If the () in this line:

int (*fntable[])() = {NULL, f3, f1, f2};

is assumed to be (void) in C23, then initialising with f1, f2, f3 will
be illegal, and all those calls will be too. This is why I said, what
happened to backwards compatibility.

In my language there is no equivalent to C's unchecked () parameter
list. There I would likely use the equivalent of void* pointers and
apply a cast at the point of call. The same could be done in C.


From James Harris@21:1/5 to Bart on Wed Nov 9 05:53:47 2022

On Tuesday, 8 November 2022 at 16:42:36 UTC, Bart wrote:

On 07/11/2022 18:16, James Harris wrote:

On 07/11/2022 14:23, Bart wrote:

On 07/11/2022 11:55, James Harris wrote:

..

++E++

This may not work, or not work as expected. The binding using my above
scheme means this is equivalent to ++(E++). But the second ++ requires
an lvalue as input, and yields an rvalue, which would be an invalid
input to the first ++.

Yes. If

++E++

is going to be permitted

Then you need to define what it means.

Indeed. Given the relative precedences mentioned in another reply I'd have the operations in this order:

prefix ++
postfix ++

Replies below based on that order (although I gather you have the opposite order).

Here, suppose that in each case E
starts off as 100:

E++ # What value does E have afterwards?

101

++E # What value does E have afterwards?

Also 101

X := E++ # What is the value of X?

100

++E++ # What is the value of E after?

102

(++E)++

X := ++E++ # What is the value of X? What is the type and value
# of the E++ portion?

X: 101
(E would still end up holding 102)
As a subexpression it would have the type and the intermediate value of
E (101), not of E++. The trailing ++ would not contribute to the value
of the subexpression; it would serve only to alter the stored value
after the intermediate value of 101 had flown the nest.

X := ((++E)++)
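The answers given above for X := ++E++ under the prefix-first order can be checked with a small C sketch. C itself rejects ++E++, so the two steps are spelled out by hand; the function name pre_then_post is made up for illustration.

```c
/* Hypothetical emulation of  X := ((++E)++)  with prefix ++ applied first. */
int pre_then_post(int *e)
{
    ++*e;            /* prefix ++: E goes from 100 to 101, result is the lvalue E */
    int x = *e;      /* value of the whole subexpression: 101                     */
    (*e)++;          /* postfix ++: stored value becomes 102 after x was taken    */
    return x;
}
```

Starting from E = 100, the function returns 101 and leaves E holding 102, matching the values listed above.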

I can't make ++E++ work in any of my languages because of type/lvalue discrepancies.

Is that because you evaluate postfix ++ first?

then for programmer sanity wouldn't it be true

to say that both ++ operators need to refer to the same lvalue? If so then

++p

should probably have higher precedence than

p++

or perhaps their precedences could be the same but they be applied in left-to-right order.

It would already be a big deal, and a vast improvement over C, that "^"
is a postfix op; don't push it!

:)

It may be worth looking at other operators which take in AND produce lvalues, most familiarly array indexing and field referencing, and hence they can be incremented. Isn't it true that for both ++ operators of

++points.x[1]
points.x[1]++

that a programmer would normally want points.x[1] incremented, i.e.
field referencing and array indexing would take precedence over either
++ operator?

I'm not sure what you're asking here or where producing lvalues comes in
to it.
Those examples work as expected in my language:

record R =
var x
end

points:=R((10,20,30))
println points # (10, 20, 30)

++points.x[1]
println points # (11, 20, 30)

points.x[1]++
println points # (12, 20, 30)

However, my syntax works a specific way:

* "." is not considered a normal binary operator (because it isn't).

* "[]" is not considered that either (this is more typical)

So `points.x[1]` forms a single expression term. Unary ops like `++`
work on a term. If "." was a normal binary op, then your example would
be parsed as:

(++points).x[1]

unless you make special rules just for ++.

I have "." and "[....]" as normal operators but of high precedence. They are prioritised before the others we have been talking about. Either way the effect is the same.
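For reference, C gives the same grouping: member access and subscripting bind more tightly than either ++, so both forms increment the element. A sketch, with struct R modelled loosely on the record above. Note that C indexes from 0, so x[1] here is the second element, and "->" is used because the struct is passed by pointer.

```c
struct R { int x[3]; };

/* Both parse as ++ applied to (points->x[1]), never as (++points)->x[1]. */
int pre_increment_field(struct R *points)  { return ++points->x[1]; }
int post_increment_field(struct R *points) { return points->x[1]++; }
```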

Note, usually ++A and A++ are interchangeable. There is only different behaviour if you try to use the resulting value (the first then returns
new A, the second returns old A).

But now what about dereference? Should it also take precedence over the
++ operators or should it come after one or both? For instance, what should the following mean?

++p^

Should it be

(++p)^

or

++(p^)

Isn't it just up to unary op evaluation? I already said how it's
typically done, so that ++p^ means ++(p^). If it's unclear, then just
use parentheses.

Yes, it's up to unary op evaluation order. As a result of this discussion I have gone for **E before E** before E^ and would evaluate the above as

(++p)^
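The two candidate readings of ++p^ correspond to two different C expressions, sketched below with invented function names. The pointer is passed by address so the caller can see whether the pointer or the pointed-to value changed.

```c
/* (++p)^ : advance the pointer, then dereference it. In C: *++p */
int advance_then_deref(int **p)
{
    return *++*p;
}

/* ++(p^) : leave the pointer alone, increment what it points at. In C: ++*p */
int increment_target(int **p)
{
    return ++**p;
}
```

With p pointing at the first element of {10, 20}, the first reading yields 20 and moves p; the second yields 11 and changes the array instead.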

--
James


From David Brown@21:1/5 to Bart on Wed Nov 9 16:18:55 2022

On 09/11/2022 12:24, Bart wrote:

On 09/11/2022 08:12, David Brown wrote:

On 09/11/2022 00:06, Bart wrote:

So what happened to backwards compatibility? All those programs which
call () functions with more arguments will no longer work.

I still don't understand what you mean.

int (*f)(void);
int (*g)(int, int);

void foo(void) {
     f();        // Good
     g(1, 2);        // Good
     f(1, 2);        // Error
     g();        // Error
}

Function pointers of different types - that is, different return
types, different numbers or types of parameters - are incompatible in
C. There are no implicit conversions between them (unlike object
pointers and void*), there is no common base type, and any use of
explicit casts will lead to undefined behaviour if you don't cast back
to the right type before calling the function. Even the cast back and
forth is implementation dependent - an implementation could use
different sizes for different function pointer types.

In practice, of course, most implementations have the same size of
pointer for all function types. On some real-world targets it will be
different from the size of object pointers (imagine a 16-bit system
with "large" code model and "small" data model, or vice versa). But C
still does not allow you to mess with the function pointer types
without explicit "I know what I am doing" casts.

The following code is a legitimate use of () parameter lists:

    #include <stdio.h>

    int f1(int a) {return a;}
    int f2(int a, int b) {return a+b;}
    int f3(int a, int b, int c) {return a+b+c;}

    int (*fntable[])() = {NULL, f3, f1, f2};

That is using K&R-style non-prototyped function types.

    int args[] = {0, 3, 1, 2};

    int main(void) {
        int n =3, x;

        switch (args[n]) {
        case 1: x=fntable[n](10); break;
        case 2: x=fntable[n](20,30); break;
        case 3: x=fntable[n](40,50,60); break;
        }

        printf("x=%d\n",x);
    }

'fntable' is populated with functions of mixed signatures. When calling
one of those functions, the user-code must ensure the function pointer
is called with the right arguments for that specific function.

That is done here with the switch statement. If the () in this line:

    int (*fntable[])() = {NULL, f3, f1, f2};

is assumed to be (void) in C23, then initialising with f1, f2, f3 will
be illegal, and all those calls will be too. This is why I said, what happened to backwards compatibility.

Yes, this could be a problem, and is an interesting example.

I must admit I did not realise that pointers to "old-style" function
types are considered compatible with any function pointer type that has
the same return type - that was new to me. Old-style (K&R)
function declarations have been obsolescent since C89/C90, and it is a
source of annoyance to me that such obsolescent features have been
allowed to remain in the language for /so/ long.

I can appreciate that using "int (*)()" is convenient here because it
avoids the need for casts. I would personally include the casts - using typedefs for clarity - because calling via old-style function types is
more limited. If one of your functions has a parameter of type "float",
an integer type smaller than "int", _Bool, or a character type, then the behaviour is undefined.

In my language there is no equivalent to C's unchecked () parameter
list. There I would likely use the equivalent of void* pointers and
apply a cast at the point of call. The same could be done in C.

Yes, though you would want to use "void (*)(void)" pointers in C, rather
than "void *" pointers - function pointers and object pointers are very different beasts.
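
A minimal sketch of that suggestion, reworking the fntable example above:
the table stores a generic "void (*)(void)" pointer and each call site
casts back to the function's real type before calling. The typedef names
(generic_fn, fn1, fn2, fn3) and the helper call_entry are made up for
illustration; the sample arguments are the ones from the original switch.

```c
#include <assert.h>
#include <stddef.h>

/* Generic function-pointer type: used for storage only, never called directly. */
typedef void (*generic_fn)(void);

/* The real types, one per signature (hypothetical names). */
typedef int (*fn1)(int);
typedef int (*fn2)(int, int);
typedef int (*fn3)(int, int, int);

static int f1(int a) { return a; }
static int f2(int a, int b) { return a + b; }
static int f3(int a, int b, int c) { return a + b + c; }

/* Explicit casts on the way in ... */
static const generic_fn fntable[] = {
    NULL, (generic_fn)f3, (generic_fn)f1, (generic_fn)f2
};
static const int args[] = {0, 3, 1, 2};

/* ... and a cast back to the right type at the point of call. */
int call_entry(size_t n)
{
    assert(n < sizeof args / sizeof args[0]);
    switch (args[n]) {
    case 1: return ((fn1)fntable[n])(10);
    case 2: return ((fn2)fntable[n])(20, 30);
    case 3: return ((fn3)fntable[n])(40, 50, 60);
    default: return -1;   /* empty slot */
    }
}
```

The argument-count bookkeeping is the same as before; the difference is
that every call now goes through a fully prototyped type, so the compiler
checks the arguments at each call site.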

I'm glad to see that you don't have an equivalent of C's long-outdated
unchecked function types. And I'm glad it is being dropped in C23.
(clang has warned about it by default for a while, and gcc is
considering doing so - but wary due to compatibility with existing
builds. I expect they will run a trial "rebuild everything in Debian
and see if anything breaks" before deciding.)

I would recommend that in your own code generator, you use "void
(*)(void)" pointers and put in the casts in the generated C code. If
they are already present in the source Bart-language files, it should
not be difficult.

Then you could add a hard error on any use of K&R declarations to your C compiler, and score a point over gcc :-)


From Bart@21:1/5 to James Harris on Thu Nov 10 10:42:53 2022

On 09/11/2022 13:53, James Harris wrote:

On Tuesday, 8 November 2022 at 16:42:36 UTC, Bart wrote:

++E++ # What is the value of E after?

102

(++E)++

X := ++E++ # What is the value of X? What is the type and value
# of the E++ portion?

X: 101
(E would still end up holding 102)

As a subexpression it would have the type and the intermediate value of E (101), not of E++.

This is where your approach differs from mine. It sounds like you would
also allow this:

++ ++ ++ E

so that E becomes 103? It doesn't then matter whether ++ is prefix or
postfix. If that isn't the case (yours only works with mixed
prefix/postfix), then I can only explain it like this:

++E is the same as: (E:=E+1; E) # that final E is an lvalue

E++ is the same as: (T:=E; E:=E+1; T)

the final T /is/ an lvalue, but not the right one! You can't use it to
modify E.

That wouldn't work for me anyway because T is a transient value
(typically stored on the stack, register or inaccessible temporary -
it's got to exist somewhere!) with no lvalue. It's similar to this:

A+B is the same as: (T:=A+B; T)

It's clear that ++(A+B) can't work unless you change what ++ means (eg.
++A now means (A+1) because whatever ++ modifies is not accessible).

If that's how your approach works, then it would be unorthogonal:

++(E++) works
(++E)++ doesn't

even though you'd expect E to be 102 in both cases (and to deliver 101
in both cases too). And:

++ ++ E works
E ++ ++ doesn't


From James Harris@21:1/5 to Dmitry A. Kazakov on Sun Nov 13 15:54:08 2022

On 07/11/2022 16:16, Dmitry A. Kazakov wrote:

On 2022-11-07 16:06, Bart wrote:

On 07/11/2022 13:43, Dmitry A. Kazakov wrote:

On 2022-11-07 13:52, James Harris wrote:

On 07/11/2022 12:22, Dmitry A. Kazakov wrote:

On 2022-11-07 12:55, James Harris wrote:

   ++E + E++
   ++E++
   V = V++

...

++ means cheap keyboard with broken keys or coffee spilled over it... (:-))

A bit like Ada's --, then. ;-)

--
James Harris


From Dmitry A. Kazakov@21:1/5 to James Harris on Sun Nov 13 17:26:06 2022

On 2022-11-13 16:54, James Harris wrote:

On 07/11/2022 16:16, Dmitry A. Kazakov wrote:

++ means cheap keyboard with broken keys or coffee spilled over it...
(:-))

A bit like Ada's --, then. ;-)

In Ada -- is a comment, not operator.

An interesting question regarding operator symbols is: sticking to
ASCII or going Unicode. Let's say we wanted an increment operator (I do
not). Why ++ from the 60's? Take the increment (∆) or the upwards arrow ↑ etc.

Note that all arguments against Unicode apply to operators. If the thing
is difficult to type then it is difficult to remember: precedence level,
associativity, semantics. If you can hold these in your head, you could
remember the key combination as well. If you do not, then, maybe, having
a subprogram Increment() would be a better choice?

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From James Harris@21:1/5 to David Brown on Sun Nov 13 17:11:45 2022

On 08/11/2022 16:29, David Brown wrote:

On 08/11/2022 14:24, James Harris wrote:

...

You can't stop everything - those evil programmers have better
imaginations than any well-meaning language designer. But you can try.
Aim to make it harder to write convoluted code, and easier to write clearer code. And try to make the clearer code more efficient, to
reduce the temptation to write evil code.

I agree

...

In a similar way, programming can be hard when other programmers write
constructs we don't like. I agree that it's best for a language to
help programmers write readable and comprehensible programs - and even
to make them the easiest to write, if possible - but the very
flexibility which may allow them to do so may also give them the
freedom to write code we don't care for. I don't think one can
legislate against that.

I'm not sure it is the same - after all, if someone exercises their
rights to speak gibberish, or to give long, convoluted and
incomprehensible speeches, the listener has the right to go away, ignore
them, or fall asleep. It's harder for a compiler to do that!

I'm not worried about the compiler. As long as it can make sense of a
piece of code then it can compile it. It's the human I am concerned
about, especially the poor old maintenance programmer!

--
James Harris


From James Harris@21:1/5 to Bart on Sun Nov 13 16:55:59 2022

On 10/11/2022 10:42, Bart wrote:

On 09/11/2022 13:53, James Harris wrote:

On Tuesday, 8 November 2022 at 16:42:36 UTC, Bart wrote:

++E++ # What is the value of E after?

102

(++E)++

X := ++E++ # What is the value of X? What is the type and value
# of the E++ portion?

X: 101
(E would still end up holding 102)

As a subexpression it would have the type and the intermediate value
of E (101), not of E++.

This is where your approach differs from mine. It sounds like you would
also allow this:

    ++ ++ ++ E

so that E becomes 103?

That sounds right. Prefix operators have naturally to be applied right
to left.

It looks, by the way, almost as mad as a programmer writing

E = E + 1 + 1 + 1

It doesn't then matter whether ++ is prefix or
postfix.

I don't follow. In general (e.g. if autoincrement were to be used in an expression) then it /would/ matter whether prefix or postfix were used.

If that isn't the case (yours only works with mixed
prefix/postfix), then I can only explain it like this:

   ++E is the same as: (E:=E+1; E)        # that final E is an lvalue

   E++ is the same as: (T:=E; E:=E+1; T)

Yes.

the final T /is/ an lvalue, but not the right one! You can't use it to
modify E.

AISI that T would be (or, if you prefer, would be converted to) an rvalue.

That wouldn't work for me anyway because T is a transient value
(typically stored on the stack, register or inaccessible temporary -
it's got to exist somewhere!) with no lvalue. It's similar to this:

   A+B is the same as: (T:=A+B; T)

It's clear that ++(A+B) can't work unless you change what ++ means (eg.
++A now means (A+1) because whatever ++ modifies is not accessible).

It's similar if A were a struct. You could have

A.F

but you could not have

(A + 4).F

The LHS of the . operation, in this case, has to be an lvalue.

If that's how your approach works, then it would be unorthogonal:

     ++(E++) works
     (++E)++ doesn't

I have it the other way round.

++E would consume and produce an lvalue.
E++ would consume an lvalue and produce an rvalue.

This appears to be the same as in C/C++ as mentioned at

https://en.cppreference.com/w/cpp/language/operator_incdec

even though you'd expect E to be 102 in both cases (and to deliver 101
in both cases too).

I wouldn't expect ++(E++) to work at all. E++ produces an rvalue which
would be no good as input to ++E as ++E requires an lvalue.

Prefix and postfix ++ are different operators, despite the visual
similarity.
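
One way to picture the difference, as a rough C sketch with made-up
helper names (pre_inc, post_inc): the prefix form hands back the
location, the postfix form hands back only the old value. Plain C's own
++E does not yield an lvalue, so the helpers use a pointer to stand in
for one.

```c
#include <assert.h>
#include <stddef.h>

/* Models prefix ++: consume a location, bump it, produce the same location. */
int *pre_inc(int *e)
{
    assert(e != NULL);
    *e += 1;
    return e;
}

/* Models postfix ++: consume a location, bump it, produce the OLD value only. */
int post_inc(int *e)
{
    assert(e != NULL);
    int old = *e;
    *e += 1;
    return old;
}
```

With E starting at 100, post_inc(pre_inc(&E)) models (++E)++: it yields
101 and leaves E at 102. The other nesting, pre_inc(post_inc(&E)), is
diagnosed by the compiler because post_inc produces a plain int rather
than a location, which mirrors the claim that ++(E++) should not work.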

And:

     ++ ++ E works
     E ++ ++ doesn't

That I agree with, which makes your earlier comments more puzzling. I am
not sure what you were driving at.

--
James Harris


From James Harris@21:1/5 to Dmitry A. Kazakov on Sun Nov 13 19:13:04 2022

On 13/11/2022 16:26, Dmitry A. Kazakov wrote:

On 2022-11-13 16:54, James Harris wrote:

On 07/11/2022 16:16, Dmitry A. Kazakov wrote:

++ means cheap keyboard with broken keys or coffee spilled over it...
(:-))

A bit like Ada's --, then. ;-)

In Ada -- is a comment, not operator.

Indeed.

An interesting question regarding operator symbols is: sticking to
ASCII or going Unicode. Let's say we wanted an increment operator (I do
not). Why ++ from the 60's? Take the increment (∆) or the upwards arrow ↑ etc.

As you know, I don't like Unicode for program source. It can be hard to
type, hard to read aloud, and hard to compare when two glyphs look the
same but have different encodings. A small set of characters such as
ASCII has none of those problems.

As for replacing ++ with something else I have tried things like

+> prefix increment
<+ postfix increment

or maybe ++> and <++.

But I don't know if programmers in general would care for them.

Note that all arguments against Unicode apply to operators. If the thing
is difficult to type then it is difficult to remember: precedence level,
associativity, semantics. If you can hold these in your head, you could
remember the key combination as well. If you do not, then, maybe, having
a subprogram Increment() would be a better choice?

Agreed.

--
James Harris


From James Harris@21:1/5 to David Brown on Sun Nov 13 19:05:25 2022

On 07/11/2022 20:59, David Brown wrote:

On 07/11/2022 20:24, James Harris wrote:

On 07/11/2022 14:58, David Brown wrote:

On 07/11/2022 12:55, James Harris wrote:

...

So it should be possible to combine multiple ++
operators arbitrarily. For example,

   ++E + E++
   ++E++
   V = V++

Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so
something a programmer can rely on.

I disagree entirely

Good. :)

- unless you include giving an error message saying the programmer
should be fired for writing gibberish as "well defined and something
you can rely on". I can appreciate not wanting such things to be
run-time undefined behaviour, but there is no reason at all to insist
that it is acceptable by the compiler.

As I said to Dmitry, if one wants to prohibit the above then one has
to define what exactly is being prohibited and to be careful not
thereby to prohibit something else that may be more legitimate.
Further, such a prohibition is an additional rule the programmer has
to learn.

No one said this was easy! Though Dmitry had some suggestions of rules
to try.

These prohibitions aren't really additional rules for the programmer to
learn - it is primarily about disallowing things that a good programmer
is not going to write in the first place. No one should actually care
if "++E++" is allowed or not, because they should never write it.

Yet a programmer may find such an expression in someone else's code.

Prohibiting it means you don't have to specify the order these operators
are applied, or whether the expression must be evaluated for
side-effects twice, or any of the rest of it. The only people that will have to learn something extra are the sort of programmers who think it
is smart to write line noise.

And the programmers who have to read such code.

All in all, ISTM better to define such expressions. The programmer is
not forced to use them but at least if they are present in code and
well defined then their meaning will be plain.

No, the meaning will /not/ be plain. That's the point. Ideally you
should only allow constructs that do exactly what they appear to do,
without the reader having to study the manuals to understand some indecipherable gibberish that is technically legal code but completely
alien to them because no sane programmer would write it.

There are tradeoffs. Make a language too simple and, while it will be
easy to learn, programs written in it will be indecipherable. By
contrast, make a language too complex and it becomes harder to learn and
programs written in it can still be indecipherable except to veterans.

AISI somewhere between those two extremes is a sweet spot in which a
language is reasonably easy to learn, and programs written in it would naturally be reasonably easy to understand.

Take the first one,

   ++E + E++

It could be defined fairly easily. If operands to + are defined to
appear as though they were evaluated left then right and the ++
operators are set to be of higher precedence and defined to take
effect as soon as they are evaluated then

   ++E + E++

would evaluate as though the operations were

   ++E; E++; +

Then define it as "syntax error" and insist the programmer writes it sensibly.

It's not a typical syntax error. Each operator would have the correct
number of operands.

Further, it may not be known whether the expression is well defined or
not until run time. Consider two pointers, p0 and p1.

++(*p0) + (*p1)++

AIUI in some languages if p0 and p1 point to different locations then
the expression is well defined. If they point to the same location,
however, then the effect is not well defined.

There's no need for that complexity and uncertainty. Why not, instead,
say that the expression is defined to function as though the operands
were evaluated in a specific order? Wouldn't that be easier for a
programmer to understand and rely on?
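
As a rough illustration of "defined to function as though the operands
were evaluated in a specific order", here is the left-then-right reading
of ++(*p0) + (*p1)++ spelled out as separate C statements. The function
name is made up; the point is only that this version gives the same
answer whether or not p0 and p1 alias.

```c
#include <assert.h>
#include <stddef.h>

/* Left operand first, each increment taking effect as soon as it is evaluated. */
int sum_inc_left_to_right(int *p0, int *p1)
{
    assert(p0 != NULL && p1 != NULL);

    *p0 += 1;            /* ++(*p0): increment ...                 */
    int left = *p0;      /* ... and use the new value              */

    int right = *p1;     /* (*p1)++: use the current value ...     */
    *p1 += 1;            /* ... then increment straight away       */

    return left + right;
}
```

With both pointers aimed at one variable holding 5, the result is
6 + 6 = 12 and the variable ends at 7. Writing the original one-line
expression directly in C would leave that aliased case undefined.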

I cannot conceive of a reason to have a pre-increment operator in a
modern language, nor would I want post-increment to return a value (nor
any other kind of assignment). Ban side-effects in expressions -
require a statement. "x = y + 1;" is a statement, so it can affect "x".
"y++;" is a statement - a convenient abbreviation for "y = y + 1;".
"++x" no longer exists, and "x + x++;" makes no sense because it mixes
an expression and a statement.

What is the cost? The programmer might have to split things into a few lines - but we have much bigger screens and vastly bigger disks than the
days when C was born. The programmer might need a few extra temporary variables - these are free with modern compiler techniques.

Ask yourself why "++x;" and the like exist in languages like C. The
reason is that early compilers were weak - they were close to dumb translators into assembly, and if you wanted efficient results using the features of the target processor, you needed to write your code in a way
that mimicked the actual processor instructions. "INC A" was faster
than "ADD A, 1", so you write "x++" rather than "x = x + 1". This is no longer the case in the modern world.

To be clear, my motivation for including ++x and x++ is not about any of
those things but is all about readability and (to a lesser extent)
writability. Some intentions are more naturally and more simply
expressed with autoincrement or autodecrement. If they can be used to
make code /clearer/ then ISTM it is worth doing the work to try to
include them.

Against that, however, is the concern that their inclusion may encourage programmers to write code which is unnecessarily cryptic. It /may/ be
better to require programmers to code x = x + 1 or similar instead. ATM
the jury is out.

...

I am aware that it might make optimisation harder to achieve but that
would only apply in some cases and is still, IMO, better than simply
saying "that's not defined".

IOW I welcome your disagreement but don't understand it!

I think it is great that you are happy to discuss this and I try my best
to explain it.

Thanks, I appreciate the input!

Have to say that each of these discussions that we in this group have
over what would appear to be minutiae well illustrate how much time it
can take to resolve even tiny decisions in the design of a language!

--
James Harris


From Stefan Ram@21:1/5 to James Harris on Sun Nov 13 19:30:13 2022

James Harris <james.harris.1@gmail.com> writes:

As you know, I don't like Unicode for program source. It can be hard to
type, hard to read aloud, and hard to compare when two glyphs look the
same but have different encodings. A small set of characters such as
ASCII has none of those problems.

But it can be cumbersome to escape all quotation characters
in a string, as in "\"\\" in C.

Imagine one would use some obscure Unicode characters as
string delimiters. For example,

ᒪ CANADIAN SYLLABICS MA
ᒧ CANADIAN SYLLABICS MO

. Programmers will surely find a way to map them to their
keyboards somehow. Then that string literal would be just
ᒧ"\ᒪ!

(One would just have to use escapes in the rare case that
one really needs to have those Canadian syllabics ma or mo
within a string literal.)


From James Harris@21:1/5 to Stefan Ram on Sun Nov 13 20:01:34 2022

On 13/11/2022 19:30, Stefan Ram wrote:

James Harris <james.harris.1@gmail.com> writes:

As you know, I don't like Unicode for program source. It can be hard to
type, hard to read aloud, and hard to compare when two glyphs look the
same but have different encodings. A small set of characters such as
ASCII has none of those problems.

But it can be cumbersome to escape all quotation characters
in a string, as in "\"\\" in C.

Rather than allowing non-ASCII in source I came up with a scheme of what
you might call 'named characters' extending the backslash idea of C to
allow names instead of single characters after the backslash. It's off
topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).

Imagine one would use some obscure Unicode characters as
string delimiters. For example,

ᒪ CANADIAN SYLLABICS MA
ᒧ CANADIAN SYLLABICS MO

. Programmers will surely find a way to map them to their
keyboards somehow. Then that string literal would be just
ᒧ"\ᒪ!

I cannot read that. Nor could many other programmers.

(One would just have to use escapes in the rare case that
one really needs to have those Canadian syllabics ma or mo
within a string literal.)

--
James Harris


From James Harris@21:1/5 to Dmitry A. Kazakov on Sun Nov 13 19:18:12 2022

On 08/11/2022 08:23, Dmitry A. Kazakov wrote:

On 2022-11-08 09:04, James Harris wrote:

I like the simplicity of the language which would result from your
suggestions but can't help but think they would make programming in it
less comfortable, like the simplicity of a hair shirt. ;)

If you think programmers are dying to write stuff like *p++=*q++; you
are wrong. Actually that train left the station. Today C++ fun is
templates. It is monstrous instantiations over instantiations barely resembling program code. Modern times is a glorious combination of
Python performance with K&R C readability! (:-))

Contrast

*p := *q
p := p + 1
q := q + 1

Perhaps

*p++ := *q++

expresses that part of the algorithm in a way which is more natural and
easier to read...?
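
For comparison, both styles written out in C as a small sketch (the
function names are made up). The two loops do the same work; the
question being argued is only which body reads better.

```c
#include <assert.h>
#include <stddef.h>

/* Terse form: the whole step is the single expression *p++ = *q++. */
void copy_terse(int *p, const int *q, size_t n)
{
    assert(n == 0 || (p != NULL && q != NULL));
    while (n-- > 0)
        *p++ = *q++;
}

/* Spelled-out form: store, then advance each pointer in its own statement. */
void copy_spelled_out(int *p, const int *q, size_t n)
{
    assert(n == 0 || (p != NULL && q != NULL));
    while (n > 0) {
        *p = *q;
        p = p + 1;
        q = q + 1;
        n = n - 1;
    }
}
```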

--
James Harris


From James Harris@21:1/5 to Bart on Sun Nov 13 19:41:56 2022

On 07/11/2022 14:23, Bart wrote:

On 07/11/2022 11:55, James Harris wrote:

...

For unary operators, the evaluation order is rather peculiar yet seems
to be used in quite a few languages without anyone questioning it. So if
`a b c d` are unary operators, then the following:

   a b E c d

is evaluated like this:

      a (b ((E c) d))

That is, first all the post-fix operators in left-to-right order, then
all the prefix ones in right-left order. It sounds bizarre when put like that!

Indeed, right then left does sound unnatural, albeit that it's
relatively easy to remember. Could it be the comparative simplicity of
"right then left" is why C has postfix ++ of higher precedence than
prefix ++, even though that would be an awkward way to order such
operations?
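
For reference, this is how the postfix-first rule plays out in C itself,
as a small sketch with made-up function names: *p++ groups as *(p++), so
it is the pointer that moves, and the parentheses in (*p)++ are needed
to increment the pointed-to object instead.

```c
#include <assert.h>
#include <stddef.h>

/* *p++ is *(p++): read through the old position, then the pointer advances. */
int read_then_advance(const int **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    const int *p = *cursor;
    int value = *p++;
    *cursor = p;          /* report the advanced position to the caller */
    return value;
}

/* (*p)++: the int is incremented, the pointer stays where it was. */
int bump_target(int *p)
{
    assert(p != NULL);
    return (*p)++;
}
```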

Taking a step back and considering general expression evaluation I
have, so far, been defining the apparent order. And I'd like to
continue with that. So it should be possible to combine multiple ++
operators arbitrarily. For example,

   ++E + E++

This is well defined, as unary operators bind more tightly than binary
ones. This is just (++E) + (++E).

Isn't it, rather,

(++E) + (E++)

?

However the evaluation order for '+' is not usually well-defined, so you don't know which operand will be done first.

   ++E++

This may not work, or not work as expected. The binding using my above
scheme means this is equivalent to ++(E++).

Perhaps the scheme is wrong (for some definition of wrong)...?

IMO (++E)++ makes more sense.

But the second ++ requires
an lvalue as input, and yields an rvalue, which would be an invalid
input to the first ++.

Better to give ++ (prefix) precedence over ++ (postfix), perhaps.

...

At any rate, that distinction between prefix and postfix ++ seems to
be recognised at the following link where it says "Prefix versions of
the built-in operators return references and postfix versions return
values."

   https://en.cppreference.com/w/cpp/language/operator_incdec

I tried to get ++E++ to work using a suitable type for E, but in my
language it cannot work, as the first ++ still needs an lvalue; just an rvalue which has a pointer type won't cut it.

However ++E++^ can work, where ^ is deref, and E is a pointer.

I think this is because in my language, for something to be a valid
lvalue, you need to be able to apply & address-of to it. The result of
E++ doesn't have an address. But (E++)^ works because & and ^ cancel
out. Or something...

As a guess, could your compiler be making an optimisation (coalescing postincrement and dereference) in a way which contravenes the definition
of the operators?

Setting that aside aside ... and going back to the query, what should
be the relative precedences of the three operators? For example, how
should the following be evaluated?

   ++E++^
   ++E^++

Or should some ways of combining ^ with either or both of the ++
operators be prohibited because they make code too difficult to
understand?!!

You have the same issues in C, but that's OK because people are so
familiar with it. Also * deref is a prefix operator so you never have
two distinct postfix operators, unless you write E++ --.

But yes, parentheses are recommended when mixing certain prefix/postfix
ops. I think this one is clear enough however:

   -E^

Dereference E then negate the result. As is this: -E[i]; you wouldn't
assume that meant (-E)[i].

Yes, of the operators you mention I have the precedence order as

array indexing
dereference
unary minus

Thus my expression parser ought to do the sensible thing in each case.
The example you give is interesting because IIRC you give all unary
prefix operators a very high precedence. ISTM that such an approach
could try to evaluate the above wrongly and that it may be better to
order precedences by what the operators consume and produce than just to
boost the precedences of prefix operators. A prime example is boolean
not. In

not a > b

the "a > b" part produces a boolean, which is a natural input to "not".
It is surely sensible to give all boolean consumers lower
precedence than operations which produce booleans rather than saying
that "not" has a higher precedence because it is a prefix operator.
Your choice, of course, but I know you appreciate a challenge. ;-)
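
C happens to show what goes wrong with the "prefix binds tightest"
rule: its ! operator has higher precedence than >, so !a > b means
(!a) > b. A small sketch with made-up function names, writing both
groupings explicitly:

```c
#include <assert.h>

/* C's grouping of "!a > b": the prefix operator is applied first. */
int not_binds_tightly(int a, int b)
{
    return (!a) > b;
}

/* The grouping argued for above: compare first, then negate the boolean. */
int not_binds_loosely(int a, int b)
{
    return !(a > b);
}

/* The two readings disagree, for example, for a = 1, b = 2. */
void not_precedence_selfcheck(void)
{
    assert(not_binds_tightly(1, 2) == 0);   /* (!1) > 2  is  0 > 2  */
    assert(not_binds_loosely(1, 2) == 1);   /* !(1 > 2)  is  !0     */
}
```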

--
James Harris


From Bart@21:1/5 to Dmitry A. Kazakov on Sun Nov 13 20:49:09 2022

On 13/11/2022 20:19, Dmitry A. Kazakov wrote:

On 2022-11-13 20:18, James Harris wrote:

On 08/11/2022 08:23, Dmitry A. Kazakov wrote:

On 2022-11-08 09:04, James Harris wrote:

I like the simplicity of the language which would result from your
suggestions but can't help but think they would make programming in
it less comfortable, like the simplicity of a hair shirt. ;)

If you think programmers are dying to write stuff like *p++=*q++; you
are wrong. Actually that train left the station. Today C++ fun is
templates. It is monstrous instantiations over instantiations barely
resembling program code. Modern times is a glorious combination of
Python performance with K&R C readability! (:-))

Contrast

   *p := *q
   p := p + 1
   q := q + 1

Perhaps

   *p++ := *q++

expresses that part of the algorithm in a way which is more natural
and easier to read...?

No algorithm requires you resort to pointers. With arrays it is plain assignment:

   p := q;

You're assuming this is part of a loop. But perhaps other things are
happening after each *p++ = *q++: the next source or dest might be
different, or the next transfer might be of a different type and/or size.

This is what makes such lower level operations so useful. A solution in
a higher level but more limiting language might require ingenuity to get
around the strictness.

which BTW could be performed in parallel or by a single instruction on a
CISC machine or with a bunch of optimizations.

Well, perhaps this *p++=*q++ is part of the result of such a process,
where the target happens to be C source code. Few languages higher level
than ASM are suited for that job.

But by all means continue to pour scorn on it.


From Dmitry A. Kazakov@21:1/5 to Bart on Sun Nov 13 22:14:28 2022

On 2022-11-13 21:49, Bart wrote:

On 13/11/2022 20:19, Dmitry A. Kazakov wrote:

On 2022-11-13 20:18, James Harris wrote:

On 08/11/2022 08:23, Dmitry A. Kazakov wrote:

On 2022-11-08 09:04, James Harris wrote:

I like the simplicity of the language which would result from your
suggestions but can't help but think they would make programming in
it less comfortable, like the simplicity of a hair shirt. ;)

If you think programmers are dying to write stuff like *p++=*q++;
you are wrong. Actually that train left the station. Today C++ fun
is templates. It is monstrous instantiations over instantiations
barely resembling program code. Modern times is a glorious
combination of Python performance with K&R C readability! (:-))

Contrast

   *p := *q
   p := p + 1
   q := q + 1

Perhaps

   *p++ := *q++

expresses that part of the algorithm in a way which is more natural
and easier to read...?

No algorithm requires you resort to pointers. With arrays it is plain
assignment:

    p := q;

You're assuming this is part of a loop. But perhaps other things are happening after each *p++ = *q++: the next source or dest might be
different, or the next transfer might be of a different type and/or size.

This is what makes such lower level operations so useful. A solution in
a higher level but more limiting language might require ingenuity to get around the strictness.

which BTW could be performed in parallel or by a single instruction on
a CISC machine or with a bunch of optimizations.

Well, perhaps this *p++=*q++ is part of the result of such a process,
where the target happens to be C source code. Few languages higher level
than ASM are suited for that job.

But by all means continue to pour scorn on it.

Sure, the point was about algorithms. Some artificially constructed
cases do not count for programmers. This is why there is no need for such
low-level languages. If something falls out of effective and safe
techniques, the program gets redesigned.

It is called engineering. If you have to design a new type of screw for
your project, you are a bad engineer.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From Dmitry A. Kazakov@21:1/5 to James Harris on Sun Nov 13 21:19:11 2022

On 2022-11-13 20:18, James Harris wrote:

On 08/11/2022 08:23, Dmitry A. Kazakov wrote:

On 2022-11-08 09:04, James Harris wrote:

I like the simplicity of the language which would result from your
suggestions but can't help but think they would make programming in
it less comfortable, like the simplicity of a hair shirt. ;)

If you think programmers are dying to write stuff like *p++=*q++; you
are wrong. Actually that train left the station. Today C++ fun is
templates. It is monstrous instantiations over instantiations barely
resembling program code. Modern times is a glorious combination of
Python performance with K&R C readability! (:-))

Contrast

*p := *q
p := p + 1
q := q + 1

Perhaps

*p++ := *q++

expresses that part of the algorithm in a way which is more natural and easier to read...?

No algorithm requires you resort to pointers. With arrays it is plain assignment:

p := q;

which BTW could be performed in parallel or by a single instruction on a
CISC machine or with a bunch of optimizations. Consider the case when the
target object requires construction and destruction. Referencing would
need to construct a new object, destruct target, construct new target,
destruct copy. In Ada the compiler is allowed to skip the intermediates
and perform bitwise copy with "adjusting" the result.

The moral: low-level overly specified stuff is not only uncomfortable
and dangerous, it is greatly inefficient in large scale programming.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From James Harris@21:1/5 to Bart on Sun Nov 13 21:06:34 2022

On 13/11/2022 20:49, Bart wrote:

On 13/11/2022 20:19, Dmitry A. Kazakov wrote:

On 2022-11-13 20:18, James Harris wrote:

On 08/11/2022 08:23, Dmitry A. Kazakov wrote:

On 2022-11-08 09:04, James Harris wrote:

I like the simplicity of the language which would result from your
suggestions but can't help but think they would make programming in
it less comfortable, like the simplicity of a hair shirt. ;)

If you think programmers are dying to write stuff like *p++=*q++;
you are wrong. Actually that train left the station. Today C++ fun
is templates. It is monstrous instantiations over instantiations
barely resembling program code. Modern times is a glorious
combination of Python performance with K&R C readability! (:-))

Contrast

   *p := *q
   p := p + 1
   q := q + 1

Perhaps

   +p++ := *q++

expresses that part of the algorithm in a way which is more natural
and easier to read...?

No algorithm requires you resort to pointers.

This doesn't have to be about pointers, Dmitry. One could just as easily
have

a[i++] := b[j++]

With arrays it is plain
assignment:

    p := q;

You're assuming this is part of a loop. But perhaps other things are happening after each *p++ = *q++: the next source or dest might be
different, or the next transfer might be of a different type and/or size.

Yes, and less than a whole array may be being copied, or part of an
array may be being copied over another part of the same array, etc.

This is what makes such lower level operations so useful. A solution in
a higher level but more limiting language might require ingenuity to get around the strictness.

which BTW could be performed in parallel or by a single instruction on
a CISC machine or with a bunch of optimizations.

Well, perhaps this *p++=*q++ is part of the result of such a process,
where the target happens to be C source code. Few languages higher level
than ASM are suited for that job.

Compilers are getting good at recognising code which can be turned into intrinsics, though IMO we shouldn't ask too much of them.

For example, an entire loop to count the bits set in a word may be
recognised by a compiler and on an x86 target turned into a single
popcnt instruction but AISI it would be better for the language to
supply pseudofunctions for operations such as that. Why? Primarily
because they make the source code easier to read.
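
For illustration, here is a minimal C sketch of the kind of bit-counting loop meant here. The name bitcount is an assumption for the example, not a facility of any particular language, and whether a given compiler reduces the loop to a single popcnt depends on the compiler and its options.

```c
#include <stdint.h>

/* Hypothetical user-level bitcount: counts the set bits in w by clearing
   the lowest set bit on each pass, so the loop runs once per set bit. */
unsigned bitcount(uint32_t w)
{
    unsigned n = 0;
    while (w != 0) {
        w &= w - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

A reader seeing bitcount(c) in an expression knows the intent at once, whereas the loop has to be read and recognised.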

--
James Harris

From James Harris@21:1/5 to Dmitry A. Kazakov on Sun Nov 13 22:10:55 2022

On 13/11/2022 21:29, Dmitry A. Kazakov wrote:

On 2022-11-13 22:06, James Harris wrote:

On 13/11/2022 20:49, Bart wrote:

On 13/11/2022 20:19, Dmitry A. Kazakov wrote:

On 2022-11-13 20:18, James Harris wrote:

On 08/11/2022 08:23, Dmitry A. Kazakov wrote:

On 2022-11-08 09:04, James Harris wrote:

I like the simplicity of the language which would result from
your suggestions but can't help but think they would make
programming in it less comfortable, like the simplicity of a hair
shirt. ;)

If you think programmers are dying to write stuff like *p++=*q++;
you are wrong. Actually that train left the station. Today C++ fun
is templates. It is monstrous instantiations over instantiations
barely resembling program code. Modern times is a glorious
combination of Python performance with K&R C readability! (:-))

Contrast

   *p := *q
   p := p + 1
   q := q + 1

Perhaps

   +p++ := *q++

expresses that part of the algorithm in a way which is more natural
and easier to read...?

No algorithm requires you resort to pointers.

This doesn't have to be about pointers, Dmitry. One could just as
easily have

   a[i++] := b[j++]

Same question. What for?

What do you mean, "What for?"?

Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square root,
post exponentiation, post factorial when the humankind need them so bad?

I cannot tell what point you are trying to make.

You remind me of a salesman selling the toaster equipped with a toilet
brush. Know what? I do not need this combination...

I think you only need the toilet brush.

--
James Harris

From Dmitry A. Kazakov@21:1/5 to James Harris on Sun Nov 13 22:29:13 2022

On 2022-11-13 22:06, James Harris wrote:

On 13/11/2022 20:49, Bart wrote:

On 13/11/2022 20:19, Dmitry A. Kazakov wrote:

On 2022-11-13 20:18, James Harris wrote:

On 08/11/2022 08:23, Dmitry A. Kazakov wrote:

On 2022-11-08 09:04, James Harris wrote:

I like the simplicity of the language which would result from your
suggestions but can't help but think they would make programming
in it less comfortable, like the simplicity of a hair shirt. ;)

If you think programmers are dying to write stuff like *p++=*q++;
you are wrong. Actually that train left the station. Today C++ fun
is templates. It is monstrous instantiations over instantiations
barely resembling program code. Modern times is a glorious
combination of Python performance with K&R C readability! (:-))

Contrast

   *p := *q
   p := p + 1
   q := q + 1

Perhaps

   +p++ := *q++

expresses that part of the algorithm in a way which is more natural
and easier to read...?

No algorithm requires you resort to pointers.

This doesn't have to be about pointers, Dmitry. One could just as easily
have

a[i++] := b[j++]

Same question. What for?

Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square root,
post exponentiation, post factorial when the humankind need them so bad?

You remind me of a salesman selling the toaster equipped with a toilet
brush. Know what? I do not need this combination...

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From Bart@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 01:10:59 2022

On 13/11/2022 21:29, Dmitry A. Kazakov wrote:

On 2022-11-13 22:06, James Harris wrote:

On 13/11/2022 20:49, Bart wrote:

On 13/11/2022 20:19, Dmitry A. Kazakov wrote:

On 2022-11-13 20:18, James Harris wrote:

On 08/11/2022 08:23, Dmitry A. Kazakov wrote:

On 2022-11-08 09:04, James Harris wrote:

I like the simplicity of the language which would result from
your suggestions but can't help but think they would make
programming in it less comfortable, like the simplicity of a hair
shirt. ;)

If you think programmers are dying to write stuff like *p++=*q++;
you are wrong. Actually that train left the station. Today C++ fun
is templates. It is monstrous instantiations over instantiations
barely resembling program code. Modern times is a glorious
combination of Python performance with K&R C readability! (:-))

Contrast

   *p := *q
   p := p + 1
   q := q + 1

Perhaps

   +p++ := *q++

expresses that part of the algorithm in a way which is more natural
and easier to read...?

No algorithm requires you resort to pointers.

This doesn't have to be about pointers, Dmitry. One could just as
easily have

   a[i++] := b[j++]

Same question. What for?

Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square root,
post exponentiation, post factorial when the humankind need them so bad?

You remind me of a salesman selling the toaster equipped with a toilet
brush. Know what? I do not need this combination...

But I do!

My largest program uses ++ nearly 1000 times (all varieties).

It would use expexp, sqrtsqrt or !! exactly zero times each. So it would
be very poor language design, and pointless.

There is also no prior art; no one would have a clue what it meant.
Neither do I; I think your sqrtsqrt means:

(t:=x; x:=sqrt(x); t)

But this is easy enough to implement within usercode; it doesn't need a dedicated language feature, as it's not something that a decent
percentage of a language's community would commonly use.
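
As a sketch of how that could look in user code, here is a C version of (t:=x; x:=f(x); t). The helper name post_apply and the stand-in function halve are assumptions for the example; passing sqrt from <math.h> instead of halve would give the "sqrtsqrt" reading guessed at above.

```c
/* Hypothetical helper: replace *x with f(*x) and return the old value,
   mirroring (t:=x; x:=f(x); t). */
double post_apply(double *x, double (*f)(double))
{
    double t = *x;   /* remember the value before the update */
    *x = f(*x);      /* update the variable in place */
    return t;        /* the expression yields the old value */
}

/* Stand-in unary function so the sketch needs no maths library. */
double halve(double v)
{
    return v / 2.0;
}
```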

From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 14 08:52:21 2022

On 2022-11-13 23:10, James Harris wrote:

On 13/11/2022 21:29, Dmitry A. Kazakov wrote:

On 2022-11-13 22:06, James Harris wrote:

On 13/11/2022 20:49, Bart wrote:

On 13/11/2022 20:19, Dmitry A. Kazakov wrote:

On 2022-11-13 20:18, James Harris wrote:

On 08/11/2022 08:23, Dmitry A. Kazakov wrote:

On 2022-11-08 09:04, James Harris wrote:

I like the simplicity of the language which would result from
your suggestions but can't help but think they would make
programming in it less comfortable, like the simplicity of a
hair shirt. ;)

If you think programmers are dying to write stuff like *p++=*q++;
you are wrong. Actually that train left the station. Today C++
fun is templates. It is monstrous instantiations over
instantiations barely resembling program code. Modern times is a
glorious combination of Python performance with K&R C
readability! (:-))

Contrast

   *p := *q
   p := p + 1
   q := q + 1

Perhaps

   +p++ := *q++

expresses that part of the algorithm in a way which is more
natural and easier to read...?

No algorithm requires you resort to pointers.

This doesn't have to be about pointers, Dmitry. One could just as
easily have

   a[i++] := b[j++]

Same question. What for?

What do you mean, "What for?"?

Show me the algorithm.

Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
root, post exponentiation, post factorial when the humankind need them
so bad?

I cannot tell what point you are trying to make.

There exist an infinite number of combinations that could be operators.
E.g. divide by two and reboot the computer.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From David Brown@21:1/5 to James Harris on Mon Nov 14 10:26:46 2022

On 13/11/2022 18:11, James Harris wrote:

On 08/11/2022 16:29, David Brown wrote:

On 08/11/2022 14:24, James Harris wrote:

...

You can't stop everything - those evil programmers have better
imaginations than any well-meaning language designer. But you can
try. Aim to make it harder to write convoluted code, and easier to
write clearer code. And try to make the clearer code more efficient,
to reduce the temptation to write evil code.

I agree

...

In a similar way, programming can be hard when other programmers
write constructs we don't like. I agree that it's best for a language
to help programmers write readable and comprehensible programs - and
even to make them the easiest to write, if possible - but the very
flexibility which may allow them to do so may also give them the
freedom to write code we don't care for. I don't think one can
legislate against that.

I'm not sure it is the same - after all, if someone exercises their
rights to speak gibberish, or to give long, convoluted and
incomprehensible speeches, the listener has the right to go away,
ignore them, or fall asleep. It's harder for a compiler to do that!

I'm not worried about the compiler. As long as it can make sense of a
piece of code then it can compile it. It's the human I am concerned
about, especially the poor old maintenance programmer!

You /say/ that, but you don't appear to believe it or be interested in
making it happen.

On the one side, you claim you want a clear language that is
understandable for programmers and maintenance. On the other side, you
want to decide what "++E++" should mean, with random "^" characters
thrown in for good measure.

These two statements go together as well as Dmitry's toaster and toilet
brush. It doesn't matter how precisely you define how the combination
can be used and what it does, it is still not a good or useful thing.

From Stefan Ram@21:1/5 to James Harris on Mon Nov 14 10:24:18 2022

James Harris <james.harris.1@gmail.com> writes:

Rather than allowing non-ASCII in source I came up with a scheme of what
you might call 'named characters' extending the backslash idea of C to
allow names instead of single characters after the backslash. It's off
topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).

Here's an example of a Python program.

print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )

It prints:

\"

. But in Python one could also write

print( r'\"' )

to get the same output.

From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 14 11:32:46 2022

On 2022-11-14 11:16, James Harris wrote:

On 14/11/2022 07:52, Dmitry A. Kazakov wrote:

On 2022-11-13 23:10, James Harris wrote:

On 13/11/2022 21:29, Dmitry A. Kazakov wrote:

On 2022-11-13 22:06, James Harris wrote:

On 13/11/2022 20:49, Bart wrote:

On 13/11/2022 20:19, Dmitry A. Kazakov wrote:

On 2022-11-13 20:18, James Harris wrote:

...

Contrast

   *p := *q
   p := p + 1
   q := q + 1

Perhaps

   +p++ := *q++

expresses that part of the algorithm in a way which is more
natural and easier to read...?

No algorithm requires you resort to pointers.

This doesn't have to be about pointers, Dmitry. One could just as
easily have

   a[i++] := b[j++]

Same question. What for?

What do you mean, "What for?"?

Show me the algorithm.

There's no particular algorithm; the construct is a potential component
of many algorithms.

Show me one that is not array assignment.

It's as though someone points out a brick and
someone else says "what does it mean?"

No, it is like building up a brick factory in your back garden for the
purpose of cracking walnuts...

Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
root, post exponentiation, post factorial when the humankind need
them so bad?

I cannot tell what point you are trying to make.

There exist an infinite number of combinations that could be
operators. E.g. divide by two and reboot the computer.

OK. ATM I have a generous **but limited** number of operators.

Limited by which criteria?

Again, IMO it's important for the language to provide such
pseudofunctions so that a programmer's code can be made clearer,
simpler, and more readable.

Like this one?

++p+++

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From Bart@21:1/5 to David Brown on Mon Nov 14 10:47:29 2022

On 14/11/2022 09:26, David Brown wrote:

On 13/11/2022 18:11, James Harris wrote:

On 08/11/2022 16:29, David Brown wrote:

I'm not worried about the compiler. As long as it can make sense of a
piece of code then it can compile it. It's the human I am concerned
about, especially the poor old maintenance programmer!

You /say/ that, but you don't appear to believe it or be interested in
making it happen.

On the one side, you claim you want a clear language that is
understandable for programmers and maintenance. On the other side, you
want to decide what "++E++" should mean, with random "^" characters
thrown in for good measure.

In-place, value-returning increment ops written as ++ and -- are common
in languages.

So are pointer-dereference operators in lower-level languages, whether
written as * or ^.

Once you have those two possibilities in a language, why shouldn't you
define what combinations of those operators might mean?

(I just differ from James in thinking that successive *value-returning*
++ or -- operators, whether prefix or postfix, are not meaningful. I'd
also think it would be bad form to chain them, but it is not practical
to ban that at the syntax level.

However, I have sometimes banned even `a+b` in some contexts, when the
resulting value is unused.)

Is your point that you shouldn't have either of those operators? ++ and
-- can be replaced at some inconvenience. But getting rid of dereference
is harder; if P is a pointer:

print P

will this display the value of the pointer, or the value of its target?
If only there was a way to specify that precisely!

Note that when p and q are byte pointers, then *p++ = *q++ (or p++^ :=
q++^) corresponds to the one-byte Z80 LDI instruction.

So it's something so meaningless that that tiny 8-bit processor decided
to give it its own instruction.
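
For reference, here is a minimal C sketch of the byte-copy step being discussed. The function name copy_bytes is an assumption for the example, and the source and destination are assumed not to overlap.

```c
#include <stddef.h>

/* Copy n bytes from src to dst using one *p++ = *q++ step per byte.
   The two regions are assumed not to overlap. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    unsigned char *p = dst;
    const unsigned char *q = src;
    while (n-- > 0) {
        *p++ = *q++;   /* copy one byte and advance both pointers */
    }
}
```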

From James Harris@21:1/5 to David Brown on Mon Nov 14 10:44:12 2022

On 14/11/2022 09:26, David Brown wrote:

On 13/11/2022 18:11, James Harris wrote:

On 08/11/2022 16:29, David Brown wrote:

...

You can't stop everything - those evil programmers have better
imaginations than any well-meaning language designer. But you can
try. Aim to make it harder to write convoluted code, and easier to
write clearer code. And try to make the clearer code more efficient,
to reduce the temptation to write evil code.

I agree

...

I'm not worried about the compiler. As long as it can make sense of a
piece of code then it can compile it. It's the human I am concerned
about, especially the poor old maintenance programmer!

You /say/ that, but you don't appear to believe it or be interested in
making it happen.

That comment surprises me a little. /The main point/ of this, AISI, is
to make the job of the programmer simpler and to help him write code
which is more readable. You said yourself that (paraphrasing) when
there's a choice between clear and convoluted code it's important for a language to make the clearer code the easier one to write.

On the one side, you claim you want a clear language that is
understandable for programmers and maintenance.

Yes.

On the other side, you
want to decide what "++E++" should mean, with random "^" characters
thrown in for good measure.

Not quite. As language designer I have to decide what facilities will be provided but I do not have absolute control over what a programmer may
do with the facilities. Nor would a programmer want to work with a
language which implemented unnecessary rules or rules which he may see
as arbitrary.

The expression you mention is just one of a myriad of what you might
consider to be potential nasties. If I am going to prohibit that one
then what about all the others?

These two statements go together as well as Dmitry's toaster and toilet brush. It doesn't matter how precisely you define how the combination
can be used and what it does, it is still not a good or useful thing.

OK, let's take the combination you mentioned:

++E++

I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in the
order that it says it. Remember that unlike C I define the apparent
order of evaluation so the expression is perfectly well formed.
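
Spelled out as a C helper, the reading described above might look like the sketch below. The function name pre_then_post_inc is hypothetical. C itself rejects ++E++ because E++ is not an lvalue there, so this only models the stated order for a language that defines it.

```c
/* One possible reading of ++E++ under a defined left-to-right order:
   increment, take the value, then increment again. */
int pre_then_post_inc(int *e)
{
    int value;
    *e += 1;        /* prefix ++: increment before the value is taken */
    value = *e;     /* the value the whole expression yields */
    *e += 1;        /* postfix ++: increment after the value is taken */
    return value;
}
```

With E starting at 5, the expression would yield 6 and leave E at 7.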

--
James Harris

From Bart@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 11:00:02 2022

On 14/11/2022 10:32, Dmitry A. Kazakov wrote:

On 2022-11-14 11:16, James Harris wrote:

a[i++] := b[j++]

Same question. What for?

What do you mean, "What for?"?

Show me the algorithm.

There's no particular algorithm; the construct is a potential
component of many algorithms.

Show me one that is not array assignment.

What exactly is your objection: that there shouldn't be an increment
operator at all?

Then it's end of discussion. But if you allow a value-returning
increment operator, then someone could use it in multiple places in the
same expression, together with other operators, and the language has to
be able to deal with it.

Note that any language that has reference parameters would allow this:

a[postincr(i)] := b[postincr(j)]

Or is that something else you're not keen on?
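
A rough C rendering of that idea follows. C has no reference parameters, so a pointer stands in for one; postincr and copy_step are hypothetical names. The sketch assumes i and j are distinct variables, since C does not fix the order in which the two calls are made.

```c
/* Hypothetical postincr: returns the old value and leaves *v incremented.
   A pointer stands in for a reference parameter. */
int postincr(int *v)
{
    int old = *v;
    *v = old + 1;
    return old;
}

/* Copy one element and advance both indices, as a[i++] := b[j++] would.
   Assumes i and j point to different variables. */
void copy_step(int *a, int *i, const int *b, int *j)
{
    a[postincr(i)] = b[postincr(j)];
}
```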

From Bart@21:1/5 to James Harris on Mon Nov 14 11:11:57 2022

On 14/11/2022 10:44, James Harris wrote:

On 14/11/2022 09:26, David Brown wrote:

On 13/11/2022 18:11, James Harris wrote:

On 08/11/2022 16:29, David Brown wrote:

...

You can't stop everything - those evil programmers have better
imaginations than any well-meaning language designer. But you can
try. Aim to make it harder to write convoluted code, and easier to
write clearer code. And try to make the clearer code more
efficient, to reduce the temptation to write evil code.

I agree

...

I'm not worried about the compiler. As long as it can make sense of a
piece of code then it can compile it. It's the human I am concerned
about, especially the poor old maintenance programmer!

You /say/ that, but you don't appear to believe it or be interested in
making it happen.

That comment surprises me a little. /The main point/ of this, AISI, is
to make the job of the programmer simpler and to help him write code
which is more readable. You said yourself that (paraphrasing) when
there's a choice between clear and convoluted code it's important for a language to make the clearer code the easier one to write.

On the one side, you claim you want a clear language that is
understandable for programmers and maintenance.

Yes.

On the other side, you want to decide what "++E++" should mean, with
random "^" characters thrown in for good measure.

Not quite. As language designer I have to decide what facilities will be provided but I do not have absolute control over what a programmer may
do with the facilities. Nor would a programmer want to work with a
language which implemented unnecessary rules or rules which he may see
as arbitrary.

The expression you mention is just one of a myriad of what you might
consider to be potential nasties. If I am going to prohibit that one
then what about all the others?

These two statements go together as well as Dmitry's toaster and
toilet brush. It doesn't matter how precisely you define how the
combination can be used and what it does, it is still not a good or
useful thing.

OK, let's take the combination you mentioned:

++E++

I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in the
order that it says it. Remember that unlike C I define the apparent
order of evaluation so the expression is perfectly well formed.

But does this have the same priorities as:

op1 E op2

(where op2 is commonly done first) or does it have special rules, so
that in:

--E++

the -- is done first? If it's different, then what is the ordering when
mixed with other unary ops?

You explained somewhere the circumstances where you think this is
meaningful, but I can't remember what the rules are and I can't find the
exact post.

This is the problem. You shouldn't need to stop and think. I make the
rules simple by stipulating that value-returning ++ and -- only ever
return rvalues.

Because if they ever start to return lvalues, then this becomes possible:

++E := 0
E++ := 0

(Whichever one is legal in your scheme.) So I think there is little
useful expressivity to be gained.

From Dmitry A. Kazakov@21:1/5 to Bart on Mon Nov 14 12:12:52 2022

On 2022-11-14 12:00, Bart wrote:

On 14/11/2022 10:32, Dmitry A. Kazakov wrote:

On 2022-11-14 11:16, James Harris wrote:

a[i++] := b[j++]

Same question. What for?

What do you mean, "What for?"?

Show me the algorithm.

There's no particular algorithm; the construct is a potential
component of many algorithms.

Show me one that is not array assignment.

What exactly is your objection: that there shouldn't be an increment
operator at all?

There is no objection because, so far, there was no argument.

Note that any language that has reference parameters would allow this:

a[postincr(i)] := b[postincr(j)]

Or is that something else you're not keen on?

That language shall not have by-value or by-reference objects unless
explicitly contracted by the programmer. Forcing by-reference on scalars
is the best way to turn a modern processor into an i386.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From James Harris@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 11:03:57 2022

On 14/11/2022 10:32, Dmitry A. Kazakov wrote:

On 2022-11-14 11:16, James Harris wrote:

On 14/11/2022 07:52, Dmitry A. Kazakov wrote:

On 2022-11-13 23:10, James Harris wrote:

On 13/11/2022 21:29, Dmitry A. Kazakov wrote:

On 2022-11-13 22:06, James Harris wrote:

On 13/11/2022 20:49, Bart wrote:

On 13/11/2022 20:19, Dmitry A. Kazakov wrote:

On 2022-11-13 20:18, James Harris wrote:

...

Contrast

   *p := *q
   p := p + 1
   q := q + 1

Perhaps

   +p++ := *q++

expresses that part of the algorithm in a way which is more
natural and easier to read...?

No algorithm requires you resort to pointers.

This doesn't have to be about pointers, Dmitry. One could just as
easily have

   a[i++] := b[j++]

Same question. What for?

What do you mean, "What for?"?

Show me the algorithm.

There's no particular algorithm; the construct is a potential
component of many algorithms.

Show me one that is not array assignment.

if is_name_first(b[j])
    a[i++] = b[j++]
    rep while is_name_follow(b[j])
      a[i++] = b[j++]
    end rep
    a[i] = 0
    return TOK_NAME
end if

Now, what don't you like about the ++ operators in that? How would you
prefer to write it?
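
For anyone wanting to try the fragment, a C version might look like the sketch below. The token codes, the two classifier functions and the name scan_name are assumptions made for the example. The caller is assumed to supply an output buffer large enough for the name plus its terminator.

```c
#include <ctype.h>
#include <stddef.h>

enum { TOK_NONE = 0, TOK_NAME = 1 };   /* token codes are assumptions */

/* Assumed rule: a name starts with a letter or underscore. */
static int is_name_first(char c)
{
    return isalpha((unsigned char)c) || c == '_';
}

/* Assumed rule: a name continues with letters, digits or underscores. */
static int is_name_follow(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copy a name starting at b[*j] into a, advancing *j past it.
   Returns TOK_NAME if a name was found, TOK_NONE otherwise. */
int scan_name(const char *b, size_t *j, char *a)
{
    size_t i = 0;
    if (!is_name_first(b[*j]))
        return TOK_NONE;
    a[i++] = b[(*j)++];
    while (is_name_follow(b[*j]))
        a[i++] = b[(*j)++];
    a[i] = 0;   /* terminate the copied name */
    return TOK_NAME;
}
```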

It's as though someone points out a brick and someone else says "what
does it mean?"

No, it is like building up a brick factory in your back garden for the purpose of cracking walnuts...

Says the man who likes Ada! ;-)

Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
root, post exponentiation, post factorial when the humankind need
them so bad?

I cannot tell what point you are trying to make.

There exist an infinite number of combinations that could be
operators. E.g. divide by two and reboot the computer.

OK. ATM I have a generous **but limited** number of operators.

Limited by which criteria?

The set of operators is limited to what's reasonably necessary such as
the usual stuff: function calls, array references, field selection,
bitwise operations, arithmetic, comparison, boolean and assignment. Most
are present in C; only a few are not such as bitwise combinations (e.g.
nand and nor) and these two: concatenate and boolean (aka logical) xor.
What's so bad about that?

Again, IMO it's important for the language to provide such
pseudofunctions so that a programmer's code can be made clearer,
simpler, and more readable.

Like this one?

   ++p+++

I don't have a +++ operator so I am not sure what that is supposed to
mean. It's not valid in my language.

--
James Harris

From James Harris@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 10:16:13 2022

On 14/11/2022 07:52, Dmitry A. Kazakov wrote:

On 2022-11-13 23:10, James Harris wrote:

On 13/11/2022 21:29, Dmitry A. Kazakov wrote:

On 2022-11-13 22:06, James Harris wrote:

On 13/11/2022 20:49, Bart wrote:

On 13/11/2022 20:19, Dmitry A. Kazakov wrote:

On 2022-11-13 20:18, James Harris wrote:

...

Contrast

   *p := *q
   p := p + 1
   q := q + 1

Perhaps

   +p++ := *q++

expresses that part of the algorithm in a way which is more
natural and easier to read...?

No algorithm requires you resort to pointers.

This doesn't have to be about pointers, Dmitry. One could just as
easily have

   a[i++] := b[j++]

Same question. What for?

What do you mean, "What for?"?

Show me the algorithm.

There's no particular algorithm; the construct is a potential component
of many algorithms. It's as though someone points out a brick and
someone else says "what does it mean?" or "show me the house that would
be built from it before I can judge whether a brick is a good idea or
not". It's a potential component, nothing more.

Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
root, post exponentiation, post factorial when the humankind need
them so bad?

I cannot tell what point you are trying to make.

There exist an infinite number of combinations that could be operators.
E.g. divide by two and reboot the computer.

OK. ATM I have a generous **but limited** number of operators. For
anything else the programmer would have to call a function. For example,
I mentioned here recently that some compilers are extensive enough that
they can recognise a loop which counts the number of bits set in a word.
I would not have that as an operator but as a function (or
pseudofunction). It might be invoked on c as

x := a + b + bitcount(c) + d

Again, IMO it's important for the language to provide such
pseudofunctions so that a programmer's code can be made clearer,
simpler, and more readable.

--
James Harris

From Bart@21:1/5 to James Harris on Mon Nov 14 11:06:50 2022

On 13/11/2022 16:55, James Harris wrote:

On 10/11/2022 10:42, Bart wrote:

It's clear that ++(A+B) can't work unless you change what ++ means
(eg. ++A now means (A+1) because whatever ++ modifies is not accessible).

It's similar if A were a struct. You could have

A.F

but you could not have

(A + 4).F

The LHS of the . operation, in this case, has to be an lvalue.

That's not right. C allows functions to return structs; they are not
lvalues, but you can apply ".":

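typedef struct { int x, y; } Point;

Point F(void) {
    Point p = {0, 0};   // initialised so F returns a determinate value
    return p;
}

int main(void)
{
    Point p = {0, 0};
    int a;

//  F() = p;            // Not valid: F() is not an lvalue
    a = F().x;          // Valid: member access on a non-lvalue struct
    (void)p; (void)a;   // silence unused-variable warnings
    return 0;
}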

From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 14 12:29:10 2022

On 2022-11-14 12:03, James Harris wrote:

On 14/11/2022 10:32, Dmitry A. Kazakov wrote:

On 2022-11-14 11:16, James Harris wrote:

On 14/11/2022 07:52, Dmitry A. Kazakov wrote:

On 2022-11-13 23:10, James Harris wrote:

On 13/11/2022 21:29, Dmitry A. Kazakov wrote:

On 2022-11-13 22:06, James Harris wrote:

On 13/11/2022 20:49, Bart wrote:

On 13/11/2022 20:19, Dmitry A. Kazakov wrote:

On 2022-11-13 20:18, James Harris wrote:

...

Contrast

   *p := *q
   p := p + 1
   q := q + 1

Perhaps

   +p++ := *q++

expresses that part of the algorithm in a way which is more
natural and easier to read...?

No algorithm requires you resort to pointers.

This doesn't have to be about pointers, Dmitry. One could just as
easily have

   a[i++] := b[j++]

Same question. What for?

What do you mean, "What for?"?

Show me the algorithm.

There's no particular algorithm; the construct is a potential
component of many algorithms.

Show me one that is not array assignment.

if is_name_first(b[j])
    a[i++] = b[j++]
    rep while is_name_follow(b[j])
      a[i++] = b[j++]
    end rep
    a[i] = 0
    return TOK_NAME
end if

Now, what don't you like about the ++ operators in that? How would you
prefer to write it?

From parser production code:

procedure Get_Identifier
          (  Code     : in out Source'Class;
             Line     : String;
             Pointer  : Integer;
             Argument : out Tokens.Argument_Token
          )  is
   Index     : Integer := Pointer + 1;
   Malformed : Boolean := False;
   Underline : Boolean := False;
   Symbol    : Character;
begin
   while Index <= Line'Last loop
      Symbol := Line (Index);
      if Is_Alphanumeric (Symbol) then
         Underline := False;
      elsif '_' = Symbol then
         Malformed := Malformed or Underline;
         Underline := True;
      else
         exit;
      end if;
      Index := Index + 1;
   end loop;
   Malformed := Malformed or Underline;
   Set_Pointer (Code, Index);
   Argument.Location := Link (Code);
   Argument.Value := new Identifier (Index - Pointer);
   declare
      This : Identifier renames Identifier (Argument.Value.all);
   begin
      This.Location  := Argument.Location;
      This.Malformed := Malformed;
      This.Value     := Line (Pointer..Index - 1);
   end;
end Get_Identifier;

It's as though someone points out a brick and someone else says "what
does it mean?"

No, it is like building up a brick factory in your back garden for the
purpose of cracking walnuts...

Says the man who likes Ada! ;-)

Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
root, post exponentiation, post factorial when the humankind need
them so bad?

I cannot tell what point you are trying to make.

There exist an infinite number of combinations that could be
operators. E.g. divide by two and reboot the computer.

OK. ATM I have a generous **but limited** number of operators.

Limited by which criteria?

The set of operators is limited to what's reasonably necessary such as
the usual stuff: function calls, array references, field selection,
bitwise operations, arithmetic, comparison, boolean and assignment. Most
are present in C; only a few are not such as bitwise combinations (e.g.
nand and nor) and these two: concatenate and boolean (aka logical) xor. What's so bad about that?

You came to a conclusion without putting any objective criteria upfront.

Again, IMO it's important for the language to provide such
pseudofunctions so that a programmer's code can be made clearer,
simpler, and more readable.

Like this one?

    ++p+++

I don't have a +++ operator so I am not sure what that is supposed to
mean. It's not valid in my language.

It could be a part of even simpler and brilliantly readable:

++p+++q+++r

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Bart@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 12:45:25 2022

On 14/11/2022 11:29, Dmitry A. Kazakov wrote:

On 2022-11-14 12:03, James Harris wrote:

On 14/11/2022 10:32, Dmitry A. Kazakov wrote:

On 2022-11-14 11:16, James Harris wrote:

On 14/11/2022 07:52, Dmitry A. Kazakov wrote:

On 2022-11-13 23:10, James Harris wrote:

On 13/11/2022 21:29, Dmitry A. Kazakov wrote:

On 2022-11-13 22:06, James Harris wrote:

On 13/11/2022 20:49, Bart wrote:

On 13/11/2022 20:19, Dmitry A. Kazakov wrote:

On 2022-11-13 20:18, James Harris wrote:

...

Contrast

   *p := *q
   p := p + 1
   q := q + 1

Perhaps

   *p++ := *q++

expresses that part of the algorithm in a way which is more
natural and easier to read...?

No algorithm requires you resort to pointers.

This doesn't have to be about pointers, Dmitry. One could just
as easily have

   a[i++] := b[j++]

Same question. What for?

What do you mean, "What for?"?

Show me the algorithm.

There's no particular algorithm; the construct is a potential
component of many algorithms.

Show me one that is not array assignment.

   if is_name_first(b[j])
     a[i++] = b[j++]
     rep while is_name_follow(b[j])
       a[i++] = b[j++]
     end rep
     a[i] = 0
     return TOK_NAME
   end if

Now, what don't you like about the ++ operators in that? How would you
prefer to write it?

From parser production code:

procedure Get_Identifier
          ( Code     : in out Source'Class;
             Line     : String;
             Pointer : Integer;
             Argument : out Tokens.Argument_Token
          ) is
   Index     : Integer := Pointer + 1;
   Malformed : Boolean := False;
   Underline : Boolean := False;
   Symbol    : Character;
begin
   while Index <= Line'Last loop
      Symbol := Line (Index);
      if Is_Alphanumeric (Symbol) then
         Underline := False;
      elsif '_' = Symbol then
         Malformed := Malformed or Underline;
         Underline := True;
      else
         exit;
      end if;
      Index := Index + 1;
   end loop;
   Malformed := Malformed or Underline;
   Set_Pointer (Code, Index);
   Argument.Location := Link (Code);
   Argument.Value := new Identifier (Index - Pointer);
   declare
      This : Identifier renames Identifier (Argument.Value.all);
   begin
      This.Location := Argument.Location;
      This.Malformed := Malformed;
      This.Value     := Line (Pointer..Index - 1);
   end;
end Get_Identifier;

Clearly you get paid by the line. Even then, the code where a substring
is copied into another location, which would require the double-stepping
of the relevant pointer/indices of the earlier example, is missing here.

I don't have a +++ operator so I am not sure what that is supposed to
mean. It's not valid in my language.

It could be a part of even simpler and brilliantly readable:

++p+++q+++r

This is legal in Ada:

a := +b;

But for some reason, you can't write ++b or + +b, it has to be `a := +
(+b)`. So while you can't do +b++++c, you can write:

a:=+b+(+(+(+(c))));

Do the parentheses make this acceptable? I guess not, since you won't
like those ++ ops no matter how many brackets are added.

My point is that you can legally combine any number of operators to
result in nonsense-looking code, and the language can do little about it.


From Dmitry A. Kazakov@21:1/5 to Bart on Mon Nov 14 14:17:29 2022

On 2022-11-14 13:45, Bart wrote:

Clearly you get paid by the line. Even then, the code where a substring
is copied into another location, which would require the double-stepping
of the relevant pointer/indices of the earlier example, is missing here.

It is there:

This.Value := Line (Pointer..Index - 1);

assigning array as a whole.

I don't have a +++ operator so I am not sure what that is supposed to
mean. It's not valid in my language.

It could be a part of even simpler and brilliantly readable:

   ++p+++q+++r

This is legal in Ada:

     a := +b;

But for some reason, you can't write ++b or + +b, it has to be `a := +
(+b)`. So while you can't do +b++++c, you can write:

     a:=+b+(+(+(+(c))));

Do the parentheses make this acceptable?

I don't understand the point. Why would you like to have two unary
pluses in a row?

My point is that you can legally combine any number of operators to
result in nonsense-looking code, and the language can do little about it.

No, the point is that no reasonable code should be nonsense-looking and conversely.

Increments cross that line. They were perfectly acceptable in K&R C. C
was a very large and quite complex language then. It was beautiful
compared to FORTRAN IV, but I could not use it on a 64K machine. A
5-pass C compiler took an eternity. Machines then were small and simple.
Programs were tiny. *++p was reasonably complex code. When looking at
a program containing it, it was perfectly clear what it did and why it
was there. That was 40 years ago. Now even a microcontroller is far more
complex and powerful, and is programmed in a very different manner. We just
do not need ++ anywhere anymore.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From David Brown@21:1/5 to James Harris on Mon Nov 14 15:47:25 2022

On 14/11/2022 11:44, James Harris wrote:

On 14/11/2022 09:26, David Brown wrote:

On 13/11/2022 18:11, James Harris wrote:

On 08/11/2022 16:29, David Brown wrote:

...

You can't stop everything - those evil programmers have better
imaginations than any well-meaning language designer. But you can
try. Aim to make it harder to write convoluted code, and easier to
write clearer code. And try to make the clearer code more
efficient, to reduce the temptation to write evil code.

I agree

...

I'm not worried about the compiler. As long as it can make sense of a
piece of code then it can compile it. It's the human I am concerned
about, especially the poor old maintenance programmer!

You /say/ that, but you don't appear to believe it or be interested in
making it happen.

That comment surprises me a little.

It shouldn't, given what you have written and what /I/ have written in response.

/The main point/ of this, AISI, is
to make the job of the programmer simpler and to help him write code
which is more readable. You said yourself that (paraphrasing) when
there's a choice between clear and convoluted code it's important for a language to make the clearer code the easier one to write.

We agree on this as a major point of a language.

What we disagree on is whether attempting to assign meaning to
random-looking collections of punctuation characters furthers that goal.

On the one side, you claim you want a clear language that is
understandable for programmers and maintenance.

Yes.

On the other side, you want to decide what "++E++" should mean, with
random "^" characters thrown in for good measure.

Not quite. As language designer I have to decide what facilities will be provided but I do not have absolute control over what a programmer may
do with the facilities.

Agreed.

Nor would a programmer want to work with a
language which implemented unnecessary rules or rules which he may see
as arbitrary.

Disagreed.

Barring the Barts of the world, /every/ programmer works with languages
that contain rules that he/she thinks are unnecessary, arbitrary, or at
the very least, sub-optimal. I have never heard of anyone who thinks
the language they use is perfect in every way.

Your challenge is to make a language that makes it easy to write good
code, and hard to write bad code. That second part is at least as
important, arguably significantly more important, than the first part.
It is much harder to achieve in language design. (And it is impossible
to achieve completely - it is /always/ possible to write bad code.)

The expression you mention is just one of a myriad of what you might
consider to be potential nasties. If I am going to prohibit that one
then what about all the others?

Prohibit nasty ones.

A big step in that direction is to say that assignment is a statement,
not an expression, and that variables cannot be changed by side-effects.
(How you relate this to function calls is a related and complex issue
that I have been glossing over here. An idea would be to distinguish
between "procedures" that may have side effects, and "functions" that do
not.)

That means there is no such thing as an "increment" operator - post or pre.

It also /hugely/ simplifies the language - both for the programmer, and
for the implementer. If expressions have no side-effects, they can be
duplicated, split up, re-arranged, moved around in code, all without
affecting the behaviour of the program.
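
A small Python sketch of that claim: rewriting "compute twice" as "compute once and reuse" is safe for a side-effect-free function but changes the result for one that mutates state. All names here (pure_square, next_value, counter, demo) are hypothetical and exist only for this illustration.

```python
def pure_square(x: int) -> int:
    # No side effects: the result depends only on the argument.
    return x * x


counter = {"n": 0}  # hypothetical mutable state


def next_value() -> int:
    # Side effect: every call changes counter["n"].
    counter["n"] += 1
    return counter["n"]


def demo():
    # Pure case: computing once and reusing equals computing twice.
    once = pure_square(7)
    pure_same = (once + once) == (pure_square(7) + pure_square(7))

    # Impure case: the same rewrite changes the result.
    counter["n"] = 0
    twice = next_value() + next_value()   # 1 + 2
    counter["n"] = 0
    once = next_value()
    reused = once + once                  # 1 + 1
    return pure_same, twice, reused
```

Here demo() returns (True, 3, 2): the pure expression survives the rewrite, the impure one does not.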

/You/ are responsible for your language, and you /can/ make arbitrary
decisions and limitations. (My suggestions have not been arbitrary -
they have been backed up by reasoning, even if opinions on them differ.)
If you want to say that functions with a cyclomatic complexity over 20
cause a warning and those over 30 cause a hard error, that's /your/
decision. Some programmers will complain that they can't translate
their old C/Pascal/Fortran/APL/whatever code directly into your
language. Other programmers will be glad because the code they are
faced with is guaranteed not to have big, incomprehensible functions.
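
As a rough illustration of such a rule, the sketch below counts decision points in a piece of Python source with the standard ast module and applies the 20/30 thresholds mentioned above. The counting scheme is a deliberately simple approximation of cyclomatic complexity, not the definition used by any particular tool, and the names complexity and verdict are hypothetical.

```python
import ast

# Node types counted as decision points (an approximation, see above).
_BRANCHES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def complexity(func_source: str) -> int:
    """Return 1 plus the number of decision points found in the source."""
    score = 1
    for node in ast.walk(ast.parse(func_source)):
        if isinstance(node, _BRANCHES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` contributes two extra decision points.
            score += len(node.values) - 1
    return score


def verdict(score: int, warn_over: int = 20, error_over: int = 30) -> str:
    """Map a complexity score to ok / warning / error."""
    if score > error_over:
        return "error"
    if score > warn_over:
        return "warning"
    return "ok"
```

A compiler front end could run an equivalent check per function and refuse to continue on "error".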

These two statements go together as well as Dmitry's toaster and
toilet brush. It doesn't matter how precisely you define how the
combination can be used and what it does, it is still not a good or
useful thing.

OK, let's take the combination you mentioned:

++E++

I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in the
order that it says it. Remember that unlike C I define the apparent
order of evaluation so the expression is perfectly well formed.
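
The reading described above can be modelled in Python with a mutable cell standing in for the lvalue E. This is only one possible interpretation: it assumes that prefix ++ yields the lvalue and postfix ++ yields a value, as suggested at the start of the thread. C itself rejects ++E++, because postfix ++ binds tighter and its result is not an lvalue. The names Cell, pre_inc, post_inc and pre_then_post are hypothetical.

```python
class Cell:
    """A mutable integer cell standing in for an lvalue E."""

    def __init__(self, value: int):
        self.value = value


def pre_inc(cell: Cell) -> Cell:
    # Prefix ++: increment first, then yield the cell itself (an lvalue).
    cell.value += 1
    return cell


def post_inc(cell: Cell) -> int:
    # Postfix ++: yield the current value, then increment the cell.
    old = cell.value
    cell.value += 1
    return old


def pre_then_post(cell: Cell) -> int:
    # Models ++E++ read as "increment, take the value, increment again".
    return post_inc(pre_inc(cell))
```

Starting from E = 5, the expression yields 6 and leaves E at 7 under these assumptions.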

The very fact that you are discussing how to define it means it is not
clear and obvious. It is not obvious which order the increments happen,
or if the order is defined, or if the order matters. It is not obvious
what the return value should be. It is not obvious where you have
lvalues or rvalues (not that a language should necessarily have such
concepts). It is not obvious what happens to E.

The only thing that /is/ obvious about it, is that any use of it in real
code would be an extraordinarily bad idea.

Thus for a language targeting humans (some languages are for
intermediate code, or very low-level implementation stuff) with an aim
to help people write good clear code and avoid bad and incomprehensible
code, there is only one reaction a compiler could have to that
expression - it /must/ be an error of some kind.

It is up to /you/ to draw the lines between things that are acceptable
and things that are not, with the understanding that such a line is
never going to be perfect - it will have sub-optimal choices in both directions. I've given suggestions - you don't have to follow them.

What you don't get to do is claim "I want to help programmers write good
code" and then /allow/ this kind of shite with specific rules to define
it. That's far worse than C saying "if you write ++E + E++, the
behaviour is undefined - it's stupid, wrong, and you are on your own if
you write such nonsense, but I can't stop you doing it".

There's a very tempting myth in language design that /defining/
behaviour is key - that gibberish and incorrect code can somehow be made "correct" by defining its behaviour. You are not alone in this - lots
of languages try to achieve "no undefined behaviour" by defining the
behaviour of everything instead of banning things that have no correct behaviour.


From James Harris@21:1/5 to Stefan Ram on Mon Nov 14 15:14:40 2022

On 14/11/2022 10:24, Stefan Ram wrote:

James Harris <james.harris.1@gmail.com> writes:

Rather than allowing non-ASCII in source I came up with a scheme of what
you might call 'named characters' extending the backslash idea of C to
allow names instead of single characters after the backslash. It's off
topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).

Here's an example of a Python program.

print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )

It prints:

\"

I see they are Unicode names. If I were to support such names I would
have your example as something like

print "\U:reverse solidus/\U:quotation mark/"

but I prefer shorter names such as

print "\bksl/\q11/"

At least with Unicode someone has already defined a name for every
character, but Unicode includes a lot of nonsense such as a character called

NORTH INDIC FRACTION THREE SIXTEENTHS

and a character called

BOX DRAWINGS VERTICAL HEAVY AND RIGHT LIGHT

not forgetting

DENTISTRY SYMBOL LIGHT DOWN AND HORIZONTAL WITH CIRCLE

and fortunately they didn't omit

UPWARDS TRIANGLE-HEADED ARROW LEFTWARDS OF DOWNWARDS TRIANGLE-HEADED
ARROW

so that's all right, then. Yes, that last one is the name of just one character!!!

There are so many potential Unicode character names that the tables
which would be needed just to convert names to codes would probably be
larger than the rest of my compiler put together.
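
For what it's worth, the \U:name/ form suggested above can be prototyped in a few lines of Python by leaning on the name table that ships with the standard unicodedata module. This is a sketch only: the escape syntax is the hypothetical one from this post, the function name expand_named is made up, and the short names such as \bksl/ are not handled.

```python
import re
import unicodedata

# Matches the hypothetical \U:name/ form from the post above.
_NAMED = re.compile(r"\\U:([^/]+)/")


def expand_named(text: str) -> str:
    """Replace each \\U:name/ with the Unicode character of that name."""

    def repl(match):
        # unicodedata.lookup raises KeyError for an unknown name.
        return unicodedata.lookup(match.group(1).upper())

    return _NAMED.sub(repl, text)
```

With that, expand_named(r"\U:reverse solidus/\U:quotation mark/") gives the same two characters as the Python example quoted above.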

--
James Harris


From Bart@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 15:24:17 2022

On 14/11/2022 13:17, Dmitry A. Kazakov wrote:

On 2022-11-14 13:45, Bart wrote:

Clearly you get paid by the line. Even then, the code where a
substring is copied into another location, which would require the
double-stepping of the relevant pointer/indices of the earlier
example, is missing here.

It is there:

   This.Value := Line (Pointer..Index - 1);

assigning array as a whole.

OK, but this is then doing it in two passes. The original used only one
pass. And the code when doing the transfer, whether as a loop or
utilising a machine's block copy features, is not shown here.

I don't have a +++ operator so I am not sure what that is supposed
to mean. It's not valid in my language.

It could be a part of even simpler and brilliantly readable:

   ++p+++q+++r

This is legal in Ada:

      a := +b;

But for some reason, you can't write ++b or + +b, it has to be `a := +
(+b)`. So while you can't do +b++++c, you can write:

      a:=+b+(+(+(+(c))));

Do the parentheses make this acceptable?

I don't understand the point. Why would you like to have two unary
pluses in a row?

I wanted to write deliberately confusing code the same way you did with
++p+++q+++r. Here I don't know which you intend to be ++, which are
binary +, and which are unary plus. Although lexing rules usually mean
tokenisation is like this:

++p++ +q++ +r

while parsing rules would require that those +'s are binary adds, so
better written like this:

++p++ + q++ + r

Here, while my language won't make sense of ++p++, James' will, so this
expression is simply adding those 3 terms, but you've deliberately
chosen to write it so as to appear gobbledygook.
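
The tokenisation described above follows from the usual longest-match ("maximal munch") lexing rule. A minimal Python sketch, assuming a toy language that has only identifiers, + and ++ (the function name tokenize is hypothetical):

```python
def tokenize(src: str):
    """Split src into tokens, always preferring '++' over '+'."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif src.startswith("++", i):
            # Longest match first: two plus signs form one token.
            tokens.append("++")
            i += 2
        elif ch == "+":
            tokens.append("+")
            i += 1
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(src) and (src[i].isalnum() or src[i] == "_"):
                i += 1
            tokens.append(src[start:i])
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens
```

Under this rule "++p+++q+++r" and "++p++ + q++ + r" produce the same token list; whether each remaining + is then binary or unary is decided by the parser, not the lexer.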

My point is that you can legally combine any number of operators to
result in nonsense-looking code, and the language can do little about it.

No, the point is that no reasonable code should be nonsense-looking and conversely.

Increments cross that line.

But it performs a task that is needed, and in their absence, would
simply be implemented, less efficiently and with more cluttery code,
using other means.

I have my own misgivings about it: there are in all 6 varieties of
Increment (++x; --x; a:=++x; a:=--x; a:=x++; a:=x--), which are a pig to implement, and they spoil the lines of code like this:

a[++n] := x
b[n] := y
c[n] := z

Delete that first line, and you need to remember to transfer that ++n to
the next line. And not to repeat that ++n as is easy to do.

But I like the convenience too much. Also, in dynamic code, ++x can more
easily map to a single byte code instruction, than x:=x+1 or even x+:=1.

In my sources, it would be very unusual to have more than one ++ or --
in any expression.

They were perfectly acceptable in K&R C. C
was a very large and quite complex language then. It was beautiful
compared to FORTRAN IV, but I could not use it on a 64K machine. A
5-pass C compiler took an eternity.

Did it? I never tried one. I only found out recently that they were that
slow from reading reviews of C compilers of that era in Byte magazine.

However that was also when I developed my own compilers on 64KB
machines, and they never took more than a few seconds: I made sure of
that. The only limiting factor would have been the speed of floppy disk transfer.

Machines then were small and simple.
Programs were tiny. *++p was reasonably complex code.

I can't remember if I had ++ then; I might have done. Then it would have
given a significant advantage to write ++A[i] than A[i]:=A[i]+1. It
still does now.


From David Brown@21:1/5 to Bart on Mon Nov 14 16:23:24 2022

On 14/11/2022 11:47, Bart wrote:

On 14/11/2022 09:26, David Brown wrote:

On 13/11/2022 18:11, James Harris wrote:

On 08/11/2022 16:29, David Brown wrote:

I'm not worried about the compiler. As long as it can make sense of a
piece of code then it can compile it. It's the human I am concerned
about, especially the poor old maintenance programmer!

You /say/ that, but you don't appear to believe it or be interested in
making it happen.

On the one side, you claim you want a clear language that is
understandable for programmers and maintenance. On the other side,
you want to decide what "++E++" should mean, with random "^"
characters thrown in for good measure.

In-place, value-returning increment ops written as ++ and -- are common
in languages.

Yes. And bugs are common in programs. Being common does not
necessarily mean it's a good idea.

(It doesn't necessarily mean it's a bad idea either - I am not implying
that increment and decrement are themselves a major cause of bugs! But
mixing side-effects inside expressions /is/ a cause of bugs.)

So are pointer-dereference operators in lower-level languages, whether written as * or ^.

Same again.

Once you have those two possibilities in a language, why shouldn't you
define what combinations of those operators might mean?

If you don't have them, you don't have a problem.

Pointer dereferencing like this is not a requirement for a language. If
you have "proper" arrays (I write it like that because the concept of
"array" can be defined in many ways), multiple return values for
functions, and a way to define data structures such as trees and lists,
where else do you actually need pointers?

Pure functional programming languages don't have pointers, or increment operators - they don't even have assignment. Functional programming
languages are usually considered quite high level, but some slightly
impure functional programming languages - such as OCaml - are very
efficient compiled languages that rival C, Pascal, Ada, Fortran, etc.,
for speed. OCaml /does/, AFAIUI (I am no expert in that language) have variables and pointers or references, but they are very rarely seen
explicitly, and are intentionally cumbersome to use.

Maybe the OP is designing a language in which pointer dereferencing and increment are expected to turn up so often that it is useful to combine
them. But I think it is a lot more likely that this is a mistaken
assumption based on limited experience with different kinds of
programming languages. The result will be like your own language - a
re-implementation of C or Pascal, with some benefits and some new
disadvantages, and nothing of real innovation or interest. I am trying
to make suggestions to break that pattern.

(I just differ from James in thinking that successive *value-returning*
++ or -- operators, whether prefix or postfix, are not meaningful. I'd
also think it would be bad form to chain them, but it is not practical
to ban that at the syntax level.

If you think it is "bad form", ban it. For any language that is going
to be successful in a wider field, not just a plaything for one person,
the man-hour effort in /using/ the language will far outweigh the effort /designing/ or /implementing/ it. Thus it does not matter if a good
design choice is difficult to implement, as it will save effort in the
long run.

However, I have sometimes banned even `a+b` in some contexts, when the resulting value is unused.)

Is your point that you shouldn't have either of those operators?

Yes! What gave it away - the first three or four times I said as much?
(And - since repeating myself seems helpful in this thread - I would
also avoid other assignment operators or operators with side-effects. Assignment should be a statement, not an expression.)

++ and
-- can be replaced at some inconvenience. But getting rid of dereference
is harder; if P is a pointer:

print P

will this display the value of the pointer, or the value of its target?
If only there was a way to specify that precisely!

Note that when p and q are byte pointers, then *p++ = *q++ (or p++^ :=
q++^) corresponds to the one-byte Z80 LDI instruction.

So it's something so meaningless that that tiny 8-bit processor decided
to give it its own instruction.
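
For reference, the data movement performed by LDI can be modelled in a few lines of Python. This is a sketch, not an emulator: flags are ignored, and the class and attribute names are hypothetical. Besides copying a byte and advancing both pointers, the real instruction also decrements the BC counter, which the *p++ = *q++ expression does not express.

```python
class Z80Sketch:
    """Just enough state to model LDI; not a full Z80 emulator."""

    def __init__(self, memory: bytearray, hl: int, de: int, bc: int):
        self.memory = memory
        self.hl = hl  # source pointer
        self.de = de  # destination pointer
        self.bc = bc  # byte counter

    def ldi(self) -> None:
        # (DE) <- (HL), then HL and DE advance and BC counts down.
        self.memory[self.de] = self.memory[self.hl]
        self.hl = (self.hl + 1) & 0xFFFF
        self.de = (self.de + 1) & 0xFFFF
        self.bc = (self.bc - 1) & 0xFFFF
```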

The instruction set for a processor is - or should be - completely
different from a programming language, even one that is relatively
low-level. The source code might handle an array called "xs" and loop
through it with "for x in xs do ... ", while the generated assembly
might use a "*p++" instruction.

If I want to write in assembly, I can write assembly - I don't want to
do that, so I don't. Even in C, I don't make heavy use of "*p++", and
very rarely in the middle of complex expressions - usually array access
is clearer. (Of course I use the increment operator, especially in loops,
because that's how C is written. But a new language can do better than
that.)


From James Harris@21:1/5 to Bart on Mon Nov 14 15:46:03 2022

On 14/11/2022 11:06, Bart wrote:

On 13/11/2022 16:55, James Harris wrote:

On 10/11/2022 10:42, Bart wrote:

It's clear that ++(A+B) can't work unless you change what ++ means
(eg. ++A now means (A+1) because whatever ++ modifies is not
accessible).

It's similar if A were a struct. You could have

   A.F

but you could not have

   (A + 4).F

The LHS of the . operation, in this case, has to be an lvalue.

That's not right. C allows functions to return structs; they are not
lvalues, but you can apply ".":

    typedef struct{int x,y;}Point;

    Point F(void) {
        Point p;
        return p;
    }

    int main(void)
    {
        Point p;
        int a;

    // F()=p;           // Not valid: not an lvalue
        a=F().x;         // Valid
    }

Well, C structs, even those which are returned from a function, would
naturally /have/ lvalues.

--
James Harris


From James Harris@21:1/5 to Bart on Mon Nov 14 15:29:50 2022

On 14/11/2022 11:00, Bart wrote:

On 14/11/2022 10:32, Dmitry A. Kazakov wrote:

On 2022-11-14 11:16, James Harris wrote:

a[i++] := b[j++]

Same question. What for?

What do you mean, "What for?"?

Show me the algorithm.

There's no particular algorithm; the construct is a potential
component of many algorithms.

Show me one that is not array assignment.

What exactly is your objection: that there shouldn't be an increment
operator at all?

Then it's end of discussion. But if you allow a value-returning
increment operator, then someone could use it in multiple places in the
same expression, together with other operators, and the language has to
be able to deal with it.

Note that any language that has reference parameters would allow this:

a[postincr(i)] := b[postincr(j)]

I've seen similar in C++ but

postincr(i)

looks potentially misleading. Yes, there's the function name as a clue
but a programmer could easily read that expression without realising
that the function could change variable i. The action of the interface
is not clear from the syntax.

Worse, if a language can change i in the above expression then a
programmer may find he has to check *every* function that gets called to
find out what other actual parameters can be changed.

Have you thought of making it evident in the syntax with something like

postincr(&i)

where & doesn't mean "address of", as in C, but simply flags up that i
is an inout parameter and could be changed by the callee? The absence of
& would provide assurance that the variable mentioned would not be
treated by the callee as inout.
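
The idea of marking inout arguments at the call site can be imitated in Python, which has no reference parameters, by requiring an explicit wrapper object. In this sketch the wrapper plays the role of the & marker: a bare integer cannot be modified by the callee, while a wrapped one can. The names Ref, postincr and copy_one are hypothetical.

```python
class Ref:
    """Explicit wrapper marking an argument the callee may modify."""

    def __init__(self, value):
        self.value = value


def postincr(ref: Ref) -> int:
    # Returns the old value and leaves ref.value incremented.
    old = ref.value
    ref.value += 1
    return old


def copy_one(a: list, i: Ref, b: list, j: Ref) -> None:
    # Similar in spirit to a[postincr(&i)] := b[postincr(&j)].
    # Python evaluates the right-hand side before the target subscript.
    a[postincr(i)] = b[postincr(j)]
```

A reader of copy_one can see from the Ref annotations alone which arguments may change.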

--
James Harris


From Dmitry A. Kazakov@21:1/5 to Bart on Mon Nov 14 16:54:26 2022

On 2022-11-14 16:24, Bart wrote:

On 14/11/2022 13:17, Dmitry A. Kazakov wrote:

On 2022-11-14 13:45, Bart wrote:

Clearly you get paid by the line. Even then, the code where a
substring is copied into another location, which would require the
double-stepping of the relevant pointer/indices of the earlier
example, is missing here.

It is there:

This.Value := Line (Pointer..Index - 1);

assigning array as a whole.

OK, but this is then doing it in two passes. The original used only one
pass.

The original code must allocate the result, whose length is not known in
advance, *upfront*. So it must use the pool, followed by reallocs. The code
above allocates memory in the arena where the AST is kept, not in the pool,
and only once the length of the identifier has been determined.

And the code when doing the transfer, whether as a loop or
utilising a machine's block copy features, is not shown here.

That is up to the compiler optimization, which is the point.

My point is that you can legally combine any number of operators to
result in nonsense-looking code, and the language can do little about
it.

No, the point is that no reasonable code should be nonsense-looking
and conversely.

Increments cross that line.

But it performs a task that is needed,

The task is not needed.

They were perfectly acceptable in K&R C. C was a very large and quite
complex language then. It was beautiful compared to FORTRAN IV, but I
could not use it on a 64K machine. A 5-pass C compiler took an eternity.

Did it? I never tried one. I only found out recently that they were that
slow from reading reviews of C compilers of that era in Byte magazine.

They were written in C using inferior tools like Lex and Yacc. Under
memory constraint it was customary to do multiple passes and keep
intermediate results on the disk. Even assembler took 2 passes, I
believe, and required up to 10 minutes time followed by 10 minutes linking.

As I said, C was a complex language then. Its ill-thought-out syntax
contributed to compiler complexity. It took decades before C compilers
could produce meaningful error messages.

C was reasonably good for a medium sized PDP machine. It is awful now.
Why people continue to borrow its worst features is beyond me.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From Bart@21:1/5 to David Brown on Mon Nov 14 15:59:24 2022

On 14/11/2022 15:23, David Brown wrote:

On 14/11/2022 11:47, Bart wrote:

Once you have those two possibilities in a language, why shouldn't you
define what combinations of those operators might mean?

If you don't have them, you don't have a problem.

Pointer dereferencing like this is not a requirement for a language. If
you have "proper" arrays (I write it like that because the concept of
"array" can be defined in many ways), multiple return values for
functions, and a way to define data structures such as trees and lists,
where else do you actually need pointers?

I have a dynamic language with proper, first class lists, trees,
strings, records, which take care of 90% of the pointer uses in a
C-class language. Yet they can still be useful. This is an extract from
a program that dumps the contents of an EXE file:

coffptr:=makeref(pedata+coffoffset,imagefileheader)

genstrln("Coff header: "+tostr(coffptr^))

genstrln("Machine: "+tostr(coffptr^.machine,"h2"))
genstrln("Nsections: "+tostr(coffptr^.nsections,"h2"))
genstrln("Timestamp: "+tostr(coffptr^.timedatestamp,"h4"))
genstrln("Symtab offset: "+tostr(coffptr^.symtaboffset))
genstrln("Nsymbols: "+tostr(coffptr^.nsymbols))
genstrln("Opt Hdr size: "+tostr(coffptr^.optheadersize))
genstrln("Characteristics: "+tostr(coffptr^.characteristics,"b"))
genline()

('pedata' is a simple byte pointer; I've chosen to load the EXE into a
simple memory block. 'makeref' takes an offset into that block and
returns a pointer to that offset, interpreted as a pointer to a
particular struct type. "^" is a deref op.

'imagefileheader' is this type:

type imagefileheader=struct
wt_word machine # (u16)
wt_word nsections
wt_dword timedatestamp # (u32)
wt_dword symtaboffset
wt_dword nsymbols
wt_word optheadersize
wt_word characteristics
end
)

Pure functional programming languages don't have pointers, or increment operators - they don't even have assignment. Functional programming languages are usually considered quite high level, but some slightly
impure functional programming languages - such as OCaml - are very
efficient compiled languages that rival C, Pascal, Ada, Fortran, etc.,
for speed. OCaml /does/, AFAIUI (I am no expert in that language) have variables and pointers or references, but they are very rarely seen explicitly, and are intentionally cumbersome to use.

That pure functional languages aren't used everywhere suggests they
aren't great at the everyday tasks like the ones I deal with.

(I would like to see Haskell's take on that task of decoding that EXE
file, and dealing with that specific data layout. My example was the
simplest part of it.

For that matter, how would you do it in Python? Rather painfully I would imagine.)

Maybe the OP is designing a language in which pointer dereferencing and increment are expected to turn up so often that it is useful to combine them. But I think it is a lot more likely that this is a mistaken assumption based on limited experience with different kinds of
programming languages. The result will be like your own language - a re-implementation of C or Pascal, with some benefits and some new disadvantages, and nothing of real innovation or interest.

Innovation these days seems to be:

* To create incomprehensible languages that require several advanced
degrees in mathematics, PL and type theory to understand

* To make it as hard as possible to perform any tasks by removing
features such as loops, mutable variables and functions with
side-effects. (It's worth bearing in mind that most elements in a
computer system: display, file-system and don't forget the memory, are necessarily mutable.)

* To tie you up in knots with strictly typed everything (or in Rust,
with its 'borrow checker').

No thanks. My innovation is keeping this stuff simple, accessible, fast,
and at a human scale.

I am trying
to make suggestions to break that pattern.

Look at Reddit's PL forum. At least 90% of new languages there are
FP-based. Yet when you look at the implementation languages, it tends to
be a different story.

(I just differ from James in thinking that successive
*value-returning* ++ or -- operators, whether prefix or postfix, are
not meaningful. I'd also think it would be bad form to chain them, but
it is not practical to ban them at the syntax level.)

If you think it is "bad form", ban it.

Obviously I can't ban `a + b`. Equally obviously, this code is pointless:

a + b;

Given an expression-based language, what would you do? In the past,
after working with C, I would unthinkingly type:

a = b

in my syntax instead of a := b. This wasn't an error: it would compare a
and b then discard the result. But it was a bug.

For any language that is going
to be successful in a wider field, not just a plaything for one person,
the man-hour effort in /using/ the language will far outweigh the effort /designing/ or /implementing/ it. Thus it does not matter if a good
design choice is difficult to implement, as it will save effort in the
long run.

In my case, implementing a series of compilers over about 20 years took
perhaps one year of /part-time/ work. The rest of it was using the
language, even as an individual. So at least 20:1.

From Stefan Ram@21:1/5 to James Harris on Mon Nov 14 16:18:00 2022

James Harris <james.harris.1@gmail.com> writes:

but I prefer shorter names such as
print "\bksl/\q11/"

I see. You could use Unicode names as a fallback for those
characters for which you have not defined a name yet.

The raw string literals in Python have a strange irregularity:
One can use a backslash with no escape, except at the very end
of the string, so one can write

r"\abc" to get \abc, but one can not write

r"abc\" to get abc\. Transcript:

print( r"\abc" )

|\abc

print( r"abc\" )

| File "<stdin>", line 1
| print( r"abc\" )
| ^
|SyntaxError: EOL while scanning string literal
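
The irregularity can be checked without crashing the interpreter by compiling the literal's source text. A minimal sketch; the helper name `raw_literal_compiles` is made up for illustration, and the last assignment shows one common workaround (appending an ordinary escaped backslash), not the only one.

```python
def raw_literal_compiles(literal_source):
    """Return True if the given source text is a valid Python expression."""
    try:
        compile(literal_source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False

# The source text r"\abc" is fine; the source text r"abc\" is not,
# because the final backslash stops the quote from closing the literal.
leading_ok = raw_literal_compiles('r"\\abc"')
trailing_ok = raw_literal_compiles('r"abc\\"')

# Workaround: keep the raw part raw and append the backslash separately.
workaround = r"abc" + "\\"
print(leading_ok, trailing_ok, workaround)
```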

I tried to design an escape mechanism for string literals in my
language "Unotal" that has no irregularities.

I already described this kind of string literals here on 2021-09-29,
but in the meantime I have actually written an implementation.

I also have written a tiny demo implementation in Python which
can be used to experiment with the notation (see below).

Here's a short summary of my notation:

- a string literal is written using brackets, as in [abc],
which means the string "abc" (3 characters: a, b, and c).

- nested brackets are allowed: [abc[def]ghi] is "abc[def]ghi"
(11 characters).

- a single left bracket is written as "[`]". This is
admittedly ugly, but it is very rare in most kinds of texts,
so that conflicts with texts containing a literal "[`]"
should be very rare.

- a single right bracket is written as "[]`" for similar
reasons.

I tried to make sure that no other rules are needed and that
every text can be encoded this way.

Here's a small Python program with a tiny scanner.

source code

def scan( source ):
    return source[ 1: -1 ].replace( '[`]', '[' ).replace( '[]`', ']' )

def demo( source ):
    print( f"{source:14}", scan( source ))

print( f"{'literal':14}", 'meaning' )
demo( '[def]' )
demo( '[de[]f]' )
demo( '[de[`]f]' )
demo( '[[de[`]]f]' )
demo( '[de[]`f]' )
demo( '[de[`]`[]`f]' )
demo( '[de[`][]``f]' )
demo( '[`]]' )
demo( '[[]`]' )

output

literal        meaning
[def]          def
[de[]f]        de[]f
[de[`]f]       de[f
[[de[`]]f]     [de[]f
[de[]`f]       de]f
[de[`]`[]`f]   de[`]f
[de[`][]``f]   de[]`f
[`]]           `]
[[]`]          ]
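
Going the other way, from plain text to a bracket literal, could be sketched as below. This is an assumption about one possible encoder, not Stefan's implementation: the hypothetical `encode` escapes every bracket, even balanced ones that the notation would allow unescaped, and `scan` is copied from the demo above so the block runs on its own.

```python
def scan(source):
    # Decoder from the demo: strip the outer brackets, then undo both escapes.
    return source[1:-1].replace('[`]', '[').replace('[]`', ']')

def encode(text):
    # Hypothetical encoder: escape each bracket individually, then wrap.
    pieces = []
    for ch in text:
        if ch == '[':
            pieces.append('[`]')
        elif ch == ']':
            pieces.append('[]`')
        else:
            pieces.append(ch)
    return '[' + ''.join(pieces) + ']'

samples = ['def', 'de[f', 'de]f', '[]', '][', '[`]', '[]`']
for sample in samples:
    literal = encode(sample)
    # Third column is True when decoding gives back the original text.
    print(f"{sample:6}", f"{literal:14}", scan(literal) == sample)
```

The samples include unbalanced brackets and texts that themselves look like escape sequences, which are the cases most likely to trip up a simple scheme.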

From James Harris@21:1/5 to Bart on Mon Nov 14 16:21:13 2022

On 14/11/2022 11:11, Bart wrote:

On 14/11/2022 10:44, James Harris wrote:

On 14/11/2022 09:26, David Brown wrote:

...

OK, let's take the combination you mentioned:

   ++E++

I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in
the order that it says it. Remember that unlike C I define the
apparent order of evaluation so the expression is perfectly well formed.

But does this have the same priorities as:

   op1 E op2

(where op2 is commonly done first) or does it have special rules, so
that in:

--E++

the -- is done first? If it's different, then what is the ordering when
mixed with other unary ops?

You explained somewhere the circumstances where you think this is
meaningful, but I can't remember what the rules are and I can't find the exact post.

The rules for ++ and -- are simple. *Prefix* ++ or -- happens before
postfix ++ or --.

Your example would probably be better expressed as

E - 1

!
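
To see why `--E++` comes out as `E - 1`, the rule can be modelled in a few lines. This is a sketch under assumptions: the `Cell` class and its method names are invented here, and prefix operators are taken to yield the location while postfix operators yield the old value, as in the aside at the top of the thread.

```python
class Cell:
    """A mutable location holding one number (illustrative only)."""
    def __init__(self, value):
        self.value = value

    def pre_inc(self):      # ++E: update first, result is the location
        self.value += 1
        return self

    def pre_dec(self):      # --E: update first, result is the location
        self.value -= 1
        return self

    def post_inc(self):     # E++: result is the value before the update
        old = self.value
        self.value += 1
        return old

e = Cell(5)
result = e.pre_dec().post_inc()      # models --E++
print(result, e.value)               # 4 5

f = Cell(5)
both = f.pre_inc().post_inc()        # models ++E++
print(both, f.value)                 # 6 7
```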

As for the bigger picture of operator precedences, it's based on the transformations which operators naturally make to their operands. For
example, comparisons (such as less-than) naturally take numbers and
produce booleans so their precedences put them after numeric operators
(+, * etc) and before boolean operators (and, or, not, etc). The natural
series is (simplified)

locations (such as field selection and array indexing)
numbers (retrieved from those locations)
comparisons (of those numbers)
booleans

The latter take booleans and produce other booleans. IOW booleans are
the bottom of the chain.

In summary, the operators which work on locations (lvalues) come first,
then those which work on numbers, then the comparisons and then those
which work on booleans. There could not be a more natural order!!
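
Python happens to order arithmetic, comparison and boolean operators the same way, so its parser can illustrate the chain. The sketch below only inspects Python's own grammar through the standard `ast` module; it says nothing about how the language under discussion is implemented.

```python
import ast

tree = ast.parse("a + b < c * d and e", mode="eval").body

# Outermost node is the boolean operator, since it binds loosest.
outer = type(tree).__name__                     # BoolOp
# Its first operand is the comparison...
middle = type(tree.values[0]).__name__          # Compare
# ...whose left operand is an arithmetic sub-expression.
inner = type(tree.values[0].left).__name__      # BinOp

print(outer, middle, inner)
```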

That said, autoincrement, dereference and bitwise operators are a bit
(sic) anomalous - e.g. one could make a case for bitwise coming before
or after operations on numbers. I chose to put them before so "bit
patterns" come between locations and numbers.

This is the problem. You shouldn't need to stop and think. I make the
rules simple by stipulating that value-returning ++ and -- only ever
return rvalues.

Because if they ever start to return lvalues, then this becomes possible:

   ++E := 0
   E++ := 0

(Whichever one is legal in your scheme.) So I think there is little
useful expressivity to be gained.

Indeed, neither of those is useful.

Interestingly, I just tried

++E = 0;

with cc and c++ compilers. The first rejected it (as ++E needing to be
an lvalue); the second accepted it. That's not a reflection of either
compiler, BTW, but without checking it's probably more to do with the language/dialect definition. FWIW I expect the second compiler could be persuaded to issue a warning about unreachable code or suchlike.

--
James Harris

From James Harris@21:1/5 to Stefan Ram on Mon Nov 14 16:44:15 2022

On 14/11/2022 16:18, Stefan Ram wrote:

James Harris <james.harris.1@gmail.com> writes:

but I prefer shorter names such as
print "\bksl/\q11/"

I see. You could use Unicode names as a fallback for those
characters for which you have not defined a name yet.

That's a possibility if explicitly creating a Unicode string but I see
Unicode as "for printing rather than for processing". IMO something else
is needed for the latter.

The raw string literals in Python have a strange irregularity:
One can use a backslash with no escape, except at the very end
of the string, so one can write

r"\abc" to get \abc, but one can not write

r"abc\" to get abc\. Transcript:

print( r"\abc" )

|\abc

print( r"abc\" )

| File "<stdin>", line 1
| print( r"abc\" )
| ^
|SyntaxError: EOL while scanning string literal

Yes, that's odd.

I tried to design an escape mechanism for string literals in my
language "Unotal" that has no irregularities.

I already described this kind of string literals here on 2021-09-29,
but in the meantime I have actually written an implementation.

I also have written a tiny demo implementation in Python which
can be used to experiment with the notation (see below).

Here's a short summary of my notation:

- a string literal is written using brackets, as in [abc],
which means the string "abc" (3 characters: a, b, and c).

- nested brackets are allowed: [abc[def]ghi] is "abc[def]ghi"
(11 characters).

- a single left bracket is written as "[`]". This is
admittedly ugly, but it is very rare in most kinds of texts,
so that conflicts with texts containing a literal "[`]"
should be very rare.

- a single right bracket is written as "[]`" for similar
reasons.

I tried to make sure that no other rules are needed and that
every text can be encoded this way.

Here's a small Python program with a tiny scanner.

source code

def scan( source ):
    return source[ 1: -1 ].replace( '[`]', '[' ).replace( '[]`', ']' )

def demo( source ):
    print( f"{source:14}", scan( source ))

print( f"{'literal':14}", 'meaning' )
demo( '[def]' )
demo( '[de[]f]' )
demo( '[de[`]f]' )
demo( '[[de[`]]f]' )
demo( '[de[]`f]' )
demo( '[de[`]`[]`f]' )
demo( '[de[`][]``f]' )
demo( '[`]]' )
demo( '[[]`]' )

output

literal        meaning
[def]          def
[de[]f]        de[]f
[de[`]f]       de[f
[[de[`]]f]     [de[]f
[de[]`f]       de]f
[de[`]`[]`f]   de[`]f
[de[`][]``f]   de[]`f
[`]]           `]
[[]`]          ]

That's cool, especially the brevity of your Python code! I remember the discussion - and the hours of my life it took to try to come up with a
scheme I was happy with. As I remember, the big problem was
metacharacters - those which are used to delimit the string or a
character name - as I guess is true of your left-bracket example.

--
James Harris

From James Harris@21:1/5 to David Brown on Mon Nov 14 17:37:08 2022

On 14/11/2022 14:47, David Brown wrote:

On 14/11/2022 11:44, James Harris wrote:

On 14/11/2022 09:26, David Brown wrote:

On 13/11/2022 18:11, James Harris wrote:

On 08/11/2022 16:29, David Brown wrote:

...

The expression you mention is just one of a myriad of what you might
consider to be potential nasties. If I am going to prohibit that one
then what about all the others?

Prohibit nasty ones.

Enumerating the 'nasty ones' is the problem. If there are 20 dyadic
operators then there are something like 400 ways of combining them. AIUI
you want me to pick some of those 400 and tell the programmer "you
cannot combine these two even where there's no type mismatch".
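
The figure of 400 is simply the number of ordered pairs of operators. A quick check, using a made-up list of 20 placeholder names rather than any real operator set:

```python
from itertools import product

operators = [f"op{i}" for i in range(20)]    # placeholders, not a real operator set
pairs = list(product(operators, repeat=2))   # ordered pairs; an operator may pair with itself
print(len(pairs))
```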

A big step in that direction is to say that assignment is a statement,
not an expression,

Done that.

and that variables cannot be changed by side-effects.

I will not be doing that. I know you favour functional programming, and
that's fine, but the language I am working on is unapologetically
imperative.

(How you relate this to function calls is a related and complex issue
that I have been glossing over here. An idea would be to distinguish between "procedures" that may have side effects, and "functions" that do not.)

That means there is no such thing as an "increment" operator - post or pre.

It also /hugely/ simplifies the language - both for the programmer, and
for the implementer. If expressions have no side-effects, they can be duplicated, split up, re-arranged, moved around in code, all without
affecting the behaviour of the program.

This needs a separate discussion, David. It is far too big a topic for
this thread. (Feel free to start a new one; I have plenty to say!) All I
can say here is as I said above, the language I am working on is
imperative.

...

These two statements go together as well as Dmitry's toaster and
toilet brush. It doesn't matter how precisely you define how the
combination can be used and what it does, it is still not a good or
useful thing.

OK, let's take the combination you mentioned:

++E++

I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in
the order that it says it. Remember that unlike C I define the
apparent order of evaluation so the expression is perfectly well formed.

The very fact that you are discussing how to define it means it is not
clear and obvious.

On that I must disagree. One cannot expect to understand a language
without at least learning the basics. Consider

a * b + c

I would parse that as (a*b)+c but not all languages would. As a reader
of infix code you would have to know the order in which operators would
be applied. You have to know the basics of a language in order to read
code written in it.
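
APL, for instance, has no precedence levels and evaluates strictly right to left, so the same token sequence gives a different answer there. A small sketch; `eval_right_to_left` is a toy written for this comparison and handles only flat lists of numbers and binary operators.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def eval_right_to_left(tokens):
    """Evaluate [num, op, num, op, num, ...] with no precedence, right to left."""
    value = tokens[-1]
    # Walk the operators from the rightmost to the leftmost.
    for i in range(len(tokens) - 2, 0, -2):
        value = OPS[tokens[i]](tokens[i - 1], value)
    return value

conventional = 2 * 3 + 4                              # parsed as (2*3)+4
apl_style = eval_right_to_left([2, "*", 3, "+", 4])   # behaves as 2*(3+4)
print(conventional, apl_style)
```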

It is not obvious which order the increments happen,
or if the order is defined, or if the order matters. It is not obvious
what the return value should be. It is not obvious where you have
lvalues or rvalues (not that a language should necessarily have such concepts). It is not obvious what happens to E.

Of course it's not obvious to someone who doesn't know the rules of the language. A language designer cannot produce a language with no rules.
What a language designer /can/ do is to make the rules simple and understandable - but someone who reads the code still has to understand
what the rules are.

As for the rules we have been discussing here those I have come up with
are, in the main, the ones you would be familiar with; even the new ones
are logical and simple. Once you understand them it's incredibly easy to
parse an expression, even of the kind which, to you, looks like gibberish.

In fact, I have to say that neither of those you have objected to should
really look like gibberish, even to the uninitiated. Would you find

++A + B++

objectionable? If not, I cannot see why you would find

++E + E++

so objectionable, either. Isn't the only new thing you, as a reader of
such code, would need to know the order the operations are carried out in?
Otherwise it's just like the preceding expression.
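
What the reader needs to know can be shown by evaluating the operands in both orders. In this sketch the helper names are invented and the starting value 5 is arbitrary; the left-to-right version is only an assumption about the order the language would define.

```python
def pre_inc(env):
    env["E"] += 1
    return env["E"]          # value after the increment

def post_inc(env):
    old = env["E"]
    env["E"] += 1
    return old               # value before the increment

# ++E + E++ with the left operand evaluated first
env = {"E": 5}
left = pre_inc(env)
right = post_inc(env)
left_to_right = (left, right, env["E"])

# Same expression with the right operand evaluated first
env = {"E": 5}
right = post_inc(env)
left = pre_inc(env)
right_to_left = (left, right, env["E"])

print(left_to_right, right_to_left)
```

The sum happens to be 12 either way, but the operand values differ, so an expression such as `++E - E++` would give different results under the two orders.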

...

There's a very tempting myth in language design that /defining/
behaviour is key - that gibberish and incorrect code can somehow be made "correct" by defining its behaviour. You are not alone in this - lots
of languages try to achieve "no undefined behaviour" by defining the behaviour of everything instead of banning things that have no correct behaviour.

My reason for defining /apparent/ code behaviour is to ensure
computational consistency on different platforms. Who wouldn't want that?

--
James Harris

From David Brown@21:1/5 to James Harris on Mon Nov 14 18:28:01 2022

On 14/11/2022 16:14, James Harris wrote:

On 14/11/2022 10:24, Stefan Ram wrote:

James Harris <james.harris.1@gmail.com> writes:

Rather than allowing non-ASCII in source I came up with a scheme of what
you might call 'named characters' extending the backslash idea of C to
allow names instead of single characters after the backslash. It's off
topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).

Here's an example of a Python program.

print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )

It prints:

\"

I see they are Unicode names. If I were to support such names I would
have your example as something like

print "\U:reverse solidus/\U:quotation mark/"

but I prefer shorter names such as

print "\bksl/\q11/"

At least with Unicode someone has already defined a name for every
character, but Unicode includes a lot of nonsense such as a character
called

How about using the HTML character entity names? That would be
"\"". These are a good deal shorter than Unicode names but
are vastly better than inventing your own names.

<https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references>
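
For comparison, Python's standard library already knows the HTML names, so their coverage is easy to probe. The sketch uses `html.unescape` for entity names and `unicodedata.name` for Unicode names; the two entities chosen are just examples.

```python
import html
import unicodedata

# HTML5 entity names for backslash and double quote
text = html.unescape("&bsol;&quot;")

# The Unicode names for the same two characters
names = [unicodedata.name(ch) for ch in text]

print(text)
print(names)
```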

From James Harris@21:1/5 to David Brown on Mon Nov 14 18:02:58 2022

On 14/11/2022 17:28, David Brown wrote:

On 14/11/2022 16:14, James Harris wrote:

On 14/11/2022 10:24, Stefan Ram wrote:

James Harris <james.harris.1@gmail.com> writes:

Rather than allowing non-ASCII in source I came up with a scheme of what
you might call 'named characters' extending the backslash idea of C to
allow names instead of single characters after the backslash. It's off
topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).

   Here's an example of a Python program.

print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )

   It prints:

\"

I see they are Unicode names. If I were to support such names I would
have your example as something like

   print "\U:reverse solidus/\U:quotation mark/"

but I prefer shorter names such as

   print "\bksl/\q11/"

At least with Unicode someone has already defined a name for every
character, but Unicode includes a lot of nonsense such as a character
called

How about using the HTML character entity names? That would be "\"". These are a good deal shorter than Unicode names but
are vastly better than inventing your own names.

<https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references>

Thanks for the pointer. I /could/ include them with my syntax along the
lines of

"\H:Backslash/\H:quot/"

where H: indicates HTML names. I did consider them before but had to
reject them. I cannot remember all the reasons why, now, but from taking
a quick look they appear, like Unicode, to be more for printing than for processing. For example, there is frac45 for 4/5 but such a scheme
allows only the fractions which are predefined. Also, HTML names combine diacritics with characters (e.g. yacute) whereas AISI it's important for
them to be kept separate.
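
The point about diacritics can be illustrated with Unicode normalisation: a precomposed letter such as the one behind the HTML name `yacute` splits into a base letter plus a combining mark. A minimal sketch using only the standard library:

```python
import html
import unicodedata

composed = html.unescape("&yacute;")                  # one code point, U+00FD
decomposed = unicodedata.normalize("NFD", composed)   # 'y' followed by U+0301

print(len(composed), len(decomposed))
print([unicodedata.name(ch) for ch in decomposed])
```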

What's needed, IMO, is a set of names intended for /processing/ rather
than for typesetting.

--
James Harris

From David Brown@21:1/5 to Bart on Mon Nov 14 18:46:20 2022

On 14/11/2022 16:59, Bart wrote:

On 14/11/2022 15:23, David Brown wrote:

On 14/11/2022 11:47, Bart wrote:

Once you have those two possibilities in a language, why shouldn't
you define what combinations of those operators might mean?

If you don't have them, you don't have a problem.

Pointer dereferencing like this is not a requirement for a language.
If you have "proper" arrays (I write it like that because the concept
of "array" can be defined in many ways), multiple return values for
functions, and a way to define data structures such as trees and
lists, where else do you actually need pointers?

I have a dynamic language with proper, first class lists, trees,
strings, records, which take care of 90% of the pointer uses in a
C-class language. Yet they can still be useful. This is an extract from
a program that dumps the contents of an EXE file:

    coffptr:=makeref(pedata+coffoffset,imagefileheader)

    genstrln("Coff header:     "+tostr(coffptr^))

    genstrln("Machine:         "+tostr(coffptr^.machine,"h2"))
    genstrln("Nsections:       "+tostr(coffptr^.nsections,"h2"))
    genstrln("Timestamp:       "+tostr(coffptr^.timedatestamp,"h4"))
    genstrln("Symtab offset:   "+tostr(coffptr^.symtaboffset))
    genstrln("Nsymbols:        "+tostr(coffptr^.nsymbols))
    genstrln("Opt Hdr size:    "+tostr(coffptr^.optheadersize))
    genstrln("Characteristics: "+tostr(coffptr^.characteristics,"b"))
    genline()

None of that needs pointers or references.

Initialise a "coff_header" read-only immutable variable from a slice of
the memory array holding the image. Then there is no reference or
pointer in the source code - you are using local data. If the image is
also held as immutable data, then the local "variable" can be
/implemented/ as a pointer for efficiency - but /logically/ in the
source code it is its own entity. (See the benefits of having
everything immutable and read-only unless you can't avoid it?)

Pure functional programming languages don't have pointers, or
increment operators - they don't even have assignment. Functional
programming languages are usually considered quite high level, but
some slightly impure functional programming languages - such as OCaml
- are very efficient compiled languages that rival C, Pascal, Ada,
Fortran, etc., for speed. OCaml /does/, AFAIUI (I am no expert in
that language) have variables and pointers or references, but they are
very rarely seen explicitly, and are intentionally cumbersome to use.

That pure functional languages aren't used everywhere suggests they
aren't great at the everyday tasks like the ones I deal with.

That they are used /somewhere/ suggests that they work fine for many
tasks. For example, if you ever make a phone call, it is likely that it
passes through equipment running code in Erlang - a functional
programming language.

Functional programming languages have a reputation for being difficult
to learn and use. That's not entirely undeserved, but they have many advantages over imperative languages. You spend more time learning
them, and less time fixing bugs in your code. (They are not a good
match for small microcontrollers, however.)

(I would like to see Haskell's take on that task of decoding that EXE
file, and dealing with that specific data layout. My example was the
simplest part of it.

For that matter, how would you do it in Python? Rather painfully I would imagine.)

Nah, it's easy enough in Python. A bytes array to hold the original
image, and a "struct.unpack" on a slice of the array to pull out the
contents.
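
A sketch of that approach, assuming the field layout of the `imagefileheader` struct posted earlier in the thread and little-endian byte order (as in PE files). The sample bytes are synthetic, made up for the demonstration, and `CoffHeader` and `read_coff_header` are illustrative names.

```python
import struct
from collections import namedtuple

# u16, u16, u32, u32, u32, u16, u16: 20 bytes, little-endian, no padding
COFF_FORMAT = "<HHIIIHH"
CoffHeader = namedtuple(
    "CoffHeader",
    "machine nsections timedatestamp symtaboffset nsymbols "
    "optheadersize characteristics",
)

def read_coff_header(image, offset):
    """Unpack one COFF header from a bytes object at the given offset."""
    size = struct.calcsize(COFF_FORMAT)
    return CoffHeader(*struct.unpack(COFF_FORMAT, image[offset:offset + size]))

# Synthetic image: 4 bytes of padding followed by a made-up header.
image = bytes(4) + struct.pack(COFF_FORMAT, 0x8664, 3, 0x12345678, 0, 0, 240, 0x22)
header = read_coff_header(image, 4)

print(f"Machine:         {header.machine:04X}")
print(f"Nsections:       {header.nsections}")
print(f"Opt Hdr size:    {header.optheadersize}")
```

A named tuple is itself immutable, so the header behaves like the read-only local value described above rather than a pointer into the image.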

I'd have to look up some more for the Haskell syntax.

Maybe the OP is designing a language in which pointer dereferencing
and increment are expected to turn up so often that it is useful to
combine them. But I think it is a lot more likely that this is a
mistaken assumption based on limited experience with different kinds
of programming languages. The result will be like your own language -
a re-implementation of C or Pascal, with some benefits and some new
disadvantages, and nothing of real innovation or interest.

Innovation these days seems to be:

* To create incomprehensible languages that require several advanced
degrees in mathematics, PL and type theory to understand

* To make it as hard as possible to perform any tasks by removing
features such as loops, mutable variables and functions with
side-effects. (It's worth bearing in mind that most elements in a
computer system: display, file-system and don't forget the memory, are necessarily mutable.)

* To tie you up in knots with strictly typed everything (or in Rust,
with its 'borrow checker').

No thanks. My innovation is keeping this stuff simple, accessible, fast,
and at a human scale.

I disagree with your scepticism, but I agree that there are lots of
languages with different paradigms for different purposes.

However, making yet-another-C is IMHO a pointless exercise. It might be
better in some ways, but not enough to make it worth the effort.

I am trying to make suggestions to break that pattern.

Look at Reddit's PL forum. At least 90% of new languages there are
FP-based. Yet when you look at the implementation languages, it tends to
be a different story.

(I just differ from James in thinking that successive
*value-returning* ++ or -- operators, whether prefix or postfix, are
not meaningful. I'd also think it would be bad form to chain them,
but it is not practical to ban them at the syntax level.)

If you think it is "bad form", ban it.

Obviously I can't ban `a + b`. Equally obviously, this code is pointless:

    a + b;

You can ban that. Rule number 42 - the result of an expression must be assigned to a variable, used in another expression, or passed as the
argument to a function call. No problem.

Given an expression-based language, what would you do? In the past,
after working with C, I would unthinkingly type:

    a = b

in my syntax instead of a := b. This wasn't an error: it would compare a
and b then discard the result. But it was a bug.

Any decent C compiler (with the right options) will complain about the C equivalent, "a == b;", as a statement with no effect. It could just as
easily be made an error in a language.
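
Such a rule is straightforward to check mechanically. As an illustration only, the sketch below applies it to Python source with the standard `ast` module: any expression statement that is not a call is reported. The function name and the decision to let bare calls through are assumptions made for this example.

```python
import ast

def unused_expression_lines(source):
    """Return line numbers of expression statements whose value is discarded."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        # ast.Expr is a statement consisting only of an expression.
        if isinstance(node, ast.Expr) and not isinstance(node.value, ast.Call):
            flagged.append(node.lineno)
    return sorted(flagged)

sample = "a + b\nc = a + b\na == b\nprint(c)\n"
print(unused_expression_lines(sample))
```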

And if you follow my suggestion that expressions can't have
side-effects, then it's easy to distinguish between "statements" and "expressions" because you no longer have a C-style "expression statement".

For any language that is going to be successful in a wider field,
not just a plaything for one person, the man-hour effort in /using/
the language will far outweigh the effort /designing/ or
/implementing/ it. Thus it does not matter if a good design choice is
difficult to implement, as it will save effort in the long run.

In my case, implementing a series of compilers over about 20 years took perhaps one year of /part-time/ work. The rest of it was using the
language, even as an individual. So at least 20:1.

From James Harris@21:1/5 to Bart on Mon Nov 14 18:43:20 2022

On 14/11/2022 15:24, Bart wrote:

On 14/11/2022 13:17, Dmitry A. Kazakov wrote:

On 2022-11-14 13:45, Bart wrote:

...

My point is that you can legally combine any number of operators to
result in nonsense-looking code, and the language can do little about
it.

No, the point is that no reasonable code should be nonsense-looking
and conversely.

Increments cross that line.

But it performs a task that is needed, and in their absence, would
simply be implemented, less efficiently and with more cluttery code,
using other means.

I have my own misgivings about it: there are in all 6 varieties of
Increment (++x; --x; a:=++x; a:=--x; a:=x++; a:=x--), which are a pig to implement, and they spoil the lines of code like this:

I thought you correctly said before that ++x was (++x; x) and x++ was
(t:=x; x++; rval(t)). Adding -- that's still only four varieties, IMO.

    a[++n] := x
    b[n] := y
    c[n] := z

Delete that first line, and you need to remember to transfer that ++n to
the next line. And not to repeat that ++n as is easy to do.

Interjecting somewhat, as a programmer I'd probably write that as

++n
a[n] = x
b[n] = y
c[n] = z

because to me it's clearer and would make code maintenance easier.

--
James Harris

From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 14 19:41:14 2022

On 2022-11-14 19:26, James Harris wrote:

On 14/11/2022 11:29, Dmitry A. Kazakov wrote:

On 2022-11-14 12:03, James Harris wrote:

On 14/11/2022 10:32, Dmitry A. Kazakov wrote:

On 2022-11-14 11:16, James Harris wrote:

On 14/11/2022 07:52, Dmitry A. Kazakov wrote:

...

Show me the algorithm.

There's no particular algorithm; the construct is a potential
component of many algorithms.

Show me one that is not array assignment.

   if is_name_first(b[j])
     a[i++] = b[j++]
     rep while is_name_follow(b[j])
       a[i++] = b[j++]
     end rep
     a[i] = 0
     return TOK_NAME
   end if
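
For readers more used to languages without `++`, the fragment above might be written as follows in Python, with each `a[i++] = b[j++]` split into a copy and an explicit index update. This is a sketch: the two predicate definitions are assumptions (letter or underscore first, then letters, digits or underscore), and the bounds check replaces the terminator the original presumably relies on.

```python
def is_name_first(ch):
    return ch.isalpha() or ch == "_"      # assumed definition

def is_name_follow(ch):
    return ch.isalnum() or ch == "_"      # assumed definition

def scan_name(b, j):
    """Copy a name starting at b[j]; return (name, new_j), or (None, j) if none."""
    if j >= len(b) or not is_name_first(b[j]):
        return None, j
    a = []
    a.append(b[j])        # a[i++] = b[j++]: copy, then advance
    j += 1
    while j < len(b) and is_name_follow(b[j]):
        a.append(b[j])
        j += 1
    return "".join(a), j

print(scan_name("count1 = 0", 0))
print(scan_name("1abc", 0))
```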

Now, what don't you like about the ++ operators in that? How would
you prefer to write it?

From parser production code:

procedure Get_Identifier
           ( Code     : in out Source'Class;
              Line     : String;
              Pointer : Integer;
              Argument : out Tokens.Argument_Token
           ) is
    Index     : Integer := Pointer + 1;
    Malformed : Boolean := False;
    Underline : Boolean := False;
    Symbol    : Character;
begin
    while Index <= Line'Last loop
       Symbol := Line (Index);
       if Is_Alphanumeric (Symbol) then
          Underline := False;
       elsif '_' = Symbol then
          Malformed := Malformed or Underline;
          Underline := True;
       else
          exit;
       end if;
       Index := Index + 1;
    end loop;
    Malformed := Malformed or Underline;
    Set_Pointer (Code, Index);
    Argument.Location := Link (Code);
    Argument.Value := new Identifier (Index - Pointer);
    declare
       This : Identifier renames Identifier (Argument.Value.all);
    begin
       This.Location := Argument.Location;
       This.Malformed := Malformed;
       This.Value     := Line (Pointer..Index - 1);
    end;
end Get_Identifier;

Well, that's an astonishingly long piece of code, Dmitry,

Because it is production code. It must deal with different types of
sources, with error handling and syntax tree generation.

and if I read
it correctly it doesn't even check whether it begins on a name-first character: that has to be decided before the procedure starts!

Exactly, because that is already established by the parser since the
language grammar distinguishes identifiers by the first character.

But I am not sure I do understand it. Even allowing for what I believe
is meant to be double underscore detection (except at the start and
end?) it takes significantly more study than the simple name-first, name-follow code which preceded it.

That's how the language defines it. This example is from an Ada 95
parser. Ada 95 RM 2.3:

https://www.adahome.com/rm95/rm9x-02-03.html
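
The rule in RM 2.3 is `identifier ::= identifier_letter {[underline] letter_or_digit}`, so an underscore may be neither doubled nor trailing. The loop in `Get_Identifier` can be mirrored in Python roughly as below. This is a sketch, not a port: it returns a tuple instead of building a token, and `str.isalnum` accepts more characters than Ada 95's `Is_Alphanumeric`.

```python
def get_identifier(line, pointer):
    """Scan an identifier whose first character, line[pointer], is already accepted.

    Returns (text, next_index, malformed).
    """
    index = pointer + 1
    malformed = False
    underline = False          # True while the previous character was '_'
    while index < len(line):
        symbol = line[index]
        if symbol.isalnum():
            underline = False
        elif symbol == "_":
            malformed = malformed or underline   # two underscores in a row
            underline = True
        else:
            break
        index += 1
    malformed = malformed or underline           # trailing underscore
    return line[pointer:index], index, malformed

print(get_identifier("Max_Size := 1;", 0))
print(get_identifier("Max__Size", 0))
print(get_identifier("Max_ ", 0))
```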

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From James Harris@21:1/5 to Dmitry A. Kazakov on Mon Nov 14 18:26:11 2022

On 14/11/2022 11:29, Dmitry A. Kazakov wrote:

On 2022-11-14 12:03, James Harris wrote:

On 14/11/2022 10:32, Dmitry A. Kazakov wrote:

On 2022-11-14 11:16, James Harris wrote:

On 14/11/2022 07:52, Dmitry A. Kazakov wrote:

...

Show me the algorithm.

There's no particular algorithm; the construct is a potential
component of many algorithms.

Show me one that is not array assignment.

   if is_name_first(b[j])
     a[i++] = b[j++]
     rep while is_name_follow(b[j])
       a[i++] = b[j++]
     end rep
     a[i] = 0
     return TOK_NAME
   end if

Now, what don't you like about the ++ operators in that? How would you
prefer to write it?

From parser production code:

procedure Get_Identifier
          ( Code     : in out Source'Class;
             Line     : String;
             Pointer : Integer;
             Argument : out Tokens.Argument_Token
          ) is
   Index     : Integer := Pointer + 1;
   Malformed : Boolean := False;
   Underline : Boolean := False;
   Symbol    : Character;
begin
   while Index <= Line'Last loop
      Symbol := Line (Index);
      if Is_Alphanumeric (Symbol) then
         Underline := False;
      elsif '_' = Symbol then
         Malformed := Malformed or Underline;
         Underline := True;
      else
         exit;
      end if;
      Index := Index + 1;
   end loop;
   Malformed := Malformed or Underline;
   Set_Pointer (Code, Index);
   Argument.Location := Link (Code);
   Argument.Value := new Identifier (Index - Pointer);
   declare
      This : Identifier renames Identifier (Argument.Value.all);
   begin
      This.Location := Argument.Location;
      This.Malformed := Malformed;
      This.Value     := Line (Pointer..Index - 1);
   end;
end Get_Identifier;

Well, that's an astonishingly long piece of code, Dmitry, and if I read
it correctly it doesn't even check whether it begins on a name-first
character: that has to be decided before the procedure starts!

But I am not sure I do understand it. Even allowing for what I believe
is meant to be double underscore detection (except at the start and
end?) it takes significantly more study than the simple name-first,
name-follow code which preceded it.
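For anyone else who finds the Ada hard to follow, the scanning loop can be restated as a minimal Python sketch. It mirrors only the character scan and the Malformed/Underline bookkeeping. The token and source-location handling (Set_Pointer, Link, Argument) is left out, the name scan_identifier is made up for this illustration, and Python's str.isalnum is assumed to stand in for Is_Alphanumeric even though it also accepts non-ASCII letters.

```python
def scan_identifier(line: str, start: int) -> tuple[str, int, bool]:
    """Scan an identifier that begins at line[start].

    As in the Ada procedure, the caller is assumed to have already
    checked that line[start] is a valid first character.
    Returns (text, index just past the identifier, malformed flag).
    """
    index = start + 1
    malformed = False
    underline = False  # True while the previous character was '_'
    while index < len(line):
        symbol = line[index]
        if symbol.isalnum():
            underline = False
        elif symbol == "_":
            # A second '_' in a row marks the identifier as malformed.
            malformed = malformed or underline
            underline = True
        else:
            break
        index += 1
    # A trailing '_' is also malformed.
    malformed = malformed or underline
    return line[start:index], index, malformed


print(scan_identifier("foo_bar := 1", 0))
print(scan_identifier("a__b", 0))
print(scan_identifier("ab_ ", 0))
```

So the two flags together implement "no doubled underscore and no trailing underscore"; the leading character is outside the scope of the loop.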

Nevertheless, I take your point about you preferring

Index := Index + 1;

over

Index++

and your preference for a separate step to transfer the characters.
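To make the two styles concrete, here is a rough Python sketch of both: copying one character at a time with explicit index steps (Python has no ++), and scanning first then transferring the characters in one slice. The predicates is_name_first and is_name_follow are assumptions for the illustration; neither post defines them.

```python
def is_name_first(c: str) -> bool:
    # Assumed rule: a letter or underscore may start a name.
    return c.isalpha() or c == "_"


def is_name_follow(c: str) -> bool:
    # Assumed rule: letters, digits and underscores may continue a name.
    return c.isalnum() or c == "_"


def copy_name(b: str, j: int):
    """Character-by-character copy, the a[i++] = b[j++] style."""
    if j >= len(b) or not is_name_first(b[j]):
        return None, j
    a = []
    a.append(b[j])
    j += 1
    while j < len(b) and is_name_follow(b[j]):
        a.append(b[j])
        j += 1
    return "".join(a), j


def slice_name(b: str, j: int):
    """Scan first, then transfer the characters in a separate step."""
    if j >= len(b) or not is_name_first(b[j]):
        return None, j
    end = j + 1
    while end < len(b) and is_name_follow(b[end]):
        end = end + 1
    return b[j:end], end


print(copy_name("count1+2", 0))
print(slice_name("count1+2", 0))
```

Both return the same name and the same resume position; the difference is only in where the index updates and the copy are written.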

...

Again, IMO it's important for the language to provide such
pseudofunctions so that a programmer's code can be made clearer,
simpler, and more readable.

Like this one?

    ++p+++

I don't have a +++ operator so I am not sure what that is supposed to
mean. It's not valid in my language.

It could be a part of an even simpler and brilliantly readable:

++p+++q+++r

In my language that's also a syntax error. You'd have to separate the
operators to make it legal.

--
James Harris

From Bart@21:1/5 to David Brown on Mon Nov 14 18:43:35 2022

On 14/11/2022 17:46, David Brown wrote:

On 14/11/2022 16:59, Bart wrote:

I have a dynamic language with proper, first class lists, trees,
strings, records, which take care of 90% of the pointer uses in a
C-class language. Yet they can still be useful. This is an extract
from a program that dumps the contents of an EXE file:

     coffptr:=makeref(pedata+coffoffset,imagefileheader)

     genstrln("Coff header:     "+tostr(coffptr^))

     genstrln("Machine:         "+tostr(coffptr^.machine,"h2"))
     genstrln("Nsections:       "+tostr(coffptr^.nsections,"h2"))
     genstrln("Timestamp:       "+tostr(coffptr^.timedatestamp,"h4"))
     genstrln("Symtab offset:   "+tostr(coffptr^.symtaboffset))
     genstrln("Nsymbols:        "+tostr(coffptr^.nsymbols))
     genstrln("Opt Hdr size:    "+tostr(coffptr^.optheadersize))
     genstrln("Characteristics: "+tostr(coffptr^.characteristics,"b"))
     genline()

None of that needs pointers or references.

Initialise a "coff_header" read-only unmutable variable from a slice of
the memory array holding the image.

At minimum, this task needs the ability to take a block of bytes,
and variously interpret parts of it as primitive numeric types of
specific widths and signedness.

That can be helped by having pointers to such types. It can be further
helped by allowing a struct type which is a collection of such types in
a particular layout. And a way to transfer data from arbitrary bytes to
that struct object. Or to map the address of that struct into the middle
of that block. (Or as I do it above, set a pointer to that struct to the
middle of the block.)

This is stuff which is meat-and-drink to a lower-level language like C,
or like mine (even my scripting language).

It requires some effort in Python and the result will be clunky (and
probably require some add-on modules). While a functional language will struggle (to be accurate, the programmer will struggle because they've
chosen the wrong language).

Here's a more challenging record type that comes up in OBJ files:

type imagesymbol=struct
union
stringz*8 shortname
struct
u32 short
u32 long
end
u64 longname
end
u32 value
u16 sectionno
u16 symtype
byte storageclass
byte nauxsymbols
end

(Again, this is defined directly in my /dynamic/ scripting language.)

No thanks. My innovation is keeping this stuff simple, accessible,
fast, and at a human scale.

I disagree with your scepticism, but I agree that there are lots of
languages with different paradigms for different purposes.

However, making yet-another-C is IMHO a pointless exercise. It might be better in some ways, but not enough to make it worth the effort.

If you're going to use a C-class language, then why not one with some
modern refinements? That's what I do.

(For example, default 64-bit everything; a module scheme; value arrays;
sane type syntax; whole-program compilation; slices; expression-based
(see below); a 'byte' type! )

Obviously I can't ban `a + b`. Equally obviously, this code is pointless:

     a + b;

You can ban that. Rule number 42 - the result of an expression must be assigned to a variable, used in another expression, or passed as the
argument to a function call. No problem.

This is effectively what I did. The only expressions allowed as
standalone statements were assignments; function calls; increments.
Anything else required an `eval` prefix to force evaluation.
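As an illustration of how cheap such a rule is to check, here is a small sketch using Python's standard ast module. It is not how my compiler does it; it flags any expression statement that is not a function call, on the assumption that Python's own grammar is close enough for the comparison (Python has no increments, and assignments are already statements there). The function name pointless_statements is invented for the example.

```python
import ast


def pointless_statements(source: str) -> list[int]:
    """Return line numbers of expression statements that are not calls."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Expr) and not isinstance(node.value, ast.Call)
    )


sample = "a = 1\nb = 2\na == b\nprint(a)\na + b\n"
print(pointless_statements(sample))  # lines holding 'a == b' and 'a + b'
```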

Given an expression-based language, what would you do? In the past,
after working with C, I would unthinkingly type:

     a = b

in my syntax instead of a := b. This wasn't an error: it would compare
a and b then discard the result. But it was a bug.

Any decent C compiler (with the right options) will complain about the C equivalent, "a == b;", as a statement with no effect. It could just as easily be made an error in a language.

And if you follow my suggestion that expressions can't have
side-effects, then it's easy to distinguish between "statements" and "expressions" because you no longer have a C-style "expression statement".

Because my early languages were loosely based on Algol68, not C, they
were expression-based. Later I simplified to distinct statements and expressions, but now I've gone back.

Now both my languages are expression-based. That is, statements and
expressions are interchangeable. That's supposed to be good, right,
because FP languages work the same way? I think expression-based are
regarded as superior.

But it does make some things harder.

For a start, any expression can have side-effects, because an expression
can be or can include what you might call a statement.

So I can get rid of ++ here:

A[i++] := 0

but I could simply write it like this:

A[t:=i; i:=i+1; t] := 0

In which case I might as well keep the ++.
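The same trick can be reproduced in Python (3.8 or later), which has assignment expressions but no ++. This is only a sketch of the equivalence, not a recommendation; the tuple-and-index idiom is my stand-in for a statement sequence that yields its last value.

```python
A = [1, 1, 1]
i = 0

# Stand-in for A[t:=i; i:=i+1; t] := 0
# The tuple is evaluated left to right: save i in t, bump i, yield t.
A[(t := i, i := i + 1, t)[-1]] = 0

print(A, i)  # element 0 was cleared and i moved on by one
```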

From Bart@21:1/5 to James Harris on Mon Nov 14 19:10:04 2022

On 14/11/2022 18:43, James Harris wrote:

On 14/11/2022 15:24, Bart wrote:

On 14/11/2022 13:17, Dmitry A. Kazakov wrote:

On 2022-11-14 13:45, Bart wrote:

...

My point is that you can legally combine any number of operators to
result in nonsense-looking code, and the language can do little
about it.

No, the point is that no reasonable code should be nonsense-looking
and conversely.

Increments cross that line.

But it performs a task that is needed, and in their absence, would
simply be implemented, less efficiently and with more cluttery code,
using other means.

I have my own misgivings about it: there are in all 6 varieties of
Increment (++x; --x; a:=++x; a:=--x; a:=x++; a:=x--), which are a pig
to implement, and they spoil the lines of code like this:

I thought you correctly said before that ++x was (++x; x) and x++ was
(t:=x; x++; rval(t)). Adding -- that's still only four varieties, IMO.

There are four value-returning versions, which are implemented
differently from non-value-returning or standalone versions:

++x
--x

Here, x++ and x-- are treated as ++x and --x. If I write this in my
dynamic language (using p^, as a simple variable results in a dedicated bytecode for the first two):

++(p^)
(p^)++
a:=++(p^)
a:=(p^)++

Then the generated bytecode is this (annotated):

pushm p # ++(p^)
incrptr

pushm p # (p^)++
incrptr

pushm p # a:=++(p^)
incrload
popm a

pushm p # a:=(p^)++
loadincr
popm a

Three lots of code (plus three more for --)
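To show why these need separate handlers, here is a toy stack machine in Python. The opcode names are the ones listed above, but the semantics are my simplified guess for the sake of illustration (memory is a dict, and a "pointer" is just the name of its target), not the real interpreter.

```python
def run(program, memory):
    """Execute a list of (opcode, *args) tuples against a dict memory."""
    stack = []
    for op, *args in program:
        if op == "pushm":            # push the value of a variable
            stack.append(memory[args[0]])
        elif op == "popm":           # pop into a variable
            memory[args[0]] = stack.pop()
        elif op == "incrptr":        # increment target, leave no value
            target = stack.pop()
            memory[target] += 1
        elif op == "incrload":       # increment target, push new value
            target = stack.pop()
            memory[target] += 1
            stack.append(memory[target])
        elif op == "loadincr":       # push old value, then increment
            target = stack.pop()
            stack.append(memory[target])
            memory[target] += 1
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


# p points at x; compare a:=++(p^) with a:=(p^)++
pre = run([("pushm", "p"), ("incrload",), ("popm", "a")],
          {"p": "x", "x": 10, "a": None})
post = run([("pushm", "p"), ("loadincr",), ("popm", "a")],
           {"p": "x", "x": 10, "a": None})
print(pre["a"], pre["x"], post["a"], post["x"])
```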

(Another operator with separate value/non-value versions is assignment.
With augmented assignment, I only support the non-value version, so a :=
(x +:= y) is not allowed.

I also split functions into non-value-returning procs, and
value-returning functions.

Calling a proc where a value is expected won't work. Calling a function
then discarding its value is sometimes implemented differently - like a
proc.)

From David Brown@21:1/5 to James Harris on Mon Nov 14 20:51:22 2022

On 14/11/2022 19:02, James Harris wrote:

On 14/11/2022 17:28, David Brown wrote:

On 14/11/2022 16:14, James Harris wrote:

On 14/11/2022 10:24, Stefan Ram wrote:

James Harris <james.harris.1@gmail.com> writes:

Rather than allowing non-ASCII in source I came up with a scheme of what
you might call 'named characters' extending the backslash idea of C to
allow names instead of single characters after the backslash. It's off
topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).

   Here's an example of a Python program.

print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )

   It prints:

\"

I see they are Unicode names. If I were to support such names I would
have your example as something like

   print "\U:reverse solidus/\U:quotation mark/"

but I prefer shorter names such as

   print "\bksl/\q11/"
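A scheme like the \U:name/ form is simple to prototype. The sketch below expands such escapes using Python's unicodedata module; the function name expand_named and the regular expression are assumptions made for the illustration, and it handles only the long Unicode names, not the shorter invented ones.

```python
import re
import unicodedata

# Matches \U:some name/ and captures the name between ':' and '/'.
NAMED = re.compile(r"\\U:([^/]+)/")


def expand_named(text: str) -> str:
    """Replace each \\U:name/ escape with the character of that Unicode name."""
    return NAMED.sub(lambda m: unicodedata.lookup(m.group(1).upper()), text)


print(expand_named(r"\U:reverse solidus/\U:quotation mark/"))
```

An unknown name makes unicodedata.lookup raise KeyError, which a real implementation would turn into a compile-time diagnostic.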

At least with Unicode someone has already defined a name for every
character, but Unicode includes a lot of nonsense such as a character
called

How about using the HTML character entity names? That would be
"&bsol;&quot;". These are a good deal shorter than Unicode names but
are vastly better than inventing your own names.

<https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references>

Thanks for the pointer. I /could/ include them with my syntax along the
lines of

"\H:Backslash/\H:quot/"

"&Backslash;" (Unicode U+2216 Set minus) is a different character ∖ from
\, which is "&bsol;" (Unicode U+005C Reverse solidus).

where H: indicates HTML names. I did consider them before but had to
reject them. I cannot remember all the reasons why, now, but from taking
a quick look they appear, like Unicode, to be more for printing than for processing. For example, there is frac45 for 4/5 but such a scheme
allows only the fractions which are predefined. Also, HTML names combine diacritics with characters (e.g. yacute) whereas AISI it's important for
them to be kept separate.

What's needed, IMO, is a set of names intended for /processing/ rather
than for typesetting.

That makes little sense to me. Are you intending to invent your own
character encoding, or your own fonts here? Are you planning on making
your own display or print system?

The character "⅘" is easily typed on *nix keyboards with a compose key
(with common setups), HTML has it as "&frac45;", Unicode has it as
"U+2158 Vulgar fraction four fifths". They support fractions that are
common enough to exist as characters in fonts. You can't add your own
personal "twenty two sevenths" character and expect it to turn up when
printed, nor will you ever come across it when reading files or
documents from elsewhere. (Of course you can choose to support only a
subset of the HTML or Unicode names.)

And what do you mean by "processing", and what makes you think it is
remotely relevant to separate diacritics from characters? In some
languages, "ä" is a letter "a" with a diacritic, in others it is an
entirely distinct letter of its own. The same applies to lots of
characters. Unicode has a complex system of "normalisation" for
relating combining diacritics and letters into single combined Unicode characters, which are often a better choice for display than you would
get by displaying two individual graphemes.
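A short Python sketch of what normalisation does with "ä", using the standard unicodedata module. It only illustrates the composed and decomposed forms; it says nothing about which form any particular language treats as a letter in its own right.

```python
import unicodedata

composed = "\u00e4"      # ä as a single code point
decomposed = "a\u0308"   # a followed by COMBINING DIAERESIS

# The two strings look the same when displayed but compare unequal.
print(composed == decomposed, len(composed), len(decomposed))

# NFC combines, NFD splits.
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```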

Are you going to try to split up Chinese or Korean characters into their components? What about Mongolian, or Arabic?

If you try to mess with text or characters for "processing", you'll get
it wrong. The least bad thing you can do is make it convenient to use -
to input and output. That means UTF-8 and a way to type them in source
code (like C's "\uNNNN" or HTML's "&#xNNNN;") and optionally a way to
name them (such as using HTML's names) when they are inconvenient to
type directly, as the Unicode hex numbers are hard to remember.
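As a sketch of how these spellings line up in an existing language, Python's standard library already covers hex escapes, Unicode names and HTML entity names. The helper describe is just for the illustration.

```python
import html
import unicodedata


def describe(ch: str) -> str:
    """Format one character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Three spellings of the same character.
by_hex = "\u2158"
by_name = "\N{VULGAR FRACTION FOUR FIFTHS}"
by_entity = html.unescape("&frac45;")
print(by_hex == by_name == by_entity, describe(by_hex))

# The two "backslash" entities are different characters.
print(describe(html.unescape("&bsol;")))
print(describe(html.unescape("&Backslash;")))
```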

From David Brown@21:1/5 to James Harris on Mon Nov 14 21:45:37 2022

On 14/11/2022 18:37, James Harris wrote:

On 14/11/2022 14:47, David Brown wrote:

On 14/11/2022 11:44, James Harris wrote:

On 14/11/2022 09:26, David Brown wrote:

On 13/11/2022 18:11, James Harris wrote:

On 08/11/2022 16:29, David Brown wrote:

...

The expression you mention is just one of a myriad of what you might
consider to be potential nasties. If I am going to prohibit that one
then what about all the others?

Prohibit nasty ones.

Enumerating the 'nasty ones' is the problem. If there are 20 dyadic
operators then there are something like 400 ways of combining them. AIUI
you want me to pick some of those 400 and tell the programmer "you
cannot combine these two even where there's no type mismatch".

Then enable nice ones, and make everything else an error. (That's the
usual way to do it.)

A big step in that direction is to say that assignment is a statement,
not an expression,

Done that.

and that variables cannot be changed by side-effects.

I will not be doing that. I know you favour functional programming, and that's fine, but the language I am working on is unapologetically
imperative.

Many unapologetically imperative languages do not allow side-effects in expressions. It is a natural rule for functional programming languages,
since pure functional programming does not have side-effects or
modifiable variables at all. But there is absolutely /nothing/ about
being an imperative language that suggests you need to allow
side-effects or assignments /within/ expressions.

(How you relate this to function calls is a related and complex
issue that I have been glossing over here. An idea would be to
distinguish between "procedures" that may have side effects, and
"functions" that do not.)

That means there is no such thing as an "increment" operator - post or
pre.

It also /hugely/ simplifies the language - both for the programmer,
and for the implementer. If expressions have no side-effects, they
can be duplicated, split up, re-arranged, moved around in code, all
without affecting the behaviour of the program.
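A minimal Python sketch of that claim, with invented names (square, make_counter): a side-effect-free expression can be duplicated or folded without changing the result, while one that mutates state cannot.

```python
def square(x):
    # No side effects: the result depends only on the argument.
    return x * x


def make_counter():
    # Returns a function whose every call changes hidden state.
    state = {"n": 0}

    def next_value():
        state["n"] += 1
        return state["n"]

    return next_value


# Pure: rewriting f(3) + f(3) as 2 * f(3) is safe.
print(square(3) + square(3) == 2 * square(3))

# Impure: the same rewrite changes the answer.
f = make_counter()
once = 2 * f()        # one call
g = make_counter()
twice = g() + g()     # two calls, second sees updated state
print(once, twice)
```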

This needs a separate discussion, David. It is far too big a topic for
this thread. (Feel free to start a new one; I have plenty to say!) All I
can say here is as I said above, the language I am working on is
imperative.

And being imperative has nothing to do with it, as I said above.

As for new threads, that's up to you - it's your language, and you can
discuss the aspects that interest you. I'm just trying to make
suggestions based on long experience, with suggestions for things that I
think will make a "better" language (for some value of "better") than
existing ones. I've used perhaps 30 programming languages. I don't
claim to have used them all extensively, and I have forgotten a /lot/,
but I think it is more than most people. And I've seen a lot of bad
code in many languages (some of the bad code I've written myself).

I'm hoping that you are trying to do something other than making a new
C. I'm hoping you are not trying to make an "ultimate language for
everyone and every use on every target". I'm hoping you are not trying
to re-write everything related to language development. I'm hoping you
are not trying to invent the "perfect" language after learning just one
or two existing languages.

These two statements go together as well as Dmitry's toaster and
toilet brush. It doesn't matter how precisely you define how the
combination can be used and what it does, it is still not a good or
useful thing.

OK, let's take the combination you mentioned:

++E++

I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in
the order that it says it. Remember that unlike C I define the
apparent order of evaluation so the expression is perfectly well formed.

The very fact that you are discussing how to define it means it is not
clear and obvious.

On that I must disagree.

You think it is /obvious/ what "++E++" means?

One cannot expect to understand a language
without at least learning the basics. Consider

a * b + c

I would parse that as (a*b)+c but not all languages would. As a reader
of infix code you would have to know the order in which operators would
be applied. You have to know the basics of a language in order to read
code written in it.

Agreed. But as soon as you say "infix operators with common
mathematical precedence for common operations", it's done. You've distinguished it from the post-fix notation of Forth (where you would write
"a b * c + "), from strict left-to-right languages (where "a + b * c"
would parse "(a + b) * c"), and from pre-fix notation languages (where
you would perhaps write "+(*(a, b), c)" ).
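The difference between these conventions is easy to show with two tiny evaluators. This is a Python sketch under simplifying assumptions: tokens are separated by spaces, only + and * exist, and the function names are invented.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def eval_postfix(tokens, env):
    """Forth-style: operands are pushed, each operator pops two."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(env[tok])
    return stack.pop()


def eval_left_to_right(tokens, env):
    """Strict left-to-right infix: no precedence at all."""
    result = env[tokens[0]]
    for op, name in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, env[name])
    return result


env = {"a": 2, "b": 3, "c": 4}
print(eval_postfix("a b * c +".split(), env))        # (a*b)+c
print(eval_left_to_right("a + b * c".split(), env))  # (a+b)*c
print(env["a"] + env["b"] * env["c"])                # usual precedence
```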

"++E++" remains meaningless to experienced programmers.

It is not obvious which order the increments happen, or if the order
is defined, or if the order matters. It is not obvious what the
return value should be. It is not obvious where you have lvalues or
rvalues (not that a language should necessarily have such concepts).
It is not obvious what happens to E.

Of course it's not obvious to someone who doesn't know the rules of the language. A language designer cannot produce a language with no rules.
What a language designer /can/ do is to make the rules simple and understandable - but someone who reads the code still has to understand
what the rules are.

I do not think "++E++" will be clear and obvious to someone who /does/
know the rules of the language. Remember, this is not something that
will be commonly used and become idiomatic, like "*p++ = *q++;" is in C.
Programmers will always need to look up the details - that's why it is
not a good idea.

As for the rules we have been discussing here those I have come up with
are, in the main, the ones you would be familiar with; even the new ones
are logical and simple. Once you understand them it's incredibly easy to parse an expression, even of the kind which, to you, looks like gibberish.

In fact, I have to say that neither of those you have objected to should really look like gibberish, even to the uninitiated. Would you find

++A + B++

objectionable?

Yes.

If not, I cannot see why you would find

++E + E++

so objectionable, either.

It is worse, because you are changing the same thing twice in an
unordered manner.

Isn't the only new thing you, as a reader of
such code, would need to know the order the operations are carried out in?
Otherwise it's just like the preceding expression.
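For what it is worth, the defined-order reading can be simulated in a few lines of Python, which also evaluates operands left to right. The Cell class and its pre_inc and post_inc methods are invented for the sketch, and the treatment of ++E++ follows my reading of the rule described in this thread (prefix applied first, then postfix); it is not a statement about any existing compiler.

```python
class Cell:
    """A mutable integer standing in for the lvalue E."""

    def __init__(self, value):
        self.value = value

    def pre_inc(self):
        # ++E: increment, then yield the new value.
        self.value += 1
        return self.value

    def post_inc(self):
        # E++: yield the old value, then increment.
        old = self.value
        self.value += 1
        return old


# ++E + E++ with E starting at 5, operands evaluated left to right.
E = Cell(5)
total = E.pre_inc() + E.post_inc()
print(total, E.value)

# ++E++ read as: apply the prefix increment, then the postfix one.
F = Cell(5)
F.pre_inc()
result = F.post_inc()
print(result, F.value)
```

Whether a reader can be expected to carry this out in their head at a glance is, of course, the point under dispute.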

...

There's a very tempting myth in language design that /defining/
behaviour is key - that gibberish and incorrect code can somehow be
made "correct" by defining its behaviour. You are not alone in this -
lots of languages try to achieve "no undefined behaviour" by defining
the behaviour of everything instead of banning things that have no
correct behaviour.

My reason for defining /apparent/ code behaviour is to ensure
computational consistency on different platforms. Who wouldn't want that?

My reason for disallowing such expressions is to ensure computational consistency on different platforms, /including/ the human reader at a
glance. Who wouldn't want /that/ ?

From Andy Walker@21:1/5 to David Brown on Mon Nov 14 23:28:44 2022

On 14/11/2022 20:45, David Brown wrote:
[To James:]

[...] But there is absolutely
/nothing/ about being an imperative language that suggests you need
to allow side-effects or assignments /within/ expressions.

If assignments within expressions are verboten, then you need to
either forbid assignments within functions or have two classes of function, those with and those without [inc sub-procedures]. If assignments are
allowed at all, then you cannot in general tell at compile time whether
any assignment is reached at run time, leading to further complications.
If you regard output as a side-effect, that too leads to problems. Yet
during program development it is common to insert temporary diagnostic
printing or variables. I can understand the concept of languages without assignments or other side effects; and of languages with them; I find it difficult to see the point of languages where such things are allowed in
some places and not in others. If we need hair shirts [and I'm not sure
that we do], they should be worn all the time, not put on and taken off
in accordance with arcane rules that only high priests understand.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Herold

From Bart@21:1/5 to James Harris on Tue Nov 15 01:14:42 2022

On 14/11/2022 16:21, James Harris wrote:

On 14/11/2022 11:11, Bart wrote:

On 14/11/2022 10:44, James Harris wrote:

On 14/11/2022 09:26, David Brown wrote:

...

OK, let's take the combination you mentioned:

   ++E++

I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in
the order that it says it. Remember that unlike C I define the
apparent order of evaluation so the expression is perfectly well formed.

But does this have the same priorities as:

    op1 E op2

(where op2 is commonly done first) or does it have special rules, so
that in:

   --E++

the -- is done first? If it's different, then what is the ordering
when mixed with other unary ops?

You explained somewhere the circumstances where you think this is
meaningful, but I can't remember what the rules are and I can't find
the exact post.

The rules for ++ and -- are simple. *Prefix* ++ or -- happens before
postfix ++ or --.

Is that for all unary operators or just ++ and --?

If only for those, then those exceptions will make rules for all unary
ops even more bizarre. If for all of them, then one consequence is that
`-P^` is parsed as `(-P)^`, so that you try to negate a pointer, instead
of what's at the pointer target.

However, I have my own exceptions, which are casts:

ref u16(P)^ := 0

If P was a byte pointer, this will now write 16 bits not 8. Here, the
cast is done first, differently from a unary operator.

But I cover this in my list below: casts are syntax so trump everything.
(Which comes first in `ref T (X)[i]`, I don't know, I'd have to test it.
But this would probably be written to as (X[i]) to remove doubt.)

Your example would probably be better expressed as

E - 1

!

As for the bigger picture of operator precedences, it's based on the transformations which operators naturally make to their operands. For example, comparisons (such as less-than) naturally take numbers and
produce booleans so their precedences put them after numeric operators
(+, * etc) and before boolean operators (and, or, not, etc). The natural series is (simplified)

locations (such as field selection and array indexing)
numbers (retrieved from those locations)
comparisons (of those numbers)
booleans

So this is about binary ops now? I make that even simpler by saying all
binary ops come after unary ops. The binding order, starting with the tightest, is:

Syntax-bound ("." and "[]" for example)
Unary ops (postfix LTR then prefix RTL)
Binary (**, Mul, Add, Compare, Logical)
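The unary part of that ordering (postfix left to right, then prefix right to left) fits in a few lines. This Python sketch is only an illustration: it takes a pre-split token list holding one operand, assumes the input is well formed, and prints the grouping with explicit parentheses. The operator sets are my choice for the example.

```python
PREFIX = {"-", "++", "--"}
POSTFIX = {"^", "++", "--"}


def parse_unary(tokens):
    """Return a fully parenthesised string for prefix* operand postfix*."""
    pos = 0
    prefixes = []
    while pos < len(tokens) and tokens[pos] in PREFIX:
        prefixes.append(tokens[pos])
        pos += 1
    tree = tokens[pos]  # the operand itself
    pos += 1
    # Postfix operators bind first, left to right.
    while pos < len(tokens) and tokens[pos] in POSTFIX:
        tree = f"({tree}{tokens[pos]})"
        pos += 1
    # Prefix operators are applied afterwards, innermost (rightmost) first.
    for op in reversed(prefixes):
        tree = f"({op}{tree})"
    return tree


print(parse_unary(["-", "P", "^"]))     # negate what P points at
print(parse_unary(["++", "E", "++"]))
```

Under this scheme -P^ groups as -(P^), so the pointer is dereferenced before the negation.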

Because if they ever start to return lvalues, then this becomes possible:

    ++E := 0
    E++ := 0

(Whichever one is legal in your scheme.) So I think there is little
useful expressivity to be gained.

Indeed, neither of those is useful.

Interestingly, I just tried

++E = 0;

with cc and c++ compilers. The first rejected it (as ++E needing to be
an lvalue); the second accepted it. That's not a reflection of either compiler, BTW, but without checking it's probably more to do with the language/dialect definition. FWIW I expect the second compiler could be persuaded to issue a warning about unreachable code or suchlike.

What does C++ expect it to mean? I know it allows the result of ?: as an lvalue, which C doesn't. But the meaning of that is clear, and I have it
too:

(a | b | c) := x # assign to b or c depending on a

(Actually you can have an arbitrarily complex expression on the LHS as
an lvalue, including Switch and if-then-elsif chains. But whether they
still work in my languages, I don't know as I rarely use the feature.

Actually it comes up indirectly here:

F((a | b | c))

when F is a function taking a reference parameter, so the arg must be an Lvalue. So either b or c is modified in F. This one works, because I
think it came up recently and it had to!)

From David Brown@21:1/5 to Bart on Tue Nov 15 09:07:00 2022

On 14/11/2022 19:43, Bart wrote:

On 14/11/2022 17:46, David Brown wrote:

On 14/11/2022 16:59, Bart wrote:

I have a dynamic language with proper, first class lists, trees,
strings, records, which take care of 90% of the pointer uses in a
C-class language. Yet they can still be useful. This is an extract
from a program that dumps the contents of an EXE file:

coffptr:=makeref(pedata+coffoffset,imagefileheader)

genstrln("Coff header: "+tostr(coffptr^))

genstrln("Machine: "+tostr(coffptr^.machine,"h2"))
genstrln("Nsections: "+tostr(coffptr^.nsections,"h2"))
genstrln("Timestamp: "+tostr(coffptr^.timedatestamp,"h4"))
genstrln("Symtab offset: "+tostr(coffptr^.symtaboffset))
genstrln("Nsymbols: "+tostr(coffptr^.nsymbols))
genstrln("Opt Hdr size: "+tostr(coffptr^.optheadersize))
genstrln("Characteristics: "+tostr(coffptr^.characteristics,"b"))
genline()

None of that needs pointers or references.

Initialise a "coff_header" read-only unmutable variable from a slice
of the memory array holding the image.

At minimum, this task needs the ability to take a block of bytes,
and variously interpret parts of it as primitive numeric types of
specific widths and signedness.

Agreed. Still, pointers are unnecessary there.

That can be helped by having pointers to such types. It can be further helped by allowing a struct type which is a collection of such types in
a particular layout. And a way to transfer data from arbitrary bytes to
that struct object. Or to map the address of that struct into the middle
of that block. (Or as I do it above, set a pointer to that struct to the middle of the block.)

It is certainly handy to have a way of interpreting the bytes of an
image as a struct type with particular layout. It is not necessary, but
it is handy.

This is stuff which is meat-and-drink to a lower-level language like C,
or like mine (even my scripting language).

It requires some effort in Python and the result will be clunky (and probably require some add-on modules).

import struct # Standard module
bs = open("potato_c.cof", "rb").read() # binary mode, the image is bytes

(machine, nsections, timestamp, symtaboffset, nsymbols,
 optheadersize, characteristics) = struct.unpack_from("<HHIIIHH", bs)

That's it. Three lines. I would not think of C for this kind of thing
- Python is /much/ better suited. I'd only start looking at C (or C++)
if I needed such high speed that the Python code was not fast enough, even
with PyPy.
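Here is a self-contained variant that can be run without a file, as a sketch. It packs a made-up 20-byte header with the same "<HHIIIHH" layout, places it at a non-zero offset inside a larger byte string (as in an EXE image), and unpacks it again. All the field values are invented test data.

```python
import struct

# Little-endian: u16, u16, u32, u32, u32, u16, u16 (20 bytes, no padding).
COFF_HEADER = struct.Struct("<HHIIIHH")

# Invented sample values, not taken from a real file.
fake = COFF_HEADER.pack(0x8664, 3, 0x63728A00, 0x400, 12, 0, 0x0022)
image = b"\x00" * 16 + fake  # header sits 16 bytes into the "image"

(machine, nsections, timestamp, symtaboffset, nsymbols,
 optheadersize, characteristics) = COFF_HEADER.unpack_from(image, 16)

print(f"Machine:         {machine:04X}")
print(f"Nsections:       {nsections}")
print(f"Symtab offset:   {symtaboffset}")
print(f"Characteristics: {characteristics:b}")
```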

While a functional language will
struggle (to be accurate, the programmer will struggle because they've chosen the wrong language).

I don't believe that. I am not familiar enough with Haskell to be able
to give the code, but I have no doubts at all that someone experienced
with Haskell will manage it fine. IO is not hard in the language, and
it has all the built-in modules needed for such interfaces.

Haskell is apparently number 25 on the list of language popularities on
Github, with about 0.4% usage. That's not huge, but not insignificant
either. But then, it was never intended to be a major practical
language - though some people and companies (Facebook uses it for
content analysis) do use it for practical work. Its main motivations
are for teaching people good software development, developing new
techniques, algorithms and methods, and figuring out what "works" and
could be incorporated in other languages.

It is that last feature that is most noticeable. Most major modern
languages are not pure functional languages in themselves, but contain
aspects from functional programming. I can't think of any serious,
popular language with significant development in the last decade that
does not have lambdas and the ability to work with functions as objects.
Most high-level languages have list comprehensions and higher-order
functions (map, filter, etc.). Many support defining high-level data structures directly without the need to mess around with pointers
manually. Many encourage immutable data as the norm.

This is why I bring it up here - not because I think the OP should be
making a functional programming language, but because I think he should
be taking inspiration and incorporating ideas from that world.

Here's a more challenging record type that comes up in OBJ files:

type imagesymbol=struct
union
stringz*8 shortname
struct
u32 short
u32 long
end
u64 longname
end
u32 value
u16 sectionno
u16 symtype
byte storageclass
byte nauxsymbols
end

(Again, this is defined directly in my /dynamic/ scripting language.)

Again, peanuts in Python - and I expect also peanuts in Haskell.
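A rough sketch of how that record might be read in Python with the struct module. The 18-byte layout follows the field list above; the rule that a zero in the first four name bytes means the next four hold a string-table offset comes from the COFF format, not from the record definition quoted here, and the function name parse_symbol is invented.

```python
import struct

# 8-byte name union, u32 value, u16 sectionno, u16 symtype, 2 single bytes.
IMAGE_SYMBOL = struct.Struct("<8sIHHBB")


def parse_symbol(data: bytes, offset: int = 0) -> dict:
    (name8, value, sectionno, symtype,
     storageclass, nauxsymbols) = IMAGE_SYMBOL.unpack_from(data, offset)
    # Reinterpret the same 8 bytes as the {u32 short, u32 long} view.
    short, long_ = struct.unpack("<II", name8)
    if short == 0:
        name, strtab_offset = None, long_   # name lives in the string table
    else:
        name, strtab_offset = name8.rstrip(b"\x00").decode("ascii"), None
    return {
        "name": name, "strtab_offset": strtab_offset, "value": value,
        "sectionno": sectionno, "symtype": symtype,
        "storageclass": storageclass, "nauxsymbols": nauxsymbols,
    }


# Invented test records: one short name, one string-table reference.
short_rec = IMAGE_SYMBOL.pack(b".text", 0, 1, 0, 3, 0)
long_rec = struct.pack("<II", 0, 42) + struct.pack("<IHHBB", 16, 2, 32, 2, 0)
print(parse_symbol(short_rec))
print(parse_symbol(long_rec))
```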

No thanks. My innovation is keeping this stuff simple, accessible,
fast, and at a human scale.

I disagree with your scepticism, but I agree that there are lots of
languages with different paradigms for different purposes.

However, making yet-another-C is IMHO a pointless exercise. It might
be better in some ways, but not enough to make it worth the effort.

If you're going to use a C-class language, then why not one with some
modern refinements? That's what I do.

(For example, default 64-bit everything; a module scheme; value arrays;
sane type syntax; whole-program compilation; slices; expression-based
(see below); a 'byte' type! )

Nothing of that is /remotely/ worth making a new language and giving up
on everything C - tools, compilers, developer familiarity, libraries,
and all the rest.

I'm not saying that these are not good things (though I might disagree
with you on some of the details). I am saying that it is not worth it.

This is why we still have C, and why it is so popular in practice - it
is not because anyone thinks it is a "perfect" language, it is because
the benefits of the C ecosystem outweigh the small benefits of minor
variations of the language.

And I think most of what you like could be achieved by using a subset of
C++ along with a few template libraries. (To be fair, that was
certainly not the case when you started your language.)

Obviously I can't ban `a + b`. Equally obviously, this code is
pointless:

a + b;

You can ban that. Rule number 42 - the result of an expression must
be assigned to a variable, used in another expression, or passed as
the argument to a function call. No problem.

This is effectively what I did. The only expressions allowed as
standalone statements were assignments; function calls; increments.
Anything else required an `eval` prefix to force evaluation.

Good.

Given an expression-based language, what would you do? In the past,
after working with C, I would unthinkingly type:

a = b

in my syntax instead of a := b. This wasn't an error: it would
compare a and b then discard the result. But it was a bug.

Any decent C compiler (with the right options) will complain about the
C equivalent, "a == b;", as a statement with no effect. It could just
as easily be made an error in a language.
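
As a rough illustration of the "rule 42" idea, the Python sketch below flags expression statements whose value is simply discarded, while still allowing bare function calls. It applies Python's standard `ast` module to Python source purely as a stand-in; the name `pointless_statements` and the sample program are made up for the example, and this is not how any compiler mentioned in the thread works.

```python
import ast

def pointless_statements(source):
    """Return the line numbers of expression statements whose value is discarded.

    Bare function calls are allowed, since they may be wanted for their effect.
    (A real checker would also need to allow docstrings, await, yield, etc.)
    """
    flagged = []
    for node in ast.walk(ast.parse(source)):
        # ast.Expr is an expression used as a statement, e.g. "a + b" on its own.
        if isinstance(node, ast.Expr) and not isinstance(node.value, ast.Call):
            flagged.append(node.lineno)
    return sorted(flagged)

sample = "a = 1\nb = 2\na + b\na == b\nprint(a)\n"
print(pointless_statements(sample))  # [3, 4]
```

Whether such statements are rejected outright or merely warned about is then a policy choice layered on top of the same check.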

And if you follow my suggestion that expressions can't have
side-effects, then it's easy to distinguish between "statements" and
"expressions" because you no longer have a C-style "expression
statement".

Because my early languages were loosely based on Algol68, not C, they
were expression-based. Later I simplified to distinct statements and expressions, but now I've gone back.

Now both my languages are expression-based. That is, statements and expressions are interchangeable. That's supposed to be good, right,
because FP languages work the same way? I think expression-based languages are regarded as superior.

I have nothing against expressions - I am against side-effects in
expressions.

But it does make some things harder.

For a start, any expression can have side-effects, because an expression
can be or can include what you might call a statement.

So I can get rid of ++ here:

A[i++] := 0

but I could simply write it like this:

A[t:=i; i:=i+1; t] := 0

In which case I might as well keep the ++.

Ban side-effects in expressions, and you have :

A[i] := 0
i = i + 1

It is not hard.

And of course, a large proportion of increments are in loops. So now
you have (mixing syntaxes from different languages to avoid prejudice) :

for i in range(10) {
A[i] = 0
}

Or :

for a& in A {
a = 0
}

Or :
A = [0 for a in A]

Or :
A = [0] * 10

Or :
A.set(0)

Or :

A = [0 .. ]

Or :

A = [0 .. ][range(A)]

There are endless choices here, none of which need an increment
operator, or pointers.
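
For concreteness, here is a small runnable Python sketch of a few of those shapes. The list contents and variable names are arbitrary, and only the forms that Python itself supports are shown.

```python
A = [3, 1, 4, 1, 5]

# Index-based loop: the loop construct advances i, not the loop body.
for i in range(len(A)):
    A[i] = 0

# Rebuild the list with a comprehension.
B = [0 for _ in A]

# Repetition: a fresh list of zeros with the same length.
C = [0] * len(A)

# Slice assignment: overwrite every element of an existing list in place.
D = [3, 1, 4, 1, 5]
D[:] = [0] * len(D)

print(A, B, C, D)
```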

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to Andy Walker on Tue Nov 15 09:16:07 2022

On 15/11/2022 00:28, Andy Walker wrote:

On 14/11/2022 20:45, David Brown wrote:
[To James:]

[...] But there is absolutely
/nothing/ about being an imperative language that suggests you need
to allow side-effects or assignments /within/ expressions.

If assignments within expressions are verboten, then you need to either forbid assignments within functions or have two classes of function, those with and those without [inc sub-procedures].

Yes, I mentioned in one post that functions are a complicating factor
that need to be discussed, but are probably best covered in a separate
thread. Two classes of function was a possibility I mentioned.

If assignments are
allowed at all, then you cannot in general tell at compile time whether
any assignment is reached at run time, leading to further complications.
If you regard output as a side-effect, that too leads to problems. Yet during program development it is common to insert temporary diagnostic printing or variables. I can understand the concept of languages without assignments or other side effects; and of languages with them; I find it difficult to see the point of languages where such things are allowed in
some places and not in others. If we need hair shirts [and I'm not sure that we do], they should be worn all the time, not put on and taken off
in accordance with arcane rules that only high priests understand.

Rules need to be clear, certainly. But it is entirely possible to have a
language that allows assignment in some places and not others. Impure functional programming languages (such as OCaml) have that.

There is no easy answer to these kinds of design decisions, and in the
end it is the OP who must decide. I am suggesting ways to make the
language clear and perhaps easier for compiler analysis. There are
always trade-offs going on - every choice makes some things easier and
other things harder.


From James Harris@21:1/5 to Bart on Tue Nov 15 08:42:19 2022

On 15/11/2022 01:14, Bart wrote:

On 14/11/2022 16:21, James Harris wrote:

On 14/11/2022 11:11, Bart wrote:

...

The rules for ++ and -- are simple. *Prefix* ++ or -- happens before
postfix ++ or --.

Is that for all unary operators or just ++ and --?

Just ++ and --, as stated.

If only for those, then those exceptions will make rules for all unary
ops even more bizarre. If for all of them, then one consequence is that
`-P^` is parsed as `(-P)^`, so that you try to negate a pointer, instead
of what's at the pointer target.

However, I have my own exceptions, which are casts:

    ref u16(P)^ := 0

Rather than casts I have conversions. Are they the same? I never really understood how people use the term 'cast'. Either way, you raise a good
point: Where should type conversions come in the order of precedence?

I currently have them as function calls so they come at the top. I would
have your example as

(ref u16)(P)* = 0

If P was a byte pointer, this will now write 16 bits not 8. Here, the
cast is done first, differently from a unary operator.

But I cover this in my list below: casts are syntax so trump everything. (Which comes first in `ref T (X)[i]`, I don't know, I'd have to test it.
But this would probably be written to as (X[i]) to remove doubt.)

...

Because if they ever start to return lvalues, then this becomes
possible:

    ++E := 0
    E++ := 0

(Whichever one is legal in your scheme.) So I think there is little
useful expressivity to be gained.

Indeed, neither of those is useful.

Interestingly, I just tried

   ++E = 0;

with cc and c++ compilers. The first rejected it (as ++E needing to be
an lvalue); the second accepted it. That's not a reflection of either
compiler, BTW, but without checking it's probably more to do with the
language/dialect definition. FWIW I expect the second compiler could
be persuaded to issue a warning about unreachable code or suchlike.

What does C++ expect it to mean?

Probably the same as my language (since it retains the lvalue) but I
don't know C++.

--
James Harris


From James Harris@21:1/5 to David Brown on Tue Nov 15 09:35:19 2022

On 14/11/2022 19:51, David Brown wrote:

On 14/11/2022 19:02, James Harris wrote:

On 14/11/2022 17:28, David Brown wrote:

On 14/11/2022 16:14, James Harris wrote:

On 14/11/2022 10:24, Stefan Ram wrote:

James Harris <james.harris.1@gmail.com> writes:

Rather than allowing non-ASCII in source I came up with a scheme of what
you might call 'named characters' extending the backslash idea of C to
allow names instead of single characters after the backslash. It's off
topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).

...

What's needed, IMO, is a set of names intended for /processing/ rather
than for typesetting.

That makes little sense to me. Are you intending to invent your own character encoding, or your own fonts here? Are you planning on making
your own display or print system?

As I've already said, Unicode and HTML are fine for output. Where
programmers work with the semantics of characters, however, they need characters to be in semantic categories, you know: letters, arithmetic
symbols, digits, different cases, etc. So far I've not come across
anything to support that multilingually. AISI what's needed is a way to
expand character encodings to bit fields such as

<category><base character><variant><diacritics><appearance>

where

category = group (e.g. alphabetic letters, punctuation, etc)
base character = main semantic identification (e.g. an 'a')
variant (e.g. upper or lower case)
diacritics (those applied to this character in this location)
appearance (e.g. a round 'a' or a printer's 'a' or unspecified)

Note that that's purely about semantics; it doesn't include typefaces or character sizes or bold or italic etc which are all for rendering.

The character "⅘" is easily typed on *nix keyboards with a compose key (with common setups), HTML has it as "&frac45;", Unicode has it as
"U+2158 Vulgar fraction four fifths". They support fractions that are common enough to exist as characters in fonts. You can't add your own personal "twenty two sevenths" character and expect it to turn up when printed, nor will you ever come across it when reading files or
documents from elsewhere. (Of course you can choose to support only a subset of the HTML or Unicode names.)
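
As a point of comparison, Python already exposes the Unicode names in roughly this "named character" form, so the behaviour described above can be checked directly. This is a sketch using only the standard `unicodedata` module; the 22/7 name in the last step is deliberately made up to show that unlisted names are rejected.

```python
import unicodedata

# A named escape keeps the source ASCII-only while denoting U+2158.
ch = "\N{VULGAR FRACTION FOUR FIFTHS}"
print(ch == "\u2158")                       # True
print(unicodedata.name(ch))                 # VULGAR FRACTION FOUR FIFTHS
print(unicodedata.numeric(ch))              # 0.8

# Names outside the Unicode character database are not accepted.
try:
    unicodedata.lookup("VULGAR FRACTION TWENTY TWO SEVENTHS")
    found = True
except KeyError:
    found = False
print(found)                                # False
```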

Anything like that which doesn't scale should be classified as
patronising garbage. I can give you my more-negative comments about such nonsense, if you like. >:-|

If you want to discuss this further I'd appreciate it if you could start another thread; I would reply to it. This thread is already full!

And what do you mean by "processing", and what makes you think it is
remotely relevant to separate diacritics from characters?

A standard is needed. Which to use? If there are 100 letters and 40
diacritics then there would be 140 codes. If they were to be
amalgamated, however, then you could have up to 4000 combined
characters. And that's allowing each letter to have only one diacritic.
Allow two and you have 160,000 potential characters. Etc.

In some
languages, "ä" is a letter "a" with a diacritic, in others it is an
entirely distinct letter of its own. The same applies to lots of characters. Unicode has a complex system of "normalisation" for
relating combining diacritics and letters into single combined Unicode characters, which are often a better choice for display than you would
get by displaying two individual graphemes.
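
The normalisation mentioned here can be seen in a few lines of Python with the standard `unicodedata` module. This is a minimal sketch showing only the NFC and NFD forms for one letter; the variable names are arbitrary.

```python
import unicodedata

decomposed = "a\u0308"   # 'a' followed by COMBINING DIAERESIS (two code points)
composed = "\u00e4"      # LATIN SMALL LETTER A WITH DIAERESIS (one code point)

print(len(decomposed), len(composed))                          # 2 1
print(decomposed == composed)                                  # False
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
print(unicodedata.normalize("NFD", composed) == decomposed)    # True
```

Both spellings render the same way, which is why comparisons are usually done on normalised text.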

See above.

Are you going to try to split up Chinese or Korean characters into their components? What about Mongolian, or Arabic?

I'd consider it but I don't yet know enough about those languages or how
they are typically processed in programs to comment.

--
James Harris


From Dmitry A. Kazakov@21:1/5 to James Harris on Tue Nov 15 11:06:55 2022

On 2022-11-15 10:35, James Harris wrote:

As I've already said, Unicode and HTML are fine for output. Where
programmers work with the semantics of characters, however, they need characters to be in semantic categories, you know: letters, arithmetic symbols, digits, different cases, etc. So far I've not come across
anything to support that multilingually. AISI what's needed is a way to expand character encodings to bit fields such as

<category><base character><variant><diacritics><appearance>

where

category = group (e.g. alphabetic letters, punctuation, etc)
base character = main semantic identification (e.g. an 'a')
variant (e.g. upper or lower case)
diacritics (those applied to this character in this location)
appearance (e.g. a round 'a' or a printer's 'a' or unspecified)

Note that that's purely about semantics; it doesn't include typefaces or character sizes or bold or italic etc which are all for rendering.

I am not sure what you are trying to say. The Unicode characterization
is defined in the file:

https://unicode.org/Public/UNIDATA/UnicodeData.txt

Programming languages like Ada (since 2005) define blanks, punctuation,
letters etc in terms of Unicode characterization in order to support multilingual programs. Not the best idea, IMO, but for what it is worth:

https://docs.adacore.com/live/wave/arm12/html/arm12/arm12-2-3.html#S0002
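
For a quick look at that characterization without parsing UnicodeData.txt by hand, Python's standard `unicodedata` module exposes the same general categories. The sketch below is only an illustration; the helper name `describe` and the sample characters are arbitrary.

```python
import unicodedata

def describe(ch):
    """Return (general category, Unicode name) for a single character."""
    return unicodedata.category(ch), unicodedata.name(ch)

# Ll/Lu = lower/upper-case letter, Nd = decimal digit,
# Sm = math symbol, Po = other punctuation, Zs = space separator.
for ch in ["a", "A", "\u00e4", "5", "+", ",", " "]:
    print(repr(ch), *describe(ch))
```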

There is no problem with Unicode string literals whatsoever. You just
place characters as they are. The only escape is "" for ". That is all.

Surely programmers are advised to never ever use anything but ASCII in identifiers and literals. If you need something else, use the code point
to string conversion and concatenation:

der_Aerger := Wide_Character'Val (16#C4#) & "rger";

(Diaeresis rhymes with diarrhea (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From Dmitry A. Kazakov@21:1/5 to James Harris on Tue Nov 15 13:14:44 2022

On 2022-11-15 12:44, James Harris wrote:

Do you also believe that the Unix

bytes = read(fd, &buf[1], reqd);

should be prohibited since it has the side effect within the expression
of modifying the buffer? If so, what would you replace it with??

That is simple. Ada's standard library has it:

procedure Read
   ( Stream : in out Root_Stream_Type;
     Item   : out Stream_Element_Array;
     Last   : out Stream_Element_Offset
   ) is abstract;

Item is an array:

type Stream_Element_Array is
   array (Stream_Element_Offset range <>) of aliased Stream_Element;

It is also a "virtual" operation in C++ terms to be overridden by a new implementation of a stream. Last is the index of the last element read.
Notice non-sliding bounds, as you can do this:

Last := Buff'First - 1;
loop
   Read (S, Buff (Last + 1..Buff'Last), Last); -- Non-blocking chunk
   exit when Last = Buff'Last; -- Done
end loop;

Since bounds do not slide Last stays valid for all array slices.
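
For contrast, here is a rough Python analogue of the same fill-the-buffer loop. Python slices are re-based at zero (they "slide"), so the running position has to be tracked separately rather than coming back from the read call. The function name `fill` and the 4-byte chunk size, which merely imitates partial reads, are assumptions of this sketch.

```python
import io

def fill(stream, buf, chunk=4):
    """Read into buf until it is full or the stream ends; return bytes stored."""
    view = memoryview(buf)
    filled = 0
    while filled < len(buf):
        # The slice starts again at index 0, so 'filled' carries the position.
        n = stream.readinto(view[filled:filled + chunk])
        if not n:          # end of stream (or nothing available)
            break
        filled += n
    return filled

buf = bytearray(10)
count = fill(io.BytesIO(b"0123456789abc"), buf)
print(count, bytes(buf))   # 10 b'0123456789'
```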

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From James Harris@21:1/5 to David Brown on Tue Nov 15 11:44:16 2022

On 14/11/2022 20:45, David Brown wrote:

On 14/11/2022 18:37, James Harris wrote:

On 14/11/2022 14:47, David Brown wrote:

On 14/11/2022 11:44, James Harris wrote:

On 14/11/2022 09:26, David Brown wrote:

...

A big step in that direction is to say that assignment is a
statement, not an expression,

Done that.

and that variables cannot be changed by side-effects.

I will not be doing that. I know you favour functional programming,
and that's fine, but the language I am working on is unapologetically
imperative.

Many unapologetically imperative languages do not allow side-effects in expressions. It is a natural rule for functional programming languages, since pure functional programming does not have side-effects or
modifiable variables at all. But there is absolutely /nothing/ about
being an imperative language that suggests you need to allow
side-effects or assignments /within/ expressions.

Do you also believe that the Unix

bytes = read(fd, &buf[1], reqd);

should be prohibited since it has the side effect within the expression
of modifying the buffer? If so, what would you replace it with??

...

These two statements go together as well as Dmitry's toaster and
toilet brush. It doesn't matter how precisely you define how the
combination can be used and what it does, it is still not a good or
useful thing.

OK, let's take the combination you mentioned:

   ++E++

I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in
the order that it says it. Remember that unlike C I define the
apparent order of evaluation so the expression is perfectly well
formed.

The very fact that you are discussing how to define it means it is
not clear and obvious.

On that I must disagree.

You think it is /obvious/ what "++E++" means?

If you don't know the rules then it's not obvious.

If you know the rules then it's *blindingly* obvious. What's more, the
rules are easy to learn.

...

"++E++" remains meaningless to experienced programmers.

It may be meaningless to programmers who wrongly try to apply the rules
of other languages to it but why would you do that? It's an invalid
thing to do. Languages differ.

It is not obvious which order the increments happen, or if the order
is defined, or if the order matters. It is not obvious what the
return value should be. It is not obvious where you have lvalues or
rvalues (not that a language should necessarily have such concepts).
It is not obvious what happens to E.

Of course it's not obvious to someone who doesn't know the rules of
the language. A language designer cannot produce a language with no
rules. What a language designer /can/ do is to make the rules simple
and understandable - but someone who reads the code still has to
understand what the rules are.

I do not think "++E++" will be clear and obvious to someone who /does/
know the rules of the language.

It parses as

(++E)++

Operations would appear to be applied in that order, prefix then
postfix. It's not complicated. Take an example:

E = 5
A = (++E)++ ;if it helps you to see it that way but parens not needed
print A, E

result:

6 7

If you want a bit more formality:

++E ==> (E := E + 1; E)
E++ ==> (T := E; E := E + 1; valof(T))
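
Those two rewrite rules can be modelled in a few lines of Python to check the "6 7" result. This is only a sketch of the semantics as stated in this post: `Var`, `pre_inc` and `post_inc` are hypothetical names, and a mutable object stands in for an lvalue.

```python
class Var:
    """A mutable cell standing in for a variable (an lvalue)."""
    def __init__(self, value):
        self.value = value

def pre_inc(var):
    """++E: increment first, then yield the variable itself."""
    var.value += 1
    return var

def post_inc(var):
    """E++: remember the old value, increment, yield the old value."""
    old = var.value
    var.value += 1
    return old

E = Var(5)
A = post_inc(pre_inc(E))   # models (++E)++
print(A, E.value)          # 6 7
```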

Remember, this is not something that
will be commonly used and become idiomatic, like "*p++ = *q++;" is in C.
Programmers will always need to look up the details - that's why it is
not a good idea.

Maybe you find it hard to read because you are trying to look at it as a
single operation - a bit like a spaceship symbol? It's not. It's two
entirely separate operations that just happen to be adjacent. Think of
them in that way and as long as you know the order in which they will be applied the overall effect is obvious.

As for the rules we have been discussing here those I have come up
with are, in the main, the ones you would be familiar with; even the
new ones are logical and simple. Once you understand them it's
incredibly easy to parse an expression, even of the kind which, to
you, looks like gibberish.

In fact, I have to say that neither of those you have objected to
should really look like gibberish, even to the uninitiated. Would you
find

   ++A + B++

objectionable?

Yes.

If not, I cannot see why you would find

   ++E + E++

so objectionable, either.

It is worse, because you are changing the same thing twice in an
unordered manner.

That's not true. I've said many times that the apparent order would be
defined. Did you not read what I wrote or is there some other reason you
still think it would be unordered?

Operands to binops like "+" would appear to be evaluated left-then-right.

--
James Harris


From James Harris@21:1/5 to Andy Walker on Tue Nov 15 12:16:18 2022

On 14/11/2022 23:28, Andy Walker wrote:

On 14/11/2022 20:45, David Brown wrote:
[To James:]

[...] But there is absolutely
/nothing/ about being an imperative language that suggests you need
to allow side-effects or assignments /within/ expressions.

If assignments within expressions are verboten, then you need to either forbid assignments within functions or have two classes of function, those with and those without [inc sub-procedures]. If assignments are allowed at all, then you cannot in general tell at compile time whether
any assignment is reached at run time, leading to further complications.
If you regard output as a side-effect, that too leads to problems. Yet during program development it is common to insert temporary diagnostic printing or variables. I can understand the concept of languages without assignments or other side effects; and of languages with them; I find it difficult to see the point of languages where such things are allowed in
some places and not in others. If we need hair shirts [and I'm not sure that we do], they should be worn all the time, not put on and taken off
in accordance with arcane rules that only high priests understand.

Largely agreed. I was going to reply to that to tackle the 'alarming
wave' (tm) of recommendations in this group for functional programming
but as it may snowball into a bigger discussion I've started a new
thread. qv

--
James Harris


From David Brown@21:1/5 to James Harris on Tue Nov 15 15:11:26 2022

On 15/11/2022 09:42, James Harris wrote:

On 15/11/2022 01:14, Bart wrote:

On 14/11/2022 16:21, James Harris wrote:

On 14/11/2022 11:11, Bart wrote:

...

The rules for ++ and -- are simple. *Prefix* ++ or -- happens before
postfix ++ or --.

Is that for all unary operators or just ++ and --?

Just ++ and --, as stated.

If only for those, then those exceptions will make rules for all unary
ops even more bizarre. If for all of them, then one consequence is
that `-P^` is parsed as `(-P)^`, so that you try to negate a pointer,
instead of what's at the pointer target.

However, I have my own exceptions, which are casts:

     ref u16(P)^ := 0

Rather than casts I have conversions. Are they the same? I never really understood how people use the term 'cast'. Either way, you raise a good point: Where should type conversions come in the order of precedence?

In C terminology, a "cast" is an explicit conversion. So :

int x = 123;
double y = 12.3;

x = y;

is an implicit conversion - the value in "y", of type "double", is
converted to type "int" automatically.

x = (int) y;

is a cast - it is an /explicit/ conversion. In this case, it does
exactly the same thing to the value, of course.

"Typecast", on the other hand, is something that happens to actors when
they have played one kind of role too long and people don't think they
can change. It has no meaning in C terminology - though people use it, thinking it means "cast".

Other languages can have slightly different terminology.

Because if they ever start to return lvalues, then this becomes
possible:

    ++E := 0
    E++ := 0

(Whichever one is legal in your scheme.) So I think there is little
useful expressivity to be gained.

Indeed, neither of those is useful.

Interestingly, I just tried

   ++E = 0;

with cc and c++ compilers. The first rejected it (as ++E needing to
be an lvalue); the second accepted it. That's not a reflection of
either compiler, BTW, but without checking it's probably more to do
with the language/dialect definition. FWIW I expect the second
compiler could be persuaded to issue a warning about unreachable code
or suchlike.

What does C++ expect it to mean?

Probably the same as my language (since it retains the lvalue) but I
don't know C++.

In both C and C++, "++E" means precisely the same as "E += 1", which
again means precisely the same as "E = E + 1". (This is assuming no overloaded operators in C++.)

In C, the result of "(E = E + 1)" is the /value/ of E after the
addition, and is thus not an lvalue.

In C++, the result is a /reference/ to E after the addition, and
therefore an lvalue.

I don't know the precise reasoning for the difference - perhaps it is
simply that the addition of references to the language made it natural
to have an lvalue in such cases, while C does not have references.

("References" in C++ terminology are just names for lvalues. They can
be thought of as non-null pointers that are dereferenced automatically -
except the compiler will not actually make pointers or put things in
memory unless necessary.)


From David Brown@21:1/5 to James Harris on Tue Nov 15 15:22:45 2022

On 15/11/2022 12:44, James Harris wrote:

On 14/11/2022 20:45, David Brown wrote:

On 14/11/2022 18:37, James Harris wrote:

On 14/11/2022 14:47, David Brown wrote:

On 14/11/2022 11:44, James Harris wrote:

On 14/11/2022 09:26, David Brown wrote:

...

A big step in that direction is to say that assignment is a
statement, not an expression,

Done that.

and that variables cannot be changed by side-effects.

I will not be doing that. I know you favour functional programming,
and that's fine, but the language I am working on is unapologetically
imperative.

Many unapologetically imperative languages do not allow side-effects
in expressions. It is a natural rule for functional programming
languages, since pure functional programming does not have
side-effects or modifiable variables at all. But there is absolutely
/nothing/ about being an imperative language that suggests you need to
allow side-effects or assignments /within/ expressions.

Do you also believe that the Unix

bytes = read(fd, &buf[1], reqd);

should be prohibited since it has the side effect within the expression
of modifying the buffer? If so, what would you replace it with??

As I said before (a couple of times at least), function calls are
another matter and should be considered separately.

One possibility is to distinguish between "functions" that have no side
effects and can therefore be freely mixed, re-arranged, duplicated,
omitted, etc., and "procedures" that have side-effects and must be
called exactly as requested in the code. Such "procedures" would not be allowed in expressions - only as statements or part of assignment
statements.

In many cases where you have modification of parameters or passing by
non-const address, a more advanced language could use multiple returns :

bytes, data = read(fd, max_count)

But that might require considerable compiler effort to generate
efficient results in other cases.
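
A minimal Python sketch of that multiple-return shape follows. The wrapper name `read_bytes` is hypothetical, and a pipe is used only so that the example has something to read from.

```python
import os

def read_bytes(fd, max_count):
    """Return (count, data) instead of filling a caller-supplied buffer."""
    data = os.read(fd, max_count)
    return len(data), data

# Set up a pipe with some bytes in it so there is something to read.
r, w = os.pipe()
os.write(w, b"hello")
os.close(w)

count, data = read_bytes(r, 16)
os.close(r)
print(count, data)   # 5 b'hello'
```

Here the caller's buffer is never modified behind its back; the cost is that every call allocates a new bytes object.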

...

These two statements go together as well as Dmitry's toaster and
toilet brush. It doesn't matter how precisely you define how the
combination can be used and what it does, it is still not a good
or useful thing.

OK, let's take the combination you mentioned:

++E++

I wonder why you see a problem with it. As I see it, it increments
E before evaluation and then increments E after evaluation. What is
so complex about that? It does exactly what it says on the tin, and
in the order that it says it. Remember that unlike C I define the
apparent order of evaluation so the expression is perfectly well
formed.

The very fact that you are discussing how to define it means it is
not clear and obvious.

On that I must disagree.

You think it is /obvious/ what "++E++" means?

If you don't know the rules then it's not obvious.

Yes.

If you know the rules then it's *blindingly* obvious. What's more, the
rules are easy to learn.

No. If you see that written, it is blindingly obvious that the
programmer is a smart-arse that thinks it is "cool" to write something
that looks like a flourish or ornament at the end of a book chapter,
instead of writing "E += 2" or "E = E + 2".

I am not interested in pandering to smart-arse programmers. There are
too many of them already, and they don't need encouragement.


From Bart@21:1/5 to David Brown on Tue Nov 15 15:26:57 2022

On 15/11/2022 08:07, David Brown wrote:

On 14/11/2022 19:43, Bart wrote:

It requires some effort in Python and the result will be clunky (and probably require some add-on modules).

import struct        # Standard module
bs = open("potato_c.cof").read()

machine, nsections, timestamp, symtaboffset, nsymbols, optheadersize, characteristics = struct.unpack_from("<HHIIIHH")

That's it. Three lines. I would not think of C for this kind of thing
- Python is /much/ better suited. I'd only start looking at C (or C++)
if I need so high speed that the Python code was not fast enough, even
with PyPy.

I said it will be clunky and require add-on modules and it is and does.
(BTW you might be missing an argument in that struct.unpack_from call.)
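
For reference, `struct.unpack_from` takes the buffer as its second argument, and in Python 3 the file would need to be opened in binary mode ("rb") for `read()` to return bytes. Below is a self-contained sketch of the corrected call; the header bytes are made up rather than read from a real object file, and the `CoffHeader` namedtuple is just one standard-library way to name the fields.

```python
import struct
from collections import namedtuple

HEADER_FORMAT = "<HHIIIHH"   # little-endian, no padding
CoffHeader = namedtuple(
    "CoffHeader",
    "machine nsections timestamp symtaboffset nsymbols optheadersize characteristics",
)

# Made-up bytes standing in for the start of an object file.
data = struct.pack(HEADER_FORMAT, 0x8664, 3, 0, 0x200, 12, 0, 0) + b"rest of file"

# The buffer is the second argument; the offset defaults to 0.
header = CoffHeader._make(struct.unpack_from(HEADER_FORMAT, data))

print(struct.calcsize(HEADER_FORMAT))                      # 20
print(header.machine, header.nsections, header.nsymbols)   # 34404 3 12
```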

Using that approach for the nested structs and unions of my other
example is not so straightforward. You basically have to fight for every
field.
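
As one data point for the nested case, Python's standard `ctypes` module can describe such a record with a union inside a struct. The sketch below assumes a little-endian host and 1-byte packing; the class names and the sample bytes are invented for the example, and it makes no claim about how convenient this is compared with built-in support.

```python
import ctypes
import struct

class Offsets(ctypes.Structure):
    _pack_ = 1
    _fields_ = [("short", ctypes.c_uint32), ("long", ctypes.c_uint32)]

class Name(ctypes.Union):
    _pack_ = 1
    _fields_ = [
        ("shortname", ctypes.c_char * 8),
        ("offsets", Offsets),
        ("longname", ctypes.c_uint64),
    ]

class ImageSymbol(ctypes.Structure):
    _pack_ = 1
    _anonymous_ = ("name",)          # expose the union members directly
    _fields_ = [
        ("name", Name),
        ("value", ctypes.c_uint32),
        ("sectionno", ctypes.c_uint16),
        ("symtype", ctypes.c_uint16),
        ("storageclass", ctypes.c_uint8),
        ("nauxsymbols", ctypes.c_uint8),
    ]

# Made-up 18-byte record: name ".text", value 0x1000, section 1, class 3.
raw = struct.pack("<8sIHHBB", b".text", 0x1000, 1, 0, 3, 0)
sym = ImageSymbol.from_buffer_copy(raw)
print(ctypes.sizeof(ImageSymbol), sym.shortname, hex(sym.value), sym.sectionno)
```

Fields of the resulting object can be assigned to, and `bytes(sym)` gives back the packed record.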

The result is a tuple of unnamed fields. You really want a proper
record, which is yet another add-on, with a choice of several modules
depending on which set of characteristics you need.

The elements of the tuple are also normal Python variables. If you
wanted to modify elements (which needs a mutable tuple anyway), they
will not behave the same way as those packed types, and then you'd have
to write the whole struct back (using .pack), and will need to know its provenance, which here has been lost. With a reference like mine, that
is built-in.

In short, it's a hack. But it's a typical approach used in scripting languages.

'struct' is also not a true Python module; it's a front end for an
internal one called `_struct`, likely implemented in C, and almost
certainly using pointers.

(Whereas, by having pointers as intrinsic features, I know I could
implement such a module in my language, if I needed to. Think of it as a language building feature.)

While a functional language will
struggle (to be accurate, the programmer will struggle because they've chosen the wrong language).

I don't believe that. I am not familiar enough with Haskell to be able
to give the code, but I have no doubts at all that someone experienced
with Haskell will manage it fine. IO is not hard in the language, and
it has all the built-in modules needed for such interfaces.

Haskell is apparently number 25 on the list of language popularities on Github, with about 0.4% usage. That's not huge, but not insignificant either. But then, it was never intended to be a major practical
language - though some people and companies (Facebook uses it for
content analysis) do use it for practical work. Its main motivations
are for teaching people good software development, developing new
techniques, algorithms and methods, and figuring out what "works" and
could be incorporated in other languages.

I group them all together: ML, OCaml, Haskell, F#, sometimes even Lisp.

Anything that makes a big deal out of closures, continuations, currying, lambdas and higher order functions. I have little use for such things otherwise.

Haskell is great for elegantly defining certain kinds of types and
algorithms, not so good for reams of boilerplate code or UI stuff which
is what much of programming is.

It doesn't even have loops (AIUI); one task of the EXE reader is to
display lists of sections, imports, exports, base relocations...

Loops are such a basic requirement, and yet a language designer decides
they don't need them. After all you can emulate any loop using
recursion, no matter that it makes for less readable, less intuitive
(and less efficient) code.

It is that last feature that is most noticeable. Most major modern languages are not pure functional languages in themselves, but contain aspects from functional programming.

Yeah. It turns out that pretty much every language you've heard of (except C)
has higher-order functions. Too much pressure from academics I reckon.

Such features have some very subtle behaviours which I find incredibly
hard to get my head around.

(See the 'twice plus-three' example in https://en.wikipedia.org/wiki/Higher-order_function. It took me ages to
figure out what was going on there, and what was needed in an
implementation to make it work.)
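
For readers who have not seen it, the example has roughly this shape in Python. It is a sketch written from the description rather than copied from the article, and the names are the conventional ones.

```python
def twice(f):
    """Return a new function that applies f two times."""
    def result(x):
        return f(f(x))
    return result          # 'result' remembers f: that is the closure

def plus_three(i):
    return i + 3

g = twice(plus_three)      # g(x) == plus_three(plus_three(x))
print(g(7))                # 13
```

The implementation subtlety is that `result` must keep `f` alive after `twice` has returned.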

They make understanding whatever:

E++^

means child's play by comparison. And yet THIS is the feature you want
to ban! I don't get it. Remember not everyone is a mathematician.

E++^ is well defined in my code. Instead of:

doswitch c:=p++^
when 'A'..'Z' then...

which involves modifying two things, I'd have to write ... actually I
don't know what; there is nowhere to put them. I'd have to split that
loop and switch, and add a new indent level:

do
c := S[p] # change to an indexed string
p := p + 1
switch c
...

It's a bit like turning HLL code into assembly! Plus, with that extra S,
two extra 'p's, and that '1' there are quite a few extra things to get
wrong, and extra lines for somebody to grok and relate to each other.

I can't think of any serious,
popular language with significant development in the last decade that
does not have lambdas and the ability to work with functions as objects.

Exactly, and that is totally wrong. Too much attention is paid to
academics who seem to know little about designing accessible languages.

In Python, every function is really a variable initialised [effectively]
to some anonymous function. Which means that with 100% of the functions,
you can do this for any defined function F:

F = 42

Or, more subtly, setting it to any arbitrary functions. That sounds
incredibly unsafe.

So Python has immutable tuples, but mutable functions! Every identifier
is a variable that you can rebind to something else.

With my scripting language, you can do that with exactly 0% of the
defined functions. If you want mutable function references, /then/ you
use a variable: G := F.

This is why I bring it up here - not because I think the OP should be
making a functional programming language, but because I think he should
be taking inspiration and incorporating ideas from that world.

Sure, I've taken lots of ideas from functional languages, ones that
still work in an imperative style. For example I use functions like
head() and tail() from Haskell (except mine aren't lazy). I once had
list-comps too, but they fell into disuse.

I don't have lambda functions, but I have a thing called deferred code,
which I haven't yet gotten around to. The problem is figuring out
exactly how in-depth the implementation should be, because no matter
what you do, there will be yet another example from the FP world which
reveals one more dimension you hadn't realised existed.

Then I start to think, I don't really want the people who might use a
language like mine to need to bother their heads about it. Code should
be clear and obvious; relying on the incredibly subtle and obscure
behaviours associated with lambdas, closures et al, isn't.

Here's a more challenging record type that comes up in OBJ files:

      type imagesymbol=struct
          union
              stringz*8 shortname
              struct
                  u32   short
                  u32   long
              end
              u64      longname
          end
          u32 value
          u16 sectionno
          u16 symtype
          byte storageclass
          byte nauxsymbols
      end

(Again, this is defined directly in my /dynamic/ scripting language.)

Again, peanuts in Python - and I expect also peanuts in Haskell.

I'm sure there is a way to do it with enough effort. But as effortlessly
as this, as readable, and with the ability to just do P.shortname to get
an actual string? You will get there eventually, but it's basically DIY.

You don't believe amateurs can add value to what mainstream languages
can do. The trouble is that you don't appreciate the things that add
value, so long as there is some unreadable, unsafe way to hack your way
around a task.

Nothing of that is /remotely/ worth making a new language and giving up
on everything C - tools, compilers, developer familiarity, libraries,
and all the rest.

With an army of people behind it, such tools can be created for a new
language.

In the other group, I mentioned how C code is still predominantly
32-bit, even on 64-bit hardware. That is a big one just by itself.

But, what do I care? MY language /is/ fully 64-bit, I can do 1<<60
without remembering to do 1ULL<<60, it /has/ a module system, namespaces,
the works, and that gives me a kick when I compare it to C.

I'm not saying that these are not good things (though I might disagree
with you on some of the details). I am saying that it is not worth it.

This is why we still have C, and why it is so popular in practice - it
is not because anyone thinks it is a "perfect" language, it is because
the benefits of the C ecosystem outweigh the small benefits of minor variations of the language.

This is exactly why there can still be a place for a language like C,
but tidied up and brought up-to-date without all its baggage. People
want a language they feel they know inside-out, and can be made to do
anything; they want to feel in charge and confident.

Except that the people creating alternatives usually try to do too much,
and lose many of the attributes of C that make it attractive.

And I think most of what you like could be achieved by using a subset of
C++ along with a few template libraries. (To be fair, that was
certainly not the case when you started your language.)

Templates have problems. Whatever problem they are a solution to needs
to be done another way if you want a language that is much, much simpler
and faster to build than C++.

Ban side-effects in expressions, and you have :

    A[i] := 0
    i = i + 1

It is not hard.

It IS hard. Did you miss the bit where I said that expressions and
statements are interchangeable in an expression-based language? So you
HAVE to allow anything within those square brackets.

I use 'unit' to refer to any expression-or-statement. So the syntax for
an array index is A[unit] with a single unit; however, my example used a
sequence of three, which is not allowed. But that just means I'd have to
write it like this:

A[(t:=i; i:=i+1; t)] := 0

And of course, a large proportion of increments are in loops. So now
you have (mixing syntaxes from different languages to avoid prejudice) :

    for i in range(10) {
        A[i] = 0
    }

Or :

    for a& in A {
        a = 0
    }

Or :
    A = [0 for a in A]

Or :
    A = [0] * 10

Or :
    A.set(0)

Or :

    A = [0 .. ]

Or :

    A = [0 .. ][range(A)]

There are endless choices here, none of which need an increment
operator, or pointers.

And in FP, you don't have loops, or assignments. At this rate there
won't be anything left! Why won't we all just code in lambda calculus as
that can apparently represent any program.

There are still plenty of increments outside of loops (actually I don't
use for-loops much in my programs, mainly in smaller contexts), as well
as inside loops when you're incrementing something that doesn't happen
to be the loop index.

This discussion is about whether to have a shorter way of writing:

<expr> := <the exact same expr> + 1

And whether that is:

<expr> +:= 1

or either of:

++<expr>
<expr>++

(This example uses the non-value-returning variety.)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From James Harris@21:1/5 to Bart on Tue Nov 15 16:22:25 2022

On 15/11/2022 15:26, Bart wrote:

On 15/11/2022 08:07, David Brown wrote:

...

Most major modern
languages are not pure functional languages in themselves, but contain
aspects from functional programming.

Yeah. It turns out that pretty much every language you've heard of (except C)
has higher-order functions. Too much pressure from academics I reckon.

Such features have some very subtle behaviours which I find incredibly
hard to get my head around.

(See the 'twice plus-three' example in https://en.wikipedia.org/wiki/Higher-order_function. It took me ages to
figure out what was going on there, and what was needed in an
implementation to make it work.)

They make understanding whatever:

E++^

means child's play by comparison. And yet THIS is the feature you want
to ban! I don't get it.

Good point! Preferences definitely vary.

...

This discussion is about whether to have a shorter way of writing:

<expr> := <the exact same expr> + 1

And whether that is:

<expr> +:= 1

or either of:

++<expr>
<expr>++

(This example uses the non-value-returning variety.)

Yes, AISI the discussion was primarily about what such operations should
mean and how they should be ordered relative to each other. The subtext
was whether they should be included at all. At least now I've got a good
way to include them that choice is still open. That's far better than
just simply banning them and avoiding the challenge.

--
James Harris


From Dmitry A. Kazakov@21:1/5 to James Harris on Tue Nov 15 17:35:44 2022

On 2022-11-15 17:22, James Harris wrote:

Yes, AISI the discussion was primarily about what such operations should
mean and how they should be ordered relative to each other. The subtext
was whether they should be included at all. At least now I've got a good
way to include them that choice is still open. That's far better than
just simply banning them and avoiding the challenge.

That depends on your priorities. E.g. the Rubik's cube. You rotate a
row, four sides change. That's fun. But me, a prosaic programmer, just
pluck the slates one by one and put them back in order... (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From James Harris@21:1/5 to Dmitry A. Kazakov on Tue Nov 15 16:59:36 2022

On 15/11/2022 16:35, Dmitry A. Kazakov wrote:

On 2022-11-15 17:22, James Harris wrote:

Yes, AISI the discussion was primarily about what such operations
should mean and how they should be ordered relative to each other. The
subtext was whether they should be included at all. At least now I've
got a good way to include them that choice is still open. That's far
better than just simply banning them and avoiding the challenge.

That depends on your priorities. E.g. the Rubik's cube. You rotate a
row, four sides change. That's fun. But me, a prosaic programmer, just
pluck the slates one by one and put them back in order... (:-))

:-)

--
James Harris


From James Harris@21:1/5 to David Brown on Tue Nov 15 16:58:05 2022

On 15/11/2022 14:22, David Brown wrote:

On 15/11/2022 12:44, James Harris wrote:

...

Do you also believe that the Unix

bytes = read(fd, &buf[1], reqd);

should be prohibited since it has the side effect within the
expression of modifying the buffer? If so, what would you replace it
with??

As I said before (a couple of times at least), function calls are
another matter and should be considered separately.

Then you wouldn't be able to prevent a programmer coding

a = b + nudge_up(&c) + d;

and therefore the programmer may query why ++c is not available in the
first place.

BTW, any time one thinks of 'treating X separately' it's good to be
wary. Step-outs tend to make a language hard to learn and awkward to use.

One possibility is to distinguish between "functions" that have no side effects and can therefore be freely mixed, re-arranged, duplicated,
omitted, etc., and "procedures" that have side-effects and must be
called exactly as requested in the code. Such "procedures" would not be allowed in expressions - only as statements or part of assignment
statements.

Classifying functions by whether they have side effects or not is not as clear-cut as it may at first appear. Please see the thread I started
today on functional programming.

In many cases where you have modification of parameters or passing by non-const address, a more advanced language could use multiple returns :

bytes, data = read(fd, max_count)

But that might require considerable compiler effort to generate
efficient results in other cases.

Thanks for the suggestion. I wondered about that and I like it in
principle but couldn't see how one would then sensibly (i.e. efficiently
and in keeping with the rest of the language) then go on to write the
data (whose length we would not know in advance, although we would know
its maximum size) to the correct part of the buffer.

...

You think it is /obvious/ what "++E++" means?

If you don't know the rules then it's not obvious.

Yes.

If you know the rules then it's *blindingly* obvious. What's more, the
rules are easy to learn.

No. If you see that written, it is blindingly obvious that the
programmer is ...

No, it is not. In languages in which 'nudge' operators are supported
many programmers may write

++E

as a subexpression if they want E to be incremented before it is
evaluated. They may also write

E++

if they want E to be incremented after it is evaluated. And if the
algorithm they are implementing calls for E to be incremented
before and after then programmers should be able to code both. This is
not about them being clever. It's about them being able to have the code naturally express the intent of the algorithm and to reflect the
processing that's in the programmer's mind.

All that's required, compared with C, is for the apparent evaluation
order to be defined.
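
To make "defined apparent order" concrete, here is a small Python sketch. The Cell class and its pre_inc/post_inc methods are hypothetical stand-ins for ++E and E++, and the left-to-right rule is an assumption chosen for the example, not necessarily the rule proposed in this thread. Python does guarantee left-to-right evaluation of the operands of +, so the result is fully determined:

```python
class Cell:
    """Hypothetical mutable integer standing in for the lvalue E."""

    def __init__(self, value):
        self.value = value

    def pre_inc(self):
        # ++E : increment first, then yield the new value.
        self.value += 1
        return self.value

    def post_inc(self):
        # E++ : yield the old value, then increment.
        old = self.value
        self.value += 1
        return old


def demo(start):
    e = Cell(start)
    # Models "++E + E++" under a left-to-right apparent order:
    # the left operand is fully evaluated before the right one.
    total = e.pre_inc() + e.post_inc()
    return total, e.value
```

Starting from 5, the left operand yields 6, the right operand then also yields 6 and leaves E at 7, so the sum is 12 whichever compiler or optimisation level is used.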

--
James Harris


From David Brown@21:1/5 to James Harris on Tue Nov 15 18:31:55 2022

On 15/11/2022 17:58, James Harris wrote:

On 15/11/2022 14:22, David Brown wrote:

On 15/11/2022 12:44, James Harris wrote:

...

Do you also believe that the Unix

bytes = read(fd, &buf[1], reqd);

should be prohibited since it has the side effect within the
expression of modifying the buffer? If so, what would you replace it
with??

As I said before (a couple of times at least), function calls are
another matter and should be considered separately.

Then you wouldn't be able to prevent a programmer coding

a = b + nudge_up(&c) + d;

Why wouldn't I (as a language designer) be able to prevent that?

We don't live in the days of assemblers where the only thing you know
about a function is its name and a hope that you've got the parameters
right. The declaration of a function (from module import, or however
you want to handle it) can include all sorts of information about the
function. Is it pure? Is it small, suitable for inlining? Can it
throw exceptions? What are its pre-conditions and post-conditions?
(Some languages support "programming by contract" to make it hugely
easier to check correctness.)

and therefore the programmer may query why ++c is not available in the
first place.

You assume /so/ many limitations on what you can do as a language
designer. You can do /anything/. If you want to allow something, allow
it. If you want to prohibit it, prohibit it. Don't run away claiming
you had no choice or hide behind assertions about what other programmers
or other languages do.

If you want side-effects in operators, functions, etc., then that's fine -
but it is /your/ choice to support that feature, and /your/ choice to
drop the benefits achievable by /not/ allowing them. You are not forced
to allow them, any more than you are forced to follow any of my
suggestions. (And often my suggestions are not given as
recommendations, but merely to air the options you have.)

BTW, any time one thinks of 'treating X separately' it's good to be
wary. Step-outs tend to make a language hard to learn and awkward to use.

So do over-generalisations.

One possibility is to distinguish between "functions" that have no
side effects and can therefore be freely mixed, re-arranged,
duplicated, omitted, etc., and "procedures" that have side-effects and
must be called exactly as requested in the code. Such "procedures"
would not be allowed in expressions - only as statements or part of
assignment statements.

Classifying functions by whether they have side effects or not is not as clear-cut as it may at first appear. Please see the thread I started
today on functional programming.

You'll notice I've replied to it :-)

The classification is not /entirely/ clear-cut, but not for the reasons
you think or described. It is possible to allow side-effects that have
no visible or logical effect, but exist only as hidden effects for
implementation efficiency purposes. Such a feature has to be controlled
carefully to ensure there are no logical effects.

In many cases where you have modification of parameters or passing by
non-const address, a more advanced language could use multiple returns :

bytes, data = read(fd, max_count)

But that might require considerable compiler effort to generate
efficient results in other cases.

Thanks for the suggestion. I wondered about that and I like it in
principle but couldn't see how one would then sensibly (i.e. efficiently
and in keeping with the rest of the language) then go on to write the
data (whose length we would not know in advance, although we would know
its maximum size) to the correct part of the buffer.

I never said this would be easy!
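
The two styles under discussion can be compared in a short Python sketch. The helper names read_into and read_multi are hypothetical, and io.BytesIO stands in for a file descriptor; readinto() on a memoryview slice plays the role of read(fd, &buf[1], reqd), while the multiple-return form needs an explicit copy into the buffer afterwards:

```python
import io


def read_into(stream, buf, offset, reqd):
    """Analogue of read(fd, &buf[offset], reqd): fill part of buf in place."""
    view = memoryview(buf)[offset:offset + reqd]
    return stream.readinto(view)  # number of bytes actually read


def read_multi(stream, max_count):
    """Multiple-return style: hand back the count and a fresh bytes object."""
    data = stream.read(max_count)
    return len(data), data


def demo():
    # In-place style: the callee writes straight into buf[1:5].
    buf = bytearray(8)
    n = read_into(io.BytesIO(b"hello"), buf, 1, 4)

    # Multiple-return style: the caller places the data itself.
    count, data = read_multi(io.BytesIO(b"hello"), 4)
    buf2 = bytearray(8)
    buf2[1:1 + count] = data  # the extra copy a compiler would want to elide
    return n, bytes(buf), count, bytes(buf2)
```

Both routes end with the same buffer contents; the open question in the thread is whether a compiler can reliably remove the intermediate object in the second form.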

...

You think it is /obvious/ what "++E++" means?

If you don't know the rules then it's not obvious.

Yes.

If you know the rules then it's *blindingly* obvious. What's more,
the rules are easy to learn.

No. If you see that written, it is blindingly obvious that the
programmer is ...

No, it is not. In languages in which 'nudge' operators are supported
many programmers may write

++E

as a subexpression if they want E to be incremented before it is
evaluated. They may also write

E++

if they want E to be incremented after it is evaluated. And if the
algorithm they are implementing calls for E to be incremented
before and after then programmers should be able to code both. This is
not about them being clever. It's about them being able to have the code naturally express the intent of the algorithm and to reflect the
processing that's in the programmer's mind.

All that's required, compared with C, is for the apparent evaluation
order to be defined.

I can appreciate that you want to give a meaning to "++E", that you want
to give a meaning to "E++", and you expect programmers to use one or the
other in different contexts. I can appreciate that you want to define
order of evaluation within expressions.

But I have yet to see any indication that "++E++" could ever be a
sensible expression in any real code.


From James Harris@21:1/5 to David Brown on Tue Nov 15 17:32:13 2022

On 14/11/2022 15:23, David Brown wrote:

On 14/11/2022 11:47, Bart wrote:

...

In-place, value-returning increment ops written as ++ and -- are
common in languages.

Yes. And bugs are common in programs. Being common does not
necessarily mean it's a good idea.

(It doesn't necessarily mean it's a bad idea either - I am not implying
that increment and decrement are themselves a major cause of bugs! But mixing side-effects inside expressions /is/ a cause of bugs.)

The side effects of even something awkward such as

*(++p) = *(q++);

are little different from those of the longer version

p = p + 1;
*p = *q;
q = q + 1;

The former is clearer, however. That makes it easier to see the intent.

Just blaming operators you don't like is unsound - especially since, as
you seem to suggest below, you use them in your own code!!!

...

[discussion of ++ and -- operators]

Is your point that you shouldn't have either of those operators?

Yes! What gave it away - the first three or four times I said as much?

...

... (Of course I use the increment operator, especially in loops,
because that's how C is written. But a new language can do better than that.)

If you think ++ and -- shouldn't exist then why not ban them from your
own programming for a while before you try to get them banned from a new language?

--
James Harris


From David Brown@21:1/5 to Bart on Tue Nov 15 19:05:35 2022

On 15/11/2022 16:26, Bart wrote:

On 15/11/2022 08:07, David Brown wrote:

On 14/11/2022 19:43, Bart wrote:

It requires some effort in Python and the result will be clunky (and
probably require some add-on modules).

import struct        # Standard module
bs = open("potato_c.cof").read()

machine, nsections, timestamp, symtaboffset, nsymbols, optheadersize, characteristics = struct.unpack_from("<HHIIIHH")

That's it. Three lines. I would not think of C for this kind of
thing - Python is /much/ better suited. I'd only start looking at C
(or C++) if I need so high speed that the Python code was not fast
enough, even with PyPy.

I said it will be clunky and require add-on modules and it is and does.

It is not "clunky" by any sane view - certainly not compared to your
code (or code written in C). And no, it does not require add-on modules
- the "struct" module is part of Python.

(BTW you might be missing an argument in that struct.unpack_from call.)

No, I am not. There is an optional third argument, but it is optional.
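
For readers following along: in the standard struct module the signature is struct.unpack_from(format, buffer, offset=0), so the buffer is required and only the offset is optional. The file also has to be opened in binary mode ("rb") for read() to return bytes. A minimal sketch follows; the header bytes are synthetic (built with struct.pack) because potato_c.cof is not available here, the field values are made up, and parse_header is a hypothetical helper name:

```python
import struct

COFF_HEADER = "<HHIIIHH"  # little-endian: 2 x u16, 3 x u32, 2 x u16


def parse_header(bs):
    # The buffer is the required second argument; offset defaults to 0.
    return struct.unpack_from(COFF_HEADER, bs)


# Synthetic 20-byte header standing in for open("potato_c.cof", "rb").read()
sample = struct.pack(COFF_HEADER, 0x8664, 3, 0, 0x200, 12, 0, 0)

(machine, nsections, timestamp, symtaboffset, nsymbols,
 optheadersize, characteristics) = parse_header(sample)
```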

Using that approach for the nested structs and unions of my other
example is not so straightforward. You basically have to fight for every field.

You have to define every field in every language, or define the ones you
want along with offsets to skip uninteresting data.

The result is a tuple of unnamed fields. You really want a proper
record, which is yet another add-on, with a choice of several modules
depending on which set of characteristics you need.

You can do that in Python.
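
One possible way to get named fields with only the standard library is sketched below. It is an illustration under stated assumptions, not the method either poster had in mind: the record layout is taken from the imagesymbol type quoted earlier and assumed to be packed (18 bytes, little-endian), the 8-byte name union is read as raw bytes and viewed afterwards, and the helper names parse_symbol, shortname and name_offsets are hypothetical.

```python
import struct
from collections import namedtuple

# Field names follow the imagesymbol record; rawname holds the 8-byte union.
ImageSymbol = namedtuple(
    "ImageSymbol",
    "rawname value sectionno symtype storageclass nauxsymbols",
)
SYMBOL_FORMAT = "<8sIHHBB"  # 8 + 4 + 2 + 2 + 1 + 1 = 18 bytes, no padding


def parse_symbol(buf, offset=0):
    """Unpack one symbol record starting at the given byte offset."""
    return ImageSymbol._make(struct.unpack_from(SYMBOL_FORMAT, buf, offset))


def shortname(sym):
    """View the union as a zero-terminated string (the stringz*8 member)."""
    return sym.rawname.split(b"\0", 1)[0].decode("ascii")


def name_offsets(sym):
    """View the same 8 bytes as the (short, long) pair of u32 members."""
    return struct.unpack("<II", sym.rawname)
```

The result is read-only, so the objection about writing a modified record back still applies: that would need an explicit _replace() followed by struct.pack_into() at a remembered offset.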

The elements of the tuple are also normal Python variables. If you
wanted to modify elements (which needs a mutable tuple anyway), they
will not behave the same way as those packed types, and then you'd have
to write the whole struct back (using .pack), and will need to know its provenance, which here has been lost. With a reference like mine, that
is built-in.

In short, it's a hack. But it's a typical approach used in scripting languages.

In short, you are making up shit in an attempt to make your own language
look better than other languages, because you'd rather say something
silly than admit that any other language could be better in any way for
any task.

'struct' is also not a true Python module; it's a front end for an
internal one called `_struct`, likely implemented in C, and almost
certainly using pointers.

Please re-think what you wrote there. I hope you can realise how
ridiculous you are being.

(Whereas, by having pointers as intrinsic features, I know I could
implement such a module in my language, if I needed to. Think of it as a language building feature.)

While a functional language will
struggle (to be accurate, the programmer will struggle because they've
chosen the wrong language).

I don't believe that. I am not familiar enough with Haskell to be
able to give the code, but I have no doubts at all that someone
experienced with Haskell will manage it fine. IO is not hard in the
language, and it has all the built-in modules needed for such interfaces.

Haskell is apparently number 25 on the list of language popularities
on Github, with about 0.4% usage. That's not huge, but not
insignificant either. But then, it was never intended to be a major
practical language - though some people and companies (Facebook uses
it for content analysis) do use it for practical work. Its main
motivations are for teaching people good software development,
developing new techniques, algorithms and methods, and figuring out
what "works" and could be incorporated in other languages.

I group them all together: ML, OCaml, Haskell, F#, sometimes even Lisp.

Anything that makes a big deal out of closures, continuations, currying, lambdas and higher order functions. I have little use for such things otherwise.

So because /you/ don't understand these things or how they are used, you
assume that people who /do/ understand them can't write programs in
functional programming languages?

Haskell is great for elegantly defining certain kinds of types and algorithms, not so good for reams of boilerplate code or UI stuff which
is what much of programming is.

As I mentioned in another post, there are opinions, and there are
/qualified/ opinions.

It's /fine/ if functional programming doesn't interest you. No one can
be interested in everything or learn about all kinds of programming.
But wise people don't make categorical statements about topics they know nothing about.

It doesn't even have loops (AIUI); one task of the EXE reader is to
display lists of sections, imports, exports, base relocations...

Loops are such a basic requirement, and yet a language designer decides
they don't need them. After all you can emulate any loop using
recursion, no matter that it makes for less readable, less intuitive
(and less efficient) code.

It is that last feature that is most noticeable. Most major modern
languages are not pure functional languages in themselves, but contain
aspects from functional programming.

Yeah. It turns out that pretty much every language you've heard of (except C)
has higher-order functions. Too much pressure from academics I reckon.

Such features have some very subtle behaviours which I find incredibly
hard to get my head around.

You are a smart guy. You could get your head around it quite easily, if
only you were willing. (I can try and help, if you want - but only if
you promise not to give up before you start, claim that it is all
useless, ugly, or clunky without justification, and not to go off on a
tangent about how much better your own language is.)

(See the 'twice plus-three' example in https://en.wikipedia.org/wiki/Higher-order_function. It took me ages to
figure out what was going on there, and what was needed in an
implementation to make it work.)

Try pasting the C++ code into godbolt.org, and compile with -O2 :

auto twice = [](auto f)
{
    return [f](int x) {
        return f(f(x));
    };
};

auto plus_three = [](int i)
{
    return i + 3;
};

int foo()
{
    auto g = twice(plus_three);

    return g(7);
}

The compiler happily turns "foo" into "return 13".
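
The same construction can be sketched in Python, which may make it easier to see what an implementation has to keep around. The inner function name applied_twice is just a label chosen for this example. The point is that twice() returns a function that still refers to f after twice() itself has returned, which is the captured environment (closure) an implementation must preserve:

```python
def twice(f):
    # Returns a new function that captures f and applies it two times.
    def applied_twice(x):
        return f(f(x))
    return applied_twice


def plus_three(i):
    return i + 3


g = twice(plus_three)  # g remembers plus_three via its closure
```

Calling g(7) applies plus_three to 7 and then to the result, giving 13, the same constant the C++ compiler folds foo() down to.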

But I agree that gcc has to work harder to implement this kind of thing
than your own compiler for your own language.

I can't think of any serious, popular language with significant
development in the last decade that does not have lambdas and the
ability to work with functions as objects.

Exactly, and that is totally wrong. Too much attention is paid to
academics who seem to know little about designing accessible languages.

It's not academics that use these features - it is practical
programmers. A big use of lambdas, for example, is in callbacks and
event handlers - used all the time in GUI programs and Javascript.

Academics may invent this kind of thing, and use languages like Haskell
to play with them - but they are implemented in real languages because
real programmers use them for real code. Rust, Go, C++ - these are not academics' languages.

In Python, every function is really a variable initialised [effectively]
to some anonymous function. Which means that with 100% of the functions,
you can do this for any defined function F:

    F = 42

Or, more subtly, setting it to any arbitrary functions. That sounds incredibly unsafe.

Python /is/ unsafe - it's a very dynamic language, with little
compile-time checking. It is checked at run-time.

But no, Python does not have variables at all. It has /names/, that are references to objects. A function is an object, usually (but not
necessarily) given a name with a "def" statement. That name can be
rebound to a different object, just like any other name.
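
A minimal sketch of that rebinding behaviour (the names F and G mirror the ones used in the quoted post):

```python
def F():
    return "original"


G = F    # a second name bound to the same function object
F = 42   # rebinds the name F only; the function object is untouched
```

After this, G() still returns "original" while F is simply the integer 42: the assignment changed what the name refers to, not the function.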

So Python has immutable tuples, but mutable functions! Every identifier
is a variable that you can rebind to something else.

Functions are not mutable in Python. You misunderstand.

With my scripting language, you can do that with exactly 0% of the
defined functions. If you want mutable function references, /then/ you
use a variable: G := F.

Function pointers are not function objects.

This is why I bring it up here - not because I think the OP should be
making a functional programming language, but because I think he
should be taking inspiration and incorporating ideas from that world.

Sure, I've taken lots of ideas from functional languages, ones that
still work in an imperative style. For example I use functions like
head() and tail() from Haskell (except mine aren't lazy). I once had list-comps too, but they fell into disuse.

OK.

I don't have lambda functions, but I have a thing called deferred code,
which I haven't yet gotten around to. The problem is figuring out
exactly how in-depth the implementation should be, because no matter
what you do, there will be yet another example from the FP world which reveals one more dimension you hadn't realised existed.

There are always more possibilities.

Then I start to think, I don't really want the people who might use a language like mine to need to bother their heads about it. Code should
be clear and obvious; relying on the incredibly subtle and obscure
behaviours associated with lambdas, closures et al, isn't.

A major part of clarity is in the mind of the observer.

You don't believe amateurs can add value to what mainstream languages
can do. The trouble is that you don't appreciate the things that add
value, so long as there is some unreadable, unsafe way to hack your way around a task.

Amateurs can make good things. But the idea that a single amateur can revolutionise a particular field is almost, but not quite, a complete
myth. It doesn't stop amateurs, and it doesn't stop them /occasionally/
coming up with something great - whether it is a programming language, something in science, maths, or anything else. However, almost
invariably, the successful ones are the ones that build a community
around a good idea and work together.

Nothing of that is /remotely/ worth making a new language and giving
up on everything C - tools, compilers, developer familiarity,
libraries, and all the rest.

With an army of people behind it, such tools can be created for a new language.

Yes. The main task is not pondering some insignificant variation on an existing language (even if that variation is, in some way, "better").
The main task is coming up with something that inspires that army of
people to get behind it.

In the other group, I mentioned how C code is still predominantly
32-bit, even on 64-bit hardware. That is a big one just by itself.

But, what do I care? MY language /is/ fully 64-bit, I can do 1<<60
without remembering to do 1ULL<<60, it /has/ a module system, namespaces,
the works, and that gives me a kick when I compare it to C.

I'm not saying that these are not good things (though I might disagree
with you on some of the details). I am saying that it is not worth it.

This is why we still have C, and why it is so popular in practice - it
is not because anyone thinks it is a "perfect" language, it is because
the benefits of the C ecosystem outweigh the small benefits of minor
variations of the language.

This is exactly why there can still be a place for a language like C,
but tidied up and brought up-to-date without all its baggage. People
want a language they feel they know inside-out, and can be made to do
anything; they want to feel in charge and confident.

Where are all these people that want something almost, but not quite,
exactly like C?

Except that the people creating alternatives usually try to do too much,
and lose many of the attributes of C that make it attractive.

And I think most of what you like could be achieved by using a subset
of C++ along with a few template libraries. (To be fair, that was
certainly not the case when you started your language.)

Templates have problems. Whatever problem they are a solution to needs
to be done another way if you want a language that is much, much simpler
and faster to build than C++.

But what if you don't care that C++ needs a complex compiler, because
you are not a compiler writer? That applies to 99.999% of programmers.

Ban side-effects in expressions, and you have :

     A[i] := 0
     i = i + 1

It is not hard.

It IS hard. Did you miss the bit where I said that expressions and
statements are interchangeable in an expression-based language? So you
HAVE to allow anything within those square brackets.

You and James are forever seeing problems - you think you are /forced/
into decisions. Take responsibility - you don't /have/ to allow
anything you don't want to allow.

I use 'unit' to refer to any expression-or-statement. So the syntax for
an array index is A[unit] with a single unit; however, my example used a
sequence of three, which is not allowed. But that just means I'd have to
write it like this:

    A[(t:=i; i:=i+1; t)] := 0

And of course, a large proportion of increments are in loops. So now
you have (mixing syntaxes from different languages to avoid prejudice) :

     for i in range(10) {
         A[i] = 0
     }

Or :

     for a& in A {
         a = 0
     }

Or :
     A = [0 for a in A]

Or :
     A = [0] * 10

Or :
     A.set(0)

Or :

     A = [0 .. ]

Or :

     A = [0 .. ][range(A)]

There are endless choices here, none of which need an increment
operator, or pointers.
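
As a rough illustration, a few of the alternatives listed above can be written in Python (picked here only as one concrete syntax). The list name A and its length of 10 are assumptions made for the sketch; none of the forms needs an increment operator or a pointer:

```python
# Sketch: zeroing a 10-element list without ++ or pointers.
A = list(range(10))  # hypothetical starting data

# Index loop: the loop construct advances i, so no explicit increment.
for i in range(len(A)):
    A[i] = 0

# Comprehension: build a new zeroed list of the same length.
B = [0 for _ in A]

# Repetition: the same result, stated directly.
C = [0] * len(A)

print(A == B == C)
```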

And in FP, you don't have loops, or assignments.

You have recursion, so you don't need loops. And you have assignment -
you just don't have /re-assignment/.

At this rate there won't be anything left! Why won't we all just code
in lambda calculus as that can apparently represent any program.

Why don't you just make a Turing machine? It is an imperative language,
and it's really quite simple.

There are still plenty of increments outside of loops (actually I don't
use for-loops much in my programs, mainly in smaller contexts), as well
as inside loops when you're incrementing something that doesn't happen
to be the loop index.

This discussion is about whether to have a shorter way of writing:

<expr> := <the exact same expr> + 1

And whether that is:

<expr> +:= 1

or either of:

++<expr>
<expr>++

(This example uses the non-value-returning variety.)

From James Harris@21:1/5 to David Brown on Tue Nov 15 19:09:04 2022

On 15/11/2022 17:31, David Brown wrote:

On 15/11/2022 17:58, James Harris wrote:

On 15/11/2022 14:22, David Brown wrote:

On 15/11/2022 12:44, James Harris wrote:

...

Do you also believe that the Unix

bytes = read(fd, &buf[1], reqd);

should be prohibited since it has the side effect within the
expression of modifying the buffer? If so, what would you replace it
with??

As I said before (a couple of times at least), function calls are
another matter and should be considered separately.

Then you wouldn't be able to prevent a programmer coding

a = b + nudge_up(&c) + d;

Why wouldn't I (as a language designer) be able to prevent that?

The question is not whether prevention would be possible but whether you
(i.e. DB) would consider it /advisable/. If you prevented it then a lot
of familiar programming patterns and a number of existing APIs would
become unavailable to you so choose wisely...! :-)

...

You assume /so/ many limitations on what you can do as a language
designer. You can do /anything/. If you want to allow something, allow it. If you want to prohibit it, prohibit it.

Sorry, but it doesn't work like that. A language cannot be built on
ad-hoc choices such as you have suggested. In this very thread you've
suggested I prohibit certain operator combinations, and that I ban side
effects in expressions but maybe not necessarily those from parameters
in function calls. It's not that simple. If a language designer were to
'pick and mix' like that the resultant language would be a nightmare to
learn and use. There has to be a language 'ethos' - i.e. an overall
approach it takes - and it has to follow consistent principles if it is
going to be a good design rather than a bad one.

...

BTW, any time one thinks of 'treating X separately' it's good to be
wary. Step-outs tend to make a language hard to learn and awkward to use.

So do over-generalisations.

Not really. It's ad-hoc rules which become burdensome. By contrast,
saying any operator can be 'adjacent' to any other as long as the types
are honoured makes learning a language more logical. It may give the
programmer freedoms you personally don't like but they make the language
easier to learn and use.

Seriously, try designing a language, yourself. You don't have to
implement it. Just try coming up with a cohesive design of something you
would like to program in.

One possibility is to distinguish between "functions" that have no
side effects and can therefore be freely mixed, re-arranged,
duplicated, omitted, etc., and "procedures" that have side-effects
and must be called exactly as requested in the code. Such
"procedures" would not be allowed in expressions - only as statements
or part of assignment statements.

Classifying functions by whether they have side effects or not is not
as clear-cut as it may at first appear. Please see the thread I
started today on functional programming.

You'll notice I've replied to it :-)

:-)

...

All that's required, compared with C, is for the apparent evaluation
order to be defined.

I can appreciate that you want to give a meaning to "++E",

No, I don't! You have this all wrong. The reason for considering the
inclusion of the operators we have been discussing in this thread is to
allow a more natural style of expression for algorithms that it suits.
You seem to keep thinking the goal is to attribute meaning to symbols.
That's not so.

that you want
to give a meaning to "E++", and you expect programmers to use one or the other in different contexts. I can appreciate that you want to define
order of evaluation within expressions.

I don't /want/ to define the order of evaluation; I /do/ define the
(apparent) order of evaluation. That's part of my language's ethos. If
I, in addition, permit ++ etc and dereference then their apparent order
/has/ to be defined, and it now has been.
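
For comparison, Python is one existing language that defines the order: operands are evaluated left to right, so an expression with a side effect in the middle has one reliable result. The sketch below borrows the name nudge_up from the C fragment earlier in the thread; the Cell class and the values are assumptions made purely for illustration.

```python
# Sketch: a defined left-to-right evaluation order makes side effects
# inside an expression predictable.

class Cell:
    """Hypothetical mutable box standing in for a variable passed by reference."""
    def __init__(self, value):
        self.value = value

def nudge_up(cell):
    """Increment the cell in place and return the new value."""
    cell.value += 1
    return cell.value

c = Cell(10)
# Left operand reads 10, the call changes c to 11 and yields 11,
# the right operand then reads 11.
result = c.value + nudge_up(c) + c.value
print(result)  # 32
```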

But I have yet to see any indication that "++E++" could ever be a
sensible expression in any real code.

Bart came up with an example something like

+(+(+(+ x)))

That's not at all sensible. You want that banned, too?

--
James Harris

From Bart@21:1/5 to David Brown on Tue Nov 15 19:22:39 2022

On 15/11/2022 18:05, David Brown wrote:

On 15/11/2022 16:26, Bart wrote:

I group them all together: ML, OCaml, Haskell, F#, sometimes even Lisp.

Anything that makes a big deal out of closures, continuations,
currying, lambdas and higher order functions. I have little use for
such things otherwise.

So because /you/ don't understand these things or how they are used, you assume that people who /do/ understand them can't write programs in functional programming languages?

No. It's nothing I've ever used, and unlikely ever to use. I like my
functions plain, and static, just like you prefer your expressions to
use only simple operators with no side-effects.

I still can't comprehend why YOU think this stuff is simple and obvious,
yet you are stumped by an increment of a pointer followed by a dereference.

On a list of must-haves for a programming language, not only would they
not be at the top of my list, they wouldn't even be at the bottom!

Haskell is great for elegantly defining certain kinds of types and
algorithms, not so good for reams of boilerplate code or UI stuff
which is what much of programming is.

As I mentioned in another post, there are opinions, and there are /qualified/ opinions.

My opinion comes from 20 years of writing code to /get things done/ in a
working environment. Which includes developing the languages and
choosing the features that best made that possible. Never once did
I think that 'currying' was going to dramatically transform how I coded;
never did I spend days working around the omission of closures.

Such features have some very subtle behaviours which I find incredibly
hard to get my head around.

You are a smart guy. You could get your head around it quite easily, if only you were willing.

No, it is hard, obscure, subtle. Take my word for it.

Try pasting the C++ code into godbolt.org, and compile with -O2 :

The compiler happily turns "foo" into "return 13".

So what does that mean? C++ clearly has support for this, and some
optimisations which can collapse the function calls into a constant
value. That tells us nothing about how hard the task is or how hard it
is to understand exactly why the task is more difficult than it at first
looks.

I discussed on Reddit how such a thing would look in my dynamic language
if I decided to implement it, which is like this:

fun twice(f) = {x:f(f(x))} # {...} is my deferred code syntax
fun plusthree(x) = x+3 # 'fun' is for one-liners

g := twice(plusthree)
println g(7)

I instead tried with a mock-up, which had two components: the
transformations my bytecode compiler would do, and the support code that
would need to be supplied by the language. The mock-up within the
working language looked like this:

# Transformed user code
fun af$1(x, f, g) = f(f(x))
fun twice(f) = makecls(af$1, f)
fun plusthree(x) = x+3

g := twice(plusthree)
println callcls(g, 7)

# Emulating interpreter support
record cls = (var fn, a, b)

func makecls(f, ?a, ?b)=
    cls(f, a, b)
end

func callcls(c, x)=
    fn := c.fn
    fn(x, c.a, c.b)
end

This produced the correct result. Enough worked also so that `twice`
could be called again with a different argument, while the original `g`
still worked. (A cruder implementation could hardcode things so that,
while it produced '13', it would only work with a one-time argument to `twice`.)
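
The same mock-up can be sketched in Python, purely as an illustration of the transformation rather than of how any real implementation does it. The names Cls, makecls, callcls and af_1 mirror the mock-up above; plusten is a hypothetical second argument added to show that two closures can coexist.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Cls:
    """Closure record: a plain function plus up to two captured values."""
    fn: Callable[..., Any]
    a: Any = None
    b: Any = None

def makecls(fn, a=None, b=None):
    """Support code: package a lifted function with its captured values."""
    return Cls(fn, a, b)

def callcls(c, x):
    """Support code: call the lifted function, passing the captures explicitly."""
    return c.fn(x, c.a, c.b)

# "Transformed user code": the deferred body f(f(x)) lifted to a plain function.
def af_1(x, f, g):
    return f(f(x))

def twice(f):
    return makecls(af_1, f)

def plusthree(x):
    return x + 3

def plusten(x):  # hypothetical second function, not in the original example
    return x + 10

g = twice(plusthree)
h = twice(plusten)       # a second closure; g must keep working
print(callcls(g, 7))     # 13
print(callcls(h, 7))     # 27
```

Because each call to twice builds its own record, g still yields 13 after h has been created, which is the property the cruder hardcoded version would lack.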

This is where it turned out that there were further refinements needed
to make it work with more challenging examples.

In the end I didn't do the necessary changes as, while intriguing to
work on, mere box-ticking was not a worthwhile use of my time, nor a
worthwhile complication in my product, since I was never going to use it.

It's not academics that use these features - it is practical
programmers. A big use of lambdas, for example, is in callbacks and
event handlers - used all the time in GUI programs and Javascript.

Yes, that was my deferred code feature, itself deferred. (It means I
have to instead define an explicit, named function.)

Academics may invent this kind of thing, and use languages like Haskell
to play with them - but they are implemented in real languages because
real programmers use them for real code. Rust, Go, C++ - these are not academics' languages.

I find idiomatic Rust incomprehensible.

In Python, every function is really a variable initialised
[effectively] to some anonymous function. Which means that with 100%
of the functions, you can do this for any defined function F:

F = 42

Or, more subtly, setting it to any arbitrary function. That sounds
incredibly unsafe.

Python /is/ unsafe - it's a very dynamic language, with little
compile-time checking. It is checked at run-time.

And my dynamic language is a lot less dynamic, so is safer, but of
course Python is superior.

But no, Python does not have variables at all. It has /names/, that are references to objects. A function is an object, usually (but not necessarily) given a name with a "def" statement. That name can be
rebound to a different object, just like any other name.

So Python has immutable tuples, but mutable functions! Every
identifier is a variable that you can rebind to something else.

Functions are not mutable in Python. You misunderstand.

I've used 'mutable' to mean two things: in-place modification of an
object, and being able to re-bind a name to something else. These are
conflated everywhere, but I reckoned people would get the point.

In Python, a function like this:

def F():
    pass

is more or less equivalent to:

def __0001():
    pass
F = __0001

Effectively any function is just a variable to which has been assigned
some anonymous function (although in practice, the function retains its
'F' identity even if the user's 'F' variable has been assigned a
different value).
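
A small sketch of that behaviour in standard Python; the name saved is an assumption used only to keep a second reference to the function object:

```python
# Sketch: rebinding the name F does not alter the function object itself.

def F():
    return "original"

saved = F          # second reference to the same function object
F = 42             # rebinds the name only; the function is untouched

print(F)               # 42
print(saved())         # original
print(saved.__name__)  # F  (the object keeps its 'F' identity)
```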

The end result is the same: you can never be sure that 'F' still refers
to that static function.

99.99% of the time you never want such functions to change, so why make
it possible? I can understand that in Python, a bytecode compiler might
not know in advance what F is, but that can be mitigated.

When I once experimented with such a language, any such tentative
functions were initialised at runtime, but once initialised, could not
be changed. So whether an identifier was the name of a function, module,
class or variable, was set at runtime, then fixed.

If you want it to be dynamic, then use a 'variable' (the clue is in the
name).

With my scripting language, you can do that with exactly 0% of the
defined functions. If you want mutable function references, /then/ you
use a variable: G := F.

Function pointers are not function objects.

Is there any practical difference?

You don't believe amateurs can add value to what mainstream languages
can do. The trouble is that you don't appreciate the things that add
value, so long as there is some unreadable, unsafe way to hack your
way around a task.

Amateurs can make good things. But the idea that a single amateur can revolutionise a particular field is almost, but not quite, a complete
myth.

I don't want to revolutionise everything. I just hope someone else
would, but the current state of PL design to me looks dire.

However I can take my ideas and use them myself, and sod everyone else;
it's their loss.

You seem convinced that that incredibly hackish and unprofessional way
of accessing the contents of that executable file is just as good as
doing it properly. Well carry on thinking that if you want.

(I don't know what it is about scripting languages, and the way they
eschew a feature as straightforward as a record with fields defined at compile-time. Either it doesn't exist, or they try and emulate such a
thing badly and inefficiently.)

This is exactly why there can still be a place for a language like C,
but tidied up and brought up-to-date without all its baggage. People
want a language they feel they know inside-out, and can be made to do
anything; they want to feel in charge and confident.

Where are all these people that want something almost, but not quite,
exactly like C?

There are loads that want to extend C, or import some favourite feature
of C++ into C, or some that want to write Python-like code in C. This is
apart from the ones implementing yet another new take on a
functional language.

What I'm talking about however is the popularity of C; why would they
use C, rather than the next one up which is C++?

To me the answer is clear, I guess to you it's less so.

But what if you don't care that C++ needs a complex compiler, because
you are not a compiler writer? That applies to 99.999% of programmers.

I think people care when their project requires a long edit-run cycle.

It IS hard. Did you miss the bit where I said that expressions and
statements are interchangeable in an expression-based language? So you
HAVE to allow anything within those square brackets.

You and James are forever seeing problems - you think you are /forced/
into decisions. Take responsibility - you don't /have/ to allow
anything you don't want to allow.

It's you who don't like it, not me! As I've tried to explain, in an expression-based language, you can have statements inside expressions
inside statements. Everything can have a side-effect.

Even gnu-C has that feature.

At this rate there won't be anything left! Why won't we all just code
in lambda calculus as that can apparently represent any program.

Why don't you just make a Turing machine? It is an imperative language,
and it's really quite simple.

You've missed my point. It's not me reducing everything down to a handful
of features. Lambda-calculus is where you can easily end up.

From David Brown@21:1/5 to James Harris on Tue Nov 15 22:40:34 2022

On 15/11/2022 20:09, James Harris wrote:

On 15/11/2022 17:31, David Brown wrote:

On 15/11/2022 17:58, James Harris wrote:

On 15/11/2022 14:22, David Brown wrote:

On 15/11/2022 12:44, James Harris wrote:

...

Do you also believe that the Unix

bytes = read(fd, &buf[1], reqd);

should be prohibited since it has the side effect within the
expression of modifying the buffer? If so, what would you replace
it with??

As I said before (a couple of times at least), function calls are
another matter and should be considered separately.

Then you wouldn't be able to prevent a programmer coding

a = b + nudge_up(&c) + d;

Why wouldn't I (as a language designer) be able to prevent that?

The question is not whether prevention would be possible but whether you (i.e. DB) would consider it /advisable/. If you prevented it then a lot
of familiar programming patterns and a number of existing APIs would
become unavailable to you so choose wisely...! :-)

I am not the language designer here - and I still don't really grok what
kind of language /you/ want, what you understand from before, what uses
it should have, or what you think is wrong with existing languages. (Or
maybe this is all for fun and interest, which is always the best reason
for doing anything.) That makes it hard to give recommendations.

Some familiar programming patterns may not be usable, but that's always
the case. And for some common patterns, I say good riddance! I can't
see why there would be issues with API's, however - though you always
need some kind of FFI wrapper.

...

You assume /so/ many limitations on what you can do as a language
designer. You can do /anything/. If you want to allow something,
allow it. If you want to prohibit it, prohibit it.

Sorry, but it doesn't work like that.

Yes, it does.

A language cannot be built on
ad-hoc choices such as you have suggested.

I haven't suggested ad-hoc choices. I have tried to make reasoned
suggestions. Being different from languages you have used before, or
how you envision your new language, does not make them ad-hoc.

In this very thread you've
suggested I prohibit certain operator combinations, and that I ban side effects in expressions but maybe not necessarily those from parameters
in function calls. It's not that simple. If a language designer were to
'pick and mix' like that the resultant language would be a nightmare to
learn and use. There has to be a language 'ethos' - i.e. an overall
approach it takes - and it has to follow consistent principles if it is
going to be a good design rather than a bad one.

I agree with your philosophy here. I disagree that my suggestions don't
fit with that - it's just that the "language ethos", as you call it
(it's as good a term as any other) is different from what you imagined.

...

BTW, any time one thinks of 'treating X separately' it's good to be
wary. Step-outs tend to make a language hard to learn and awkward to
use.

So do over-generalisations.

Not really.

Yes, really.

Imagine if you were to stop treating "letters", "digits" and
"punctuation" separately, and say "They are all just characters. Let's
treat them the same". Now people can name a function "123", or "2+2".
It's conceivable that you'd work out a grammar and parsing rules that
allow that (Forth, for example, has no problem with functions that are
named by digits. You can redefine "2" to mean "1" if you like). Do you
think that would make the language easier to learn and less awkward to use?

It's ad-hoc rules which become burdensome.

Agreed.

By contrast,
saying any operator can be 'adjacent' to any other as long as the types
are honoured makes learning a language more logical. It may give the programmer freedoms you personally don't like but they make the language easier to learn and use.

I don't see a need for operators like ++ and --, either as prefix or
postfix. I don't see a need for assignment, either simple or complex
(like "+="), to return a value - neither an lvalue nor an rvalue.

There's nothing arbitrary or ad-hoc about not having these in the
language. Lots of languages have nothing like that.

Seriously, try designing a language, yourself. You don't have to
implement it. Just try coming up with a cohesive design of something you would like to program in.

If I had the time... :-)

I fully appreciate that this is not an easy task.

One possibility is to distinguish between "functions" that have no
side effects and can therefore be freely mixed, re-arranged,
duplicated, omitted, etc., and "procedures" that have side-effects
and must be called exactly as requested in the code. Such
"procedures" would not be allowed in expressions - only as
statements or part of assignment statements.

Classifying functions by whether they have side effects or not is not
as clear-cut as it may at first appear. Please see the thread I
started today on functional programming.

You'll notice I've replied to it :-)

:-)

...

All that's required, compared with C, is for the apparent evaluation
order to be defined.

I can appreciate that you want to give a meaning to "++E",

No, I don't! You have this all wrong. The reason for considering the inclusion of the operators we have been discussing in this thread is to
allow a more natural style of expression for algorithms that it suits.
You seem to keep thinking the goal is to attribute meaning to symbols.
That's not so.

that you want to give a meaning to "E++", and you expect programmers
to use one or the other in different contexts. I can appreciate that
you want to define order of evaluation within expressions.

I don't /want/ to define the order of evaluation; I /do/ define the (apparent) order of evaluation. That's part of my language's ethos. If
I, in addition, permit ++ etc and dereference then their apparent order
/has/ to be defined, and it now has been.

But I have yet to see any indication that "++E++" could ever be a
sensible expression in any real code.

Bart came up with an example something like

+(+(+(+ x)))

That's not at all sensible. You want that banned, too?

Yes :-) Seriously, I appreciate that there will always be compromises -
trying to ban everything silly while allowing everything sensible would
mean countless ad-hoc rules, and you are right to reject that. I am
advocating drawing a line, just like you - the difference is merely a
matter of where to draw that line. I'd draw the line so that it throws
out the increment and decrement operators entirely. But if you really
wanted to keep them, I'd make them postfix only and as statements, not
in expressions - let "x++" mean "x += 1" which means "x = x + 1" which
should, IMHO, be a statement and not allowed inside an expression.

Of course, it might also be interesting to go in the other direction
entirely and be very flexible with operators - let users define their
own operators, using Unicode symbols, letters, and mixtures, both for
their own types and existing ones. If someone wants to write code that involves a lot of squaring, then let them define operators so they can
write "x = squareof y", or "x = y²". They'd be able to write more of a
mess, but also be able to write some things very nicely.

From Bart@21:1/5 to David Brown on Tue Nov 15 23:11:08 2022

On 15/11/2022 18:05, David Brown wrote:

On 15/11/2022 16:26, Bart wrote:

import struct # Standard module
bs = open("potato_c.cof").read()

machine, nsections, timestamp, symtaboffset, nsymbols, \
    optheadersize, characteristics = struct.unpack_from("<HHIIIHH")

That's it. Three lines. I would not think of C for this kind of
thing - Python is /much/ better suited.

I don't believe you. The C equivalent of the big struct below will
already exist; would you really waste time on composing a long string of
H and I characters, hoping you don't make a mistake, and then have to
spend time isolating the individual anonymous tuple elements by index?

There are 45 fields here; 61 if you split 'imagedir' into its two
components. A reminder: the code below is /already/ in my scripting
language.

type optionalheader=struct !exe/dll only
wt_word magic
byte majorlv
byte minorlv
wt_dword codesize
wt_dword idatasize
wt_dword zdatasize
wt_dword entrypoint
wt_dword codebase
word64 imagebase
wt_dword sectionalignment
wt_dword filealignment
wt_word majorosv
wt_word minorosv
wt_word majorimagev
wt_word minorimagev
wt_word majorssv
wt_word minorssv
wt_dword win32version
wt_dword imagesize
wt_dword headerssize
wt_dword checksum
wt_word subsystem
wt_word dllcharacteristics
word64 stackreserve
word64 stackcommit
word64 heapreserve
word64 heapcommit
wt_dword loaderflags
wt_dword rvadims
imagedir exporttable
imagedir importtable
imagedir resourcetable
imagedir exceptiontable
imagedir certtable
imagedir basereloctable
imagedir debug
imagedir architecture
imagedir globalptr
imagedir tlstable
imagedir loadconfigtable
imagedir boundimport
imagedir iat
imagedir delayimportdescr
imagedir clrheader
imagedir reserved
end

I said it will be clunky and require add-on modules and it is and does.

It is not "clunky" by any sane view - certainly not compared to your
code (or code written in C).

Most of my code is formatting the output. You access fields using coffptr.nsections for example.

And no, it does not require add-on modules

- the "struct" module is part of Python.

(BTW you might be missing an argument in that struct.unpack_from call.)

No, I am not. There is an optional third argument, but it is optional.

What about the second argument? I don't understand how the function call
knows to get the data from 'bs'.
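
For reference, in the standard library the signature is struct.unpack_from(format, buffer, offset=0): the buffer is required and only the offset is optional. A minimal sketch follows; the header bytes are synthetic values invented for the example rather than read from a real COFF file, and the names COFF_FORMAT, bs and missing_buffer_error are assumptions.

```python
import struct

COFF_FORMAT = "<HHIIIHH"   # the format string quoted above (20 bytes)

# Synthetic stand-in for the file contents; a real file would be opened
# in binary mode ("rb") so that read() returns bytes.
bs = struct.pack(COFF_FORMAT, 0x8664, 3, 0, 0x200, 12, 0, 0)

# Omitting the buffer is an error: only the offset argument is optional.
try:
    struct.unpack_from(COFF_FORMAT)
    missing_buffer_error = None
except TypeError as exc:
    missing_buffer_error = exc

# With the buffer supplied, the seven fields unpack as expected.
(machine, nsections, timestamp, symtaboffset, nsymbols,
 optheadersize, characteristics) = struct.unpack_from(COFF_FORMAT, bs, 0)

print(hex(machine), nsections, nsymbols)
```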

Using that approach for the nested structs and unions of my other
example is not so straightforward. You basically have to fight for
every field.

You have to define every field in every language, or define the ones you
want along with offsets to skip uninteresting data.

When properly supported, you can define the fields of a struct just as
you would in any static language (see above example), and you can write handling code just as conveniently.

You don't have to manually write strings of anonymous letter codes and
have to remember their ordering everywhere they are used. That is just
crass.

I went out of my way to add such facilities in my scripting language,
because I felt it was important. So you can code just as you would in a
static language but with the convenience of informal scripting.

Clearly you don't care for such things and prefer a hack.

The result is a tuple of unnamed fields. You really want a proper
record, which is yet another add-on, with a choice of several modules
depending on which set of characteristics you need.

You can do that in Python.
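
One way to get named fields with only the standard library is a ctypes structure. This is a sketch under assumptions, not a claim about what either poster had in mind: the class name CoffHeader is hypothetical, the field names follow the seven-field header quoted earlier, and the bytes are synthetic.

```python
import ctypes
import struct

class CoffHeader(ctypes.LittleEndianStructure):
    """Hypothetical record for the 20-byte COFF file header."""
    _fields_ = [
        ("machine",         ctypes.c_uint16),
        ("nsections",       ctypes.c_uint16),
        ("timestamp",       ctypes.c_uint32),
        ("symtaboffset",    ctypes.c_uint32),
        ("nsymbols",        ctypes.c_uint32),
        ("optheadersize",   ctypes.c_uint16),
        ("characteristics", ctypes.c_uint16),
    ]

# Synthetic header bytes standing in for the start of a real file.
raw = struct.pack("<HHIIIHH", 0x8664, 3, 0, 0x200, 12, 0, 0)

# Copy the bytes into a structure instance and access fields by name.
hdr = CoffHeader.from_buffer_copy(raw)
print(hex(hdr.machine), hdr.nsections, hdr.nsymbols)
```

The fields here happen to fall on their natural alignment, so no packing directive is needed; a layout with misaligned fields would need more care.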

Yeah, I know, you can do anything in Python, since there is an army of
people who will create the necessary add-on modules to create ugly and cumbersome bolted-on solutions.

I can list dozens of things that my scripting language does better than
Python. (Here, such a list exists: https://github.com/sal55/langs/blob/master/QLang/QBasics.md.)

In short, you are making up shit in an attempt to make your own language
look better than other languages, because you'd rather say something
silly than admit that any other language could be better in any way for
any task.

Not at all. Python is better for lots of things, mainly because there
are a million libraries that people have written for it, armies of
volunteers who have written suitable bindings or written all sorts of
shit. And there is a huge community and lots of resources to help out.

It is also full of as many advanced, esoteric features as you could
wish for.

But it is short of the more basic and primitive features of the kind I
use and find invaluable.

'struct' is also not a true Python module; it's a front end for an
internal one called `_struct`, likely implemented in C, and almost
certainly using pointers.

Please re-think what you wrote there. I hope you can realise how
ridiculous you are being.

Tell me. Maybe struct.py could be written in pure Python; I don't know.
I'm saying I guarantee mine would have the necessary features to do so.

But this started off being about pointers. Here's another challenge:
this program is for Windows, and displays the first 2 bytes of the
executable image of the interpreter, as loaded in memory:

println peek(0x400000, u16):"m"

fun peek(addr, t=byte) = makeref(addr, t)^

This displays 'MZ' (the signature of PE files on Windows). But of
interest is how Python would implement that peek() function.
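
One possible answer, sketched with the standard ctypes module. Reading a fixed address such as 0x400000 depends on the platform and on where the image happens to be loaded, so the sketch peeks at a buffer the program owns instead; the buffer contents and the names buf and addr are assumptions for the example.

```python
import ctypes
import sys

def peek(addr, ctype=ctypes.c_ubyte):
    """Read one value of the given ctypes type from a raw memory address."""
    return ctype.from_address(addr).value

# A buffer we own, so the read is safe; it starts with the 'MZ' signature.
buf = ctypes.create_string_buffer(b"MZ\x90\x00")
addr = ctypes.addressof(buf)

first = peek(addr)                      # default type: one byte
second = peek(addr + 1)
word = peek(addr, ctypes.c_uint16)      # 16-bit read in native byte order

print(bytes([first, second]).decode("ascii"))  # MZ
```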

From Bart@21:1/5 to David Brown on Tue Nov 15 23:30:25 2022

On 15/11/2022 21:40, David Brown wrote:

If someone wants to write code that
involves a lot of squaring, then let them define operators so they can
write "x = squareof y", or "x = y²". They'd be able to write more of a mess, but also be able to write some things very nicely.

I have such an operator, called `sqr`. And also briefly allowed the
superscript version (as a postfix op), until Unicode came along and
spoilt it all.

One reason I had sqr was because it was in Pascal (iirc). But it
genuinely comes in useful. Sure, I could also use x**2, but ** used to
be only defined for floats, while `sqr` has been used for much longer.

You could also ask why some languages have a dedicated `sqrt` function
when they could just as easily do x**0.5 or pow(x, 0.5).

From James Harris@21:1/5 to Dmitry A. Kazakov on Wed Nov 16 11:44:21 2022

On 15/11/2022 10:06, Dmitry A. Kazakov wrote:

On 2022-11-15 10:35, James Harris wrote:

As I've already said, Unicode and HTML are fine for output. Where
programmers work with the semantics of characters, however, they need
characters to be in semantic categories, you know: letters, arithmetic
symbols, digits, different cases, etc. So far I've not come across
anything to support that multilingually. AISI what's needed is a way
to expand character encodings to bit fields such as

   <category><base character><variant><diacritics><appearance>

where

   category = group (e.g. alphabetic letters, punctuation, etc)
   base character = main semantic identification (e.g. an 'a')
   variant (e.g. upper or lower case)
   diacritics (those applied to this character in this location)
   appearance (e.g. a round 'a' or a printer's 'a' or unspecified)

Note that that's purely about semantics; it doesn't include typefaces
or character sizes or bold or italic etc which are all for rendering.

I am not sure what you are trying to say.

I am suggesting that a modern language should define a multilingual
model for text /processing/. As part of that, programs need to work with different aspects of characters. Hence the bitfields, above.

The Unicode characterization
is defined in the file:

   https://unicode.org/Public/UNIDATA/UnicodeData.txt

Thanks for the link.
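
As an illustration of what that data already provides, Python's standard unicodedata module exposes the general category, name and decomposition recorded for each character. This is only a sketch of existing Unicode properties, not the bitfield model proposed above; the helper name describe is an assumption.

```python
import unicodedata

def describe(ch):
    """Return a few Unicode properties of a single character."""
    return {
        "category": unicodedata.category(ch),            # e.g. 'Ll', 'Lu', 'Nd'
        "name": unicodedata.name(ch),
        "decomposition": unicodedata.decomposition(ch),  # '' if none
    }

for ch in ["a", "A", "7", "+", "\u00e9"]:
    print(repr(ch), describe(ch))

# NFD normalisation separates a base character from its diacritics.
base_and_marks = unicodedata.normalize("NFD", "\u00e9")
print([hex(ord(c)) for c in base_and_marks])  # ['0x65', '0x301']
```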

...

There is no problem with Unicode string literals whatsoever. You just
place characters as they are. The only escape is "" for ". That is all.

Two problems with that, AISI:

1. some of the characters look like others

2. discussing unrecognised characters with someone else!

--
James Harris

From Dmitry A. Kazakov@21:1/5 to James Harris on Wed Nov 16 13:02:43 2022

On 2022-11-16 12:44, James Harris wrote:

On 15/11/2022 10:06, Dmitry A. Kazakov wrote:

There is no problem with Unicode string literals whatsoever. You just
place characters as they are. The only escape is "" for ". That is all.

Two problems with that, AISI:

1. some of the characters look like others

For the reader, not for the compiler. If you want Unicode you get the
whole package, homoglyphs included.
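
A short sketch of the point in Python: two characters that usually render identically are still distinct code points as far as the program is concerned. The variable names are assumptions.

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, typically drawn the same as Latin 'a'

# Identical to the eye in most fonts, different to the machine.
print(latin_a == cyrillic_a)          # False
print(unicodedata.name(latin_a))      # LATIN SMALL LETTER A
print(unicodedata.name(cyrillic_a))   # CYRILLIC SMALL LETTER A
```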

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From David Brown@21:1/5 to Bart on Wed Nov 16 17:02:43 2022

On 16/11/2022 00:30, Bart wrote:

On 15/11/2022 21:40, David Brown wrote:

If someone wants to write code that involves a lot of squaring, then
let them define operators so they can write "x = squareof y", or "x =
y²". They'd be able to write more of a mess, but also be able to
write some things very nicely.

I have such an operator, called `sqr`. And also briefly allowed the superscript version (as a postfix op), until Unicode came along and
spoilt it all.

Why would Unicode spoil it?

One reason I had sqr was because it was in Pascal (iirc). But it
genuinely comes in useful. Sure, I could also use x**2, but ** used to
be only defined for floats, while `sqr` has been used for much longer.

You could also ask why some languages have a dedicated `sqrt` function
when they could just as easily do x**0.5 or pow(x, 0.5).

It's more common to see such functions in a language's library than part
of the core language. You'd have "sqrt" as a function because it is far
and away the most common use of raising something to a fractional power,
much more familiar from school mathematics, and much more efficient to implement than a general power function.


From David Brown@21:1/5 to Bart on Wed Nov 16 17:50:56 2022

On 15/11/2022 20:22, Bart wrote:

On 15/11/2022 18:05, David Brown wrote:

On 15/11/2022 16:26, Bart wrote:

I group them all together: ML, OCaml, Haskell, F#, sometimes even Lisp.

Anything that makes a big deal out of closures, continuations,
currying, lambdas and higher order functions. I have little use for
such things otherwise.

So because /you/ don't understand these things or how they are used,
you assume that people who /do/ understand them can't write programs
in functional programming languages?

No. It's nothing I've ever used, and unlikely ever to use. I like my functions plain, and static, just like you prefer your expressions to
use only simple operators with no side-effects.

Of course you won't use something when you won't even consider trying to
learn about it.

I still can't comprehend why YOU think this stuff is simple and obvious,
yet you are stumped by an increment of a pointer followed by a dereference.

I haven't written anything to suggest that I am "stumped" by this. My
point was to say it is unnecessary to support such expressions in a
programming language, and a language may be better in some ways if it
does not allow increment operators or even pointers.

Now, it is undeniable /fact/ that programming languages do not need
operators such as increment, or other operators that cause side-effects.
It is undeniable fact that programming languages do not need pointers.
There are countless languages that have neither of these things, and
they have been used successfully for all sorts of purposes. You don't
even have to look at functional programming languages or other "it's all
too complicated for me" languages - neither Python nor Pascal has
increment operators, Python has no pointers. BASIC has been a
phenomenally successful language, especially for people who like simple languages, and has neither.

It is also objective fact that not having these features has advantages
and disadvantages (I've gone through these enough already).

Whether a given language is /better overall/ with or without these
features - that is an entirely different question. It is very
subjective, and depends highly on the kind of language (its "ethos", as
James puts it), the kind of use it will see, and the kind of people who
will use it. That's why I raise suggestions here, rather than giving
recommendations. (On other topics, such as James' - forgive the strong
language - insane ideas about character encodings, I have given
recommendations.)

On a list of must-haves for a programming language, not only would they
not be at the top of my list, they wouldn't even be at the bottom!

Yes, but for you, a "must-have" list for a programming language would be
mainly "must be roughly like ancient style C in functionality, but with
enough change in syntax and appearance so that no one will think it is
C". If that's what you like, and what pays for your daily bread, then
that's absolutely fine.

And there's no doubt that a large proportion of programmers go through
their career without ever considering higher order functions (functions
that operate on or return functions).

But equally there's no doubt that they /are/ useful for many people in
many types of coding. Sometimes higher order functions are used without
people knowing about them - Python decorators are a fine example.

Actually, Python declarators are such a good example that I recommend
this link <https://realpython.com/primer-on-python-decorators/> that
gives a number of useful examples.

Think of this example. You have some code with functions "foo", "bar"
and "foobar". Mostly you call them as they are in your code.

But sometimes you want to "decorate" the calls for ease of debugging -
you want to print out "Starting foo with parameters..." at the start,
and "Returning from foo with result..." at the end. You don't want to
change "foo" itself, because you only want this tracing sometimes.

So you write :

int debug_foo(int x, double y, char * z) {
    printf("Starting foo with parameters %i %f %p\n",
           x, y, z);
    int r = foo(x, y, z);
    printf("Returning from foo with result %i\n", r);
    return r;
}

That's fine - now you call "debug_foo" instead of "foo" in the cases you
want. You can even use "#define foo debug_foo" to make it active
throughout the rest of the file.

Then you need to do it again for "bar".

Then you need to do it again for "foobar".

Then you decide to add timings to the traces, and have to re-do all
three debug functions.

Then you need to copy it again for "barfoo", and debug the typo you made
in "debug_foobar".

You start thinking, "I know macros are evil, but maybe they can be used
to automate this somehow?". You get an ugly but workable solution
letting you write:

MAKE_DEBUG_FUNC(debug_foo, foo);
MAKE_DEBUG_FUNC(debug_bar, bar);

Then you realise that it doesn't work for "foobar", because that only
takes two parameters. And now you have two macros "MAKE_DEBUG_FUNC_idp"
and "MAKE_DEBUG_FUNC_id".

Then you start wondering if you can make a macro that makes the macros,
or if you should throw the computer out the window.

Alternatively, you could make a higher order function in a language that
supports it. This is in C++20 (it's slightly neater than earlier C++
versions can do). I'm not claiming that this is at all clear or obvious
to people with experience only in imperative languages - understanding
how to make something like this takes effort and practice. And the
syntax is C++ style, which will be unfamiliar to people used to
functional programming languages. But I wanted to write it out, as a
working function.

#include <iostream>

auto debug(auto const& f) {
    return [&f](auto... args) {
        std::cout << "Calling ";
        ((std::cout << " " << args), ...);
        std::cout << "\n";
        auto r = f(args...);
        std::cout << "Returning " << r << "\n";
        return r;
    };
}

Suppose your real functions are :

int foo(int x);
int bar(int x, double y);
double foobar(int x, double y, const char * p);

Your original code was:

int a = foo(10);
int b = bar(20, 3.14);
double c = foobar(30, 2.71828, "Hello");

Your debug version is :

int a = debug(foo)(10);
int b = debug(bar)(20, 3.14);
double c = debug(foobar)(30, 2.71828, "Hello");

Note that the "debug" function is applied to "foo", and returns a
function that is then called with "10" as the parameter.

You can also write (at file scope, or inside a function) :

auto debug_foo = debug(foo);

and use :

int a = debug_foo(10);

None of this gives you things you could not do by hand. But if you find yourself doing the same thing by hand many times, then it is natural to
ask if it can be automated - if you can write a function to do that.
You can, if you have higher order functions.
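
For comparison, here is roughly how the same idea might look in Python, where it is the mechanism behind decorators. This is only a sketch and not code from this thread: `foo` and `bar` are placeholder functions, and the trace lines are collected in a list called `log` (an assumption made here so the result is easy to inspect) rather than printed.

```python
import functools

log = []  # trace lines are collected here (assumption; printing would also do)

def debug(f):
    """Return a new function that traces calls to f, then forwards to f."""
    @functools.wraps(f)          # keep f's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        log.append(f"Calling {f.__name__} with {args} {kwargs}")
        r = f(*args, **kwargs)
        log.append(f"Returning {r!r} from {f.__name__}")
        return r
    return wrapper

def foo(x):                      # placeholder function
    return x + 3

def bar(x, y=1.0):               # placeholder function with a default argument
    return x * y

a = debug(foo)(10)               # wrap at the call site, as in the C++ version
bar = debug(bar)                 # or rebind the name once
b = bar(20, y=3.0)               # later calls are written exactly as before
```

Rebinding `bar` is the step that writing `@debug` above a definition performs, so existing call sites, keyword arguments included, need no change.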

(C++ is not perfect here by any means, and more could be added to the
language. For example, Python lets you make functions that take classes
as parameters or return classes. C++ does not (yet) have that level of metaprogramming.)

Haskell is great for elegantly defining certain kinds of types and
algorithms, not so good for reams of boilerplate code or UI stuff
which is what much of programming is.

As I mentioned in another post, there are opinions, and there are
/qualified/ opinions.

My opinion comes from 20 years of writing code to /get things done/ in a
working environment. Which includes developing the languages and
choosing the features that best made that possible. Never once did
I think that 'currying' was going to dramatically transform how I coded;
never did I spend days working around the omissions of closures.

That's like saying you have 20 years of experience as a taxi driver, and
never once had to use "flaps" or "ailerons", or even think about the
concept. You therefore can't understand why pilots want to use them all
the time. You can give a qualified opinion on driving round roundabouts
and may be an expert on gearing, but you have no basis for a qualified
opinion on flying.

So again - mocking and dismissing concepts that you know nothing about,
makes you look foolish. (Your ignorance of the topic is not the issue -
we are all ignorant of almost everything.)

Such features have some very subtle behaviours which I find
incredibly hard to get my head around.

You are a smart guy. You could get your head around it quite easily,
if only you were willing.

No, it is hard, obscure, subtle. Take my word for it.

No, I will not take your word for it. You know nothing about it.

Try pasting the C++ code into godbolt.org, and compile with -O2 :

The compiler happily turns "foo" into "return 13".

So what does that mean? C++ clearly has support for this, and some
optimisations which can collapse the function calls into a constant
value. That tells us nothing about how hard the task is or how hard it
is to understand exactly why the task is more difficult than it at first
looks.

True. I just wanted to show that an implementation can give efficient
results from higher order functions.

If you want a blow-by-blow explanation of the code, I can give it - but
only if you want it.

I discussed on Reddit how such a thing would look in my dynamic language
if I decided to implement it, which is like this:

    fun twice(f) = {x:f(f(x))}       # {...} is my deferred code syntax
    fun plusthree(x) = x+3           # 'fun' is for one-liners

    g := twice(plusthree)
    println g(7)

That looks okay to me (though it would look a /lot/ better - IMHO - if
you used spaces more often. It would also stop newsreaders trying to
turn your code into smilies :-) ).

I instead tried with a mock-up, which had two components: the
transformations my bytecode compiler would do, and the support code that would need to be supplied by the language. The mock-up within the
working language looked like this:

    # Transformed user code
    fun af$1(x, f, g) = f(f(x))
    fun twice(f) = makecls(af$1, f)
    fun plusthree(x) = x+3

    g := twice(plusthree)
    println callcls(g, 7)

    # Emulating interpreter support
    record cls = (var fn, a, b)

    func makecls(f, ?a, ?b)=
        cls(f, a, b)
    end

    func callcls(c, x)=
        fn := c.fn
        fn(x, c.a, c.b)
    end

This produced the correct result. Enough worked also so that `twice`
could be called again with a different argument, while the original `g`
still worked. (A cruder implementation could hardcode things so that,
while it produced '13', it would only work with a one-time argument to `twice`.)
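
The transformation described here (lifting the deferred code into an ordinary function and pairing it with a record of captured values) can be sketched in Python next to the built-in closure it emulates. This is an illustrative rendering, not the original mock-up: `Cls`, `af1`, `twice_explicit` and `callcls` are names chosen here to mirror the ones above.

```python
# Built-in closure: the lambda captures f from the enclosing call.
def twice(f):
    return lambda x: f(f(x))

def plusthree(x):
    return x + 3

g = twice(plusthree)

# Hand-made equivalent: the captured value lives in a record next to
# a plain function that receives it as an extra parameter.
class Cls:
    def __init__(self, fn, a=None):
        self.fn = fn      # the lifted function
        self.a = a        # the captured value

def af1(x, f):            # body of the deferred code, with f made explicit
    return f(f(x))

def twice_explicit(f):
    return Cls(af1, f)

def callcls(c, x):
    return c.fn(x, c.a)

g2 = twice_explicit(plusthree)
h2 = twice_explicit(lambda x: x * 2)   # a second closure; g2 is unaffected
```

Because each call to `twice_explicit` builds its own record, `g2` keeps working after `h2` is created, which is the property the cruder hardcoded version would lack.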

One thing stands out here - it looks like you are trying to make "twice"
into a stand-alone, run-time function. That can be done in interpreted languages, but in compiled languages it is, at best, inefficient. The
normal method would be to consider "twice" as a compile-time
metafunction, so that when the compiler sees "twice(plusthree)" it
generates an anonymous function { x : plusthree(plusthree(x)) }, which
can be compiled normally. "g" is effectively a function pointer
initialised to this anonymous function.

This is where it turned out that there were further refinements needed
to make it work with more challenging examples.

In the end I didn't do the necessary changes as, while intriguing to
work on, mere box-ticking was not a worthwhile use of my time, nor a worthwhile complication in my product, since I was never going to use it.

It's not academics that use these features - it is practical
programmers. A big use of lambdas, for example, is in callbacks and
event handlers - used all the time in GUI programs and Javascript.

Yes, that was my deferred code feature, itself deferred. (It means I
have to instead define an explicit, named function.)

That's the impression I got. I don't know how you handle captures of
local variables (if you do so at all).
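
To make "capture of local variables" concrete, here is a small Python sketch; `make_counter` and `step` are illustrative names, not taken from anyone's code in this thread. The inner function keeps using a local variable of the enclosing function after that function has returned.

```python
def make_counter(start):
    count = start            # a local variable of the enclosing function

    def step():
        nonlocal count       # refer to the captured local, not a new one
        count += 1
        return count

    return step              # count stays alive as long as step does

c1 = make_counter(0)
c2 = make_counter(100)       # each call gets its own captured count
first = c1()
second = c1()
other = c2()
```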

In some languages, such as Lua, /all/ functions are anonymous
(lambdas). When you write "function foo(x) ..." in Lua, it is
syntactic sugar for "foo = function (x) ...".

Academics may invent this kind of thing, and use languages like
Haskell to play with them - but they are implemented in real languages
because real programmers use them for real code. Rust, Go, C++ -
these are not academics' languages.

I find idiomatic Rust incomprehensible.

I haven't tried Rust enough to comment. I'd want a minimum of a
dedicated long weekend learning and trying it before I could say if I
thought it was going to work out for me or not.

In Python, every function is really a variable initialised
[effectively] to some anonymous function. Which means that with 100%
of the functions, you can do this for any defined function F:

     F = 42

Or, more subtly, setting it to any arbitrary functions. That sounds
incredibly unsafe.

Python /is/ unsafe - it's a very dynamic language, with little
compile-time checking. It is checked at run-time.

And my dynamic language is a lot less dynamic, so is safer, but of
course Python is superior.

Let's just say that the ecosystem for Python is large enough to call it
a successful language. Its choices of trade-offs between flexibility
and static error checking seem to be fine for many uses. (Python is
"safe" in other ways, of course - but the ability to re-bind names to
different objects gives an easy way to make mistakes that are not found
until run-time testing.)

But no, Python does not have variables at all. It has /names/, that
are references to objects. A function is an object, usually (but not
necessarily) given a name with a "def" statement. That name can be
rebound to a different object, just like any other name.

So Python has immutable tuples, but mutable functions! Every
identifier is a variable that you can rebind to something else.

Functions are not mutable in Python. You misunderstand.

I've used 'mutable' to mean two things: in-place modification of an
object, and being able to re-bind a name to something else. These are conflated everywhere, but I reckoned people would get the point.

They are not conflated by /me/ - they are totally different concepts.
If you choose to refer to yourself as "Bartholomew" rather than "Bart",
it is quite different from turning yourself into an octopus.

But I'll assume you are just mixing up terms, rather than
misunderstanding fundamental concepts of Python. (And again, ignorance
is not a problem - it's possible to do a lot of practical programming in
Python without really understanding that it does not have variables.)
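
The two meanings can be told apart in a few lines of Python. This is a minimal sketch with throwaway names: modifying an object in place is visible through every name bound to it, whereas rebinding a name leaves the old object alone.

```python
a = [1, 2, 3]
b = a                 # b is bound to the same list object as a
b.append(4)           # in-place modification: visible through both names
same_after_mutation = a is b

b = [9, 9]            # rebinding: b now names a different object, a is untouched

t = (1, 2, 3)
try:
    t[0] = 0          # tuples are immutable: in-place modification fails
    tuple_mutated = True
except TypeError:
    tuple_mutated = False
t = (0, 2, 3)         # but the name t can still be rebound
```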

In Python, a function like this:

    def F():
        pass

is more or less equivalent to:

   def __0001():
       pass
   F = __0001

If you like, yes.

Effectively any function is just a variable to which has been assigned
some anonymous function (although in practice, the function retains its
'F' identity even if the user's 'F' variable has been assigned a
different value).
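
A short Python sketch of that last point, using the names F and G from this discussion: the function object remembers the name it was defined with, whatever the name F is later bound to.

```python
def F():
    return "original"

G = F                 # a second name bound to the same function object
F = 42                # rebind F; the function object itself is unchanged

name = G.__name__     # the object still records the name it was defined with
result = G()
```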

Python does not have variables. It has /identifiers/. Change
"variable" for "identifier" in your description, and "assigned" to
"bound", and you've got it right.

The end result is the same: you can never be sure that 'F' still refers
to that static function.

99.99% of the time you never want such functions to change, so why make
it possible? I can understand that in Python, a bytecode compiler might
not know in advance what F is, but that can be mitigated.

Python is flexible that way - identifiers can be bound and rebound to
almost anything.

As is apparent from my posts, I usually prefer a stricter language.
Python makes it easy to write powerful code in relatively few lines. It
also makes it easy to make mistakes that would be caught by a compiler
of a stricter language. That's the balance it picks.

When I once experimented with such a language, any such tentative
functions were initialised at runtime, but once initialised, could not
be changed. So whether an identifier was the name of a function, module, class or variable, was set at runtime, then fixed.

If you want it to be dynamic, then use a 'variable' (the clue is in the name).

With my scripting language, you can do that with exactly 0% of the
defined functions. If you want mutable function references, /then/
you use a variable: G := F.

Function pointers are not function objects.

Is there any practical difference?

Yes, but there is a good bit of overlap in how you can use them. And if
there is no way in a language to manipulate functions, then function
pointers are all you need.

You don't believe amateurs can add value to what mainstream languages
can do. The trouble is that you don't appreciate the things that add
value, so long as there is some unreadable, unsafe way to hack your
way around a task.

Amateurs can make good things. But the idea that a single amateur can
revolutionise a particular field is almost, but not quite, a complete
myth.

I don't want to revolutionise everything. I just hope someone else
would, but the current state of PL design to me looks dire.

However I can take my ideas and use them myself, and sod everyone else;
it's their loss.

You seem convinced that that incredibly hackish and unprofessional way
of accessing the contents of that executable file is just as good as
doing it properly. Well carry on thinking that if you want.

And you seem convinced that the Python code I showed is "hackish" and "unprofessional". I honestly have no idea why anyone might think that,
even if that person had never heard of programming other than in C or C variants.

(I don't know what it is about scripting languages, and the way they
eschew a feature as straightforward as a record with fields defined at compile-time. Either it doesn't exist, or they try and emulate such a
thing badly and inefficiently.)

The code works fine - it is clear and simple, shorter than in your
language, and easy to modify and maintain.

If you prefer to think of structures matching C struct definitions
(which are /one/ way to describe a file format, but certainly not the
only way), you can use the "ctypes" Python module and define a structure.
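
As a rough comparison of the two approaches, the sketch below reads the same 24 bytes both ways. The header layout and every field name in it are invented for illustration; they are not the file format discussed earlier.

```python
import ctypes
import struct

# Hypothetical 24-byte little-endian header, built here just to have data.
raw = struct.pack("<IIHHIII", 1, 2, 3, 4, 5, 6, 7)

# Format-string approach: field order is implied by character position.
fields = struct.unpack("<IIHHIII", raw)

# ctypes approach: each field is named next to its type. No packing
# directive is needed because every field here is naturally aligned.
class Header(ctypes.LittleEndianStructure):
    _fields_ = [
        ("magic",    ctypes.c_uint32),
        ("size",     ctypes.c_uint32),
        ("version",  ctypes.c_uint16),
        ("flags",    ctypes.c_uint16),
        ("offset",   ctypes.c_uint32),
        ("count",    ctypes.c_uint32),
        ("checksum", ctypes.c_uint32),
    ]

hdr = Header.from_buffer_copy(raw)
```

With the structure form, swapping two fields means swapping two lines of `_fields_`, and code that reads `hdr.version` does not change.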

This is exactly why there can still be a place for a language like C,
but tidied up and brought up-to-date without all its baggage. People
want a language they feel they know inside-out, and can be made to do
anything; they want to feel in charge and confident.

Where are all these people that want something almost, but not quite,
exactly like C?

There are loads that want to extend C, or import some favourite feature
of C++ into C, or some that want to write Python-like code in C. This is
apart from the ones implementing yet another new take on a
functional language.

Relatively speaking - compared to the people that just use C - there are
very few. There are some, but the "extended C" languages rarely have
any relevance or popularity (barring some extensions of major compiler
suppliers). The languages that are successful are those that do
something /significantly/ different - enough of a difference to make it
worth leaving behind the tools, the experience, the colleagues, the code
from their old choice of language. It's a high bar, and not one
achieved by saying "I think "int * p;" is backwards - I'll make an
alternative to C that has "ref int p" instead".

What I'm talking about however is the popularity of C; why would they
use C, rather than the next one up which is C++?

To me the answer is clear, I guess to you it's less so.

There are /many/ reasons. Unwillingness to learn something new that
takes a significant effort - your /real/ reason - is certainly one of
them. But it is not the only one.

But what if you don't care that C++ needs a complex compiler, because
you are not a compiler writer? That applies to 99.999% of programmers.

I think people care when their project requires a long edit-run cycle.

Yes. But as long as it is fast enough, they don't care if it is faster.
And they'd rather have good than fast - I'd rather have a 10 minute
compile time that found my bugs then, than a 10 second compile time and
only see the bugs after 10 hours of run-time testing. Of course, I'd
rather have a 10 second compile time. 1 second would be nice, but not
more useful. Any speedup beyond 1 second is irrelevant.

It IS hard. Did you miss the bit where I said that expressions and
statements are interchangeable in an expression-based language? So
you HAVE to allow anything within those square brackets.

You and James are forever seeing problems - you think you are /forced/
into decisions. Take responsibility - you don't /have/ to allow
anything you don't want to allow.

It's you who don't like it, not me! As I've tried to explain, in an expression-based language, you can have statements inside expressions
inside statements. Everything can have a side-effect.

Again - you have the feature because you /choose/ to have it. That's
fine - if that's what you want, great. I've no problem with that. But
I /do/ have a problem when you (or James) say you /have/ to allow this,
or are /forced/ to do that, or can't prohibit something else.

Even gnu-C has that feature.

At this rate there won't be anything left! Why won't we all just code
in lambda calculus as that can apparently represent any program.

Why don't you just make a Turing machine? It is an imperative
language, and it's really quite simple.

You've missed my point. It's not me reducing everything down to a handful
of features. Lambda-calculus is where you can easily end up.

No one is reducing everything down to a handful of features. I am
suggesting - not recommending - that it is possible to remove the
feature of allowing side-effects in expressions, and gaining the
features of clearer code that is easier to optimise and easier to
confirm correctness.


From Bart@21:1/5 to David Brown on Wed Nov 16 20:04:01 2022

On 16/11/2022 16:50, David Brown wrote:

On 15/11/2022 20:22, Bart wrote:

Of course you won't use something when you won't even consider trying to learn about it.

I've thought about learning Chinese. Then I decided there was no point.

I still can't comprehend why YOU think this stuff is simple and
obvious, yet you are stumped by an increment of a pointer followed by
a dereference.

I haven't written anything to suggest that I am "stumped" by this. My
point was to say it is unnecessary to support such expressions in a programming language, and a language may be better in some ways if it
does not allow increment operators or even pointers.

It is something I value, but you don't. And higher order functions are something you value, but I don't.

That's all it is.

Now, it is undeniable /fact/ that programming languages do not need
operators such as increment, or other operators that cause side-effects.

Just `a := b` causes a side effect. Possibly quite a big one if 'b' is a substantial data structure and ':=' does a deep copy.

There is usually a task to be done. In `A[++i] := 0`, I want two things
to change, which is going to happen whether I write it like that, or as `i
:= i+1; A[i] := 0`. So why write `i` 3 times?
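
For what it's worth, the same trade-off can be seen in Python, which has no `++` but does have an assignment expression (Python 3.8 and later). The sketch below shows both spellings; the list and index names are arbitrary.

```python
A = [1, 1, 1, 1]

# Two statements: the index name is written three times.
i = 0
i = i + 1
A[i] = 0

# One statement: the index is updated inside the subscript.
j = 1
A[(j := j + 1)] = 0
```

Both forms change two things, the index and the array element; the difference is only in how visibly the index update is written.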

It is not a big deal. Maybe in functional programming it might be, but
here *I* am specifying the paradigm and I say it's OK.

I'm not asking you or anyone else to use my language.

Yes, but for you, a "must-have" list for a programming language would be mainly "must be roughly like ancient style C in functionality, but with enough change in syntax and appearance so that no one will think it is
C". If that's what you like, and what pays for your daily bread, then that's absolutely fine.

Yes, I don't need a higher level language for what I use it for. But
there are still dozens of things which make the experience superior to
just using C. Ones either you genuinely don't appreciate, or are just
pissing on for the sake of it.

* Case-insensitive
* 1-based and N-based
* Algol-style syntax, line-oriented and largely semicolon-free, and sane
type syntax
* Module scheme (define everything in exactly one place)
* Namespaces
* Encapsulation (define functions inside records etc)
* Out-of-order definitions including easy mutual record cross-references
* Regular i8-i64 and u8-u64 type denotations, including 'byte' (u8)
* Default 64-bit 'int', 'word' types, and 64-bit integer constants
* Built-in print and read statements WITH NO STUPID FORMAT CODES
* Keyword and default function parameters
* Fewer, more intuitive operator precedences
* Does not conflate arrays and pointers
* 'Proper' for loops; for-in loops
* Separate 'switch' and 'case' selection; the latter has no restrictions
(and no stupid fallthrough on switch)
* Proper named constants
* Break out of nested loops
* Embed strings and binary files
* 'Tabledata' and 'enumdata' features (compare with X-macros)
* Function reflection
* Built-in, overloaded ops like abs, min, max
* 'Properties' such as .len and .lwb
* Built-in 'swap'
* Bit/field extraction/insertion syntax
* Multiple function return values
* Multiple assignment
* Slices (including slices of char arrays to give counted strings)
* Doc strings
* Whole-program compiler that does not need a separate build system
* Pass-by-reference
* Value arrays

Yeah, just like C! If you think this lot is just C with a paint-job,
then you're in denial.

Of course, I fully expect you to be completely dismissive of all of
this. I wouldn't swap any of these for higher-order functions.

And there's no doubt that a large proportion of programmers go through
their career without ever considering higher order functions (functions
that operate on or return functions).

Too right. To be able to use such things, they MUST be 100% intuitive
and be usable with 100% confidence. But that's just the author; you need
to consider other readers of your code too, and those who have to
maintain it.

To me they are a very long way from being 100% intuitive. So what do you
think I should do: strive to be a 10th-rate programmer in a functional
language I've no clue about; give up programming and tend to my garden;
or carry on coding in a style that *I* understand 100% (and most others
will too)?

The stuff I do simply doesn't require a sophisticated language with
advanced types and curried functions invented on-the-fly. Here is an
actual example from an old app, a small function to keep it short:

proc displaypoletotal =
    if not poleenabled then return fi
    print @poledev, chr(31), chr(14) ! clear display
    print @poledev, "Total:", rightstr(strcash(total, paymentunit), 14)
end

(This is part of a POS and displays running totals, on an LED display
mounted on a pole, driven from a serial port. It ran in a duty-free area
and worked with multiple currencies.)

What can higher-order-functions do for me here? Absolutely sod-all.

But equally there's no doubt that they /are/ useful for many people in
many types of coding. Sometimes higher order functions are used without people knowing about them - Python decorators are a fine example.

Actually, Python declarators are such a good example that I recommend

Decorators?

this link <https://realpython.com/primer-on-python-decorators/> that
gives a number of useful examples.

Decorators are a /very/ good example of a Python feature that I could
never get my head around. 5 minutes later, I'd have to look them up again.

Think of this example. You have some code with functions "foo", "bar"
and "foobar". Mostly you call them as they are in your code.

auto debug(auto const& f) {
    return [&f](auto... args) {
        std::cout << "Calling ";
        ((std::cout << " " << args), ...);
        std::cout << "\n";
        auto r = f(args...);
        std::cout << "Returning " << r << "\n";
        return r;
    };
}

Suppose your real functions are :

    int foo(int x);
    int bar(int x, double y);
    double foobar(int x, double y, const char * p);

Your original code was:

    int a = foo(10);
    int b = bar(20, 3.14);
    double c = foobar(30, 2.71828, "Hello");

None of this gives you things you could not do by hand. But if you find yourself doing the same thing by hand many times, then it is natural to
ask if it can be automated - if you can write a function to do that. You
can, if you have higher order functions.

I can't follow the C++ debug function at all. But I notice the user code changes from 'foo()' to 'debug()()'; I thought this could be done while
leaving the foo() call unchanged.

But no, my language doesn't deal with parameter lists as a first class
entity at all. (At best it can access them as a list object, but it
doesn't help here.)

The best I can do here is to have a dedicated function for each number
of arguments, and to use dynamic code to allow the same function for any
types:

func debug3(f, a,b,c)=
    println "Calling",f,"with",a,b,c
    f(a,b,c)
end

func foobar(a,b,c)=
    println "FooBar",a,b,c
    return a+b+c
end

x:=debug3(foobar, 5,6,7) # in place of foobar(5, 6, 7)

println x

This displays:

Calling <procid:"foobar"> with 5 6 7
FooBar 5 6 7
18

However this loses the ability to use any keyword or default arguments
for FooBar, since they are only available for direct calls (it's done at
compile-time).

So I can see that that C++ debug does some very hairy stuff, to make it
work with static types and for any function, but I just can't understand it.

However, given the requirement you outlined, I could probably come up
with a custom feature to do just that. Although it might be in the form
of a compiler option which injects the debug code at the start of the
relevant functions. Then the user code does not need updating.

See, when you have control of the language and implementation, there are
more and better possibilities.

That's like saying you have 20 years of experience as a taxi driver, and never once had to use "flaps" or "ailerons", or even think about the concept. You therefore can't understand why pilots want to use them all
the time. You can give a qualified opinion on driving round roundabouts
and may be an expert on gearing, but you have no basis for a qualified opinion on flying.

I don't want to fly. (I was once in a small aircraft flying at 7000 ft.
But I've also ridden a bike at 8000 ft, although over a mountain in that
case. So who needs to fly?!)

So again - mocking and dismissing concepts that you know nothing about,
makes you look foolish. (Your ignorance of the topic is not the issue -
we are all ignorant of almost everything.)

Have I ever called you ignorant? I don't care about these concepts; they
are not for me. But I appreciate lots of things you don't care for.

Look at this code; it is a silly task, but concentrate on the bit that
does the input:

real a,b,c

print "Three numbers: "
readln a, b, c

println "Their sum is:", a+b+c

The spec is that the three numbers are read /from the same line/, and
can be separated with commas or spaces.

Try to do that `readln` part in Python, and just as simply. Even in C
it's an ordeal.
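
For reference, one way the same spec (three numbers on one line, separated by commas or spaces) might be met in Python is sketched below. The helper names `read_numbers` and `sum_line` are made up here, and the sketch does not claim to match `readln` in other respects, such as how missing or malformed values are handled.

```python
import re

def read_numbers(line, count=3):
    """Parse `count` numbers from one line, separated by commas and/or spaces."""
    parts = [p for p in re.split(r"[,\s]+", line.strip()) if p]
    if len(parts) != count:
        raise ValueError(f"expected {count} numbers, got {len(parts)}")
    return [float(p) for p in parts]

def sum_line(line):
    a, b, c = read_numbers(line)
    return a + b + c
```

Interactive use would be along the lines of `print("Their sum is:", sum_line(input("Three numbers: ")))`.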

(My code actually works on either of my languages, static or dynamic.
That's a bonus feature. Imagine a solution in Python or C that works
with both languages.)

No, it is hard, obscure, subtle. Take my word for it.

No, I will not take your word for it. You know nothing about it.

I implemented it, remember? Even if it was a mock-up to see if a
proposed built-in approach would work.

Yes, that was my deferred code feature, itself deferred. (It means I
have to instead define an explicit, named function.)

That's the impression I got. I don't know how you handle captures of
local variables (if you do so at all).

When I had local functions for a while, they could access static
variables, user types, named constants, macros, enums and other local
functions within a containing function. Plus of course anything defined globally. But not parameters and stack-frame variables of the enclosing functions.

Quite a lot could actually be done that way. So it could with my
deferred code objects.

Effectively any function is just a variable to which has been assigned
some anonymous function (although in practice, the function retains
its 'F' identity even if the user's 'F' variable has been assigned a
different value).

Python does not have variables. It has /identifiers/. Change
"variable" for "identifier" in your description, and "assigned" to
"bound", and you've got it right.

Just call them variables that work in a particular way: they are
references to objects, but can never be references to other variables.

When you assign a value, you are copying a reference.
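
A small Python sketch of that behaviour, with made-up names (F, G, names, alias), may help. Assignment copies a reference to an object, and a function keeps its own name even after the variable that originally held it is rebound.

```python
def F():
    return "original F"

G = F              # copy the reference; no new function object is created
F = None           # rebind the name F to a different object

names = [1, 2, 3]
alias = names      # both names now refer to the same list object
alias.append(4)    # a change made through one name is visible through the other
alias = [0]        # rebinding alias does not touch the object names refers to

print(G(), G.__name__, names, alias)
```

Note that `alias` never became a reference to the variable `names`; it only ever referred to the list object.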

And you seem convinced that the Python code I showed is "hackish" and "unprofessional".

Defining a struct's layout as "IIHHIII" or whatever? Yeah, that's really professional!

The code works fine - it is clear and simple, shorter than in your
language, and easy to modify and maintain.

Really? The struct changes: two fields are swapped. You have to count
along the format string to work out which of those characters need to be
exchanged. And that multiple assignment needs to be revised too. It's a
bit hit and miss.
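
The Python code being argued about is not reproduced in this part of the thread, so the following is only a sketch of the general technique, using the standard struct module. The field names and values are hypothetical; only the "IIHHIII" layout comes from the discussion.

```python
import struct
from collections import namedtuple

# "<" means little-endian with no padding; I = uint32, H = uint16.
# The format string and the field names have to be kept in step by hand.
HEADER_FORMAT = "<IIHHIII"
Header = namedtuple("Header", "magic size version flags offset length checksum")

raw = struct.pack(HEADER_FORMAT, 0xCAFE, 24, 1, 0, 24, 100, 0)
header = Header._make(struct.unpack(HEADER_FORMAT, raw))
print(header.version, header.length, struct.calcsize(HEADER_FORMAT))
```

Swapping two fields of different widths means editing both the format string and the name list, which is the maintenance risk raised above.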

If you prefer to think of structures matching C struct definitions
(which are /one/ way to describe a file format, but certainly not the
only way), you can use the "ctypes" Python module and define a structure.
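
As an illustration of the ctypes approach, here is a minimal sketch. The field names are hypothetical and the layout simply mirrors the "IIHHIII" format mentioned earlier.

```python
import ctypes

class Header(ctypes.LittleEndianStructure):
    # Each field is named next to its type, so reordering two fields
    # means swapping two lines rather than counting format characters.
    _fields_ = [
        ("magic", ctypes.c_uint32),
        ("size", ctypes.c_uint32),
        ("version", ctypes.c_uint16),
        ("flags", ctypes.c_uint16),
        ("offset", ctypes.c_uint32),
        ("length", ctypes.c_uint32),
        ("checksum", ctypes.c_uint32),
    ]

raw = bytes(range(24))                  # stand-in for 24 bytes read from a file
header = Header.from_buffer_copy(raw)   # copy the bytes into a new structure
print(ctypes.sizeof(Header), hex(header.magic), hex(header.version))
```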

So why didn't you do that in the first place? I assume that can define
pointers too? (Since structs can contain pointers and you might need to
access what they point to.)

But I guess that this was about you proving that pointers were
unnecessary...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From James Harris@21:1/5 to Dmitry A. Kazakov on Wed Nov 16 21:45:18 2022

On 16/11/2022 12:02, Dmitry A. Kazakov wrote:

On 2022-11-16 12:44, James Harris wrote:

On 15/11/2022 10:06, Dmitry A. Kazakov wrote:

There is no problem with Unicode string literals whatsoever. You just
place characters as they are. The only escape is "" for ". That is all.

Two problems with that, AISI:

1. some of the characters look like others

For the reader, not for the compiler. If you want Unicode you get the
whole package, homoglyphs included.

Yes, Unicode has all kinds of problems for humans. Something else is
needed for programming but I don't think it's been created yet.

--
James Harris


From Bart@21:1/5 to David Brown on Wed Nov 16 22:01:58 2022

On 16/11/2022 16:02, David Brown wrote:

On 16/11/2022 00:30, Bart wrote:

On 15/11/2022 21:40, David Brown wrote:

If someone wants to write code that involves a lot of squaring, then
let them define operators so they can write "x = squareof y", or "x =
y²". They'd be able to write more of a mess, but also be able to
write some things very nicely.

I have such an operator, called `sqr`. And also briefly allowed the
superscript version (as a postfix op), until Unicode came along and
spoilt it all.

Why would Unicode spoil it?

I was using 8-bit code pages for western European alphabets since
probably from the end of the 80s. It was simple, I supported it and it
worked well. (At that time, I was also responsible for vector fonts
within my apps.)

But Unicode makes everything harder, with characters taking up multiple
bytes, and a lot of the time it just doesn't work. (I've seen Unicode
errors on everything from TV subtitles to supermarket receipts, and that
was a few weeks ago.)

Then you have the choices of UCS2, UCS4, UTF8, with patchy support for
UTF8 within Windows. Even if I get it working on my machine, how do I
know that someone else running my program will have their machine set up properly?

For me it's just not worth it.


From James Harris@21:1/5 to Dmitry A. Kazakov on Wed Nov 16 23:02:01 2022

On 14/11/2022 18:41, Dmitry A. Kazakov wrote:

On 2022-11-14 19:26, James Harris wrote:

On 14/11/2022 11:29, Dmitry A. Kazakov wrote:

On 2022-11-14 12:03, James Harris wrote:

...

   if is_name_first(b[j])
     a[i++] = b[j++]
     rep while is_name_follow(b[j])
       a[i++] = b[j++]
     end rep
     a[i] = 0
     return TOK_NAME
   end if

Now, what don't you like about the ++ operators in that? How would
you prefer to write it?

From parser production code:

procedure Get_Identifier
           ( Code     : in out Source'Class;
             Line     : String;
             Pointer  : Integer;
             Argument : out Tokens.Argument_Token
           ) is
    Index     : Integer := Pointer + 1;
    Malformed : Boolean := False;
    Underline : Boolean := False;
    Symbol    : Character;
begin
    while Index <= Line'Last loop
       Symbol := Line (Index);
       if Is_Alphanumeric (Symbol) then
          Underline := False;
       elsif '_' = Symbol then
          Malformed := Malformed or Underline;
          Underline := True;
       else
          exit;
       end if;
       Index := Index + 1;
    end loop;
    Malformed := Malformed or Underline;
    Set_Pointer (Code, Index);
    Argument.Location := Link (Code);
    Argument.Value := new Identifier (Index - Pointer);
    declare
       This : Identifier renames Identifier (Argument.Value.all);
    begin
       This.Location := Argument.Location;
       This.Malformed := Malformed;
       This.Value     := Line (Pointer..Index - 1);
    end;
end Get_Identifier;

Well, that's an astonishingly long piece of code, Dmitry,

Because it is a production code.

So was the code which preceded it.

It must deal with different types of
sources, with error handling and syntax tree generation.

In fairness, detecting double and trailing underscores adds work to the
Ada code so I've been thinking how I might write the Ada version. The
following is my attempt. It is untested and in a somewhat experimental
form but I think it's quite readable. The main part of the code would be

   errors = 0
   last_char = line(pointer)
   rep for i = pointer + 1, while i le line_last, ++i
     ch = line(i)
     if ch eq '_'
       if last_char eq '_' so ++errors ;Consecutive underscores
     on not is_alphanum(ch)
       break rep ;If neither underscore nor alphanum we are done
     end if
     last_char = ch
   end rep
   if last_char eq '_' so ++errors ;Trailing underscore
   ....
   this.Malformed = bool(errors)

Some notes on the code. I found the Ada program's

Malformed := Malformed or Underline;

to be clever but it took a bit of thinking about to work out what it was intended to do in the context. So I changed it to an error count which
is incremented with

++errors

and used bool(errors) at the end.

Rather than having a boolean called Underline I just kept a copy of the
last character.

Since I didn't need the Underline boolean I found I could also get rid
of a branch of the if statement so the code is a little shorter. But
more important, I think the code is clearer. YMMV but I bet you can
understand it!
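
For readers who want to run the logic, here is a rough Python transcription of the loop above. It is only a sketch: the function name scan_identifier is invented, indices are 0-based rather than 1-based, and str.isalnum() also accepts non-ASCII letters, which is_alphanum in the original may not.

```python
def scan_identifier(line, pointer):
    """Scan an identifier starting at line[pointer].

    Returns (index just past the identifier, malformed flag).
    """
    errors = 0
    last_char = line[pointer]
    i = pointer + 1
    while i < len(line):
        ch = line[i]
        if ch == "_":
            if last_char == "_":
                errors += 1          # consecutive underscores
        elif not ch.isalnum():
            break                    # neither underscore nor alphanumeric: done
        last_char = ch
        i += 1
    if last_char == "_":
        errors += 1                  # trailing underscore
    return i, bool(errors)

print(scan_identifier("foo_bar+1", 0))
print(scan_identifier("a__b", 0))
```

Python has no ++ operator, so the increments appear as `i += 1` and `errors += 1`.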

In the context of this discussion, the code uses ++i to increment the
index and ++errors to increment the error count. (There's no danger of
either of them overflowing, given the line length.) Neither is embedded
in an expression.

On the clarity of the "++" operator note that any of these would do the
same:

i = i + 1
i += 1
++i

I assert that the last one is the most readable. It makes the
programmer's intent clear at a glance.

...

But I am not sure I do understand it. Even allowing for what I believe
is meant to be double underscore detection (except at the start and
end?) it takes significantly more study than the simple name-first,
name-follow code which preceded it.

That's how the language defines it. This example is from an Ada 95
parser. Ada 95 RM 2.3:

   https://www.adahome.com/rm95/rm9x-02-03.html

--
James Harris


From David Brown@21:1/5 to Bart on Thu Nov 17 11:34:30 2022

On 16/11/2022 23:01, Bart wrote:

On 16/11/2022 16:02, David Brown wrote:

On 16/11/2022 00:30, Bart wrote:

On 15/11/2022 21:40, David Brown wrote:

If someone wants to write code that involves a lot of squaring, then
let them define operators so they can write "x = squareof y", or "x
= y²". They'd be able to write more of a mess, but also be able to
write some things very nicely.

I have such an operator, called `sqr`. And also briefly allowed the
superscript version (as a postfix op), until Unicode came along and
spoilt it all.

Why would Unicode spoil it?

I was using 8-bit code pages for western European alphabets since
probably from the end of the 80s. It was simple, I supported it and it
worked well. (At that time, I was also responsible for vector fonts
within my apps.)

Such code pages did work, but were very limited. In the UK, code pages typically meant nothing worse than mixups between # and £. Go beyond
the English speaking world, and code pages were a nightmare. If one non-English Western European language was enough, they were often not
/too/ bad - but supporting multiple languages was often hugely
complicated and fraught with errors.

Unicode made some things more complex, but other things far easier - it
is not a surprise to me that it has supplanted pretty much every usage
where plain old 7-bit ASCII is insufficient. I understand how Unicode
can be difficult, but it is solving a difficult problem.

But back to your superscript square operator - does that mean you used
an extended ASCII code in a specific code page for superscript 2 (I
think it is 0xfb in Latin-9), but when Unicode came out you stopped
using anything beyond 7-bit ASCII?

But Unicode makes everything harder, with characters taking up multiple bytes, and a lot of the time it just doesn't work. (I've seen Unicode
errors on everything from TV subtitles to supermarket receipts, and that
was a few weeks ago.)

That's not a Unicode problem - that's a software bug.

Then you have the choices of UCS2, UCS4, UTF8, with patchy support for
UTF8 within Windows. Even if I get it working on my machine, how do I
know that someone else running my program will have their machine set up properly?

For me it's just not worth it.

In the early days of Unicode, there were different encodings. For the
last couple of decades it's been clear that there is /one/ sensible
encoding - UTF-8. Everything else exists only as long as it is needed
for backwards compatibility and until software, OS's and API's are
changed - you only need them on the boundaries of code (file I/O,
calling external API's). And yes, I know that is an extra hassle.

I do understand that it is /much/ easier to stick to 7-bit ASCII if that
is all you need. But if 7-bit ASCII is not sufficient, then UTF-8 is
much easier than anything else.


From Bart@21:1/5 to Bart on Thu Nov 17 11:40:29 2022

On 17/11/2022 11:24, Bart wrote:

Hmm, I just compiled it with both bcc and tcc, and they both correctly
show €°£ when using code page 65001. So that's something, but what's up with gcc?

BTW ° (degree symbol in case it doesn't display properly) was something
I did use more extensively both in my scripting language and my apps' CLI.

So sin(30°) would evaluate to 0.5, instead of having to do sin(pi/6)
(typical of ordinary languages) or sin(30 deg) which was the fall-back
version.

Both ° and 'deg' applied a scaling factor to the number.
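
In a language without such a suffix, the scaling factor can be written out explicitly. A minimal Python sketch follows; the names DEG and sin_deg are illustrative, not taken from any of the languages discussed.

```python
import math

DEG = math.pi / 180.0   # scaling factor: radians per degree

def sin_deg(angle_in_degrees):
    """Sine of an angle given in degrees."""
    return math.sin(angle_in_degrees * DEG)

print(sin_deg(30))       # approximately 0.5, subject to floating-point rounding
```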


From Dmitry A. Kazakov@21:1/5 to Bart on Thu Nov 17 13:12:06 2022

On 2022-11-17 12:24, Bart wrote:

If I wanted to display UTF8 right now on Windows, say from a C program
even, I would have to fight it. If I write this (created with Notepad):

   #include <stdio.h>
   int main(void) {
       printf("€°£");
   }

If you want to display UTF-8, you must obviously use UTF-8, no?

    #include <stdio.h>
    int main(void) {
        printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
    }

In CMD:

CHCP 65001

Active code page: 65001

main.exe

€°£

Of course, you could use the code you wrote under the condition that
both the editor and the compiler use UTF-8.

Which is why every programming guideline must require ASCII-7 source
like I provided.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From Bart@21:1/5 to David Brown on Thu Nov 17 11:24:44 2022

On 17/11/2022 10:34, David Brown wrote:

On 16/11/2022 23:01, Bart wrote:

On 16/11/2022 16:02, David Brown wrote:

On 16/11/2022 00:30, Bart wrote:

On 15/11/2022 21:40, David Brown wrote:

If someone wants to write code that involves a lot of squaring,
then let them define operators so they can write "x = squareof y",
or "x = y²". They'd be able to write more of a mess, but also be
able to write some things very nicely.

I have such an operator, called `sqr`. And also briefly allowed the
superscript version (as a postfix op), until Unicode came along and
spoilt it all.

Why would Unicode spoil it?

I was using 8-bit code pages for western European alphabets since
probably from the end of the 80s. It was simple, I supported it and it
worked well. (At that time, I was also responsible for vector fonts
within my apps.)

Such code pages did work, but were very limited. In the UK, code pages typically meant nothing worse than mixups between # and £. Go beyond
the English speaking world, and code pages were a nightmare. If one non-English Western European language was enough, they were often not
/too/ bad - but supporting multiple languages was often hugely
complicated and fraught with errors.

Unicode made some things more complex, but other things far easier - it
is not a surprise to me that it has supplanted pretty much every usage
where plain old 7-bit ASCII is insufficient. I understand how Unicode
can be difficult, but it is solving a difficult problem.

But back to your superscript square operator - does that mean you used
an extended ASCII code in a specific code page for superscript 2 (I
think it is 0xfb in Latin-9), but when Unicode came out you stopped
using anything beyond 7-bit ASCII?

I used one that I called 'ANSI' but is Windows-1252 (https://en.wikipedia.org/wiki/Windows-1252). There, superscript 2 was
code 0xB2.

Before that I used an older one associated with MSDOS, I think code page
850, where superscript 2 was code 0xFD.
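
The code points quoted above can be checked with Python's bundled codecs. This is only a verification sketch; the codec names are Python's spellings for Windows-1252, code page 850, Latin-9 (ISO 8859-15) and UTF-8.

```python
SUPERSCRIPT_TWO = "\u00b2"   # the character "²"

# Show how the same character is stored under each encoding.
for codec in ("cp1252", "cp850", "iso8859_15", "utf-8"):
    encoded = SUPERSCRIPT_TWO.encode(codec)
    print(f"{codec:>10}: {encoded.hex(' ')}")
```

In Latin-9 the character is also at 0xB2, the same position as in Windows-1252, and UTF-8 needs two bytes (C2 B2) for it.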

(IMO the best set of extended character codes was used by the Amstrad
PCW from the 1980s; a Z80-based word processing machine. Very elegantly
set out.)

I provided support for French and German (mainly Swiss variations) and
Dutch. This included providing special keyboard layouts used on
digitising tablets, and supplying the vector fonts necessary for
pen-plotters. (These were largely nicked from AutoCAD but I added
support for accents, 'hats', cedillas etc, plus some special symbols.)

But Unicode makes everything harder, with characters taking up
multiple bytes, and a lot of the time it just doesn't work. (I've seen
Unicode errors on everything from TV subtitles to supermarket
receipts, and that was a few weeks ago.)

That's not a Unicode problem - that's a software bug.

It means even the big boys have issues with it.

Then you have the choices of UCS2, UCS4, UTF8, with patchy support for
UTF8 within Windows. Even if I get it working on my machine, how do I
know that someone else running my program will have their machine set
up properly?

For me it's just not worth it.

In the early days of Unicode, there were different encodings. For the
last couple of decades it's been clear that there is /one/ sensible
encoding - UTF-8.

If I wanted to display UTF8 right now on Windows, say from a C program
even, I would have to fight it. If I write this (created with Notepad):

   #include <stdio.h>
   int main(void) {
       printf("€°£");
   }

and compile with gcc, it shows:

Γé¼┬░┬ú

I'm not sure what code page it's on, but if I switch to 65001 which is
supposed to be UTF8, then it shows:

��

(or equivalent in the terminal font). If I dump the C source, it does
indeed contain the E2 82 AC sequence which is the UTF8 for the 20AC code
for the Euro sign.
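
The first garbled output is consistent with the UTF-8 bytes being displayed through code page 437, the old US OEM console page. That is an inference from the characters shown rather than something confirmed in the thread; the sketch below checks it with Python's cp437 codec.

```python
text = "\u20ac\u00b0\u00a3"            # the three characters "€°£"
utf8_bytes = text.encode("utf-8")

# The byte sequence the C string literal contains when the source is UTF-8.
print(utf8_bytes.hex(" "))

# Interpreting those same bytes as code page 437 gives the garbled text.
mojibake = utf8_bytes.decode("cp437")
print(mojibake == "Γé¼┬░┬ú")
```

The bytes themselves are correct in both cases; only the interpretation applied when displaying them differs.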

I'm sure that on Linux it works perfectly within a terminal window. But
I'm on Windows and I can't be bothered to do battle. Even if /I/ get it
to work, I can't guarantee it for anyone else.

Hmm, I just compiled it with both bcc and tcc, and they both correctly
show €°£ when using code page 65001. So that's something, but what's up with gcc?


From Bart@21:1/5 to Dmitry A. Kazakov on Thu Nov 17 12:35:17 2022

On 17/11/2022 12:12, Dmitry A. Kazakov wrote:

On 2022-11-17 12:24, Bart wrote:

If I wanted to display UTF8 right now on Windows, say from a C program
even, I would have to fight it. If I write this (created with Notepad):

   #include <stdio.h>
   int main(void) {
       printf("€°£");
   }

If you want to display UTF-8, you must obviously use UTF-8, no?

    #include <stdio.h>
    int main(void) {
        printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
    }

This wasn't the problem. I verified that the text file contained the
correct UTF8 sequences, and the two other compilers worked. This was a
problem with gcc, which also fails your version.

In CMD:

CHCP 65001

Active code page: 65001

main.exe

€°£

Of course, you could use the code you wrote under the condition that
both the editor and the compiler use UTF-8.

The point about UTF8 is that it doesn't matter. So the string contains 'character' E2; in C, this is just a byte array, it should just pass it
as it is to the printf function.

Which is why every programming guideline must require ASCII-7 source
like I provided.

That would work, but is also completely impractical for large amounts of non-ASCII content. Or even small amounts. You /need/ editor support. I
don't have it and don't do enough with Unicode to make it worth the trouble.


From Dmitry A. Kazakov@21:1/5 to Bart on Thu Nov 17 14:20:21 2022

On 2022-11-17 13:35, Bart wrote:

On 17/11/2022 12:12, Dmitry A. Kazakov wrote:

On 2022-11-17 12:24, Bart wrote:

If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created with
Notepad):

   #include <stdio.h>
   int main(void) {
       printf("€°£");
   }

If you want to display UTF-8, you must obviously use UTF-8, no?

     #include <stdio.h>
     int main(void) {
         printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
     }

This wasn't the problem. I verified that the text file contained the
correct UTF8 sequences, and the two other compilers worked. This was a problem with gcc, which also fails your version.

The above was compiled with gcc version 10.3.1 20210520.

In CMD:

;CHCP 65001

Active code page: 65001

;main.exe

€°£

Of course, you could use the code you wrote under the condition that
both the editor and the compiler use UTF-8.

The point about UTF8 is that it doesn't matter. So the string contains 'character' E2; in C, this is just a byte array, it should just pass it
as it is to the printf function.

It does, but the terminal driver interprets octets you output there. You
can verify the actual output by redirecting the standard output.
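
One way to do that check without a terminal in the way is to capture the program's output as raw bytes. The sketch below uses a small Python child process as a stand-in for main.exe, which is an assumption made purely so the example is self-contained.

```python
import subprocess
import sys

EXPECTED = b"\xe2\x82\xac\xc2\xb0\xc2\xa3"   # UTF-8 for the three characters

# Stand-in for main.exe: a child process that writes the bytes unmodified.
# The raw string keeps the command line pure ASCII.
child_code = r"import sys; sys.stdout.buffer.write(b'\xe2\x82\xac\xc2\xb0\xc2\xa3')"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,   # collect stdout as bytes; no terminal is involved
    check=True,
)
print(result.stdout.hex(" "))
print(result.stdout == EXPECTED)
```

If the captured bytes match, any garbling seen on screen happens in the console layer rather than in the program.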

That would work, but is also completely impractical for large amounts of non-ASCII content. Or even small amounts. You /need/ editor support. I
don't have it and don't do enough with Unicode to make it worth the
trouble.

That is another guideline topic: you never ever place localization
stuff in the source code.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From Bart@21:1/5 to Dmitry A. Kazakov on Thu Nov 17 15:06:46 2022

On 17/11/2022 13:20, Dmitry A. Kazakov wrote:

On 2022-11-17 13:35, Bart wrote:

On 17/11/2022 12:12, Dmitry A. Kazakov wrote:

On 2022-11-17 12:24, Bart wrote:

If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created
with Notepad):

   #include <stdio.h>
   int main(void) {
       printf("€°£");
   }

If you want to display UTF-8, you must obviously use UTF-8, no?

     #include <stdio.h>
     int main(void) {
         printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
     }

This wasn't the problem. I verified that the text file contained the
correct UTF8 sequences, and the two other compilers worked. This was a
problem with gcc, which also fails your version.

The above was compiled with gcc version 10.3.1 20210520.

And run on Windows?

Further tests show that it works in every case, including using gcc with
puts, and gcc+printf under WSL. It only fails with gcc + printf + Windows.

Odd. But then my point is you can't rely on it. You still need the UTF8
code page set.

That's not all, because console display is different from graphical display.

   #include <stdio.h>
   #include <windows.h>
   int main(void) {
       MessageBox(0,"\xE2\x82\xAC\xC2\xB0\xC2\xA3",
                    "\xE2\x82\xAC\xC2\xB0\xC2\xA3",0);
   }

This displays only gobbledygook. Of course, this is set to use
MessageBoxA, which expects an ASCII string; but why won't it take UTF8
and show something sensible?

Presumably it needs the correct code page set for the WinAPI, not the
one I set for the console. But I can't find a way to do it; MS docs
suggest setting this in a resource file or XML manifest file; WTF?

The alternative is to use MessageBoxW, but that means switching
everything to UCS2; it is a massive upheaval.

Any wonder that I'm just not interested? Here I have to say that Linux
seems to get it right.


From Dmitry A. Kazakov@21:1/5 to Bart on Thu Nov 17 16:20:10 2022

On 2022-11-17 16:06, Bart wrote:

On 17/11/2022 13:20, Dmitry A. Kazakov wrote:

On 2022-11-17 13:35, Bart wrote:

On 17/11/2022 12:12, Dmitry A. Kazakov wrote:

On 2022-11-17 12:24, Bart wrote:

If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created
with Notepad):

   #include <stdio.h>
   int main(void) {
       printf("€°£");
   }

If you want to display UTF-8, you must obviously use UTF-8, no?

     #include <stdio.h>
     int main(void) {
         printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
     }

This wasn't the problem. I verified that the text file contained the
correct UTF8 sequences, and the two other compilers worked. This was
a problem with gcc, which also fails your version.

The above was compiled with gcc version 10.3.1 20210520.

And run on Windows?

Of course.

Further tests show that it works in every case, including using gcc with puts, and gcc+printf under WSL. It only fails with gcc + printf + Windows.

Odd. But then my point is you can't rely on it. You still need the UTF8
code page set.

That's not all, because console display is different from graphical
display.

   #include <stdio.h>
   #include <windows.h>
   int main(void) {
       MessageBox(0,"\xE2\x82\xAC\xC2\xB0\xC2\xA3",
                    "\xE2\x82\xAC\xC2\xB0\xC2\xA3",0);
   }

This displays only gobbledygook. Of course, this is set to use
MessageBoxA, which expects an ASCII string; but why won't it take UTF8
and show something sensible?

Because Windows GDI is ASCII (MessageBoxA) or else UTF-16 (MessageBoxW).

Presumably it needs the correct code page set for the WinAPI, not the
one I set for the console. But I can't find a way to do it; MS docs
suggest setting this in a resource file or XML manifest file; WTF?

There are no code pages in Windows GDI.

Any wonder that I'm just not interested? Here I have to say that Linux
seems to get it right.

Linux used code pages as well. It adopted Unicode very late and took
UTF-8 straight away. On its part Windows started early and took UCS-2 by splitting all calls into xxxA and xxxW. Later on, when Unicode grew
larger Microsoft silently replaced UCS-2 with UTF-16. All xxxW calls are
UTF-16 now.
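
To make the difference concrete, the sketch below shows the bytes a xxxW call would expect for the same three characters, and what the UTF-8 bytes look like when read as an 8-bit code page. It assumes the 8-bit page is Windows-1252, which is a guess about the machine in question, and it only manipulates bytes; it does not call any Windows API.

```python
text = "\u20ac\u00b0\u00a3"                 # the three characters "€°£"

utf8_bytes = text.encode("utf-8")           # what the C string literal holds
utf16_bytes = text.encode("utf-16-le")      # what a xxxW call expects
print(utf8_bytes.hex(" "))
print(utf16_bytes.hex(" "))

# UTF-8 bytes misread one byte per character, assuming Windows-1252.
misread = utf8_bytes.decode("cp1252")
print(len(text), len(misread))

# A character outside the Basic Multilingual Plane needs two 16-bit units,
# which is why fixed-width UCS-2 had to give way to UTF-16.
emoji_units = len("\U0001F600".encode("utf-16-le")) // 2
print(emoji_units)
```

Three characters turn into seven when the bytes are misread, which matches the kind of gobbledygook described above.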

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From David Brown@21:1/5 to Bart on Fri Nov 18 08:12:43 2022

On 16/11/2022 21:04, Bart wrote:

On 16/11/2022 16:50, David Brown wrote:

On 15/11/2022 20:22, Bart wrote:

Of course you won't use something when you won't even consider trying
to learn about it.

I've thought about learning Chinese. Then I decided there was no point.

That's fine. But you wouldn't enter a discussion with a linguist who
has some experience with Chinese, and try to tell them that Chinese
grammar is beyond human comprehension. You could say that /you/ think
it looks like Chinese writing would be hard to learn - but you could
/not/ say anything about how hard it is for Chinese speakers to learn.
You could not even say that it really would be difficult for you to
learn, because you haven't tried or investigated enough.

I still can't comprehend why YOU think this stuff is simple and
obvious, yet you are stumped by an increment of a pointer followed by
a dereference.

I haven't written anything to suggest that I am "stumped" by this. My
point was to say it is unnecessary to support such expressions in a
programming language, and a language may be better in some ways if it
does not allow increment operators or even pointers.

It is something I value, but you don't.

Again, I have written nothing to indicate that. You read so much
between the lines that you miss the words I actually write. When I
write "You could consider doing A in your /new/ language - it would give
these advantages...", that is /exactly/ what I mean. It does not mean
"I don't like B", or "I am stumped by C", or "You should not do D".

And higher order functions are something you value, but I don't.

That bit is true.

That's all it is.

I can fully respect your personal preferences - that's not the issue for
me. I find it sad and disappointing that someone can have such strong
opinions about something they have never really considered or tried to
learn about, and consequently don't understand, but I guess that is
human nature. It's ancient survival tactics, honed by evolution - when something is new, different and unknown, human instinct is to fear it
and run away, rather than investigate and study it.

The bit that really bugs me is how you (and James) can hold such strong opinions about how /other people/ might like and use these features and languages that support them. Is it so hard to accept that some people
like using higher order functions? Or that some people write code in functional programming languages, because they find it a better choice
for their needs? Is it so hard to accept that other people can write
code for the same task in widely different languages, and /your/ code in
/your/ language is not the "perfect" solution or the only "non-clunky" code?

Now, it is undeniable /fact/ that programming languages do not need
operators such as increment, or other operators that cause side-effects.

Just `a := b` causes a side effect. Possibly quite a big one if 'b' is a substantial data structure and ':=' does a deep copy.

It is not an expression in many languages, but a statement. Indeed, the
symbol ":=" is not an "operator" in many languages - it's just part of
the syntax of an assignment statement.

There is usually a task to be done. In `A[++i] := 0`, I want two things
to change, which is going to happen whether I write it like that, or as
`i := i+1; A[i] := 0`. So why write `i` 3 times?

If I want two things to change, why try to squeeze it into /one/
expression or statement? Why not write two statements, each one doing a
single clear and simple task?

(As to writing "i" three times - again, these things are often found in
loops, where a good syntax can mean "i" is never written at all.)

It is not a big deal. Maybe in functional programming it might be, but
here *I* am specifying the paradigm and I say it's OK.

I'm not asking you or anyone else to use my language.

I am not asking you to use functional programming either - I am merely
asking you to appreciate that /others/ do so, and they will find your
language as clunky, repetitive, ugly and inexpressive in comparison.

Yes, but for you, a "must-have" list for a programming language would
be mainly "must be roughly like ancient style C in functionality, but
with enough change in syntax and appearance so that no one will think
it is C". If that's what you like, and what pays for your daily
bread, then that's absolutely fine.

Yes, I don't need a higher level language for what I use it for. But
there are still dozens of things which make the experience superior to
just using C. Ones either you genuinely don't appreciate, or are just
pissing on for the sake of it.

Again, I have written nothing to indicate that. You read so much
between the lines that you miss the words I actually write. When I
write "You could consider doing A in your /new/ language - it would give
these advantages...", that is /exactly/ what I mean. It does not mean
"I don't like B", or "I am stumped by C", or "You should not do D".

* Case-insensitive

Subjective, and I disagree.

* 1-based and N-based

(I assume you mean array indexing, or possibly loops?) Subjective, and
I disagree if the starting 1 is implicit. If it is explicit - "array 1
to 10 of int" - then I definitely like it.

* Algol-style syntax, line-oriented and largely semicolon-free, and sane
type syntax

Subjective on all points. But these are mostly syntactic details - C
and Pascal are both considered "ALGOL family" languages despite
syntactic differences.

* Module scheme (define everything in exactly one place)

Objective - a good module system is always a good idea. /Defining/
things in one place is good, but being able to /declare/ them elsewhere
is useful as it lets you separate interfaces from implementations.

* Namespaces

Objective - good.

* Encapsulation (define functions inside records etc)

Objective - good. However, it is not so important that functions are
defined /inside/ a record/struct/etc. The important part is that you
can make a user-defined type containing data of other types, and that
you can restrict access to the internals to be via specific functions or operations.

* Out-of-order definitions including easy mutual record cross-references

Subjective - convenient in some ways, less convenient in others. I
personally don't mind for functions and variables. But it is definitely
useful for defining recursive structure types.

* Regular i8-i64 and u8-u64 type denotations, include 'byte' (u8)

Objective - very good for low-level languages, unnecessary for higher
level languages (except for FFI and other interfacing). It's a lot
nicer if you can just use "integer" and let the compiler worry about
sizes, but that's harder to implement efficiently.

* Default 64-bit 'int', 'word' types, and 64-bit integer constants

Subjective - it makes little difference in practice if default integer
sizes are 32-bit or 64-bit, assuming big target systems. Each is big
enough for "almost everything", and neither is big enough for
"absolutely everything".

* Built-in print and read statements WITH NO STUPID FORMAT CODES

Subjective - though the modern trend is strongly towards avoiding
special statement types and preferring standard library functions for
this sort of thing. What constitutes "stupid" format codes is obviously
highly subjective - but it is objectively better if they can be deduced automatically by the compiler.

* Keyword and default function parameters

Subjective. I like keyword or named parameters. Others feel they
encourage having too complicated interfaces with too many parameters.
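
As one illustration of the feature, here is how keyword and default parameters look in Python. The function draw_line and its parameters are invented for the example.

```python
def draw_line(x1, y1, x2, y2, *, width=1, colour="black", dashed=False):
    """Describe a line; everything after * must be passed by keyword."""
    return {
        "from": (x1, y1),
        "to": (x2, y2),
        "width": width,
        "colour": colour,
        "dashed": dashed,
    }

# Only the option that differs from its default needs to be mentioned,
# and the call site says which option it is.
line = draw_line(0, 0, 10, 5, dashed=True)
print(line)
```

The counter-argument is visible here too: each extra option makes the interface a little wider.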

* Fewer, more intuitive operator precedences

Subjective. Some people think it is best to give a total order for
operator precedence, which means a lot of levels. And "intuitive" is completely subjective after basic mathematical arithmetic operators. I
am inclined to agree with you in general, however.

* Does not conflate arrays and pointers

Objective, and good.

* 'Proper' for loops; for-in loops

"Proper" is subjective. Good loop constructs are vital for an
imperative language, however.

* Separate 'switch' and 'case' selection; the latter has no restrictions
(and no stupid fallthrough on switch)

Subjective, I think. There are many ways to handle multiple choices or
pattern matching, and I don't think there is any justification for
claiming one particular way is "right" or "the best". One can do a lot
better than C's "switch", and I agree about "fallthrough".

* Proper named constants

Subjective - again, "proper" is totally your own opinion. Good support
for read-only (but set at run-time) and compile-time constant objects of
all types is objectively good.

* Break out of nested loops

Objective - you can always do that in some way. /How/ you should be
able to do it, is very subjective.

* Embed strings and binary files

Objective and subjective. Strings are too useful to require them to be
in separate files (though the possibility of doing so is useful for international translations). Embedding binary files as part of the
language is more subjective - some people think it is a good idea, some
people do not. (I think it is nice.) Being able to include them via
linking is definitely a useful feature.

* 'Tabledata' and 'enumdata' features (compare with X-macros)

I don't really know what you mean here. Most languages support filling
a table or array with constant data. Some let you do so using
compile-time functions, which is /very/ nice IMHO (though it can be
costly in compile time). "X-macros" is a general technique for textual substitution macros, which can be used for a huge variety of things.
Like many powerful techniques, it can be used to make code simpler,
clearer and more maintainable - or abused to make it messier.

* Function reflection

Objective - reflection, at least of things known constant at compile
time (for compiled languages), is a useful feature.

* Built-in, overloaded ops like abs, min, max

Subjective. There are no real advantages in being "built in" compared
to library functions. The trend is that functions (and features) that
can be in a library, /are/ in a library. But in languages with simpler and
more limited tools, you can probably get more efficient results from
built-in functions than from library functions.

* 'Properties' such as .len and .lwb

Subjective. Some like properties, some like functions. I personally
think it's a good kind of syntax for type-related information (such as
the size of a type), rather than for run-time information.

* Built-in 'swap'

It's difficult to call that one - I think it's hard to object to having
such a feature, but it might also be important to be able to override it
for your own types.

* Bit/field extraction/insertion syntax

Again, difficult to call. It's useful to be able to do bitfield
manipulation, but it can be done by struct definition or by an operator
or function taking the start and length as parameters. I'd say defined
and named bitfields are most important, and ad-hoc accesses can always
be done by masking and shifting when needed.

* Multiple function return values

Objective - that's a useful feature.

* Multiple assignment

Objective - also good, IMHO.

* Slices (including slices of char arrays to give counted strings)

Objective - some kind of slice syntax is nice on arrays.

* Doc strings

Objective - documentation is vital!

* Whole-program compiler that does not need a separate build system

Subjective, and I disagree. It does not matter how many parts tools are
in. Single "do everything" tools tend to be convenient but inflexible.
All other things being equal, smaller and simpler tools are better - but
all other things are seldom equal, and it's a low priority for me.

* Pass-by-reference

Subjective. Some people like it, others think it makes it harder to see
the effects calls have or what they might change. Pass by const
reference is less controversial.

* Value arrays

You mean arrays as value types, that can be assigned, passed as
parameters or returned from functions? Objective - it's good.

Yeah, just like C! If you think this lot is just C with a paint-job,
then you're in denial.

Yes, it is a lot like C. It has a number of changes, some that I think
are good, some that I think are bad, but basically it is mostly like C.
You can take an old-style C program, with some restrictions, and
translate it fairly directly into your language. You can take a program
in your language and translate it fairly directly into C, albeit some
parts will be ugly or non-idiomatic. I suspect that it would be more challenging to translate modern C cleanly into your language, than
vice-versa.

I am not saying they are the same, any more than C and Pascal are the
same - but they are very similar.

In particular, it's quite clear to me that when you developed your
language, you had the assembly level implementation heavily in mind when
doing so. Why are your numbers 64-bit integers by default? It is not
because it is a particularly useful size for integers, but because it
fits the cpus you are targeting. Why do you want integer types that
are powers of 8? They fit the cpu. Why don't you allow user-defined
integer types that fit the sizes relevant to the application, such as a
type for numbers 1 to 100? Because you think in terms of the cpu and implementation, not in terms of the application space and the
programmer's task. Why do you not have inline functions, overloaded
functions, etc.? Because you think a function name should correspond to
a label of the same name in the assembly code. Why do you have two's complement overflow for integers? Because that's what the cpu does, not because it is useful to programmers.

It's a low-level language. Even if it is not explicitly defined in
terms of a particular cpu, that's your philosophy all the way. (And let
me stress - there is nothing wrong with that.) The result is almost
inevitably similar to C, which has a similar background philosophy
(albeit with a wider range of cpus in mind).

Of course, I fully expect you to be completely dismissive of all of
this. I wouldn't swap any of these for higher-order functions.

I can't imagine why you would think adding higher-order functions would
mean dropping any of it.

And there's no doubt that a large proportion of programmers go through
their career without ever considering higher order functions
(functions that operate on or return functions).
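
For readers who have not met the term, here is a minimal Python sketch of both kinds of higher-order function mentioned above. The names apply_twice and make_adder are made up for illustration and do not come from anyone's code in this thread.

```python
def apply_twice(f, x):
    """Operates on a function: call f on x, then call f again on the result."""
    return f(f(x))


def make_adder(n):
    """Returns a function: the returned function adds n to its argument."""
    def add(x):
        return x + n
    return add


add3 = make_adder(3)
result = apply_twice(add3, 10)          # add3(add3(10))

# The built-in sorted() is also higher-order: it takes a function as its key.
by_length = sorted(["pear", "fig", "banana"], key=len)
```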

Too right. To be able to use such things, they MUST be 100% intuitive
and be usable with 100% confidence. But that's just the author; you need
to consider other readers of your code too, and those who have to
maintain it.

"Intuitive" means you've used it often enough to use the feature without thinking about it. Nothing more. Stop imagining that everything you
learned along your programming career is somehow easier than other
methods seen by other people. It's so long since you learned to program
that you've forgotten how it goes. When you have a long history of
programming in ALGOL, assembly, and perhaps a spot of FORTRAN or BASIC,
the step to C or your own language is minor. That makes it /seem/
intuitive, but it is not - it's just what you are used to. For people
with different backgrounds, there's no reason to suppose that other
types of language are not easier to learn (more "intuitive" for them).

COBOL was made to be easy for people with a background in business
logic. dBase is for database developers who want more than they can get
from SQL. A Forth programmer would think your language was gibberish,
as would someone who thinks about relations between data and likes
Prolog. Someone who has been good at mathematics at school will like
Haskell's way of defining things, but think C is insane (how can "x" be
equal to "x + 1" ?).

I learned functional programming at university. When I started, I had programmed mainly in a number of types of BASIC (that's what home
computers had), plus assembly and machine code on four different
processors, and played a little with Forth, Pascal, C, Logo, and even a
touch of Prolog and APL. During the first 8 week term, functional
programming was one of the six (IIRC) courses we had. Intuition is
quickly learned. (To be fair, we had a teacher and a planned path to
learning - we did not just leap into random high-order functions.)

To me they are a very long way from being 100% intuitive. So what do you think I should do: strive to be a 10th-rate programmer in a functional language I've no clue about; give up programming and tend to my garden;
or carry on coding in a style that *I* understand 100% (and most others
will too)?

No. I think you should be happy to accept that you don't know anything
about functional programming, and haven't the inclination or motivation
to learn, and leave it at that. What you should /stop/ doing is
claiming that it is not "intuitive", not useful, clunky, impractical,
hard to understand, or any of the other unjustified and unjustifiable complaints you have about it just because /you/ didn't grok it at a
quick glance.

I think anyone trying to make an interesting and useful new programming language should learn some functional programming (as I think they
should learn many other kinds of languages) to get a broader view of programming. You might not need that if you just want to make a variant
of the languages you already know, just with syntax that suits your own preferences better.

I also think that anyone interested in becoming a better /programmer/ or software developer, rather than just a better /coder/, should learn some functional programming. You'll be a better imperative programmer for it.

But for you, personally, I think your prejudice and biases (or
"intuition") are too fixed. You'll never look at something new with an
open mind, so there is little point.

The stuff I do simply doesn't require a sophisticated language with
advanced types and curried functions invented on-the-fly. Here is an
actual example from an old app, a small function to keep it short:

    proc displaypoletotal =
        if not poleenabled then return fi
        print @poledev, chr(31), chr(14)    ! clear display
        print @poledev, "Total:", rightstr(strcash(total, paymentunit), 14)
    end

(This is part of a POS and displays running totals, on an LED display
mounted on a pole, driven from a serial port. It ran in a duty-free area
and worked with multiple currencies.)

What can higher-order-functions do for me here? Absolutely sod-all.

So what? What do bitfield extraction operators give you here? Or
multiple return values? Sod-all.

Higher-order functions are a feature of functional programming
techniques. No one expects you to write code in lambda calculus, any
more than they expect imperative programmers to write Turing machines.

But equally there's no doubt that they /are/ useful for many people in
many types of coding. Sometimes higher order functions are used
without people knowing about them - Python decorators are a fine example.

Actually, Python declarators are such a good example that I recommend

Decorators?

Yes. (I don't really know how much Python you have done.)

this link <https://realpython.com/primer-on-python-decorators/> that
gives a number of useful examples.

Decorators are a /very/ good example of a Python feature that I could
never get my head around. 5 minutes later, I'd have to look them up again.

You have to try /using/ them to have a hope of learning about them.

Think of this example. You have some code with functions "foo", "bar"
and "foobar". Mostly you call them as they are in your code.

auto debug(auto const& f) {
     return [&f](auto... args) {
         std::cout << "Calling ";
         ((std::cout << " " << args), ...);
         std::cout << "\n";
         auto r = f(args...);
         std::cout << "Returning " << r << "\n";
         return r;
     };
}

Suppose your real functions are :

     int foo(int x);
     int bar(int x, double y);
     double foobar(int x, double y, const char * p);

Your original code was:

     int a = foo(10);
     int b = bar(20, 3.14);
     double c = foobar(30, 2.71828, "Hello");

None of this gives you things you could not do by hand. But if you
find yourself doing the same thing by hand many times, then it is
natural to ask if it can be automated - if you can write a function to
do that. You can, if you have higher order functions.
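
For comparison, roughly the same idea can be sketched in Python, where a wrapper of this shape is what decorators are built on. This is only an illustration: the bodies of foo and bar are made up, and the output format is an assumption rather than a copy of the C++ version.

```python
import functools


def debug(f):
    """Return a new function that logs each call to f and its result."""
    @functools.wraps(f)          # keep f's name on the wrapper
    def wrapper(*args):
        print("Calling", f.__name__, "with", *args)
        r = f(*args)
        print("Returning", r)
        return r
    return wrapper


# Hypothetical stand-ins for the real functions.
def foo(x):
    return x + 1


def bar(x, y):
    return int(x * y)


a = debug(foo)(10)          # in place of foo(10)
b = debug(bar)(20, 3.14)    # in place of bar(20, 3.14)
```

The wrapper is written once and works for any number of arguments, which is the part that is tedious to do by hand.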

I can't follow the C++ debug function at all.

"auto" just means "infer the type automatically". Since the compiler
knows the type of everything here, there is no need to specify it
explicitly - and indeed the type of lambdas can't be expressed
explicitly (each anonymous function is its own type). And the "const&"
bit just means it takes its parameter by constant reference - it's not
actually needed here, just habit.

The syntax "[&f](auto... args) { } " is declaring an anonymous function.
The local variable "f" (from the parameters) can be seen inside the
function, and the function will take a variable number of arguments
whose number and types are determined automatically when called. The
"return" means that this anonymous function is the return value of the
"debug" function.

The line "((std::cout << " " << args), ...);" basically means "do
(std::cout << " " << arg)" for each "arg" in the list of arguments. The
syntax can be fiddly until you are used to it, but it is handy for
making functions that can work with many arguments (such as a "sum"
function that can be used as "sum(1, 2, 3, 4)" with as many arguments
as you want).
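
A loose Python analogue of "do this for each argument" may help here. It is not the same mechanism (the C++ fold expression is expanded at compile time, whereas Python loops at run time), and the names total and show are invented for the example.

```python
def total(*args):
    """Add up any number of arguments, like the "sum" function described above."""
    result = 0
    for arg in args:        # one step per argument
        result += arg
    return result


def show(*args):
    """Build the text that the per-argument output step would produce."""
    return "".join(" " + str(arg) for arg in args)
```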

"auto r = f(args...);" should be clear now :-) It declares a local
variable whose type is determined automatically, initialised by calling
the function "f" with all the arguments. This value "r" is returned
after the debug printout.

But I notice the user code
changes from 'foo()' to 'debug()()'; I thought this could be done while leaving the foo() call unchanged.

The call to "debug" with argument "foo" returns a new function, which is
then callable.

You could easily add local variables :

auto debug_foo = debug(foo);
auto foo = debug_foo;

(It would have been nicer if "auto foo = debug(foo);" were allowed, but
the grammar of C++ does not allow that. I have never claimed it was a
perfect language!)

In effect, the identifier "foo" is already a "const" object from before,
and thus cannot be changed inside the same scope - it can only be
overridden in a narrower scope.

Or you could write "#define foo debug(foo)", but I suppose you'd call
macros cheating!
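
In Python the rebinding is allowed directly, which is how the call sites stay unchanged. A minimal sketch, assuming a simplified debug wrapper that records calls in a list (the list named calls is made up so the effect can be inspected):

```python
calls = []   # hypothetical log, used here instead of printing


def debug(f):
    """Wrap f so that every call is recorded before it runs."""
    def wrapper(*args):
        calls.append((f.__name__, args))
        return f(*args)
    return wrapper


@debug                  # same effect as writing: foo = debug(foo)
def foo(x):
    return x * 2


result = foo(10)        # the call site still just says foo(10)
```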

But no, my language doesn't deal with parameter lists as a first class
entity at all. (At best it can access them as a list object, but it
doesn't help here.)

The best I can do here is to have a dedicated function for each number
of arguments, and to use dynamic code to allow the same function for any types:

    func debug3(f, a,b,c)=
        println "Calling",f,"with",a,b,c
        f(a,b,c)
    end

    func foobar(a,b,c)=
        println "FooBar",a,b,c
        return a+b+c
    end

    x:=debug3(foobar, 5,6,7)     # in place of foobar(5, 6, 7)

    println x

This displays:

Calling <procid:"foobar"> with 5 6 7
FooBar 5 6 7
18

However this loses the ability to use any keyword or default arguments
for FooBar, since they are only available for direct calls (it's done at compile-time).

(You are also missing the return types. I don't know if that was an
oversight, or a complication you can't easily handle.)

Dynamic typing reduces the amount of manual work, but it is all run-time effort. The C++ method is all compile-time work, and gives optimal
code. (Well, the implementation of std::cout is always an inefficient
mess...) You think of functions as single pieces of executable code, corresponding (in a compiled language) to an assembly label and a set of assembly instructions. More advanced languages think of them as a
description of actions with a far looser connection between the source
code and the generated assembly code. (That does make it much harder to understand the generated assembly, or for assembly-level debugging.) So
the calls to "debug" do not result in assembly CALL instructions - they
result in the std::cout statements being included alongside the calls to
the original "foo" function.

So I can see that that C++ debug does some very hairy stuff, to make it
work with static types and for any function, but I just can't understand
it.

It would be unreasonable to expect anyone to understand this stuff from
a simple Usenet post!

However, given the requirement you outlined, I could probably come up
with a custom feature to do just that. Although it might be in the form
of a compiler option which injects the debug code at the start of the relevant functions. Then the user code does not need updating.

See, when you have control of the language and implementation, there are
more and better possibilities.

When you have a flexible enough language, you don't /need/ to mess with
the implementation. You can do what you want anyway. And note that
almost nobody has their own language and implementation to change at
will - it is not a scalable solution.

There is a proposal to add metaclasses to C++. (If you google for this,
be aware that the syntax is quite hairy, and likely to be used only by
library programmers.) One of the things this gives you is enough power
to define the concept of "struct", "class", "union", "enum", bitfields,
etc., as metaclasses. That means you could take C++ with metaclasses,
remove the keywords "struct", "class", etc., and then define them again
within the language itself.

That's like saying you have 20 years of experience as a taxi driver,
and never once had to use "flaps" or "ailerons", or even think about
the concept. You therefore can't understand why pilots want to use
them all the time. You can give a qualified opinion on driving round
roundabouts and may be an expert on gearing, but you have no basis for
a qualified opinion on flying.

I don't want to fly. (I was once in a small aircraft flying at 7000 ft.
But I've also ridden a bike at 8000 ft, although over a mountain in that case. So who needs to fly?!)

That's fine - but some other people (not me either) /do/ want to fly.

So again - mocking and dismissing concepts that you know nothing
about, makes you look foolish. (Your ignorance of the topic is not
the issue - we are all ignorant of almost everything.)

Have I ever called you ignorant? I don't care about these concepts; they
are not for me. But I appreciate lots of things you don't care for.

Look at this code; it is a silly task, but concentrate on the bit that
does the input:

    real a,b,c

    print "Three numbers: "
    readln a, b, c

    println "Their sum is:", a+b+c

The spec is that the three numbers are read /from the same line/, and
can be separated with commas or spaces.

Try to do that `readln` part in Python, and just as simply. Even in C
it's an ordeal.

a, b, c = eval(input("Three numbers: "))
print(a + b + c)

Or just :

print(sum(eval(input("Input some numbers: "))))

(I will not try it in C - I fully agree it's a poor choice of language
for that kind of thing.)
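
One caveat on the eval() versions: eval() runs whatever expression the user types, and it needs commas between the numbers. A sketch that accepts commas or spaces and only parses numbers could look like this (read_numbers is a made-up helper name, and the sample line stands in for real keyboard input):

```python
import re


def read_numbers(line):
    """Parse numbers separated by commas and/or whitespace from one line."""
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    return [float(f) for f in fields]


# Interactive use would be:
#     a, b, c = read_numbers(input("Three numbers: "))
a, b, c = read_numbers("1.5, 2 3")
total = a + b + c
```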

(My code actually works on either of my languages, static or dynamic.
That's a bonus feature. Imagine a solution in Python or C that works
with both languages.)

No, it is hard, obscure, subtle. Take my word for it.

No, I will not take your word for it. You know nothing about it.

I implemented it, remember? Even if it was a mock-up to see if a
proposed built-in approach would work.

Yes, that was my deferred code feature, itself deferred. (It means I
have to instead define an explicit, named function.)

That's the impression I got. I don't know how you handle captures of
local variables (if you do so at all).

When I had local functions for a while, they could access static
variables, user types, named constants, macros, enums and other local functions within a containing function. Plus of course anything defined globally. But not parameters and stack-frame variables of the enclosing functions.

Quite a lot could actually be done that way. So it could with my
deferred code objects.

Effectively any function is just a variable to which has been
assigned some anonymous function (although in practice, the function
retains its 'F' identity even if the user's 'F' variable has been
assigned a different value).

Python does not have variables. It has /identifiers/. Change
"variable" for "identifier" in your description, and "assigned" to
"bound", and you've got it right.

Just call them variables that work in a particular way: they are
references to objects, but can never be references to other variables.

I was trying to be precise, so that you get the right idea.

When you assign a value, you are copying a reference.

When you bind an object to an identifier or other object, you take a new reference (increasing its reference counter), not just a copy.
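
The practical consequence of binding can be seen in a few lines. This is just an illustration with made-up names:

```python
a = [1, 2, 3]
b = a                # binds a second name to the same list; nothing is copied
b.append(4)          # the change is visible through both names

c = list(a)          # an explicit copy creates a new, separate object
c.append(5)

same = a is b        # one object, two names
different = a is c   # c is a distinct object
```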

And you seem convinced that the Python code I showed is "hackish" and
"unprofessional".

Defining a struct's layout as "IIHHIII" or whatever? Yeah, that's really professional!

It's a nice, simple format string - "I" is a 32-bit unsigned integer,
"H" is a 16-bit unsigned "half integer", lower case versions are signed.
There's a list. It is /really/ simple and efficient, and very
flexible. You define the layout in a single string.
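
A short sketch of how such a format string behaves with the standard "struct" module. The "<" prefix (little-endian, no padding) and the field values are assumptions added for the example:

```python
import struct

fmt = "<IIHHIII"                  # the layout mentioned above, with "<" added
size = struct.calcsize(fmt)       # 4+4+2+2+4+4+4 bytes

values = (1, 2, 3, 4, 5, 6, 7)    # made-up field values
packed = struct.pack(fmt, *values)
unpacked = struct.unpack(fmt, packed)

signed = struct.unpack("<h", b"\xff\xff")[0]     # lower case: signed 16-bit
unsigned = struct.unpack("<H", b"\xff\xff")[0]   # upper case: unsigned 16-bit
```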

The code works fine - it is clear and simple, shorter than in your
language, and easy to modify and maintain.

Really? Suppose the struct changes: two fields are swapped. You have to count
along the string to work out which of those characters need to be exchanged. And
that multiple assignment needs to be revised too. It's a bit hit and miss.

In different layouts, some things will always be a little easier, other
things a little harder. Yes, it's clear and simple and easily used.

If you prefer to think of structures matching C struct definitions
(which are /one/ way to describe a file format, but certainly not the
only way), you can use the "ctypes" Python module and define a structure.

So why didn't you do that in the first place? I assume that can define pointers too? (Since structs can contain pointers and you might need to access what they point to.)

No, you don't have pointers like in C - they have to be related to
Python types since you are interfacing between C types and Python types.
But the ctypes module handles the details.

I wrote it with "struct" because for a simple case like this, it is
shorter and clearer. But if you prefer, use "ctypes" - it's nicer for
some things. There is no need to limit yourself to one single solution.
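
For anyone who wants to see the "ctypes" route, here is a minimal sketch. The class name CoffHeader is made up, the field names are borrowed from the struct example elsewhere in this thread, and the sample bytes are invented rather than read from a real file:

```python
import ctypes
import struct


class CoffHeader(ctypes.LittleEndianStructure):
    """Named, typed fields laid out like the equivalent C struct."""
    _fields_ = [
        ("machine", ctypes.c_uint16),
        ("nsections", ctypes.c_uint16),
        ("timestamp", ctypes.c_uint32),
        ("symtaboffset", ctypes.c_uint32),
        ("nsymbols", ctypes.c_uint32),
        ("optheadersize", ctypes.c_uint16),
        ("characteristics", ctypes.c_uint16),
    ]


# Invented header bytes standing in for the start of an object file.
data = struct.pack("<HHIIIHH", 0x8664, 3, 0, 200, 10, 0, 0)
hdr = CoffHeader.from_buffer_copy(data)
```

Swapping two fields then means swapping two lines in _fields_, and the code that uses hdr.machine and so on does not change.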

But I guess that this was about you proving that pointers were
unnecessary...

It was about you thinking that your own code using pointers could not be implemented neatly in languages without pointers.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to Bart on Fri Nov 18 11:03:04 2022

On 17/11/2022 12:24, Bart wrote:

On 17/11/2022 10:34, David Brown wrote:

On 16/11/2022 23:01, Bart wrote:

But Unicode makes everything harder, with characters taking up
multiple bytes, and a lot of the time it just doesn't work. (I've
seen Unicode errors on everything from TV subtitles to supermarket
receipts, and that was a few weeks ago.)

That's not a Unicode problem - that's a software bug.

It means even the big boys have issues with it.

What makes you think software for that kind of thing is made by "big
boys"? It's as likely to be small companies or single developers, as
anything else. /I/ have made software that printed out receipts in
machines in supermarkets, and that was in assembly with no Unicode in
sight. (It was a long time ago.)

And the "big boys" make mistakes as often as the small folks, both in
terms of understanding the problem, knowing about the solutions, and implementing the code.


From David Brown@21:1/5 to James Harris on Fri Nov 18 12:00:56 2022

On 15/11/2022 18:32, James Harris wrote:

On 14/11/2022 15:23, David Brown wrote:

On 14/11/2022 11:47, Bart wrote:

...

In-place, value-returning increment ops written as ++ and -- are
common in languages.

Yes. And bugs are common in programs. Being common does not
necessarily mean it's a good idea.

(It doesn't necessarily mean it's a bad idea either - I am not
implying that increment and decrement are themselves a major cause of
bugs! But mixing side-effects inside expressions /is/ a cause of bugs.)

The side effects of even something awkward such as

*(++p) = *(q++);

are little different from those of the longer version

p = p + 1;
*p = *q;
q = q + 1;

The former is clearer, however. That makes it easier to see the intent.

Really? I have no idea what the programmer's intent was. "*p++ =
*q++;" is common enough that the intent is clear there, but from your
first code I can't see /why/ the programmer wanted to /pre/increment
"p". Maybe he/she made a mistake? Maybe he/she doesn't really
understand the difference between pre-increment and post-increment?
It's a common beginner's misunderstanding.

On the other hand, it is quite clear from the separate lines exactly
what order the programmer intended.

What would you say are the differences in side-effects of these two code snippets? (I'm assuming we are talking about C here.)

Just blaming operators you don't like is unsound - especially since, as
you seem to suggest below, you use them in your own code!!!

All I am saying is that it's worth considering the advantages and
disadvantages of making a decision about such operators. I'm not
denying that the operators can be useful - I am questioning whether
those uses are enough in comparison to the advantages of /not/ having them.

When I program in C, I use the features of C as best I can in the
clearest way. I don't change the language (though I limit myself to a
subset of the possibilities, of course, as everyone does in every language).

...

[discussion of ++ and -- operators]

Is your point that you shouldn't have either of those operators?

Yes! What gave it away - the first three or four times I said as much?

...

... (Of course I use the increment operator, especially in loops, because
that's how C is written. But a new language can do better than that.)

If you think ++ and -- shouldn't exist then why not ban them from your
own programming for a while before you try to get them banned from a new language?

Banning them from C would have no benefits because C has side-effects in expressions, and avoiding the operators won't change that. Besides, the
real problem is not careful and considered use, it is the /abuse/ that
some people call "clever coding". It is not the "char c = *p++;" that
is the problem, it is the "++E*++".


From Dmitry A. Kazakov@21:1/5 to David Brown on Fri Nov 18 11:47:45 2022

On 2022-11-18 11:03, David Brown wrote:

And the "big boys" make mistakes as often as the small folks, both in
terms of understanding the problem, knowing about the solutions, and implementing the code.

Big boys make more mistakes due to mismanagement typical for big
organizations.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


From David Brown@21:1/5 to Bart on Fri Nov 18 11:48:43 2022

On 16/11/2022 00:11, Bart wrote:

On 15/11/2022 18:05, David Brown wrote:

On 15/11/2022 16:26, Bart wrote:

import struct        # Standard module
bs = open("potato_c.cof").read()

machine, nsections, timestamp, symtaboffset, nsymbols, optheadersize, characteristics = struct.unpack_from("<HHIIIHH")

That's it. Three lines. I would not think of C for this kind of
thing - Python is /much/ better suited.

I don't believe you.

OK, I suppose. But I've done this in Python, I've done it in C, and I
would never choose C for any kind of serious file handling. Reading a
simple structure in C is okay, and of course if you already have the
struct defined in a C header, using it will save effort.

(BTW you might be missing an argument in that struct.unpack_from call.)

No, I am not. There is an optional third argument, but it is optional.

What about the second argument? I don't understand how the function call knows to get the data from 'bs'.

My apologies - I must have had a copy-and-paste screwup. I didn't see
it, and assumed you were talking about the optional "offset" parameter,
or perhaps the odd "/" character that appears in the documentation.
Yes, I was missing "bs" as the second parameter.
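
With the buffer passed as the second argument, the call could look like the sketch below. The header bytes are invented so that the example runs without the original file; with a real file the data would be read in binary mode, e.g. open(path, "rb").read().

```python
import struct

# Invented bytes standing in for the contents of the object file.
bs = struct.pack("<HHIIIHH", 0x8664, 3, 0, 200, 10, 0, 0) + b"rest of file"

# unpack_from(format, buffer, offset=0): the buffer is the second argument.
(machine, nsections, timestamp, symtaboffset, nsymbols,
 optheadersize, characteristics) = struct.unpack_from("<HHIIIHH", bs)
```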

<https://docs.python.org/3/library/struct.html>

Using that approach for the nested structs and unions of my other
example is not so straightforward. You basically have to fight for
every field.

You have to define every field in every language, or define the ones
you want along with offsets to skip uninteresting data.

When properly supported, you can define the fields of a struct just as
you would in any static language (see above example), and you can write handling code just as conveniently.

You don't have to manually write strings of anonymous letter codes and
have to remember their ordering everywhere they are used. That is just
crass.

You don't have to do that in Python either. But it is really convenient
for small and simple cases.

If you have something bigger or more complicated, you can do what you
can always do in programming - divide and conquer. When I wrote network
code for Modbus, I used one pack/unpack set for the common packet
header, another for the Modbus function specific fields, and another for
the data (using "H" * n for the format string of multiple 16-bit
unsigned types, since that's what Modbus likes).
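
A sketch of that divide-and-conquer style, assuming Modbus TCP and a "read holding registers" response. It is not the code referred to above; the function names are made up, and the layout used is the usual 7-byte MBAP header followed by a function code, a byte count and big-endian 16-bit registers.

```python
import struct

MBAP = ">HHHB"   # transaction id, protocol id, length, unit id


def build_read_response(transaction_id, unit_id, registers):
    """Pack a response frame: header, function-specific fields, then data."""
    data = struct.pack(">" + "H" * len(registers), *registers)
    pdu = struct.pack(">BB", 0x03, len(data)) + data     # function, byte count
    # The length field counts the unit id plus the PDU that follows it.
    header = struct.pack(MBAP, transaction_id, 0, len(pdu) + 1, unit_id)
    return header + pdu


def parse_read_response(packet):
    """Unpack the same three parts, one format string per part."""
    transaction_id, _protocol, _length, unit_id = struct.unpack_from(MBAP, packet, 0)
    offset = struct.calcsize(MBAP)
    function, byte_count = struct.unpack_from(">BB", packet, offset)
    offset += 2
    n = byte_count // 2
    registers = struct.unpack_from(">" + "H" * n, packet, offset)
    return transaction_id, unit_id, function, list(registers)


packet = build_read_response(1, 17, [10, 20, 300])
parsed = parse_read_response(packet)
```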

Or you can use ctypes: <https://docs.python.org/3/library/ctypes.html#structures-and-unions>

I've found struct pack/unpack to be very convenient and simple. It is particularly handy in combination with Python's array slicing, and easy combination of arrays or strings by just "adding" them.

I went out of my way to add such facilities in my scripting language,
because I felt it was important. So you can code just as you would in a static language but with the convenience of informal scripting.

Clearly you don't care for such things and prefer a hack.

You say "hack" as though there exists "right" and "wrong" ways to handle particular tasks. I use a language and techniques that are quick,
simple, and do the job. Other people can understand, modify and use the
code as they need. (In that respect, pretty much everything beats your solution - except maybe Perl :-) ) It's a solution, not a "hack".

The result is a tuple of unnamed fields. You really want a proper
record, which is yet another add-on, with a choice of several modules
depending on which set of characteristics you need.

You can do that in Python.
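
One way to get named fields with only the standard library is to feed the unpacked tuple into a namedtuple. This is a sketch with invented header bytes; CoffHeader and read_header are names made up for the example.

```python
import struct
from collections import namedtuple

CoffHeader = namedtuple(
    "CoffHeader",
    "machine nsections timestamp symtaboffset nsymbols optheadersize characteristics",
)


def read_header(data):
    """Unpack the fixed header and return it as a record with named fields."""
    return CoffHeader._make(struct.unpack_from("<HHIIIHH", data))


header = read_header(struct.pack("<HHIIIHH", 0x14C, 2, 0, 100, 5, 0, 0))
```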

Yeah, I know, you can do anything in Python, since there is an army of
people who will create the necessary add-on modules to create ugly and cumbersome bolted-on solutions.

Do you actually know much about Python? Maybe I've been assuming too much.

I can list dozens of things that my scripting language does better than Python. (Here, such a list exists: https://github.com/sal55/langs/blob/master/QLang/QBasics.md.)

I'm not falling for that again. Suffice to say that you can assume any
other programmer (not just me) will throw out at least half the list
because they disagree with your opinions. (Which half will, of course,
vary enormously.) And anyone with experience with Python and who is
insanely bored could list hundreds or thousands of things that are
better in Python - assuming they could find solid documentation for your language.

In short, you are making up shit in an attempt to make your own
language look better than other languages, because you'd rather say
something silly than admit that any other language could be better in
any way for any task.

Not at all. Python is better for lots of things, mainly because there
are a million libraries that people have written for it, armies of
volunteers who have written suitable bindings or written all sorts of
shit. And there is a huge community and lots of resources to help out.

It is also full of as many advanced, esoteric features that you could
wish for.

But it is short of the more basic and primitive features of the kind I
use and find invaluable.

'struct' is also not a true Python module; it's a front end for an
internal one called `_struct`, likely implemented in C, and almost
certainly using pointers.

Please re-think what you wrote there. I hope you can realise how
ridiculous you are being.

Tell me. Maybe struct.py could be written in pure Python; I don't know.

Your last point is quite telling - you /don't know/, yet you feel
qualified to make claims about it. But you are missing the /real/ point
- the implementation is /irrelevant/. It doesn't matter if "struct" is
written in pure Python, or as a module written in C, or anything else.
It is a standard Python library module (thus not an "add-on" or "some
weird extra module"). When people write code in the Python language
using the standard Python library to do tasks like the one you gave
above, they do not use pointers.

(In case you are curious, some implementations of Python - such as PyPy
- have pure Python implementations of modules like "struct". Others,
such as CPython, have C modules for that kind of thing.)

I'm saying I guarantee mine would have the necessary features to do so.

But this started off being about pointers. Here's another challenge:
this program is for Windows, and displays the first 2 bytes of the
executable image of the interpreter, as loaded in memory:

    println peek(0x400000, u16):"m"

    fun peek(addr, t=byte) = makeref(addr, t)^

This displays 'MZ' (the signature of PE files on Windows). But of
interest is how Python would implement that peek() function.

You think it is a good thing that programs have direct access to the
memory of their interpreter's executable? Really?

    import struct
    bs = open("/usr/bin/python", "rb").read()  # binary mode: unpack_from needs bytes
    print(struct.unpack_from("H", bs, 0x40000)[0])

That prints the 16-bit value at offset 0x40000 from the start of the
"python" file. Was that what you wanted?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Bart@21:1/5 to David Brown on Fri Nov 18 12:22:32 2022

On 18/11/2022 10:48, David Brown wrote:

On 16/11/2022 00:11, Bart wrote:

I can list dozens of things that my scripting language does better
than Python. (Here, such a list exists:
https://github.com/sal55/langs/blob/master/QLang/QBasics.md.)

I'm not falling for that again. Suffice to say that you can assume any other programmer (not just me) will throw out at least half the list
because they disagree with your opinions. (Which half will, of course,
vary enormously.) And anyone with experience with Python and who is insanely bored could list hundreds or thousands of things that are
better in Python - assuming they could find solid documentation for your language.

The point of that list was to show the basics that are either missing or
that have clunky bolted-on implementations, or that have to be achieved
by abusing advanced features (I'm thinking of people who have added even
'goto' to Python).

As one example, if I want the ASCII code for "A", I write 'A' as one
might do in C.

In Python it's ord("A") (which used to involve a global lookup then a
function call; perhaps it still does, since 'ord' might have been
reassigned).

In Lua it's string.byte("A"). Just what you need in interpreted code:
extra overheads!

What I'm saying is that there are too many advanced features with less
support for fundamental ones.

This displays 'MZ' (the signature of PE files on Windows). But of
interest is how Python would implement that peek() function.

You think it is a good thing that programs have direct access to the
memory of their interpreter's executable? Really?

I don't claim my language is safe. I made a decision that still allowing
raw pointers in scripting languages can be useful in the right hands.
(However these have some extra protections compared with their C
equivalents.)

But Python has so many possibilities with Cython or Ctypes or C
extension modules, that I'm sure just as much mischief can be done if
somebody wants, but the whole thing is so complicated that you can't
audit that easily.

    import struct
    bs = open("/usr/bin/python", "rb").read()  # binary mode: unpack_from needs bytes
    print(struct.unpack_from("H", bs, 0x40000)[0])

That prints the 16-bit value at offset 0x40000 from the start of the
"python" file. Was that what you wanted?

Not quite. Your code just reads 2 bytes from a file (at a 256K offset
from the start of the file; the MZ signature is at offset 0).

My code accesses my task's virtual memory space, where the executable
image is placed at offset 0x400000 (4MB) from address 0x000000. Think of
it as a form of reflection.
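
For comparison, a rough Python counterpart to peek() can be sketched with the standard ctypes module. This is only a sketch under assumptions: the name peek and its parameters are illustrative, and the demonstration reads from a buffer the script itself owns, because 0x400000 is only meaningful for an image loaded at that base, and reading an unmapped address would crash the interpreter.

```python
import ctypes

def peek(addr, ctype=ctypes.c_ubyte):
    """Return the value of type `ctype` stored at raw address `addr`."""
    return ctype.from_address(addr).value

# Demonstrate on memory we own rather than on a hard-coded image base.
buf = ctypes.create_string_buffer(b"MZ\x90\x00")
addr = ctypes.addressof(buf)

sig = peek(addr, ctypes.c_uint16)   # 16-bit value, native byte order
raw = ctypes.string_at(addr, 2)     # the same two bytes, as a bytes object
print(raw.decode("ascii"), hex(sig))
```

The default of one unsigned byte mirrors the t=byte default in the original; whether a script should be allowed to do this at all is the separate question raised above.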

From Andy Walker@21:1/5 to David Brown on Fri Nov 18 14:09:03 2022

On 18/11/2022 07:12, David Brown wrote:

If I want two things to change, why try to squeeze it into /one/
expression or statement? Why not write two statements, each one
doing a single clear and simple task?

I think there is a danger of over-egging this sort of case, partly caused by writing simple examples. It depends on context, but one reason
to write one expression rather than two statements is to keep code short
and structured:

a[ i +:= 1 ] := x

vs

begin
   i +:= 1;
   a[i] := x
end

[on the assumption that this is the controlled clause of a conditional or
loop, so needs to be turned into one statement]. Repeated several times
in a few lines of code, it turns a half-page procedure you can comprehend
"at a glance" into a page-and-a-half for which you have to keep scrolling
up and down, the "begins" and "ends" [or moral equivalents] start to
obtrude, and the whole structure of the code is less clear. You've
replaced one task ["assign to the next element of the array"] by two,
that may or may not be linked.
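
As a rough analogue in another language (Python 3.8 or later, chosen here only for illustration; the variable names are made up), the two forms might look like this. It is a sketch of the trade-off, not a claim about which form is better:

```python
a = [0] * 5
i = 0
x = 42

# One expression: bump i, then assign to the element it now names.
a[(i := i + 1)] = x

# Two statements doing the same job for the next element.
i += 1
a[i] = x

print(a, i)
```

In the first form the increment and the store are visibly one task; in the second the reader has to notice that the two statements belong together.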

Clearly this can be overdone, but there is a balance to be struck,
and I would argue that Pascal and, judging by Dmitry's tokeniser example,
Ada are too far one way and C[++] too far the other.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Couperin

From Bart@21:1/5 to David Brown on Fri Nov 18 16:52:33 2022

On 18/11/2022 07:12, David Brown wrote:

On 16/11/2022 21:04, Bart wrote:

That's fine. But you wouldn't enter a discussion with a linguist who
has some experience with Chinese, and try to tell them that Chinese
grammar is beyond human comprehension. You could say that /you/ think
it looks like Chinese writing would be hard to learn - but you could
/not/ say anything about how hard it is for Chinese speakers to learn.
You could not even say that it really would be difficult for you to
learn, because you haven't tried or investigated enough.

I mentioned Chinese because I and a friend did attend a short (and free)
taster course.

But I also mentioned it because it's not that useful in the UK, even if
I'd mastered some of it.

It's also interesting because if you look at the English Wikipedia entry
on China, the web page source is something like 99% ASCII, with the rest Unicode. But even if you look at the /Chinese/ page, the page source is
still 80% ASCII (mainly due to tags etc)

This ties in with the discussion on Unicode, and also my point about
giving weight to more basic language features.

Unicode understands the importance of ASCII, since ASCII occupies the
first 128 Unicode codepoints.
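
A quick way to check both points, sketched in Python. The helper name ascii_share and the sample string are made up for illustration; the percentages quoted above came from real page sources, not from this snippet.

```python
def ascii_share(text):
    """Fraction of the UTF-8 bytes of `text` that are plain ASCII (< 0x80)."""
    data = text.encode("utf-8")
    if not data:
        return 1.0
    return sum(b < 0x80 for b in data) / len(data)

# Code points 0..127 encode to the identical single byte in UTF-8.
same = all(chr(c).encode("utf-8") == bytes([c]) for c in range(128))

sample = "<p>Hello, 世界</p>"   # markup is ASCII even when the text is not
print(same, round(ascii_share(sample), 2))
```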

(Imagine if ASCII had been relegated to some obscure alphabet.)

I can fully respect your personal preferences - that's not the issue for me. I find it sad and disappointing that someone can have such strong opinions about something they have never really considered or tried to
learn about,

I've tried lots of times. I've even tried implementing some of it.

and consequently don't understand, but I guess that is
human nature. It's ancient survival tactics, honed by evolution - when something is new, different and unknown, human instinct is to fear it
and run away, rather than investigate and study it.

There is a VAST amount of complexity about now, most of which is not
justified. And that applies across everything - processors, computers, operating systems, languages, tools and applications.

It all barely manages to work, mainly thanks to just throwing more
resources at any problems instead of tackling it at root, by simplifying.

The big thing in new languages now is not really advanced kinds of
functions, it is even more advanced and esoteric type systems.

I'm not interested in advanced functions and even less in types. You can
do a huge amount with the basics; what you do can be understood by more
people; and it is accessible to more people. It's probably also easier
to make things efficient.

Tell me what the latest thing is in English now; what's new? There are
lots of new words about, but most people communicate using a much more
basic vocabulary.

The bit that really bugs me is how you (and James) can hold such strong opinions about how /other people/ might like and use these features and languages that support them. Is it so hard to accept that some people
like using higher order functions?

Is it so hard to accept that the vast majority of functions in any
application are VERY ordinary? So why make so much of the tiny minority
that need those special features?

Look at any library: OpenGL exports 500 ordinary functions; GTK and
Windows perhaps the best part of 10,000 functions each.

Or that some people write code in
functional programming languages, because they find it a better choice
for their needs?

I said I don't care what other people use.

I care when functional style is inflicted on me, but if I reject it,
then I'm just being deliberately ignorant; a diehard; a Luddite; a stick-in-the-mud.

All I see is something that makes ordinary coding much harder to do and
harder to understand.

Is it so hard to accept that other people can write
code for the same task in widely different languages, and /your/ code in /your/ language is not the "perfect" solution or the only "non-clunky"
code?

I accept my 1980s-style code is clunky. But it still works extraordinarily
well. Code in my scripting language is clear enough and conservative
enough to act as pseudo-code.

(There was a period in clc when Ben Bacarisse used to post short 10-line solutions to some task in Haskell. Except that they were rather
difficult to understand.

I had a habit then where I posted solutions in my scripting language
which were far simpler for /anyone/ to understand. But mine might have
been 15 lines instead of 10.)

If I want two things to change, why try to squeeze it into /one/
expression or statement? Why not write two statements, each one doing a single clear and simple task?

What about:

    swap(a, b)               # from my stuff
    (a, b) = (b, a)          # from Python
    (a, b) = f()             # multiple function return values
    x := (a, b, c, d)        # record/struct constructor assigns 4 fields
    A[3..6] := (1,2,3,4)     # change 4 elements of A

This stuff happens everywhere.
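
For what it's worth, most of these have a direct Python spelling. The following is a sketch only: f, Rec and the values are invented for the example, and the inclusive range 3..6 is assumed to correspond to the Python slice 3:7.

```python
from collections import namedtuple

a, b = 1, 2
a, b = b, a                 # swap: two variables change in one statement

def f():
    return 10, 20           # multiple return values, packed as a tuple

c, d = f()

Rec = namedtuple("Rec", "a b c d")
x = Rec(1, 2, 3, 4)         # one constructor call sets four fields

A = list(range(10))
A[3:7] = [1, 2, 3, 4]       # one statement changes four elements

print(a, b, c, d, x, A)
```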

(As to writing "i" three times - again, these things are often found in loops, where a good syntax can mean "i" is never written at all.)

If you're talking about C-style for-loops, that remains one of the great mysteries of the world: why so many languages would copy such a crappy
feature:

for (i=a; i<=b; ++i)

instead of:

for i in a..b # or countless variations

But it's amusing that people who defend that C-style loop dismiss that
superior version and say, Ah, but you shouldn't be using such iteration
at all! Let's jump a language level or two and go straight into advanced
array manipulation features, or straight to functional code.

Here, show me this in any language:

    to n do
        println random()
    od

Just repeat-N-times. In terms of low-hanging fruit of simple-to-add and simple-to-understand and convenient features, it is one of the lowest.
But as a feature it is rare.
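
The nearest Python equivalent, as a sketch, still needs a loop variable and a range object even though neither is used (n and values are names invented for the example):

```python
import random

n = 5
values = []
for _ in range(n):          # "_" is only a convention for an unused variable
    values.append(random.random())

print(values)
```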

Yeah, just like C! If you think this lot is just C with a paint-job,
then you're in denial.

Yes, it is a lot like C.

Yes, it is that /level/ of language. But done properly (IMO).

It has a number of changes, some that I think
are good, some that I think are bad, but basically it is mostly like C.

There are some massive changes. Even C++ has only just acquired modules.

In particular, it's quite clear to me that when you developed your
language, you had the assembly level implementation heavily in mind when doing so. Why are your numbers 64-bit integers by default?

What other size makes sense these days? C has to use i32 for
compatibility reasons, although the language itself would allow i64.

It is not
because it is a particularly useful size for integers, but because it
fits the cpu's you are targeting. Why do you want integer types that
are powers of [2]?

For the same reason that 10 other languages I can't be bothered to list,
that have fixed-width integer types, do so too.

Yes you can have the luxury of ignoring the CPU, but you're going to
have a slower and/or more complex language and/or a more massive and
slower compiler or JIT to get it up to speed.

It's a low-level language.

I've never said anything else. My two languages are marked as M and Q
here, on a scale of language level:

C--M-----Q---------------Python

Plus other kinds of languages exist outside of that line. It's a
particular niche I find useful for the things I want to do.

Of course, I fully expect you to be completely dismissive of all of
this. I wouldn't swap any of these for higher-order functions.

I can't imagine why you would think adding higher-order functions would
mean dropping any of it.

Languages having such functions ahead of nearly everything else annoy
me. One new language proposal over on Reddit described 3 kinds of functions.

The first two were advanced ones; the last were ordinary 'named'
functions, as are used for 99% of the functions in real applications.

They were almost an afterthought!

"Intuitive" means you've used it often enough to use the feature without thinking about it. Nothing more. Stop imagining that everything you learned along your programming career is somehow easier than other
methods seen by other people. It's so long since you learned to program that you've forgotten how it goes. When you have a long history of programming in ALGOL, assembly, and perhaps a spot of FORTRAN or BASIC,
the step to C or your own language is minor. That makes it /seem/ intuitive, but it is not - it's just what you are used to.

It was also what I could do within 16KB of memory on an 8-bit processor, written in an assembler written in hex machine code.

Then I realised how much potential even that simple language had. With
that language (and with an extra 16KB data memory and 8KB video memory),
I worked on two kinds of programs I had an interest in, apart from the
language stuff:

* 3D vector graphics (which involved emulating floating point
arithmetic, as well as line-drawing algorithms etc)

* Video processing based on capturing image data from a frame grabber.

The language really isn't that important. You don't need anything fancy
or sophisticated, the basics will do. Plus some conveniences I've added.
Yes, even in 2022.

No. I think you should be happy to accept that you don't know anything about functional programming, and haven't the inclination or motivation
to learn, and leave it at that.

Let's leave it then.

I also think that anyone interested in becoming a better /programmer/ or software developer, rather than just a better /coder/, should learn some functional programming. You'll be a better imperative programmer for it.

It would have to be /my/ functional language, that's not going to happen.

But for you, personally, I think your prejudice and biases (or
"intuition") are too fixed. You'll never look at something new with an
open mind, so there is little point.

You mean, like you are so sceptical of my bit operations; or
'tabledata'; or whole-program compilation (which simply changes the
granularity at which a compiler works).

I've also long abolished 'linkers'; I've introduced run-from-source,
even for compiled code; I eliminated the distinction between expressions
and statements;...

Quite a lot of innovative stuff. Just not the sort of thing that you
think adds value.

And then, you dismiss my whole-program compiler, but extol link-time-optimisation.

(Examples of 'tabledata', now called 'enumdata' when enums are involved:

https://github.com/sal55/langs/blob/master/MLang/Examples/aa_tables.m)

So what? What do bitfield extraction operators give you here? Or
multiple return values? Sod-all.

People can see the utility of bitfield extraction and multiple return
values, even when writing reams of very dull code. You must have seen
GETBIT and SETBIT macros in C.
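
A sketch of what such helpers amount to, written as plain Python functions. The names getbit, setbit and getbits and their argument order are assumptions made for this example, not the operators of any particular language:

```python
def getbit(x, i):
    """Return bit i of x (bit 0 is the least significant)."""
    return (x >> i) & 1

def setbit(x, i, v=1):
    """Return x with bit i replaced by v (0 or 1)."""
    return (x & ~(1 << i)) | ((v & 1) << i)

def getbits(x, lo, hi):
    """Return the field made of bits lo..hi inclusive, shifted down to bit 0."""
    width = hi - lo + 1
    return (x >> lo) & ((1 << width) - 1)

print(getbit(0b1010, 1), bin(setbit(0b1010, 0)), hex(getbits(0xABCD, 4, 11)))
```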

The benefits of currying are far more elusive.

There is a proposal to add metaclasses to C++.

Why not? It has everything else!

From James Harris@21:1/5 to James Harris on Fri Nov 18 18:24:20 2022

On 16/11/2022 23:02, James Harris wrote:

On 14/11/2022 18:41, Dmitry A. Kazakov wrote:

On 2022-11-14 19:26, James Harris wrote:

On 14/11/2022 11:29, Dmitry A. Kazakov wrote:

...

    Index     : Integer := Pointer + 1;
    Malformed : Boolean := False;
    Underline : Boolean := False;
    Symbol    : Character;
begin
    while Index <= Line'Last loop
       Symbol := Line (Index);
       if Is_Alphanumeric (Symbol) then
          Underline := False;
       elsif '_' = Symbol then
          Malformed := Malformed or Underline;
          Underline := True;
       else
          exit;
       end if;
       Index := Index + 1;
    end loop;
    Malformed := Malformed or Underline;

...

    errors = 0
    last_char = line(pointer)
    rep for i = pointer + 1, while i le line_last, ++i
      ch = line(i)
      if ch eq '_'
        if last_char eq '_' so ++errors ;Consecutive underscores
      on not is_alphanum(ch)
        break rep ;If neither underscore nor alphanum we are done
      end if
      last_char = ch
    end rep
    if last_char eq '_' so ++errors ;Trailing underscore

...

It occurred to me that the code could be made yet shorter, leading to

    errors = 0
    rep for i = pointer, while ++i le line_last
      if line(i) eq '_'
        if line(i - 1) eq '_' so ++errors
      on not alphanum(line(i))
        exit rep ;Neither underscore nor alphanum so we're done
      end if
    end rep
    if line(i - 1) eq '_' so ++errors ;Trailing underscore

Overall, this bit of the code has gone from 17 lines to 12 to 9. That's
an appreciable reduction. And no tricks were involved. This is basic
refactored code.
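
For anyone who wants to run the logic, here is a sketch of the same scan in Python. It rests on assumptions: indexing is 0-based, line[pointer] is taken to be the already-accepted first character of the identifier, the function name is invented, and str.isalnum() stands in for alphanum() (it also accepts non-ASCII letters and digits).

```python
def count_underscore_errors(line, pointer):
    """Scan an identifier starting at line[pointer].

    Returns (errors, i): the number of doubled or trailing underscores,
    and the index just past the last character of the identifier.
    """
    errors = 0
    i = pointer + 1
    while i < len(line):
        if line[i] == "_":
            if line[i - 1] == "_":
                errors += 1          # consecutive underscores
        elif not line[i].isalnum():
            break                    # neither underscore nor alphanumeric
        i += 1
    if line[i - 1] == "_":
        errors += 1                  # trailing underscore
    return errors, i

print(count_underscore_errors("ab_c+1", 0))
print(count_underscore_errors("a__b", 0))
```

Python has no "++", so the increment sits at the bottom of the loop, much as in the 12-line version.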

Don't get me wrong. I'm not saying that short code is in and of itself a benefit. But if the 'complexity density' remains the same then less code
does help: it gives a programmer less to read, less to understand, less
to keep in his head, and less to debug.

As well as having fewer lines, some of the lines in the latter version
are also shorter and simpler than those in the version which preceded it.

The most recent changes:

I stripped out the variables ch and last_char. Referring, instead, to
line(i) and line(i - 1). IMO that makes clearer what's being referred
to: the char at the current index and the one before.

Another change is relevant to this thread: I made better use of one of
the "++" operators. The prior code began its loop with

i = pointer + 1, while i ...., ++i

That code was an obvious candidate for changing to the simpler

i = pointer, while ++i ....

I return to my earlier assertion: allowing programmers to use the nudge operators ("++" etc) allows certain algorithms to be expressed more
elegantly, more clearly, and more simply.

--
James Harris

From James Harris@21:1/5 to David Brown on Fri Nov 18 19:01:20 2022

On 15/11/2022 21:40, David Brown wrote:

On 15/11/2022 20:09, James Harris wrote:

On 15/11/2022 17:31, David Brown wrote:

On 15/11/2022 17:58, James Harris wrote:

...

The question is not whether prevention would be possible but whether
you (i.e. DB) would consider it /advisable/. If you prevented it then
a lot of familiar programming patterns and a number of existing APIs
would become unavailable to you so choose wisely...! :-)

I am not the language designer here

Uh, huh.

- and I still don't really grok what
kind of language /you/ want, what you understand from before, what uses
it should have, or what you think is wrong with existing languages. (Or maybe this is all for fun and interest, which is always the best reason
for doing anything.) That makes it hard to give recommendations.

...

You assume /so/ many limitations on what you can do as a language
designer. You can do /anything/. If you want to allow something,
allow it. If you want to prohibit it, prohibit it.

Sorry, but it doesn't work like that.

Yes, it does.

No, it does not. Your view of language design is far too simplistic.
Note, also, that in a few paragraphs you say that you are not the
language designer whereas I am, but then you go on to try to tell me how
it works and how it doesn't and, previously, that anything can be done.
You'd gain by /trying/ it yourself. Then you might see that it's not as straightforward as you suggest.

A language cannot be built on ad-hoc choices such as you have suggested.

I haven't suggested ad-hoc choices. I have tried to make reasoned suggestions. Being different from languages you have used before, or
how you envision your new language, does not make them ad-hoc.

Saying you'd like selected combinations of operators to be banned looks
like an ad-hoc approach to me.

...

BTW, any time one thinks of 'treating X separately' it's good to be
wary. Step-outs tend to make a language hard to learn and awkward to
use.

So do over-generalisations.

Not really.

Yes, really.

You simply repeating phrases back to me but in the negative does not
make your assertions correct or mine wrong.

Imagine if you were to stop treating "letters", "digits" and
"punctuation" separately, and say "They are all just characters. Let's treat them the same". Now people can name a function "123", or "2+2".
It's conceivable that you'd work out a grammar and parsing rules that
allow that (Forth, for example, has no problem with functions that are
named by digits. You can redefine "2" to mean "1" if you like). Do you think that would make the language easier to learn and less awkward to use?

Certainly not. Why do you ask?

It's ad-hoc rules which become burdensome.

Agreed.

Phew!

...

Seriously, try designing a language, yourself. You don't have to
implement it. Just try coming up with a cohesive design of something
you would like to program in.

If I had the time... :-)

I fully appreciate that this is not an easy task.

I'm sure you do but your view of the details is superficial. You have
ideas which are interesting in themselves but you don't appear to
appreciate how decisions bear on each other when you have to bring
hundreds of them together.

...

Bart came up with an example something like

+(+(+(+ x)))

That's not at all sensible. You want that banned, too?

Yes :-) Seriously, I appreciate that there will always be compromises - trying to ban everything silly while allowing everything sensible would
mean countless ad-hoc rules, and you are right to reject that. I am advocating drawing a line, just like you - the difference is merely a
matter of where to draw that line. I'd draw the line so that it throws
out the increment and decrement operators entirely. But if you really wanted to keep them, I'd make them postfix only and as statements, not
in expressions - let "x++" mean "x += 1" which means "x = x + 1" which
should, IMHO, be a statement and not allowed inside an expression.

This does, indeed, in a sense, come down to where the designer decides
to draw the line. Unfortunately there is no simple line.

For example, you spoke about banning side effects in expressions. For
sure, you could do that. But then you thought side effects in function
calls in expressions should possibly be treated differently and be left
in! Making such rules is not as simple as it may appear. All the
decisions a language designer makes have a tendency to bear on each
other, even if only in the ethos of the final language: whether it's
simple and cohesive or ad hoc and confusing.

Further, remember that the decisions the language designer makes have to
be communicated to the programmer. If a designer says "these side
effects are allowed but these other ones are not" then that just gives
the programmer more to learn and remember.

As I say, you could try designing a language. You are a smart guy. You
could work on a design in your head while walking to the shops, while
waiting for a train, etc. As one of my books on language design says,
"design repeatedly: it will make you a better designer".

--
James Harris

From James Harris@21:1/5 to David Brown on Fri Nov 18 20:14:52 2022

On 18/11/2022 11:00, David Brown wrote:

On 15/11/2022 18:32, James Harris wrote:

...

The side effects of even something awkward such as

   *(++p) = *(q++);

are little different from those of the longer version

   p = p + 1;
   *p = *q;
   q = q + 1;

The former is clearer, however. That makes it easier to see the intent.

Really? I have no idea what the programmer's intent was. "*p++ =
*q++;" is common enough that the intent is clear there, but from your
first code I can't see /why/ the programmer wanted to /pre/increment
"p". Maybe he/she made a mistake? Maybe he/she doesn't really
understand the difference between pre-increment and post-increment? It's
a common beginner's misunderstanding.

I don't think I know of any language which allows a programmer to say
/why/ something is the case; that's what comments are for. Programs
normally talk about /what/ to do, not why. The very fact that the
assignment does something non-idiomatic is a sign that a comment could
be useful. It's akin to

for (i = 0; i <= n ....

If the test really should be <= then a comment may be useful to explain why.

On the other hand, it is quite clear from the separate lines exactly
what order the programmer intended.

What would you say are the differences in side-effects of these two code snippets? (I'm assuming we are talking about C here.)

That depends on whether the operations are ordered or not. In C they'd
be different, potentially, from what they would be in my language. What
would you say they are?
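
As a point of reference, here is a small sketch with list indices standing in for pointers. It assumes p and q are distinct variables and that src and dst do not alias; under those assumptions both forms reach the same end state. The function names are invented for the example.

```python
def one_liner(src, dst, p, q):
    # Models *(++p) = *(q++): p is bumped before use, q after use.
    p += 1
    value = src[q]
    q += 1
    dst[p] = value
    return p, q

def three_statements(src, dst, p, q):
    p = p + 1
    dst[p] = src[q]
    q = q + 1
    return p, q

src = [10, 20, 30]
d1, d2 = [0, 0, 0], [0, 0, 0]
r1 = one_liner(src, d1, 0, 0)
r2 = three_statements(src, d2, 0, 0)
print(r1, d1, r2, d2)
```

The interesting cases are the ones this sketch excludes, where the same object is modified or read more than once in the expression.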

Just blaming operators you don't like is unsound - especially since,
as you seem to suggest below, you use them in your own code!!!

All I am saying is that it's worth considering the advantages and disadvantages of making a decision about such operators. I'm not
denying that the operators can be useful - I am questioning whether
those uses are enough in comparison to the advantages of /not/ having them.

It's always useful to have one's preferences challenged. :-)

--
James Harris

From James Harris@21:1/5 to David Brown on Fri Nov 18 19:31:55 2022

On 18/11/2022 07:12, David Brown wrote:

...

... I find it sad and disappointing that someone can have such strong opinions about something they have never really considered or tried to
learn about, ...

A bit like those who have strong opinions on how a programming language
should be designed but have never designed one themselves, you mean?!

and consequently don't understand, but I guess that is
human nature. It's ancient survival tactics, honed by evolution - when something is new, different and unknown, human instinct is to fear it
and run away, rather than investigate and study it.

The bit that really bugs me is how you (and James) can hold such strong opinions about how /other people/ might like and use these features and languages that support them.

There's an old maxim: Try and design a language for other people to use
and you'll end up with PL/I or Cobol. Try and design a language for
yourself and you might find that others of like mind also appreciate it.

That's a reasonable maxim although I'd add to keep it simple: cut down
on those ad-hoc rules or others will find it too hard to remember, even
if you as the language designer can remember them all.

--
James Harris

From James Harris@21:1/5 to Dmitry A. Kazakov on Fri Nov 18 19:19:05 2022

On 15/11/2022 12:14, Dmitry A. Kazakov wrote:

On 2022-11-15 12:44, James Harris wrote:

Do you also believe that the Unix

   bytes = read(fd, &buf[1], reqd);

should be prohibited since it has the side effect within the
expression of modifying the buffer? If so, what would you replace it
with??

That is simple. Ada's standard library has it:

   procedure Read
             ( Stream : in out Root_Stream_Type;
                Item   : out Stream_Element_Array;
                Last   : out Stream_Element_Offset
             ) is abstract;

Item is an array:

type Stream_Element_Array is
   array (Stream_Element_Offset range <>) of aliased Stream_Element;

It is also a "virtual" operation in C++ terms to be overridden by new implementation of stream. Last is the index of the last element read.
Notice non-sliding bounds, as you can do this:

   Last := Buff'First - 1;
   loop
      Read (S, Buff (Last + 1..Buff'Last), Last); -- Non-blocking chunk
      exit when Last = Buff'Last;                 -- Done
   end loop;

Since bounds do not slide Last stays valid for all array slices.

That's cool. So the call passes to Read a 'virtual' array, aka a view of
part of an array, and Last is /output/ from the call? Presumably the
array is made into a view (rather than an actual array) by means of the "aliased" keyword. Is that correct?

Since Last is output from Read why do you set it before the loop starts?

If there's no more data does Read throw an exception?

It's interesting that the array is termed an /out/ parameter even though
only part of it might be overwritten!

--
James Harris

From Dmitry A. Kazakov@21:1/5 to James Harris on Fri Nov 18 21:46:44 2022

On 2022-11-18 20:19, James Harris wrote:

On 15/11/2022 12:14, Dmitry A. Kazakov wrote:

On 2022-11-15 12:44, James Harris wrote:

Do you also believe that the Unix

   bytes = read(fd, &buf[1], reqd);

should be prohibited since it has the side effect within the
expression of modifying the buffer? If so, what would you replace it
with??

That is simple. Ada's standard library has it:

    procedure Read
              ( Stream : in out Root_Stream_Type;
                 Item   : out Stream_Element_Array;
                 Last   : out Stream_Element_Offset
              ) is abstract;

Item is an array:

type Stream_Element_Array is
    array (Stream_Element_Offset range <>) of aliased Stream_Element;

It is also a "virtual" operation in C++ terms to be overridden by new
implementation of stream. Last is the index of the last element read.
Notice non-sliding bounds, as you can do this:

    Last := Buff'First - 1;
    loop
       Read (S, Buff (Last + 1..Buff'Last), Last); -- Non-blocking chunk
       exit when Last = Buff'Last;                 -- Done
    end loop;

Since bounds do not slide Last stays valid for all array slices.

That's cool. So the call passes to Read a 'virtual' array, aka a view of
part of an array, and Last is /output/ from the call? Presumably the
array is made into a view (rather than an actual array) by means of the "aliased" keyword. Is that correct?

Aliased is only needed for pointers. Stream_Element_Array has aliased
elements for interoperability with the OS. It means that you can take a
pointer to any array element and pass it down to our beloved C's fread (:-))

Since Last is output from Read why do you set it before the loop starts?

Because it moves from Buff'First - 1 to Buff'Last as we read the stream
into Buff.
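
For readers more at home in C, roughly the same pattern can be sketched with POSIX read(). This is only an illustration under stated assumptions: fill_buffer is a hypothetical helper name, the running count `filled` stands in for Last, and, unlike the Ada loop above, the sketch stops at end of file and treats EAGAIN on a non-blocking descriptor as an error instead of busy polling.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until `size` bytes have
   arrived, the stream ends, or an error other than EINTR occurs.
   `filled` plays the role of Last in the Ada loop: it records how much
   of the buffer is already valid and so where the next chunk goes. */
ssize_t fill_buffer(int fd, unsigned char *buf, size_t size)
{
    size_t filled = 0;                 /* like Last := Buff'First - 1 */

    while (filled < size) {            /* like exit when Last = Buff'Last */
        ssize_t n = read(fd, buf + filled, size - filled);
        if (n > 0) {
            filled += (size_t)n;       /* advance past the new chunk */
        } else if (n == 0) {
            break;                     /* end of file: return what we have */
        } else if (errno != EINTR) {
            return -1;                 /* real error; errno says which */
        }
    }
    return (ssize_t)filled;
}
```

Because C has no slices with their own bounds, the caller has to pass the offset pointer and the remaining length by hand, which is exactly the bookkeeping the non-sliding Ada bounds avoid.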

If there's no more data does Read throw an exception?

No, it returns available data. So the above is busy polling when S is non-blocking. If you need a non-busy interface you must have an event you
could wait for and reset (without falling into a race condition).
Usually, this stuff is kept inside the implementation of the stream.
E.g. S keeps an event, or creates one per each task/thread in the
extreme case of shared I/O. Read returns available data or waits for the
event which is reset atomically with getting buffered data.

It's interesting that the array is termed an /out/ parameter even though
only part of it might be overwritten!

There is not much difference between out and in out arrays. Basically out
means that the callee won't expect anything in the array. Note that the
bounds (and other constraints) cannot be changed. That is the key
difference between out-argument and a result. E.g. you cannot mutate an argument into something else.

You could try to refine I/O modes (views) like in, out, in out. E.g. the
file system and databases support blocking of portions of data. The corresponding I/O mode would be when some parts of the array are in the
in mode other parts are in the out or in out mode. You could consider
extending that on structures and other containers, of course.

I have no idea how to express that on the language level, but you get
the idea. Type algebra is an exciting subject. Alas, nobody pays any
attention these days.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From Dmitry A. Kazakov@21:1/5 to James Harris on Fri Nov 18 21:57:31 2022

On 2022-11-18 20:01, James Harris wrote:

On 15/11/2022 21:40, David Brown wrote:

Imagine if you were to stop treating "letters", "digits" and
"punctuation" separately, and say "They are all just characters.
Let's treat them the same". Now people can name a function "123", or
"2+2". It's conceivable that you'd work out a grammar and parsing
rules that allow that (Forth, for example, has no problem with
functions that are named by digits. You can redefine "2" to mean "1"
if you like). Do you think that would make the language easier to
learn and less awkward to use?

Certainly not. Why do you ask?

Well, let me intervene. Actually early languages played with such ideas.
It is worth mentioning Forth, Lisp, TeX, all sorts of preprocessors etc.
At that time the idea that you could "program" the language syntax was
very popular.

Then evolution of languages took a different turn. Polymorphism was
achieved through decomposition (first procedural, then OO, relational, functional) keeping the language syntax stable.

(Software reuse choked the ugly spawn. If each programmer created his
own new language reuse would not be possible)

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

From James Harris@21:1/5 to Bart on Sat Nov 19 16:05:15 2022

On 17/11/2022 11:24, Bart wrote:

...

If I wanted to display UTF8 right now on Windows, say from a C program
even, I would have to fight it. If I write this (created with Notepad):

   #include <stdio.h>
   int main(void) {
       printf("€°£");
   }

and compile with gcc, it shows:

Γé¼┬░┬ú

I'm not sure what code page it's on, but if I switch to 65001 which is supposed to be UTF8, then it shows:

��

(or equivalent in the terminal font). If I dump the C source, it does
indeed contain the E2 82 AC sequence which is the UTF8 for the 20AC code
for the Euro sign.

I'm sure that on Linux it works perfectly within a terminal window. But
I'm on Windows and I can't be bothered to do battle. Even if /I/ get it
to work, I can't guarantee it for anyone else.

I presume you piped the output into hd or xxd to see exactly what was
being sent to the terminal - and hopefully prove it was the correct UTF-8.

I can't comment on how to get Windows to display Unicode chars from
UTF-8. I found Windows terminal drivers (VT-102 etc) often failed to
produce the correct output whereas Unix terminal emulation for the same terminals looked perfect.

--
James Harris

From Bart@21:1/5 to James Harris on Sat Nov 19 16:20:47 2022

On 19/11/2022 16:05, James Harris wrote:

On 17/11/2022 11:24, Bart wrote:

...

If I wanted to display UTF8 right now on Windows, say from a C program
even, I would have to fight it. If I write this (created with Notepad):

   #include <stdio.h>
   int main(void) {
       printf("€°£");
   }

and compile with gcc, it shows:

Γé¼┬░┬ú

I'm not sure what code page it's on, but if I switch to 65001 which is
supposed to be UTF8, then it shows:

��

(or equivalent in the terminal font). If I dump the C source, it does
indeed contain the E2 82 AC sequence which is the UTF8 for the 20AC
code for the Euro sign.

I'm sure that on Linux it works perfectly within a terminal window.
But I'm on Windows and I can't be bothered to do battle. Even if /I/
get it to work, I can't guarantee it for anyone else.

I presume you piped the output into hd or xxd to see exactly what was
being sent to the terminal - and hopefully prove it was the correct UTF-8.

Well, gcc using puts, or bcc/tcc using puts or prints, work correctly.

For some reason gcc+printf doesn't deal with it properly. I assumed
those characters were raw UTF8 bytes, and redirecting it now to a file,
that is exactly what I get.

gcc+printf is bypassing something it shouldn't.

My language supports it too (but needs external tools to get that
Unicode text into my source files as UTF8), even when translated to C
and passed through gcc. Here however, the C uses its own printf declaration.

Going back to the C, replacing #include <stdio.h> with:

extern int printf(const char*, ...);

makes it work with gcc too.

So the main failure is associated with gcc. But there are other issues,
such as ensuring the correct code pages, making it work with graphical
output, and other failures which I'm sure will come up. I don't want the headache.

From James Harris@21:1/5 to Bart on Sat Nov 19 19:47:05 2022

On 19/11/2022 16:20, Bart wrote:

On 19/11/2022 16:05, James Harris wrote:

On 17/11/2022 11:24, Bart wrote:

...

If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created with
Notepad):

   #include <stdio.h>
   int main(void) {
       printf("€°£");
   }

and compile with gcc, it shows:

Γé¼┬░┬ú

I'm not sure what code page it's on, but if I switch to 65001 which
is supposed to be UTF8, then it shows:

��

(or equivalent in the terminal font). If I dump the C source, it does
indeed contain the E2 82 AC sequence which is the UTF8 for the 20AC
code for the Euro sign.

I'm sure that on Linux it works perfectly within a terminal window.
But I'm on Windows and I can't be bothered to do battle. Even if /I/
get it to work, I can't guarantee it for anyone else.

I presume you piped the output into hd or xxd to see exactly what was
being sent to the terminal - and hopefully prove it was the correct
UTF-8.

Well, gcc using puts, or bcc/tcc using puts or prints, work correctly.

For some reason gcc+printf doesn't deal with it properly. I assumed
those characters were raw UTF8 bytes, and redirecting it now to a file,
that is exactly what I get.

gcc+printf is bypassing something it shouldn't.

Is printf supposed to handle non-ASCII strings? Remember that it is
expected to interpret the first string whereas puts isn't.

Actually, including UTF8 in any simple string sounds dodgy. As an
example, imagine an embedded byte value of 0x80 on a 1s complement
machine. It would likely terminate the string.

IOW I wouldn't expect any of this stuff to work portably.

And as I said before, source should be pure ASCII....!

--
James Harris

From James Harris@21:1/5 to Dmitry A. Kazakov on Sat Nov 19 20:23:04 2022

On 17/11/2022 13:20, Dmitry A. Kazakov wrote:

On 2022-11-17 13:35, Bart wrote:

On 17/11/2022 12:12, Dmitry A. Kazakov wrote:

On 2022-11-17 12:24, Bart wrote:

If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created
with Notepad):

   #include <stdio.h>
   int main(void) {
       printf("€°£");
   }

...

In CMD:

;CHCP 65001

Active code page: 65001

;main.exe

€°£

Of course, you could use the code you wrote under the condition that
both the editor and the compiler use UTF-8.

The point about UTF8 is that it doesn't matter. So the string contains
'character' E2; in C, this is just a byte array, it should just pass
it as it is to the printf function.

...

That would work, but is also completely impractical for large amounts
of non-ASCII content. Or even small amounts. You /need/ editor
support. I don't have it and don't do enough with Unicode to make it
worth the trouble.

That is another guideline topic: you never ever place localization
stuff in the source code.

I was going to say the same. Where Bart says that typing "large amounts
of non-ASCII content" needs editor support I would go the other way. In
source code use pure ASCII, and ship with files which map the source to different locales.

--
James Harris

From Bart@21:1/5 to James Harris on Sat Nov 19 20:20:33 2022

On 19/11/2022 19:47, James Harris wrote:

On 19/11/2022 16:20, Bart wrote:

On 19/11/2022 16:05, James Harris wrote:

On 17/11/2022 11:24, Bart wrote:

...

If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created
with Notepad):

   #include <stdio.h>
   int main(void) {
       printf("€°£");
   }

and compile with gcc, it shows:

Γé¼┬░┬ú

I'm not sure what code page it's on, but if I switch to 65001 which
is supposed to be UTF8, then it shows:

��

(or equivalent in the terminal font). If I dump the C source, it
does indeed contain the E2 82 AC sequence which is the UTF8 for the
20AC code for the Euro sign.

I'm sure that on Linux it works perfectly within a terminal window.
But I'm on Windows and I can't be bothered to do battle. Even if /I/
get it to work, I can't guarantee it for anyone else.

I presume you piped the output into hd or xxd to see exactly what was
being sent to the terminal - and hopefully prove it was the correct
UTF-8.

Well, gcc using puts, or bcc/tcc using puts or prints, work correctly.

For some reason gcc+printf doesn't deal with it properly. I assumed
those characters were raw UTF8 bytes, and redirecting it now to a
file, that is exactly what I get.

gcc+printf is bypassing something it shouldn't.

Is printf supposed to handle non-ASCII strings? Remember that it is
expected to interpret the first string whereas puts isn't.

The only characters printf formats care about are '%', which indicates
the start of a format sequence, and 0, which indicates the end of the
string.
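
That claim can be checked without involving a console at all. The following is a minimal sketch, assuming a typical hosted C library in the default "C" locale; the C standard describes the format as a multibyte character sequence, so this is an observation about common implementations, not a guarantee. SAMPLE and printf_preserves_bytes are names made up for the illustration.

```c
#include <stdio.h>
#include <string.h>

/* The UTF-8 bytes of the three characters from the example, written
   as escapes so that this source file itself stays pure ASCII. */
#define SAMPLE "\xE2\x82\xAC\xC2\xB0\xC2\xA3"

/* Hypothetical check: push SAMPLE through the printf formatting code
   (via snprintf) and report whether the bytes came out unchanged.
   Returns 1 if they did, 0 otherwise. */
int printf_preserves_bytes(void)
{
    char out[16];
    int n = snprintf(out, sizeof out, SAMPLE);

    /* Compare length and contents, including the terminating NUL. */
    return n == (int)strlen(SAMPLE)
        && memcmp(out, SAMPLE, sizeof SAMPLE) == 0;
}
```

If the function returns 1, the formatting step is not what mangles the text, which leaves the path between the C library and the console as the place to look.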

Besides, all the other printf versions I tried worked (and it works on
Linux).

Windows + gcc + printf didn't, but I don't think it got as far as calling
a normal printf; doubtless it's doing something too clever. The relevant characters get to the output, but somehow bypass the part where the OS
console driver converts the character stream to Unicode.

Actually, including UTF8 in any simple string sounds dodgy. As an
example, imagine an embedded byte value of 0x80 on a 1s complement
machine. It would likely terminate the string.

What machines use 1s complement these days? You might as well worry
about those using 7-bit characters! Or EBCDIC.

UTF8 was designed to be transparent to anything processing 8-bit
strings. On ones' complement it presumably wouldn't work, unless
characters were wider than 8 bits.

(Don't tell me you're avoiding the use of UTF8 for that reason. For
anyone still using ones complement, probably ASCII would be too advanced
as they're still using 5-bit telegraph codes!)

IOW I wouldn't expect any of this stuff to work portably.

And as I said before, source should be pure ASCII....!

Sure. But this is mostly about data within programs which can be anything.

Restricting source code means you can't have Unicode content in
comments, or inside string constants.

That means not being able to have "°", you'd need an escape sequence. Or convert any pasted Unicode string into such a string.

This is unreasonable considering that providing such support within a
compiler requires pretty much zero effort.

From James Harris@21:1/5 to Bart on Sat Nov 19 20:51:39 2022

On 19/11/2022 20:20, Bart wrote:

On 19/11/2022 19:47, James Harris wrote:

On 19/11/2022 16:20, Bart wrote:

...

For some reason gcc+printf doesn't deal with it properly. I assumed
those characters were raw UTF8 bytes, and redirecting it now to a
file, that is exactly what I get.

gcc+printf is bypassing something it shouldn't.

Is printf supposed to handle non-ASCII strings? Remember that it is
expected to interpret the first string whereas puts isn't.

The only characters printf formats care about are '%', which indicates
the start of a format sequence, and 0, which indicates the end of the
string.

Well, is printf /specified/ to accept all other 8-bit codes? If not, you
may find printf working in some cases but not in others.

Besides, all the other printf versions I tried worked (and it works on Linux).

I'm sure you know that "it works here" is irrelevant because C's
specification includes much IB and UB. One could /very/ easily find some
code doing what one wants in one environment and not doing what one
wants in another. The effect of UB could even vary depending on whether
it was Tuesday or not. Seriously, we need to keep well away from
anything in C which is not defined.

IOW the fact that something works when tried is *meaningless* if it's
contrary to the specs.

...

Actually, including UTF8 in any simple string sounds dodgy. As an
example, imagine an embedded byte value of 0x80 on a 1s complement
machine. It would likely terminate the string.

What machines use 1s complement these days? You might as well worry
about those using 7-bit characters! Or EBCDIC.

As I say, if you write non-specified C code it may happen to work on one machine but that's no guarantee it will work on another.

UTF8 was designed to be transparent to anything processing 8-bit
strings. On ones' complement it presumably wouldn't work, unless
characters were wider than 8 bits.

Well, UTF8 on a 1's complement machine embedded in a string which the
runtime expects to be ASCII sounds like a recipe for trouble. Even
characters having the top bit set may lead to problems with char
signedness on a 2s complement machine.
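
The signedness point can be made concrete with a short sketch. It assumes nothing beyond standard C; byte_value and upper_byte are hypothetical helper names chosen for the illustration.

```c
#include <ctype.h>

/* Hypothetical helper: the value of a byte as 0..255, whether plain
   char is signed or unsigned on this implementation. */
int byte_value(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: upper-case one byte safely. Passing a negative
   char straight to toupper() is undefined behaviour (unless it happens
   to equal EOF), so the value is converted to unsigned char first. */
char upper_byte(char c)
{
    return (char)toupper((unsigned char)c);
}
```

Where plain char is signed, a byte such as 0xE2 compares as negative, so code that tests `c >= 0x80` or indexes a table with `c` directly goes wrong unless it converts first, as these helpers do.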

(Don't tell me you're avoiding the use of UTF8 for that reason. For
anyone still using ones complement, probably ASCII would be too advanced
as they're still using 5-bit telegraph codes!)

Baudot rules :-)

In fact, I'd use UTF8 in auxiliary files but not in the source.

IOW I wouldn't expect any of this stuff to work portably.

And as I said before, source should be pure ASCII....!

Sure. But this is mostly about data within programs which can be anything.

That's OK. Data read would be pure bits and octets. It's when something
tries to interpret those bits that problems can arise - e.g. to_upper
... and the printf control string.

Restricting source code means you can't have Unicode content in
comments, or inside string constants.

Exactly.

That means not being able to have "°", you'd need an escape sequence. Or convert any pasted Unicode string into such a string.

Not quite. If that char (which appears to be a degree character) varied
between locales it could go in an aux file. If it was universal,
however, the source could include its name with such as

"\degree/"

This is unreasonable considering that providing such support within a compiler requires pretty much zero effort.

This is not to help the compiler or the compiler writer. The idea of restricting source to a lingua franca is to help teams of humans
collaborate on the source even if they are from different locales.

--
James Harris

From James Harris@21:1/5 to Bart on Sat Nov 19 21:13:50 2022

On 19/11/2022 20:20, Bart wrote:

On 19/11/2022 19:47, James Harris wrote:

On 19/11/2022 16:20, Bart wrote:

On 19/11/2022 16:05, James Harris wrote:

On 17/11/2022 11:24, Bart wrote:

...

If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created
with Notepad):

   #include <stdio.h>
   int main(void) {
       printf("€°£");
   }

and compile with gcc, it shows:

Γé¼┬░┬ú

...

gcc+printf is bypassing something it shouldn't.

Is printf supposed to handle non-ASCII strings? Remember that it is
expected to interpret the first string whereas puts isn't.

The only characters printf formats care about are '%', which indicates
the start of a format sequence, and 0, which indicates the end of the
string.

Besides, all the other printf versions I tried worked (and it works on Linux).

As I said in the other reply, if this is contrary to the specification
then this stuff is poisonous. But curious to see what would happen in my particular environment I tried your code. The source file had your
string between the quotes as

e2 82 ac c2 b0 c2 a3

What is that? UTF8?

When run, also on Unix, it outputs the same bytes.

$ ./a.out | hd
00000000 e2 82 ac c2 b0 c2 a3

Hence no change for this specific test.
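
Where hd or xxd is not to hand (on Windows, say), the same inspection can be done from inside the program. A minimal sketch follows; hex_bytes is a hypothetical helper, not a library function.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the bytes of `data` into `out` as
   space-separated lower-case hex pairs, e.g. "e2 82 ac".
   Returns 0 on success, -1 if `out` is too small. */
int hex_bytes(const void *data, size_t len, char *out, size_t out_size)
{
    const unsigned char *p = data;
    size_t used = 0;

    if (out_size == 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* Every pair except the first is preceded by a space. */
        int n = snprintf(out + used, out_size - used,
                         i ? " %02x" : "%02x", (unsigned)p[i]);
        if (n < 0 || (size_t)n >= out_size - used) return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Applied to the seven bytes of the string literal, this gives the same "e2 82 ac c2 b0 c2 a3" as the dump above.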

It's 7 bytes, though, and when you ran it on Windows you said you got

Γé¼┬░┬ú

which is also 7 characters. At a guess, were those chars from the
codepage your Windows terminal was using (rather than it interpreting UTF8)?

--
James Harris

From luserdroog@21:1/5 to James Harris on Sat Nov 19 19:46:19 2022

On Saturday, November 19, 2022 at 3:13:52 PM UTC-6, James Harris wrote:

On 19/11/2022 20:20, Bart wrote:

Besides, all the other printf versions I tried worked (and it works on Linux).

As I said in the other reply, if this is contrary to the specification
then this stuff is poisonous. But curious to see what would happen in my particular environment I tried your code. The source file had your
string between the quotes as

e2 82 ac c2 b0 c2 a3

What is that? UTF8?

Yep. That's UTF-8. With a little practice it's pretty easy to parse, or at least to find
the character code boundaries. The first byte starts with some number of ones in the most significant position: none for a single-byte (ASCII) character, or 2 to 4 for a multi-byte one (up to 6 in the original, now obsolete, definition).
For a multi-byte character that number tells you the length of the encoding. A byte with exactly one leading one is a continuation byte.

So e2 is the start of a 3 byte code. And c2 is the start of a 2 byte code.
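
The rule can be sketched in a few lines of C. The function names are hypothetical, and the sketch is deliberately loose: it only looks at lead bytes, so it does not check continuation bytes and does not reject the overlong leads 0xC0 and 0xC1 or the out-of-range leads 0xF5 to 0xF7.

```c
#include <stddef.h>

/* Hypothetical helper: length in bytes of the UTF-8 sequence that
   starts with `lead`, or 0 if `lead` cannot start a sequence. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx: plain ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* 10xxxxxx or 11111xxx */
}

/* Hypothetical helper: count the characters in `n` bytes by hopping
   from lead byte to lead byte. Returns (size_t)-1 if a byte that
   cannot be a lead is met or the last sequence is cut short. */
size_t utf8_count(const unsigned char *s, size_t n)
{
    size_t count = 0, i = 0;

    while (i < n) {
        size_t len = utf8_seq_len(s[i]);
        if (len == 0 || len > n - i) return (size_t)-1;
        i += len;
        count++;
    }
    return count;
}
```

For the seven bytes e2 82 ac c2 b0 c2 a3 the lengths come out as 3, 2 and 2, i.e. three characters.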

From David Brown@21:1/5 to James Harris on Sun Nov 20 13:28:56 2022

On 18/11/2022 20:01, James Harris wrote:

On 15/11/2022 21:40, David Brown wrote:

On 15/11/2022 20:09, James Harris wrote:

On 15/11/2022 17:31, David Brown wrote:

On 15/11/2022 17:58, James Harris wrote:

...

The question is not whether prevention would be possible but whether
you (i.e. DB) would consider it /advisable/. If you prevented it then
a lot of familiar programming patterns and a number of existing APIs
would become unavailable to you so choose wisely...! :-)

I am not the language designer here

Uh, huh.

- and I still don't really grok what kind of language /you/ want, what
you understand from before, what uses it should have, or what you
think is wrong with existing languages. (Or maybe this is all for fun
and interest, which is always the best reason for doing anything.)
That makes it hard to give recommendations.

...

You assume /so/ many limitations on what you can do as a language
designer. You can do /anything/. If you want to allow something,
allow it. If you want to prohibit it, prohibit it.

Sorry, but it doesn't work like that.

Yes, it does.

No, it does not. Your view of language design is far too simplistic.
Note, also, that in a few paragraphs you say that you are not the
language designer whereas I am, but then you go on to try to tell me how
it works and how it doesn't and, previously, that anything can be done.
You'd gain by /trying/ it yourself. Then you might see that it's not as straightforward as you suggest.

That is a fair point. But I challenge you to show me where there are
rules written for language designs. Explain to me exactly why you are
not allowed to, say, provide an operator "-" without a corresponding
operator "+". Tell me who is banning you from deciding that source code
lines must be limited to 40 characters, or that every assignment
statement shall be preceded by the keyword "please". I'm not saying any
of these things are a good idea (though something similar has been done
in other cases), I am saying it is /your/ choice to do that or not.

You can say "I can't have feature A and feature B and maintain the
consistency I want." You /cannot/ say "I can't have feature A". It is
/your/ decision not have feature A. Choosing to have it may mean
changing or removing feature B, or losing some consistency that you had
hoped to maintain. But it is your language, your choices, your
responsibility - saying "I can't do that" is abdicating that responsibility.

A language cannot be built on ad-hoc choices such as you have suggested.

It most certainly can. Every language is a collection of design
decisions, and most of them are at least somewhat ad-hoc.

However, my suggestions were certainly /not/ ad-hoc - it was for a
particular way of thinking about operators and expressions, with
justification and an explanation of the benefits. Whether you choose to
follow those suggestions or not, is a matter of your personal choices
for how you want your language to work - and /that/ choice is therefore somewhat ad-hoc. They only appear ad-hoc if you don't understand what I
wrote justifying them or giving their advantages.

Of course you want a language to follow a certain theme or style (or
"ethos", as you called it). But that does not mean you can't make
ad-hoc decisions if you want - it is inevitable that you will do so.
And it certainly does not mean you can't make the choices you want for
your language.

Too many ad-hoc choices mean you lose the logic and consistency in the language. Too few, and your language has nothing to it. Excessive
consistency is great for some theoretical work - Turing machines,
lambda calculus, infinite register machines, and the like. It is
useless in a real language.

Look at C as an example. Not everyone likes the language, and the only
people who find nothing to dislike in it are people who haven't used it
enough. But it is undoubtedly a highly successful language. All binary operators require the evaluation of both operands before evaluating the operator. (And before you start thinking that is unavoidable, it is
not, and does not apply to all languages.) Except && and ||, where the
second operand is not evaluated if it is not needed - that's an ad-hoc decision, different from the general rule. All access to objects must
be through lvalues of compatible types - except for the ad-hoc rule that character type pointers can also be used.
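
The && exception is easy to observe. The following is a small sketch, with probe and the two checking functions being hypothetical names: an operand with a visible side effect is combined once with && and once with the ordinary binary operator &.

```c
/* Counts how often probe() has been evaluated. */
static int probe_calls = 0;

/* Hypothetical operand with a visible side effect. */
static int probe(int value)
{
    probe_calls++;
    return value;
}

/* && skips its right operand when the left one is false.
   Returns 1 if exactly one operand was evaluated. */
int and_short_circuits(void)
{
    probe_calls = 0;
    int result = probe(0) && probe(1);
    return result == 0 && probe_calls == 1;
}

/* & is an ordinary binary operator: both operands are evaluated
   (in an unspecified order). Returns 1 if both were evaluated. */
int bitand_evaluates_both(void)
{
    probe_calls = 0;
    int result = probe(0) & probe(1);
    return result == 0 && probe_calls == 2;
}
```

Both expressions yield 0; what differs is how many times probe runs.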

To be successful at anything - program language design or anything else
- you always need to aim for a balance. Consistency is vital - too much consistency is bad. Generalisation is good - over-generalisation is
bad. Too much ad-hoc is bad, so is too little.

I haven't suggested ad-hoc choices. I have tried to make reasoned
suggestions. Being different from languages you have used before, or
how you envision your new language, does not make them ad-hoc.

Saying you'd like selected combinations of operators to be banned looks
like an ad-hoc approach to me.

Then you misunderstand what I wrote. I don't know if that was my fault
in poor explanations, or your fault in misreading or misunderstanding -
no doubt, it was a combination.

...

BTW, any time one thinks of 'treating X separately' it's good to be
wary. Step-outs tend to make a language hard to learn and awkward
to use.

So do over-generalisations.

Not really.

Yes, really.

You simply repeating phrases back to me but in the negative does not
make your assertions correct or mine wrong.

That's why I gave the example below...

Imagine if you were to stop treating "letters", "digits" and
"punctuation" separately, and say "They are all just characters.
Let's treat them the same". Now people can name a function "123", or
"2+2". It's conceivable that you'd work out a grammar and parsing
rules that allow that (Forth, for example, has no problem with
functions that are named by digits. You can redefine "2" to mean "1"
if you like). Do you think that would make the language easier to
learn and less awkward to use?

Certainly not. Why do you ask?

I ask, because it is an example of over-generalisation that makes a
language harder to learn and potentially a lot more confusing to understand.

It's ad-hoc rules which become burdensome.

Agreed.

Phew!

...

Seriously, try designing a language, yourself. You don't have to
implement it. Just try coming up with a cohesive design of something
you would like to program in.

If I had the time... :-)

I fully appreciate that this is not an easy task.

I'm sure you do but your view of the details is superficial. You have
ideas which are interesting in themselves but you don't appear to
appreciate how decisions bear on each other when you have to bring
hundreds of them together.

Oh, I do appreciate that. As I have said all along, I am not giving recommendations or claiming that my suggestions are the only way to do
things. It is all up to /you/.

...

Bart came up with an example something like

+(+(+(+ x)))

That's not at all sensible. You want that banned, too?

Yes :-) Seriously, I appreciate that there will always be compromises
- trying to ban everything silly while allowing everything sensible
would mean countless ad-hoc rules, and you are right to reject that.
I am advocating drawing a line, just like you - the difference is
merely a matter of where to draw that line. I'd draw the line so that
it throws out the increment and decrement operators entirely. But if
you really wanted to keep them, I'd make them postfix only and as
statements, not in expressions - let "x++" mean "x += 1" which means
"x = x + 1" which should, IMHO, be a statement and not allowed inside an
expression.

This does, indeed, in a sense, come down to where the designer decides
to draw the line. Unfortunately there is no simple line.

Agreed.

For example, you spoke about banning side effects in expressions. For
sure, you could do that. But then you thought side effects in function
calls in expressions should possibly be treated differently and be left
in! Making such rules is not as simple as it may appear.

Agreed. That does not mean rules cannot be made.

I am making suggestions and throwing up ideas - I am not designing the language. There are many ways you can make this all work. Banning side-effects from expressions has many positive benefits in a language -
I've discussed several. (I haven't even mentioned one of the real big
ones, which is how much easier and safer it makes parallel computation
and multi-threading.) It also has complications, and it also limits
what programmers can do. It's a trade-off, like most choices, and you
have to decide if the benefits are worth the cost.

(It's worth noting again that the real power of a programming language
comes from what you cannot do, not from what you /can/ do.)

All the
decisions a language designer makes have a tendency to bear on each
other, even if only in the ethos of the final language: whether it's
simple and cohesive or ad hoc and confusing.

They all have a bearing on each other, yes. And significant changes in
one place may encourage changes in other places. But you are setting up
a false dichotomy here - perhaps because you are not willing to consider
making these other changes. (And if you don't want to, that's fine - I
don't have any stake in how your language turns out. I just don't want
you to miss out on possibilities because you think you /can't/ have
them, rather than because you don't /want/ them.)

Further, remember that the decisions the language designer makes have to
be communicated to the programmer. If a designer says "these side
effects are allowed but these other ones are not" then that just gives
the programmer more to learn and remember.

Sure. But programmers are not stupid (or at least, you are not catering
for stupid programmers). They can learn more than one rule.

As I say, you could try designing a language. You are a smart guy. You
could work on a design in your head while walking to the shops, while
waiting for a train, etc. As one of my books on language design says,
"design repeatedly: it will make you a better designer".

Oh, I have plenty of ideas for a language - I have no end to the number
of languages, OS's, processors, and whatever that I have "designed" in
my head :-) The devil's in the details, however, and I haven't taken
the time for that!

From Bart@21:1/5 to David Brown on Sun Nov 20 16:08:27 2022

On 20/11/2022 12:28, David Brown wrote:

Look at C as an example. Not everyone likes the language, and the only people who find nothing to dislike in it are people who haven't used it enough.

I disliked C at first glance having never used it. But I loved Algol68,
having never used that either.

But given a task now and the choice was between those two languages, I
would choose C, mainly because some design choices of Algol68 syntax
make writing code more painful than in C. (Ahead of both would be one of
my two, by a mile.)

However I can admire Algol68 for its design, even if it needed tweaking
IMO, but I would never be able to do that for C, since a lot of it looks
like it was thrown together with no thought at all, or under the
influence of some substance.

But it is undoubtedly a highly successful language.

On the back of Unix inflicting it on everybody (can anyone prise Unix
and C apart?), and the lack of viable alternatives.

Successful languages, then, needed to be able to bend the rules a
little, do underhand stuff, which C could do in spades (so could mine!).
You can't really do that with a Wirth language or ones like Algol68.
Ones like PL/M disappeared.

Now people look askance at such practices, but C already had its foot in
the door.

From David Brown@21:1/5 to James Harris on Mon Nov 21 16:01:40 2022

On 18/11/2022 21:14, James Harris wrote:

On 18/11/2022 11:00, David Brown wrote:

On 15/11/2022 18:32, James Harris wrote:

...

The side effects of even something awkward such as

   *(++p) = *(q++);

are little different from those of the longer version

   p = p + 1;
   *p = *q;
   q = q + 1;

The former is clearer, however. That makes it easier to see the intent.

Really? I have no idea what the programmer's intent was. "*p++ =
*q++;" is common enough that the intent is clear there, but from your
first code I can't see /why/ the programmer wanted to /pre/increment
"p". Maybe he/she made a mistake? Maybe he/she doesn't really
understand the difference between pre-increment and post-increment?
It's a common beginner's misunderstanding.

I don't think I know of any language which allows a programmer to say
/why/ something is the case; that's what comments are for. Programs
normally talk about /what/ to do, not why. The very fact that the
assignment does something non-idiomatic is a sign that a comment could
be useful. It's akin to

   for (i = 0; i <= n ....

If the test really should be <= then a comment may be useful to explain
why.

Ideally there should be no need for a comment, because the code makes it
clear - for example via the names of the identifiers, or from the rest
of the context. That rarely happens in out-of-context snippets.

On the other hand, it is quite clear from the separate lines exactly
what order the programmer intended.

What would you say are the differences in side-effects of these two
code snippets? (I'm assuming we are talking about C here.)

That depends on whether the operations are ordered or not. In C they'd
be different, potentially, from what they would be in my language. What
would you say they are?

You said the side-effects are "a little different", so I wanted to hear
what you meant.

In C, there is no pre-determined sequencing between the two increments -
they can occur in any order, or can be interleaved. As far as the C
abstract machine is concerned (and that's what determines what
side-effects mean), unsequenced events are not ordered and it doesn't
make sense to say which happened first. You can consider them as
happening at the same time - and if that affects the outcome of the
program, then it is at least unspecified behaviour if not undefined
behaviour. (It would be undefined behaviour if "p" and "q" referred to
the same object, for example.)

So I don't think it really makes sense to say that the order is
different. If the original "*(++p) = *(q++);" makes sense at all, and
is defined behaviour, then its behaviour is not distinguishable from
within the C language from the expanded version.

From James Harris@21:1/5 to David Brown on Wed Nov 23 16:59:53 2022

On 21/11/2022 15:01, David Brown wrote:

On 18/11/2022 21:14, James Harris wrote:

On 18/11/2022 11:00, David Brown wrote:

On 15/11/2022 18:32, James Harris wrote:

...

The side effects of even something awkward such as

   *(++p) = *(q++);

are little different from those of the longer version

   p = p + 1;
   *p = *q;
   q = q + 1;

The former is clearer, however. That makes it easier to see the
intent.

Really? I have no idea what the programmer's intent was. "*p++ =
*q++;" is common enough that the intent is clear there, but from your
first code I can't see /why/ the programmer wanted to /pre/increment
"p". Maybe he/she made a mistake? Maybe he/she doesn't really
understand the difference between pre-increment and post-increment?
It's a common beginner's misunderstanding.

I don't think I know of any language which allows a programmer to say
/why/ something is the case; that's what comments are for. Programs
normally talk about /what/ to do, not why. The very fact that the
assignment does something non-idiomatic is a sign that a comment could
be useful. It's akin to

   for (i = 0; i <= n ....

If the test really should be <= then a comment may be useful to
explain why.

Ideally there should be no need for a comment, because the code makes it
clear - for example via the names of the identifiers, or from the rest
of the context. That rarely happens in out-of-context snippets.

Either way, non-idiomatic code is a flag. And in that it's useful -
especially if it's easy to read.

On the other hand, it is quite clear from the separate lines exactly
what order the programmer intended.

What would you say are the differences in side-effects of these two
code snippets? (I'm assuming we are talking about C here.)

That depends on whether the operations are ordered or not. In C they'd
be different, potentially, from what they would be in my language.
What would you say they are?

You said the side-effects are "a little different", so I wanted to hear
what you meant.

I said they were "little different", not "a little different". In other
words, focus on the main point rather than minutiae such as what could
happen if the pointers were identical or overlapped, much as you go on
to mention:

In C, there is no pre-determined sequencing between the two increments -
they can occur in any order, or can be interleaved. As far as the C
abstract machine is concerned (and that's what determines what
side-effects mean), unsequenced events are not ordered and it doesn't
make sense to say which happened first. You can consider them as
happening at the same time - and if that affects the outcome of the
program, then it is at least unspecified behaviour if not undefined
behaviour. (It would be undefined behaviour if "p" and "q" referred to
the same object, for example.)

So I don't think it really makes sense to say that the order is
different. If the original "*(++p) = *(q++);" makes sense at all, and
is defined behaviour, then its behaviour is not distinguishable from
within the C language from the expanded version.

--
James Harris

From David Brown@21:1/5 to James Harris on Wed Nov 23 19:06:06 2022

On 23/11/2022 17:59, James Harris wrote:

On 21/11/2022 15:01, David Brown wrote:

On 18/11/2022 21:14, James Harris wrote:

Either way, non-idiomatic code is a flag. And in that it's useful -
especially if it's easy to read.

Yes.

On the other hand, it is quite clear from the separate lines exactly
what order the programmer intended.

What would you say are the differences in side-effects of these two
code snippets? (I'm assuming we are talking about C here.)

That depends on whether the operations are ordered or not. In C
they'd be different, potentially, from what they would be in my
language. What would you say they are?

You said the side-effects are "a little different", so I wanted to
hear what you meant.

I said they were "little different", not "a little different".

Ah, my mistake. Still, it implies you think there is /some/ difference.

In other
words, focus on the main point rather than minutiae such as what could
happen if the pointers were identical or overlapped, much as you go on
to mention:

OK, so you don't think there is any difference in side-effects other
than the possible issue I mentioned of undefined behaviour in very
particular circumstances. That's fine - I just wanted to know if you
were thinking of something else.

(Note that the freedom for compilers to re-arrange code from the
"compact" form to the "expanded" form is one of the reasons why such
unsequenced accesses to the same object are undefined behaviour in C.)

In C, there is no pre-determined sequencing between the two increments
- they can occur in any order, or can be interleaved. As far as the C
abstract machine is concerned (and that's what determines what
side-effects mean), unsequenced events are not ordered and it doesn't
make sense to say which happened first. You can consider them as
happening at the same time - and if that affects the outcome of the
program, then it is at least unspecified behaviour if not undefined
behaviour. (It would be undefined behaviour if "p" and "q" referred
to the same object, for example.)

So I don't think it really makes sense to say that the order is
different. If the original "*(++p) = *(q++);" makes sense at all, and
is defined behaviour, then its behaviour is not distinguishable from
within the C language from the expanded version.

From James Harris@21:1/5 to David Brown on Wed Nov 23 18:31:27 2022

On 20/11/2022 12:28, David Brown wrote:

On 18/11/2022 20:01, James Harris wrote:

On 15/11/2022 21:40, David Brown wrote:

On 15/11/2022 20:09, James Harris wrote:

On 15/11/2022 17:31, David Brown wrote:

...

You assume /so/ many limitations on what you can do as a language
designer. You can do /anything/. If you want to allow something,
allow it. If you want to prohibit it, prohibit it.

Sorry, but it doesn't work like that.

Yes, it does.

No, it does not. Your view of language design is far too simplistic.
Note, also, that in a few paragraphs you say that you are not the
language designer whereas I am, but then you go on to try to tell me
how it works and how it doesn't and, previously, that anything can be
done. You'd gain by /trying/ it yourself. Then you might see that it's
not as straightforward as you suggest.

That is a fair point. But I challenge you to show me where there are
rules written for language designs. Explain to me exactly why you are
not allowed to, say, provide an operator "-" without a corresponding
operator "+". Tell me who is banning you from deciding that source code
lines must be limited to 40 characters, or that every assignment
statement shall be preceded by the keyword "please". I'm not saying any
of these things are a good idea (though something similar has been done
in other cases), I am saying it is /your/ choice to do that or not.

You can say "I can't have feature A and feature B and maintain the
consistency I want." You /cannot/ say "I can't have feature A". It is
/your/ decision not to have feature A. Choosing to have it may mean
changing or removing feature B, or losing some consistency that you had
hoped to maintain. But it is your language, your choices, your
responsibility - saying "I can't do that" is abdicating that
responsibility.

Well, your comments have let me know what you mean, at least, but when I
say "it doesn't work like that" I mean that language design is not as
simple as you suggest. In absolute terms I agree with you: you are right
that a designer can make any decisions he wants. But in reality certain
things are /infeasible/. You might as well say you could get from your
house to the nearest supermarket by flying to another country first. In
absolute terms you probably could do that and eventually get where you
want to go but in reality it's so absurd a suggestion that it's
infeasible.

A language cannot be built on ad-hoc choices such as you have
suggested.

It most certainly can. Every language is a collection of design
decisions, and most of them are at least somewhat ad-hoc.

However, my suggestions were certainly /not/ ad-hoc

Hmm, you suggested banning side effects, except in function calls, and
banning successive prefix "+" operators. Those suggestions seem rather
ad hoc to me.

- it was for a
particular way of thinking about operators and expressions, with
justification and an explanation of the benefits. Whether you choose to
follow those suggestions or not, is a matter of your personal choices
for how you want your language to work - and /that/ choice is therefore
somewhat ad-hoc. They only appear ad-hoc if you don't understand what I
wrote justifying them or giving their advantages.

True, if there is a legitimate and useful reason for a rule then that
rule will seem less ad hoc than if the reasons for it are unknown.

Of course you want a language to follow a certain theme or style (or
"ethos", as you called it). But that does not mean you can't make
ad-hoc decisions if you want - it is inevitable that you will do so. And
it certainly does not mean you can't make the choices you want for your
language.

Too many ad-hoc choices mean you lose the logic and consistency in the
language. Too few, and your language has nothing to it. Excessive
consistency is great for some theoretical work - Turing machines,
lambda calculus, infinite register machines, and the like. It is
useless in a real language.

Look at C as an example. Not everyone likes the language, and the only
people who find nothing to dislike in it are people who haven't used it
enough. But it is undoubtedly a highly successful language. All binary
operators require the evaluation of both operands before evaluating the
operator. (And before you start thinking that is unavoidable, it is
not, and does not apply to all languages.) Except && and ||, where the
second operand is not evaluated if it is not needed - that's an ad-hoc
decision, different from the general rule. All access to objects must
be through lvalues of compatible types - except for the ad-hoc rule that
character type pointers can also be used.

To be successful at anything - program language design or anything else
- you always need to aim for a balance. Consistency is vital - too much
consistency is bad. Generalisation is good - over-generalisation is
bad. Too much ad-hoc is bad, so is too little.

Fair enough. Short-circuit evaluation is a good example of what you have
been saying, although it effects a semantic change. By contrast, banning
prefix "+" operators because you don't like them does not effect any
useful change in the semantics of a program.

I haven't suggested ad-hoc choices. I have tried to make reasoned
suggestions. Being different from languages you have used before, or
how you envision your new language, does not make them ad-hoc.

Saying you'd like selected combinations of operators to be banned
looks like an ad-hoc approach to me.

Then you misunderstand what I wrote. I don't know if that was my fault
in poor explanations, or your fault in misreading or misunderstanding -
no doubt, it was a combination.

Maybe. I thought you wanted ++E++ banned because it had successive ++
operators but perhaps I misunderstood. Was what you actually wanted
banned /any/ use of ++ operators? If the language /is/ to have ++
operators after all, though, would you still want ++E++ banned?

...

Imagine if you were to stop treating "letters", "digits" and
"punctuation" separately, and say "They are all just characters.
Let's treat them the same". Now people can name a function "123", or
"2+2". It's conceivable that you'd work out a grammar and parsing
rules that allow that (Forth, for example, has no problem with
functions that are named by digits. You can redefine "2" to mean "1"
if you like). Do you think that would make the language easier to
learn and less awkward to use?

Certainly not. Why do you ask?

I ask, because it is an example of over-generalisation that makes a
language harder to learn and potentially a lot more confusing to
understand.

I don't see any lack of generalisation in setting out rules for
identifier names.

...

[Snipped a bunch of points on which we agree.]

Further, remember that the decisions the language designer makes have
to be communicated to the programmer. If a designer says "these side
effects are allowed but these other ones are not" then that just gives
the programmer more to learn and remember.

Sure. But programmers are not stupid (or at least, you are not catering
for stupid programmers). They can learn more than one rule.

You are rather changing your tune, there. Earlier you were concerned
about programmers failing to understand the difference between
pre-increment and post-increment!

As I say, you could try designing a language. You are a smart guy. You
could work on a design in your head while walking to the shops, while
waiting for a train, etc. As one of my books on language design says,
"design repeatedly: it will make you a better designer".

Oh, I have plenty of ideas for a language - I have no end to the number
of languages, OS's, processors, and whatever that I have "designed" in
my head :-) The devil's in the details, however, and I haven't taken
the time for that!

Yes, the devil is indeed in the details. It's one thing to have some
good ideas. It's quite another to bring them together into a single product.

--
James Harris

From James Harris@21:1/5 to David Brown on Wed Nov 23 18:50:50 2022

On 23/11/2022 18:06, David Brown wrote:

On 23/11/2022 17:59, James Harris wrote:

On 21/11/2022 15:01, David Brown wrote:

On 18/11/2022 21:14, James Harris wrote:

Previously ===>

The side effects of even something awkward such as

   *(++p) = *(q++);

are little different from those of the longer version

   p = p + 1;
   *p = *q;
   q = q + 1;

The former is clearer, however. That makes it easier to see the intent.

What would you say are the differences in side-effects of these two
code snippets? (I'm assuming we are talking about C here.)

That depends on whether the operations are ordered or not. In C
they'd be different, potentially, from what they would be in my
language. What would you say they are?

You said the side-effects are "a little different", so I wanted to
hear what you meant.

I said they were "little different", not "a little different".

Ah, my mistake. Still, it implies you think there is /some/ difference.

I thought there was the /potential/ for a difference (and I suspect that
there is in C) but that that would distract from the point being made.

The point remains: I was saying that the former is significantly clearer
(as long as its effects are defined).

In other words, focus on the main point rather than minutiae such as
what could happen if the pointers were identical or overlapped, much
as you go on to mention:

OK, so you don't think there is any difference in side-effects other
than the possible issue I mentioned of undefined behaviour in very
particular circumstances. That's fine - I just wanted to know if you
were thinking of something else.

(Note that the freedom for compilers to re-arrange code from the
"compact" form to the "expanded" form is one of the reasons why such
unsequenced accesses to the same object are undefined behaviour in C.)

Understood.

--
James Harris

From David Brown@21:1/5 to James Harris on Wed Nov 23 21:33:53 2022

On 23/11/2022 19:31, James Harris wrote:

On 20/11/2022 12:28, David Brown wrote:

On 18/11/2022 20:01, James Harris wrote:

On 15/11/2022 21:40, David Brown wrote:

Well, your comments have let me know what you mean, at least, but when I
say "it doesn't work like that" I mean that language design is not as
simple as you suggest. In absolute terms I agree with you: you are right
that a designer can make any decisions he wants. But in reality certain
things are /infeasible/. You might as well say you could get from your
house to the nearest supermarket by flying to another country first. In
absolute terms you probably could do that and eventually get where you
want to go but in reality it's so absurd a suggestion that it's
infeasible.

Sure. Not all suggestions are good in practice, and not all things that
are possible are easy or a good trade-off. I am merely saying that the
decisions are yours to make, even if you feel there is only one sane way
to pick. You are still free to make the hard choice even if that means
big knock-on effects.

A language cannot be built on ad-hoc choices such as you have
suggested.

It most certainly can. Every language is a collection of design
decisions, and most of them are at least somewhat ad-hoc.

However, my suggestions were certainly /not/ ad-hoc

Hmm, you suggested banning side effects, except in function calls, and
banning successive prefix "+" operators. Those suggestions seem rather
ad hoc to me.

They are not necessarily all /good/ suggestions! My point was merely
that if you don't want people to be able to write +(+(+(+x))) in your
language, you have the power to ban them if you want.

- it was for a particular way of thinking about operators and
expressions, with justification and an explanation of the benefits.
Whether you choose to follow those suggestions or not, is a matter of
your personal choices for how you want your language to work - and
/that/ choice is therefore somewhat ad-hoc. They only appear ad-hoc
if you don't understand what I wrote justifying them or giving their
advantages.

True, if there is a legitimate and useful reason for a rule then that
rule will seem less ad hoc than if the reasons for it are unknown.

Indeed. I am not recommending a chaotic language!

Of course you want a language to follow a certain theme or style (or
"ethos", as you called it). But that does not mean you can't make
ad-hoc decisions if you want - it is inevitable that you will do so.
And it certainly does not mean you can't make the choices you want for
your language.

Too many ad-hoc choices mean you lose the logic and consistency in
the language. Too few, and your language has nothing to it.
Excessive consistency is great for some theoretical work - Turing
machines, lambda calculus, infinite register machines, and the like.
It is useless in a real language.

Look at C as an example. Not everyone likes the language, and the
only people who find nothing to dislike in it are people who haven't
used it enough. But it is undoubtedly a highly successful language.
All binary operators require the evaluation of both operands before
evaluating the operator. (And before you start thinking that is
unavoidable, it is not, and does not apply to all languages.) Except
&& and ||, where the second operand is not evaluated if it is not
needed - that's an ad-hoc decision, different from the general rule.
All access to objects must be through lvalues of compatible types -
except for the ad-hoc rule that character type pointers can also be used.

To be successful at anything - program language design or anything
else - you always need to aim for a balance. Consistency is vital -
too much consistency is bad. Generalisation is good -
over-generalisation is bad. Too much ad-hoc is bad, so is too little.

Fair enough. Short-circuit evaluation is a good example of what you have
been saying, although it effects a semantic change. By contrast, banning
prefix "+" operators because you don't like them does not effect any
useful change in the semantics of a program.

I haven't suggested ad-hoc choices. I have tried to make reasoned
suggestions. Being different from languages you have used before,
or how you envision your new language, does not make them ad-hoc.

Saying you'd like selected combinations of operators to be banned
looks like an ad-hoc approach to me.

Then you misunderstand what I wrote. I don't know if that was my
fault in poor explanations, or your fault in misreading or
misunderstanding - no doubt, it was a combination.

Maybe. I thought you wanted ++E++ banned because it had successive ++
operators but perhaps I misunderstood. Was what you actually wanted
banned /any/ use of ++ operators? If the language /is/ to have ++
operators after all, though, would you still want ++E++ banned?

I was suggesting banning any use of pre- and post-increment and
decrement operators. They are unnecessary in a language, and (along
with assignment operators that return values, rather than being strictly
statements) they are a way of having side-effects in the middle of
expressions that otherwise look like calculations or reading data.

Unless you are aiming for a pure functional language, "side-effects" are
necessary - it's how you get things done in the code. But IMHO they
should be as clear as possible, not hidden away as extras. Changes to
any object should be the main purpose of a statement or function call,
rather than a little extra feature.

Remember, the fewer places you can have side-effects - changes to an
object, or IO functionality - the more freedom the compiler has to
manipulate and optimise the code, the clearer the code is to the reader,
the safer it is from accidentally changing things, the easier it is to
be sure the code is correct, and the more the code can be thread-safe,
re-entrant or run in parallel. Make every object immutable unless the
programmer goes out of their way to insist that it is mutable - you can
do so much more with it if its value cannot change!

Imagine if you were to stop treating "letters", "digits" and
"punctuation" separately, and say "They are all just characters.
Let's treat them the same". Now people can name a function "123",
or "2+2". It's conceivable that you'd work out a grammar and parsing
rules that allow that (Forth, for example, has no problem with
functions that are named by digits. You can redefine "2" to mean
"1" if you like). Do you think that would make the language easier
to learn and less awkward to use?

Certainly not. Why do you ask?

I ask, because it is an example of over-generalisation that makes a
language harder to learn and potentially a lot more confusing to
understand.

I don't see any lack of generalisation in setting out rules for
identifier names.

You can give nice general rules for an identifier - you can say
identifiers must start with a letter, and consist of letters, digits and
underscore characters. (That's a common choice for many languages, but
not the only choice.) If you /over-generalise/, you might allow
identifiers consisting solely of digits. And that can lead to allowing
confusing code - such as this example from Forth:

$ gforth
Gforth 0.7.3, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
ok
2 2 + . 4 ok
: 2 3 ; ok
2 2 + . 6 ok

Forth is so general and free that you can redefine the meaning of "2"
(or pretty much anything else). This is not a good idea.

[Snipped a bunch of points on which we agree.]

Further, remember that the decisions the language designer makes have
to be communicated to the programmer. If a designer says "these side
effects are allowed but these other ones are not" then that just
gives the programmer more to learn and remember.

Sure. But programmers are not stupid (or at least, you are not
catering for stupid programmers). They can learn more than one rule.

You are rather changing your tune, there. Earlier you were concerned
about programmers failing to understand the difference between
pre-increment and post-increment!

Sometimes smart programmers get mixed up too - especially when trying to
read code that is symbol-heavy and uses code that appears to be a common
idiom, but is subtly different.

As I say, you could try designing a language. You are a smart guy.
You could work on a design in your head while walking to the shops,
while waiting for a train, etc. As one of my books on language design
says, "design repeatedly: it will make you a better designer".

Oh, I have plenty of ideas for a language - I have no end to the
number of languages, OS's, processors, and whatever that I have
"designed" in my head :-) The devil's in the details, however, and I
haven't taken the time for that!

Yes, the devil is indeed in the details. It's one thing to have some
good ideas. It's quite another to bring them together into a single
product.

And that's assuming you can figure out which ideas are good!
