On 2022-11-07 12:55, James Harris wrote:
++E + E++
++E++
V = V++
Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something
a programmer can rely on.
One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.
On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
On 2022-11-07 12:55, James Harris wrote:
++E + E++
++E++
V = V++
Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something
a programmer can rely on.
One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.
That's all very well but what specifically would you prohibit?
On 2022-11-07 13:52, James Harris wrote:
On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
On 2022-11-07 12:55, James Harris wrote:
++E + E++
++E++
V = V++
Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so
something a programmer can rely on.
One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.
That's all very well but what specifically would you prohibit?
Your very question was about some arbitrary sequence of operators you
fail to give a meaning! Stop right here. (:-))
A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.
The basic question is: If ^ is a postfix dereference operator then what should be the relative precedences of the following (where E is any subexpression)?
++E
E++
E^
(The same goes for -- but to make description easier I'll mention only ++.)
Taking a step back and considering general expression evaluation I have,
so far, been defining the apparent order. And I'd like to continue with
that. So it should be possible to combine multiple ++ operators
arbitrarily. For example,
++E + E++
++E++
V = V++
Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something a programmer can rely on.
Setting that aside aside ... and going back to the query, what should be
the relative precedences of the three operators? For example, how should
the following be evaluated?
++E++^
++E^++
Or should some ways of combining ^ with either or both of the ++
operators be prohibited because they make code too difficult to
understand?!!
I guess it boils down to what's most convenient and comprehensible for a programmer but I don't know if there is a clear answer. What do you guys think?
I've been scratching my head over this for a while so other opinions
would be most welcome!
[[
As an aside, one thing I should point out is that while both pre- and post-increment require an lvalue, it is easy for prefix ++ to also result in an lvalue, whereas postfix ++ more naturally produces an rvalue.
Prefix ++ can be translated to
increment the value at a certain address
use that /address/
By contrast, postfix ++ more naturally translates to
load into a register the /value/ at a certain address
increment the value left at that address
After postfix ++ the address may not be so usable, because its value has already been changed and yet the code said to increment it /after/ the operation (for some definition of 'after').
At any rate, that distinction between prefix and postfix ++ seems to be recognised at the following link, where it says "Prefix versions of the built-in operators return references and postfix versions return values."
https://en.cppreference.com/w/cpp/language/operator_incdec
]]
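A minimal C sketch of the two translations described in the aside, assuming an int object reached through a pointer. The helper names pre_inc and post_inc are hypothetical. The prefix form hands back the address, so its result can still be assigned through. The postfix form can only hand back a value.

```c
/* Hypothetical helpers modelling the two translations above for an int
   object at a given address. */
int *pre_inc(int *addr)
{
    *addr += 1;          /* increment the value at a certain address    */
    return addr;         /* use that address: still usable as an lvalue */
}

int post_inc(int *addr)
{
    int value = *addr;   /* load the value at the address               */
    *addr = value + 1;   /* increment the value left at that address    */
    return value;        /* only the old value (an rvalue) comes back   */
}
```

For example, `*pre_inc(&x) = 42;` compiles and stores into x. By contrast, `post_inc(&x)` yields a plain int with no address to store through.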
On 07/11/2022 12:55, James Harris wrote:
A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.
The basic question is: If ^ is a postfix dereference operator then
what should be the relative precedences of the following (where E is
any subexpression)?
++E
E++
E^
(The same goes for -- but to make description easier I'll mention only
++.)
Taking a step back and considering general expression evaluation I
have, so far, been defining the apparent order. And I'd like to
continue with that. So it should be possible to combine multiple ++
operators arbitrarily. For example,
++E + E++
++E++
V = V++
Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something
a programmer can rely on.
I disagree entirely - unless you include giving an error message saying
the programmer should be fired for writing gibberish as "well defined
and something you can rely on". I can appreciate not wanting such
things to be run-time undefined behaviour, but there is no reason at all
to insist that it is acceptable by the compiler.
Setting that aside aside ... and going back to the query, what should
be the relative precedences of the three operators? For example, how
should the following be evaluated?
++E++^
++E^++
Make it a syntax error.
On 07/11/2022 15:23, Bart wrote:
V = V++
This one doesn't have any problems, but is probably not useful: you're
modifying V then replacing its value anyway, and with its original
value. That new V+1 value is discarded.
In C, it has /big/ problems - the side-effects on V are not sequenced,
so the expression is undefined behaviour. Other languages may differ - you'd have to read the specifications or standards for those languages.
I think this is because in my language, for something to be a valid
lvalue, you need to be able to apply & address-of to it. The result of
E++ doesn't have an address. But (E++)^ works because & and ^ cancel
out. Or something...
It is a bad sign for a language when even the language author,
implementer, and experienced user is not sure how it works. As long as
the language is only ever meant to be for a single person, you can get
away with saying "I wouldn't write that, so it doesn't matter what it means". But if the OP has hopes that more than one person will ever see
his language, it should be specified well enough that these things are written down.
On 07/11/2022 11:55, James Harris wrote:
A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.
The basic question is: If ^ is a postfix dereference operator then
what should be the relative precedences of the following (where E is
any subexpression)?
++E
E++
E^
(The same goes for -- but to make description easier I'll mention only
++.)
For unary operators, the evaluation order is rather peculiar yet seems
to be used in quite a few languages without anyone questioning it.
So if `a b c d` are unary operators, then the following:
a b E c d
is evaluated like this:
a (b ((E c) d))
That is, first all the postfix operators in left-to-right order, then
all the prefix ones in right-to-left order. It sounds bizarre when put like that!
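The binding rule just described can be sketched as a small C function. The sketch assumes single-character tokens: 'E' is the operand, letters before it are prefix operators, and letters after it are postfix operators. bracket_unary is a hypothetical name. The function wraps the operand outwards, applying postfix operators first (left to right) and then prefix operators (right to left).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fully parenthesise a run of unary operators around
   the operand 'E'. Returns 0 on success, -1 if there is no operand. */
int bracket_unary(const char *toks, char *out, size_t outsz)
{
    const char *e = strchr(toks, 'E');
    char buf[2][256];
    int cur = 0;

    if (e == NULL)
        return -1;
    snprintf(buf[cur], sizeof buf[cur], "E");

    /* Postfix operators: the one nearest E binds first, moving right. */
    for (const char *p = e + 1; *p != '\0'; p++) {
        snprintf(buf[!cur], sizeof buf[!cur], "(%s %c)", buf[cur], *p);
        cur = !cur;
    }
    /* Prefix operators: the one nearest E binds first, moving left. */
    for (const char *p = e; p > toks; ) {
        p--;
        snprintf(buf[!cur], sizeof buf[!cur], "(%c %s)", *p, buf[cur]);
        cur = !cur;
    }
    snprintf(out, outsz, "%s", buf[cur]);
    return 0;
}
```

Called on "abEcd", this produces "(a (b ((E c) d)))". If ++ and ^ are each read as a single operator, the same rule groups ++E++^ as ++((E++)^).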
Taking a step back and considering general expression evaluation I
have, so far, been defining the apparent order. And I'd like to
continue with that. So it should be possible to combine multiple ++
operators arbitrarily. For example,
++E + E++
This is well defined, as unary operators bind more tightly than binary
ones. This is just (++E) + (E++).
However, the evaluation order for '+' is not usually well defined, so you don't know which operand will be done first.
++E++
This may not work, or not work as expected. The binding using my above
scheme means this is equivalent to ++(E++). But the second (postfix) ++ requires
an lvalue as input, and yields an rvalue, which would be an invalid
input to the first ++.
V = V++
This one doesn't have any problems, but is probably not useful: you're modifying V then replacing its value anyway, and with its original
value. That new V+1 value is discarded.
https://en.cppreference.com/w/cpp/language/operator_incdec
I tried to get ++E++ to work using a suitable type for E, but in my
language it cannot work, as the first ++ still needs an lvalue; just an rvalue which has a pointer type won't cut it.
However ++E++^ can work, where ^ is deref, and E is a pointer.
I think this is because in my language, for something to be a valid
lvalue, you need to be able to apply & address-of to it. The result of
E++ doesn't have an address. But (E++)^ works because & and ^ cancel
out. Or something...
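One way to make the lvalue rule above mechanical is a small check over an expression tree. The C sketch below is an assumption for illustration, not Bart's compiler. It follows these rules:

- p^ (like C's *p) is always an lvalue.
- Neither form of ++ yields an lvalue (as in C).
- The pointer-type check on ^ is ignored.

The node kinds and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical node kinds for a toy expression tree. */
enum kind { VAR, DEREF, PREINC, POSTINC };

struct expr {
    enum kind kind;
    const struct expr *operand;   /* NULL for VAR */
};

/* Can e appear where an lvalue is required? */
static int is_lvalue(const struct expr *e)
{
    return e->kind == VAR || e->kind == DEREF;
}

/* Returns 1 if every ++ in the tree is applied to an lvalue, else 0. */
int check_lvalues(const struct expr *e)
{
    switch (e->kind) {
    case VAR:
        return 1;
    case DEREF:
        return check_lvalues(e->operand);
    case PREINC:
    case POSTINC:
        return is_lvalue(e->operand) && check_lvalues(e->operand);
    }
    return 0;
}
```

Under these rules, ++(E++) and ++((E^)++) are rejected and ++((E++)^) is accepted, which matches the behaviour described above.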
Setting that aside aside ... and going back to the query, what should
be the relative precedences of the three operators? For example, how
should the following be evaluated?
++E++^
++E^++
Or should some ways of combining ^ with either or both of the ++
operators be prohibited because they make code too difficult to
understand?!!
You have the same issues in C, but that's OK because people are so
familiar with it. Also * deref is a prefix operator so you never have
two distinct postfix operators, unless you write E++ --.
But yes, parentheses are recommended when mixing certain prefix/postfix
ops. I think this one is clear enough however:
-E^
Dereference E then negate the result. As is this: -E[i]; you wouldn't
assume that meant (-E)[i].
On 07/11/2022 13:43, Dmitry A. Kazakov wrote:
On 2022-11-07 13:52, James Harris wrote:
On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
On 2022-11-07 12:55, James Harris wrote:
++E + E++
++E++
V = V++
Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so
something a programmer can rely on.
One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.
That's all very well but what specifically would you prohibit?
Your very question was about some arbitrary sequence of operators you
fail to give a meaning! Stop right here. (:-))
+ usually means numeric addition
= here presumably means an assignment (and from right to left)
++ can also be assumed to mean in-place increment. Specifically:
++E is equivalent to: (E := E + 1; E)
E++ is equivalent to: (T := E; E := E + 1; T)
(When E can be harmlessly evaluated more than once; otherwise an extra temporary reference would need to be used.)
But I'm sure you know this already.
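Here is a C sketch of the E++ expansion above for the case where E cannot harmlessly be evaluated twice. E is an array element whose index comes from a function with a side effect. E's address is therefore evaluated once and kept as the temporary reference. next_index, calls and post_inc_elem are hypothetical names.

```c
/* Hypothetical stand-in for an E that has a side effect when evaluated. */
static int calls = 0;
static int next_index(void) { return calls++; }

/* E++ as (T := E; E := E + 1; T), with E = a[next_index()].
   E's address is computed once, so next_index() runs only once. */
int post_inc_elem(int a[])
{
    int *ref = &a[next_index()];  /* temporary reference to E */
    int t = *ref;                 /* T := E                   */
    *ref = t + 1;                 /* E := E + 1               */
    return t;                     /* result is T              */
}
```

A naive expansion that wrote E out twice would call next_index() twice and touch two different elements.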
On 07/11/2022 14:58, David Brown wrote:
On 07/11/2022 12:55, James Harris wrote:
A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.
The basic question is: If ^ is a postfix dereference operator then
what should be the relative precedences of the following (where E is
any subexpression)?
++E
E++
E^
(The same goes for -- but to make description easier I'll mention
only ++.)
Taking a step back and considering general expression evaluation I
have, so far, been defining the apparent order. And I'd like to
continue with that. So it should be possible to combine multiple ++
operators arbitrarily. For example,
++E + E++
++E++
V = V++
Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something
a programmer can rely on.
I disagree entirely - unless you include giving an error message
saying the programmer should be fired for writing gibberish as "well
defined and something you can rely on". I can appreciate not wanting
such things to be run-time undefined behaviour, but there is no reason
at all to insist that it is acceptable by the compiler.
gcc accepts this C code (when E, V are both ints):
++E + E++;
V = V++;
It won't accept ++E++ because the first ++ expects an lvalue. Probably
the same will happen when you try and implement it elsewhere. So no
actual need to prohibit in the language - it just won't work.
Setting that aside aside ... and going back to the query, what should
be the relative precedences of the three operators? For example, how
should the following be evaluated?
++E++^
++E^++
Make it a syntax error.
The equivalent in C syntax for the first is:
++*(P++);
This compiles fine when P has type int* for example. It means this:
- Increment the pointer P
- Increment the location that P now points to (using the * deref op)
So no reason to prohibit anything; it is perfectly well-defined.
The second example is equivalent to:
++((*P)++);
This won't work for the same reason as above. This is hard to prohibit
via grammar rules, but it is not necessary as it fails on type-checking.
On 07/11/2022 15:18, David Brown wrote:
On 07/11/2022 15:23, Bart wrote:
V = V++
This one doesn't have any problems, but is probably not useful:
you're modifying V then replacing its value anyway, and with its
original value. That new V+1 value is discarded.
In C, it has /big/ problems - the side-effects on V are not sequenced,
so the expression is undefined behaviour. Other languages may differ
- you'd have to read the specifications or standards for those languages.
I'd suggest that in C it would be a compiler problem. For example if it
did the assignment, and then decided to increment V.
To me that would be bizarre: I'd expect to evaluate the RHS as a single
term (V++), including any side-effects entailed, before writing the
resulting value (the old value of V) into V.
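The two orderings being contrasted here can be written out as ordinary, well-defined C. This is only a sketch and the function names are hypothetical. In C itself, V = V++ remains undefined behaviour, so no compiler is obliged to follow either ordering.

```c
/* RHS (V++) fully evaluated, side effect included, before the store. */
int rhs_then_store(int v)
{
    int old = v;       /* value of V++ is the old V          */
    v = old + 1;       /* the increment takes effect first   */
    v = old;           /* then the assignment stores old V   */
    return v;          /* V ends with its original value     */
}

/* Store performed first, pending increment applied afterwards. */
int store_then_increment(int v)
{
    int old = v;       /* value of V++ is the old V          */
    v = old;           /* the assignment stores old V        */
    v = old + 1;       /* the deferred increment lands last  */
    return v;          /* V ends one larger                  */
}
```

The expectation stated above corresponds to rhs_then_store. A compiler that did the assignment and then the increment would end up with the result of store_then_increment.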
But in general you're right: I'm not keen on multiple things being
changed inside one expression. I tolerate ++ and -- (and chained
assignment) because they are so handy. But I don't allow augmented assignments inside an expression as C does.
I think this is because in my language, for something to be a valid
lvalue, you need to be able to apply & address-of to it. The result
of E++ doesn't have an address. But (E++)^ works because & and ^
cancel out. Or something...
It is a bad sign for a language when even the language author,
implementer, and experienced user is not sure how it works. As long
as the language is only ever meant to be for a single person, you can
get away with saying "I wouldn't write that, so it doesn't matter what
it means". But if the OP has hopes that more than one person will
ever see his language, it should be specified well enough that these
things are written down.
The 'or something' refers to the mechanism within my compiler which determines what is a legal lvalue. I'd have to study 3700 lines of code
to discover exactly how it worked.
But it should be obvious (now that I've thought about it!) that a term
of the form X^, which is all that `E++^` is, should be a legal lvalue as
it can be used on either side of an assignment:
X^ := X^
(Although no doubt C will make that UB because that's what it likes to do.)
On 07/11/2022 13:43, Dmitry A. Kazakov wrote:
On 2022-11-07 13:52, James Harris wrote:
On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
On 2022-11-07 12:55, James Harris wrote:
++E + E++
++E++
V = V++
Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so
something a programmer can rely on.
One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.
That's all very well but what specifically would you prohibit?
Your very question was about some arbitrary sequence of operators you
fail to give a meaning! Stop right here. (:-))
It's easy to dislike a certain sequence of operators. It's harder to
define rules for their prohibition.
On 07/11/2022 16:06, Bart wrote:
But I'm sure you know this already.
What you have given are the interpretations for C and similar languages, operating on arithmetic operands. Other languages may have different meanings for the symbols. Even if the OP's language gives the same
meaning to the operators for integers, it might mean something different
for other types - including the possibility of operator overloads for
user types.
On 07/11/2022 12:55, James Harris wrote:
A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.
The basic question is: If ^ is a postfix dereference operator then
what should be the relative precedences of the following (where E is
any subexpression)?
++E
E++
E^
(The same goes for -- but to make description easier I'll mention only
++.)
Taking a step back and considering general expression evaluation I
have, so far, been defining the apparent order. And I'd like to
continue with that. So it should be possible to combine multiple ++
operators arbitrarily. For example,
++E + E++
++E++
V = V++
Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something
a programmer can rely on.
I disagree entirely - unless you include giving an error message saying
the programmer should be fired for writing gibberish as "well defined
and something you can rely on". I can appreciate not wanting such
things to be run-time undefined behaviour, but there is no reason at all
to insist that it is acceptable by the compiler.
Setting that aside aside ... and going back to the query, what should
be the relative precedences of the three operators? For example, how
should the following be evaluated?
++E++^
++E^++
Or should some ways of combining ^ with either or both of the ++
operators be prohibited because they make code too difficult to
understand?!!
Make it a syntax error.
On the other
hand, you /do/ have an obligation to try to catch mistakes, typos, and accidental errors in code.
On 07/11/2022 16:43, David Brown wrote:
On 07/11/2022 16:06, Bart wrote:
..
But I'm sure you know this already.
What you have given are the interpretations for C and similar
languages, operating on arithmetic operands. Other languages may have
different meanings for the symbols. Even if the OP's language gives
the same meaning to the operators for integers, it might mean
something different for other types - including the possibility of
operator overloads for user types.
If precedences were to vary with operand types then expressions would be
very hard for programmers to read, so IMO it's important for program readability that precedences go with the operators and that they are independent of the types of the operands. If a programmer didn't know
what order
a + b * c
would be evaluated in until he looked up the types then even simple
programs would be very confusing.
On 07/11/2022 14:58, David Brown wrote:
On 07/11/2022 12:55, James Harris wrote:
A piece of code Bart wrote in another thread happened to relate to
something I've been working on but without me coming up with a clear
answer so I thought I'd ask you guys what you think.
The basic question is: If ^ is a postfix dereference operator then
what should be the relative precedences of the following (where E is
any subexpression)?
++E
E++
E^
(The same goes for -- but to make description easier I'll mention
only ++.)
Taking a step back and considering general expression evaluation I
have, so far, been defining the apparent order. And I'd like to
continue with that. So it should be possible to combine multiple ++
operators arbitrarily. For example,
++E + E++
++E++
V = V++
Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so something
a programmer can rely on.
I disagree entirely
Good. :)
- unless you include giving an error message saying the programmer
should be fired for writing gibberish as "well defined and something
you can rely on". I can appreciate not wanting such things to be
run-time undefined behaviour, but there is no reason at all to insist
that it is acceptable by the compiler.
As I said to Dmitry, if one wants to prohibit the above then one has to define what exactly is being prohibited and to be careful not thereby to prohibit something else that may be more legitimate. Further, such a prohibition is an additional rule the programmer has to learn.
All in all, ISTM better to define such expressions. The programmer is
not forced to use them but at least if they are present in code and well defined then their meaning will be plain.
Take the first one,
++E + E++
It could be defined fairly easily. If operands to + are defined to
appear as though they were evaluated left then right and the ++
operators are set to be of higher precedence and defined to take effect
as soon as they are evaluated then
++E + E++
would evaluate as though the operations were
++E; E++; +
If E were a variable of value 5 then the result would be
6; 6++; + ===> 12 with E ending as 7
E&OE the expression is not actually all that hard to parse if the rules
are simple.
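The rules just proposed are that the operands of + are evaluated left then right and that each ++ takes effect as soon as it is evaluated. Under those rules, the expression can be spelled out step by step in C. This is a sketch only: pre_plus_post is a hypothetical name, and writing ++E + E++ directly in C would not give these guarantees.

```c
/* ++E + E++ under left-then-right evaluation, with each ++ taking
   effect immediately. E is passed by address. */
int pre_plus_post(int *e)
{
    int lhs, rhs;
    *e += 1;          /* ++E: increment first ...          */
    lhs = *e;         /* ... then use the new value        */
    rhs = *e;         /* E++: use the current value ...    */
    *e += 1;          /* ... then increment                */
    return lhs + rhs;
}
```

Starting from 5, it returns 12 and leaves E at 7, matching the worked example above.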
Setting that aside aside ... and going back to the query, what should
be the relative precedences of the three operators? For example, how
should the following be evaluated?
++E++^
++E^++
Or should some ways of combining ^ with either or both of the ++
operators be prohibited because they make code too difficult to
understand?!!
Make it a syntax error.
Why? What's so wrong with it? AISI if all three operators have the
requisite number of operands then how can it be an error in syntax?
On the other hand, you /do/ have an obligation to try to catch
mistakes, typos, and accidental errors in code.
Is it at least partially true that C defines a bunch of expressions as
UB because the rules were not clearly specified initially and different compilers chose different interpretations?
With a new language I cannot see why you might be against clear
definition.
I am aware that it might make optimisation harder to achieve
but that would only apply in some cases and is still, IMO, better than
simply saying "that's not defined".
IOW I welcome your disagreement but don't understand it!
On 07/11/2022 16:23, Bart wrote:
gcc accepts this C code (when E, V are both ints):
++E + E++;
V = V++;
That's like saying that you can hit a screw with a hammer. Use the tool properly, and you will see the complaints. gcc is a C compiler, not
some kind of "official" guide to the language, and everyone knows that without flags it is far too accepting of code that has undefined
behaviour or is otherwise clearly wrong even in cases that can be
spotted easily. With even basic warning flags enabled, these are marked.
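(You've had this explained to you a few hundred times over the last
decade or so. I know you get some kind of perverse pleasure out of finding
any way of making C and/or gcc look bad in your own eyes, but would you /please/ stop being such a petty child and stop writing things
deliberately intended to confuse, mislead or annoy others?)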
It won't accept ++E++ because the first ++ expects an lvalue. Probably
the same will happen when you try and implement it elsewhere. So no
actual need to prohibit in the language - it just won't work.
That makes /no/ sense. If by "it just won't work"
So no reason to prohibit anything; it is perfectly well-defined.
There is good reason to prohibit it - you got it wrong, so despite being well-defined by the language, it is not clear code.
The actual meaning of "++*(P++);" is :
1. Remember the original value of P - call it P_orig
2. Increment P (that is, add sizeof(*P) to it).
3. Increment the int at the location pointed to by P_orig.
4. The value of the expression is the new updated value pointed to by P_orig.
No specific ordering of the two increments is implied here - they can be
done in either order.
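The four steps above can be written out as separate statements in C. This sketch picks one of the permitted orders of the two increments. The function name preinc_deref_postinc is hypothetical, and P is passed by address so that the change to P is visible to the caller.

```c
/* ++*(P++) spelled out, using one permitted ordering of the increments. */
int preinc_deref_postinc(int **pp)
{
    int *p_orig = *pp;   /* 1. remember the original value of P       */
    *pp = p_orig + 1;    /* 2. increment P (advances by sizeof(int))  */
    *p_orig += 1;        /* 3. increment the int P_orig points to     */
    return *p_orig;      /* 4. value is the updated int at P_orig     */
}
```

The int that gets incremented is the one P pointed to before it moved, not the one it points to afterwards.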
In C, prohibitions against such code come from "constraints", which are
not part of the BNF grammar rules, but come before any kind of type checking. Whether an expression is an "rvalue", a "modifiable lvalue",
a "non-modifiable lvalue", or other classification, is not part of the
type system.
On 2022-11-07 18:26, James Harris wrote:
On 07/11/2022 13:43, Dmitry A. Kazakov wrote:
On 2022-11-07 13:52, James Harris wrote:
On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
On 2022-11-07 12:55, James Harris wrote:
++E + E++
++E++
V = V++
Expressions such as those would have a defined meaning. The actual meaning is less important than it being well defined and so
something a programmer can rely on.
One major contribution of PL/1 was clear understanding that "every
garbage must mean something" was a bad language design principle.
That's all very well but what specifically would you prohibit?
Your very question was about some arbitrary sequence of operators you
fail to give a meaning! Stop right here. (:-))
It's easy to dislike a certain sequence of operators. It's harder to
define rules for their prohibition.
1. Reduce the number of precedence levels to logical, additive,
multiplicative, and highest order.
2. Require parentheses for mixed operations at the same level (except
for * and /).
3. No side effects of operators.
On 07/11/2022 16:34, David Brown wrote:
On 07/11/2022 16:23, Bart wrote:
gcc accepts this C code (when E, V are both ints):
++E + E++;
V = V++;
That's like saying that you can hit a screw with a hammer. Use the
tool properly, and you will see the complaints. gcc is a C compiler,
not some kind of "official" guide to the language, and everyone knows
that without flags it is far too accepting of code that has undefined
behaviour or is otherwise clearly wrong even in cases that can be
spotted easily. With even basic warning flags enabled, these are marked.
(You've had this explained to you a few hundred times over the last
decade or so. I know you get some kind of perverse pleasure out of
finding any way of making C and/or gcc look bad in your own eyes,
Well, isn't it? You recommended that a new language doesn't allow it,
but C does anyway, or at least its implementations do so.
(Unless you go out of /your/ way to ensure it doesn't pass. But you'd be better off avoiding such code. There are a million ways of writing
nonsense code that cannot be prohibited by a compiler.)
It won't accept ++E++ because the first ++ expects an lvalue.
Probably the same will happen when you try and implement it
elsewhere. So no actual need to prohibit in the language - it just
won't work.
That makes /no/ sense. If by "it just won't work"
I mean that you will not get any C compilers to get it to work: all
report hard errors, and will not generate any code.
All the errors mention that some operand is not an lvalue. You don't
really need a special rule in grammar to prohibit certain combinations
of expressions.
For the same reasons, it won't work in other languages unless they have
very different interpretations of what ++ means.
Now compare this kind of unequivocal error report with the wishy-washy handling by C compilers of those other two lines:
So no reason to prohibit anything; it is perfectly well-defined.
There is good reason to prohibit it - you got it wrong, so despite
being well-defined by the language, it is not clear code.
The actual meaning of "++*(P++);" is :
1. Remember the original value of P - call it P_orig
2. Increment P (that is, add sizeof(*P) to it).
3. Increment the int at the location pointed to by P_orig.
4. The value of the expression is the new updated value pointed
to by P_orig.
So, the meaning is that. The point is, it's well-defined and makes
sense.
It may be confusing to look at, but look at ANY C source and you
will see complex expressions that are much harder to grok, like:
OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF;
}
(Is it even an expression? I /think/ it's a function definition.)
So why single out increment operators? Because I got ++P confused with
P++ for a second? Then let's ban those two varieties of increment op too!
Note that ++*(P++) is equivalent to:
*(P += 1) += 1;
Do we ban this or not? (My language doesn't allow this, but again it's a
type issue because `+:=` doesn't return a value.)
No specific ordering of the two increments is implied here - they can
be done in either order.
As I said in another post, that would be perverse.
In C, prohibitions against such code come from "constraints", which
are not part of the BNF grammar rules, but come before any kind of
type checking. Whether an expression is an "rvalue", a "modifiable
lvalue", a "non-modifiable lvalue", or other classification, is not
part of the type system.
That's up to the implementation. In my compilers including for C,
validating lvalues is part of the type-checking.
I like the simplicity of the language which would result from your suggestions but can't help but think they would make programming in it
less comfortable, like the simplicity of a hair shirt. ;)
But now what about dereference? Should it also take precedence over the
++ operators or should it come after one or both? For instance, what
should the following mean?
++p^
Should it be
(++p)^
or
++(p^)
?
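The two candidate readings of ++p^ can be compared in C, using C's prefix * in place of the postfix ^. The sketch assumes p points into an int array, and the function names are hypothetical. The first reading moves the pointer; the second changes the pointed-to value.

```c
/* (++p)^ corresponds to *(++p): advance the pointer, then read. */
int deref_after_inc(int **pp)
{
    return *(++*pp);
}

/* ++(p^) corresponds to ++*p: increment the int that p points to. */
int inc_after_deref(int **pp)
{
    return ++**pp;
}
```

With p pointing at {10, 20}, the first reading yields 20 and leaves the array unchanged. The second yields 11 and leaves p where it was.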
On 08/11/2022 01:15, Bart wrote:
(Unless you go out of /your/ way to ensure it doesn't pass. But you'd
be better off avoiding such code. There are a million ways of writing
nonsense code that cannot be prohibited by a compiler.)
Yes, because "gcc -Wall" is /so/ hard to write.
I mean, it takes hours
extra work, far out of your way. Write yourself a batch file with gcc
flags - you could have done it 20 years ago and saved yourself and
everyone else enormous effort.
A major point of a good programming language - aided by good tools - is
to reduce the amount of bad code that is accepted.
I mean that you will not get any C compilers to get it to work: all
report hard errors, and will not generate any code.
So it is prohibited by the language.
It may be confusing to look at, but look at ANY C source and you will
see complex expressions that are much harder to grok, like:
OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF;
}
(Is it even an expression? I /think/ it's a function definition.)
Are you arguing that because some people write C code that is even
harder to understand, the OP should allow these nonsense expressions in
his language? That "logic" is like saying that because there are bank robbers, people should be allowed to drunk-drive.
On 07/11/2022 16:23, Bart wrote:
On 07/11/2022 14:58, David Brown wrote:
On 07/11/2022 12:55, James Harris wrote:
gcc accepts this C code (when E, V are both ints):
++E + E++;
V = V++;
(You've had this explained to you a few hundred times over the last
decade or so. I know you get some kind of perverse pleasure out of finding
any way of making C and/or gcc look bad in your own eyes, but would you /please/ stop being such a petty child and stop writing things
deliberately intended to confuse, mislead or annoy others?)
++E++^
++E^++
Make it a syntax error.
The equivalent in C syntax for the first is:
++*(P++);
This compiles fine when P has type int* for example. It means this:
- Increment the pointer P
- Increment the location that P now points to (using the * deref op)
So no reason to prohibit anything; it is perfectly well-defined.
There is good reason to prohibit it - you got it wrong, so despite being well-defined by the language, it is not clear code.
The actual meaning of "++*(P++);" is :
1. Remember the original value of P - call it P_orig
2. Increment P (that is, add sizeof(*P) to it).
3. Increment the int at the location pointed to by P_orig.
4. The value of the expression is the new updated value pointed to by P_orig.
On 07/11/2022 17:53, Dmitry A. Kazakov wrote:
1. Reduce number of precedence level to logical, additive,
multiplicative, highest order.
2. Require parenthesis for mixed operations at the same level
(except for * and /)
3. No side effects of operators.
Good suggestions, especially ruling out operators with side effects. You wouldn't believe how much trouble they've been giving me.
I like the simplicity of the language which would result from your suggestions but can't help but think they would make programming in
it less comfortable, like the simplicity of a hair shirt. ;)
On 08/11/2022 08:04, James Harris wrote:
On 07/11/2022 17:53, Dmitry A. Kazakov wrote:
1. Reduce number of precedence level to logical, additive,
multiplicative, highest order.
Many people expect "and" to bind more tightly than "or", so you perhaps need [at least] two levels of logical. Somewhere between C and
hair shirts, there is perhaps some more sensible number?
2. Require parenthesis for mixed operations at the same level
(except for * and /)
Don't know why the exception?
3. No side effects of operators.
How is "side effect" defined for this purpose?
You can get too paranoid about side-effects. They're like many aspects of programming; they can be used for good or evil, and on the
whole you should let programmers use them that way. Good programmers
will use them wisely, bad programmers will write bad programs no matter
how hard you try to make them write good ones.
On 07/11/2022 14:23, Bart wrote:
On 07/11/2022 11:55, James Harris wrote:
..
++E++
This may not work, or not work as expected. The binding using my above
scheme means this is equivalent to ++(E++). But the second ++ requires
an lvalue as input, and yields an rvalue, which would be an invalid
input to the first ++.
Yes. If
++E++
is going to be permitted then for programmer sanity wouldn't it be true
to say that both ++ operators need to refer to the same lvalue? If so then
++p
should probably have higher precedence than
p++
or perhaps their precedences could be the same but they be applied in left-to-right order.
It may be worth looking at other operators which take in AND produce
lvalues, most familiarly array indexing and field referencing, and hence
they can be incremented. Isn't it true that for both ++ operators of
++points.x[1]
points.x[1]++
that a programmer would normally want points.x[1] incremented, i.e.
field referencing and array indexing would take precedence over either
++ operator?
But now what about dereference? Should it also take precedence over the
++ operators or should it come after one or both? For instance, what
should the following mean?
++p^
Should it be
(++p)^
or
++(p^)
On 07/11/2022 16:34, David Brown wrote:
On 07/11/2022 16:23, Bart wrote:
The equivalent in C syntax for the first is:
++*(P++);
This compiles fine when P has type int* for example. It means this:
- Increment the pointer P
- Increment the location that P now points to (using the * deref op)
So no reason to prohibit anything; it is perfectly well-defined.
There is good reason to prohibit it - you got it wrong, so despite
being well-defined by the language, it is not clear code.
The actual meaning of "++*(P++);" is :
1. Remember the original value of P - call it P_orig
2. Increment P (that is, add sizeof(*P) to it).
3. Increment the int at the location pointed to by P_orig.
4. The value of the expression is the new updated value pointed
to by P_orig.
That's a good example of how legitimate code can be made to look like gibberish by the evil programmer (tm). As I've mentioned elsewhere it's
hard to invent rules to prohibit particular constructs simply because
'we don't like the look of them' and it would make the language harder
to implement and understand if the language design included rules on 'aesthetics'.
It seems to have parallels with the free-speech debate. Free speech is
easy when we get to choose what should be free and what should be banned
- but then that's not free speech. In reality, free speech is hard
because others may be free to say things we don't like (although IMO
those who don't want to hear them should not have to listen to them ...
but that's another topic and getting off the point of the simile).
In a
similar way, programming can be hard when other programmers write
constructs we don't like. I agree that it's best for a language to help programmers write readable and comprehensible programs - and even to
make them the easiest to write, if possible - but the very flexibility
which may allow them to do so may also give them the freedom to write
code we don't care for. I don't think one can legislate against that.
On 08/11/2022 08:33, David Brown wrote:
On 08/11/2022 01:15, Bart wrote:
(Unless you go out of /your/ way to ensure it doesn't pass. But you'd
be better off avoiding such code. There are a million ways of writing
nonsense code that cannot be prohibited by a compiler.)
Yes, because "gcc -Wall" is /so/ hard to write.
And it's SO hard for a compiler to just use that as a default!
So it
stays safe for EVERYONE no matter how they invoke the compiler.
Take a function like this which I consider much more dangerous than
anything we've been discussing:
void fred() {}
My bcc compiler gives a hard error: "() params are not allowed". But
this works:
gcc c.c -c
OK, I'll have to write -Wall as you say:
gcc -Wall c.c -c
But, it still passes!
(So much existing code wrongly uses () to mean no parameters - thanks no doubt to gcc's lax approach over decades - that I have to give bcc a
special option to enable it when it comes up.)
I mean, it takes hours extra work, far out of your way. Write
yourself a batch file with gcc flags - you could have done it 20 years
ago and saved yourself and everyone else enormous effort.
Why do you expect people to have to themselves implement a chunk of the compiler they're using?
And have to do so for every compiler - at one time I was using 7 or 8.
ALL of them should be doing their jobs properly without being told.
A major point of a good programming language - aided by good tools -
is to reduce the amount of bad code that is accepted.
Yeah. In my language, A[i] only works when A is an array; P^ (pointer
deref) only works when P is a pointer.
Sounds obvious when put like that, but in C anything goes; allowing A^
and P[i] enables a huge amount of dangerous nonsense.
I mean that you will not get any C compilers to get it to work: all
report hard errors, and will not generate any code.
So it is prohibited by the language.
So is ++E + E++ prohibited by C or not? I'm none the wiser, but it
sounds like, it depends!
It may be confusing to look at, but look at ANY C source and you will
see complex expressions that are much harder to grok, like:
OP(op,3f) { F = ((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF;
}
(Is it even an expression? I /think/ it's a function definition.)
Are you arguing that because some people write C code that is even
harder to understand, the OP should allow these nonsense expressions
in his language? That "logic" is like saying that because there are
bank robbers, people should be allowed to drunk-drive.
YOU are arguing that you shouldn't be allowed to compose certain
operators because the result might be confusing. But why single out
these particular ones?
On 08/11/2022 13:43, Bart wrote:
On 08/11/2022 08:33, David Brown wrote:
On 08/11/2022 01:15, Bart wrote:
(Unless you go out of /your/ way to ensure it doesn't pass. But
you'd be better off avoiding such code. There are a million ways of
writing nonsense code that cannot be prohibited by a compiler.)
Yes, because "gcc -Wall" is /so/ hard to write.
And it's SO hard for a compiler to just use that as a default!
Yes, it /is/ hard to have it as the default.
But it's not feasible - it will break endless
amount of code and build scripts,
You are used to your own little world of your own tools, your own
language, your own code. Perhaps you don't realise what it means to
work with other people - certainly you don't understand what it means
for a language and a toolchain to be vital for /millions/ of programs.
gcc c.c -c
OK, I'll have to write -Wall as you say:
gcc -Wall c.c -c
But, it still passes!
It is valid code - why would it not pass?
(So much existing code wrongly uses () to mean no parameters - thanks
no doubt to gcc's lax approach over decades - that I have to give bcc
a special option to enable it when it comes up.)
No, existing C code uses () to mean unspecified number of parameters - anything from zero upwards.
You claim to have made a C compiler - did you never actually look at the language standards or learn the language?
(In the next C standard, C23, "void foo()" will mean "foo" takes no parameters, just like in C++.)
I mean, it takes hours extra work, far out of your way. Write
yourself a batch file with gcc flags - you could have done it 20
years ago and saved yourself and everyone else enormous effort.
Why do you expect people to have to themselves implement a chunk of
the compiler they're using?
What? You complain that gcc's source is millions of lines long. How
does a few command-line options count as "a chunk of the compiler" ?
No, despite your continued exaggerations, C is not "anything goes". But
it /does/ allow some constructs that other languages don't (and vice
versa).
You may not have noticed, in your eagerness to condemn everything C
related, including anyone who actually understands and uses the
language, that I have repeatedly recommended that the OP /not/ copy C in
his new language. The design decision for C's subscript to be syntactic sugar for pointer dereferencing (you can't apply it to an array in C,
despite appearances that confuse you)
So is ++E + E++ prohibited by C or not? I'm none the wiser, but it
sounds like, it depends!
Yes, it is prohibited by the language
- as I said, and you bizarrely
claimed otherwise (saying "no actual need to prohibit in the language -
it just won't work").
It has nothing to do with types, it is in
language constraint clauses.
It may be confusing to look at, but look at ANY C source and you
will see complex expressions that are much harder to grok, like:
OP(op,3f) { F =
((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF; }
(Is it even an expression? I /think/ it's a function definition.)
Are you arguing that because some people write C code that is even
harder to understand, the OP should allow these nonsense expressions
in his language? That "logic" is like saying that because there are
bank robbers, people should be allowed to drunk-drive.
YOU are arguing that you shouldn't be allowed to compose certain
operators because the result might be confusing. But why single out
these particular ones?
I didn't. You are making things up.
Make it a syntax error.
I recommend you take a step outside to your garden. Jump up and down
and scream "I hate C" at the top of your voice, until you are hoarse and
red in the face. Get it out your system.
Then come back here, stop
posting ludicrous anti-C drivel, and maybe you can go back to
contributing usefully to the discussion. You have more experience in home-made languages than most people - try to give useful advice and
leave anything about C to people who can talk about it rationally.
No it isn't. And the consequences of allowing terrible, error prone
legacy code are considerable.
On 2022-11-08 18:46, Bart wrote:
No it isn't. And the consequences of allowing terrible, error prone
legacy code are considerable.
Wrong. Backward compatibility trumps everything, absolutely everything. Legacy C, FORTRAN, COBOL code is far more stable than whatever new garbage.
Unless your new language supports strong typing, contracts and formal verification, I'd better take old C code, than newly introduced fancy bugs.
From a new language I expect new technological level. So long you guys
are keeping on reinventing C, I'd better stay with C.
On 08/11/2022 18:18, Dmitry A. Kazakov wrote:
On 2022-11-08 18:46, Bart wrote:
No it isn't. And the consequences of allowing terrible, error prone
legacy code are considerable.
Wrong. Backward compatibility trumps everything, absolutely
everything. Legacy C, FORTRAN, COBOL code is far more stable than
whatever new garbage.
Unless your new language supports strong typing, contracts and formal
verification, I'd better take old C code, than newly introduced fancy
bugs.
From a new language I expect new technological level. So long you
guys are keeping on reinventing C, I'd better stay with C.
There are plenty of newer, more ground-breaking languages around that
might suit you: Rust, Zig, Odin, Dart, Julia... Or functional ones like Haskell, OCaml, F#.
On 08/11/2022 16:20, David Brown wrote:
On 08/11/2022 13:43, Bart wrote:
On 08/11/2022 08:33, David Brown wrote:
On 08/11/2022 01:15, Bart wrote:
(Unless you go out of /your/ way to ensure it doesn't pass. But
you'd be better off avoiding such code. There are a million ways of
writing nonsense code that cannot be prohibited by a compiler.)
Yes, because "gcc -Wall" is /so/ hard to write.
And it's SO hard for a compiler to just use that as a default!
Yes, it /is/ hard to have it as the default.
No it isn't. And the consequences of allowing terrible, error prone
legacy code are considerable.
You need ONE new option in a compiler, example:
gcc --classic
(Or, more apt, --unsafe.)
gcc c.c -c
OK, I'll have to write -Wall as you say:
gcc -Wall c.c -c
But, it still passes!
It is valid code - why would it not pass?
Because it's fucking stupid code:
#include <stdio.h>
int fred() {return 0;}
int main(void) {
fred(1,2,3,4,5,6,7,8,9,10);
fred("Hello, World!");
fred(fred,fred,fred(fred(fred)));
}
On what planet could all those calls to fred() be correct? All of them, except at most one, will be wrong. And dangerous.
Yet 'gcc -Wall -Wextra -Wpedantic etc etc` passes it quite happily.
That is a fucking stupid compiler.
(So much existing code wrongly uses () to mean no parameters - thanks
no doubt to gcc's lax approach over decades - that I have to give bcc
a special option to enable it when it comes up.)
No, existing C code uses () to mean unspecified number of parameters -
anything from zero upwards.
No, all the C code I've seen routinely uses () to mean zero parameters
only.
The problem with that is that any number of parameters of any
types can be passed, clearly incorrectly, and it cannot be detected.
Code that uses () correctly (normally associated with function pointers) needs to ensure that the call and the callee match in argument counts
and types. That's why it is dangerous. But this use is unusual.
(In my language, that is achieved with explicit function pointer casts.)
You claim to have made a C compiler - did you never actually look at
the language standards or learn the language?
I made a compiler for a subset of C - minus some features. () parameters
need to be enabled by a legacy switch like the one I mentioned, in my
case called '-old'.
(In the next C standard, C23, "void foo()" will mean "foo" takes no
parameters, just like in C++.)
Will my nonsense program still pass?
I mean, it takes hours extra work, far out of your way. Write
yourself a batch file with gcc flags - you could have done it 20
years ago and saved yourself and everyone else enormous effort.
Why do you expect people to have to themselves implement a chunk of
the compiler they're using?
What? You complain that gcc's source is millions of lines long. How
does a few command-line options count as "a chunk of the compiler" ?
By using 1000s of options to control every aspect of the process. The
options form a mini-DSL to build a custom dialect of a language.
No, despite your continued exaggerations, C is not "anything goes".
But it /does/ allow some constructs that other languages don't (and
vice versa).
You may not have noticed, in your eagerness to condemn everything C
related, including anyone who actually understands and uses the
language, that I have repeatedly recommended that the OP /not/ copy C in
his new language. The design decision for C's subscript to be
syntactic sugar for pointer dereferencing (you can't apply it to an
array in C, despite appearances that confuse you)
You mean appearances like this:
int A[10];
int x;
x = *A;
gcc is happy with this.
So is ++E + E++ prohibited by C or not? I'm none the wiser, but it
sounds like, it depends!
Yes, it is prohibited by the language
Yet no compiler stopped me creating an executable. So why is ++E++ a
hard error and not ++E + E++?
- as I said, and you bizarrely claimed otherwise (saying "no actual
need to prohibit in the language - it just won't work").
Yes, about ++E++. Not ++E + E++; I merely observed that gcc didn't take
the latter seriously.
It has nothing to do with types, it is in language constraint clauses.
It may be confusing to look at, but look at ANY C source and you
will see complex expressions that are much harder to grok, like:
OP(op,3f) { F =
((F&(SF|ZF|YF|XF|PF|CF))|((F&CF)<<4)|(A&(YF|XF)))^CF; }
(Is it even an expression? I /think/ it's a function definition.)
Are you arguing that because some people write C code that is even
harder to understand, the OP should allow these nonsense expressions
in his language? That "logic" is like saying that because there are
bank robbers, people should be allowed to drunk-drive.
YOU are arguing that you shouldn't be allowed to compose certain
operators because the result might be confusing. But why single out
these particular ones?
I didn't. You are making things up.
You said:
Make it a syntax error.
about ++E++^ and ++E^++. Before going on to compare that syntax with Brainfuck.
I recommend you take a step outside to your garden. Jump up and down
and scream "I hate C" at the top of your voice, until you are hoarse
and red in the face. Get it out your system.
I suggest you do the same with "I hate Bart".
I already know that the stuff I do is miles better than C, while still
being simple, low-level, small footprint and easy to build fast. Thanks
for reminding me what a quagmire it is.
Then come back here, stop posting ludicrous anti-C drivel, and maybe
you can go back to contributing usefully to the discussion. You have
more experience in home-made languages than most people - try to give
useful advice and leave anything about C to people who can talk about
it rationally.
This is not the C group. C comes up tangentially from time to time. But
I believe it was mostly you who wholesale dragged C and its compilers
into the discussion.
Please do not reply to this. I'm not interested in taking it any further.
On 08/11/2022 18:46, Bart wrote:
I made a compiler for a subset of C - minus some features. ()
parameters need to be enabled by a legacy switch like the one I
mentioned, in my case called '-old'.
If it is just a subset of C, with no attempt at conformity, then it is misleading to refer to it as a C compiler. (It can still be a useful
tool for your own use.)
(In the next C standard, C23, "void foo()" will mean "foo" takes no
parameters, just like in C++.)
Will my nonsense program still pass?
No. It would be as though you had written "int fred(void) {return 0;}"
as the first line.
On 08/11/2022 20:28, David Brown wrote:
On 08/11/2022 18:46, Bart wrote:
I made a compiler for a subset of C - minus some features. ()
parameters need to be enabled by a legacy switch like the one I
mentioned, in my case called '-old'.
If it is just a subset of C, with no attempt at conformity, then it is
misleading to refer to it as a C compiler. (It can still be a useful
tool for your own use.)
There must be 1000s of amateur C compiler projects, probably more than
for any other language.
Mine was able to build programs like Lua, Tiny C, Seed7 and SQLite, and
run those programs to varying degrees. So more capable than most, but I usually refer to it in docs as a C-subset compiler.
Whatever C coding I do these days will be in that subset, and mine will
be my first choice of C compiler. Any machine-generated C will be in
that same subset.
(In the next C standard, C23, "void foo()" will mean "foo" takes no
parameters, just like in C++.)
Will my nonsense program still pass?
No. It would be as though you had written "int fred(void) {return
0;}" as the first line.
So what happened to backwards compatibility? All those programs which
call () functions with more arguments will no longer work.
On 09/11/2022 00:06, Bart wrote:
So what happened to backwards compatibility? All those programs which
call () functions with more arguments will no longer work.
I still don't understand what you mean.
int (*f)(void);
int (*g)(int, int);
void foo(void) {
f(); // Good
g(1, 2); // Good
f(1, 2); // Error
g(); // Error
}
Function pointers of different types - that is, different return types, different numbers or types of parameters - are incompatible in C. There
are no implicit conversions between them (unlike object pointers and
void*), there is no common base type, and any use of explicit casts will
lead to undefined behaviour if you don't cast back to the right type
before calling the function. Even the cast back and forth is
implementation dependent - an implementation could use different sizes
for different function pointer types.
In practice, of course, most implementations have the same size of
pointer for all function types. On some real-world targets it will be different from the size of object pointers (imagine a 16-bit system with "large" code model and "small" data model, or vice versa). But C still
does not allow you to mess with the function pointer types without
explicit "I know what I am doing" casts.
On 07/11/2022 18:16, James Harris wrote:
On 07/11/2022 14:23, Bart wrote:
On 07/11/2022 11:55, James Harris wrote:
..
++E++
This may not work, or not work as expected. The binding using my above
scheme means this is equivalent to ++(E++). But the second ++ requires
an lvalue as input, and yields an rvalue, which would be an invalid
input to the first ++.
Yes. If
++E++
is going to be permitted
Then you need to define what it means.
Here, suppose that in each case E
starts off as 100:
E++ # What value does E have afterwards?
++E # What value does E have afterwards?
X := E++ # What is the value of X?
++E++ # What is the value of E after?
X := ++E++ # What is the value of X? What is the type and value
# of the E++ portion?
I can't make ++E++ work in any of my languages because of type/lvalue discrepancies.
then for programmer sanity wouldn't it be true
to say that both ++ operators need to refer to the same lvalue? If so then
++p
should probably have higher precedence than
p++
or perhaps their precedences could be the same but they be applied in left-to-right order.
It would already be a big deal, and a vast improvement over C, that "^"
is a postfix op; don't push it!
It may be worth looking at other operators which take in AND produce lvalues, most familiarly array indexing and field referencing, and hence they can be incremented. Isn't it true that for both ++ operators of
++points.x[1]
points.x[1]++
that a programmer would normally want points.x[1] incremented, i.e.
field referencing and array indexing would take precedence over either
++ operator?
I'm not sure what you're asking here or where producing lvalues comes in
to it.
Those examples work as expected in my language:
record R =
var x
end
points:=R((10,20,30))
println points # (10, 20, 30)
++points.x[1]
println points # (11, 20, 30)
points.x[1]++
println points # (12, 20, 30)
However, my syntax works a specific way:
* "." is not considered a normal binary operator (because it isn't).
* "[]" is not considered that either (this is more typical)
So `points.x[1]` forms a single expression term. Unary ops like `++`
work on a term. If "." was a normal binary op, then your example would
be parsed as:
(++points).x[1]
unless you make special rules just for ++.
Note, usually ++A and A++ are interchangeable. Behaviour only differs if you try to use the resulting value (the first then returns
new A, the second returns old A).
But now what about dereference? Should it also take precedence over the
++ operators or should it come after one or both? For instance, what should the following mean?
++p^
Should it be
(++p)^
or
++(p^)
Isn't it just up to unary op evaluation? I already said how it's
typically done, so that ++p^ means ++(p^). If it's unclear, then just
use parentheses.
On 09/11/2022 08:12, David Brown wrote:
On 09/11/2022 00:06, Bart wrote:
So what happened to backwards compatibility? All those programs which
call () functions with more arguments will no longer work.
I still don't understand what you mean.
int (*f)(void);
int (*g)(int, int);
void foo(void) {
f(); // Good
g(1, 2); // Good
f(1, 2); // Error
g(); // Error
}
Function pointers of different types - that is, different return
types, different numbers or types of parameters - are incompatible in
C. There are no implicit conversions between them (unlike object
pointers and void*), there is no common base type, and any use of
explicit casts will lead to undefined behaviour if you don't cast back
to the right type before calling the function. Even the cast back and
forth is implementation dependent - an implementation could use
different sizes for different function pointer types.
In practice, of course, most implementations have the same size of
pointer for all function types. On some real-world targets it will be
different from the size of object pointers (imagine a 16-bit system
with "large" code model and "small" data model, or vice versa). But C
still does not allow you to mess with the function pointer types
without explicit "I know what I am doing" casts.
The following code is a legitimate use of () parameter lists:
#include <stdio.h>
int f1(int a) {return a;}
int f2(int a, int b) {return a+b;}
int f3(int a, int b, int c) {return a+b+c;}
int (*fntable[])() = {NULL, f3, f1, f2};
int args[] = {0, 3, 1, 2};
int main(void) {
int n = 3, x = 0;
switch (args[n]) {
case 1: x=fntable[n](10); break;
case 2: x=fntable[n](20,30); break;
case 3: x=fntable[n](40,50,60); break;
}
printf("x=%d\n",x);
}
'fntable' is populated with functions of mixed signatures. When calling
one of those functions, the user-code must ensure the function pointer
is called with the right arguments for that specific function.
That is done here with the switch statement. If the () in this line:
int (*fntable[])() = {NULL, f3, f1, f2};
is assumed to be (void) in C23, then initialising with f1, f2, f3 will
be illegal, and all those calls will be too. This is why I said, what happened to backwards compatibility.
In my language there is no equivalent to C's unchecked () parameter
list. There I would likely use the equivalent of void* pointers and
apply a cast at the point of call. The same could be done in C.
On Tuesday, 8 November 2022 at 16:42:36 UTC, Bart wrote:
++E++ # What is the value of E after?
102
(++E)++
X := ++E++ # What is the value of X? What is the type and value
# of the E++ portion?
X: 101
(E would still end up holding 102)
As a subexpression it would have the type and the intermediate value of E (101), not of E++.
On 2022-11-07 16:06, Bart wrote:
On 07/11/2022 13:43, Dmitry A. Kazakov wrote:
On 2022-11-07 13:52, James Harris wrote:
On 07/11/2022 12:22, Dmitry A. Kazakov wrote:
On 2022-11-07 12:55, James Harris wrote:
++E + E++
++E++
V = V++
++ means cheap keyboard with broken keys or coffee spilled over it... (:-))
On 07/11/2022 16:16, Dmitry A. Kazakov wrote:
++ means cheap keyboard with broken keys or coffee spilled over it...
(:-))
A bit like Ada's --, then. ;-)
On 08/11/2022 14:24, James Harris wrote:
You can't stop everything - those evil programmers have better
imaginations than any well-meaning language designer. But you can try.
Aim to make it harder to write convoluted code, and easier to write clearer code. And try to make the clearer code more efficient, to
reduce the temptation to write evil code.
In a similar way, programming can be hard when other programmers write
constructs we don't like. I agree that it's best for a language to
help programmers write readable and comprehensible programs - and even
to make them the easiest to write, if possible - but the very
flexibility which may allow them to do so may also give them the
freedom to write code we don't care for. I don't think one can
legislate against that.
I'm not sure it is the same - after all, if some one exercises their
rights to speak gibberish, or to give long, convoluted and
incomprehensible speeches, the listener has the right to go away, ignore them, or fall asleep. It's harder for a compiler to do that!
On 09/11/2022 13:53, James Harris wrote:
On Tuesday, 8 November 2022 at 16:42:36 UTC, Bart wrote:
++E++ # What is the value of E after?
102
(++E)++
X := ++E++ # What is the value of X? What is the type and value
# of the E++ portion?
X: 101
(E would still end up holding 102)
As a subexpression it would have the type and the intermediate value
of E (101), not of E++.
This is where your approach differs from mine. It sounds like you would
also allow this:
++ ++ ++ E
so that E becomes 103?
It doesn't then matter whether ++ is prefix or
postfix.
If that isn't the case (yours only works with mixed
prefix/postfix), then I can only explain it like this:
++E is the same as: (E:=E+1; E) # that final E is an lvalue
E++ is the same as: (T:=E; E:=E+1; T)
the final T /is/ an lvalue, but not the right one! You can't use it to
modify E.
That wouldn't work for me anyway because T is a transient value
(typically stored on the stack, register or inaccessible temporary -
it's got to exist somewhere!) with no lvalue. It's similar to this:
A+B is the same as: (T:=A+B; T)
It's clear that ++(A+B) can't work unless you change what ++ means (eg.
++A now means (A+1) because whatever ++ modifies is not accessible).
If that's how your approach works, then it would be unorthogonal:
++(E++) works
(++E)++ doesn't
even though you'd expect E to be 102 in both cases (and to deliver 101
in both cases too).
And:
++ ++ E works
E ++ ++ doesn't
On 2022-11-13 16:54, James Harris wrote:
On 07/11/2022 16:16, Dmitry A. Kazakov wrote:
++ means cheap keyboard with broken keys or coffee spilled over it...
(:-))
A bit like Ada's --, then. ;-)
In Ada, -- introduces a comment; it is not an operator.
An interesting question regarding operator symbols is whether to stick to
ASCII or go Unicode. Let's say we wanted an increment operator (I do
not). Why ++ from the '60s? Take the increment (∆) or the upwards arrow ↑ etc.
Note that all arguments against Unicode apply to operators. If the thing
is difficult to type then it is difficult to remember: precedence level, associativity, semantics. If you can hold these in your head, you could remember the key combination as well. If you do not, then, maybe, having
a subprogram Increment() would be better choice?
On 07/11/2022 20:24, James Harris wrote:
On 07/11/2022 14:58, David Brown wrote:
On 07/11/2022 12:55, James Harris wrote:
So it should be possible to combine multiple ++
operators arbitrarily. For example,
++E + E++
++E++
V = V++
Expressions such as those would have a defined meaning. The actual
meaning is less important than it being well defined and so
something a programmer can rely on.
I disagree entirely
Good. :)
- unless you include giving an error message saying the programmer
should be fired for writing gibberish as "well defined and something
you can rely on". I can appreciate not wanting such things to be
run-time undefined behaviour, but there is no reason at all to insist
that it is acceptable by the compiler.
As I said to Dmitry, if one wants to prohibit the above then one has
to define what exactly is being prohibited and to be careful not
thereby to prohibit something else that may be more legitimate.
Further, such a prohibition is an additional rule the programmer has
to learn.
No one said this was easy! Though Dmitry had some suggestions of rules
to try.
These prohibitions aren't really additional rules for the programmer to
learn - it is primarily about disallowing things that a good programmer
is not going to write in the first place. No one should actually care
if "++E++" is allowed or not, because they should never write it.
Prohibiting it means you don't have to specify the order these operators
are applied, or whether the expression must be evaluated for
side-effects twice, or any of the rest of it. The only people that will have to learn something extra are the sort of programmers who think it
is smart to write line noise.
All in all, ISTM better to define such expressions. The programmer is
not forced to use them but at least if they are present in code and
well defined then their meaning will be plain.
No, the meaning will /not/ be plain. That's the point. Ideally you
should only allow constructs that do exactly what they appear to do,
without the reader having to study the manuals to understand some indecipherable gibberish that is technically legal code but completely
alien to them because no sane programmer would write it.
Take the first one,
++E + E++
It could be defined fairly easily. If operands to + are defined to
appear as though they were evaluated left then right and the ++
operators are set to be of higher precedence and defined to take
effect as soon as they are evaluated then
++E + E++
would evaluate as though the operations were
++E; E++; +
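The left-to-right rule proposed here can be spelled out as a sequence of C statements. This matters because in C itself ++E + E++ is undefined behaviour (E is modified and read with no sequencing between the two operands), so the sketch fixes the order explicitly instead of relying on the compiler. The function name is hypothetical.

```c
/* Sketch of "++E + E++" under left-then-right evaluation, with each
   ++ taking effect as soon as it is evaluated. */
static int left_to_right_sum(int *e)
{
    int lhs, rhs;
    ++*e;         /* ++E: E is incremented first */
    lhs = *e;     /* left operand is the new value */
    rhs = *e;     /* E++: right operand is the value before... */
    (*e)++;       /* ...the increment, which takes effect now */
    return lhs + rhs;   /* + is applied last */
}
```

Starting from E = 100, the operands are 101 and 101, the sum is 202, and E finishes at 102.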
Then define it as "syntax error" and insist the programmer writes it sensibly.
I cannot conceive of a reason to have a pre-increment operator in a
modern language, nor would I want post-increment to return a value (nor
any other kind of assignment). Ban side-effects in expressions -
require a statement. "x = y + 1;" is a statement, so it can affect "x".
"y++;" is a statement - a convenient abbreviation for "y = y + 1;".
"++x" no longer exists, and "x + x++;" makes no sense because it mixes
an expression and a statement.
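A sketch of the statement-only style in C. Since x + x++ has no single obvious intent, the version below assumes the writer meant "add x to itself, then increment x" and says so with one pure expression followed by a separate increment statement. The function name is hypothetical.

```c
/* Side effects live only in statements; the expression is pure. */
static int sum_then_increment(int *x)
{
    int total = *x + *x;   /* pure expression: no side effects */
    *x = *x + 1;           /* the increment is a statement of its own */
    return total;
}
```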
What is the cost? The programmer might have to split things into a few lines - but we have much bigger screens and vastly bigger disks than the
days when C was born. The programmer might need a few extra temporary variables - these are free with modern compiler techniques.
Ask yourself why "++x;" and the like exist in languages like C. The
reason is that early compilers were weak - they were close to dumb translators into assembly, and if you wanted efficient results using the features of the target processor, you needed to write your code in a way
that mimicked the actual processor instructions. "INC A" was faster
than "ADD A, 1", so you write "x++" rather than "x = x + 1". This is no longer the case in the modern world.
I am aware that it might make optimisation harder to achieve but that
would only apply in some cases and is still, IMO, better than simply
saying "that's not defined".
IOW I welcome your disagreement but don't understand it!
I think it is great that you are happy to discuss this, and I will try my best
to explain it.
As you know, I don't like Unicode for program source. It can be hard to
type, hard to read aloud, and hard to compare when two glyphs look the
same but have different encodings. A small set of characters such as
ASCII has none of those problems.
James Harris <james.harris.1@gmail.com> writes:
As you know, I don't like Unicode for program source. It can be hard to
type, hard to read aloud, and hard to compare when two glyphs look the
same but have different encodings. A small set of characters such as
ASCII has none of those problems.
But it can be cumbersome to escape all quotation characters
in a string, as in "\"\\" in C.
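For reference, the C literal mentioned here takes four source characters between the quotes to denote a two-character string: a double quote followed by a backslash.

```c
#include <string.h>

/* Each character needs a backslash escape inside a C string literal. */
static const char quote_backslash[] = "\"\\";
```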
Imagine one would use some obscure Unicode characters as
string delimiters. For example,
ᒪ CANADIAN SYLLABICS MA
ᒧ CANADIAN SYLLABICS MO
. Programmers will surely find a way to map them to their
keyboards somehow. Then that string literal would be just
ᒧ"\ᒪ!
(One would just have to use escapes in the rare case that
one really needs to have those Canadian syllabics ma or mo
within a string literal.)
On 2022-11-08 09:04, James Harris wrote:
I like the simplicity of the language which would result from your
suggestions but can't help but think they would make programming in it
less comfortable, like the simplicity of a hair shirt. ;)
If you think programmers are dying to write stuff like *p++=*q++; you
are wrong. Actually that train left the station. Today C++ fun is
templates. It is monstrous instantiations over instantiations barely resembling program code. Modern times is a glorious combination of
Python performance with K&R C readability! (:-))
On 07/11/2022 11:55, James Harris wrote:
For unary operators, the evaluation order is rather peculiar yet seems
to be used in quite a few languages without anyone questioning it. So if
`a b c d` are unary operators, then the following:
a b E c d
is evaluated like this:
a (b ((E c) d))
That is, first all the postfix operators in left-to-right order, then
all the prefix ones in right-left order. It sounds bizarre when put like that!
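C follows the same convention for its own unary operators: postfix operators (++, --, [], function call) bind more tightly than prefix ones, so *p++ parses as *(p++) and -a[i] as -(a[i]). The helper names below are hypothetical; the parenthesised forms in the comments show how the compiler groups the expressions.

```c
/* Postfix binds first: *(*pp)++ is *((*pp)++), i.e. read the element,
   then advance the caller's pointer. */
static int deref_then_advance(const int **pp)
{
    return *(*pp)++;
}

/* -a[i] is -(a[i]): index first, then negate. */
static int negate_element(const int *a, int i)
{
    return -a[i];
}
```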
Taking a step back and considering general expression evaluation I
have, so far, been defining the apparent order. And I'd like to
continue with that. So it should be possible to combine multiple ++
operators arbitrarily. For example,
++E + E++
This is well defined, as unary operators bind more tightly than binary
ones. This is just (++E) + (E++).
However the evaluation order for '+' is not usually well-defined, so you don't know which operand will be done first.
++E++
This may not work, or not work as expected. The binding using my above
scheme means this is equivalent to ++(E++).
But the second ++ requires
an lvalue as input, and yields an rvalue, which would be an invalid
input to the first ++.
At any rate, that distinction between prefix and postfix ++ seems to
be recognised at the following link where it says "Prefix versions of
the built-in operators return references and postfix versions return
values."
https://en.cppreference.com/w/cpp/language/operator_incdec
I tried to get ++E++ to work using a suitable type for E, but in my
language it cannot work, as the first ++ still needs an lvalue; just an rvalue which has a pointer type won't cut it.
However ++E++^ can work, where ^ is deref, and E is a pointer.
I think this is because in my language, for something to be a valid
lvalue, you need to be able to apply & address-of to it. The result of
E++ doesn't have an address. But (E++)^ works because & and ^ cancel
out. Or something...
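With a prefix dereference, C's nearest equivalent of ++E++^ is ++*p++, which parses as ++(*(p++)). As with the description above, the result of p++ is an rvalue, but dereferencing it yields an lvalue again, so the outer ++ has something to modify. A minimal sketch, with a hypothetical function name:

```c
/* ++*p++ increments the element p points at and advances p. */
static int bump_and_advance(int **pp)
{
    int *p = *pp;
    int v = ++*p++;   /* parsed as ++(*(p++)) */
    *pp = p;          /* hand the advanced pointer back to the caller */
    return v;
}
```

This is well defined in C because p and the element it points to are different objects.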
Setting that aside ... and going back to the query, what should
be the relative precedences of the three operators? For example, how
should the following be evaluated?
++E++^
++E^++
Or should some ways of combining ^ with either or both of the ++
operators be prohibited because they make code too difficult to
understand?!!
You have the same issues in C, but that's OK because people are so
familiar with it. Also * deref is a prefix operator so you never have
two distinct postfix operators, unless you write E++ --.
But yes, parentheses are recommended when mixing certain prefix/postfix
ops. I think this one is clear enough however:
-E^
Dereference E then negate the result. As is this: -E[i]; you wouldn't
assume that meant (-E)[i].
On 2022-11-13 20:18, James Harris wrote:
On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
On 2022-11-08 09:04, James Harris wrote:
I like the simplicity of the language which would result from your
suggestions but can't help but think they would make programming in
it less comfortable, like the simplicity of a hair shirt. ;)
If you think programmers are dying to write stuff like *p++=*q++; you
are wrong. Actually that train left the station. Today C++ fun is
templates. It is monstrous instantiations over instantiations barely
resembling program code. Modern times is a glorious combination of
Python performance with K&R C readability! (:-))
Contrast
*p := *q
p := p + 1
q := q + 1
Perhaps
*p++ := *q++
expresses that part of the algorithm in a way which is more natural
and easier to read...?
No algorithm requires you to resort to pointers. With arrays it is plain assignment:
p := q;
which BTW could be performed in parallel or by a single instruction on a
CISC machine or with a bunch of optimizations.
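The three styles in this exchange can be put side by side in C: the explicit three-step loop, the *p++ = *q++ idiom, and a whole-array copy, which is the closest C gets to "p := q". The function names are hypothetical, and the buffers are assumed not to overlap (memcpy requires that).

```c
#include <stddef.h>
#include <string.h>

/* *p := *q; p := p + 1; q := q + 1, repeated n times */
static void copy_explicit(int *p, const int *q, size_t n)
{
    while (n-- > 0) {
        *p = *q;
        p = p + 1;
        q = q + 1;
    }
}

/* The same steps folded into one expression */
static void copy_idiom(int *p, const int *q, size_t n)
{
    while (n-- > 0)
        *p++ = *q++;
}

/* Whole-array copy: no visible pointer arithmetic at all */
static void copy_whole(int *p, const int *q, size_t n)
{
    memcpy(p, q, n * sizeof *p);
}
```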
On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
On 2022-11-13 20:18, James Harris wrote:
On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
On 2022-11-08 09:04, James Harris wrote:
I like the simplicity of the language which would result from your
suggestions but can't help but think they would make programming in
it less comfortable, like the simplicity of a hair shirt. ;)
If you think programmers are dying to write stuff like *p++=*q++;
you are wrong. Actually that train left the station. Today C++ fun
is templates. It is monstrous instantiations over instantiations
barely resembling program code. Modern times is a glorious
combination of Python performance with K&R C readability! (:-))
Contrast
*p := *q
p := p + 1
q := q + 1
Perhaps
*p++ := *q++
expresses that part of the algorithm in a way which is more natural
and easier to read...?
No algorithm requires you to resort to pointers. With arrays it is plain
assignment:
p := q;
You're assuming this is part of a loop. But perhaps other things are happening after each *p++ = *q++: the next source or dest might be
different, or the next transfer might be of a different type and/or size.
This is what makes such lower level operations so useful. A solution in
a higher level but more limiting language might require ingenuity to get around the strictness.
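The point about transfers of differing type and size is the kind of code where a moving pointer earns its keep. Below is a hypothetical serializer sketch: each field has a different width, so no single array assignment replaces the pointer bumps. The little-endian layout and the names put_u8, put_u16le and encode_record are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Write one byte and return the advanced cursor. */
static unsigned char *put_u8(unsigned char *p, uint8_t v)
{
    *p++ = v;
    return p;
}

/* Write a 16-bit value, low byte first, and return the advanced cursor. */
static unsigned char *put_u16le(unsigned char *p, uint16_t v)
{
    *p++ = (unsigned char)(v & 0xFF);
    *p++ = (unsigned char)(v >> 8);
    return p;
}

/* Emit a 1-byte tag followed by a 2-byte length; return bytes written. */
static size_t encode_record(unsigned char *buf, uint8_t tag, uint16_t len)
{
    unsigned char *p = buf;
    p = put_u8(p, tag);
    p = put_u16le(p, len);
    return (size_t)(p - buf);
}
```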
which BTW could be performed in parallel or by a single instruction on
a CISC machine or with a bunch of optimizations.
Well, perhaps this *p++=*q++ is part of the result of such a process,
where the target happens to be C source code. Few languages higher level
than ASM are suited for that job.
But by all means continue to pour scorn on it.
On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
On 2022-11-08 09:04, James Harris wrote:
I like the simplicity of the language which would result from your
suggestions but can't help but think they would make programming in
it less comfortable, like the simplicity of a hair shirt. ;)
If you think programmers are dying to write stuff like *p++=*q++; you
are wrong. Actually that train left the station. Today C++ fun is
templates. It is monstrous instantiations over instantiations barely
resembling program code. Modern times is a glorious combination of
Python performance with K&R C readability! (:-))
Contrast
*p := *q
p := p + 1
q := q + 1
Perhaps
*p++ := *q++
expresses that part of the algorithm in a way which is more natural and easier to read...?
On 2022-11-13 22:06, James Harris wrote:
On 13/11/2022 20:49, Bart wrote:
On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
On 2022-11-13 20:18, James Harris wrote:
On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
On 2022-11-08 09:04, James Harris wrote:
I like the simplicity of the language which would result from
your suggestions but can't help but think they would make
programming in it less comfortable, like the simplicity of a hair
shirt. ;)
If you think programmers are dying to write stuff like *p++=*q++;
you are wrong. Actually that train left the station. Today C++ fun
is templates. It is monstrous instantiations over instantiations
barely resembling program code. Modern times is a glorious
combination of Python performance with K&R C readability! (:-))
Contrast
*p := *q
p := p + 1
q := q + 1
Perhaps
*p++ := *q++
expresses that part of the algorithm in a way which is more natural
and easier to read...?
No algorithm requires you to resort to pointers.
This doesn't have to be about pointers, Dmitry. One could just as
easily have
a[i++] := b[j++]
Same question. What for?
Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square root,
post exponentiation, post factorial when humankind needs them so badly?
You remind me of a salesman selling a toaster equipped with a toilet
brush. Know what? I do not need this combination...
On 13/11/2022 20:49, Bart wrote:
On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
On 2022-11-13 20:18, James Harris wrote:
On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
On 2022-11-08 09:04, James Harris wrote:
I like the simplicity of the language which would result from your
suggestions but can't help but think they would make programming
in it less comfortable, like the simplicity of a hair shirt. ;)
If you think programmers are dying to write stuff like *p++=*q++;
you are wrong. Actually that train left the station. Today C++ fun
is templates. It is monstrous instantiations over instantiations
barely resembling program code. Modern times is a glorious
combination of Python performance with K&R C readability! (:-))
Contrast
*p := *q
p := p + 1
q := q + 1
Perhaps
*p++ := *q++
expresses that part of the algorithm in a way which is more natural
and easier to read...?
No algorithm requires you to resort to pointers.
This doesn't have to be about pointers, Dmitry. One could just as easily
have
a[i++] := b[j++]
On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
On 2022-11-13 22:06, James Harris wrote:
On 13/11/2022 20:49, Bart wrote:
On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
On 2022-11-13 20:18, James Harris wrote:
On 08/11/2022 08:23, Dmitry A. Kazakov wrote:
On 2022-11-08 09:04, James Harris wrote:
I like the simplicity of the language which would result from
your suggestions but can't help but think they would make
programming in it less comfortable, like the simplicity of a
hair shirt. ;)
If you think programmers are dying to write stuff like *p++=*q++;
you are wrong. Actually that train left the station. Today C++
fun is templates. It is monstrous instantiations over
instantiations barely resembling program code. Modern times is a
glorious combination of Python performance with K&R C
readability! (:-))
Contrast
*p := *q
p := p + 1
q := q + 1
Perhaps
*p++ := *q++
expresses that part of the algorithm in a way which is more
natural and easier to read...?
No algorithm requires you to resort to pointers.
This doesn't have to be about pointers, Dmitry. One could just as
easily have
a[i++] := b[j++]
Same question. What for?
What do you mean, "What for?"?
Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
root, post exponentiation, post factorial when humankind needs them
so badly?
I cannot tell what point you are trying to make.
On 08/11/2022 16:29, David Brown wrote:
On 08/11/2022 14:24, James Harris wrote:
...
You can't stop everything - those evil programmers have better
imaginations than any well-meaning language designer. But you can
try. Aim to make it harder to write convoluted code, and easier to
write clearer code. And try to make the clearer code more efficient,
to reduce the temptation to write evil code.
I agree
...
In a similar way, programming can be hard when other programmers
write constructs we don't like. I agree that it's best for a language
to help programmers write readable and comprehensible programs - and
even to make them the easiest to write, if possible - but the very
flexibility which may allow them to do so may also give them the
freedom to write code we don't care for. I don't think one can
legislate against that.
I'm not sure it is the same - after all, if someone exercises their
right to speak gibberish, or to give long, convoluted and
incomprehensible speeches, the listener has the right to go away,
ignore them, or fall asleep. It's harder for a compiler to do that!
I'm not worried about the compiler. As long as it can make sense of a
piece of code then it can compile it. It's the human I am concerned
about, especially the poor old maintenance programmer!
Rather than allowing non-ASCII in source I came up with a scheme of what
you might call 'named characters' extending the backslash idea of C to
allow names instead of single characters after the backslash. It's off
topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).
On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
On 2022-11-13 23:10, James Harris wrote:
On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
On 2022-11-13 22:06, James Harris wrote:
On 13/11/2022 20:49, Bart wrote:
On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
On 2022-11-13 20:18, James Harris wrote:
...
Contrast
*p := *q
p := p + 1
q := q + 1
Perhaps
*p++ := *q++
expresses that part of the algorithm in a way which is more
natural and easier to read...?
No algorithm requires you to resort to pointers.
This doesn't have to be about pointers, Dmitry. One could just as
easily have
a[i++] := b[j++]
Same question. What for?
What do you mean, "What for?"?
Show me the algorithm.
There's no particular algorithm; the construct is a potential component
of many algorithms.
It's as though someone points out a brick and
someone else says "what does it mean?"
Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
root, post exponentiation, post factorial when humankind needs
them so badly?
I cannot tell what point you are trying to make.
There exist an infinite number of combinations that could be
operators. E.g. divide by two and reboot the computer.
OK. ATM I have a generous **but limited** number of operators.
Again, IMO it's important for the language to provide such
pseudofunctions so that a programmer's code can be made clearer,
simpler, and more readable.
On 13/11/2022 18:11, James Harris wrote:
On 08/11/2022 16:29, David Brown wrote:
...
You can't stop everything - those evil programmers have better
imaginations than any well-meaning language designer. But you can
try. Aim to make it harder to write convoluted code, and easier to
write clearer code. And try to make the clearer code more efficient,
to reduce the temptation to write evil code.
I agree
I'm not worried about the compiler. As long as it can make sense of a
piece of code then it can compile it. It's the human I am concerned
about, especially the poor old maintenance programmer!
You /say/ that, but you don't appear to believe it or be interested in
making it happen.
On the one side, you claim you want a clear language that is
understandable for programmers and maintenance.
On the other side, you
want to decide what "++E++" should mean, with random "^" characters
thrown in for good measure.
These two statements go together as well as Dmitry's toaster and toilet brush. It doesn't matter how precisely you define how the combination
can be used and what it does, it is still not a good or useful thing.
On 2022-11-14 11:16, James Harris wrote:
a[i++] := b[j++]
Same question. What for?
What do you mean, "What for?"?
Show me the algorithm.
There's no particular algorithm; the construct is a potential
component of many algorithms.
Show me one that is not array assignment.
On 14/11/2022 09:26, David Brown wrote:
On 13/11/2022 18:11, James Harris wrote:
On 08/11/2022 16:29, David Brown wrote:
...
You can't stop everything - those evil programmers have better
imaginations than any well-meaning language designer. But you can
try. Aim to make it harder to write convoluted code, and easier to
write clearer code. And try to make the clearer code more
efficient, to reduce the temptation to write evil code.
I agree
...
I'm not worried about the compiler. As long as it can make sense of a
piece of code then it can compile it. It's the human I am concerned
about, especially the poor old maintenance programmer!
You /say/ that, but you don't appear to believe it or be interested in
making it happen.
That comment surprises me a little. /The main point/ of this, AISI, is
to make the job of the programmer simpler and to help him write code
which is more readable. You said yourself that (paraphrasing) when
there's a choice between clear and convoluted code it's important for a language to make the clearer code the easier one to write.
On the one side, you claim you want a clear language that is
understandable for programmers and maintenance.
Yes.
On the other side, you want to decide what "++E++" should mean, with
random "^" characters thrown in for good measure.
Not quite. As language designer I have to decide what facilities will be provided but I do not have absolute control over what a programmer may
do with the facilities. Nor would a programmer want to work with a
language which implemented unnecessary rules or rules which he may see
as arbitrary.
The expression you mention is just one of a myriad of what you might
consider to be potential nasties. If I am going to prohibit that one
then what about all the others?
These two statements go together as well as Dmitry's toaster and
toilet brush. It doesn't matter how precisely you define how the
combination can be used and what it does, it is still not a good or
useful thing.
OK, let's take the combination you mentioned:
++E++
I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in the
order that it says it. Remember that unlike C I define the apparent
order of evaluation so the expression is perfectly well formed.
On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
On 2022-11-14 11:16, James Harris wrote:
a[i++] := b[j++]
Same question. What for?
What do you mean, "What for?"?
Show me the algorithm.
There's no particular algorithm; the construct is a potential
component of many algorithms.
Show me one that is not array assignment.
What exactly is your objection: that there shouldn't be an increment
operator at all?
Note that any language that has reference parameters would allow this:
a[postincr(i)] := b[postincr(j)]
Or is that something else you're not keen on?
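C has no reference parameters, but a pointer gives the same effect, so the postincr example can be sketched as below. postincr and copy_next are hypothetical names. The two calls in one assignment are indeterminately sequenced in C, which is harmless here only because they update different variables.

```c
/* Post-increment as an ordinary function: return the old value,
   store the incremented one through the pointer. */
static int postincr(int *i)
{
    int old = *i;
    *i = old + 1;
    return old;
}

/* a[postincr(i)] := b[postincr(j)] */
static void copy_next(char *a, int *i, const char *b, int *j)
{
    a[postincr(i)] = b[postincr(j)];
}
```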
On 2022-11-14 11:16, James Harris wrote:
On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
On 2022-11-13 23:10, James Harris wrote:
On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
On 2022-11-13 22:06, James Harris wrote:
On 13/11/2022 20:49, Bart wrote:
On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
On 2022-11-13 20:18, James Harris wrote:
...
Contrast
*p := *q
p := p + 1
q := q + 1
Perhaps
*p++ := *q++
expresses that part of the algorithm in a way which is more
natural and easier to read...?
No algorithm requires you to resort to pointers.
This doesn't have to be about pointers, Dmitry. One could just as
easily have
a[i++] := b[j++]
Same question. What for?
What do you mean, "What for?"?
Show me the algorithm.
There's no particular algorithm; the construct is a potential
component of many algorithms.
Show me one that is not array assignment.
It's as though someone points out a brick and someone else says "what
does it mean?"
No, it is like building up a brick factory in your back garden for the purpose of cracking walnuts...
Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
root, post exponentiation, post factorial when humankind needs
them so badly?
I cannot tell what point you are trying to make.
There exist an infinite number of combinations that could be
operators. E.g. divide by two and reboot the computer.
OK. ATM I have a generous **but limited** number of operators.
Limited by which criteria?
Again, IMO it's important for the language to provide such
pseudofunctions so that a programmer's code can be made clearer,
simpler, and more readable.
Like this one?
++p+++
On 2022-11-13 23:10, James Harris wrote:
On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
On 2022-11-13 22:06, James Harris wrote:
On 13/11/2022 20:49, Bart wrote:
On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
On 2022-11-13 20:18, James Harris wrote:
Contrast
*p := *q
p := p + 1
q := q + 1
Perhaps
*p++ := *q++
expresses that part of the algorithm in a way which is more
natural and easier to read...?
No algorithm requires you to resort to pointers.
This doesn't have to be about pointers, Dmitry. One could just as
easily have
a[i++] := b[j++]
Same question. What for?
What do you mean, "What for?"?
Show me the algorithm.
Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
root, post exponentiation, post factorial when humankind needs
them so badly?
I cannot tell what point you are trying to make.
There exist an infinite number of combinations that could be operators.
E.g. divide by two and reboot the computer.
On 10/11/2022 10:42, Bart wrote:
It's clear that ++(A+B) can't work unless you change what ++ means
(eg. ++A now means (A+1) because whatever ++ modifies is not accessible).
It's similar if A were a struct. You could have
A.F
but you could not have
(A + 4).F
The LHS of the . operation, in this case, has to be an lvalue.
On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
On 2022-11-14 11:16, James Harris wrote:
On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
On 2022-11-13 23:10, James Harris wrote:
On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
On 2022-11-13 22:06, James Harris wrote:
On 13/11/2022 20:49, Bart wrote:
On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
On 2022-11-13 20:18, James Harris wrote:
...
Contrast
*p := *q
p := p + 1
q := q + 1
Perhaps
*p++ := *q++
expresses that part of the algorithm in a way which is more
natural and easier to read...?
No algorithm requires you to resort to pointers.
This doesn't have to be about pointers, Dmitry. One could just as
easily have
a[i++] := b[j++]
Same question. What for?
What do you mean, "What for?"?
Show me the algorithm.
There's no particular algorithm; the construct is a potential
component of many algorithms.
Show me one that is not array assignment.
if is_name_first(b[j])
a[i++] = b[j++]
rep while is_name_follow(b[j])
a[i++] = b[j++]
end rep
a[i] = 0
return TOK_NAME
end if
Now, what don't you like about the ++ operators in that? How would you
prefer to write it?
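For comparison, here is the same name-scanning fragment in C. is_name_first, is_name_follow, TOK_NAME and TOK_NONE are assumptions: a name is taken to start with a letter or underscore and continue with letters, digits or underscores. The destination buffer is assumed large enough, and the source is assumed NUL-terminated so the loop stops at the end of the input.

```c
#include <ctype.h>

enum { TOK_NONE = 0, TOK_NAME = 1 };   /* hypothetical token codes */

static int is_name_first(char c)
{
    return isalpha((unsigned char)c) || c == '_';
}

static int is_name_follow(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copy a name from b (starting at *pj) into a, NUL-terminate it,
   and advance *pj past it. */
static int scan_name(char *a, const char *b, int *pj)
{
    int i = 0, j = *pj;
    if (is_name_first(b[j])) {
        a[i++] = b[j++];
        while (is_name_follow(b[j]))
            a[i++] = b[j++];
        a[i] = '\0';
        *pj = j;
        return TOK_NAME;
    }
    return TOK_NONE;
}
```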
It's as though someone points out a brick and someone else says "what
does it mean?"
No, it is like building up a brick factory in your back garden for the
purpose of cracking walnuts...
Says the man who likes Ada! ;-)
Why not a[i expexp(Pi) sqrtsqrt] := b[j!!]. Where are post square
root, post exponentiation, post factorial when humankind needs
them so badly?
I cannot tell what point you are trying to make.
There exist an infinite number of combinations that could be
operators. E.g. divide by two and reboot the computer.
OK. ATM I have a generous **but limited** number of operators.
Limited by which criteria?
The set of operators is limited to what's reasonably necessary such as
the usual stuff: function calls, array references, field selection,
bitwise operations, arithmetic, comparison, boolean and assignment. Most
are present in C; only a few are not, such as bitwise combinations (e.g.
nand and nor) and these two: concatenate and boolean (aka logical) xor. What's so bad about that?
Again, IMO it's important for the language to provide such
pseudofunctions so that a programmer's code can be made clearer,
simpler, and more readable.
Like this one?
++p+++
I don't have a +++ operator so I am not sure what that is supposed to
mean. It's not valid in my language.
On 2022-11-14 12:03, James Harris wrote:
On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
On 2022-11-14 11:16, James Harris wrote:
On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
On 2022-11-13 23:10, James Harris wrote:
On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
On 2022-11-13 22:06, James Harris wrote:
On 13/11/2022 20:49, Bart wrote:
On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
On 2022-11-13 20:18, James Harris wrote:
...
Contrast
*p := *q
p := p + 1
q := q + 1
Perhaps
*p++ := *q++
expresses that part of the algorithm in a way which is more natural and easier to read...?
No algorithm requires you resort to pointers.
This doesn't have to be about pointers, Dmitry. One could just as easily have
a[i++] := b[j++]
Same question. What for?
What do you mean, "What for?"?
Show me the algorithm.
There's no particular algorithm; the construct is a potential
component of many algorithms.
Show me one that is not array assignment.
if is_name_first(b[j])
a[i++] = b[j++]
rep while is_name_follow(b[j])
a[i++] = b[j++]
end rep
a[i] = 0
return TOK_NAME
end if
Now, what don't you like about the ++ operators in that? How would you
prefer to write it?
From parser production code:
procedure Get_Identifier
( Code : in out Source'Class;
Line : String;
Pointer : Integer;
Argument : out Tokens.Argument_Token
) is
Index : Integer := Pointer + 1;
Malformed : Boolean := False;
Underline : Boolean := False;
Symbol : Character;
begin
while Index <= Line'Last loop
Symbol := Line (Index);
if Is_Alphanumeric (Symbol) then
Underline := False;
elsif '_' = Symbol then
Malformed := Malformed or Underline;
Underline := True;
else
exit;
end if;
Index := Index + 1;
end loop;
Malformed := Malformed or Underline;
Set_Pointer (Code, Index);
Argument.Location := Link (Code);
Argument.Value := new Identifier (Index - Pointer);
declare
This : Identifier renames Identifier (Argument.Value.all);
begin
This.Location := Argument.Location;
This.Malformed := Malformed;
This.Value := Line (Pointer..Index - 1);
end;
end Get_Identifier;
I don't have a +++ operator so I am not sure what that is supposed to
mean. It's not valid in my language.
It could be a part of the even simpler and brilliantly readable:
++p+++q+++r
On 14/11/2022 11:29, Dmitry A. Kazakov wrote:
On 2022-11-14 12:03, James Harris wrote:
On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
On 2022-11-14 11:16, James Harris wrote:
On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
On 2022-11-13 23:10, James Harris wrote:
On 13/11/2022 21:29, Dmitry A. Kazakov wrote:
On 2022-11-13 22:06, James Harris wrote:
On 13/11/2022 20:49, Bart wrote:
On 13/11/2022 20:19, Dmitry A. Kazakov wrote:
On 2022-11-13 20:18, James Harris wrote:
...
Contrast
*p := *q
p := p + 1
q := q + 1
Perhaps
*p++ := *q++
expresses that part of the algorithm in a way which is more natural and easier to read...?
No algorithm requires you resort to pointers.
This doesn't have to be about pointers, Dmitry. One could just as easily have
a[i++] := b[j++]
Same question. What for?
What do you mean, "What for?"?
Show me the algorithm.
There's no particular algorithm; the construct is a potential
component of many algorithms.
Show me one that is not array assignment.
if is_name_first(b[j])
a[i++] = b[j++]
rep while is_name_follow(b[j])
a[i++] = b[j++]
end rep
a[i] = 0
return TOK_NAME
end if
Now, what don't you like about the ++ operators in that? How would
you prefer to write it?
From parser production code:
procedure Get_Identifier
( Code : in out Source'Class;
Line : String;
Pointer : Integer;
Argument : out Tokens.Argument_Token
) is
Index : Integer := Pointer + 1;
Malformed : Boolean := False;
Underline : Boolean := False;
Symbol : Character;
begin
while Index <= Line'Last loop
Symbol := Line (Index);
if Is_Alphanumeric (Symbol) then
Underline := False;
elsif '_' = Symbol then
Malformed := Malformed or Underline;
Underline := True;
else
exit;
end if;
Index := Index + 1;
end loop;
Malformed := Malformed or Underline;
Set_Pointer (Code, Index);
Argument.Location := Link (Code);
Argument.Value := new Identifier (Index - Pointer);
declare
This : Identifier renames Identifier (Argument.Value.all);
begin
This.Location := Argument.Location;
This.Malformed := Malformed;
This.Value := Line (Pointer..Index - 1);
end;
end Get_Identifier;
Clearly you get paid by the line. Even then, the code where a substring
is copied into another location, which would require the double-stepping
of the relevant pointer/indices of the earlier example, is missing here.
I don't have a +++ operator so I am not sure what that is supposed to
mean. It's not valid in my language.
It could be a part of the even simpler and brilliantly readable:
++p+++q+++r
This is legal in Ada:
a := +b;
But for some reason, you can't write ++b or + +b; it has to be `a := +(+b)`.
So while you can't do +b++++c, you can write:
a:=+b+(+(+(+(c))));
Do the parentheses make this acceptable?
My point is that you can legally combine any number of operators to
result in nonsense-looking code, and the language can do little about it.
On 14/11/2022 09:26, David Brown wrote:
On 13/11/2022 18:11, James Harris wrote:
On 08/11/2022 16:29, David Brown wrote:
...
You can't stop everything - those evil programmers have better
imaginations than any well-meaning language designer. But you can
try. Aim to make it harder to write convoluted code, and easier to write clearer code. And try to make the clearer code more
efficient, to reduce the temptation to write evil code.
I agree
...
I'm not worried about the compiler. As long as it can make sense of a
piece of code then it can compile it. It's the human I am concerned
about, especially the poor old maintenance programmer!
You /say/ that, but you don't appear to believe it or be interested in
making it happen.
That comment surprises me a little.
/The main point/ of this, AISI, is
to make the job of the programmer simpler and to help him write code
which is more readable. You said yourself that (paraphrasing) when
there's a choice between clear and convoluted code it's important for a language to make the clearer code the easier one to write.
On the one side, you claim you want a clear language that is
understandable for programmers and maintenance.
Yes.
On the other side, you want to decide what "++E++" should mean, with
random "^" characters thrown in for good measure.
Not quite. As language designer I have to decide what facilities will be provided but I do not have absolute control over what a programmer may
do with the facilities.
Nor would a programmer want to work with a
language which implemented unnecessary rules or rules which he may see
as arbitrary.
The expression you mention is just one of a myriad of what you might
consider to be potential nasties. If I am going to prohibit that one
then what about all the others?
These two statements go together as well as Dmitry's toaster and
toilet brush. It doesn't matter how precisely you define how the
combination can be used and what it does, it is still not a good or
useful thing.
OK, let's take the combination you mentioned:
++E++
I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in the
order that it says it. Remember that unlike C I define the apparent
order of evaluation so the expression is perfectly well formed.
James Harris <james.harris.1@gmail.com> writes:
Rather than allowing non-ASCII in source I came up with a scheme of what
you might call 'named characters' extending the backslash idea of C to
allow names instead of single characters after the backslash. It's off
topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).
Here's an example of a Python program.
print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )
It prints:
\"
On 2022-11-14 13:45, Bart wrote:
Clearly you get paid by the line. Even then, the code where a
substring is copied into another location, which would require the
double-stepping of the relevant pointer/indices of the earlier
example, is missing here.
It is there:
This.Value := Line (Pointer..Index - 1);
assigning array as a whole.
I don't have a +++ operator so I am not sure what that is supposed
to mean. It's not valid in my language.
It could be a part of the even simpler and brilliantly readable:
++p+++q+++r
This is legal in Ada:
a := +b;
But for some reason, you can't write ++b or + +b; it has to be `a := +(+b)`.
So while you can't do +b++++c, you can write:
a:=+b+(+(+(+(c))));
Do the parentheses make this acceptable?
I don't understand the point. Why would you like to have two unary
pluses in a row?
My point is that you can legally combine any number of operators to
result in nonsense-looking code, and the language can do little about it.
No, the point is that no reasonable code should be nonsense-looking and conversely.
Increments cross that line.
They were perfectly acceptable in K&R C. C
was a very large and quite complex language then. It was beautiful
compared to FORTRAN IV, but I could not use it on a 64K machine. A
5-pass C compiler took an eternity.
Machines then were small and simple.
Programs were tiny. *++p was a reasonably complex piece of code.
On 14/11/2022 09:26, David Brown wrote:
On 13/11/2022 18:11, James Harris wrote:
On 08/11/2022 16:29, David Brown wrote:
I'm not worried about the compiler. As long as it can make sense of a
piece of code then it can compile it. It's the human I am concerned
about, especially the poor old maintenance programmer!
You /say/ that, but you don't appear to believe it or be interested in
making it happen.
On the one side, you claim you want a clear language that is
understandable for programmers and maintenance. On the other side,
you want to decide what "++E++" should mean, with random "^"
characters thrown in for good measure.
In-place, value-returning increment ops written as ++ and -- are common
in languages.
So are pointer-dereference operators in lower-level languages, whether written as * or ^.
Once you have those two possibilities in a language, why shouldn't you
define what combinations of those operators might mean?
(I just differ from James in thinking that successive *value-returning*
++ or -- operators, whether prefix or postfix, are not meaningful. I'd
also think it would be bad form to chain them, but it is not practical
to ban that at the syntax level.
However, I have sometimes banned even `a+b` in some contexts, when the resulting value is unused.)
Is your point that you shouldn't have either of those operators?
++ and
-- can be replaced at some inconvenience. But getting rid of dereference
is harder; if P is a pointer:
print P
will this display the value of the pointer, or the value of its target?
If only there was a way to specify that precisely!
Note that when p and q are byte pointers, then *p++ = *q++ (or p++^ :=
q++^) corresponds to the Z80 LDI instruction, which copies one byte.
So it's something so meaningless that that tiny 8-bit processor decided
to give it its own instruction.
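Here is a rough C model of what LDI does. It assumes a flat 64 KiB memory array, and the Z80 type and ldi function are illustrative names. LDI copies the byte at (HL) to (DE), increments HL and DE, and decrements BC. Of its flag effects only P/V is modelled here, which is set when BC is non-zero afterwards. The other flag effects are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t hl, de, bc;
    int pv;                  /* parity/overflow flag: 1 if BC != 0 after LDI */
    uint8_t mem[65536];
} Z80;

void ldi(Z80 *cpu)
{
    assert(cpu != NULL);
    cpu->mem[cpu->de++] = cpu->mem[cpu->hl++];   /* the *de++ = *hl++ part */
    cpu->bc--;                                   /* byte counter */
    cpu->pv = (cpu->bc != 0);
}
```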
On 13/11/2022 16:55, James Harris wrote:
On 10/11/2022 10:42, Bart wrote:
It's clear that ++(A+B) can't work unless you change what ++ means
(eg. ++A now means (A+1) because whatever ++ modifies is not
accessible).
It's similar if A were a struct. You could have
A.F
but you could not have
(A + 4).F
The LHS of the . operation, in this case, has to be an lvalue.
That's not right. C allows functions to return structs; they are not
lvalues, but you can apply ".":
typedef struct{int x,y;}Point;
Point F(void) {
Point p = {1, 2};
return p;
}
int main(void)
{
Point p;
int a;
// F()=p; // Not valid: not an lvalue
a=F().x; // Valid
}
On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
On 2022-11-14 11:16, James Harris wrote:
a[i++] := b[j++]
Same question. What for?
What do you mean, "What for?"?
Show me the algorithm.
There's no particular algorithm; the construct is a potential
component of many algorithms.
Show me one that is not array assignment.
What exactly is your objection: that there shouldn't be an increment
operator at all?
Then it's end of discussion. But if you allow a value-returning
increment operator, then someone could use it in multiple places in the
same expression, together with other operators, and the language has to
be able to deal with it.
Note that any language that has reference parameters would allow this:
a[postincr(i)] := b[postincr(j)]
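The same point can be shown in plain C, where a pointer parameter plays the role of a reference parameter. postincr and copy_one are illustrative names. C leaves the order of the two calls unspecified, but the calls do not interleave, and because i and j are distinct variables the result is the same either way.

```c
#include <assert.h>
#include <stddef.h>

/* Behaves like i++ on the caller's variable: yields the old value. */
int postincr(int *i)
{
    assert(i != NULL);
    return (*i)++;
}

/* a[postincr(i)] := b[postincr(j)] written as a C function. */
void copy_one(int *a, int *i, const int *b, int *j)
{
    a[postincr(i)] = b[postincr(j)];
}
```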
On 14/11/2022 13:17, Dmitry A. Kazakov wrote:
On 2022-11-14 13:45, Bart wrote:
Clearly you get paid by the line. Even then, the code where a
substring is copied into another location, which would require the
double-stepping of the relevant pointer/indices of the earlier
example, is missing here.
It is there:
This.Value := Line (Pointer..Index - 1);
assigning array as a whole.
OK, but this is then doing it in two passes. The original used only one
pass.
And the code when doing the transfer, whether as a loop or
utilising a machine's block copy features, is not shown here.
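For comparison, here is a C sketch of the find-the-end-then-copy-the-slice approach. copy_name_two_pass is a made-up name. Pass 1 scans for the end of the name, and pass 2 copies it in one block with memcpy, much as the Ada version assigns `Line (Pointer..Index - 1)`. As in that procedure, the caller is assumed to have checked the first character already.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

size_t copy_name_two_pass(char *dst, const char *src)
{
    size_t end = 0;
    assert(dst != NULL && src != NULL);
    /* pass 1: find where the name stops */
    while (isalnum((unsigned char)src[end]) || src[end] == '_')
        end++;
    /* pass 2: copy the whole slice at once */
    memcpy(dst, src, end);
    dst[end] = '\0';
    return end;
}
```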
My point is that you can legally combine any number of operators to
result in nonsense-looking code, and the language can do little about
it.
No, the point is that no reasonable code should be nonsense-looking
and conversely.
Increments cross that line.
But it performs a task that is needed,
They were perfectly acceptable in K&R C. C was a very large and quite
complex language then. It was beautiful compared to FORTRAN IV, but I
could not use it on a 64K machine. A 5-pass C compiler took an eternity.
Did it? I never tried one. I only found out recently that they were that
slow from reading reviews of C compilers of that era in Byte magazine.
On 14/11/2022 11:47, Bart wrote:
Once you have those two possibilities in a language, why shouldn't you
define what combinations of those operators might mean?
If you don't have them, you don't have a problem.
Pointer dereferencing like this is not a requirement for a language. If
you have "proper" arrays (I write it like that because the concept of
"array" can be defined in many ways), multiple return values for
functions, and a way to define data structures such as trees and lists,
where else do you actually need pointers?
Pure functional programming languages don't have pointers, or increment operators - they don't even have assignment. Functional programming languages are usually considered quite high level, but some slightly
impure functional programming languages - such as OCaml - are very
efficient compiled languages that rival C, Pascal, Ada, Fortran, etc.,
for speed. OCaml /does/, AFAIUI (I am no expert in that language) have variables and pointers or references, but they are very rarely seen explicitly, and are intentionally cumbersome to use.
Maybe the OP is designing a language in which pointer dereferencing and increment are expected to turn up so often that it is useful to combine them. But I think it is a lot more likely that this is a mistaken assumption based on limited experience with different kinds of
programming languages. The result will be like your own language - a re-implementation of C or Pascal, with some benefits and some new disadvantages, and nothing of real innovation or interest.
I am trying
to make suggestions to break that pattern.
(I just differ from James in thinking that successive
*value-returning* ++ or -- operators, whether prefix or postfix, are
not meaningful. I'd also think it would be bad form to chain them, but
it is not practical to ban that at the syntax level.
If you think it is "bad form", ban it.
For any language that is going
to be successful in a wider field, not just a plaything for one person,
the man-hour effort in /using/ the language will far outweigh the effort /designing/ or /implementing/ it. Thus it does not matter if a good
design choice is difficult to implement, as it will save effort in the
long run.
On 14/11/2022 10:44, James Harris wrote:
On 14/11/2022 09:26, David Brown wrote:
OK, let's take the combination you mentioned:
++E++
I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in
the order that it says it. Remember that unlike C I define the
apparent order of evaluation so the expression is perfectly well formed.
But does this have the same priorities as:
op1 E op2
(where op2 is commonly done first) or does it have special rules, so
that in:
--E++
the -- is done first? If it's different, then what is the ordering when
mixed with other unary ops?
You explained somewhere the circumstances where you think this is
meaningful, but I can't remember what the rules are and I can't find the exact post.
This is the problem. You shouldn't need to stop and think. I make the
rules simple by stipulating that value-returning ++ and -- only ever
return rvalues.
Because if they ever start to return lvalues, then this becomes possible:
++E := 0
E++ := 0
(Whichever one is legal in your scheme.) So I think there is little
useful expressivity to be gained.
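On the ordering question for --E++, C settles it by precedence. Postfix operators bind tighter than prefix ones, so --E++ parses as --(E++) and is rejected because E++ is not an lvalue. The same rule gives `*p++` its meaning. The sketch below contrasts the three common pointer forms, and the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* *p++ parses as *(p++): read through the old pointer, then advance it. */
int fetch_and_advance(const int **pp)
{
    assert(pp != NULL && *pp != NULL);
    return *(*pp)++;
}

/* ++*p parses as ++(*p): increment the target, yield the new value. */
int bump_target(int *p)
{
    assert(p != NULL);
    return ++*p;
}

/* (*p)++ needs the parentheses: it increments the target and yields the
   old value, while p itself does not move. */
int bump_target_post(int *p)
{
    assert(p != NULL);
    return (*p)++;
}
```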
James Harris <james.harris.1@gmail.com> writes:
but I prefer shorter names such as
print "\bksl/\q11/"
I see. You could use Unicode names as a fallback for those
characters for which you have not defined a name yet.
The raw string literals in Python have a strange irregularity:
One can use a backslash with no escape, except at the very end
of the string, so one can write
r"\abc" to get \abc, but one can not write
r"abc\" to get abc\. Transcript:
|>>> print( r"\abc" )
|\abc
|>>> print( r"abc\" )
|  File "<stdin>", line 1
|    print( r"abc\" )
|                   ^
|SyntaxError: EOL while scanning string literal
I tried to design an escape mechanism for string literals in my
language "Unotal" that has no irregularities.
I already described this kind of string literals here on 2021-09-29,
but in the meantime I have actually written an implementation.
I also have written a tiny demo implementation in Python which
can be used to experiment with the notation (see below).
Here's a short summary of my notation:
- a string literal is written using brackets, as in [abc],
which means the string "abc" (3 characters: a, b, and c).
- nested brackets are allowed: [abc[def]ghi] is "abc[def]ghi"
(11 characters).
- a single left bracket is written as "[`]". This is
admittedly ugly, but it is very rare in most kinds of texts,
so that conflicts with texts containing a literal "[`]"
should be very rare.
- a single right bracket is written as "[]`" for similar
reasons.
I tried to make sure that no other rules are needed and that
every text can be encoded this way.
Here's a small Python program with a tiny scanner.
source code
def scan( source ):
    return source[ 1: -1 ].replace( '[`]', '[' ).replace( '[]`', ']' )
def demo( source ):
    print( f"{source:14}", scan( source ))
print( f"{'literal':14}", 'meaning' )
demo( '[def]' )
demo( '[de[]f]' )
demo( '[de[`]f]' )
demo( '[[de[`]]f]' )
demo( '[de[]`f]' )
demo( '[de[`]`[]`f]' )
demo( '[de[`][]``f]' )
demo( '[`]]' )
demo( '[[]`]' )
output
literal meaning
[def] def
[de[]f] de[]f
[de[`]f] de[f
[[de[`]]f] [de[]f
[de[]`f] de]f
[de[`]`[]`f] de[`]f
[de[`][]``f] de[]`f
[`]] `]
[[]`] ]
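The Python scanner applies two str.replace calls one after the other. A single left-to-right pass is another possible reading of the rules, and the C sketch below does that. scan_literal is a made-up name. It agrees with all nine demo lines above, but the two approaches can disagree. For the literal [[`]]`], the Python version yields ] while this one yields []`. The post does not say which result is intended.

```c
#include <assert.h>
#include <string.h>

/* Decode a bracketed literal such as [de[`]f] into out (which must be at
   least as long as src). Inside the outer brackets, "[`]" stands for '['
   and "[]`" for ']'; anything else is copied as is.
   Returns 0 on success, -1 if src is not wrapped in [ ... ]. */
int scan_literal(const char *src, char *out)
{
    size_t n, i, k = 0;
    assert(src != NULL && out != NULL);
    n = strlen(src);
    if (n < 2 || src[0] != '[' || src[n - 1] != ']')
        return -1;
    for (i = 1; i < n - 1; ) {
        /* an escape must lie entirely inside the outer brackets */
        if (i + 3 <= n - 1 && strncmp(src + i, "[`]", 3) == 0) {
            out[k++] = '[';
            i += 3;
        } else if (i + 3 <= n - 1 && strncmp(src + i, "[]`", 3) == 0) {
            out[k++] = ']';
            i += 3;
        } else {
            out[k++] = src[i++];
        }
    }
    out[k] = '\0';
    return 0;
}
```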
On 14/11/2022 11:44, James Harris wrote:
On 14/11/2022 09:26, David Brown wrote:
On 13/11/2022 18:11, James Harris wrote:
On 08/11/2022 16:29, David Brown wrote:
The expression you mention is just one of a myriad of what you might
consider to be potential nasties. If I am going to prohibit that one
then what about all the others?
Prohibit nasty ones.
A big step in that direction is to say that assignment is a statement,
not an expression,
and that variables cannot be changed by side-effects.
(How you relate this to function calls is a related and complex issue
that I have been glossing over here. An idea would be to distinguish between "procedures" that may have side effects, and "functions" that do not.)
That means there is no such thing as an "increment" operator - post or pre.
It also /hugely/ simplifies the language - both for the programmer, and
for the implementer. If expressions have no side-effects, they can be duplicated, split up, re-arranged, moved around in code, all without
affecting the behaviour of the program.
These two statements go together as well as Dmitry's toaster and
toilet brush. It doesn't matter how precisely you define how the
combination can be used and what it does, it is still not a good or
useful thing.
OK, let's take the combination you mentioned:
++E++
I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in
the order that it says it. Remember that unlike C I define the
apparent order of evaluation so the expression is perfectly well formed.
The very fact that you are discussing how to define it means it is not
clear and obvious.
It is not obvious which order the increments happen,
or if the order is defined, or if the order matters. It is not obvious
what the return value should be. It is not obvious where you have
lvalues or rvalues (not that a language should necessarily have such concepts). It is not obvious what happens to E.
There's a very tempting myth in language design that /defining/
behaviour is key - that gibberish and incorrect code can somehow be made "correct" by defining its behaviour. You are not alone in this - lots
of languages try to achieve "no undefined behaviour" by defining the behaviour of everything instead of banning things that have no correct behaviour.
On 14/11/2022 10:24, Stefan Ram wrote:
James Harris <james.harris.1@gmail.com> writes:
Rather than allowing non-ASCII in source I came up with a scheme of what you might call 'named characters' extending the backslash idea of C to
allow names instead of single characters after the backslash. It's off
topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).
Here's an example of a Python program.
print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )
It prints:
\"
I see they are Unicode names. If I were to support such names I would
have your example as something like
print "\U:reverse solidus/\U:quotation mark/"
but I prefer shorter names such as
print "\bksl/\q11/"
At least with Unicode someone has already defined a name for every
character, but Unicode includes a lot of nonsense such as a character
called
On 14/11/2022 16:14, James Harris wrote:
On 14/11/2022 10:24, Stefan Ram wrote:
James Harris <james.harris.1@gmail.com> writes:
Rather than allowing non-ASCII in source I came up with a scheme of what
you might call 'named characters' extending the backslash idea of C to
allow names instead of single characters after the backslash. It's off
topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).
Here's an example of a Python program.
print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )
It prints:
\"
I see they are Unicode names. If I were to support such names I would
have your example as something like
print "\U:reverse solidus/\U:quotation mark/"
but I prefer shorter names such as
print "\bksl/\q11/"
At least with Unicode someone has already defined a name for every
character, but Unicode includes a lot of nonsense such as a character
called
How about using the HTML character entity names? That would be "&bsol;&quot;". These are a good deal shorter than Unicode names but
are vastly better than inventing your own names.
<https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references>
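Here is a sketch of table-driven \name/ escapes in C. The table is hypothetical. bksl and q11 are taken from James's example, with q11 read as the quotation mark because that is how his example lines up with the Python one. bsol and quot are the HTML entity names for the same two characters. expand_named is a made-up name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical name table: two names from the post, two HTML entity names. */
static const struct { const char *name; char ch; } names[] = {
    { "bksl", '\\' }, { "q11", '"' },
    { "bsol", '\\' }, { "quot", '"' },
};

/* Expand escapes of the form \name/ in src into out.
   Returns 0 on success, -1 for an unknown or unterminated name. */
int expand_named(const char *src, char *out)
{
    assert(src != NULL && out != NULL);
    while (*src != '\0') {
        if (*src == '\\') {
            const char *end = strchr(src + 1, '/');
            size_t len, k;
            int found = 0;
            if (end == NULL)
                return -1;
            len = (size_t)(end - (src + 1));
            for (k = 0; k < sizeof names / sizeof names[0]; k++) {
                if (strlen(names[k].name) == len &&
                    strncmp(names[k].name, src + 1, len) == 0) {
                    *out++ = names[k].ch;
                    found = 1;
                    break;
                }
            }
            if (!found)
                return -1;
            src = end + 1;       /* continue after the closing '/' */
        } else {
            *out++ = *src++;
        }
    }
    *out = '\0';
    return 0;
}
```

With this table, the source text `\bksl/\q11/` expands to the two characters `\"`.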
On 14/11/2022 15:23, David Brown wrote:
On 14/11/2022 11:47, Bart wrote:
Once you have those two possibilities in a language, why shouldn't
you define what combinations of those operators might mean?
If you don't have them, you don't have a problem.
Pointer dereferencing like this is not a requirement for a language.
If you have "proper" arrays (I write it like that because the concept
of "array" can be defined in many ways), multiple return values for
functions, and a way to define data structures such as trees and
lists, where else do you actually need pointers?
I have a dynamic language with proper, first class lists, trees,
strings, records, which take care of 90% of the pointer uses in a
C-class language. Yet they can still be useful. This is an extract from
a program that dumps the contents of an EXE file:
coffptr:=makeref(pedata+coffoffset,imagefileheader)
genstrln("Coff header: "+tostr(coffptr^))
genstrln("Machine: "+tostr(coffptr^.machine,"h2"))
genstrln("Nsections: "+tostr(coffptr^.nsections,"h2"))
genstrln("Timestamp: "+tostr(coffptr^.timedatestamp,"h4"))
genstrln("Symtab offset: "+tostr(coffptr^.symtaboffset))
genstrln("Nsymbols: "+tostr(coffptr^.nsymbols))
genstrln("Opt Hdr size: "+tostr(coffptr^.optheadersize))
genstrln("Characteristics: "+tostr(coffptr^.characteristics,"b"))
genline()
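Here is the same header dump in C, as a sketch. The field layout is the 20-byte COFF file header from the PE/COFF specification, and the field names follow Bart's. The header is decoded byte by byte, so the code does not depend on struct padding or host byte order. read_coff_header, rd16 and rd32 are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t machine;
    uint16_t nsections;
    uint32_t timedatestamp;
    uint32_t symtaboffset;
    uint32_t nsymbols;
    uint16_t optheadersize;
    uint16_t characteristics;
} CoffHeader;

/* Little-endian readers. */
static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode the header at p, which must point at 20 readable bytes. */
CoffHeader read_coff_header(const unsigned char *p)
{
    CoffHeader h;
    assert(p != NULL);
    h.machine         = rd16(p + 0);
    h.nsections       = rd16(p + 2);
    h.timedatestamp   = rd32(p + 4);
    h.symtaboffset    = rd32(p + 8);
    h.nsymbols        = rd32(p + 12);
    h.optheadersize   = rd16(p + 16);
    h.characteristics = rd16(p + 18);
    return h;
}
```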
Pure functional programming languages don't have pointers, or
increment operators - they don't even have assignment. Functional
programming languages are usually considered quite high level, but
some slightly impure functional programming languages - such as OCaml
- are very efficient compiled languages that rival C, Pascal, Ada,
Fortran, etc., for speed. OCaml /does/, AFAIUI (I am no expert in
that language) have variables and pointers or references, but they are
very rarely seen explicitly, and are intentionally cumbersome to use.
That pure functional languages aren't used everywhere suggests they
aren't great at the everyday tasks like the ones I deal with.
(I would like to see Haskell's take on that task of decoding that EXE
file, and dealing with that specific data layout. My example was the
simplest part of it.
For that matter, how would you do it in Python? Rather painfully I would imagine.)
Maybe the OP is designing a language in which pointer dereferencing
and increment are expected to turn up so often that it is useful to
combine them. But I think it is a lot more likely that this is a
mistaken assumption based on limited experience with different kinds
of programming languages. The result will be like your own language -
a re-implementation of C or Pascal, with some benefits and some new
disadvantages, and nothing of real innovation or interest.
Innovation these days seems to be:
* To create incomprehensible languages that require several advanced
degrees in mathematics, PL and type theory to understand
* To make it as hard as possible to perform any tasks by removing
features such as loops, mutable variables and functions with
side-effects. (It's worth bearing in mind that most elements in a
computer system: display, file-system and don't forget the memory, are necessarily mutable.)
* To tie you up in knots with strictly typed everything (or in Rust,
with its 'borrow checker').
No thanks. My innovation is keeping this stuff simple, accessible, fast,
and at a human scale.
I am trying to make suggestions to break that pattern.
Look at Reddit's PL forum. At least 90% of new languages there are
FP-based. Yet when you look at the implementation languages, it tends to
be a different story.
(I just differ from James in thinking that successive
*value-returning* ++ or -- operators, whether prefix or postfix, are
not meaningful. I'd also think it would be bad form to chain them,
but it is not practical to ban that at the syntax level.
If you think it is "bad form", ban it.
Obviously I can't ban `a + b`. Equally obviously, this code is pointless:
a + b;
Given an expression-based language, what would you do? In the past,
after working with C, I would unthinkingly type:
a = b
in my syntax instead of a := b. This wasn't an error: it would compare a
and b then discard the result. But it was a bug.
For any language that is going to be successful in a wider field,
not just a plaything for one person, the man-hour effort in /using/
the language will far outweigh the effort /designing/ or
/implementing/ it. Thus it does not matter if a good design choice is
difficult to implement, as it will save effort in the long run.
In my case, implementing a series of compilers over about 20 years took perhaps one year of /part-time/ work. The rest of it was using the
language, even as an individual. So at least 20:1.
On 14/11/2022 13:17, Dmitry A. Kazakov wrote:
On 2022-11-14 13:45, Bart wrote:
My point is that you can legally combine any number of operators to
result in nonsense-looking code, and the language can do little about
it.
No, the point is that no reasonable code should be nonsense-looking
and conversely.
Increments cross that line.
But it performs a task that is needed, and in their absence, would
simply be implemented, less efficiently and with more cluttery code,
using other means.
I have my own misgivings about it: there are in all 6 varieties of
increment (++x; --x; a:=++x; a:=--x; a:=x++; a:=x--), which are a pig to implement, and they spoil lines of code like this:
a[++n] := x
b[n] := y
c[n] := z
Delete that first line, and you need to remember to transfer that ++n to
the next line. And not to repeat that ++n as is easy to do.
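One way to avoid that problem is to keep the increment as its own statement. Any of the three stores can then be deleted without losing it, and no line can repeat it by accident. A C sketch follows, with append3 and its parameters as hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Advance the shared index once, then store into each array. */
void append3(int *a, int *b, int *c, int *n, int x, int y, int z)
{
    assert(a != NULL && b != NULL && c != NULL && n != NULL);
    ++*n;
    a[*n] = x;
    b[*n] = y;
    c[*n] = z;
}
```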
On 14/11/2022 11:29, Dmitry A. Kazakov wrote:
On 2022-11-14 12:03, James Harris wrote:
On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
On 2022-11-14 11:16, James Harris wrote:
On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
...
Show me the algorithm.
There's no particular algorithm; the construct is a potential
component of many algorithms.
Show me one that is not array assignment.
if is_name_first(b[j])
a[i++] = b[j++]
rep while is_name_follow(b[j])
a[i++] = b[j++]
end rep
a[i] = 0
return TOK_NAME
end if
Now, what don't you like about the ++ operators in that? How would
you prefer to write it?
From parser production code:
procedure Get_Identifier
( Code : in out Source'Class;
Line : String;
Pointer : Integer;
Argument : out Tokens.Argument_Token
) is
Index : Integer := Pointer + 1;
Malformed : Boolean := False;
Underline : Boolean := False;
Symbol : Character;
begin
while Index <= Line'Last loop
Symbol := Line (Index);
if Is_Alphanumeric (Symbol) then
Underline := False;
elsif '_' = Symbol then
Malformed := Malformed or Underline;
Underline := True;
else
exit;
end if;
Index := Index + 1;
end loop;
Malformed := Malformed or Underline;
Set_Pointer (Code, Index);
Argument.Location := Link (Code);
Argument.Value := new Identifier (Index - Pointer);
declare
This : Identifier renames Identifier (Argument.Value.all);
begin
This.Location := Argument.Location;
This.Malformed := Malformed;
This.Value := Line (Pointer..Index - 1);
end;
end Get_Identifier;
Well, that's an astonishingly long piece of code, Dmitry, and if I read
it correctly it doesn't even check whether it begins on a name-first character: that has to be decided before the procedure starts!
But I am not sure I do understand it. Even allowing for what I believe
is meant to be double underscore detection (except at the start and
end?) it takes significantly more study than the simple name-first, name-follow code which preceded it.
On 2022-11-14 12:03, James Harris wrote:
On 14/11/2022 10:32, Dmitry A. Kazakov wrote:
On 2022-11-14 11:16, James Harris wrote:
On 14/11/2022 07:52, Dmitry A. Kazakov wrote:
Show me the algorithm.
There's no particular algorithm; the construct is a potential
component of many algorithms.
Show me one that is not array assignment.
if is_name_first(b[j])
a[i++] = b[j++]
rep while is_name_follow(b[j])
a[i++] = b[j++]
end rep
a[i] = 0
return TOK_NAME
end if
Now, what don't you like about the ++ operators in that? How would you
prefer to write it?
From parser production code:
procedure Get_Identifier
( Code : in out Source'Class;
Line : String;
Pointer : Integer;
Argument : out Tokens.Argument_Token
) is
Index : Integer := Pointer + 1;
Malformed : Boolean := False;
Underline : Boolean := False;
Symbol : Character;
begin
while Index <= Line'Last loop
Symbol := Line (Index);
if Is_Alphanumeric (Symbol) then
Underline := False;
elsif '_' = Symbol then
Malformed := Malformed or Underline;
Underline := True;
else
exit;
end if;
Index := Index + 1;
end loop;
Malformed := Malformed or Underline;
Set_Pointer (Code, Index);
Argument.Location := Link (Code);
Argument.Value := new Identifier (Index - Pointer);
declare
This : Identifier renames Identifier (Argument.Value.all);
begin
This.Location := Argument.Location;
This.Malformed := Malformed;
This.Value := Line (Pointer..Index - 1);
end;
end Get_Identifier;
Again, IMO it's important for the language to provide such
pseudofunctions so that a programmer's code can be made clearer,
simpler, and more readable.
Like this one?
++p+++
I don't have a +++ operator so I am not sure what that is supposed to
mean. It's not valid in my language.
It could be part of an even simpler and brilliantly readable expression:
++p+++q+++r
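A C-style lexer would split Dmitry's example using the usual "maximal munch" rule: at each position it takes the longest operator that matches, so +++ becomes ++ followed by +. The Python sketch below covers only that lexing step, with a made-up token set. Whether the resulting token sequence is a valid expression depends on the grammar. In C, for instance, ++p++ groups as ++(p++), and that is rejected because p++ is not an lvalue.

```python
import re

# Maximal munch for this tiny token set: listing "++" and "--" before
# "+" and "-" makes the regex prefer the longer operator at each position.
TOKEN = re.compile(r"\s*(\+\+|--|\+|-|[A-Za-z_]\w*)")


def tokenize(src: str) -> list:
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {src[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens
```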
On 14/11/2022 16:59, Bart wrote:
I have a dynamic language with proper, first class lists, trees,
strings, records, which take care of 90% of the pointer uses in a
C-class language. Yet they can still be useful. This is an extract
from a program that dumps the contents of an EXE file:
coffptr:=makeref(pedata+coffoffset,imagefileheader)
genstrln("Coff header: "+tostr(coffptr^))
genstrln("Machine: "+tostr(coffptr^.machine,"h2"))
genstrln("Nsections: "+tostr(coffptr^.nsections,"h2"))
genstrln("Timestamp: "+tostr(coffptr^.timedatestamp,"h4"))
genstrln("Symtab offset: "+tostr(coffptr^.symtaboffset))
genstrln("Nsymbols: "+tostr(coffptr^.nsymbols))
genstrln("Opt Hdr size: "+tostr(coffptr^.optheadersize))
genstrln("Characteristics: "+tostr(coffptr^.characteristics,"b"))
genline()
None of that needs pointers or references.
Initialise a "coff_header" read-only immutable variable from a slice of
the memory array holding the image.
No thanks. My innovation is keeping this stuff simple, accessible,
fast, and at a human scale.
I disagree with your scepticism, but I agree that there are lots of
languages with different paradigms for different purposes.
However, making yet-another-C is IMHO a pointless exercise. It might be better in some ways, but not enough to make it worth the effort.
Obviously I can't ban `a + b`. Equally obviously, this code is pointless:
a + b;
You can ban that. Rule number 42 - the result of an expression must be assigned to a variable, used in another expression, or passed as the
argument to a function call. No problem.
Given an expression-based language, what would you do? In the past,
after working with C, I would unthinkingly type:
a = b
in my syntax instead of a := b. This wasn't an error: it would compare
a and b then discard the result. But it was a bug.
Any decent C compiler (with the right options) will complain about the C equivalent, "a == b;", as a statement with no effect. It could just as easily be made an error in a language.
And if you follow my suggestion that expressions can't have
side-effects, then it's easy to distinguish between "statements" and "expressions" because you no longer have a C-style "expression statement".
On 14/11/2022 15:24, Bart wrote:
On 14/11/2022 13:17, Dmitry A. Kazakov wrote:
On 2022-11-14 13:45, Bart wrote:
...
My point is that you can legally combine any number of operators to
result in nonsense-looking code, and the language can do little
about it.
No, the point is that no reasonable code should be nonsense-looking
and conversely.
Increments cross that line.
But it performs a task that is needed, and in their absence, would
simply be implemented, less efficiently and with more cluttery code,
using other means.
I have my own misgivings about it: there are in all 6 varieties of
Increment (++x; --x; a:=++x; a:=--x; a:=x++; a:=x--), which are a pig
to implement, and they spoil the lines of code like this:
I thought you correctly said before that ++x was (++x; x) and x++ was
(t:=x; x++; rval(t)). Adding -- that's still only four varieties, IMO.
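The two desugarings can be modelled in a few lines of Python, using a hypothetical `Var` cell in place of a variable. Passing the step as a parameter means the decrement forms come from the same two functions. The statement forms (`++x;`, `--x;`) are simply these expressions with the value discarded.

```python
class Var:
    """A mutable cell standing in for a variable (hypothetical model)."""

    def __init__(self, value: int):
        self.value = value


def pre_inc(v: Var, step: int = 1) -> int:
    # ++x  ==  (x := x + 1; x): update first, then yield the new value
    v.value += step
    return v.value


def post_inc(v: Var, step: int = 1) -> int:
    # x++  ==  (t := x; x := x + 1; t): remember, update, yield the old value
    t = v.value
    v.value += step
    return t
```

With step=-1 the same functions give --x and x--. That leaves four expression forms in total, and the `a:=` variants are just those expressions used on the right-hand side of an assignment.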
On 14/11/2022 17:28, David Brown wrote:
On 14/11/2022 16:14, James Harris wrote:
On 14/11/2022 10:24, Stefan Ram wrote:
James Harris <james.harris.1@gmail.com> writes:
Rather than allowing non-ASCII in source I came up with a scheme of what
you might call 'named characters' extending the backslash idea of C to
allow names instead of single characters after the backslash. It's off
topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).
Here's an example of a Python program.
print( "\N{REVERSE SOLIDUS}\N{QUOTATION MARK}" )
It prints:
\"
I see they are Unicode names. If I were to support such names I would
have your example as something like
print "\U:reverse solidus/\U:quotation mark/"
but I prefer shorter names such as
print "\bksl/\q11/"
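Python's \N{...} escape already does this at compile time. The sketch below implements the proposed \U:name/ form as a run-time expansion using the standard `unicodedata.lookup`. The \U:name/ syntax is the one suggested in the post, not anything Python itself supports, and `expand_named` is a hypothetical helper. Unknown names raise KeyError.

```python
import re
import unicodedata

# Hypothetical syntax from the post: \U:<unicode character name>/
NAMED = re.compile(r"\\U:([^/]+)/")


def expand_named(text: str) -> str:
    def repl(m: re.Match) -> str:
        # Unicode names are upper case; lookup raises KeyError if unknown.
        return unicodedata.lookup(m.group(1).upper())
    return NAMED.sub(repl, text)
```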
At least with Unicode someone has already defined a name for every
character, but Unicode includes a lot of nonsense such as a character
called
How about using the HTML character entity names? That would be
"\"". These are a good deal shorter than Unicode names but
are vastly better than inventing your own names.
<https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references>
Thanks for the pointer. I /could/ include them with my syntax along the
lines of
"\H:Backslash/\H:quot/"
where H: indicates HTML names. I did consider them before but had to
reject them. I cannot remember all the reasons why, now, but from taking
a quick look they appear, like Unicode, to be more for printing than for processing. For example, there is frac45 for 4/5 but such a scheme
allows only the fractions which are predefined. Also, HTML names combine diacritics with characters (e.g. yacute) whereas AISI it's important for
them to be kept separate.
What's needed, IMO, is a set of names intended for /processing/ rather
than for typesetting.
On 14/11/2022 14:47, David Brown wrote:
On 14/11/2022 11:44, James Harris wrote:
On 14/11/2022 09:26, David Brown wrote:
On 13/11/2022 18:11, James Harris wrote:
On 08/11/2022 16:29, David Brown wrote:
...
The expression you mention is just one of a myriad of what you might
consider to be potential nasties. If I am going to prohibit that one
then what about all the others?
Prohibit nasty ones.
Enumerating the 'nasty ones' is the problem. If there are 20 dyadic
operators then there are something like 400 ways of combining them. AIUI
you want me to pick some of those 400 and tell the programmer "you
cannot combine these two even where there's no type mismatch".
A big step in that direction is to say that assignment is a statement,
not an expression,
Done that.
and that variables cannot be changed by side-effects.
I will not be doing that. I know you favour functional programming, and that's fine, but the language I am working on is unapologetically
imperative.
(How you relate this to function calls is a related and complex
issue that I have been glossing over here. An idea would be to
distinguish between "procedures" that may have side effects, and
"functions" that do not.)
That means there is no such thing as an "increment" operator - post or
pre.
It also /hugely/ simplifies the language - both for the programmer,
and for the implementer. If expressions have no side-effects, they
can be duplicated, split up, re-arranged, moved around in code, all
without affecting the behaviour of the program.
This needs a separate discussion, David. It is far too big a topic for
this thread. (Feel free to start a new one; I have plenty to say!) All I
can say here is as I said above, the language I am working on is
imperative.
These two statements go together as well as Dmitry's toaster and
toilet brush. It doesn't matter how precisely you define how the
combination can be used and what it does, it is still not a good or
useful thing.
OK, let's take the combination you mentioned:
++E++
I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in
the order that it says it. Remember that unlike C I define the
apparent order of evaluation so the expression is perfectly well formed.
The very fact that you are discussing how to define it means it is not
clear and obvious.
On that I must disagree.
One cannot expect to understand a language
without at least learning the basics. Consider
a * b + c
I would parse that as (a*b)+c but not all languages would. As a reader
of infix code you would have to know the order in which operators would
be applied. You have to know the basics of a language in order to read
code written in it.
It is not obvious which order the increments happen, or if the order
is defined, or if the order matters. It is not obvious what the
return value should be. It is not obvious where you have lvalues or
rvalues (not that a language should necessarily have such concepts).
It is not obvious what happens to E.
Of course it's not obvious to someone who doesn't know the rules of the language. A language designer cannot produce a language with no rules.
What a language designer /can/ do is to make the rules simple and understandable - but someone who reads the code still has to understand
what the rules are.
As for the rules we have been discussing here those I have come up with
are, in the main, the ones you would be familiar with; even the new ones
are logical and simple. Once you understand them it's incredibly easy to parse an expression, even of the kind which, to you, looks like gibberish.
In fact, I have to say that neither of those you have objected to should really look like gibberish, even to the uninitiated. Would you find
++A + B++
objectionable?
If not, I cannot see why you would find
++E + E++
so objectionable, either.
Isn't the only new thing that you, as a reader of
such code, would need to know the order in which the operations are carried out?
Otherwise it's just like the preceding expression.
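Under the rule described here, each of these expressions has a single answer. The rule: operands are evaluated left to right, prefix ++ is applied before the value is taken, and postfix ++ after. Python also defines left-to-right operand evaluation, so a model with explicit helper functions gives the same results. The helpers are hypothetical and operate on a dict of variables.

```python
def pre_inc(env: dict, name: str) -> int:
    env[name] += 1          # ++E: update first...
    return env[name]        # ...then yield the new value


def post_inc(env: dict, name: str) -> int:
    old = env[name]         # E++: take the value first...
    env[name] += 1          # ...then update
    return old


env = {"A": 1, "B": 10, "E": 5}
# Python evaluates the operands of + left to right, so both are well defined.
r1 = pre_inc(env, "A") + post_inc(env, "B")   # ++A + B++
r2 = pre_inc(env, "E") + post_inc(env, "E")   # ++E + E++
```

With A=1 and B=10 the first gives 12. With E=5 the second also gives 12 and leaves E at 7. C gives neither expression a single answer in the second case: ++E + E++ modifies E twice without sequencing, so its behaviour is undefined.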
...
There's a very tempting myth in language design that /defining/
behaviour is key - that gibberish and incorrect code can somehow be
made "correct" by defining its behaviour. You are not alone in this -
lots of languages try to achieve "no undefined behaviour" by defining
the behaviour of everything instead of banning things that have no
correct behaviour.
My reason for defining /apparent/ code behaviour is to ensure
computational consistency on different platforms. Who wouldn't want that?
[...] But there is absolutely
/nothing/ about being an imperative language that suggests you need
to allow side-effects or assignments /within/ expressions.
On 14/11/2022 11:11, Bart wrote:
On 14/11/2022 10:44, James Harris wrote:
On 14/11/2022 09:26, David Brown wrote:
...
OK, let's take the combination you mentioned:
++E++
I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in
the order that it says it. Remember that unlike C I define the
apparent order of evaluation so the expression is perfectly well formed.
But does this have the same priorities as:
op1 E op2
(where op2 is commonly done first) or does it have special rules, so
that in:
--E++
the -- is done first? If it's different, then what is the ordering
when mixed with other unary ops?
You explained somewhere the circumstances where you think this is
meaningful, but I can't remember what the rules are and I can't find
the exact post.
The rules for ++ and -- are simple. *Prefix* ++ or -- happens before
postfix ++ or --.
Your example would probably be better expressed as
E - 1
!
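That rule can be sketched for a single operand, with ++ and -- written as steps of +1 and -1 (the function name is hypothetical). The sketch shows why --E++ reduces to E - 1. The prefix -- and postfix ++ cancel out in E, and the surrounding expression sees the value between them.

```python
def prefix_then_postfix(env: dict, name: str, pre: int, post: int) -> int:
    """Model of <prefix op>E<postfix op>: apply the prefix step,
    take the value, then apply the postfix step."""
    env[name] += pre
    value = env[name]
    env[name] += post
    return value
```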
As for the bigger picture of operator precedences, it's based on the transformations which operators naturally make to their operands. For example, comparisons (such as less-than) naturally take numbers and
produce booleans so their precedences put them after numeric operators
(+, * etc) and before boolean operators (and, or, not, etc). The natural series is (simplified)
locations (such as field selection and array indexing)
numbers (retrieved from those locations)
comparisons (of those numbers)
booleans
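A minimal precedence-climbing parser in Python can illustrate that ordering. It uses a hypothetical binary-operator table ordered along the series: arithmetic binds tightest, then comparisons, then and/or. It also follows the common convention that postfix operators such as ^ bind tighter than prefix ones, so -P^ parses as a negation of P^. The operator set, the use of = for comparison and the "neg" node name are assumptions for illustration.

```python
import re

# Hypothetical table following the series numbers -> comparisons -> booleans.
# A higher number binds more tightly.
BINARY = {
    "or": 1, "and": 2,
    "<": 3, "<=": 3, "=": 3,
    "+": 4, "-": 4,
    "*": 5, "/": 5,
}
TOKEN = re.compile(r"\s*(<=|[A-Za-z_]\w*|\d+|[-+*/<=^()])")


def tokenize(text: str) -> list:
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens: list):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.i += 1
        return tok

    def expr(self, min_prec: int):
        # Precedence climbing: keep folding binary operators whose
        # precedence is at least min_prec; +1 makes them left-associative.
        left = self.unary()
        while self.peek() in BINARY and BINARY[self.peek()] >= min_prec:
            op = self.advance()
            left = (op, left, self.expr(BINARY[op] + 1))
        return left

    def unary(self):
        # Prefix minus applies to a whole postfix expression: -P^ is -(P^).
        if self.peek() == "-":
            self.advance()
            return ("neg", self.unary())
        return self.postfix()

    def postfix(self):
        node = self.primary()
        while self.peek() == "^":          # postfix dereference
            self.advance()
            node = ("^", node)
        return node

    def primary(self):
        tok = self.advance()
        if tok == "(":
            node = self.expr(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or tok in BINARY or tok in (")", "^"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok                          # a name or number


def parse(text: str):
    parser = Parser(tokenize(text))
    tree = parser.expr(0)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```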
Because if they ever start to return lvalues, then this becomes possible:
++E := 0
E++ := 0
(Whichever one is legal in your scheme.) So I think there is little
useful expressivity to be gained.
Indeed, neither of those is useful.
Interestingly, I just tried
++E = 0;
with cc and c++ compilers. The first rejected it (as ++E needing to be
an lvalue); the second accepted it. That's not a reflection of either compiler, BTW, but without checking it's probably more to do with the language/dialect definition. FWIW I expect the second compiler could be persuaded to issue a warning about unreachable code or suchlike.
On 14/11/2022 17:46, David Brown wrote:
On 14/11/2022 16:59, Bart wrote:
I have a dynamic language with proper, first class lists, trees,
strings, records, which take care of 90% of the pointer uses in a
C-class language. Yet they can still be useful. This is an extract
from a program that dumps the contents of an EXE file:
coffptr:=makeref(pedata+coffoffset,imagefileheader)
genstrln("Coff header: "+tostr(coffptr^))
genstrln("Machine: "+tostr(coffptr^.machine,"h2"))
genstrln("Nsections: "+tostr(coffptr^.nsections,"h2"))
genstrln("Timestamp: "+tostr(coffptr^.timedatestamp,"h4"))
genstrln("Symtab offset: "+tostr(coffptr^.symtaboffset))
genstrln("Nsymbols: "+tostr(coffptr^.nsymbols))
genstrln("Opt Hdr size: "+tostr(coffptr^.optheadersize))
genstrln("Characteristics: "+tostr(coffptr^.characteristics,"b"))
genline()
None of that needs pointers or references.
Initialise a "coff_header" read-only immutable variable from a slice
of the memory array holding the image.
At minimum, this task needs the ability to take a block of bytes,
and variously interpret parts of it as primitive numeric types of
specific widths and signedness.
That can be helped by having pointers to such types. It can be further helped by allowing a struct type which is a collection of such types in
a particular layout. And a way to transfer data from arbitrary bytes to
that struct object. Or to map the address of that struct into the middle
of that block. (Or as I do it above, set a pointer to that struct to the middle of the block.)
This is stuff which is meat-and-drink to a lower-level language like C,
or like mine (even my scripting language).
It requires some effort in Python and the result will be clunky (and probably require some add-on modules).
While a functional language will
struggle (to be accurate, the programmer will struggle because they've chosen the wrong language).
Here's a more challenging record type that comes up in OBJ files:
type imagesymbol=struct
union
stringz*8 shortname
struct
u32 short
u32 long
end
u64 longname
end
u32 value
u16 sectionno
u16 symtype
byte storageclass
byte nauxsymbols
end
(Again, this is defined directly in my /dynamic/ scripting language.)
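For reference, the same 18-byte record can be decoded in Python with the standard struct module. This is a sketch, not Bart's code, and `read_symbol` and `Symbol` are hypothetical names. The long-name branch follows the PE/COFF convention: if the first four bytes of the name are zero, the name is in the string table, at the offset held in the next four bytes. Those offsets are counted from the start of the table, including its 4-byte size field. The section number is read as signed, as the PE/COFF specification defines it, whereas Bart's declaration uses u16.

```python
import struct
from typing import NamedTuple

# One 18-byte COFF symbol record, little-endian, no padding:
# 8-byte name field (the union), u32 value, i16 section number,
# u16 type, u8 storage class, u8 number of auxiliary records.
SYMBOL = struct.Struct("<8sIhHBB")


class Symbol(NamedTuple):
    name: str
    value: int
    sectionno: int
    symtype: int
    storageclass: int
    nauxsymbols: int


def read_symbol(buf: bytes, offset: int, strtab: bytes) -> Symbol:
    """Decode the symbol record at buf[offset:offset+18].

    strtab is the whole string table, including its leading 4-byte size,
    because long-name offsets are counted from the start of the table.
    """
    raw_name, value, secno, symtype, sclass, naux = SYMBOL.unpack_from(buf, offset)
    # The union: first 4 bytes zero -> next 4 bytes are a string-table offset;
    # otherwise the 8 bytes are the name itself, NUL-padded if shorter.
    zeros, str_offset = struct.unpack_from("<II", raw_name)
    if zeros == 0:
        end = strtab.index(b"\0", str_offset)
        name = strtab[str_offset:end].decode("ascii")
    else:
        name = raw_name.rstrip(b"\0").decode("ascii")
    return Symbol(name, value, secno, symtype, sclass, naux)
```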
Nothing of that is /remotely/ worth making a new language and giving up
No thanks. My innovation is keeping this stuff simple, accessible,
fast, and at a human scale.
I disagree with your scepticism, but I agree that there are lots of
languages with different paradigms for different purposes.
However, making yet-another-C is IMHO a pointless exercise. It might
be better in some ways, but not enough to make it worth the effort.
If you're going to use a C-class language, then why not one with some
modern refinements? That's what I do.
(For example, default 64-bit everything; a module scheme; value arrays; sane type syntax; whole-program compilation; slices; expression-based
(see below); a 'byte' type! )
Obviously I can't ban `a + b`. Equally obviously, this code is
pointless:
a + b;
You can ban that. Rule number 42 - the result of an expression must
be assigned to a variable, used in another expression, or passed as
the argument to a function call. No problem.
This is effectively what I did. The only expressions allowed as
standalone statements were assignments; function calls; increments.
Anything else required an `eval` prefix to force evaluation.
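The effect of a rule like that can be sketched for Python source with the standard ast module. The sketch flags any expression statement whose value is simply discarded, unless it is a call (or a string literal, to leave docstrings alone). This rule set is an assumption for illustration, not Bart's exact one. Python has no ++, and its assignments are separate statement kinds in the AST, so they are never flagged.

```python
import ast


def pointless_statements(source: str) -> list:
    """Return line numbers of expression statements whose value is discarded."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Expr):
            continue                      # assignments etc. are other node types
        value = node.value
        if isinstance(value, (ast.Call, ast.Await, ast.Yield, ast.YieldFrom)):
            continue                      # these may have useful side effects
        if isinstance(value, ast.Constant) and isinstance(value.value, str):
            continue                      # treat string literals as docstrings
        flagged.append(node.lineno)
    return sorted(flagged)
```

Here `a == b` on its own line would be reported, which catches the `=` versus `:=` slip described below.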
Given an expression-based language, what would you do? In the past,
after working with C, I would unthinkingly type:
a = b
in my syntax instead of a := b. This wasn't an error: it would
compare a and b then discard the result. But it was a bug.
Any decent C compiler (with the right options) will complain about the
C equivalent, "a == b;", as a statement with no effect. It could just
as easily be made an error in a language.
And if you follow my suggestion that expressions can't have
side-effects, then it's easy to distinguish between "statements" and
"expressions" because you no longer have a C-style "expression
statement".
Because my early languages were loosely based on Algol68, not C, they
were expression-based. Later I simplified to distinct statements and expressions, but now I've gone back.
Now both my languages are expression-based. That is, statements and expressions are interchangeable. That's supposed to be good, right,
because FP languages work the same way? I think expression-based languages are regarded as superior.
But it does make some things harder.
For a start, any expression can have side-effects, because an expression
can be or can include what you might call a statement.
So I can get rid of ++ here:
A[i++] := 0
but I could simply write it like this:
A[t:=i; i:=i+1; t] := 0
In which case I might as well keep the ++.
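Python illustrates the same point. It has no ++, but since 3.8 its assignment expressions (:=) allow that block-expression form almost literally. This is a sketch of the equivalence, not a recommendation; the variable names are arbitrary.

```python
A = [9, 9, 9]
i = 0

# A[i++] := 0 spelled as a block expression: the tuple is evaluated left
# to right, so t captures the old i before i is bumped; [0] picks out t.
A[(t := i, i := i + 1)[0]] = 0

# The plainer statement form that a no-side-effects rule would require.
B = [9, 9, 9]
j = 0
B[j] = 0
j += 1
```

Banning ++ alone does not remove the side effect from the subscript. It only changes how it is spelled.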
On 14/11/2022 20:45, David Brown wrote:
[To James:]
[...] But there is absolutely
/nothing/ about being an imperative language that suggests you need
to allow side-effects or assignments /within/ expressions.
If assignments within expressions are verboten, then you need to either forbid assignments within functions or have two classes of function, those with and those without [inc sub-procedures].
If assignments are
allowed at all, then you cannot in general tell at compile time whether
any assignment is reached at run time, leading to further complications.
If you regard output as a side-effect, that too leads to problems. Yet during program development it is common to insert temporary diagnostic printing or variables. I can understand the concept of languages without assignments or other side effects; and of languages with them; I find it difficult to see the point of languages where such things are allowed in
some places and not in others. If we need hair shirts [and I'm not sure that we do], they should be worn all the time, not put on and taken off
in accordance with arcane rules that only high priests understand.
On 14/11/2022 16:21, James Harris wrote:
On 14/11/2022 11:11, Bart wrote:
The rules for ++ and -- are simple. *Prefix* ++ or -- happens before
postfix ++ or --.
Is that for all unary operators or just ++ and --?
If only for those, then those exceptions will make rules for all unary
ops even more bizarre. If for all of them, then one consequence is that
`-P^` is parsed as `(-P)^`, so that you try to negate a pointer, instead
of what's at the pointer target.
However, I have my own exceptions, which are casts:
ref u16(P)^ := 0
If P was a byte pointer, this will now write 16 bits not 8. Here, the
cast is done first, differently from a unary operator.
But I cover this in my list below: casts are syntax so trump everything. (Which comes first in `ref T (X)[i]`, I don't know, I'd have to test it.
But this would probably be written as (X[i]) to remove doubt.)
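A rough Python analogue of that cast uses `memoryview.cast` to reinterpret part of a byte buffer as unsigned 16-bit items. The buffer contents are made up for illustration. Writing through the byte view changes one byte; writing through the reinterpreted view changes two.

```python
# Like P^ := 0 with P a byte pointer: one byte is written.
p8 = bytearray(b"\xff\xff\xff\xff")
memoryview(p8)[0] = 0

# Like ref u16(P)^ := 0: the same address viewed as u16, two bytes written.
p16 = bytearray(b"\xff\xff\xff\xff")
memoryview(p16)[0:2].cast("H")[0] = 0
```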
Because if they ever start to return lvalues, then this becomes
possible:
++E := 0
E++ := 0
(Whichever one is legal in your scheme.) So I think there is little
useful expressivity to be gained.
Indeed, neither of those is useful.
Interestingly, I just tried
++E = 0;
with cc and c++ compilers. The first rejected it (as ++E needing to be
an lvalue); the second accepted it. That's not a reflection of either
compiler, BTW, but without checking it's probably more to do with the
language/dialect definition. FWIW I expect the second compiler could
be persuaded to issue a warning about unreachable code or suchlike.
What does C++ expect it to mean?
On 14/11/2022 19:02, James Harris wrote:
On 14/11/2022 17:28, David Brown wrote:
On 14/11/2022 16:14, James Harris wrote:
On 14/11/2022 10:24, Stefan Ram wrote:
James Harris <james.harris.1@gmail.com> writes:
Rather than allowing non-ASCII in source I came up with a scheme of
what you might call 'named characters' extending the backslash idea of
C to allow names instead of single characters after the backslash. It's
off topic for this thread but it allows non-ASCII characters to be named
(such that the names consist of ASCII characters and would thus be
readable and universal).
What's needed, IMO, is a set of names intended for /processing/ rather
than for typesetting.
That makes little sense to me. Are you intending to invent your own character encoding, or your own fonts here? Are you planning on making
your own display or print system?
The character "⅘" is easily typed on *nix keyboards with a compose key (with common setups), HTML has it as "&frac45;", Unicode has it as
"U+2158 Vulgar fraction four fifths". They support fractions that are common enough to exist as characters in fonts. You can't add your own personal "twenty two sevenths" character and expect it to turn up when printed, nor will you ever come across it when reading files or
documents from elsewhere. (Of course you can choose to support only a subset of the HTML or Unicode names.)
And what do you mean by "processing", and what makes you think it is
remotely relevant to separate diacritics from characters?
In some
languages, "ä" is a letter "a" with a diacritic, in others it is an
entirely distinct letter of its own. The same applies to lots of characters. Unicode has a complex system of "normalisation" for
relating combining diacritics and letters into single combined Unicode characters, which are often a better choice for display than you would
get by displaying two individual graphemes.
Are you going to try to split up Chinese or Korean characters into their components? What about Mongolian, or Arabic?
As I've already said, Unicode and HTML are fine for output. Where
programmers work with the semantics of characters, however, they need characters to be in semantic categories, you know: letters, arithmetic symbols, digits, different cases, etc. So far I've not come across
anything to support that multilingually. AISI what's needed is a way to expand character encodings to bit fields such as
<category><base character><variant><diacritics><appearance>
where
category = group (e.g. alphabetic letters, punctuation, etc)
base character = main semantic identification (e.g. an 'a')
variant (e.g. upper or lower case)
diacritics (those applied to this character in this location)
appearance (e.g. a round 'a' or a printer's 'a' or unspecified)
Note that that's purely about semantics; it doesn't include typefaces or character sizes or bold or italic etc which are all for rendering.
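That layout is easy to prototype. The Python sketch below packs the five fields into one integer and back. Several details are assumptions made up for illustration: the field widths, the category, variant and diacritic codes, and the use of the Unicode code point as the "base character" identity. Comparing only the base field then treats 'a', 'A' and 'á' as the same letter.

```python
from typing import NamedTuple

# Hypothetical field widths in bits, most significant field first.
FIELDS = (("category", 4), ("base", 16), ("variant", 4),
          ("diacritics", 12), ("appearance", 4))


class SemChar(NamedTuple):
    category: int
    base: int
    variant: int
    diacritics: int     # a bit set of applied diacritics (hypothetical)
    appearance: int


def pack(ch: SemChar) -> int:
    code = 0
    for name, width in FIELDS:
        value = getattr(ch, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        code = (code << width) | value
    return code


def unpack(code: int) -> SemChar:
    values = {}
    for name, width in reversed(FIELDS):    # least significant field first
        values[name] = code & ((1 << width) - 1)
        code >>= width
    return SemChar(**values)


LETTER = 1              # hypothetical category code
LOWER, UPPER = 0, 1     # hypothetical variant codes
ACUTE = 0x001           # hypothetical diacritic bit
a_acute = SemChar(LETTER, ord("a"), LOWER, ACUTE, 0)
cap_a = SemChar(LETTER, ord("a"), UPPER, 0, 0)
```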
Do you also believe that the Unix
bytes = read(fd, &buf[1], reqd);
should be prohibited since it has the side effect within the expression
of modifying the buffer? If so, what would you replace it with??
On 14/11/2022 18:37, James Harris wrote:
On 14/11/2022 14:47, David Brown wrote:
On 14/11/2022 11:44, James Harris wrote:
On 14/11/2022 09:26, David Brown wrote:
A big step in that direction is to say that assignment is a
statement, not an expression,
Done that.
and that variables cannot be changed by side-effects.
I will not be doing that. I know you favour functional programming,
and that's fine, but the language I am working on is unapologetically
imperative.
Many unapologetically imperative languages do not allow side-effects in expressions. It is a natural rule for functional programming languages, since pure functional programming does not have side-effects or
modifiable variables at all. But there is absolutely /nothing/ about
being an imperative language that suggests you need to allow
side-effects or assignments /within/ expressions.
These two statements go together as well as Dmitry's toaster and
toilet brush. It doesn't matter how precisely you define how the
combination can be used and what it does, it is still not a good or
useful thing.
OK, let's take the combination you mentioned:
++E++
I wonder why you see a problem with it. As I see it, it increments E
before evaluation and then increments E after evaluation. What is so
complex about that? It does exactly what it says on the tin, and in
the order that it says it. Remember that unlike C I define the
apparent order of evaluation so the expression is perfectly well
formed.
The very fact that you are discussing how to define it means it is
not clear and obvious.
On that I must disagree.
You think it is /obvious/ what "++E++" means?
"++E++" remains meaningless to experienced programmers.
It is not obvious which order the increments happen, or if the order
is defined, or if the order matters. It is not obvious what the
return value should be. It is not obvious where you have lvalues or
rvalues (not that a language should necessarily have such concepts).
It is not obvious what happens to E.
Of course it's not obvious to someone who doesn't know the rules of
the language. A language designer cannot produce a language with no
rules. What a language designer /can/ do is to make the rules simple
and understandable - but someone who reads the code still has to
understand what the rules are.
I do not think "++E++" will be clear and obvious to someone who /does/
know the rules of the language.
Remember, this is not something that
will be commonly used and become idiomatic, like "*p++ = *q++;" is in C.
Programmers will always need to look up the details - that's why it is
not a good idea.
As for the rules we have been discussing here those I have come up
with are, in the main, the ones you would be familiar with; even the
new ones are logical and simple. Once you understand them it's
incredibly easy to parse an expression, even of the kind which, to
you, looks like gibberish.
In fact, I have to say that neither of those you have objected to
should really look like gibberish, even to the uninitiated. Would you
find
++A + B++
objectionable?
Yes.
If not, I cannot see why you would find
++E + E++
so objectionable, either.
It is worse, because you are changing the same thing twice in an
unordered manner.
On 15/11/2022 01:14, Bart wrote:
On 14/11/2022 16:21, James Harris wrote:
On 14/11/2022 11:11, Bart wrote:
...
The rules for ++ and -- are simple. *Prefix* ++ or -- happens before
postfix ++ or --.
Is that for all unary operators or just ++ and --?
Just ++ and --, as stated.
If only for those, then those exceptions will make rules for all unary
ops even more bizarre. If for all of them, then one consequence is
that `-P^` is parsed as `(-P)^`, so that you try to negate a pointer,
instead of what's at the pointer target.
However, I have my own exceptions, which are casts:
ref u16(P)^ := 0
Rather than casts I have conversions. Are they the same? I never really understood how people use the term 'cast'. Either way, you raise a good point: where should type conversions come in the order of precedence?
Because if they ever start to return lvalues, then this becomes
possible:
++E := 0
E++ := 0
(Whichever one is legal in your scheme.) So I think there is little
useful expressivity to be gained.
Indeed, neither of those is useful.
Interestingly, I just tried
++E = 0;
with cc and c++ compilers. The first rejected it (as ++E needing to
be an lvalue); the second accepted it. That's not a reflection of
either compiler, BTW, but without checking it's probably more to do
with the language/dialect definition. FWIW I expect the second
compiler could be persuaded to issue a warning about unreachable code
or suchlike.
What does C++ expect it to mean?
Probably the same as my language (since it retains the lvalue) but I
don't know C++.
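For what it's worth, in C++ the built-in prefix ++ yields an lvalue referring to E itself. So `++E = 0;` is accepted: E is incremented and then 0 is assigned to it. Under C++11's sequencing rules this is, as far as I understand, well defined. In C the result of ++E is not an lvalue, hence the rejection. The following Python sketch models the difference; `Cell`, `pre_inc_c`, `pre_inc_cpp` and `assign` are hypothetical.

```python
class Cell:
    """A variable modelled as a mutable box (hypothetical)."""

    def __init__(self, value: int):
        self.value = value


def pre_inc_c(cell: Cell) -> int:
    # C: ++E yields the new value, not the variable itself
    cell.value += 1
    return cell.value


def pre_inc_cpp(cell: Cell) -> Cell:
    # C++: ++E yields the (updated) variable, i.e. an lvalue
    cell.value += 1
    return cell


def assign(target, value: int) -> None:
    # Only something that designates a variable can be assigned to.
    if not isinstance(target, Cell):
        raise TypeError("left operand is not an lvalue")
    target.value = value
```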
On 14/11/2022 20:45, David Brown wrote:
On 14/11/2022 18:37, James Harris wrote:
On 14/11/2022 14:47, David Brown wrote:
On 14/11/2022 11:44, James Harris wrote:
On 14/11/2022 09:26, David Brown wrote:
...
A big step in that direction is to say that assignment is a
statement, not an expression,
Done that.
and that variables cannot be changed by side-effects.
I will not be doing that. I know you favour functional programming,
and that's fine, but the language I am working on is unapologetically
imperative.
Many unapologetically imperative languages do not allow side-effects
in expressions. It is a natural rule for functional programming
languages, since pure functional programming does not have
side-effects or modifiable variables at all. But there is absolutely
/nothing/ about being an imperative language that suggests you need to
allow side-effects or assignments /within/ expressions.
Do you also believe that the Unix
bytes = read(fd, &buf[1], reqd);
should be prohibited since it has the side effect within the expression
of modifying the buffer? If so, what would you replace it with??
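The distinction can be seen in Python terms. This is an illustration only, not something either poster proposed. `readinto` behaves like read(2): it returns a count and, as a side effect, fills part of the caller's buffer. `read` instead returns the data as a value, and the buffer update becomes a separate assignment statement.

```python
import io

buf = bytearray(8)
src = io.BytesIO(b"hello")

# Side effect inside the expression, like bytes = read(fd, &buf[1], reqd):
nbytes = src.readinto(memoryview(buf)[1:4])

# Value-returning style: the call does not touch buf2 (it still advances
# the stream position); the buffer is updated by an ordinary statement.
buf2 = bytearray(8)
data = io.BytesIO(b"hello").read(3)
buf2[1:1 + len(data)] = data
```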
...
These two statements go together as well as Dmitry's toaster and
toilet brush. It doesn't matter how precisely you define how the
combination can be used and what it does, it is still not a good
or useful thing.
OK, let's take the combination you mentioned:
++E++
I wonder why you see a problem with it. As I see it, it increments
E before evaluation and then increments E after evaluation. What is
so complex about that? It does exactly what it says on the tin, and
in the order that it says it. Remember that unlike C I define the
apparent order of evaluation so the expression is perfectly well
formed.
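Read literally, the description above gives ++E++ a fixed three-step meaning: increment, take the value, increment again. Here is a minimal sketch of that reading. Python has no ++ operator, so the variable is modelled as a key in a dict, and the helper name `pre_post_inc` is hypothetical:

```python
def pre_post_inc(env: dict, name: str) -> int:
    """Evaluate ++E++ for the variable `name` under the rule described above:
    increment, take the value, then increment again."""
    env[name] += 1        # prefix ++: increment before the value is taken
    value = env[name]     # the value the whole expression yields
    env[name] += 1        # postfix ++: increment after the value is taken
    return value


env = {"E": 5}
result = pre_post_inc(env, "E")
print(result, env["E"])  # 6 7
```

Under this reading the expression yields E+1 and leaves E two higher. For comparison, C parses ++E++ as ++(E++), and because E++ does not yield an lvalue, compilers reject it.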
The very fact that you are discussing how to define it means it is
not clear and obvious.
On that I must disagree.
You think it is /obvious/ what "++E++" means?
If you don't know the rules then it's not obvious.
If you know the rules then it's *blindingly* obvious. What's more, the
rules are easy to learn.
On 14/11/2022 19:43, Bart wrote:
It requires some effort in Python and the result will be clunky (and probably require some add-on modules).
import struct # Standard module
bs = open("potato_c.cof").read()
machine, nsections, timestamp, symtaboffset, nsymbols, optheadersize, characteristics = struct.unpack_from("<HHIIIHH")
That's it. Three lines. I would not think of C for this kind of thing
- Python is /much/ better suited. I'd only start looking at C (or C++)
if I need so high speed that the Python code was not fast enough, even
with PyPy.
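For reference, here is a self-contained version of the header read above. In the standard library, `struct.unpack_from(format, buffer, offset=0)` requires the buffer argument; only the offset is optional. Binary data also needs the file opened in `"rb"` mode. The function name `read_coff_header` and the synthetic test bytes are illustrative assumptions:

```python
import struct

# COFF file header: 20 bytes, little-endian ("<"), two u16, three u32, two u16
COFF_HEADER = struct.Struct("<HHIIIHH")


def read_coff_header(data: bytes, offset: int = 0) -> tuple:
    """Unpack the seven header fields starting at `offset` in `data`."""
    return COFF_HEADER.unpack_from(data, offset)


# With a real file this would be:
#   with open("potato_c.cof", "rb") as f:
#       bs = f.read()
# Here a synthetic header is built instead so the example runs on its own.
bs = COFF_HEADER.pack(0x8664, 3, 0, 0x200, 12, 0, 0)
(machine, nsections, timestamp, symtaboffset,
 nsymbols, optheadersize, characteristics) = read_coff_header(bs)
print(hex(machine), nsections, nsymbols)  # 0x8664 3 12
```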
While a functional language will struggle (to be accurate, the programmer will struggle because they've chosen the wrong language).
I don't believe that. I am not familiar enough with Haskell to be able
to give the code, but I have no doubts at all that someone experienced
with Haskell will manage it fine. IO is not hard in the language, and
it has all the built-in modules needed for such interfaces.
Haskell is apparently number 25 on the list of language popularities on Github, with about 0.4% usage. That's not huge, but not insignificant either. But then, it was never intended to be a major practical
language - though some people and companies (Facebook uses it for
content analysis) do use it for practical work. Its main motivations
are for teaching people good software development, developing new
techniques, algorithms and methods, and figuring out what "works" and
could be incorporated in other languages.
It is that last feature that is most noticeable. Most major modern languages are not pure functional languages in themselves, but contain aspects from functional programming.
I can't think of any serious,
popular language with significant development in the last decade that
does not have lambdas and the ability to work with functions as objects.
This is why I bring it up here - not because I think the OP should be
making a functional programming language, but because I think he should
be taking inspiration and incorporating ideas from that world.
Here's a more challenging record type that comes up in OBJ files:
type imagesymbol=struct
union
stringz*8 shortname
struct
u32 short
u32 long
end
u64 longname
end
u32 value
u16 sectionno
u16 symtype
byte storageclass
byte nauxsymbols
end
(Again, this is defined directly in my /dynamic/ scripting language.)
Again, peanuts in Python - and I expect also peanuts in Haskell.
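Here is a sketch of the same record in Python, under two assumptions. First, it uses the standard COFF layout of 18 bytes per symbol, with field widths as in the record above. Second, `strtab` holds the string table starting at its 4-byte size field, which is what long-name offsets count from. The union is handled by unpacking the 8 name bytes a second time as two u32 values. `parse_symbol` and the test data are hypothetical:

```python
import struct

# 8-byte name union, u32 value, u16 sectionno, u16 symtype, u8 storageclass, u8 nauxsymbols
SYMBOL = struct.Struct("<8sIHHBB")
NAME_PARTS = struct.Struct("<II")  # the union's alternative view: u32 short, u32 long


def parse_symbol(data: bytes, offset: int = 0, strtab: bytes = b"") -> dict:
    (raw_name, value, sectionno, symtype,
     storageclass, nauxsymbols) = SYMBOL.unpack_from(data, offset)
    short, long_offset = NAME_PARTS.unpack(raw_name)
    if short == 0:
        # Long name: the second u32 is an offset into the string table
        end = strtab.index(b"\0", long_offset)
        name = strtab[long_offset:end]
    else:
        # Short name: up to 8 bytes, NUL-padded
        name = raw_name.rstrip(b"\0")
    return {"name": name.decode("ascii"), "value": value,
            "sectionno": sectionno, "symtype": symtype,
            "storageclass": storageclass, "nauxsymbols": nauxsymbols}


strtab = struct.pack("<I", 4 + 19) + b"a_long_symbol_name\0"
short_sym = SYMBOL.pack(b"main", 0x10, 1, 0x20, 2, 0)
long_sym = SYMBOL.pack(NAME_PARTS.pack(0, 4), 0, 2, 0, 3, 0)
print(parse_symbol(short_sym)["name"],
      parse_symbol(long_sym, strtab=strtab)["name"])  # main a_long_symbol_name
```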
Nothing of that is /remotely/ worth making a new language and giving up
on everything C - tools, compilers, developer familiarity, libraries,
and all the rest.
I'm not saying that these are not good things (though I might disagree
with you on some of the details). I am saying that it is not worth it.
This is why we still have C, and why it is so popular in practice - it
is not because anyone thinks it is a "perfect" language, it is because
the benefits of the C ecosystem outweigh the small benefits of minor variations of the language.
And I think most of what you like could be achieved by using a subset of
C++ along with a few template libraries. (To be fair, that was
certainly not the case when you started your language.)
Ban side-effects in expressions, and you have :
A[i] := 0
i = i + 1
It is not hard.
And of course, a large proportion of increments are in loops. So now
you have (mixing syntaxes from different languages to avoid prejudice) :
for i in range(10) {
A[i] = 0
}
Or :
for a& in A {
a = 0
}
Or :
A = [0 for a in A]
Or :
A = [0] * 10
Or :
A.set(0)
Or :
A = [0 .. ]
Or :
A = [0 .. ][range(A)]
There are endless choices here, none of which need an increment
operator, or pointers.
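Several of the alternatives listed above are close to real Python. The following small runnable check confirms that the valid-Python forms agree. It also shows that writing `A[i] = 0` followed by `i = i + 1` as separate statements reproduces what C's `A[i++] = 0` does. The list length and fill values are arbitrary test data:

```python
n = 10

# Statement-only form of C's  A[i++] = 0  applied n times
by_statements = [7] * n
i = 0
while i < n:
    by_statements[i] = 0   # use the current index...
    i = i + 1              # ...then increment it as a separate statement

# Loop over an index range, updating in place
by_range = [7] * n
for j in range(n):
    by_range[j] = 0

# Comprehension and repetition each build a new list
by_comprehension = [0 for _ in by_range]
by_repetition = [0] * n

# Slice assignment replaces the contents of the existing list in place
by_slice = [7] * n
by_slice[:] = [0] * len(by_slice)

print(by_statements == by_range == by_comprehension
      == by_repetition == by_slice == [0] * n)  # True
```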
On 15/11/2022 08:07, David Brown wrote:
Most major modern
languages are not pure functional languages in themselves, but contain
aspects from functional programming.
Yeah. It turns out that pretty much every language you've heard of (except C)
has higher-order functions. Too much pressure from academics I reckon.
Such features have some very subtle behaviours which I find incredibly
hard to get my head around.
(See the 'twice plus-three' example in https://en.wikipedia.org/wiki/Higher-order_function. It took me ages to
figure out what was going on there, and what was needed in an
implementation to make it work.)
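For readers who want the gist of that example without following the link, here is a Python sketch of a higher-order `twice` function applied to `plus_three`, with 7 as the input:

```python
from typing import Callable


def twice(f: Callable[[int], int]) -> Callable[[int], int]:
    """Return a new function that applies f two times."""
    def g(x: int) -> int:
        return f(f(x))   # g closes over f
    return g


def plus_three(i: int) -> int:
    return i + 3


g = twice(plus_three)
print(g(7))  # 13
```

The only subtle step is that `g` captures `f` in a closure. This is how `twice` can return a new function built from its argument.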
They make understanding whatever:
E++^
means child's play by comparison. And yet THIS is the feature you want
to ban! I don't get it.
This discussion is about whether to have a shorter way of writing:
<expr> := <the exact same expr> + 1
And whether that is:
<expr> +:= 1
or either of:
++<expr>
<expr>++
(This example uses the non-value-returning variety.)
Yes, AISI the discussion was primarily about what such operations should
mean and how they should be ordered relative to each other. The subtext
was whether they should be included at all. At least, now that I've got a good
way to include them, that choice is still open. That's far better than
just simply banning them and avoiding the challenge.
On 2022-11-15 17:22, James Harris wrote:
Yes, AISI the discussion was primarily about what such operations
should mean and how they should be ordered relative to each other. The
subtext was whether they should be included at all. At least, now that I've
got a good way to include them, that choice is still open. That's far
better than just simply banning them and avoiding the challenge.
That depends on your priorities. E.g. the Rubik's cube. You rotate a
row, four sides change. That's fun. But me, a prosaic programmer, just
pluck the slates one by one and put them back in order... (:-))
On 15/11/2022 12:44, James Harris wrote:
Do you also believe that the Unix
bytes = read(fd, &buf[1], reqd);
should be prohibited since it has the side effect within the
expression of modifying the buffer? If so, what would you replace it
with??
As I said before (a couple of times at least), function calls are
another matter and should be considered separately.
One possibility is to distinguish between "functions" that have no side effects and can therefore be freely mixed, re-arranged, duplicated,
omitted, etc., and "procedures" that have side-effects and must be
called exactly as requested in the code. Such "procedures" would not be allowed in expressions - only as statements or part of assignment
statements.
In many cases where you have modification of parameters or passing by non-const address, a more advanced language could use multiple returns :
bytes, data = read(fd, max_count)
But that might require considerable compiler effort to generate
efficient results in other cases.
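Python's own low-level wrapper already works this way. `os.read(fd, n)` returns a new `bytes` object of at most `n` bytes rather than filling a caller's buffer, so the count is just its length. Here is a small sketch that uses a pipe so it runs without any file:

```python
import os

r, w = os.pipe()
os.write(w, b"hello")
os.close(w)

data = os.read(r, 1024)   # returns at most 1024 bytes; the buffer is the result
nbytes = len(data)        # the C call's "bytes" count is just the length
os.close(r)
print(nbytes, data)       # 5 b'hello'
```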
You think it is /obvious/ what "++E++" means?
If you don't know the rules then it's not obvious.
Yes.
If you know the rules then it's *blindingly* obvious. What's more, the
rules are easy to learn.
No. If you see that written, it is blindingly obvious that the
programmer is ...
On 15/11/2022 14:22, David Brown wrote:
On 15/11/2022 12:44, James Harris wrote:
...
Do you also believe that the Unix
bytes = read(fd, &buf[1], reqd);
should be prohibited since it has the side effect within the
expression of modifying the buffer? If so, what would you replace it
with??
As I said before (a couple of times at least), function calls are
another matter and should be considered separately.
Then you wouldn't be able to prevent a programmer coding
a = b + nudge_up(&c) + d;
and therefore the programmer may query why ++c is not available in the
first place.
BTW, any time one thinks of 'treating X separately' it's good to be
wary. Step-outs tend to make a language hard to learn and awkward to use.
One possibility is to distinguish between "functions" that have no
side effects and can therefore be freely mixed, re-arranged,
duplicated, omitted, etc., and "procedures" that have side-effects and
must be called exactly as requested in the code. Such "procedures"
would not be allowed in expressions - only as statements or part of
assignment statements.
Classifying functions by whether they have side effects or not is not as clear-cut as it may at first appear. Please see the thread I started
today on functional programming.
In many cases where you have modification of parameters or passing by
non-const address, a more advanced language could use multiple returns :
bytes, data = read(fd, max_count)
But that might require considerable compiler effort to generate
efficient results in other cases.
Thanks for the suggestion. I wondered about that and I like it in
principle but couldn't see how one would then sensibly (i.e. efficiently
and in keeping with the rest of the language) go on to write the
data (whose length we would not know in advance, although we would know
its maximum size) to the correct part of the buffer.
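One existing answer to the "correct part of the buffer" question is a slice view. In Python, a `memoryview` slice of a `bytearray` can be passed to a raw file's `readinto()`, which fills that region in place and returns the count, much like `read(fd, &buf[1], reqd)`. Note that this still modifies its argument, so it is a side effect in exactly the sense being debated. The buffer size and `reqd` value below are illustrative:

```python
import os

r, w = os.pipe()
os.write(w, b"abc")
os.close(w)

buf = bytearray(8)
reqd = 5
with os.fdopen(r, "rb", buffering=0) as f:   # unbuffered raw file object
    # Fill buf[1 : 1 + reqd] without copying, like read(fd, &buf[1], reqd)
    nbytes = f.readinto(memoryview(buf)[1:1 + reqd])

print(nbytes, bytes(buf))  # 3 b'\x00abc\x00\x00\x00\x00'
```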
...
You think it is /obvious/ what "++E++" means?
If you don't know the rules then it's not obvious.
Yes.
If you know the rules then it's *blindingly* obvious. What's more,
the rules are easy to learn.
No. If you see that written, it is blindingly obvious that the
programmer is ...
No, it is not. In languages in which 'nudge' operators are supported
many programmers may write
++E
as a subexpression if they want E to be incremented before it is
evaluated. They may also write
E++
if they want E to be incremented after it is evaluated. And if the
algorithm they are implementing calls for E to be incremented
before and after then programmers should be able to code both. This is
not about them being clever. It's about them being able to have the code naturally express the intent of the algorithm and to reflect the
processing that's in the programmer's mind.
All that's required, compared with C, is for the apparent evaluation
order to be defined.
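Python happens to guarantee this kind of rule: the operands of an expression are evaluated left to right. Here is a minimal sketch of what `++E + E++` would then mean. As before, E is modelled as a dict entry with hypothetical helpers `pre_inc` and `post_inc`, since Python has no ++:

```python
def pre_inc(env: dict, name: str) -> int:
    env[name] += 1            # increment first...
    return env[name]          # ...then yield the new value


def post_inc(env: dict, name: str) -> int:
    old = env[name]           # remember the old value...
    env[name] += 1            # ...increment...
    return old                # ...and yield the old value


env = {"E": 5}
# Python evaluates the left operand of + before the right one
result = pre_inc(env, "E") + post_inc(env, "E")
print(result, env["E"])  # 12 7
```

With E starting at 5, ++E yields 6 and E++ then yields 6 as well, leaving E at 7.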
On 14/11/2022 11:47, Bart wrote:
In-place, value-returning increment ops written as ++ and -- are
common in languages.
Yes. And bugs are common in programs. Being common does not
necessarily mean it's a good idea.
(It doesn't necessarily mean it's a bad idea either - I am not implying
that increment and decrement are themselves a major cause of bugs! But mixing side-effects inside expressions /is/ a cause of bugs.)
Is your point that you shouldn't have either of those operators?
Yes! What gave it away - the first three or four times I said as much?
... (Of course I use increment operators, especially in loops,
because that's how C is written. But a new language can do better than that.)
On 15/11/2022 08:07, David Brown wrote:
On 14/11/2022 19:43, Bart wrote:
It requires some effort in Python and the result will be clunky (and
probably require some add-on modules).
import struct # Standard module
bs = open("potato_c.cof").read()
machine, nsections, timestamp, symtaboffset, nsymbols, optheadersize, characteristics = struct.unpack_from("<HHIIIHH")
That's it. Three lines. I would not think of C for this kind of
thing - Python is /much/ better suited. I'd only start looking at C
(or C++) if I need so high speed that the Python code was not fast
enough, even with PyPy.
I said it will be clunky and require add-on modules and it is and does.
(BTW you might be missing an argument in that struct.unpack_from call.)
Using that approach for the nested structs and unions of my other
example is not so straightforward. You basically have to fight for every field.
The result is a tuple of unnamed fields. You really want a proper
record, which is yet another add-on, with a choice of several modules depending on which set of characteristics you need.
The elements of the tuple are also normal Python variables. If you
wanted to modify elements (which needs a mutable tuple anyway), they
will not behave the same way as those packed types, and then you'd have
to write the whole struct back (using .pack), and will need to know its provenance, which here has been lost. With a reference like mine, that
is built-in.
In short, it's a hack. But it's a typical approach used in scripting languages.
'struct' is also not a true Python module; it's a front end for an
internal one called `_struct`, likely implemented in C, and almost
certainly using pointers.
(Whereas, by having pointers as intrinsic features, I know I could
implement such a module in my language, if I needed to. Think of it as a language building feature.)
While a functional language will struggle (to be accurate, the programmer will struggle because they've chosen the wrong language).
I don't believe that. I am not familiar enough with Haskell to be
able to give the code, but I have no doubts at all that someone
experienced with Haskell will manage it fine. IO is not hard in the
language, and it has all the built-in modules needed for such interfaces.
Haskell is apparently number 25 on the list of language popularities
on Github, with about 0.4% usage. That's not huge, but not
insignificant either. But then, it was never intended to be a major
practical language - though some people and companies (Facebook uses
it for content analysis) do use it for practical work. Its main
motivations are for teaching people good software development,
developing new techniques, algorithms and methods, and figuring out
what "works" and could be incorporated in other languages.
I group them all together: ML, OCaml, Haskell, F#, sometimes even Lisp.
Anything that makes a big deal out of closures, continuations, currying, lambdas and higher order functions. I have little use for such things otherwise.
Haskell is great for elegantly defining certain kinds of types and algorithms, not so good for reams of boilerplate code or UI stuff which
is what much of programming is.
It doesn't even have loops (AIUI); one task of the EXE reader is to
display lists of sections, imports, exports, base relocations...
Loops are such a basic requirement, and yet a language designer decides
they don't need them. After all you can emulate any loop using
recursion, no matter that it makes for less readable, less intuitive
(and less efficient) code.
It is that last feature that is most noticeable. Most major modern
languages are not pure functional languages in themselves, but contain
aspects from functional programming.
Yeah. It turns out that pretty much every language you've heard of (except C)
has higher-order functions. Too much pressure from academics I reckon.
Such features have some very subtle behaviours which I find incredibly
hard to get my head around.
(See the 'twice plus-three' example in https://en.wikipedia.org/wiki/Higher-order_function. It took me ages to
figure out what was going on there, and what was needed in an
implementation to make it work.)
I can't think of any serious, popular language with significant
development in the last decade that does not have lambdas and the
ability to work with functions as objects.
Exactly, and that is totally wrong. Too much attention is paid to
academics who seem to know little about designing accessible languages.
In Python, every function is really a variable initialised [effectively]
to some anonymous function. Which means that with 100% of the functions,
you can do this for any defined function F:
F = 42
Or, more subtly, set it to any arbitrary function. That sounds incredibly unsafe.
So Python has immutable tuples, but mutable functions! Every identifier
is a variable that you can rebind to something else.
With my scripting language, you can do that with exactly 0% of the
defined functions. If you want mutable function references, /then/ you
use a variable: G := F.
This is why I bring it up here - not because I think the OP should be
making a functional programming language, but because I think he
should be taking inspiration and incorporating ideas from that world.
Sure, I've taken lots of ideas from functional languages, ones that
still work in an imperative style. For example I use functions like
head() and tail() from Haskell (except mine aren't lazy). I once had list-comps too, but they fell into disuse.
I don't have lambda functions, but I have a thing called deferred code,
which I haven't yet gotten around to. The problem is figuring out
exactly how in-depth the implementation should be, because no matter
what you do, there will be yet another example from the FP world which reveals one more dimension you hadn't realised existed.
Then I start to think, I don't really want the people who might use a language like mine to need to bother their heads about it. Code should
be clear and obvious; relying on the incredibly subtle and obscure
behaviours associated with lambdas, closures et al, isn't.
You don't believe amateurs can add value to what mainstream languages
can do. The trouble is that you don't appreciate the things that add
value, so long as there is some unreadable, unsafe way to hack your way around a task.
Nothing of that is /remotely/ worth making a new language and giving
up on everything C - tools, compilers, developer familiarity,
libraries, and all the rest.
With an army of people behind it, such tools can be created for a new language.
In the other group, I mentioned how C code is still predominantly
32-bit, even on 64-bit hardware. That is a big one just by itself.
But, what do I care? MY language /is/ fully 64-bit, I can do 1<<60
without remembering to do 1ULL<<60, it /has/ a module system, namespaces,
the works, and that gives me a kick when I compare it to C.
I'm not saying that these are not good things (though I might disagree
with you on some of the details). I am saying that it is not worth it.
This is why we still have C, and why it is so popular in practice - it
is not because anyone thinks it is a "perfect" language, it is because
the benefits of the C ecosystem outweigh the small benefits of minor
variations of the language.
This is exactly why there can still be a place for a language like C,
but tidied up and brought up-to-date without all its baggage. People
want a language they feel they know inside-out, and can be made to do
anything; they want to feel in charge and confident.
Except that the people creating alternatives usually try to do too much,
and lose many of the attributes of C that make it attractive.
And I think most of what you like could be achieved by using a subset
of C++ along with a few template libraries. (To be fair, that was
certainly not the case when you started your language.)
Templates have problems. Whatever problem they are a solution to needs
to be done another way if you want a language that is much, much simpler
and faster to build than C++.
Ban side-effects in expressions, and you have :
A[i] := 0
i = i + 1
It is not hard.
It IS hard. Did you miss the bit where I said that expressions and
statements are interchangeable in an expression-based language? So you
HAVE to allow anything within those square brackets.
I use 'unit' to refer to any expression-or-statement. So the syntax for
an array index is A[unit] with a single unit; however, my example used a sequence of three, which is not allowed. But that just means I'd have to write it
like this:
A[(t:=i; i:=i+1; t)] := 0
And of course, a large proportion of increments are in loops. So now
you have (mixing syntaxes from different languages to avoid prejudice) :
for i in range(10) {
A[i] = 0
}
Or :
for a& in A {
a = 0
}
Or :
A = [0 for a in A]
Or :
A = [0] * 10
Or :
A.set(0)
Or :
A = [0 .. ]
Or :
A = [0 .. ][range(A)]
There are endless choices here, none of which need an increment
operator, or pointers.
And in FP, you don't have loops, or assignments.
At this rate there
won't be anything left! Why won't we all just code in lambda calculus as
that can apparently represent any program.
There are still plenty of increments outside of loops (actually I don't
use for-loops much in my programs, mainly in smaller contexts), as well
as inside loops when you're incrementing something that doesn't happen
to be the loop index.
This discussion is about whether to have a shorter way of writing:
<expr> := <the exact same expr> + 1
And whether that is:
<expr> +:= 1
or either of:
++<expr>
<expr>++
(This example uses the non-value-returning variety.)
On 15/11/2022 17:58, James Harris wrote:
On 15/11/2022 14:22, David Brown wrote:
On 15/11/2022 12:44, James Harris wrote:
...
Do you also believe that the Unix
bytes = read(fd, &buf[1], reqd);
should be prohibited since it has the side effect within the
expression of modifying the buffer? If so, what would you replace it
with??
As I said before (a couple of times at least), function calls are
another matter and should be considered separately.
Then you wouldn't be able to prevent a programmer coding
a = b + nudge_up(&c) + d;
Why wouldn't I (as a language designer) be able to prevent that?
You assume /so/ many limitations on what you can do as a language
designer. You can do /anything/. If you want to allow something, allow it. If you want to prohibit it, prohibit it.
BTW, any time one thinks of 'treating X separately' it's good to be
wary. Step-outs tend to make a language hard to learn and awkward to use.
So do over-generalisations.
One possibility is to distinguish between "functions" that have no
side effects and can therefore be freely mixed, re-arranged,
duplicated, omitted, etc., and "procedures" that have side-effects
and must be called exactly as requested in the code. Such
"procedures" would not be allowed in expressions - only as statements
or part of assignment statements.
Classifying functions by whether they have side effects or not is not
as clear-cut as it may at first appear. Please see the thread I
started today on functional programming.
You'll notice I've replied to it :-)
All that's required, compared with C, is for the apparent evaluation
order to be defined.
I can appreciate that you want to give a meaning to "++E",
that you want
to give a meaning to "E++", and you expect programmers to use one or the other in different contexts. I can appreciate that you want to define
order of evaluation within expressions.
But I have yet to see any indication that "++E++" could ever be a
sensible expression in any real code.
On 15/11/2022 16:26, Bart wrote:
I group them all together: ML, OCaml, Haskell, F#, sometimes even Lisp.
Anything that makes a big deal out of closures, continuations,
currying, lambdas and higher order functions. I have little use for
such things otherwise.
So because /you/ don't understand these things or how they are used, you assume that people who /do/ understand them can't write programs in functional programming languages?
Haskell is great for elegantly defining certain kinds of types and
algorithms, not so good for reams of boilerplate code or UI stuff
which is what much of programming is.
As I mentioned in other posts, there are opinions, and there are /qualified/ opinions.
Such features have some very subtle behaviours which I find incredibly
hard to get my head around.
You are a smart guy. You could get your head around it quite easily, if only you were willing.
Try pasting the C++ code into godbolt.org, and compile with -O2 :
The compiler happily turns "foo" into "return 13".
It's not academics that use these features - it is practical
programmers. A big use of lambdas, for example, is in callbacks and
event handlers - used all the time in GUI programs and Javascript.
Academics may invent this kind of thing, and use languages like Haskell
to play with them - but they are implemented in real languages because
real programmers use them for real code. Rust, Go, C++ - these are not academics' languages.
In Python, every function is really a variable initialised
[effectively] to some anonymous function. Which means that with 100%
of the functions, you can do this for any defined function F:
F = 42
Or, more subtly, set it to any arbitrary function. That sounds
incredibly unsafe.
Python /is/ unsafe - it's a very dynamic language, with little
compile-time checking. It is checked at run-time.
But no, Python does not have variables at all. It has /names/, that are references to objects. A function is an object, usually (but not necessarily) given a name with a "def" statement. That name can be
rebound to a different object, just like any other name.
So Python has immutable tuples, but mutable functions! Every
identifier is a variable that you can rebind to something else.
Functions are not mutable in Python. You misunderstand.
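The distinction being drawn here, between rebinding a name and changing a function object, can be seen directly. Here is a short sketch; `f` and `g` are arbitrary names:

```python
def f(x):
    return x + 1


g = f          # a second name bound to the same function object
f = 42         # rebinds the name f; the function object itself is untouched

print(g(1), f)  # 2 42
```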
With my scripting language, you can do that with exactly 0% of the
defined functions. If you want mutable function references, /then/ you
use a variable: G := F.
Function pointers are not function objects.
You don't believe amateurs can add value to what mainstream languages
can do. The trouble is that you don't appreciate the things that add
value, so long as there is some unreadable, unsafe way to hack your
way around a task.
Amateurs can make good things. But the idea that a single amateur can revolutionise a particular field is almost, but not quite, a complete
myth.
This is exactly why there can still be a place for a language like C,
but tidied up and brought up-to-date without all its baggage. People
want a language they feel they know inside-out, and can be made to do
anything; they want to feel in charge and confident.
Where are all these people that want something almost, but not quite,
exactly like C?
But what if you don't care that C++ needs a complex compiler, because
you are not a compiler writer? That applies to 99.999% of programmers.
It IS hard. Did you miss the bit where I said that expressions and
statements are interchangeable in an expression-based language? So you
HAVE to allow anything within those square brackets.
You and James are forever seeing problems - you think you are /forced/
into decisions. Take responsibility - you don't /have/ to allow
anything you don't want to allow.
At this rate there won't be anything left! Why won't we all just code
in lambda calculus as that can apparently represent any program.
Why don't you just make a Turing machine? It is an imperative language,
and it's really quite simple.
On 15/11/2022 17:31, David Brown wrote:
On 15/11/2022 17:58, James Harris wrote:
On 15/11/2022 14:22, David Brown wrote:
On 15/11/2022 12:44, James Harris wrote:
...
Do you also believe that the Unix
bytes = read(fd, &buf[1], reqd);
should be prohibited since it has the side effect within the
expression of modifying the buffer? If so, what would you replace
it with??
As I said before (a couple of times at least), function calls are
another matter and should be considered separately.
Then you wouldn't be able to prevent a programmer coding
a = b + nudge_up(&c) + d;
Why wouldn't I (as a language designer) be able to prevent that?
The question is not whether prevention would be possible but whether you (i.e. DB) would consider it /advisable/. If you prevented it then a lot
of familiar programming patterns and a number of existing APIs would
become unavailable to you so choose wisely...! :-)
...
You assume /so/ many limitations on what you can do as a language
designer. You can do /anything/. If you want to allow something,
allow it. If you want to prohibit it, prohibit it.
Sorry, but it doesn't work like that.
A language cannot be built on
ad-hoc choices such as you have suggested.
In this very thread you've
suggested I prohibit certain operator combinations, and that I ban side effects in expressions but maybe not necessarily those from parameters
in function calls. It's not that simple. If a language designer were to
'pick and mix' like that the resultant language would be a nightmare to
learn and use. There has to be a language 'ethos' - i.e. an overall
approach it takes - and it has to follow consistent principles if it is
going to be a good design rather than a bad one.
...
BTW, any time one thinks of 'treating X separately' it's good to be
wary. Step-outs tend to make a language hard to learn and awkward to
use.
So do over-generalisations.
Not really.
It's ad-hoc rules which become burdensome.
By contrast,
saying any operator can be 'adjacent' to any other as long as the types
are honoured makes learning a language more logical. It may give the programmer freedoms you personally don't like but they make the language easier to learn and use.
Seriously, try designing a language, yourself. You don't have to
implement it. Just try coming up with a cohesive design of something you would like to program in.
One possibility is to distinguish between "functions" that have no
side effects and can therefore be freely mixed, re-arranged,
duplicated, omitted, etc., and "procedures" that have side-effects
and must be called exactly as requested in the code. Such
"procedures" would not be allowed in expressions - only as
statements or part of assignment statements.
Classifying functions by whether they have side effects or not is not
as clear-cut as it may at first appear. Please see the thread I
started today on functional programming.
You'll notice I've replied to it :-)
:-)
...
All that's required, compared with C, is for the apparent evaluation
order to be defined.
I can appreciate that you want to give a meaning to "++E",
No, I don't! You have this all wrong. The reason for considering the inclusion of the operators we have been discussing in this thread is to
allow a more natural style of expression for algorithms that it suits.
You seem to keep thinking the goal is to attribute meaning to symbols.
That's not so.
that you want to give a meaning to "E++", and you expect programmers
to use one or the other in different contexts. I can appreciate that
you want to define order of evaluation within expressions.
I don't /want/ to define the order of evaluation; I /do/ define the (apparent) order of evaluation. That's part of my language's ethos. If
I, in addition, permit ++ etc and dereference then their apparent order
/has/ to be defined, and it now has been.
But I have yet to see any indication that "++E++" could ever be a
sensible expression in any real code.
Bart came up with an example something like
+(+(+(+ x)))
That's not at all sensible. You want that banned, too?
On 15/11/2022 16:26, Bart wrote:
import struct # Standard module
bs = open("potato_c.cof").read()
machine, nsections, timestamp, symtaboffset, nsymbols, optheadersize, characteristics = struct.unpack_from("<HHIIIHH")
That's it. Three lines. I would not think of C for this kind of
thing - Python is /much/ better suited.
I said it will be clunky and require add-on modules and it is and does.
It is not "clunky" by any sane view - certainly not compared to your
code (or code written in C).
- the "struct" module is part of Python.
(BTW you might be missing an argument in that struct.unpack_from call.)
No, I am not. There is an optional third argument, but it is optional.
Using that approach for the nested structs and unions of my other
example is not so straightforward. You basically have to fight for
every field.
You have to define every field in every language, or define the ones you
want along with offsets to skip uninteresting data.
The result is a tuple of unnamed fields. You really want a proper
record, which is yet another add-on, with a choice of several modules
depending on which set of characteristics you need.
You can do that in Python.
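Here is a sketch of what that can look like with only the standard library. `typing.NamedTuple` gives the fields names, `_replace` produces a modified copy, and `Struct.pack_into` writes it back into a mutable buffer. As Bart notes, the caller still has to keep track of where the record came from (here, the offset by hand). The `CoffHeader` class and the test data are assumptions for illustration:

```python
import struct
from typing import NamedTuple

HEADER = struct.Struct("<HHIIIHH")


class CoffHeader(NamedTuple):
    machine: int
    nsections: int
    timestamp: int
    symtaboffset: int
    nsymbols: int
    optheadersize: int
    characteristics: int


buf = bytearray(HEADER.pack(0x14C, 2, 0, 0x100, 5, 0, 0))
offset = 0                                      # provenance is kept by hand

hdr = CoffHeader._make(HEADER.unpack_from(buf, offset))
hdr = hdr._replace(nsymbols=hdr.nsymbols + 1)   # tuples are immutable: make a new one
HEADER.pack_into(buf, offset, *hdr)             # write the record back in place

print(CoffHeader._make(HEADER.unpack_from(buf, offset)).nsymbols)  # 6
```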
In short, you are making up shit in an attempt to make your own language
look better than other languages, because you'd rather say something
silly than admit that any other language could be better in any way for
any task.
'struct' is also not a true Python module; it's a front end for an
internal one called `_struct`, likely implemented in C, and almost
certainly using pointers.
Please re-think what you wrote there. I hope you can realise how
ridiculous you are being.
If someone wants to write code that
involves a lot of squaring, then let them define operators so they can
write "x = squareof y", or "x = y²". They'd be able to write more of a mess, but also be able to write some things very nicely.
On 2022-11-15 10:35, James Harris wrote:
As I've already said, Unicode and HTML are fine for output. Where
programmers work with the semantics of characters, however, they need
characters to be in semantic categories, you know: letters, arithmetic
symbols, digits, different cases, etc. So far I've not come across
anything to support that multilingually. AISI what's needed is a way
to expand character encodings to bit fields such as
<category><base character><variant><diacritics><appearance>
where
category = group (e.g. alphabetic letters, punctuation, etc)
base character = main semantic identification (e.g. an 'a')
variant (e.g. upper or lower case)
diacritics (those applied to this character in this location)
appearance (e.g. a round 'a' or a printer's 'a' or unspecified)
Note that that's purely about semantics; it doesn't include typefaces
or character sizes or bold or italic etc which are all for rendering.
I am not sure what are you trying to say.
The Unicode characterization
is defined in the file:
https://unicode.org/Public/UNIDATA/UnicodeData.txt
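Much of what James lists already exists as per-character properties in that database. A small Python sketch using the standard unicodedata module (built from the same Unicode data) shows the general category, the character name, and the canonical decomposition into a base letter plus a combining diacritic. It is not the bit-field layout proposed above, only the same kind of information exposed as lookups; the sample characters are arbitrary.

```python
import unicodedata

# General category: Ll/Lu = lower/upper-case letter, Nd = decimal digit, Sm = math symbol
for ch in ["a", "A", "7", "+", "\u00e9"]:
    print(repr(ch), unicodedata.category(ch), unicodedata.name(ch))

# The decomposition field links a precomposed letter to base + diacritic
print(unicodedata.decomposition("\u00e9"))            # 0065 0301
print(unicodedata.normalize("NFD", "\u00e9") == "e\u0301")
```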
There is no problem with Unicode string literals whatsoever. You just
place characters as they are. The only escape is "" for ". That is all.
On 15/11/2022 10:06, Dmitry A. Kazakov wrote:
There is no problem with Unicode string literals whatsoever. You just
place characters as they are. The only escape is "" for ". That is all.
Two problems with that, AISI:
1. some of the characters look like others
On 15/11/2022 21:40, David Brown wrote:
If someone wants to write code that involves a lot of squaring, then
let them define operators so they can write "x = squareof y", or "x =
y²". They'd be able to write more of a mess, but also be able to
write some things very nicely.
I have such an operator, called `sqr`. And also briefly allowed the superscript version (as a postfix op), until Unicode came along and
spoilt it all.
One reason I had sqr was because it was in Pascal (iirc). But it
genuinely comes in useful. Sure, I could also use x**2, but ** used to
be only defined for floats, while `sqr` has been used for much longer.
You could also ask why some languages have a dedicated `sqrt` function
when they could just as easily do x**0.5 or pow(x, 0.5).
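One concrete reason a dedicated function can earn its place, shown in Python 3 rather than either poster's language: x**0.5 and pow(x, 0.5) follow the general power rules, so a negative base yields a complex result, whereas math.sqrt rejects negative input with ValueError. The sqr and sqrt_checked helpers are illustrative names, not library functions.

```python
import math

def sqr(x):
    # Dedicated square: a single multiply; int in, int out
    return x * x

def sqrt_checked(x):
    # math.sqrt raises ValueError for negative input instead of going complex
    try:
        return math.sqrt(x)
    except ValueError:
        return None

print(sqr(7), 16 ** 0.5, pow(16, 0.5), math.sqrt(16))
print(type((-4) ** 0.5).__name__)   # complex
print(sqrt_checked(-4))             # None
```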
On 15/11/2022 18:05, David Brown wrote:
On 15/11/2022 16:26, Bart wrote:
I group them all together: ML, OCaml, Haskell, F#, sometimes even Lisp.
Anything that makes a big deal out of closures, continuations,
currying, lambdas and higher order functions. I have little use for
such things otherwise.
So because /you/ don't understand these things or how they are used,
you assume that people who /do/ understand them can't write programs
in functional programming languages?
No. It's nothing I've ever used, and unlikely ever to use. I like my functions plain, and static, just like you prefer your expressions to
use only simple operators with no side-effects.
I still can't comprehend why YOU think this stuff is simple and obvious,
yet you are stumped by an increment of a pointer followed by a dereference.
On a list of must-haves for a programming language, not only would they
not be at the top of my list, they wouldn't even be at the bottom!
Haskell is great for elegantly defining certain kinds of types and
algorithms, not so good for reams of boilerplate code or UI stuff
which is what much of programming is.
As I mentioned in another posts, there are opinions, and there are
/qualified/ opinions.
My opinion comes from 20 years of writing code to /get things done/ in a working environment. Which includes developing the languages and
choosing the features that best made that possible. Never once did
I think that 'currying' was going to dramatically transform how I coded; never did I spend days working around the omissions of closures.
Such features have some very subtle behaviours which I find
incredibly hard to get my head around.
You are a smart guy. You could get your head around it quite easily,
if only you were willing.
No, it is hard, obscure, subtle. Take my word for it.
Try pasting the C++ code into godbolt.org, and compile with -O2 :
The compiler happily turns "foo" into "return 13".
So what does that mean? C++ clearly has support for this, and some optimisations which can collapse the function calls into a constant
value. That tells us nothing about how hard the task is, or how hard it
is to understand exactly why the task is more difficult than it at first
looks.
I discussed on Reddit how such a thing would look in my dynamic language
if I decided to implement it, which is like this:
fun twice(f) = {x:f(f(x))} # {...} is my deferred code syntax
fun plusthree(x) = x+3 # 'fun' is for one-liners
g := twice(plusthree)
println g(7)
I instead tried with a mock-up, which had two components: the
transformations my bytecode compiler would do, and the support code that would need to be supplied by the language. The mock-up within the
working language looked like this:
# Transformed user code
fun af$1(x, f, g) = f(f(x))
fun twice(f) = makecls(af$1, f)
fun plusthree(x) = x+3
g := twice(plusthree)
println callcls(g, 7)
# Emulating interpreter support
record cls = (var fn, a, b)
func makecls(f, ?a, ?b)=
cls(f, a, b)
end
func callcls(c, x)=
fn := c.fn
fn(x, c.a, c.b)
end
This produced the correct result. Enough worked also so that `twice`
could be called again with a different argument, while the original `g`
still worked. (A cruder implementation could hardcode things so that,
while it produced '13', it would only work with a one-time argument to `twice`.)
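For comparison, the same twice/plusthree example in Python, where closures are built in, followed by a rough Python analogue of the makecls/callcls mock-up. The names Cls, makecls, callcls and af1 are adapted from the mock-up (Python identifiers cannot contain $); this is a sketch of the idea, not how any interpreter actually implements closures.

```python
def twice(f):
    # The lambda closes over f, so each call to twice() builds a new closure
    return lambda x: f(f(x))

def plusthree(x):
    return x + 3

g = twice(plusthree)
h = twice(lambda x: x * 2)   # a second closure with a different f
print(g(7), h(7))            # g still uses plusthree after h is created

# Rough analogue of the hand-built closure record in the mock-up:
# a function plus its captured values, invoked through a helper.
class Cls:
    def __init__(self, fn, a=None, b=None):
        self.fn, self.a, self.b = fn, a, b

def makecls(fn, a=None, b=None):
    return Cls(fn, a, b)

def callcls(c, x):
    return c.fn(x, c.a, c.b)

def af1(x, f, g):
    # Lifted body of {x: f(f(x))}; f arrives as a captured value, g is unused
    return f(f(x))

g2 = makecls(af1, plusthree)
print(callcls(g2, 7))
```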
This is where it turned out that there were further refinements needed
to make it work with more challenging examples.
In the end I didn't do the necessary changes as, while intriguing to
work on, mere box-ticking was not a worthwhile use of my time, nor a worthwhile complication in my product, since I was never going to use it.
It's not academics that use these features - it is practical
programmers. A big use of lambdas, for example, is in callbacks and
event handlers - used all the time in GUI programs and Javascript.
Yes, that was my deferred code feature, itself deferred. (It means I
have to instead define an explicit, named function.)
Academics may invent this kind of thing, and use languages like
Haskell to play with them - but they are implemented in real languages
because real programmers use them for real code. Rust, Go, C++ -
these are not academics' languages.
I find idiomatic Rust incomprehensible.
In Python, every function is really a variable initialised
[effectively] to some anonymous function. Which means that with 100%
of the functions, you can do this for any defined function F:
F = 42
Or, more subtly, setting it to any arbitrary functions. That sounds
incredibly unsafe.
Python /is/ unsafe - it's a very dynamic language, with little
compile-time checking. It is checked at run-time.
And my dynamic language is a lot less dynamic, so is safer, but of
course Python is superior.
But no, Python does not have variables at all. It has /names/, that
are references to objects. A function is an object, usually (but not
necessarily) given a name with a "def" statement. That name can be
rebound to a different object, just like any other name.
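A short Python sketch of both points: rebinding a name leaves the function object alone, and the object keeps the __name__ it was created with. G is just a second, illustrative name.

```python
def F():
    return "original"

G = F          # a second name bound to the same function object
F = 42         # rebinding the name F; the function object is untouched

print(G())          # original
print(G.__name__)   # F  (the object keeps the name it was defined with)
print(F)            # 42
```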
So Python has immutable tuples, but mutable functions! Every
identifier is a variable that you can rebind to something else.
Functions are not mutable in Python. You misunderstand.
I've used 'mutable' to mean two things: in-place modification of an
object, and being able to re-bind a name to something else. These are conflated everywhere, but I reckoned people would get the point.
In Python, a function like this:
def F():
    pass
is more or less equivalent to:
def __0001():
    pass
F = __0001
Effectively any function is just a variable to which some anonymous
function has been assigned (although in practice, the function retains its
'F' identity even if the user's 'F' variable has been assigned a
different value).
The end result is the same: you can never be sure that 'F' still refers
to that static function.
99.99% of the time you never want such functions to change, so why make
it possible? I can understand that in Python, a bytecode compiler might
not know in advance what F is, but that can be mitigated.
When I once experimented with such a language, any such tentative
functions were initialised at runtime, but once initialised, could not
be changed. So whether an identifier was the name of a function, module, class or variable, was set at runtime, then fixed.
If you want it to be dynamic, then use a 'variable' (the clue is in the name).
With my scripting language, you can do that with exactly 0% of the
defined functions. If you want mutable function references, /then/
you use a variable: G := F.
Function pointers are not function objects.
Is there any practical difference?
You don't believe amateurs can add value to what mainstream languages
can do. The trouble is that you don't appreciate the things that add
value, so long as there is some unreadable, unsafe way to hack your
way around a task.
Amateurs can make good things. But the idea that a single amateur can
revolutionise a particular field is almost, but not quite, a complete
myth.
I don't want to revolutionise everything. I just hope someone else
would, but the current state of PL design to me looks dire.
However I can take my ideas and use them myself, and sod everyone else;
it's their loss.
You seem convinced that that incredibly hackish and unprofessional way
of accessing the contents of that executable file is just as good as
doing it properly. Well carry on thinking that if you want.
(I don't know what it is about scripting languages, and the way they
eschew a feature as straightforward as a record with fields defined at compile-time. Either it doesn't exist, or they try and emulate such a
thing badly and inefficiently.)
This is exactly why there can still be a place for a language like C,
but tidied up and brought up-to-date without all its baggage. People
want a language they feel they know inside-out, and can be made to do
anything; they want to feel in charge and confident.
Where are all these people that want something almost, but not quite,
exactly like C?
There are loads that want to extend C, or import some favourite feature
of C++ into C, or some that want to write Python-like code in C. This is apart from the ones implementing yet another new take on a
functional language.
What I'm talking about however is the popularity of C; why would they
use C, rather than the next one up which is C++?
To me the answer is clear, I guess to you it's less so.
But what if you don't care that C++ needs a complex compiler, because
you are not a compiler writer? That applies to 99.999% of programmers.
I think people care when their project requires a long edit-run cycle.
It IS hard. Did you miss the bit where I said that expressions and
statements are interchangeable in an expression-based language? So
you HAVE to allow anything within those square brackets.
You and James are forever seeing problems - you think you are /forced/
into decisions. Take responsibility - you don't /have/ to allow
anything you don't want to allow.
It's you who don't like it, not me! As I've tried to explain, in an expression-based language, you can have statements inside expressions
inside statements. Everything can have a side-effect.
Even gnu-C has that feature.
At this rate there won't be anything left! Why won't we all just code
in lambda calculus as that can apparently represent any program.
Why don't you just make a Turing machine? It is an imperative
language, and it's really quite simple.
You've missed my point. It's not me reducing everything down to a handful
of features. Lambda-calculus is where you can easily end up.
On 15/11/2022 20:22, Bart wrote:
Of course you won't use something when you won't even consider trying to learn about it.
I still can't comprehend why YOU think this stuff is simple and
obvious, yet you are stumped by an increment of a pointer followed by
a dereference.
I haven't written anything to suggest that I am "stumped" by this. My
point was to say it is unnecessary to support such expressions in a programming language, and a language may be better in some ways if it
does not allow increment operators or even pointers.
Now, it is undeniable /fact/ that programming languages do not need
operators such as increment, or other operators that cause side-effects.
Yes, but for you, a "must-have" list for a programming language would be mainly "must be roughly like ancient style C in functionality, but with enough change in syntax and appearance so that no one will think it is
C". If that's what you like, and what pays for your daily bread, then that's absolutely fine.
And there's no doubt that a large proportion of programmers go through
their career without ever considering higher order functions (functions
that operate on or return functions).
But equally there's no doubt that they /are/ useful for many people in
many types of coding. Sometimes higher order functions are used without people knowing about them - Python decorators are a fine example.
Actually, Python declarators are such a good example that I recommend
this link <https://realpython.com/primer-on-python-decorators/> that
gives a number of useful examples.
Think of this example. You have some code with functions "foo", "bar"
and "foobar". Mostly you call them as they are in your code.
#include <iostream>
auto debug(auto const& f) {
return [&f](auto... args) {
std::cout << "Calling ";
((std::cout << " " << args), ...);
std::cout << "\n";
auto r = f(args...);
std::cout << "Returning " << r << "\n";
return r;
};
}
Suppose your real functions are :
int foo(int x);
int bar(int x, double y);
double foobar(int x, double y, const char * p);
Your original code was:
int a = foo(10);
int b = bar(20, 3.14);
double c = foobar(30, 2.71828, "Hello");
None of this gives you things you could not do by hand. But if you find yourself doing the same thing by hand many times, then it is natural to
ask if it can be automated - if you can write a function to do that. You
can, if you have higher order functions.
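A rough Python counterpart of the C++ debug wrapper, as a sketch rather than a line-by-line translation: debug returns a new function that logs and forwards its arguments, and the @ decorator syntax applies it at the definition. foo and bar are stand-ins with made-up bodies. Because the wrapper forwards *args and **kwargs, keyword and default arguments keep working through it.

```python
import functools

def debug(f):
    """Wrap f so each call logs its arguments and its return value."""
    @functools.wraps(f)            # keep f's __name__ and docstring on the wrapper
    def wrapper(*args, **kwargs):  # forwards positional, keyword and default arguments
        print("Calling", f.__name__, *args, *(f"{k}={v!r}" for k, v in kwargs.items()))
        r = f(*args, **kwargs)
        print("Returning", r)
        return r
    return wrapper

def foo(x):
    return x * 2

@debug                    # same as: bar = debug(bar)
def bar(x, y=1.5):
    return x + y

a = debug(foo)(10)        # wrap at the call site, like debug(foo)(10) in C++
b = bar(20, y=0.5)        # or wrap once at the definition
```

Wrapping at the definition means existing calls such as bar(20, y=0.5) stay unchanged; wrapping at the call site leaves foo itself undecorated.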
That's like saying you have 20 years of experience as a taxi driver, and never once had to use "flaps" or "ailerons", or even think about the concept. You therefore can't understand why pilots want to use them all
the time. You can give a qualified opinion on driving round roundabouts
and may be an expert on gearing, but you have no basis for a qualified opinion on flying.
So again - mocking and dismissing concepts that you know nothing about,
makes you look foolish. (Your ignorance of the topic is not the issue -
we are all ignorant of almost everything.)
No, it is hard, obscure, subtle. Take my word for it.
No, I will not take your word for it. You know nothing about it.
Yes, that was my deferred code feature, itself deferred. (It means I
have to instead define an explicit, named function.)
That's the impression I got. I don't know how you handle captures of
local variables (if you do so at all).
Effectively any function is just a variable to which some anonymous
function has been assigned (although in practice, the function retains
its 'F' identity even if the user's 'F' variable has been assigned a
different value).
Python does not have variables. It has /identifiers/. Change
"variable" for "identifier" in your description, and "assigned" to
"bound", and you've got it right.
And you seem convinced that the Python code I showed is "hackish" and "unprofessional".
The code works fine - it is clear and simple, shorter than in your
language, and easy to modify and maintain.
If you prefer to think of structures matching C struct definitions
(which are /one/ way to describe a file format, but certainly not the
only way), you can use the "ctypes" Python module and define a structure.
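A sketch of the ctypes approach, assuming the same seven header fields as the unpack_from example. The class name CoffHeader is made up for illustration, and the input bytes are synthesized with struct.pack rather than read from a real file. All fields happen to be naturally aligned, so the default layout is already the 20-byte packed one.

```python
import ctypes
import struct

class CoffHeader(ctypes.LittleEndianStructure):
    # Same seven fields as the "<HHIIIHH" format string, in order;
    # offsets 0, 2, 4, 8, 12, 16, 18 are naturally aligned, so no _pack_ needed
    _fields_ = [
        ("machine", ctypes.c_uint16),
        ("nsections", ctypes.c_uint16),
        ("timestamp", ctypes.c_uint32),
        ("symtaboffset", ctypes.c_uint32),
        ("nsymbols", ctypes.c_uint32),
        ("optheadersize", ctypes.c_uint16),
        ("characteristics", ctypes.c_uint16),
    ]

# Illustrative bytes; a real program would read the file in binary mode ("rb")
data = struct.pack("<HHIIIHH", 0x14C, 2, 0, 0x100, 7, 0, 0)
hdr = CoffHeader.from_buffer_copy(data)
print(hex(hdr.machine), hdr.nsections, hdr.nsymbols, ctypes.sizeof(CoffHeader))
```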
On 2022-11-16 12:44, James Harris wrote:
On 15/11/2022 10:06, Dmitry A. Kazakov wrote:
There is no problem with Unicode string literals whatsoever. You just
place characters as they are. The only escape is "" for ". That is all.
Two problems with that, AISI:
1. some of the characters look like others
For the reader, not for the compiler. If you want Unicode you get the
whole package, homoglyphs included.
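A quick Python check of that point (the characters are just examples): a Latin and a Cyrillic small a render alike but are different code points, and NFKC compatibility normalisation folds some look-alikes, such as fullwidth letters, but not cross-script homoglyphs.

```python
import unicodedata

latin = "a"           # U+0061
cyrillic = "\u0430"   # U+0430, looks the same in most fonts

print(latin == cyrillic)            # False: different code points
print(unicodedata.name(latin))      # LATIN SMALL LETTER A
print(unicodedata.name(cyrillic))   # CYRILLIC SMALL LETTER A

# Fullwidth forms fold to ASCII under NFKC; Cyrillic letters do not
print(unicodedata.normalize("NFKC", "\uff41") == latin)   # True
print(unicodedata.normalize("NFKC", cyrillic) == latin)   # False
```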
On 16/11/2022 00:30, Bart wrote:
On 15/11/2022 21:40, David Brown wrote:
If someone wants to write code that involves a lot of squaring, then
let them define operators so they can write "x = squareof y", or "x =
y²". They'd be able to write more of a mess, but also be able to
write some things very nicely.
I have such an operator, called `sqr`. And also briefly allowed the
superscript version (as a postfix op), until Unicode came along and
spoilt it all.
Why would Unicode spoil it?
On 2022-11-14 19:26, James Harris wrote:
On 14/11/2022 11:29, Dmitry A. Kazakov wrote:
On 2022-11-14 12:03, James Harris wrote:
if is_name_first(b[j])
a[i++] = b[j++]
rep while is_name_follow(b[j])
a[i++] = b[j++]
end rep
a[i] = 0
return TOK_NAME
end if
Now, what don't you like about the ++ operators in that? How would
you prefer to write it?
From parser production code:
procedure Get_Identifier
( Code : in out Source'Class;
Line : String;
Pointer : Integer;
Argument : out Tokens.Argument_Token
) is
Index : Integer := Pointer + 1;
Malformed : Boolean := False;
Underline : Boolean := False;
Symbol : Character;
begin
while Index <= Line'Last loop
Symbol := Line (Index);
if Is_Alphanumeric (Symbol) then
Underline := False;
elsif '_' = Symbol then
Malformed := Malformed or Underline;
Underline := True;
else
exit;
end if;
Index := Index + 1;
end loop;
Malformed := Malformed or Underline;
Set_Pointer (Code, Index);
Argument.Location := Link (Code);
Argument.Value := new Identifier (Index - Pointer);
declare
This : Identifier renames Identifier (Argument.Value.all);
begin
This.Location := Argument.Location;
This.Malformed := Malformed;
This.Value := Line (Pointer..Index - 1);
end;
end Get_Identifier;
Well, that's an astonishingly long piece of code, Dmitry,
Because it is production code.
It must deal with different types of
sources, with error handling and syntax tree generation.
But I am not sure I do understand it. Even allowing for what I believe
is meant to be double underscore detection (except at the start and
end?) it takes significantly more study than the simple name-first,
name-follow code which preceded it.
That's how the language defines it. This example is from an Ada 95
parser. Ada 95 RM 2.3:
https://www.adahome.com/rm95/rm9x-02-03.html
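For comparison with the name_first/name_follow loop, here is a Python transcription of just the scanning part of Get_Identifier, a sketch that leaves out the Source, Link and syntax-tree handling. Like the Ada (Index starts at Pointer + 1), it assumes the caller has already checked that the first character is a letter. It uses str.isalnum, which accepts far more Unicode letters and digits than Ada 95's Is_Alphanumeric. The malformed flag reflects the RM 2.3 rule that an underline must be followed by a letter or digit, so it catches both "__" inside a name and a trailing "_".

```python
def get_identifier(line, pointer):
    """Scan an identifier whose first letter is line[pointer].

    Returns (text, next_index, malformed).
    """
    index = pointer + 1          # first character is already known to be a letter
    malformed = False
    underline = False            # was the previous character an underscore?
    while index < len(line):
        symbol = line[index]
        if symbol.isalnum():
            underline = False
        elif symbol == "_":
            malformed = malformed or underline   # "__" inside the name
            underline = True
        else:
            break
        index += 1
    malformed = malformed or underline           # trailing "_"
    return line[pointer:index], index, malformed

print(get_identifier("Foo_Bar + 1", 0))   # ('Foo_Bar', 7, False)
print(get_identifier("Foo__Bar;", 0))     # ('Foo__Bar', 8, True)
```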
On 16/11/2022 16:02, David Brown wrote:
On 16/11/2022 00:30, Bart wrote:
On 15/11/2022 21:40, David Brown wrote:
If someone wants to write code that involves a lot of squaring, then
let them define operators so they can write "x = squareof y", or "x
= y²". They'd be able to write more of a mess, but also be able to
write some things very nicely.
I have such an operator, called `sqr`. And also briefly allowed the
superscript version (as a postfix op), until Unicode came along and
spoilt it all.
Why would Unicode spoil it?
I had been using 8-bit code pages for Western European alphabets since
probably the end of the 80s. It was simple, I supported it and it
worked well. (At that time, I was also responsible for vector fonts
within my apps.)
But Unicode makes everything harder, with characters taking up multiple bytes, and a lot of the time it just doesn't work. (I've seen Unicode
errors on everything from TV subtitles to supermarket receipts, and that
was a few weeks ago.)
Then you have the choices of UCS2, UCS4, UTF8, with patchy support for
UTF8 within Windows. Even if I get it working on my machine, how do I
know that someone else running my program will have their machine set up properly?
For me it's just not worth it.
If I wanted to display UTF8 right now on Windows, say from a C program
even, I would have to fight it. If I write this (created with Notepad):
#include <stdio.h>
int main(void) {
printf("€°£");
}
CHCP 65001
Active code page: 65001
main.exe
€°£
Hmm, I just compiled it with both bcc and tcc, and they both correctly
show €°£ when using code page 65001. So that's something, but what's up with gcc?
On 16/11/2022 23:01, Bart wrote:
On 16/11/2022 16:02, David Brown wrote:
On 16/11/2022 00:30, Bart wrote:
On 15/11/2022 21:40, David Brown wrote:
If someone wants to write code that involves a lot of squaring,
then let them define operators so they can write "x = squareof y",
or "x = y²". They'd be able to write more of a mess, but also be
able to write some things very nicely.
I have such an operator, called `sqr`. And also briefly allowed the
superscript version (as a postfix op), until Unicode came along and
spoilt it all.
Why would Unicode spoil it?
I had been using 8-bit code pages for Western European alphabets since
probably the end of the 80s. It was simple, I supported it and it
worked well. (At that time, I was also responsible for vector fonts
within my apps.)
Such code pages did work, but were very limited. In the UK, code pages typically meant nothing worse than mixups between # and £. Go beyond
the English speaking world, and code pages were a nightmare. If one non-English Western European language was enough, they were often not
/too/ bad - but supporting multiple languages was often hugely
complicated and fraught with errors.
Unicode made some things more complex, but other things far easier - it
is not a surprise to me that it has supplanted pretty much every usage
where plain old 7-bit ASCII is insufficient. I understand how Unicode
can be difficult, but it is solving a difficult problem.
But back to your superscript square operator - does that mean you used
an extended ASCII code in a specific code page for superscript 2 (I
think it is 0xb2 in Latin-9), but when Unicode came out you stopped
using anything beyond 7-bit ASCII?
But Unicode makes everything harder, with characters taking up
multiple bytes, and a lot of the time it just doesn't work. (I've seen
Unicode errors on everything from TV subtitles to supermarket
receipts, and that was a few weeks ago.)
That's not a Unicode problem - that's a software bug.
Then you have the choices of UCS2, UCS4, UTF8, with patchy support for
UTF8 within Windows. Even if I get it working on my machine, how do I
know that someone else running my program will have their machine set
up properly?
For me it's just not worth it.
In the early days of Unicode, there were different encodings. For the
last couple of decades it's been clear that there is /one/ sensible
encoding - UTF-8.
On 2022-11-17 12:24, Bart wrote:
If I wanted to display UTF8 right now on Windows, say from a C program
even, I would have to fight it. If I write this (created with Notepad):
#include <stdio.h>
int main(void) {
printf("€°£");
}
If you want to display UTF-8, you must obviously use UTF-8, no?
#include <stdio.h>
int main(void) {
printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
}
In CMD:
CHCP 65001
Active code page: 65001
main.exe
€°£
Of course, you could use the code you wrote under the condition that
both the editor and the compiler use UTF-8.
Which is why every programming guideline must require ASCII-7 source
like I provided.
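Producing those escapes by hand is tedious, so a small Python helper can generate them. This is a sketch, and c_escape_utf8 is a hypothetical name. Escaping every byte, rather than only the non-ASCII ones, sidesteps C's rule that a \x escape swallows every hex digit that follows it (so "\xACB" would be read as one escape, not \xAC followed by B). It also shows the point that three characters become seven bytes.

```python
def c_escape_utf8(text):
    """Spell text as a C string-literal body of \\xHH escapes, one per UTF-8 byte."""
    return "".join(f"\\x{b:02X}" for b in text.encode("utf-8"))

s = "\u20ac\u00b0\u00a3"                  # euro, degree, pound: 3 characters
print(len(s), len(s.encode("utf-8")))     # 3 7
print('printf("' + c_escape_utf8(s) + '");')
```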
On 17/11/2022 12:12, Dmitry A. Kazakov wrote:
On 2022-11-17 12:24, Bart wrote:
If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created with
Notepad):
#include <stdio.h>
int main(void) {
printf("€°£");
}
If you want to display UTF-8, you must obviously use UTF-8, no?
#include <stdio.h>
int main(void) {
printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
}
This wasn't the problem. I verified that the text file contained the
correct UTF8 sequences, and the two other compilers worked. This was a problem with gcc, which also fails your version.
In CMD:
CHCP 65001
Active code page: 65001
main.exe
€°£
Of course, you could use the code you wrote under the condition that
both the editor and the compiler use UTF-8.
The point about UTF8 is that it doesn't matter. So the string contains 'character' E2; in C, this is just a byte array, so it should just pass it
as it is to the printf function.
That would work, but is also completely impractical for large amounts of non-ASCII content. Or even small amounts. You /need/ editor support. I
don't have it and don't do enough with Unicode to make it worth the
trouble.
On 2022-11-17 13:35, Bart wrote:
On 17/11/2022 12:12, Dmitry A. Kazakov wrote:
On 2022-11-17 12:24, Bart wrote:
If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created
with Notepad):
#include <stdio.h>
int main(void) {
printf("€°£");
}
If you want to display UTF-8, you must obviously use UTF-8, no?
#include <stdio.h>
int main(void) {
printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
}
This wasn't the problem. I verified that the text file contained the
correct UTF8 sequences, and the two other compilers worked. This was a
problem with gcc, which also fails your version.
The above was compiled with gcc version 10.3.1 20210520.
On 17/11/2022 13:20, Dmitry A. Kazakov wrote:
On 2022-11-17 13:35, Bart wrote:
On 17/11/2022 12:12, Dmitry A. Kazakov wrote:
On 2022-11-17 12:24, Bart wrote:
If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created
with Notepad):
#include <stdio.h>
int main(void) {
printf("€°£");
}
If you want to display UTF-8, you must obviously use UTF-8, no?
#include <stdio.h>
int main(void) {
printf("\xE2\x82\xAC\xC2\xB0\xC2\xA3");
}
This wasn't the problem. I verified that the text file contained the
correct UTF8 sequences, and the two other compilers worked. This was
a problem with gcc, which also fails your version.
The above was compiled with gcc version 10.3.1 20210520.
And run on Windows?
Further tests show that it works in every case, including using gcc with puts, and gcc+printf under WSL. It only fails with gcc + printf + Windows.
Odd. But then my point is you can't rely on it. You still need the UTF8
code page set.
That's not all, because console display is different from graphical
display.
#include <stdio.h>
#include <windows.h>
int main(void) {
MessageBox(0,"\xE2\x82\xAC\xC2\xB0\xC2\xA3",
"\xE2\x82\xAC\xC2\xB0\xC2\xA3",0);
}
This displays only gobbledygook. Of course, this is set to use
MessageBoxA, which expects a string in the current ANSI code page; but why won't it take UTF8
and show something sensible?
Presumably it needs the correct code page set for the WinAPI, not the
one I set for the console. But I can't find a way to do it; MS docs
suggest setting this in a resource file or XML manifest file; WTF?
Any wonder that I'm just not interested? Here I have to say that Linux
seems to get it right.
On 16/11/2022 16:50, David Brown wrote:
On 15/11/2022 20:22, Bart wrote:
Of course you won't use something when you won't even consider trying
to learn about it.
I've thought about learning Chinese. Then I decided there was no point.
I still can't comprehend why YOU think this stuff is simple and
obvious, yet you are stumped by an increment of a pointer followed by
a dereference.
I haven't written anything to suggest that I am "stumped" by this. My
point was to say it is unnecessary to support such expressions in a
programming language, and a language may be better in some ways if it
does not allow increment operators or even pointers.
It is something I value, but you don't.
And higher order functions are something you value, but I don't.
That's all it is.
Now, it is undeniable /fact/ that programming languages do not need
operators such as increment, or other operators that cause side-effects.
Just `a := b` causes a side effect. Possibly quite a big one if 'b' is a substantial data structure and ':=' does a deep copy.
There is usually a task to be done. In `A[++i] := 0`, I want two things
to change, which is going to happen whether I write it like that, or as `i
:= i+1; A[i] := 0`. So why write `i` 3 times?
It is not a big deal. Maybe in functional programming it might be, but
here *I* am specifying the paradigm and I say it's OK.
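Python takes the other side of this argument and has no ++ at all, but since Python 3.8 it does have an assignment expression, :=, which allows a similar fused form. A minimal sketch for comparison only (store_next is a hypothetical name). In a Python assignment statement the right-hand side is evaluated first and then the subscript, so i is bumped before the store.

```python
def store_next(A, i):
    # Equivalent of "i := i+1; A[i] := 0", with i written twice instead of three times
    A[(i := i + 1)] = 0
    return i

A = [5, 5, 5]
i = store_next(A, 0)
print(A, i)   # [5, 0, 5] 1
```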
I'm not asking you or anyone else to use my language.
Yes, but for you, a "must-have" list for a programming language would
be mainly "must be roughly like ancient style C in functionality, but
with enough change in syntax and appearance so that no one will think
it is C". If that's what you like, and what pays for your daily
bread, then that's absolutely fine.
Yes, I don't need a higher level language for what I use it for. But
there are still dozens of things which make the experience superior to
just using C. Ones either you genuinely don't appreciate, or are just
pissing on for the sake of it.
* Case-insensitive
* 1-based and N-based
* Algol-style syntax, line-oriented and largely semicolon-free, and sane
type syntax
* Module scheme (define everything in exactly one place)
* Namespaces
* Encapsulation (define functions inside records etc)
* Out-of-order definitions including easy mutual record cross-references
* Regular i8-i64 and u8-u64 type denotations, including 'byte' (u8)
* Default 64-bit 'int', 'word' types, and 64-bit integer constants
* Built-in print and read statements WITH NO STUPID FORMAT CODES
* Keyword and default function parameters
* Fewer, more intuitive operator precedences
* Does not conflate arrays and pointers
* 'Proper' for loops; for-in loops
* Separate 'switch' and 'case' selection; the latter has no restrictions
(and no stupid fallthrough on switch)
* Proper named constants
* Break out of nested loops
* Embed strings and binary files
* 'Tabledata' and 'enumdata' features (compare with X-macros)
* Function reflection
* Built-in, overloaded ops like abs, min, max
* 'Properties' such as .len and .lwb
* Built-in 'swap'
* Bit/field extraction/insertion syntax
* Multiple function return values
* Multiple assignment
* Slices (including slices of char arrays to give counted strings)
* Doc strings
* Whole-program compiler that does not need a separate build system
* Pass-by-reference
* Value arrays
Yeah, just like C! If you think this lot is just C with a paint-job,
then you're in denial.
Of course, I fully expect you to be completely dismissive of all of
this. I wouldn't swap any of these for higher-order functions.
And there's no doubt that a large proportion of programmers go through
their career without ever considering higher order functions
(functions that operate on or return functions).
Too right. To be able to use such things, they MUST be 100% intuitive
and be usable with 100% confidence. But that's just the author; you need
to consider other readers of your code too, and those who have to
maintain it.
To me they are a very long way from being 100% intuitive. So what do you think I should do: strive to be a 10th-rate programmer in a functional language I've no clue about; give up programming and tend to my garden;
or carry on coding in a style that *I* understand 100% (and most others
will too)?
The stuff I do simply doesn't require a sophisticated language with
advanced types and curried functions invented on-the-fly. Here is an
actual example from an old app, a small function to keep it short:
proc displaypoletotal =
if not poleenabled then return fi
print @poledev, chr(31), chr(14) ! clear display
print @poledev, "Total:", rightstr(strcash(total, paymentunit),
14)
end
(This is part of a POS and displays running totals, on an LED display
mounted on a pole, driven from a serial port. It ran in a duty-free area
and worked with multiple currencies.)
What can higher-order-functions do for me here? Absolutely sod-all.
But equally there's no doubt that they /are/ useful for many people in
many types of coding. Sometimes higher order functions are used
without people knowing about them - Python decorators are a fine example.
Actually, Python declarators are such a good example that I recommend
Decorators?
this link <https://realpython.com/primer-on-python-decorators/> that
gives a number of useful examples.
Decorators are a /very/ good example of a Python feature that I could
never get my head around. 5 minutes later, I'd have to look them up again.
Think of this example. You have some code with functions "foo", "bar"
and "foobar". Mostly you call them as they are in your code.
auto debug(auto const& f) {
return [&f](auto... args) {
std::cout << "Calling ";
((std::cout << " " << args), ...);
std::cout << "\n";
auto r = f(args...);
std::cout << "Returning " << r << "\n";
return r;
};
}
Suppose your real functions are :
int foo(int x);
int bar(int x, double y);
double foobar(int x, double y, const char * p);
Your original code was:
int a = foo(10);
int b = bar(20, 3.14);
double c = foobar(30, 2.71828, "Hello");
None of this gives you things you could not do by hand. But if you
find yourself doing the same thing by hand many times, then it is
natural to ask if it can be automated - if you can write a function to
do that. You can, if you have higher order functions.
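For comparison, here is roughly the same `debug` wrapper written as a Python decorator. It is a minimal sketch, not a transcription of the C++ above; `bar` and `foobar` are made-up stand-ins for the real functions. Because the wrapper forwards `*args` and `**kwargs`, keyword and default arguments keep working, and the `@debug` form leaves every call site unchanged.

```python
import functools

def debug(f):
    """Return a wrapper that logs the arguments and result of each call to f."""
    @functools.wraps(f)  # keep f's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        kw = (f"{k}={v!r}" for k, v in kwargs.items())
        print("Calling", f.__name__, *args, *kw)
        r = f(*args, **kwargs)
        print("Returning", r)
        return r
    return wrapper

def bar(x, y):
    return x + int(y)

# Explicit wrapping, like debug(bar)(20, 3.14) in the C++ version
b = debug(bar)(20, 3.14)

# Decorator syntax: every call to foobar is logged, call sites stay as they are
@debug
def foobar(x, y, p):
    return x + y + len(p)

c = foobar(30, 2.5, "Hello")
```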
I can't follow the C++ debug function at all. But I notice the user code
changes from 'foo()' to 'debug()()'; I thought this could be done while leaving the foo() call unchanged.
But no, my language doesn't deal with parameter lists as a first class
entity at all. (At best it can access them as a list object, but it
doesn't help here.)
The best I can do here is to have a dedicated function for each number
of arguments, and to use dynamic code to allow the same function for any types:
func debug3(f, a,b,c)=
println "Calling",f,"with",a,b,c
f(a,b,c)
end
func foobar(a,b,c)=
println "FooBar",a,b,c
return a+b+c
end
x:=debug3(foobar, 5,6,7) # in place of foobar(5, 6, 7)
println x
This displays:
Calling <procid:"foobar"> with 5 6 7
FooBar 5 6 7
18
However this loses the ability to use any keyword or default arguments
for foobar, since they are only available for direct calls (it's done at compile-time).
So I can see that that C++ debug does some very hairy stuff, to make it
work with static types and for any function, but I just can't understand
it.
However, given the requirement you outlined, I could probably come up
with a custom feature to do just that. Although it might be in the form
of a compiler option which injects the debug code at the start of the relevant functions. Then the user code does not need updating.
See, when you have control of the language and implementation, there are
more and better possibilities.
That's like saying you have 20 years of experience as a taxi driver,
and never once had to use "flaps" or "ailerons", or even think about
the concept. You therefore can't understand why pilots want to use
them all the time. You can give a qualified opinion on driving round
roundabouts and may be an expert on gearing, but you have no basis for
a qualified opinion on flying.
I don't want to fly. (I was once in a small aircraft flying at 7000 ft.
But I've also ridden a bike at 8000 ft, although over a mountain in that case. So who needs to fly?!)
So again - mocking and dismissing concepts that you know nothing
about, makes you look foolish. (Your ignorance of the topic is not
the issue - we are all ignorant of almost everything.)
Have I ever called you ignorant? I don't care about these concepts; they
are not for me. But I appreciate lots of things you don't care for.
Look at this code; it is a silly task, but concentrate on the bit that
does the input:
real a,b,c
print "Three numbers: "
readln a, b, c
println "Their sum is:", a+b+c
The spec is that the three numbers are read /from the same line/, and
can be separated with commas or spaces.
Try to do that `readln` part in Python, and just as simply. Even in C
it's an ordeal.
(My code actually works on either of my languages, static or dynamic.
That's a bonus feature. Imagine a solution in Python or C that works
with both languages.)
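As a rough point of reference, one way to meet that spec in Python is to split the line on any run of commas and/or whitespace. This is a sketch under that reading of the spec; `read_numbers` is a hypothetical helper name, and it is clearly longer than a one-line `readln a, b, c`.

```python
import re

def read_numbers(line, count=3):
    """Parse `count` numbers from one line, separated by commas and/or spaces."""
    fields = [s for s in re.split(r"[,\s]+", line.strip()) if s]
    if len(fields) != count:
        raise ValueError(f"expected {count} numbers, got {len(fields)}")
    return [float(s) for s in fields]

# Interactive use:
#   a, b, c = read_numbers(input("Three numbers: "))
#   print("Their sum is:", a + b + c)
```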
No, it is hard, obscure, subtle. Take my word for it.
No, I will not take your word for it. You know nothing about it.
I implemented it, remember? Even if it was a mock-up to see if a
proposed built-in approach would work.
Yes, that was my deferred code feature, itself deferred. (It means I
have to instead define an explicit, named function.)
That's the impression I got. I don't know how you handle captures of
local variables (if you do so at all).
When I had local functions for a while, they could access static
variables, user types, named constants, macros, enums and other local functions within a containing function. Plus of course anything defined globally. But not parameters and stack-frame variables of the enclosing functions.
Quite a lot could actually be done that way. So it could with my
deferred code objects.
Effectively any function is just a variable to which some anonymous
function has been assigned (although in practice, the function
retains its 'F' identity even if the user's 'F' variable has been
assigned a different value).
Python does not have variables. It has /identifiers/. Change
"variable" for "identifier" in your description, and "assigned" to
"bound", and you've got it right.
Just call them variables that work in a particular way: they are
references to objects, but can never be references to other variables.
When you assign a value, you are copying a reference.
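A small illustration of the binding behaviour being described (names are arbitrary): rebinding `f` does not affect an object already bound to `g`, that object keeps its original `__name__`, and plain assignment shares one object rather than copying it.

```python
def f():
    return "original"

g = f              # g is bound to the same function object as f

def f():           # rebinding the name f to a brand-new function object
    return "new"

h = lambda: "anon" # an anonymous function bound to the name h

a = [1, 2]
b = a              # assignment copies the reference, not the list
b.append(3)        # so the change is visible through a as well
```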
And you seem convinced that the Python code I showed is "hackish" and
"unprofessional".
Defining a struct's layout as "IIHHIII" or whatever? Yeah, that's really professional!
The code works fine - it is clear and simple, shorter than in your
language, and easy to modify and maintain.
Really? The struct changes: two fields are swapped. You have to count
along to find which of those characters need to be exchanged. And
that multiple assignment needs to be revised too. It's a bit hit and miss.
If you prefer to think of structures matching C struct definitions
(which are /one/ way to describe a file format, but certainly not the
only way), you can use the "ctypes" Python module and define a structure.
So why didn't you do that in the first place? I assume that can define pointers too? (Since structs can contain pointers and you might need to access what they point to.)
But I guess that this was about you proving that pointers were
unnecessary...
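For concreteness, a `ctypes` version of the COFF file header might look like the sketch below. It assumes the standard 20-byte COFF header layout (the same layout as the "<HHIIIHH" string); the field names are borrowed from the thread and the sample bytes are invented. `ctypes.POINTER` does exist for pointer fields, although in a file image pointers are really just offsets.

```python
import ctypes

class CoffFileHeader(ctypes.LittleEndianStructure):
    """COFF file header: 20 bytes, no padding with natural alignment."""
    _fields_ = [
        ("machine", ctypes.c_uint16),
        ("nsections", ctypes.c_uint16),
        ("timestamp", ctypes.c_uint32),
        ("symtaboffset", ctypes.c_uint32),
        ("nsymbols", ctypes.c_uint32),
        ("optheadersize", ctypes.c_uint16),
        ("characteristics", ctypes.c_uint16),
    ]

def read_coff_header(path):
    """Read the header from the start of an object file such as potato_c.cof."""
    with open(path, "rb") as fh:
        data = fh.read(ctypes.sizeof(CoffFileHeader))
    return CoffFileHeader.from_buffer_copy(data)

# Invented sample: machine 0x8664, 3 sections, symbol table at 0x100, 5 symbols
sample = bytes.fromhex("6486 0300 00000000 00010000 05000000 0000 0400")
hdr = CoffFileHeader.from_buffer_copy(sample)
```

Swapping two fields then means swapping two lines in `_fields_`, and every use goes through a name.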
On 17/11/2022 10:34, David Brown wrote:
On 16/11/2022 23:01, Bart wrote:
But Unicode makes everything harder, with characters taking up
multiple bytes, and a lot of the time it just doesn't work. (I've
seen Unicode errors on everything from TV subtitles to supermarket
receipts, and that was a few weeks ago.)
That's not a Unicode problem - that's a software bug.
It means even the big boys have issues with it.
On 14/11/2022 15:23, David Brown wrote:
On 14/11/2022 11:47, Bart wrote:
...
In-place, value-returning increment ops written as ++ and -- are
common in languages.
Yes. And bugs are common in programs. Being common does not
necessarily mean it's a good idea.
(It doesn't necessarily mean it's a bad idea either - I am not
implying that increment and decrement are themselves a major cause of
bugs! But mixing side-effects inside expressions /is/ a cause of bugs.)
The side effects of even something awkward such as
*(++p) = *(q++);
are little different from those of the longer version
p = p + 1;
*p = *q;
q = q + 1;
The former is clearer, however. That makes it easier to see the intent.
Just blaming operators you don't like is unsound - especially since, as
you seem to suggest below, you use them in your own code!!!
...
[discussion of ++ and -- operators]
Is your point that you shouldn't have either of those operators?
Yes! What gave it away - the first three or four times I said as much?
...
... (Of course I use the increment operator, especially in loops, because
that's how C is written. But a new language can do better than that.)
If you think ++ and -- shouldn't exist then why not ban them from your
own programming for a while before you try to get them banned from a new language?
And the "big boys" make mistakes as often as the small folks, both in
terms of understanding the problem, knowing about the solutions, and implementing the code.
On 15/11/2022 18:05, David Brown wrote:
On 15/11/2022 16:26, Bart wrote:
import struct # Standard module
bs = open("potato_c.cof", "rb").read()
machine, nsections, timestamp, symtaboffset, nsymbols, optheadersize, characteristics = struct.unpack_from("<HHIIIHH")
That's it. Three lines. I would not think of C for this kind of
thing - Python is /much/ better suited.
I don't believe you.
(BTW you might be missing an argument in that struct.unpack_from call.)
No, I am not. There is an optional third argument, but it is optional.
What about the second argument? I don't understand how the function call knows to get the data from 'bs'.
Using that approach for the nested structs and unions of my other
example is not so straightforward. You basically have to fight for
every field.
You have to define every field in every language, or define the ones
you want along with offsets to skip uninteresting data.
When properly supported, you can define the fields of a struct just as
you would in any static language (see above example), and you can write handling code just as conveniently.
You don't have to manually write strings of anonymous letter codes and
have to remember their ordering everywhere they are used. That is just
crass.
I went out of my way to add such facilities in my scripting language,
because I felt it was important. So you can code just as you would in a static language but with the convenience of informal scripting.
Clearly you don't care for such things and prefer a hack.
The result is a tuple of unnamed fields. You really want a proper
record, which is yet another add-on, with a choice of several modules
depending on which set of characteristics you need.
You can do that in Python.
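One way to get named fields without a third-party module is `collections.namedtuple` from the standard library. The sketch below is an assumption about how one might do it, not the code from the thread; note that the buffer is `unpack_from`'s second, required argument and the offset is the optional third.

```python
import struct
from collections import namedtuple

CoffHeader = namedtuple(
    "CoffHeader",
    "machine nsections timestamp symtaboffset nsymbols optheadersize characteristics",
)
COFF_FORMAT = "<HHIIIHH"   # little-endian: 2+2+4+4+4+2+2 = 20 bytes

def parse_header(bs, offset=0):
    """Unpack a COFF file header from bytes `bs` into a named record."""
    return CoffHeader._make(struct.unpack_from(COFF_FORMAT, bs, offset))

# Invented sample data standing in for open("potato_c.cof", "rb").read()
sample = struct.pack(COFF_FORMAT, 0x14C, 2, 0, 100, 7, 0, 0)
hdr = parse_header(sample)
```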
Yeah, I know, you can do anything in Python, since there is an army of
people who will create the necessary add-on modules to create ugly and cumbersome bolted-on solutions.
I can list dozens of things that my scripting language does better than Python. (Here, such a list exists: https://github.com/sal55/langs/blob/master/QLang/QBasics.md.)
In short, you are making up shit in an attempt to make your own
language look better than other languages, because you'd rather say
something silly than admit that any other language could be better in
any way for any task.
Not at all. Python is better for lots of things, mainly because there
are a million libraries that people have written for it, armies of
volunteers who have written suitable bindings or written all sorts of
shit. And there is huge community and lots of resources to help out.
It is also full of as many advanced, esoteric features as you could
wish for.
But it is short of the more basic and primitive features of the kind I
use and find invaluable.
'struct' is also not a true Python module; it's a front end for an
internal one called `_struct`, likely implemented in C, and almost
certainly using pointers.
Please re-think what you wrote there. I hope you can realise how
ridiculous you are being.
Tell me. Maybe struct.py could be written in pure Python; I don't know.
I'm saying I guarantee mine would have the necessary features to do so.
But this started off being about pointers. Here's another challenge:
this program is for Windows, and displays the first 2 bytes of the
executable image of the interpreter, as loaded in memory:
println peek(0x400000, u16):"m"
fun peek(addr, t=byte) = makeref(addr, t)^
This displays 'MZ' (the signature of PE files on Windows). But of
interest is how Python would implement that peek() function.
On 16/11/2022 00:11, Bart wrote:
I can list dozens of things that my scripting language does better
than Python. (Here, such a list exists:
https://github.com/sal55/langs/blob/master/QLang/QBasics.md.)
I'm not falling for that again. Suffice to say that you can assume any other programmer (not just me) will throw out at least half the list
because they disagree with your opinions. (Which half will, of course,
vary enormously.) And anyone with experience with Python and who is insanely bored could list hundreds or thousands of things that are
better in Python - assuming they could find solid documentation for your language.
This displays 'MZ' (the signature of PE files on Windows). But of
interest is how Python would implement that peek() function.
You think it is a good thing that programs have direct access to the
memory of their interpreter's executable? Really?
import struct
bs = open("/usr/bin/python", "rb").read()
print(struct.unpack_from("H", bs, 0x40000)[0])
That prints the 16-bit value at offset 0x40000 from the start of the
"python" file. Was that what you wanted?
If I want two things to change, why try to squeeze it into /one/
expression or statement? Why not write two statements, each one
doing a single clear and simple task?
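For what it's worth, a raw `peek` can be written in Python with `ctypes`, as a sketch. `peek` and `peek_bytes` are hypothetical names. Reading an address the process doesn't own will crash the interpreter, so the runnable part only reads a buffer it allocated itself; the Windows lines are left as untested comments.

```python
import ctypes

def peek(addr, ctype=ctypes.c_uint8):
    """Read one value of type `ctype` at raw address `addr` (no safety checks)."""
    return ctype.from_address(addr).value

def peek_bytes(addr, n):
    """Read n raw bytes starting at address `addr`."""
    return ctypes.string_at(addr, n)

# On Windows the loaded python.exe image starts at its module handle (untested here):
#   k32 = ctypes.windll.kernel32
#   k32.GetModuleHandleW.restype = ctypes.c_void_p   # avoid truncating a 64-bit handle
#   print(peek_bytes(k32.GetModuleHandleW(None), 2)) # should start with b"MZ"

# Usage on memory we own: a buffer laid out like the start of a PE image
demo = ctypes.create_string_buffer(b"MZ\x90\x00")
first_byte = peek(ctypes.addressof(demo))
first_two = peek_bytes(ctypes.addressof(demo), 2)
```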
On 16/11/2022 21:04, Bart wrote:
That's fine. But you wouldn't enter a discussion with a linguist who
has some experience with Chinese, and try to tell them that Chinese
grammar is beyond human comprehension. You could say that /you/ think
it looks like Chinese writing would be hard to learn - but you could
/not/ say anything about how hard it is for Chinese speakers to learn.
You could not even say that it really would be difficult for you to
learn, because you haven't tried or investigated enough.
I can fully respect your personal preferences - that's not the issue for me. I find it sad and disappointing that someone can have such strong opinions about something they have never really considered or tried to
learn about,
and consequently don't understand, but I guess that is
human nature. It's ancient survival tactics, honed by evolution - when something is new, different and unknown, human instinct is to fear it
and run away, rather than investigate and study it.
The bit that really bugs me is how you (and James) can hold such strong opinions about how /other people/ might like and use these features and languages that support them. Is it so hard to accept that some people
like using higher order functions?
Or that some people write code in
functional programming languages, because they find it a better choice
for their needs?
Is it so hard to accept that other people can write
code for the same task in widely different languages, and /your/ code in /your/ language is not the "perfect" solution or the only "non-clunky"
code?
If I want two things to change, why try to squeeze it into /one/
expression or statement? Why not write two statements, each one doing a single clear and simple task?
(As to writing "i" three times - again, these things are often found in loops, where a good syntax can mean "i" is never written at all.)
Yeah, just like C! If you think this lot is just C with a paint-job,
then you're in denial.
Yes, it is a lot like C.
It has a number of changes, some that I think
are good, some that I think are bad, but basically it is mostly like C.
In particular, it's quite clear to me that when you developed your
language, you had the assembly level implementation heavily in mind when doing so. Why are your numbers 64-bit integers by default?
It is not
because it is a particularly useful size for integers, but because it
fits the cpu's you are targeting. Why do you want integer types that
are powers of [2]?
It's a low-level language.
Of course, I fully expect you to be completely dismissive of all of
this. I wouldn't swap any of these for higher-order functions.
I can't imagine why you would think adding higher-order functions would
mean dropping any of it.
"Intuitive" means you've used it often enough to use the feature without thinking about it. Nothing more. Stop imagining that everything you learned along your programming career is somehow easier than other
methods seen by other people. It's so long since you learned to program that you've forgotten how it goes. When you have a long history of programming in ALGOL, assembly, and perhaps a spot of FORTRAN or BASIC,
the step to C or your own language is minor. That makes it /seem/ intuitive, but it is not - it's just what you are used to.
No. I think you should be happy to accept that you don't know anything about functional programming, and haven't the inclination or motivation
to learn, and leave it at that.
I also think that anyone interested in becoming a better /programmer/ or software developer, rather than just a better /coder/, should learn some functional programming. You'll be a better imperative programmer for it.
But for you, personally, I think your prejudice and biases (or
"intuition") are too fixed. You'll never look at something new with an
open mind, so there is little point.
So what? What do bitfield extraction operators give you here? Or
multiple return values? Sod-all.
There is a proposal to add metaclasses to C++.
On 14/11/2022 18:41, Dmitry A. Kazakov wrote:
On 2022-11-14 19:26, James Harris wrote:
On 14/11/2022 11:29, Dmitry A. Kazakov wrote:
Index : Integer := Pointer + 1;
Malformed : Boolean := False;
Underline : Boolean := False;
Symbol : Character;
begin
while Index <= Line'Last loop
Symbol := Line (Index);
if Is_Alphanumeric (Symbol) then
Underline := False;
elsif '_' = Symbol then
Malformed := Malformed or Underline;
Underline := True;
else
exit;
end if;
Index := Index + 1;
end loop;
Malformed := Malformed or Underline;
errors = 0
last_char = line(pointer)
rep for i = pointer + 1, while i le line_last, ++i
ch = line(i)
if ch eq '_'
if last_char eq '_' so ++errors ;Consecutive underscores
on not is_alphanum(ch)
break rep ;If neither underscore nor alphanum we are done
end if
last_char = ch
end rep
if last_char eq '_' so ++errors ;Trailing underscore
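The same scan as a Python sketch, following James's version: count consecutive underscores and a trailing underscore, and stop at the first character that is neither an underscore nor alphanumeric. `scan_name` is a hypothetical name. Indices are 0-based, and `str.isalnum` accepts any Unicode letter or digit, which is wider than Ada's `Is_Alphanumeric` on `Character`.

```python
def scan_name(line, pointer):
    """Scan an identifier whose first character is line[pointer].

    Returns (end, errors): end is the index just past the name; errors counts
    consecutive underscores plus one for a trailing underscore.
    """
    errors = 0
    last_char = line[pointer]
    i = pointer + 1
    while i < len(line):
        ch = line[i]
        if ch == "_":
            if last_char == "_":
                errors += 1      # consecutive underscores
        elif not ch.isalnum():
            break                # neither underscore nor alphanumeric: done
        last_char = ch
        i += 1
    if last_char == "_":
        errors += 1              # trailing underscore
    return i, errors
```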
On 15/11/2022 20:09, James Harris wrote:
On 15/11/2022 17:31, David Brown wrote:
On 15/11/2022 17:58, James Harris wrote:
The question is not whether prevention would be possible but whether
you (i.e. DB) would consider it /advisable/. If you prevented it then
a lot of familiar programming patterns and a number of existing APIs
would become unavailable to you so choose wisely...! :-)
I am not the language designer here
- and I still don't really grok what
kind of language /you/ want, what you understand from before, what uses
it should have, or what you think is wrong with existing languages. (Or maybe this is all for fun and interest, which is always the best reason
for doing anything.) That makes it hard to give recommendations.
You assume /so/ many limitations on what you can do as a language
designer. You can do /anything/. If you want to allow something,
allow it. If you want to prohibit it, prohibit it.
Sorry, but it doesn't work like that.
Yes, it does.
A language cannot be built on ad-hoc choices such as you have suggested.
I haven't suggested ad-hoc choices. I have tried to make reasoned suggestions. Being different from languages you have used before, or
how you envision your new language, does not make them ad-hoc.
BTW, any time one thinks of 'treating X separately' it's good to be
wary. Step-outs tend to make a language hard to learn and awkward to
use.
So do over-generalisations.
Not really.
Yes, really.
Imagine if you were to stop treating "letters", "digits" and
"punctuation" separately, and say "They are all just characters. Let's treat them the same". Now people can name a function "123", or "2+2".
It's conceivable that you'd work out a grammar and parsing rules that
allow that (Forth, for example, has no problem with functions that are
named by digits. You can redefine "2" to mean "1" if you like). Do you think that would make the language easier to learn and less awkward to use?
It's ad-hoc rules which become burdensome.
Agreed.
Seriously, try designing a language, yourself. You don't have to
implement it. Just try coming up with a cohesive design of something
you would like to program in.
If I had the time... :-)
I fully appreciate that this is not an easy task.
Bart came up with an example something like
+(+(+(+ x)))
That's not at all sensible. You want that banned, too?
Yes :-) Seriously, I appreciate that there will always be compromises - trying to ban everything silly while allowing everything sensible would
mean countless ad-hoc rules, and you are right to reject that. I am advocating drawing a line, just like you - the difference is merely a
matter of where to draw that line. I'd draw the line so that it throws
out the increment and decrement operators entirely. But if you really wanted to keep them, I'd make them postfix only and as statements, not
in expressions - let "x++" mean "x += 1" which means "x = x + 1" which
should, IMHO, be a statement and not allowed inside an expression.
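As it happens, Python draws roughly this line: `x += 1` is a statement, there is no `++` operator, and `++x` parses as two unary pluses rather than an increment (so Bart's `+(+(+(+ x)))` is legal there). Python 3.8 did later add `:=` as an explicit assignment expression. The checks below are a sketch using `compile`; `compiles` is a hypothetical helper.

```python
def compiles(src):
    """Return True if src is syntactically valid Python, False otherwise."""
    try:
        compile(src, "<example>", "exec")
        return True
    except SyntaxError:
        return False

increment_statement = compiles("x = 0\nx += 1")      # augmented assignment is a statement
increment_in_expr = compiles("x = 0\ny = (x += 1)")  # so it cannot sit inside an expression
postfix_plus_plus = compiles("x = 0\nx++")           # there is no ++ operator at all
prefix_plus_plus = compiles("x = 0\n++x")            # parses, but is just two unary pluses
walrus = compiles("x = 0\ny = (x := x + 1)")         # assignment expression, Python 3.8+
```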
On 15/11/2022 18:32, James Harris wrote:
The side effects of even something awkward such as
*(++p) = *(q++);
are little different from those of the longer version
p = p + 1;
*p = *q;
q = q + 1;
The former is clearer, however. That makes it easier to see the intent.
Really? I have no idea what the programmer's intent was. "*p++ =
*q++;" is common enough that the intent is clear there, but from your
first code I can't see /why/ the programmer wanted to /pre/increment
"p". Maybe he/she made a mistake? Maybe he/she doesn't really
understand the difference between pre-increment and post-increment? It's
a common beginner's misunderstanding.
On the other hand, it is quite clear from the separate lines exactly
what order the programmer intended.
What would you say are the differences in side-effects of these two code snippets? (I'm assuming we are talking about C here.)
Just blaming operators you don't like is unsound - especially since,
as you seem to suggest below, you use them in your own code!!!
All I am saying is that it's worth considering the advantages and disadvantages of making a decision about such operators. I'm not
denying that the operators can be useful - I am questioning whether
those uses are enough in comparison to the advantages of /not/ having them.
... I find it sad and disappointing that someone can have such strong opinions about something they have never really considered or tried to
learn about, ...
and consequently don't understand, but I guess that is
human nature. It's ancient survival tactics, honed by evolution - when something is new, different and unknown, human instinct is to fear it
and run away, rather than investigate and study it.
The bit that really bugs me is how you (and James) can hold such strong opinions about how /other people/ might like and use these features and languages that support them.
On 2022-11-15 12:44, James Harris wrote:
Do you also believe that the Unix
bytes = read(fd, &buf[1], reqd);
should be prohibited since it has the side effect within the
expression of modifying the buffer? If so, what would you replace it
with??
That is simple. Ada's standard library has it:
procedure Read
( Stream : in out Root_Stream_Type;
Item : out Stream_Element_Array;
Last : out Stream_Element_Offset
) is abstract;
Item is an array:
type Stream_Element_Array is
array (Stream_Element_Offset range <>) of aliased Stream_Element;
It is also a "virtual" operation in C++ terms to be overridden by new implementation of stream. Last is the index of the last element read.
Notice non-sliding bounds, as you can do this:
Last := Buff'First - 1;
loop
Read (S, Buff (Last + 1..Buff'Last), Last); -- Non-blocking chunk
exit when Last = Buff'Last; -- Done
end loop;
Since bounds do not slide Last stays valid for all array slices.
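A rough Python analogue of that loop uses `readinto` on a `memoryview` slice. The contrast is that Python slices do slide: `view[filled:]` is re-indexed from 0, so the caller has to keep the offset itself, whereas in Ada `Last` stays valid for every slice. `fill` is a hypothetical name and the stream is an in-memory stand-in.

```python
import io

def fill(stream, buf):
    """Fill `buf` from `stream` chunk by chunk; return how many bytes were filled."""
    view = memoryview(buf)
    filled = 0                              # plays the role of Last + 1 (0-based)
    while filled < len(buf):
        n = stream.readinto(view[filled:])  # the slice starts again at index 0
        if not n:                           # 0 means end of stream
            break
        filled += n
    return filled

buf = bytearray(8)
got = fill(io.BytesIO(b"abcdefghij"), buf)
short = bytearray(8)
got_short = fill(io.BytesIO(b"abc"), short)
```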
On 15/11/2022 12:14, Dmitry A. Kazakov wrote:
On 2022-11-15 12:44, James Harris wrote:
Do you also believe that the Unix
bytes = read(fd, &buf[1], reqd);
should be prohibited since it has the side effect within the
expression of modifying the buffer? If so, what would you replace it
with??
That is simple. Ada's standard library has it:
procedure Read
( Stream : in out Root_Stream_Type;
Item : out Stream_Element_Array;
Last : out Stream_Element_Offset
) is abstract;
Item is an array:
type Stream_Element_Array is
array (Stream_Element_Offset range <>) of aliased Stream_Element;
It is also a "virtual" operation in C++ terms to be overridden by new
implementation of stream. Last is the index of the last element read.
Notice non-sliding bounds, as you can do this:
Last := Buff'First - 1;
loop
Read (S, Buff (Last + 1..Buff'Last), Last); -- Non-blocking chunk
exit when Last = Buff'Last; -- Done
end loop;
Since bounds do not slide Last stays valid for all array slices.
That's cool. So the call passes to Read a 'virtual' array, aka a view of
part of an array, and Last is /output/ from the call? Presumably the
array is made into a view (rather than an actual array) by means of the "aliased" keyword. Is that correct?
Since Last is output from Read why do you set it before the loop starts?
If there's no more data does Read throw an exception?
It's interesting that the array is termed an /out/ parameter even though
only part of it might be overwritten!
There is not much difference between out and in out arrays. Basically out
On 15/11/2022 21:40, David Brown wrote:
Imagine if you were to stop treating "letters", "digits" and
"punctuation" separately, and say "They are all just characters.
Let's treat them the same". Now people can name a function "123", or
"2+2". It's conceivable that you'd work out a grammar and parsing
rules that allow that (Forth, for example, has no problem with
functions that are named by digits. You can redefine "2" to mean "1"
if you like). Do you think that would make the language easier to
learn and less awkward to use?
Certainly not. Why do you ask?
If I wanted to display UTF8 right now on Windows, say from a C program
even, I would have to fight it. If I write this (created with Notepad):
#include <stdio.h>
int main(void) {
printf("€°£");
}
and compile with gcc, it shows:
€°£
I'm not sure what code page it's on, but if I switch to 65001 which is supposed to be UTF8, then it shows:
�������
(or equivalent in the terminal font). If I dump the C source, it does
indeed contain the E2 82 AC sequence which is the UTF8 for the 20AC code
for the Euro sign.
I'm sure that on Linux it works perfectly within a terminal window. But
I'm on Windows and I can't be bothered to do battle. Even if /I/ get it
to work, I can't guarantee it for anyone else.
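For reference, the byte sequence can be checked directly. The three characters encode to seven UTF-8 bytes, starting with E2 82 AC for the Euro sign; decoding those same bytes as Windows-1252 gives the familiar mojibake. This only shows the encodings involved, not what any particular console does with them.

```python
text = "€°£"
encoded = text.encode("utf-8")
hex_bytes = encoded.hex(" ")          # 'e2 82 ac c2 b0 c2 a3'
mojibake = encoded.decode("cp1252")   # what a Windows-1252 reader would show
```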
On 17/11/2022 11:24, Bart wrote:
...
If I wanted to display UTF8 right now on Windows, say from a C program
even, I would have to fight it. If I write this (created with Notepad):
#include <stdio.h>
int main(void) {
printf("€°£");
}
and compile with gcc, it shows:
€°£
I'm not sure what code page it's on, but if I switch to 65001 which is
supposed to be UTF8, then it shows:
�������
(or equivalent in the terminal font). If I dump the C source, it does
indeed contain the E2 82 AC sequence which is the UTF8 for the 20AC
code for the Euro sign.
I'm sure that on Linux it works perfectly within a terminal window.
But I'm on Windows and I can't be bothered to do battle. Even if /I/
get it to work, I can't guarantee it for anyone else.
I presume you piped the output into hd or xxd to see exactly what was
being sent to the terminal - and hopefully prove it was the correct UTF-8.
On 19/11/2022 16:05, James Harris wrote:
On 17/11/2022 11:24, Bart wrote:
...
If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created with
Notepad):
#include <stdio.h>
int main(void) {
printf("€°£");
}
and compile with gcc, it shows:
€°£
I'm not sure what code page it's on, but if I switch to 65001 which
is supposed to be UTF8, then it shows:
�������
(or equivalent in the terminal font). If I dump the C source, it does
indeed contain the E2 82 AC sequence which is the UTF8 for the 20AC
code for the Euro sign.
I'm sure that on Linux it works perfectly within a terminal window.
But I'm on Windows and I can't be bothered to do battle. Even if /I/
get it to work, I can't guarantee it for anyone else.
I presume you piped the output into hd or xxd to see exactly what was
being sent to the terminal - and hopefully prove it was the correct
UTF-8.
Well, gcc using puts, or bcc/tcc using puts or printf, work correctly.
For some reason gcc+printf doesn't deal with it properly. I assumed
those characters were raw UTF8 bytes, and redirecting it now to a file,
that is exactly what I get.
gcc+printf is bypassing something it shouldn't.
On 2022-11-17 13:35, Bart wrote:
On 17/11/2022 12:12, Dmitry A. Kazakov wrote:
On 2022-11-17 12:24, Bart wrote:
If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created
with Notepad):
#include <stdio.h>
int main(void) {
printf("€°£");
}
In CMD:
;CHCP 65001
Active code page: 65001
;main.exe
€°£
Of course, you could use the code you wrote under the condition that
both the editor and the compiler use UTF-8.
The point about UTF8 is that it doesn't matter. So the string contains
'character' E2; in C, this is just a byte array, it should just pass
it as it is to the printf function.
That would work, but is also completely impractical for large amounts
of non-ASCII content. Or even small amounts. You /need/ editor
support. I don't have it and don't do enough with Unicode to make it
worth the trouble.
That is another guideline topic: you never ever place localization
stuff in the source code.
On 19/11/2022 16:20, Bart wrote:
On 19/11/2022 16:05, James Harris wrote:
On 17/11/2022 11:24, Bart wrote:
...
If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created
with Notepad):
#include <stdio.h>
int main(void) {
printf("€°£");
}
and compile with gcc, it shows:
€°£
I'm not sure what code page it's on, but if I switch to 65001 which
is supposed to be UTF8, then it shows:
�������
(or equivalent in the terminal font). If I dump the C source, it
does indeed contain the E2 82 AC sequence which is the UTF8 for the
20AC code for the Euro sign.
I'm sure that on Linux it works perfectly within a terminal window.
But I'm on Windows and I can't be bothered to do battle. Even if /I/
get it to work, I can't guarantee it for anyone else.
I presume you piped the output into hd or xxd to see exactly what was
being sent to the terminal - and hopefully prove it was the correct
UTF-8.
Well, gcc using puts, or bcc/tcc using puts or printf, work correctly.
For some reason gcc+printf doesn't deal with it properly. I assumed
those characters were raw UTF8 bytes, and redirecting it now to a
file, that is exactly what I get.
gcc+printf is bypassing something it shouldn't.
Is printf supposed to handle non-ASCII strings? Remember that it is
expected to interpret the first string whereas puts isn't.
Actually, including UTF8 in any simple string sounds dodgy. As an
example, imagine an embedded byte value of 0x80 on a 1s complement
machine. It would likely terminate the string.
IOW I wouldn't expect any of this stuff to work portably.
And as I said before, source should be pure ASCII....!
On 19/11/2022 19:47, James Harris wrote:
On 19/11/2022 16:20, Bart wrote:
For some reason gcc+printf doesn't deal with it properly. I assumed
those characters were raw UTF8 bytes, and redirecting it now to a
file, that is exactly what I get.
gcc+printf is bypassing something it shouldn't.
Is printf supposed to handle non-ASCII strings? Remember that it is
expected to interpret the first string whereas puts isn't.
The only characters printf formats care about are '%', which indicates
the start of a format sequence, and 0, which indicates the end of the
string.
Besides, all the other printf versions I tried worked (and it works on Linux).
Actually, including UTF8 in any simple string sounds dodgy. As an
example, imagine an embedded byte value of 0x80 on a 1s complement
machine. It would likely terminate the string.
What machines use 1s complement these days? You might as well worry
about those using 7-bit characters! Or EBCDIC.
UTF8 was designed to be transparent to anything processing 8-bit
strings. On ones' complement machines it presumably wouldn't work, unless
characters were wider than 8 bits.
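That transparency can be checked mechanically. Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so it can never be mistaken for '%' (0x25) or the terminating NUL, the only bytes a printf format scanner cares about. The sketch below checks a range of code points; `high_bytes_only` is a hypothetical helper, and this says nothing about the particular gcc/printf behaviour seen on Windows.

```python
def high_bytes_only(start=0x80, stop=0x110000):
    """True if every non-ASCII code point in [start, stop) encodes to bytes >= 0x80."""
    for cp in range(start, stop):
        if 0xD800 <= cp <= 0xDFFF:      # surrogates cannot be encoded in UTF-8
            continue
        if min(chr(cp).encode("utf-8")) < 0x80:
            return False
    return True
```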
(Don't tell me you're avoiding the use of UTF8 for that reason. For
anyone still using ones complement, probably ASCII would be too advanced
as they're still using 5-bit telegraph codes!)
IOW I wouldn't expect any of this stuff to work portably.
And as I said before, source should be pure ASCII....!
Sure. But this is mostly about data within programs which can be anything.
Restricting source code means you can't have Unicode content in
comments, or inside string constants.
That means not being able to have "°"; you'd need an escape sequence, or have to convert any pasted Unicode string into such a string.
This is unreasonable considering that providing such support within a compiler requires pretty much zero effort.
On 19/11/2022 19:47, James Harris wrote:
On 19/11/2022 16:20, Bart wrote:
On 19/11/2022 16:05, James Harris wrote:
On 17/11/2022 11:24, Bart wrote:
...
If I wanted to display UTF8 right now on Windows, say from a C
program even, I would have to fight it. If I write this (created
with Notepad):
#include <stdio.h>
int main(void) {
printf("€°£");
}
and compile with gcc, it shows:
€°£
gcc+printf is bypassing something it shouldn't.
Is printf supposed to handle non-ASCII strings? Remember that it is
expected to interpret the first string whereas puts isn't.
The only characters printf formats care about are '%', which indicates
the start of a format sequence, and 0, which indicates the end of the
string.
Besides, all the other printf versions I tried worked (and it works on Linux).
On 19/11/2022 20:20, Bart wrote:
Besides, all the other printf versions I tried worked (and it works on Linux).
As I said in the other reply, if this is contrary to the specification
then this stuff is poisonous. But, curious to see what would happen in my particular environment, I tried your code. The source file had your
string between the quotes as
e2 82 ac c2 b0 c2 a3
What is that? UTF8?
On 15/11/2022 21:40, David Brown wrote:
On 15/11/2022 20:09, James Harris wrote:
On 15/11/2022 17:31, David Brown wrote:
On 15/11/2022 17:58, James Harris wrote:
...
The question is not whether prevention would be possible but whether
you (i.e. DB) would consider it /advisable/. If you prevented it then
a lot of familiar programming patterns and a number of existing APIs
would become unavailable to you so choose wisely...! :-)
I am not the language designer here
Uh, huh.
- and I still don't really grok what kind of language /you/ want, what
you understand from before, what uses it should have, or what you
think is wrong with existing languages. (Or maybe this is all for fun
and interest, which is always the best reason for doing anything.)
That makes it hard to give recommendations.
...
You assume /so/ many limitations on what you can do as a language
designer. You can do /anything/. If you want to allow something,
allow it. If you want to prohibit it, prohibit it.
Sorry, but it doesn't work like that.
Yes, it does.
No, it does not. Your view of language design is far too simplistic.
Note, also, that in a few paragraphs you say that you are not the
language designer whereas I am, but then you go on to try to tell me how
it works and how it doesn't and, previously, that anything can be done.
You'd gain by /trying/ it yourself. Then you might see that it's not as straightforward as you suggest.
A language cannot be built on ad-hoc choices such as you have suggested.
I haven't suggested ad-hoc choices. I have tried to make reasoned
suggestions. Being different from languages you have used before, or
how you envision your new language, does not make them ad-hoc.
Saying you'd like selected combinations of operators to be banned looks
like an ad-hoc approach to me.
...
BTW, any time one thinks of 'treating X separately' it's good to be
wary. Step-outs tend to make a language hard to learn and awkward
to use.
So do over-generalisations.
Not really.
Yes, really.
You simply repeating phrases back to me but in the negative does not
make your assertions correct or mine wrong.
Imagine if you were to stop treating "letters", "digits" and
"punctuation" separately, and say "They are all just characters.
Let's treat them the same". Now people can name a function "123", or
"2+2". It's conceivable that you'd work out a grammar and parsing
rules that allow that (Forth, for example, has no problem with
functions that are named by digits. You can redefine "2" to mean "1"
if you like). Do you think that would make the language easier to
learn and less awkward to use?
Certainly not. Why do you ask?
It's ad-hoc rules which become burdensome.
Agreed.
Phew!
...
Seriously, try designing a language, yourself. You don't have to
implement it. Just try coming up with a cohesive design of something
you would like to program in.
If I had the time... :-)
I fully appreciate that this is not an easy task.
I'm sure you do but your view of the details is superficial. You have
ideas which are interesting in themselves but you don't appear to
appreciate how decisions bear on each other when you have to bring
hundreds of them together.
...
Bart came up with an example something like
+(+(+(+ x)))
That's not at all sensible. You want that banned, too?
Yes :-) Seriously, I appreciate that there will always be compromises
- trying to ban everything silly while allowing everything sensible
would mean countless ad-hoc rules, and you are right to reject that.
I am advocating drawing a line, just like you - the difference is
merely a matter of where to draw that line. I'd draw the line so that
it throws out the increment and decrement operators entirely. But if
you really wanted to keep them, I'd make them postfix only and as
statements, not in expressions - let "x++" mean "x += 1" which means
"x = x + 1" which should, IMHO, be a statement and not allowed inside an
expression.
This does, indeed, in a sense, come down to where the designer decides
to draw the line. Unfortunately there is no simple line.
For example, you spoke about banning side effects in expressions. For
sure, you could do that. But then you thought side effects in function
calls in expressions should possibly be treated differently and be left
in! Making such rules is not as simple as it may appear.
All the
decisions a language designer makes have a tendency to bear on each
other, even if only in the ethos of the final language: whether it's
simple and cohesive or ad hoc and confusing.
Further, remember that the decisions the language designer makes have to
be communicated to the programmer. If a designer says "these side
effects are allowed but these other ones are not" then that just gives
the programmer more to learn and remember.
As I say, you could try designing a language. You are a smart guy. You
could work on a design in your head while walking to the shops, while
waiting for a train, etc. As one of my books on language design says,
"design repeatedly: it will make you a better designer".
Look at C as an example. Not everyone likes the language, and the only people who find nothing to dislike in it are people who haven't used it enough.
But it is undoubtedly a highly successful language.
On 18/11/2022 11:00, David Brown wrote:
On 15/11/2022 18:32, James Harris wrote:
...
The side effects of even something awkward such as
*(++p) = *(q++);
are little different from those of the longer version
p = p + 1;
*p = *q;
q = q + 1;
The former is clearer, however. That makes it easier to see the intent.
Really? I have no idea what the programmer's intent was. "*p++ =
*q++;" is common enough that the intent is clear there, but from your
first code I can't see /why/ the programmer wanted to /pre/increment
"p". Maybe he/she made a mistake? Maybe he/she doesn't really
understand the difference between pre-increment and post-increment?
It's a common beginner's misunderstanding.
I don't think I know of any language which allows a programmer to say
/why/ something is the case; that's what comments are for. Programs
normally talk about /what/ to do, not why. The very fact that the
assignment does something non-idiomatic is a sign that a comment could
be useful. It's akin to
for (i = 0; i <= n ....
If the test really should be <= then a comment may be useful to explain
why.
On the other hand, it is quite clear from the separate lines exactly
what order the programmer intended.
What would you say are the differences in side-effects of these two
code snippets? (I'm assuming we are talking about C here.)
That depends on whether the operations are ordered or not. In C they'd
be different, potentially, from what they would be in my language. What
would you say they are?
On 18/11/2022 21:14, James Harris wrote:
On 18/11/2022 11:00, David Brown wrote:
On 15/11/2022 18:32, James Harris wrote:
...
The side effects of even something awkward such as
*(++p) = *(q++);
are little different from those of the longer version
p = p + 1;
*p = *q;
q = q + 1;
The former is clearer, however. That makes it easier to see the
intent.
Really? I have no idea what the programmer's intent was. "*p++ =
*q++;" is common enough that the intent is clear there, but from your
first code I can't see /why/ the programmer wanted to /pre/increment
"p". Maybe he/she made a mistake? Maybe he/she doesn't really
understand the difference between pre-increment and post-increment?
It's a common beginner's misunderstanding.
I don't think I know of any language which allows a programmer to say
/why/ something is the case; that's what comments are for. Programs
normally talk about /what/ to do, not why. The very fact that the
assignment does something non-idiomatic is a sign that a comment could
be useful. It's akin to
for (i = 0; i <= n ....
If the test really should be <= then a comment may be useful to
explain why.
Ideally there should be no need for a comment, because the code makes it clear - for example via the names of the identifiers, or from the rest
of the context. That rarely happens in out-of-context snippets.
On the other hand, it is quite clear from the separate lines exactly
what order the programmer intended.
What would you say are the differences in side-effects of these two
code snippets? (I'm assuming we are talking about C here.)
That depends on whether the operations are ordered or not. In C they'd
be different, potentially, from what they would be in my language.
What would you say they are?
You said the side-effects are "a little different", so I wanted to hear
what you meant.
In C, there is no pre-determined sequencing between the two increments -
they can occur in any order, or can be interleaved. As far as the C abstract machine is concerned (and that's what determines what
side-effects mean), unsequenced events are not ordered and it doesn't
make sense to say which happened first. You can consider them as
happening at the same time - and if that affects the outcome of the
program, then it is at least unspecified behaviour if not undefined behaviour. (It would be undefined behaviour if "p" and "q" referred to
the same object, for example.)
So I don't think it really makes sense to say that the order is
different. If the original "*(++p) = *(q++);" makes sense at all, and
is defined behaviour, then its behaviour is not distinguishable from
within the C language from the expanded version.
On 21/11/2022 15:01, David Brown wrote:
On 18/11/2022 21:14, James Harris wrote:
Either way, non-idiomatic code is a flag. And in that it's useful - especially if it's easy to read.
On the other hand, it is quite clear from the separate lines exactly
what order the programmer intended.
What would you say are the differences in side-effects of these two
code snippets? (I'm assuming we are talking about C here.)
That depends on whether the operations are ordered or not. In C
they'd be different, potentially, from what they would be in my
language. What would you say they are?
You said the side-effects are "a little different", so I wanted to
hear what you meant.
I said they were "little different", not "a little different".
In other
words, focus on the main point rather than minutiae such as what could
happen if the pointers were identical or overlapped, much as you go on
to mention:
In C, there is no pre-determined sequencing between the two increments
- they can occur in any order, or can be interleaved. As far as the C
abstract machine is concerned (and that's what determines what
side-effects mean), unsequenced events are not ordered and it doesn't
make sense to say which happened first. You can consider them as
happening at the same time - and if that affects the outcome of the
program, then it is at least unspecified behaviour if not undefined
behaviour. (It would be undefined behaviour if "p" and "q" referred
to the same object, for example.)
So I don't think it really makes sense to say that the order is
different. If the original "*(++p) = *(q++);" makes sense at all, and
is defined behaviour, then its behaviour is not distinguishable from
within the C language from the expanded version.
On 18/11/2022 20:01, James Harris wrote:
On 15/11/2022 21:40, David Brown wrote:
On 15/11/2022 20:09, James Harris wrote:
On 15/11/2022 17:31, David Brown wrote:
You assume /so/ many limitations on what you can do as a language
designer. You can do /anything/. If you want to allow something, allow it. If you want to prohibit it, prohibit it.
Sorry, but it doesn't work like that.
Yes, it does.
No, it does not. Your view of language design is far too simplistic.
Note, also, that in a few paragraphs you say that you are not the
language designer whereas I am, but then you go on to try to tell me
how it works and how it doesn't and, previously, that anything can be
done. You'd gain by /trying/ it yourself. Then you might see that it's
not as straightforward as you suggest.
That is a fair point. But I challenge you to show me where there are
rules written for language designs. Explain to me exactly why you are
not allowed to, say, provide an operator "-" without a corresponding
operator "+". Tell me who is banning you from deciding that source code lines must be limited to 40 characters, or that every assignment
statement shall be preceded by the keyword "please". I'm not saying any
of these things are a good idea (though something similar has been done
in other cases), I am saying it is /your/ choice to do that or not.
You can say "I can't have feature A and feature B and maintain the consistency I want." You /cannot/ say "I can't have feature A". It is /your/ decision not to have feature A. Choosing to have it may mean
changing or removing feature B, or losing some consistency that you had
hoped to maintain. But it is your language, your choices, your responsibility - saying "I can't do that" is abdicating that
responsibility.
A language cannot be built on ad-hoc choices such as you have
suggested.
It most certainly can. Every language is a collection of design
decisions, and most of them are at least somewhat ad-hoc.
However, my suggestions were certainly /not/ ad-hoc
- it was for a
particular way of thinking about operators and expressions, with justification and an explanation of the benefits. Whether you choose to follow those suggestions or not, is a matter of your personal choices
for how you want your language to work - and /that/ choice is therefore somewhat ad-hoc. They only appear ad-hoc if you don't understand what I wrote justifying them or giving their advantages.
Of course you want a language to follow a certain theme or style (or
"ethos", as you called it). But that does not mean you can't make
ad-hoc decisions if you want - it is inevitable that you will do so. And
it certainly does not mean you can't make the choices you want for your language.
Too many ad-hoc choices mean you lose the logic and consistency in the language. Too few, and your language has nothing to it. Excessive consistency is great for some theoretical work - Turing machines,
lambda calculus, infinite register machines, and the like. It is
useless in a real language.
Look at C as an example. Not everyone likes the language, and the only people who find nothing to dislike in it are people who haven't used it enough. But it is undoubtedly a highly successful language. All binary operators require the evaluation of both operands before evaluating the operator. (And before you start thinking that is unavoidable, it is
not, and does not apply to all languages.) Except && and ||, where the second operand is not evaluated if it is not needed - that's an ad-hoc decision, different from the general rule. All access to objects must
be through lvalues of compatible types - except for the ad-hoc rule that character type pointers can also be used.
To be successful at anything - program language design or anything else
- you always need to aim for a balance. Consistency is vital - too much consistency is bad. Generalisation is good - over-generalisation is
bad. Too much ad-hoc is bad, so is too little.
I haven't suggested ad-hoc choices. I have tried to make reasoned
suggestions. Being different from languages you have used before, or
how you envision your new language, does not make them ad-hoc.
Saying you'd like selected combinations of operators to be banned
looks like an ad-hoc approach to me.
Then you misunderstand what I wrote. I don't know if that was my fault
in poor explanations, or your fault in misreading or misunderstanding -
no doubt, it was a combination.
Imagine if you were to stop treating "letters", "digits" and
"punctuation" separately, and say "They are all just characters.
Let's treat them the same". Now people can name a function "123", or
"2+2". It's conceivable that you'd work out a grammar and parsing
rules that allow that (Forth, for example, has no problem with
functions that are named by digits. You can redefine "2" to mean "1"
if you like). Do you think that would make the language easier to
learn and less awkward to use?
Certainly not. Why do you ask?
I ask, because it is an example of over-generalisation that makes a
language harder to learn and potentially a lot more confusing to
understand.
Further, remember that the decisions the language designer makes have
to be communicated to the programmer. If a designer says "these side
effects are allowed but these other ones are not" then that just gives
the programmer more to learn and remember.
Sure. But programmers are not stupid (or at least, you are not catering
for stupid programmers). They can learn more than one rule.
As I say, you could try designing a language. You are a smart guy. You
could work on a design in your head while walking to the shops, while
waiting for a train, etc. As one of my books on language design says,
"design repeatedly: it will make you a better designer".
Oh, I have plenty of ideas for a language - I have no end to the number
of languages, OS's, processors, and whatever that I have "designed" in
my head :-) The devil's in the details, however, and I haven't taken
the time for that!
On 23/11/2022 17:59, James Harris wrote:
On 21/11/2022 15:01, David Brown wrote:
On 18/11/2022 21:14, James Harris wrote:
What would you say are the differences in side-effects of these two
code snippets? (I'm assuming we are talking about C here.)
That depends on whether the operations are ordered or not. In C
they'd be different, potentially, from what they would be in my
language. What would you say they are?
You said the side-effects are "a little different", so I wanted to
hear what you meant.
I said they were "little different", not "a little different".
Ah, my mistake. Still, it implies you think there is /some/ difference.
In other words, focus on the main point rather than minutiae such as
what could happen if the pointers were identical or overlapped, much
as you go on to mention:
OK, so you don't think there is any differences in side-effect other
than the possible issue I mentioned of undefined behaviour in very
particular circumstances. That's fine - I just wanted to know if you
were thinking of something else.
(Note that the freedom for compilers to re-arrange code from the
"compact" form to the "expanded" form is one of the reasons why such unsequenced accesses to the same object are undefined behaviour in C.)
On 20/11/2022 12:28, David Brown wrote:
On 18/11/2022 20:01, James Harris wrote:
On 15/11/2022 21:40, David Brown wrote:
Well, your comments have let me know what you mean, at least, but when I
say "it doesn't work like that" I mean that language design is not as
simple as you suggest. In absolute terms I agree with you: you are right
that a designer can make any decisions he wants. But in reality certain things are /infeasible/. You might as well say you could get from your
house to the nearest supermarket by flying to another country first. In absolute terms you probably could do that and eventually get where you
want to go but in reality it's so absurd a suggestion that it's infeasible.
A language cannot be built on ad-hoc choices such as you have
suggested.
It most certainly can. Every language is a collection of design
decisions, and most of them are at least somewhat ad-hoc.
However, my suggestions were certainly /not/ ad-hoc
Hmm, you suggested banning side effects, except in function calls, and banning successive prefix "+" operators. Those suggestions seem rather
ad hoc to me.
- it was for a particular way of thinking about operators and
expressions, with justification and an explanation of the benefits.
Whether you choose to follow those suggestions or not, is a matter of
your personal choices for how you want your language to work - and
/that/ choice is therefore somewhat ad-hoc. They only appear ad-hoc
if you don't understand what I wrote justifying them or giving their
advantages.
True, if there is a legitimate and useful reason for a rule then that
rule will seem less ad hoc than if the reasons for it are unknown.
Of course you want a language to follow a certain theme or style (or
"ethos", as you called it). But that does not mean you can't make
ad-hoc decisions if you want - it is inevitable that you will do so.
And it certainly does not mean you can't make the choices you want for
your language.
Too many ad-hoc choices mean you lose the logic and consistency in
the language. Too few, and your language has nothing to it.
Excessive consistency is great for some theoretical work - Turing
machines, lambda calculus, infinite register machines, and the like.
It is useless in a real language.
Look at C as an example. Not everyone likes the language, and the
only people who find nothing to dislike in it are people who haven't
used it enough. But it is undoubtedly a highly successful language.
All binary operators require the evaluation of both operands before
evaluating the operator. (And before you start thinking that is
unavoidable, it is not, and does not apply to all languages.) Except
&& and ||, where the second operand is not evaluated if it is not
needed - that's an ad-hoc decision, different from the general rule.
All access to objects must be through lvalues of compatible types -
except for the ad-hoc rule that character type pointers can also be used.
To be successful at anything - program language design or anything
else - you always need to aim for a balance. Consistency is vital -
too much consistency is bad. Generalisation is good -
over-generalisation is bad. Too much ad-hoc is bad, so is too little.
Fair enough. Short-circuit evaluation is a good example of what you have
been saying, although it effects a semantic change. By contrast, banning prefix "+" operators because you don't like them does not effect any
useful change in the semantics of a program.
I haven't suggested ad-hoc choices. I have tried to make reasoned
suggestions. Being different from languages you have used before,
or how you envision your new language, does not make them ad-hoc.
Saying you'd like selected combinations of operators to be banned
looks like an ad-hoc approach to me.
Then you misunderstand what I wrote. I don't know if that was my
fault in poor explanations, or your fault in misreading or
misunderstanding - no doubt, it was a combination.
Maybe. I thought you wanted ++E++ banned because it had successive ++ operators but perhaps I misunderstood. Was what you actually wanted
banned /any/ use of ++ operators? If the language /is/ to have ++
operators after all, though, would you still want ++E++ banned?
Imagine if you were to stop treating "letters", "digits" and
"punctuation" separately, and say "They are all just characters.
Let's treat them the same". Now people can name a function "123",
or "2+2". It's conceivable that you'd work out a grammar and parsing
rules that allow that (Forth, for example, has no problem with
functions that are named by digits. You can redefine "2" to mean
"1" if you like). Do you think that would make the language easier
to learn and less awkward to use?
Certainly not. Why do you ask?
I ask, because it is an example of over-generalisation that makes a
language harder to learn and potentially a lot more confusing to
understand.
I don't see any lack of generalisation in setting out rules for
identifier names.
[Snipped a bunch of points on which we agree.]
Further, remember that the decisions the language designer makes have
to be communicated to the programmer. If a designer says "these side
effects are allowed but these other ones are not" then that just
gives the programmer more to learn and remember.
Sure. But programmers are not stupid (or at least, you are not
catering for stupid programmers). They can learn more than one rule.
You are rather changing your tune, there. Earlier you were concerned
about programmers failing to understand the difference between
pre-increment and post-increment!
As I say, you could try designing a language. You are a smart guy.
You could work on a design in your head while walking to the shops,
while waiting for a train, etc. As one of my books on language design
says, "design repeatedly: it will make you a better designer".
Oh, I have plenty of ideas for a language - I have no end to the
number of languages, OS's, processors, and whatever that I have
"designed" in my head :-) The devil's in the details, however, and I
haven't taken the time for that!
Yes, the devil is indeed in the details. It's one thing to have some
good ideas. It's quite another to bring them together into a single
product.