How many people know about this? It was introduced in C99.
If you
“#include <iso646.h>”, then you can use alternative symbols like “not”
instead of “!”, “and” instead of “&&” and “or” instead of “||”.
C++ already had this, without the need to include such a file.
On 1/21/24 20:51, Lawrence D'Oliveiro wrote:
C++ already had this, without the need to include such a file.
That's because backwards compatibility with older versions of C is a
lower priority for C++ than it is for C.
On 22/01/2024 02:51, Lawrence D'Oliveiro wrote:
How many people know about this? It was introduced in c99. If you
... I don't use the matching names in C++ either ...
Although it was available before 1999 - the SVR4 C compilation System
(CCS) had an iso6[4]6.h header file in the early 90's.
On Mon, 22 Jan 2024 09:30:21 +0100, David Brown wrote:
... I don't use the matching names in C++ either ...
I do, if/when I do use C++ and C. Don’t you think it improves readability:
As for "and" being more readable than "&&", that's not necessarily the
case for people who are accustomed to reading C code.
On Mon, 22 Jan 2024 14:56:53 -0800, Keith Thompson wrote:
As far as I can tell, the macros defined in <iso646.h> have never caught
on significantly.
The nice thing is, I don’t have to care. They have to be part of any standards-compliant C compiler, therefore I am free to use them. And I do.
On Mon, 22 Jan 2024 23:08:53 -0000 (UTC), Blue-Maned_Hawk wrote:
Lawrence D'Oliveiro wrote:
Don’t you think it improves readability:
No.
Lessig’s Law: The one who writes the code makes the rules.
On Mon, 22 Jan 2024 09:30:21 +0100, David Brown wrote:
... I don't use the matching names in C++ either ...
I do, if/when I do use C++ and C. Don’t you think it improves readability:
if (ThisCh < '0' or ThisCh > '9')
{
if (AllowSign and Index == 0 and (ThisCh == '+' or ThisCh == '-'))
{
/* fine */
}
else if (AllowDecimal and not DecimalSeen and ThisCh == '.')
{
DecimalSeen = true; /* only allow one decimal point */
}
else
{
Valid = false;
break;
} /*if*/
} /*if*/
if
(
ThisCh >= 'a' and ThisCh <= 'z'
or
ThisCh >= 'A' and ThisCh <= 'Z'
or
ThisCh >= '0' and ThisCh <= '9'
or
ThisCh == '_'
or
ThisCh == '-'
or
ThisCh == '.'
or
ThisCh == '/'
)
{
Result.append(1, ThisCh);
}
On 22/01/2024 20:34, Lawrence D'Oliveiro wrote:
On Mon, 22 Jan 2024 09:30:21 +0100, David Brown wrote:It breaks the rule that, in C, variables and functions are alphnumeric, whilst operators are symbols. sizeof is an exception, but a justified
... I don't use the matching names in C++ either ...
I do, if/when I do use C++ and C. Don’t you think it improves readability: >>
one. However it's harder to justify a symbol for "plus" but a word for "or".
On 23/01/2024 16:32, Malcolm McLean wrote:
Every explanation for && and || for every language that copied them from
C, is that && means AND, and || means OR.
bart <bc@freeuk.com> writes:
On 23/01/2024 16:32, Malcolm McLean wrote:
Every explanation for && and || for every language that copied them from
C, is that && means AND, and || means OR.
in C, && specifically means 'conditional and'. The programmer can
rely on the fact that the second term will not be evaluated if
the first term evaluates to false.
On 1/23/24 13:52, Scott Lurndal wrote:
in C, && specifically means 'conditional and'. The programmer can rely on the fact that the second term will not be evaluated if the first term evaluates to false.
Actually, C uses the term "Logical AND". I don't have any idea what "conditional and" is supposed to mean, except that the explanation you provide matches the term "Logical AND".
On 22/01/2024 21:34, Lawrence D'Oliveiro wrote:
On Mon, 22 Jan 2024 09:30:21 +0100, David Brown wrote:
... I don't use the matching names in C++ either ...
I do, if/when I do use C++ and C. Don’t you think it improves readability:
No. But I fully appreciate that this is personal preference and habit.
On Tue, 23 Jan 2024 16:32:09 +0000, Malcolm McLean wrote:
It breaks the rule that, in C, variables and functions are alphanumeric, whilst operators are symbols. sizeof is an exception, but a justified one. However it's harder to justify a symbol for "plus" but a word for "or".
Less importantly, it also violates the convention that C macros are named in upper case to distinguish them from keywords and "regular" identifiers.
James Kuyper <jameskuyper@alumni.caltech.edu> writes:
Actually, C uses the term "Logical AND".
The term 'conditional and' has been in common use for decades.
Less importantly, it also violates the convention that C macros are
named in upper case to distinguish them from keywords and "regular" identifiers.
There are also problems with these names. For example, '&&' does not have the semantics of 'and' but of 'and_then', and 'or' is actually 'or_else'.
On Tue, 23 Jan 2024 20:32:39 -0000 (UTC), Kalevi Kolttonen wrote:
If I write this in bash:
rm foo.txt && rm bar.txt
Then the second is only executed if the first one returns zero.
What does C do in this case?
On 2024-01-23, Scott Lurndal <scott@slp53.sl.home> wrote:
The term 'conditional and' has been in common use for decades.
Also, a bitwise and is logical!
ANSI Common Lisp uses symbols like logand, logior, logxor, ... for bitwise operations.
When you implement this stuff with electronic gates it is digital logic circuits. You can read live values in it with a logic probe.
On 2024-01-23, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Tue, 23 Jan 2024 20:32:39 -0000 (UTC), Kalevi Kolttonen wrote:
If I write this in bash:
rm foo.txt && rm bar.txt
Then the second is only executed if the first one returns zero.
What does C do in this case?
C likewise doesn't evaluate the right operand of && if the left one is false (zero).
On Tue, 23 Jan 2024 12:13:27 -0800, Keith Thompson wrote:
There's no hard rule that operators must be
punctuation, just a general trend.
And iso646.h demonstrates that that trend is at an end.
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
On Tue, 23 Jan 2024 17:21:40 -0000 (UTC), Lew Pitcher wrote:
Less importantly, it also violates the convention that C macros are
named in upper case to distinguish them from keywords and "regular"
identifiers.
Why does C allow lowercase in macro names, then?
Because it's a convention, not a language rule.
ANSI Common Lisp uses symbols like logand, logior, logxor, ...
for bitwise operations.
Then it is confusing. What does it use for non-bitwise logical operations?
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
On Tue, 23 Jan 2024 12:13:27 -0800, Keith Thompson wrote:
There's no hard rule that operators must be punctuation, just a
general trend.
And iso646.h demonstrates that that trend is at an end.
It does no such thing.
The header file has been around for over three decades, yet it's not in common (or even uncommon) use.
I believe the only thing iso646.h demonstrates is the (largely former)
need to write C on systems that do not support full ASCII.
If you want to use it in your own code, nobody will stop you.
Obviously, it would mean not following the convention.
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
It does seem a very late addition for a purpose which would have been a
lot more relevant decades earlier. By that point, implementations that
did not support full ASCII would have been museum pieces.
<iso646.h> was added in 1995, and it was intended to replace a number of implementation-specific workarounds.
But will I continue to hear complaints from you about it?
I can't continue what I never started.
I believe the only thing iso646.h demonstrates is the (largely former)
need to write C on systems that do not support full ASCII. This is
fully explained in the C99 Rationale, <http://www.open-std.org/jtc1/sc22/WG14/www/C99RationaleV5.10.pdf>;
search for "MSE.4". It says nothing about "and" being more readable
than "&&" on systems that are able to display the '&' character.
On Tue, 23 Jan 2024 16:27:25 -0800, Keith Thompson wrote:
I believe the only thing iso646.h demonstrates is the (largely former)
need to write C on systems that do not support full ASCII.
It does seem a very late addition for a purpose which would have been a
lot more relevant decades earlier. By that point, implementations that did not support full ASCII would have been museum pieces.
If you want to use it in your own code, nobody will stop you.
But will I continue to hear complaints from you about it?
No, <iso646.h> was approved as part of AMD1, in 1995. For the countries it was targeted at, full ASCII was not a tenable solution - it could not be used to write their languages. They were still using encodings such as Shift-JIS and ISO/IEC 8859-10.
In the Shift-JIS encoding, character 0x5C, which is the backslash in
ASCII and Unicode, is the Yen sign. That means that if a C source file contains "Hello, world\n", viewing it as Shift-JIS makes it look like
"Hello, world¥n", but a C compiler that treats its input as ASCII would
see a backslash.
On Tue, 23 Jan 2024 06:47:10 +0100, Janis Papanagnou wrote:
There are also problems with these names. For example, '&&' does not have the semantics of 'and' but of 'and_then', and 'or' is actually 'or_else'.
Funnily enough, that is how the languages that offer those words interpret them. Not just C and C++.
On Tue, 23 Jan 2024 17:21:40 -0000 (UTC), Lew Pitcher wrote:
Less importantly, it also violates the convention that C macros are
named in upper case to distinguish them from keywords and "regular"
identifiers.
Why does C allow lowercase in macro names, then?
[...] I am
pretty sure that not all computer languages
provide guarantees about the order of evaluation.
[...]
whereas in the shell:
(((a && b) || c) && d)
where I'm using "virtual parentheses". If you actually stick in real
ones, they denote subshell execution in a separate process. (Bash
allows curly braces for command grouping that doesn't create
processes.)
On Tue, 23 Jan 2024 14:51:52 -0800, Keith Thompson wrote:
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
On Tue, 23 Jan 2024 17:21:40 -0000 (UTC), Lew Pitcher wrote:
Less importantly, it also violates the convention that C macros are
named in upper case to distinguish them from keywords and "regular"
identifiers.
Why does C allow lowercase in macro names, then?
Because it's a convention, not a language rule.
So what would one mean by “violate”, other than “I personally don’t like
it”?
On Tue, 23 Jan 2024 06:54:56 +0100, Janis Papanagnou wrote:
... in my professional contexts there were even [coding] standards
defined that you had to follow.
What about the open-source code that your company takes without paying? Do you demand that that code follow your rules as well? Do you send it back
to the developers to demand they rewrite it for you?
On 1/23/24 16:28, Scott Lurndal wrote:
The term 'conditional and' has been in common use for decades.
I've never heard of it. When searching Wikipedia, [...]
The header was introduced to make it easier (or possible) to write C
code on systems/keyboards that don't support certain characters like '&'
and '|' -- similar to digraphs and trigraphs.
On 2024-01-23, David Brown <david.brown@hesbynett.no> wrote:
On 22/01/2024 21:34, Lawrence D'Oliveiro wrote:
On Mon, 22 Jan 2024 09:30:21 +0100, David Brown wrote:
... I don't use the matching names in C++ either ...
I do, if/when I do use C++ and C. Don’t you think it improves readability:
No. But I fully appreciate that this is personal preference and habit.
I believe that some of the identifiers improve readability for people
coming from a programming language which uses those English words for
very similar operators rather than && and ||.
In a green field programming language design, it's probably better
to design that way from the start. It's a nice bonus if a language
looks readable to newcomers.
Generations of C coders are used to && and || though; that's the normal
way to write C. Using these aliases is a vanishingly rare practice. An important aspect of readability is writing code like everyone else. When
a language is newly designed so that there isn't anyone else, that
doesn't have to be considered.
For that reason, these identifiers should not be used, except for machine-encoding of programs into a 6 bit character set.
Additionally certain names in the iso646.h header are poorly considered,
and obstruct readability. They use the _eq suffix for an operation that
is assignment.
#define and_eq &=
If the purpose of this header were to optimize readability for those unfamiliar with C, this should be called
#define and_set &=
or similar.
The assignment operator = should not be read "equals", but "becomes" or "takes the value" or "is assigned" or "is set to". This should be taken
into consideration when coming up with word-like token or token fragment
to represent it.
Also note the following inconsistency:
#define and &&
#define bitand &
#define and_eq &= // what happened to "bit"?
This looks like and_eq should correspond to &&=, since and is &&,
and bitand is &. &= wants to be bitand_eq.
Clearly, the purpose of this header is to allow C to be written with the
ISO 646 character set. The choices of identifiers do not look like
evidence of readability having been highly prioritized.
On Tue, 23 Jan 2024 21:43:44 -0800, Keith Thompson wrote:
In the Shift-JIS encoding, character 0x5C, which is the backslash in
ASCII and Unicode, is the Yen sign. That means that if a C source file
contains "Hello, world\n", viewing it as Shift-JIS makes it look like
"Hello, world¥n", but a C compiler that treats its input as ASCII would
see a backslash.
So what exactly does iso646.h offer to deal with this?
On 23/01/2024 18:34, bart wrote:
But it's OK to justify 'pow' for exponentiation?
Mathematically operators are functions, so a mathematician would say that "add" is just as much of a function as "gamma". But to a computer programmer an operator compiles to a trivial number of machine code instructions, whilst a function is a subroutine call. Pow is not usually supported in hardware. However it's such a basic mathematical function that it has special notation. So some languages say it should be an operator. However ASCII won't represent the standard notation. So there are good arguments for and against pow as an operator, and different languages take different views. But I think the C decision is better, as C code is for programming computers, not for translating formulae into machine readable form.
On 23.01.2024 21:13, Keith Thompson wrote:
[...] There's no hard rule that operators must be
punctuation, just a general trend.)
Anyone still writing "MULTIPLY a BY b GIVEN c" ? :-)
(Luckily I've never programmed in COBOL, even after
it allowed "COMPUTE c = a * b" (or some such).)
To be sure I had also re-inspected the ASCII character set and it
seems that all C characters (including these operators) are anyway
in the ASCII domain. It's beyond me why they've used the name
"iso646.h".
On Wed, 24 Jan 2024 09:06:22 +0100, Janis Papanagnou wrote:
ITYM
MULTIPLY A BY B GIVING C.
and, yes, COBOL programmers are still in demand, mostly by
financial institutions that have hundreds of millions
of lines of COBOL code to maintain.
(Luckily I've never programmed in COBOL, even after
it allowed "COMPUTE c = a * b" (or some such).)
I have (lucky me :-) ).
While I don't tout COBOL as the "be all and end all" of
programming languages, it still can perform a lot of
useful work, especially in fields where exact calculations
are required and rounding and truncation of mathematical
operations are well defined. Such as financial institutions.
These days, it even supports object oriented code.
FWIW, the last ISO COBOL language standard was issued in 2023.
Are you certain that you want your taxes to be calculated in
floating point? ;-)
On 2024-01-24, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
To be sure I had also re-inspected the ASCII character set and it
seems that all C characters (including these operators) are anyway
in the ASCII domain. It's beyond me why they've used the name
"iso646.h".
Because the macro names in that header are in the ISO 646
invariant set, expanding to tokens that use characters outside
of the invariant set.
ISO 646 looks like an effort to standardize the "zoo" of regional ASCII variants.
It defines a base character set which looks exactly like ASCII (correct
me if I'm wrong), of which there are national variants. It's like a
"mini ISO Latin" in 7 bits.
The Wikipedia page on it is quite good.
On 24.01.2024 15:17, Lew Pitcher wrote:
On Wed, 24 Jan 2024 09:06:22 +0100, Janis Papanagnou wrote:
programming languages, it still can perform a lot of
useful work, especially in fields where exact calculations
are required and rounding and truncation of mathematical
operations are well defined. Such as financial institutions.
Yes, sure.
On 23.01.2024 23:37, Kalevi Kolttonen wrote:
[...] I am
pretty sure that not all computer languages
provide guarantees about the order of evaluation.
What?!
On 1/24/24 03:10, Janis Papanagnou wrote:
On 23.01.2024 23:37, Kalevi Kolttonen wrote:
[...] I am
pretty sure that not all computer languages
provide guarantees about the order of evaluation.
What?!
Could you explain what surprises you about that statement?
[...]
Now, logical-AND and logical-OR are two cases where the order of
evaluation is, in fact, specified. Are you expressing surprise that
there are other languages where that's not the case? I'm fairly sure
I've seen a language where the closest equivalent of C's
(expression1 && expression2) causes both sub-expressions to be
evaluated, in an arbitrary order, before evaluating the equivalent of
&& itself. Unfortunately, I don't remember where.
It sounds so wrong, not matching anything I've experienced in the
programming languages I heard about and about compiler construction
that I can only express my astonishment about such a statement. The
poster's statement itself is not explained, though, and if anything,
the poster should first explain what makes him "pretty sure" about
it before we can exchange arguments.
[...]
The closest I met were theoretical expressions (like e.g. Dijkstra's
guards, or whatever they were called) in per se non-deterministic contexts.
Janis
I suppose it is possible that in some languages out-of-order
evaluation could be what happens, e.g. the "logical AND" operands
could be evaluated in parallel by different CPUs.
On 24.01.2024 00:10, Keith Thompson wrote:
The header was introduced to make it easier (or possible) to write C
code on systems/keyboards that don't support certain characters like '&'
and '|' -- similar to digraphs and trigraphs.
I think this is the most likely explanation; the restricted _keyboards_
(and not the restricted [ASCII] character set). Matches my experiences
with old keyboards I used decades ago.
On 24/01/2024 13:54, David Brown wrote:
On 24/01/2024 13:20, Malcolm McLean wrote:
Many operators in C are not mathematical operations. "sizeof" is an
operator, so are indirection operators, structure member access
operators, function calls, and the comma operator.
I've discussed this ad infinitum with people who don't really understand
what the term "function" means. Anything that maps one set to another
set such that there is one and only one mapping from each member of the
source set to the result set is mathematically a "function".
Sizeof clearly counts.
On 23/01/2024 21:51, Lawrence D'Oliveiro wrote:
On Tue, 23 Jan 2024 16:32:09 +0000, Malcolm McLean wrote:
It breaks the rule that, in C, variables and functions are alphnumeric,
whilst operators are symbols.
Where is there such a “rule”?
Valid function names have to begin with an alphabetical symbol or
(annoyingly for me) an underscore, as do variables. They may not contain
non-alphanumerical symbols except for underscore.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 24.01.2024 18:24, James Kuyper wrote:
On 1/24/24 03:10, Janis Papanagnou wrote:
On 23.01.2024 23:37, Kalevi Kolttonen wrote:
[...] I am
pretty sure that not all computer languages
provide guarantees about the order of evaluation.
What?!
Could you explain what surprises you about that statement?
It sounds so wrong, not matching anything I've experienced in the
programming languages I heard about and about compiler construction
that I can only express my astonishment about such a statement. The
poster's statement itself is not explained, though, and if anything,
the poster should first explain what makes him "pretty sure" about
it before we can exchange arguments.
A concrete example:
#include <stdio.h>

static int count(void) {
    static int result = 0;
    return ++result;
}

int main(void) {
    printf("%d %d %d\n", count(), count(), count());
    return 0;
}
C does not specify the order in which the arguments are evaluated
(likewise for operands of most operators). This program could produce
any of 6 possible outputs, at the whim of the compiler. (On my system,
I see "3 2 1" with gcc and "1 2 3" with clang; both are perfectly
valid.)
I'm surprised that that surprises you. It's a fairly fundamental
property of C (and also of C++).
[...]
As quoted, it's a general statement which includes C: "Except as
specified later, side effects and value computations of subexpressions
are unsequenced."
Trigraphs, digraphs, and <iso646.h> were all introduced to support
systems that *don't* support the full ASCII character set.
On 24/01/2024 07:35, Lawrence D'Oliveiro wrote:
On Tue, 23 Jan 2024 21:43:44 -0800, Keith Thompson wrote:
In the Shift-JIS encoding, character 0x5C, which is the backslash in
ASCII and Unicode, is the Yen sign. That means that if a C source
file contains "Hello, world\n", viewing it as Shift-JIS makes it look
like "Hello, world¥n", but a C compiler that treats its input as ASCII
would see a backslash.
So what exactly does iso646.h offer to deal with this?
In Scandinavian language variants of ASCII ...
"Logical disjunction is usually short-circuited ...
On Wed, 24 Jan 2024 07:58:44 -0800, Keith Thompson wrote:
Trigraphs, digraphs, and <iso646.h> were all introduced to support
systems that *don't* support the full ASCII character set.
Where is there a national character set that doesn’t support the symbols for which iso646.h introduces synonyms?
On Wed, 24 Jan 2024 14:17:04 -0000 (UTC), Lew Pitcher wrote:
and, yes, COBOL programmers are still in demand, mostly by financial
institutions that have hundreds of millions of lines of COBOL code to
maintain.
I suspect a lot of those institutions have already gone out of business,
or are close to going out of business.
And the amounts they have to pay COBOL programmers
to maintain their code are hastening that end.
Are you certain that you want your taxes to be calculated in
floating point? ;-)
How else would you handle compound interest?
David Brown <david.brown@hesbynett.no> writes:
[...]
(It could not have been added as "**", because - as Keith said in
another post - "x ** y" already has a meaning in C. While I believe
it would be possible to distinguish the uses based on the type of "y",
other than for the literal 0, having "x ** y" mean two /completely/
different things depending on the type of "y" would not be a good idea
for C.)
The problem with a "**" exponentiation operator is lexical. It's common
to have two consecutive unary "*" operators in declarations and
expressions:
char **argv;
char c = **argv;
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
On Wed, 24 Jan 2024 07:58:44 -0800, Keith Thompson wrote:
Trigraphs, digraphs, and <iso646.h> were all introduced to support
systems that *don't* support the full ASCII character set.
Where is there a national character set that doesn’t support the symbols
for which iso646.h introduces synonyms?
Just one example: <https://en.wikipedia.org/wiki/Code_page_1016> has 'ø'
in the slot that ASCII uses for '|'.
I don't believe it's in common use today, but it may have been in 1995.
Lew Pitcher <lew.pitcher@digitalfreehold.ca> writes:
[...]
These days, it even supports object oriented code.
FWIW, the last ISO COBOL language standard was issued in 2023.
ADD 1 TO COBOL GIVING COBOL
On Wed, 24 Jan 2024 18:40:08 -0000 (UTC), Kalevi Kolttonen wrote:
"Logical disjunction is usually short-circuited ...
I wonder why that shouldn’t apply to anything else. E.g. in
a × (b + c)
if “a” evaluates to zero, why not avoid the computation of “b + c” and
just return zero as the value of the expression?
On 24.01.2024 17:56, Kaz Kylheku wrote:
The Wikipedia page on it is quite good.
The German Wikipedia has a table that is more legible, IMO: https://de.wikipedia.org/wiki/ISO_646
On 1/24/24 03:10, Janis Papanagnou wrote:
On 23.01.2024 23:37, Kalevi Kolttonen wrote:
[...] I am
pretty sure that not all computer languages
provide guarantees about the order of evaluation.
What?!
Could you explain what surprises you about that statement? As quoted,
it's a general statement which includes C: "Except as specified later,
side effects and value computations of subexpressions are unsequenced."
On Wed, 24 Jan 2024 14:17:04 -0000 (UTC), Lew Pitcher wrote:
and, yes, COBOL programmers are still in demand, mostly by financial
institutions that have hundreds of millions of lines of COBOL code to
maintain.
I suspect a lot of those institutions have already gone out of business,
or are close to going out of business. And the amounts they have to pay
COBOL programmers to maintain their code are hastening that end.
Are you certain that you want your taxes to be calculated in
floating point? ;-)
How else would you handle compound interest?
On Wed, 24 Jan 2024 07:58:44 -0800, Keith Thompson wrote:
Trigraphs, digraphs, and <iso646.h> were all introduced to support
systems that *don't* support the full ASCII character set.
Where is there a national character set that doesn’t support the symbols for which iso646.h introduces synonyms?
On 2024-01-24, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
Lew Pitcher <lew.pitcher@digitalfreehold.ca> writes:
[...]
These days, it even supports object oriented code.
FWIW, the last ISO COBOL language standard was issued in 2023.
ADD 1 TO COBOL GIVING COBOL
Oh, oh, I have a new one to this oldie:
ADD 100 TO PITCH OF COBOL
(100 cents in a semitone.)
Are you certain that you want your taxes to be calculated in
floating point? ;-)
On Wed, 24 Jan 2024 20:11:33 +0000, Lawrence D'Oliveiro wrote:
Where is there a national character set that doesn’t support the
symbols for which iso646.h introduces synonyms?
EBCDIC-US, for one. It lacks the CIRCUMFLEX (^) character.
I've discussed this ad infinitum with people who don't really understand
what the term "function" means. Anything that maps one set to another
set such that there is one and only one mapping from each member of the
source set to the result set is mathematically a "function".
Sizeof clearly counts.
On Wed, 24 Jan 2024 20:25:00 -0000 (UTC), Lew Pitcher wrote:
On Wed, 24 Jan 2024 20:11:33 +0000, Lawrence D'Oliveiro wrote:
Where is there a national character set that doesn’t support the
symbols for which iso646.h introduces synonyms?
EBCDIC-US, for one. It lacks the CIRCUMFLEX (^) character.
Were any of the EBCDICs official standards anywhere in the world, outside
of IBM?
Thinking about what the “A” in “ASCII” stands for ...
David Brown <david.brown@hesbynett.no> writes:
[...]
(It could not have been added as "**", because - as Keith said in
another post - "x ** y" already has a meaning in C. While I believe
it would be possible to distinguish the uses based on the type of "y",
other than for the literal 0, having "x ** y" mean two /completely/
different things depending on the type of "y" would not be a good idea
for C.)
The problem with a "**" exponentiation operator is lexical. It's common
to have two consecutive unary "*" operators in declarations and
expressions:
char **argv;
char c = **argv;
Adding a "**" operator would have made the above invalid due to the
"maximal munch" rule, before the type of the argument is even
considered.
See also x+++++y, which might be intended as x++ + ++y, but is scanned
as x ++ ++ + y, a syntax error.
C could have added "**" very early, but then we'd have to write
"* *argv" or "*(*argv)".
Dollar symbol ($) is an allowed extension.
On Wed, 24 Jan 2024 19:52:58 GMT, Scott Lurndal wrote:
Dollar symbol ($) is an allowed extension.
I wonder if we have DEC to thank for that ... ?
On Wed, 24 Jan 2024 20:25:00 -0000 (UTC), Lew Pitcher wrote:
On Wed, 24 Jan 2024 20:11:33 +0000, Lawrence D'Oliveiro wrote:
Where is there a national character set that doesn’t support the
symbols for which iso646.h introduces synonyms?
EBCDIC-US, for one. It lacks the CIRCUMFLEX (^) character.
Were any of the EBCDICs official standards anywhere in the world, outside
of IBM?
On Wed, 24 Jan 2024 20:25:00 -0000 (UTC), Lew Pitcher wrote:
On Wed, 24 Jan 2024 20:11:33 +0000, Lawrence D'Oliveiro wrote:
Where is there a national character set that doesn’t support the
symbols for which iso646.h introduces synonyms?
EBCDIC-US, for one. It lacks the CIRCUMFLEX (^) character.
Were any of the EBCDICs official standards anywhere in the world, outside
of IBM?
Thinking about what the “A” in “ASCII” stands for ...
On 2024-01-24, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 1/24/24 03:10, Janis Papanagnou wrote:
On 23.01.2024 23:37, Kalevi Kolttonen wrote:
[...] I am
pretty sure that not all computer languages
provide guarantees about the order of evaluation.
What?!
Could you explain what surprises you about that statement? As quoted,
it's a general statement which includes C: "Except as specified later,
side effects and value computations of subexpressions are unsequenced."
Pretty much any language has to guarantee *something* about
order of evaluation, somewhere.
Like for instance that calculating output is not possible before a
needed input is available.
I said that the C standard's
use of the term "function" to mean "subroutine" was a misuse ...
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
As for K&R's thinking, I have no particular insight on that. I have no problem with some operators being represented by symbols and others by keywords (I'm accustomed to it from other languages), and I don't see
that the decision to make "sizeof" a keyword even requires any
justification.
(C++, in 2011 IIRC, introduced special handling for the >> token, which occurs in things like std::vector<std::vector<int>>).
[...]
It might be interesting to hear from any native Germans who were
programming C at that time.
Germany is big enough that people
programmed in German (so comments would be in German, for example), and
their 7-bit ASCII variant (Code page 1011) also had accented letters in
place of some symbols used by C - including "|".
On Thu, 25 Jan 2024 03:56:13 +0000, Malcolm McLean wrote:
I said that the C standard's
use of the term "function" to mean "subroutine" was a misuse ...
Common Python terminology does the same.
Back in Pascal days, a “function” returned a value, while a “procedure”
had some effect on the machine state. If you wanted to refer to both, you tried a semi-common term like “routine” and hoped they understood.
On 24.01.2024 17:27, Keith Thompson wrote:
(C++, in 2011 IIRC, introduced special handling for the >> token, which
occurs in things like std::vector<std::vector<int>>).
So you no longer have to write it with a space, as in ...?
std::vector<std::vector<int> >
That's fine.
But I suppose they haven't fixed its precedence in cin >> and cout <<
contexts? (I suppose it's still handled with shl/shr precedence?)
On 1/24/24 16:11, Kaz Kylheku wrote:
On 2024-01-24, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 1/24/24 03:10, Janis Papanagnou wrote:
On 23.01.2024 23:37, Kalevi Kolttonen wrote:
[...] I am
pretty sure that not all computer languages
provide guarantees about the order of evaluation.
What?!
Could you explain what surprises you about that statement? As quoted,
it's a general statement which includes C: "Except as specified later,
side effects and value computations of subexpressions are unsequenced."
Pretty much any language has to guarantee *something* about
order of evaluation, somewhere.
Not the functional languages, I believe - but I've only heard about such languages, not used them.
Like for instance that calculating output is not possible before a
needed input is available.
Oddly enough, for a long time the C standard never said anything about
that issue. I argued that this was logically necessary, and few people disagreed with that argument, but I couldn't point to wording in the
standard to support that claim.
That changed when they added support for multi-threaded code to C in
C2011. That required the standard to be very explicit about which things could happen simultaneously in different threads, and which things had
to occur in a specified order. All of the wording about "sequenced" was
first introduced at that time. In particular, the following wording was added:
"The value computations of the operands of an operator are sequenced
before the value computation of the result of the operator." (6.5p1)
On 2024-01-24, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
(It could not have been added as "**", because - as Keith said in
another post - "x ** y" already has a meaning in C. While I believe
it would be possible to distinguish the uses based on the type of "y",
other than for the literal 0, having "x ** y" mean two /completely/
different things depending on the type of "y" would not be a good idea
for C.)
The problem with a "**" exponentiation operator is lexical. It's common
to have two consecutive unary "*" operators in declarations and
expressions:
char **argv;
char c = **argv;
Clearly, then, the way forward with this ** operator is to wait for the
C++ people to do the unthinkable, and reluctantly copy it some years
later.
Ya know, like what they did with stacked template closers, which are
already the >> operator.
On 24/01/2024 13:54, David Brown wrote:
On 24/01/2024 13:20, Malcolm McLean wrote:
Many operators in C are not mathematical operations. "sizeof" is an
operator, so are indirection operators, structure member access
operators, function calls, and the comma operator.
I've discussed this ad infinitum with people who don't really understand
what the term "function" means.
Anththing that maps one set to another
set such that there is one and only one mapping from each member if the struture set to the result set is mathematically a "function".
Sizeof clearly counts.
Exponentiation is not particularly common in programming, except for a
few special cases - easily written as "x * x", "x * x * x", "1.0 / x",
or "sqrt(x)", which are normally significantly more efficient than a
generic power function or operator would be.
It's pretty common in the sort of programming that I do. But this is a
fair point. A lot of programs don't apply complex transformations to
data in the way that mine typically do.
That is not an argument against having an operator in C called "pow".
It is simply not useful enough for there to be a benefit in adding it
to the language as an operator, when it could (and was) easily be
added as a function in the standard library.
(It could not have been added as "**", because - as Keith said in
another post - "x ** y" already has a meaning in C. While I believe
it would be possible to distinguish the uses based on the type of "y",
other than for the literal 0, having "x ** y" mean two /completely/
different things depending on the type of "y" would not be a good idea
for C.)
Yes, ** and ^, which are the two common ASCII fallbacks, are already
taken. But as you said earlier, in reality most exponentiation
operations are either square or cube, or square root. And in C, that
means either special functions or inefficiently converting the exponent
into a double. If pow were an operator, that wouldn't be an issue.
[...] We wrote a function returning pi as an
infinite list of decimal digits - the printout of that started long
before the calculation itself was finished!
The problem was with the order of evaluation. Prior to C++17 (where it
was fixed), if you wrote "cout << one() << two() << three();", the order
the three functions were evaluated was unspecified.
On 25/01/2024 06:01, James Kuyper wrote:
On 1/24/24 16:11, Kaz Kylheku wrote:
On 2024-01-24, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 1/24/24 03:10, Janis Papanagnou wrote:
On 23.01.2024 23:37, Kalevi Kolttonen wrote:
[...] I am
pretty sure that not all computer languages
provide guarantees about the order of evaluation.
What?!
Could you explain what surprises you about that statement? As quoted,
it's a general statement which includes C: "Except as specified later,
side effects and value computations of subexpressions are unsequenced."
Pretty much any language has to guarantee *something* about
order of evaluation, somewhere.
Not the functional languages, I believe - but I've only heard about such
languages, not used them.
I remember a programming task at university around infinite lists in a functional programming language (not Haskell, but very similar -
arguably its predecessor). We wrote a function returning pi as an
infinite list of decimal digits - the printout of that started long
before the calculation itself was finished!
On 25/01/2024 00:30, Lawrence D'Oliveiro wrote:
On Wed, 24 Jan 2024 19:33:09 +0000, Malcolm McLean wrote:
I've discussed this ad infinitum with people who don't really understand
what the term "function" means. Anything that maps one set to another
set such that there is one and only one mapping from each member of the
source set to the result set is mathematically a "function".
Sizeof clearly counts.
It does in the mathematical sense. But in the C sense, a “function” is a
block of code which is called at runtime with zero or more arguments and
returns a result (which might be void). It can also have side-effects on
the machine state.
It helps the discussion to be clear what your terms mean. Otherwise the
people you are arguing with have a right to be indignant at what they
might perceive to be wilful obtuseness.
You haven't been around for long enough. I said that the C standard's
use of the term "function" to mean "subroutine" was a misuse, and that I
was going to use the term "function", in context, to refer to that
subset of C subroutines which calculate mathematical functions of bits
in the computer's memory. The opposition and outrage that this generated
was incredible, and must have gone on for years.
On 24/01/2024 21:50, Kaz Kylheku wrote:
On 2024-01-24, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
(It could not have been added as "**", because - as Keith said in
another post - "x ** y" already has a meaning in C. While I believe
it would be possible to distinguish the uses based on the type of "y", >>>> other than for the literal 0, having "x ** y" mean two /completely/
different things depending on the type of "y" would not be a good idea >>>> for C.)
The problem with a "**" exponentation operator is lexical. It's common >>> to have two consecutive unary "*" operators in declarations and
expression:
char **argv;
char c = **argv;
Clearly, then, the way forward with this ** operator is to wait for the
C++ people to do the unthinkable, and reluctantly copy it some years
later.
I'm hoping the C++ people will do the sane/unthinkable (cross out one,
according to personal preference) thing and allow Unicode symbols for
operators, which will then be added to the standard library rather than
to the language. Then we'll have "x ↑ y", and no possible confusion.
(It's actually almost fully possible already - all they need to do is
allow characters such as ↑ to be used as macros, and we're good to go.)
On 23/01/2024 21:51, Lawrence D'Oliveiro wrote:
On Tue, 23 Jan 2024 16:32:09 +0000, Malcolm McLean wrote:
It breaks the rule that, in C, variables and functions are alphanumeric,
whilst operators are symbols.
Where is there such a “rule”?
Valid function names have to begin with an alphabetic character or
(annoyingly for me) an underscore, as do variables. They may not contain
non-alphanumeric characters except for underscore. It's in the C
standard somewhere.
C operators are all non-alphanumeric symbols, with the exception of
"sizeof". Again, the operators are listed in the C standard.
sizeof is an exception, but a justified one.
This is how religious people argue: they use circular reasoning to say
something is justified because it is justified.
No. This isn't circular reasoning. It's a claim which hasn't been backed
up. It's expected that the reader won't ask for this because it is so
obvious that we can give sensible reasons for "sizeof" being a
function-like alphabetical word rather than a symbol. But if you do, of course I'm sure someone will provide such a justification.
On 25.01.2024 13:43, David Brown wrote:
[...] We wrote a function returning pi as an
infinite list of decimal digits - the printout of that started long
before the calculation itself was finished!
You had an algorithm for an infinite list of decimals that finished?
I think this formulation will go into my cookie jar of noteworthy achievements. - And, sorry, I could not resist. :-)
On 25/01/2024 03:59, Keith Thompson wrote:
As for K&R's thinking, I have no particular insight on that. I have no
problem with some operators being represented by symbols and others by
keywords (I'm accustomed to it from other languages), and I don't see
that the decision to make "sizeof" a keyword even requires any
justification.
I looked it up on the web, but I can't find anything that goes back to K
and R and explains why they took that decision. But clearly to use a
word rather than punctuators, as was the case with every other operator,
must have had a reason.
I think they wanted it to look function-like, because it is a function,
though a function of a type rather than of bits, so of course not a "function" in the C standard sense of the term.
But all operators are
functions in this sense. However sizeof doesn't map to anything used in non-computer mathematics. But "size" is conventionally denoted by two vertical lines. These are taken by "OR", and would be misleading as in mathematics it means "absolute", not "physical area of paper taken up by
the notation".
So I would imagine that that was why they thought a word would be
appropriate, and these reasons were strong enough to justify breaking
the general pattern that operators are punctuators.
I could be completely wrong of course in the absence of actual
statements by K and R. But this would seem to make sense.
On 24/01/2024 20:33, Malcolm McLean wrote:
On 24/01/2024 13:54, David Brown wrote:
On 24/01/2024 13:20, Malcolm McLean wrote:
Many operators in C are not mathematical operations. "sizeof" is an
operator, so are indirection operators, structure member access
operators, function calls, and the comma operator.
I've discussed this ad infinitum with people who don't really understand
what the term "function" means.
Yes, you have - usually at least somewhat incorrectly, and usually
without being clear if you are talking about a "C function", a
mathematical "function", or a "Malcolm function" using your own private definitions.
Anything that maps one set to another
set such that there is one and only one mapping from each member of the
source set to the result set is mathematically a "function".
Sizeof clearly counts.
"sizeof" clearly does not count.
You don't get to mix "mathematical" definitions and "C" definitions.
"sizeof" is a C feature - it makes no sense to ask if it is a
mathematical function or not. It /does/ make sense to ask if it is a
/C/ function or not - and it is not a C function.
On 25/01/2024 12:43, David Brown wrote:
On 25/01/2024 06:01, James Kuyper wrote:
On 1/24/24 16:11, Kaz Kylheku wrote:
On 2024-01-24, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 1/24/24 03:10, Janis Papanagnou wrote:
On 23.01.2024 23:37, Kalevi Kolttonen wrote:
[...] I am
pretty sure that not all computer languages
provide guarantees about the order of evaluation.
What?!
Could you explain what surprises you about that statement? As quoted,
it's a general statement which includes C: "Except as specified later,
side effects and value computations of subexpressions are unsequenced."
Pretty much any language has to guarantee *something* about
order of evaluation, somewhere.
Not the functional languages, I believe - but I've only heard about such
languages, not used them.
I remember a programming task at university around infinite lists in a
functional programming language (not Haskell, but very similar -
arguably its predecessor). We wrote a function returning pi as an
infinite list of decimal digits - the printout of that started long
before the calculation itself was finished!
You can write something like that in C. I adapted a program to print the first N digits so that it doesn't stop. It looks like this:
#include <stdio.h>

int nextpidigit(void);  /* the big-integer spigot generator, defined elsewhere */

int main(void) {
    while (1) {
        printf("%c", nextpidigit());
    }
}
(The output starts as "314159..."; it will need a tweak to insert the
decimal point.)
The algorithm obviously wasn't mine; I've no idea how it works. (In a
4, before it's calculated further? It's magic.)
The nextpidigit() function is set up as a generator.
It also relies on using big integers (I used it to test my library), so
will rapidly get much slower at calculating the next digit.
Even with a much faster library, eventually memory will be exhausted, so
this is not suitable for an 'infinite' number, or even an unlimited
number of digits; it will eventually grind to a halt.
Was yours any different?
On 25/01/2024 14:19, Janis Papanagnou wrote:
On 25.01.2024 13:43, David Brown wrote:
[...] We wrote a function returning pi as an
infinite list of decimal digits - the printout of that started long
before the calculation itself was finished!
You had an algorithm for an infinite list of decimals that finished?
That's the beauty of lazy evaluation!
I think this formulation will go into my cookie jar of noteworthy
achievements. - And, sorry, I could not resist. :-)
On 25/01/2024 13:01, David Brown wrote:
On 24/01/2024 21:50, Kaz Kylheku wrote:
On 2024-01-24, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
(It could not have been added as "**", because - as Keith said in
another post - "x ** y" already has a meaning in C. While I believe
it would be possible to distinguish the uses based on the type of "y",
other than for the literal 0, having "x ** y" mean two /completely/
different things depending on the type of "y" would not be a good idea
for C.)
The problem with a "**" exponentiation operator is lexical. It's common
to have two consecutive unary "*" operators in declarations and
expressions:
char **argv;
char c = **argv;
Clearly, then, the way forward with this ** operator is to wait for the
C++ people to do the unthinkable, and reluctantly copy it some years
later.
I'm hoping the C++ people will do the sane/unthinkable (cross out
one, according to personal preference) thing and allow Unicode symbols
for operators, which will then be added to the standard library rather
than to the language. Then we'll have "x ↑ y", and no possible
confusion.
(It's actually almost fully possible already - all they need to do is
allow characters such as ↑ to be used as macros, and we're good to go.)
Suppose ↑ could be used as a macro now, what would such a definition
look like?
Surely you'd be able to invoke it as ↑(x, y)?
On 25.01.2024 14:07, David Brown wrote:
The problem was with the order of evaluation. Prior to C++17 (where it
was fixed), if you wrote "cout << one() << two() << three();", the order
the three functions were evaluated was unspecified.
The last decade or two I haven't been in C++ to any depth. But I'm a bit surprised by that. The op<< is defined by something like [informally]
stream op<<(stream,value), where "two() << three()" is "value << value",
but "cout << one()" would yield a stream, say X, and "X << two()" again
a stream, etc. So actually we have nested functions
op<<( op<<( op<<(cout, one()), two()), three())
At least you'd need to evaluate one() to obtain the argument for the
next outer of the nested calls.
On 25/01/2024 14:35, Janis Papanagnou wrote:
On 25.01.2024 14:07, David Brown wrote:
The problem was with the order of evaluation. Prior to C++17 (where it
was fixed), if you wrote "cout << one() << two() << three();", the order
the three functions were evaluated was unspecified.
The last decade or two I haven't been in C++ to any depth. But I'm a bit
surprised by that. The op<< is defined by something like [informally]
stream op<<(stream,value), where "two() << three()" is "value << value",
but "cout << one()" would yield a stream, say X, and "X << two()" again
a stream, etc. So actually we have nested functions
op<<( op<<( op<<(cout, one()), two()), three())
At least you'd need to evaluate one() to obtain the argument for the
next outer of the nested calls.
Not quite. To simplify :
cout << one() << two()
is parsed as :
(cout << one()) << two()
So "cout << one()" is like a call to "op<<(cout, one())", and the full expression is like :
op<<(op<<(cout, one()), two())
Without the new C++17 order of evaluation rules, the compiler can
happily execute "two()" before "op<<(cout, one())". The operands to the outer call need to be executed before the outer call itself, but the
order in which these two operands are evaluated is unspecified (until
C++17).
[...]
On 25/01/2024 03:59, Keith Thompson wrote:
However sizeof doesn't map to anything used in
non-computer mathematics. But "size" is conventionally denoted by two
vertical lines.
I'm hoping the C++ people will do the sane/unthinkable (cross out one,
according to personal preference) thing and allow Unicode symbols for
operators, which will then be added to the standard library rather than
to the language.
David Brown <david.brown@hesbynett.no> writes:
On 24/01/2024 21:50, Kaz Kylheku wrote:
[...]
On 2024-01-24, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
The problem with a "**" exponentiation operator is lexical. It's common
to have two consecutive unary "*" operators in declarations and
expressions:
char **argv;
char c = **argv;
Clearly, then, the way forward with this ** operator is to wait for
the C++ people to do the unthinkable, and reluctantly copy it some
years later.
I'm hoping the C++ people while do the sane/unthinkable (cross out
one, according to personal preference) thing and allow Unicode symbols
for operators, which will then be added to the standard library rather
than to the language. Then we'll have "x ↑ y", and no possible
confusion.
That's difficult to type -- but they could add a new trigraph! 8-)}
If the committee decides C needs an exponentiation operator (which, as
far as I know, nobody has submitted a proposal for), "^^" is available.
(It's actually almost fully possible already - all they need to do is
allow characters such as ↑ to be used as macros, and we're good to
go.)
You'd also need something for ↑ to expand to.
Ya know, like what they did with stacked template closers, which are
already the >> operator.
The "maximum munch" parsing rule seemed like such a good idea, long ago!
It still does. It's simple to describe, and ambiguous cases like
x+++++y should be resolved with whitespace. (">>" was a real problem in
C++, resolved with a special-case rule in C++11; C has no such problems
of similar severity.)
On 25.01.2024 16:53, David Brown wrote:
On 25/01/2024 14:35, Janis Papanagnou wrote:
On 25.01.2024 14:07, David Brown wrote:
The problem was with the order of evaluation. Prior to C++17 (where it
was fixed), if you wrote "cout << one() << two() << three();", the order
the three functions were evaluated was unspecified.
The last decade or two I haven't been in C++ to any depth. But I'm a bit
surprised by that. The op<< is defined by something like [informally]
stream op<<(stream,value), where "two() << three()" is "value << value",
but "cout << one()" would yield a stream, say X, and "X << two()" again
a stream, etc. So actually we have nested functions
op<<( op<<( op<<(cout, one()), two()), three())
At least you'd need to evaluate one() to obtain the argument for the
next outer of the nested calls.
Not quite. To simplify :
cout << one() << two()
is parsed as :
(cout << one()) << two()
So "cout << one()" is like a call to "op<<(cout, one())", and the full
expression is like :
op<<(op<<(cout, one()), two())
Yes, up to here that's exactly what I said above (with three nestings).
op<<( op<<( op<<(cout, one()), two()), three())
Remove one
op<<( op<<(cout, one()), two())
Without the new C++17 order of evaluation rules, the compiler can
happily execute "two()" before "op<<(cout, one())". The operands to the
outer call need to be executed before the outer call itself, but the
order in which these two operands are evaluated is unspecified (until
C++17).
If that was formerly the case then the update was obviously necessary.
Functionally there would probably have been commotion if
tmp = op<<(cout, one())
op<<( tmp, two())
and
op<<( op<<(cout, one()), two())
would have had different results.
Is or was there any compiler that implemented that in the "unexpected"
order?
Then we'll have "x ↑ y", and no possible confusion.
That's difficult to type
On Thu, 25 Jan 2024 09:57:43 -0800, Keith Thompson wrote:
Then we'll have "x ↑ y", and no possible confusion.
That's difficult to type
Compose-circumflex-bar, or compose-bar-circumflex.
↑↑ (typed by me)
This illustrates the two big difficulties with Unicode symbols for this
kind of thing. Lots of them are difficult to type for many people (at
least, not without a good deal of messing around or extra programs).
And it's easy to have different symbols that appear quite similar as
glyphs, but are very different characters as far as the compiler is concerned.
On Thu, 25 Jan 2024 14:01:36 +0100, David Brown wrote:
I'm hoping the C++ people while do the sane/unthinkable (cross out one,
according to personal preference) thing and allow Unicode symbols for
operators, which will then be added to the standard library rather than
to the language.
Why not do what Algol-68 did, and specify a set of characters that could
be used to define new custom operators?
On Thu, 25 Jan 2024 09:57:43 -0800, Keith Thompson wrote:
Then we'll have "x ↑ y", and no possible confusion.
That's difficult to type
Compose-circumflex-bar, or compose-bar-circumflex.
↑↑ (typed by me)
On Thu, 25 Jan 2024 14:01:36 +0100, David Brown wrote:
I'm hoping the C++ people will do the sane/unthinkable (cross out one,
according to personal preference) thing and allow Unicode symbols for
operators, which will then be added to the standard library rather than
to the language.
Why not do what Algol-68 did, and specify a set of characters that could
be used to define new custom operators?
Imagine putting that power into the hands of ordinary users.
On Thu, 25 Jan 2024 21:16:14 +0000, bart wrote:
Imagine putting that power into the hands of ordinary users.
Shock, horror. Of course we elite cannot allow that into the hands of the plebs. Imagine what they might do!
On 25/01/2024 17:11, Janis Papanagnou wrote:
On 25.01.2024 16:53, David Brown wrote:
On 25/01/2024 14:35, Janis Papanagnou wrote:
On 25.01.2024 14:07, David Brown wrote:
The problem was with the order of evaluation. Prior to C++17
(where it
was fixed), if you wrote "cout << one() << two() << three();", the
order
the three functions were evaluated was unspecified.
The last decade or two I haven't been in C++ to any depth. But I'm a
bit
surprised by that. The op<< is defined by something like [informally]
stream op<<(stream,value), where "two() << three()" is "value <<
value",
but "cout << one()" would yield a stream, say X, and "X << two()" again
a stream, etc. So actually we have nested functions
op<<( op<<( op<<(cout, one()), two()), three())
At least you'd need to evaluate one() to obtain the argument for the
next outer of the nested calls.
Not quite. To simplify :
cout << one() << two()
is parsed as :
(cout << one()) << two()
So "cout << one()" is like a call to "op<<(cout, one())", and the full
expression is like :
op<<(op<<(cout, one()), two())
Yes, up to here that's exactly what I said above (with three nestings).
op<<( op<<( op<<(cout, one()), two()), three())
Remove one
op<<( op<<(cout, one()), two())
Without the new C++17 order of evaluation rules, the compiler can
happily execute "two()" before "op<<(cout, one())". The operands to the
outer call need to be executed before the outer call itself, but the
order in which these two operands are evaluated is unspecified (until
C++17).
If that was formerly the case then the update was obviously necessary.
Functionally there would probably have been commotion if
tmp = op<<(cout, one())
op<<( tmp, two())
and
op<<( op<<(cout, one()), two())
would have had different results.
Is or was there any compiler that implemented that in the "unexpected"
order?
There were indeed such real-world cases, complaints were made,
and the rules changed in C++17.
Usually it doesn't matter what order arguments to functions (or operands
to operators) are evaluated. Some compilers have consistent ordering
(and it is often last to first, not first to last), others pick whatever makes sense at the time. The ordering has been explicitly and clearly
stated as "unspecified" since around the beginning of time (which was,
as we all know, 01.01.1970).
On 25.01.2024 23:01, Lawrence D'Oliveiro wrote:
On Thu, 25 Jan 2024 21:16:14 +0000, bart wrote:
Imagine putting that power into the hands of ordinary users.
Shock, horror. Of course we elite cannot allow that into the hands of the
plebs. Imagine what they might do!
Power to the people!
On Thu, 25 Jan 2024 21:07:55 +0100, David Brown wrote:
This illustrates the two big difficulties with Unicode symbols for this
kind of thing. Lots of them are difficult to type for many people (at
least, not without a good deal of messing around or extra programs).
The compose key on *nix systems gives you a fairly mnemonic way of typing many of them.
And it's easy to have different symbols that appear quite similar as
glyphs, but are very different characters as far as the compiler is
concerned.
You can actually take advantage of that. E.g. from some of my Python code:
for cłass in (Window, Pixmap, Cursor, GContext, Region) :
    delattr(cłass, "__del__")
#end for
The human reader might not actually notice (or care) that a particular identifier looks like a reserved word, since the meaning is obvious from context. The compiler cannot deduce the meaning from that context, but
then, it doesn’t need to.
We could say that in comp.lang.c "function" shall mean "a subroutine"
On 25.01.2024 21:11, David Brown wrote:
On 25/01/2024 17:11, Janis Papanagnou wrote:
Is or was there any compiler that implemented that in the "unexpected"
order?
There were indeed such real-world cases, complaints were made,
Complaints that the rule was not clear in its definition?
Or complaints that their compiler did not support cout<<a<<b<<c;
correctly? - I would be astonished about the latter.
This is so fundamental a construct and so frequently used that any
compiler would have been withdrawn in the week after it came out.
That is my expectation. So I would be grateful if you could provide
some evidence that I can look up.
Mind that even if two() is evaluated before one(), it will not be
output before the stream of the first expression op<<(cout, one())
is available, and for this one() must be evaluated. Then one() can
be sent to the stream, and then also two() can be sent to the stream.
(Am I missing something?)
Janis
and the rules changed in C++17.
Usually it doesn't matter what order arguments to functions (or operands
to operators) are evaluated. Some compilers have consistent ordering
(and it is often last to first, not first to last), others pick whatever
makes sense at the time. The ordering has been explicitly and clearly
stated as "unspecified" since around the beginning of time (which was,
as we all know, 01.01.1970).
David Brown <david.brown@hesbynett.no> writes:
On 25/01/2024 10:55, Malcolm McLean wrote:
On 25/01/2024 03:59, Keith Thompson wrote:
As for K&R's thinking, I have no particular insight on that. I have no
problem with some operators being represented by symbols and others by
keywords (I'm accustomed to it from other languages), and I don't see
that the decision to make "sizeof" a keyword even requires any
justification.
I looked it up on the web, but I can't find anything that goes back
to K and R and explains why they took that decision. But clearly to
use a word rather than punctuators, as was the case with every other
operator, must have had a reason.
I think they wanted it to look function-like, because it is a
function, though a function of a type rather than of bits, so of
course not a "function" in the C standard sense of the term.
It is not a function in the C sense - "sizeof x" is not like a
function call (where "x" is a variable or expression, rather than a
type). However, many people (myself included) feel it is clearer in
code to write it as "sizeof(x)", making it look more like a function
or function-like macro.
And many people (myself included) feel it is clearer to write it as
`sizeof x`, precisely so it *doesn't* look like a function call, because
it isn't one. Similarly, I don't use unnecessary parentheses on return statements.
I also write `sizeof (int)` rather than `sizeof(int)`. The parentheses
look similar to those in a function call, but the construct is
semantically distinct. I think of keywords as a different kind of
token than identifiers, even though they look similar (and the standard describes them that way).
I suspect the prime reason "sizeof" is a word, rather than a symbol or
sequence of symbols, is that the word is very clear while there are no
suitable choices of symbols for the task. The nearest might have been
"#", but that might have made pre-processor implementations more
difficult. Of course any symbol or combination /could/ have been
used, and people would have learned its meaning, but "sizeof" just
seems so much simpler.
It has occurred to me that if there had been a strong desire to use a
symbol, "$" could have worked. It even suggests the 's' in the word
"size".
But there was no such desire. sizeof happens to be the only operator
whose symbol is a keyword, but I see no particular significance to this,
and no reason not to define it that way. I might even have preferred keywords for some of C's well-populated zoo of operators. See also
Pascal, which has keywords "and", "or", "not", and "mod".
On 26/01/2024 02:21, Janis Papanagnou wrote:
On 25.01.2024 21:11, David Brown wrote:
On 25/01/2024 17:11, Janis Papanagnou wrote:
Is or was there any compiler that implemented that in the "unexpected" order?
There were indeed such real-world cases, complaints were made,
Complaints that the rule was not clear in its definition?
Or complaints that their compiler did not support cout<<a<<b<<c;
correctly? - I would be astonished about the latter.
The pre-C++17 rule was perfectly clear - there was no specified order of execution for the operands. (And I thought I'd made /that/ perfectly
clear already.) Compilers all worked correctly - they can hardly have
fallen foul of a rule that did not exist.
The complaints (at least, the ones based on facts rather than misunderstandings) were about the lack of a rule that enforced
evaluation order in certain cases.
So C++17 added rules for evaluation orders in some circumstances, but
not others. In C++17, but not before (and not in C), the evaluation of
the expression "one" (and any side-effects) must come before the
evaluation of "two" for, amongst other things :
one << two
one >> two
one[two]
two = one
There is still /no/ ordering for
one * two
one + two
and many other cases.
And of course there are cases where there has always been a sequence
point, and therefore an order of evaluation (a logical order, that is -
if the compiler can see it makes no difference to the observable
effects, it can always re-arrange anything).
<https://en.cppreference.com/w/cpp/language/eval_order> <https://en.cppreference.com/w/c/language/eval_order>
This is so fundamental a construct and so frequently used that any
compiler would have been withdrawn in the week after it came out.
That is my expectation. So I would be grateful if you could provide
some evidence that I can look up.
<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0145r3.pdf>
For an example in practice, where you can see the generated assembly:
<https://www.godbolt.org/z/fWezzx1nd>
If I remember correctly, gcc 7 implemented the ordering rules from C++17
and back-ported them to previous C++ standards for user convenience (as
the order was previously unspecified, it was fine to do that).
Look at the generated assembly and the order in which the calls to
one(), two(), three() and four() are made. For the operator "<<", they
are made in order one() to four(). For the operator "+", and for
function call parameters, they are generated in order four() to one()
for this case. (In other cases, that may be different - that's what "unspecified" means.)
Mind that even if two() is evaluated before one(), it will not be
output before the stream of the first expression op<<(cout, one())
is available, and for this one() must be evaluated. Then one() can
be sent to the stream, and then also two() can be sent to the stream.
(Am I missing something?)
The output to the stream must be in the order given in the code - that
is true. But the values to be output could (prior to C++17) be
evaluated in any order. If one() and two() have side-effects, that is critical - those side-effects could be executed in any order.
Janis
and the rules changed in C++17.
Usually it doesn't matter what order arguments to functions (or operands
to operators) are evaluated. Some compilers have consistent ordering
(and it is often last to first, not first to last), others pick whatever
makes sense at the time. The ordering has been explicitly and clearly
stated as "unspecified" since around the beginning of time (which was,
as we all know, 01.01.1970).
On 26/01/2024 13:17, Malcolm McLean wrote:
We could say that in comp.lang.c "function" shall mean "a subroutine"
Why don't we just say - as everyone in this group except you already
says, that in c.l.c. "function" means "C function" as described in the C standards, and any other type of function needs to be qualified?
Thus "the tan function" here means the function from <math.h>, not the mathematical function, or something done when making leather.
It really is not difficult.
All that you wrote below targets your last sentence
"those side-effects could be executed in any order".
For the examples we had, like (informally) cout<<a<<b<<c;
this is undisputed for the SIDE EFFECTS of "a", etc. You
had "hidden" those side effects in "one()", I gave in an
earlier post the more obvious example c++ in the context
of cout << c++ << c++ << c++ << endl; as side effects.
All side effects can be a problem (and should be avoided
unless "necessary"). My point was that the order of '<<'
with its arguments is NOT corrupted. I interpreted your
previous posting that you'd have heard that to be an issue.
If you haven't meant to say that there's nothing more to
say about the issue, since the other things you filled your
post with is only distracting from the point in question.
On 26.01.2024 17:01, David Brown wrote:
On 26/01/2024 02:21, Janis Papanagnou wrote:
On 25.01.2024 21:11, David Brown wrote:
On 25/01/2024 17:11, Janis Papanagnou wrote:
Is or was there any compiler that implemented that in the "unexpected" order?
There were indeed such real-world cases, complaints were made,
Complaints that the rule was not clear in its definition?
Or complaints that their compiler did not support cout<<a<<b<<c;
correctly? - I would be astonished about the latter.
The pre-C++17 rule was perfectly clear - there was no specified order of
execution for the operands. (And I thought I'd made /that/ perfectly
clear already.) Compilers all worked correctly - they can hardly have
fallen foul of a rule that did not exist.
The complaints (at least, the ones based on facts rather than
misunderstandings) were about the lack of a rule that enforced
evaluation order in certain cases.
So C++17 added rules for evaluation orders in some circumstances, but
not others. In C++17, but not before (and not in C), the evaluation of
the expression "one" (and any side-effects) must come before the
evaluation of "two" for, amongst other things :
one << two
one >> two
one[two]
two = one
There is still /no/ ordering for
one * two
one + two
and many other cases.
And of course there are cases where there has always been a sequence
point, and therefore an order of evaluation (a logical order, that is -
if the compiler can see it makes no difference to the observable
effects, it can always re-arrange anything).
<https://en.cppreference.com/w/cpp/language/eval_order>
<https://en.cppreference.com/w/c/language/eval_order>
This is so fundamental a construct and so frequently used that any
compiler would have been withdrawn in the week after it came out.
That is my expectation. So I would be grateful if you could provide
some evidence that I can look up.
<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0145r3.pdf>
For an example in practice, where you can see the generated assembly:
<https://www.godbolt.org/z/fWezzx1nd>
If I remember correctly, gcc 7 implemented the ordering rules from C++17
and back-ported them to previous C++ standards for user convenience (as
the order was previously unspecified, it was fine to do that).
Look at the generated assembly and the order in which the calls to
one(), two(), three() and four() are made. For the operator "<<", they
are made in order one() to four(). For the operator "+", and for
function call parameters, they are generated in order four() to one()
for this case. (In other cases, that may be different - that's what
"unspecified" means.)
Mind that even if two() is evaluated before one(), it will not be
output before the stream of the first expression op<<(cout, one())
is available, and for this one() must be evaluated. Then one() can
be sent to the stream, and then also two() can be sent to the stream.
(Am I missing something?)
The output to the stream must be in the order given in the code - that
is true. But the values to be output could (prior to C++17) be
evaluated in any order. If one() and two() have side-effects, that is
critical - those side-effects could be executed in any order.
Janis
and the rules changed in C++17.
Usually it doesn't matter what order arguments to functions (or operands
to operators) are evaluated. Some compilers have consistent ordering
(and it is often last to first, not first to last), others pick whatever
makes sense at the time. The ordering has been explicitly and clearly
stated as "unspecified" since around the beginning of time (which was,
as we all know, 01.01.1970).
On 26.01.2024 17:06, David Brown wrote:
On 26/01/2024 13:17, Malcolm McLean wrote:
We could say that in comp.lang.c "function" shall mean "a subroutine"
Why don't we just say - as everyone in this group except you already
says, that in c.l.c. "function" means "C function" as described in the C
standards, and any other type of function needs to be qualified?
Thus "the tan function" here means the function from <math.h>, not the
mathematical function, or something done when making leather.
It really is not difficult.
Unless the discussion was done on a meta-level as opposed to a
concrete language specific implementation-model of a function,
or a concrete functions. - My impression from the posts upthread
was that we were taking on the meta-level to understand what we
actually have (with the 'sizeof' beast) or how to consider it
conceptually.
I also think that this is the key to not talk past each other.
The term "function" in computer science seems to have never been
an issue of dispute - I mean on a terminology level; explanations
in lectures or books were quite coherent, and since there was no
dispute everyone seems to have understood what a function is; in
computer science and in mathematics.
From my references it seems a consensus at least in that it's
reflecting a mathematical f: (x,y,...) -> (u,v,...) which is
projected at (or implemented by) some routine/procedure/method/
function, etc. - however it's called in any programming language.
The terminology certainly differs, but the interpretation less.
If we look deeper at the issue we can of course make academic
battles about other "function concepts" (my favorite example
is analogue computers; but that's extreme, of course). But in
that narrow corner we're discussing things it's sufficient IMO,
and probably more rewarding than restricting on the C function
implementation model.
How should we get principle insights on 'sizeof', what it is,
what it should be, etc., if we stay within this restricted C
world terminology, and discussing even a very special type of
a, umm.., function (sort of).
David Brown <david.brown@hesbynett.no> writes:
[...]
You are, quite obviously, guaranteed that in "cout << a << b << c",
the output was in order a, b, c. But that is a totally different
matter from the order of evaluation (and execution, for function
calls) of the subexpressions a, b, and c.
Perhaps I can help clarify this a bit (or perhaps muddy the waters
even further). I'll try to add a bit of C relevance at the bottom.
In `cout << a << b << c`, if a, b, and c are names of non-volatile
objects, the evaluation order doesn't matter. The values of a, b,
and c will be written to the standard output stream in that order,
in all versions of C++.
In `cout << x() << y() << z()`, it's also guaranteed that the
result of the call to `x()` will precede the result of the call to
`y()`, which will precede the result of the call to `z()`, in the
text written to the output stream. What's not guaranteed prior
to C++17 is the order in which the three functions will be called.
If none of the functions have side effects that affect the results
of the other two, or depend on non-local data, it doesn't matter.
If the functions return, say, a string representation of the current
time with nanosecond resolution, the three results can be in any
of 6 orders prior to C++17; in C++17 and later, the timestamps will
always be in increasing order.
C++ overloads the "<<" shift operator for output operations, so each
"<<" after `std::cout` is really a function call, but the rules for sequencing and order of evaluation are the same as for the built-in
"<<" integer shift operation. C++ could have imposed sequencing
requirements only on overloaded "<<" and ">>" operators, but that
would have been more difficult to specify in the standard.
C++17 added a new requirement that the evaluation of the left
operand of "<<" or ">>" is "sequenced before" the right operand,
meaning that any side effects of the evaluation of the left operand
must be complete before evaluation of the right operand begins
(though optimizations that don't change the visible behavior are
still allowed). It did not add such a requirement for the "+"
operator, which is overloaded for std::string concatenation.
[ snip example and prospect ]
On Thu, 25 Jan 2024 14:07:25 +0100, David Brown wrote:
"cout << one() << two() << three();"
Those C++ operators for I/O are a brain-dead idea. C-style printf formats actually work better.
I said - repeatedly - that the order of evaluation of the operands to
most operators is unspecified in C and C++. [...]
A typical example would be :
cout << "Start time: " << get_time() << "\n"
<< "Running tests... " << run_tests() << "\n"
<< "End time: " << get_time();
It was realistic - and indeed happened in some cases - for pre-C++17 compilers to generate the second "get_time()" call before "run_tests()",
and finally do the first "get_time()" call.
Alternatively, the compiler
could call "get_time()" twice, with "run_tests()" called either before
or after that pair. In all these cases, the user will see an output
that was not at all what they intended, with time appearing to go
backwards or the test apparently taking no time.
This was the case regardless of whether or not "get_time()" and
"run_tests()" had any side-effects.
You are, quite obviously, guaranteed that in "cout << a << b << c", the output was in order a, b, c. But that is a totally different matter
from the order of evaluation (and execution, for function calls) of the subexpressions a, b, and c.
I have said exactly what I intended to say in this thread, but I suspect
you have mistaken what the term "order of evaluation" means, and
therefore misunderstood what I wrote. I hope this is all clear to you now.
"cout << one() << two() << three();"
On 26.01.2024 22:16, Lawrence D'Oliveiro wrote:
On Thu, 25 Jan 2024 14:07:25 +0100, David Brown wrote:
"cout << one() << two() << three();"
Those C++ operators for I/O are a brain-dead idea. C-style printf
formats actually work better.
Well, no. There's a reason for using operators.
Also the stream hierarchy offers
design and implementation paths that you just don't have with printf().
You can do this with POSIX printf.
POSIX specifies an extension to printf that allows arguments to be re-ordered. For example:
printf("%2$s%1$s\n", "foo", "bar");
prints "barfoo".
ISO C does not have this feature.
C++'s `cout << ...` has advantages and disadvantages.
On 26/01/2024 18:59, Janis Papanagnou wrote:
On 26.01.2024 17:06, David Brown wrote:
On 26/01/2024 13:17, Malcolm McLean wrote:
We could say that in comp.lang.c "function" shall mean "a subroutine"
Why don't we just say - as everyone in this group except you already
says, that in c.l.c. "function" means "C function" as described in the C standards, and any other type of function needs to be qualified?
Thus "the tan function" here means the function from <math.h>, not the
mathematical function, or something done when making leather.
It really is not difficult.
Unless the discussion was done on a meta-level as opposed to a
concrete language specific implementation-model of a function,
or a concrete functions. - My impression from the posts upthread
was that we were taking on the meta-level to understand what we
actually have (with the 'sizeof' beast) or how to consider it
conceptually.
We are - probably futilely - trying to get Malcolm to understand that
even in "meta-level" discussions, it is vital to be clear what is meant
by terms. And "function" alone means "C function" in c.l.c. You might
often think it is obvious from the context whether someone means "C functions", "mathematical functions", or "wedding functions", but with Malcolm you /never/ know. It regularly means "Malcolm functions", which
have an approximate definition that might change at any time.
I also think that this is the key to not talk past each other.
The term "function" in computer science seems to have never been
an issue of dispute - I mean on a terminology level; explanations
in lectures or books were quite coherent, and since there was no
dispute everyone seems to have understood what a function is; in
computer science and in mathematics.
The term "function" is most certainly in dispute in computer science. It means different things - sometimes subtly, sometimes significantly - in
the context of different programming languages, or computation theory,
or mathematics.
A "C function" is different from a "Pascal function", a
"lambda calculus function", a "Turing machine function", or any other
kind of function definition you want to pick.
From my references it seems a consensus at least in that it's
reflecting a mathematical f: (x,y,...) -> (u,v,...) which is
projected at (or implemented by) some routine/procedure/method/
function, etc. - however it's called in any programming language.
No, that is only one kind of function.
There are all sorts of questions to ask.
Can functions have side effects?
Do functions have to have outputs? Do they have to have inputs?
Does a function have to give the same output for the same inputs?
Can a function give more than one output? Does a function actually have
to be executed as called, or can the language re-arrange things?
Is it valid to have a function that does not satisfy certain
requirements, if that function is never called?
Can functions operate on types? Can they operate on other functions?
Can they operate on whole programs?
Does the function include some kind of data store? Does it include the machine it executes on?
Does a function have to be executable? Does it even have to be
computable? Does it have to execute in a finite time?
Is a function a run-time entity, or a compile-time entity? Can it be
changed at run-time? Does it make sense to "run" a function at compile
time?
I'm sure we could go on.
The terminology certainly differs, but the interpretation less.
The problem is that the terminology is the same, but the interpretation
can be wildly different. In order to communicate, we must be sure that
a given term is interpreted in the same way by each person.
If we look deeper at the issue we can of course make academic
battles about other "function concepts" (my favorite example
is analogue computers; but that's extreme, of course). But in
that narrow corner we're discussing things it's sufficient IMO,
and probably more rewarding than restricting on the C function
implementation model.
I think we're fine sticking to "function" meaning "C function", which is
well defined by the C standards, and using "mathematical function" for mathematical functions, which are also quite solidly defined. Any other usage will need to be explained at the time.
How should we get principle insights on 'sizeof', what it is,
what it should be, etc., if we stay within this restricted C
world terminology, and discussing even a very special type of
a, umm.., function (sort of).
Sizeof is not a C function.
It is a C operator. If you don't know what
it is or how it works, or want the technical details, it's all in
6.5.3.4 of the C standards.
Trying to describe "sizeof" as a function of some sort with a different
kind of use of the word "function" really doesn't get us anywhere, as
shown in this thread. It is what it is - trying to mush it into another
term is not helpful.
On Fri, 26 Jan 2024 15:41:43 -0800, Keith Thompson wrote:
C++'s `cout << ...` has advantages and disadvantages.
Interesting about Java, with all its needless complexity and futile
attempts at simplification, that this was one decision it made correctly,
and that was not to copy those operators.
On Fri, 26 Jan 2024 22:46:25 +0100, Janis Papanagnou wrote:
Also the stream hierarchy offers
design and implementation paths that you just don't have with printf().
And that you don’t need, frankly.
Java manages just fine with printf-style formatting and “toString()” methods.
On 27.01.2024 00:52, Lawrence D'Oliveiro wrote:
On Fri, 26 Jan 2024 22:46:25 +0100, Janis Papanagnou wrote:
Also the stream hierarchy offers design and implementation paths that
you just don't have with printf().
And that you don’t need, frankly.
Don't be so fast with your judgment. Of course we use it to elegantly
and scalably solve tasks in C++.
Java (as a newer language) has also some advantages, but was in many
respects far behind C++ (IMO).
Java ...C++ ..
But that's anyway all off-topic here.
No, you are wrong, I'm not the owner of this piece of... code.
If someone makes a big heap of fecal matter in a public park, would
you think I'm the owner? I'd rather sue the one who did that;
because the park (or Usenet) is common property, and the heap
of fecal matter (or that code) is not.
All that you wrote below targets your last sentence
"those side-effects could be executed in any order".
For the examples we had, like (informally) cout<<a<<b<<c;
this is undisputed for the SIDE EFFECTS of "a", etc. You
had "hidden" those side effects in "one()", I gave in an
earlier post the more obvious example c++ in the context
of cout << c++ << c++ << c++ << endl; as side effects.
All side effects can be a problem (and should be avoided
unless "necessary").
On 26.01.2024 20:18, David Brown wrote: [...]
(I don't like the habit of introducing personalized terms like
"Malcolm functions"; this habit exposes more of the person who
introduced it than anything else. And anyway it would only muddy
the issue, not clarify it.)
A "C function" is different from a "Pascal function", a
"lambda calculus function", a "Turing machine function", or any other
kind of function definition you want to pick.
What relevance has any technical difference of "C functions"
and "Pascal functions"? - None.
It's not really important for our discussions to consider Algol's
ref, Pascal's var, C++'s const, or what else.
Yes. But remember that our question was not a technical one; wasn't
the question by the other poster (Malcolm?) about a mathematical
function term and how it fits to determine what 'sizeof' actually is
to be considered.
We disagree here; it may not appear so to you but get_time() actually
has a "side effect" (I put it in quotes, because it's literally no
"effect" but for the argument of its _sequencing problem_ it's a
relevant externality). It obtains (probably from a hardware device)
the time when the call happened.
On Sat, 27 Jan 2024 01:27:55 +0100, Janis Papanagnou wrote:
On 27.01.2024 00:52, Lawrence D'Oliveiro wrote:
On Fri, 26 Jan 2024 22:46:25 +0100, Janis Papanagnou wrote:
Also the stream hierarchy offers design and implementation paths that
you just don't have with printf().
And that you don’t need, frankly.
Don't be so fast with your judgment. Of course we use it to elegantly
and scalably solve tasks in C++.
But not localization, which is an important issue. printf-style
formatting allows rearrangement of parts of a message to suit grammar
purposes; C++-style output operators do not.
Java (as a newer language) has also some advantages, but was in many
respects far behind C++ (IMO).
It made many mistakes. The goal of trying to be simpler than C++ was I
think a failure.
[...] Personally I wanted just
"function" and for it to be clear from context that here the term did
not mean "subroutine".
It's hard to think of anything that can be passed to standard output
other than integers, floating point values, and strings. So you only
need three atomic operations.
You can then build complex objects consisting of integers, floats and
strings on top of those three basic operations. But the stream itself
should be locked down and not open to derivation.
[...]
On 27/01/2024 10:34, Janis Papanagnou wrote:
On 27.01.2024 04:05, Malcolm McLean wrote:
[...] Personally I wanted just
"function" and for it to be clear from context that here the term did
not mean "subroutine".
In my book; there's the "concept function" (mathematical), and the
mapping/implementation onto/in a computer (a "calculation routine").
The latter has just different names in different languages and it
naturally has different technical details. In any form its purpose
is to be an implemented instance of a formal mathematical concept.
Janis
I don't really see how "Bleep" is any sort of mathematical function. But
it is clearly a "subroutine".
What I am saying is that standard output can take integers, floats and strings.
So the stream should have some facilities for writing integers (leading
zeros, signs, maybe comma separators for thousands), some for floats
(rounding, precision, scientific notation etc.), some for strings (not
much you can do here other than just pass the raw characters).
Now when we've got those facilities and we are happy with them, that's
it. We don't allow further derivation of the stream to change the basic
behaviour. Now people might say "booleans, you've forgotten booleans,
surely when you pass booleans it should print "true" or "false"". No.
We'll handle that at a higher level and pass "true" and "false" as strings.
The disadvantage is that you are locked into an integer/float/string
paradigm. And it's not OO. But the advantage is that it will be stable.
On 1/26/24 12:31, Janis Papanagnou wrote:
All side effects can be a problem (and should be avoided
unless "necessary").
Virtually everything useful that a computer program does qualifies as a
side effect. Side effects cannot be avoided, they can only be controlled.
On Sat, 27 Jan 2024 01:27:55 +0100, Janis Papanagnou wrote:
On 27.01.2024 00:52, Lawrence D'Oliveiro wrote:
On Fri, 26 Jan 2024 22:46:25 +0100, Janis Papanagnou wrote:
Also the stream hierarchy offers design and implementation paths that
you just don't have with printf().
And that you don’t need, frankly.
Don't be so fast with your judgment. Of course we use it to elegantly
and scalably solve tasks in C++.
But not localization, which is an important issue. printf-style formatting allows rearrangement of parts of a message to suit grammar purposes, C++-style output operators do not.
On 26.01.2024 19:59, David Brown wrote:
I said - repeatedly - that the order of evaluation of the operands to
most operators is unspecified in C and C++. [...]
Yes, and this was undisputed.
A typical example would be :
cout << "Start time: " << get_time() << "\n"
<< "Running tests... " << run_tests() << "\n"
<< "End time: " << get_time();
It was realistic - and indeed happened in some cases - for pre-C++17
compilers to generate the second "get_time()" call before "run_tests()",
and finally do the first "get_time()" call.
Yes, we have no differences.
And the sample is fine to show how we should NOT implement such time measurements (or similar logic)!
A computer scientist or a sophisticated programmer would know that
there are run-times associated in such expressions:
cout << "S1" << f1() << "S2" << f2() << "S3" << f3();
(t1 ... t9 marking the successive evaluation points interleaved through the expression)
and he would act accordingly and serialize the expression (see below).
Alternatively, the compiler
could call "get_time()" twice, with "run_tests()" called either before
or after that pair. In all these cases, the user will see an output
that was not at all what they intended, with time appearing to go
backwards or the test apparently taking no time.
This was the case regardless of whether or not "get_time()" and
"run_tests()" had any side-effects.
We disagree here; it may not appear so to you but get_time() actually
has a "side effect" (I put it in quotes, because it's literally no
"effect" but for the argument of its _sequencing problem_ it's a
relevant externality). It obtains (probably from a hardware device)
the time when the call happened.
That's why somewhat experienced programmers would not write above
code that way; something like "run_tests()" is (typically) or can be
very time consuming, so they'd do
t0 = get_time(); res = run_tests(); t1 = get_time();
cout << ... etc.
(Note: This argument does NOT imply that a language shouldn't be made as bulletproof as possible and sensible.)
You are, quite obviously, guaranteed that in "cout << a << b << c", the
output was in order a, b, c. But that is a totally different matter
from the order of evaluation (and execution, for function calls) of the
subexpressions a, b, and c.
(It was meant as a "meta expression". I've addressed that in my
response to Keith already; please see there.)
I have said exactly what I intended to say in this thread, but I suspect
you have mistaken what the term "order of evaluation" means, and
therefore misunderstood what I wrote. I hope this is all clear to you now.
The order of evaluation of the '<<' was what I spoke about. The order
of the arguments had never been an issue. The "problem" with the order
of the arguments becomes a problem (without quotes) when side effects
of the arguments are inherent to the arguments.
You had been focused on the evaluation of the arguments (where side
effects might lead to unexpected behavior). I wasn't.
On 26/01/2024 19:18, David Brown wrote:
I think we're fine sticking to "function" meaning "C function", which
is well defined by the C standards, and using "mathematical function"
for mathematical functions, which are also quite solidly defined. Any
other usage will need to be explained at the time.
Basically I wanted "function" for C functions which are also
mathematical functions, and "procedure" for C functions which do not
meet the definition of mathematical functions. In context, of course.
And since this is normal, accepted usage, I thought it would be accepted
here.
On 27.01.2024 00:51, Lawrence D'Oliveiro wrote:
On Fri, 26 Jan 2024 15:41:43 -0800, Keith Thompson wrote:
C++'s `cout << ...` has advantages and disadvantages.
Interesting about Java, with all its needless complexity and futile
attempts at simplification, that this was one decision it made correctly,
and that was not to copy those operators.
Choosing these operators is a separate issue.
On 26.01.2024 20:18, David Brown wrote:
On 26/01/2024 18:59, Janis Papanagnou wrote:
On 26.01.2024 17:06, David Brown wrote:
On 26/01/2024 13:17, Malcolm McLean wrote:
(I don't like the habit of introducing personalized terms like
"Malcolm functions"; this habit exposes more of the person who
introduced it than anything else. And it anyway would only muddy
the issue not clarify.)
(I fear this thread will lead nowhere, but okay, I'll enter...)
A "C function" is different from a "Pascal function", a
"lambda calculus function", a "Turing machine function", or any other
kind of function definition you want to pick.
What relevance has any technical difference of "C functions"
and "Pascal functions"? - None.
Note: I don't want you to answer these questions. I suppose
you might have some substantial CS background (I certainly do)
and are not just spreading buzzwords.
Neither the technical (implementation) differences of the first
two types are relevant for the topics that have been discussed,
nor the algorithm theory definitions of the latter two function
types are relevant here.
From my references there seems to be consensus at least that it
reflects a mathematical f: (x,y,...) -> (u,v,...) which is
projected onto (or implemented by) some routine/procedure/method/
function, etc. - however it's called in any programming language.
No, that is only one kind of function.
That is an abstract representation from mathematics (and I am
not interested in syntactic differences to other forms) that can
be directly mapped to an algorithmic representation.
We write (for example [borrowed from a book]):
f: R x R x R -> R for the domains; R here: real numbers
f(r,R,h) -> pi/3 x h x (r^2 + r x R + R^2)
and in computer languages (for example) syntactic variants of:
f = (real r, real R, real h) real :
pi/3 * h * (r^2 + r * R + R^2)
The function from the language closely resembles that from the
mathematical domain.
There are all sorts of questions to ask.
Yes, but not many (none?) of significance in our discussion context
here.
How should we get principle insights on 'sizeof', what it is,
what it should be, etc., if we stay within this restricted C
world terminology, and discussing even a very special type of
a, umm.., function (sort of).
Sizeof is not a C function.
I know it's an operator in C. And I also wasn't saying that it's a
C function. - You still see the "(sort of)" in my statement. And we
already spoke about the close (but not exact) equivalences between
functions and operators.
It is a C operator. If you don't know what
it is or how it works, or want the technical details, it's all in
6.5.3.4 of the C standards.
If that's all the OP wanted to discuss it would be easy. You don't
even need any C standard document. Open any book, even the old K&R
is sufficient, and look up 'sizeof'. You can read about it being an
operator and fine. File closed. Goodbye. (What for was the original
question of this thread? I seem to recall something about the form
with parenthesis and type?)
Trying to describe "sizeof" as a function of some sort with a different
kind of use of the word "function" really doesn't get us anywhere, as
shown in this thread. It is what it is - trying to mush it into another
term is not helpful.
What would be the difference if the parenthesized form were
called a function, given that functions and operators are similar,
and the context so restricted?
I don't think you can get an address
of it (or can we?); but that again is just another implementation
detail (C specific).
The need for parentheses in sizeof(type) seems anyway to be only a
hack, necessary for type expressions with blanks, as in sizeof(struct x)?
Janis
BTW: There was another subthread about preprocessor use for NELEM determination using sizeof. When I looked up the K&R reference I
saw its use described even as a standard pattern to determine the
number of array elements. No wonder it became idiomatic.
On 27/01/2024 01:38, Lawrence D'Oliveiro wrote:
On Sat, 27 Jan 2024 01:27:55 +0100, Janis Papanagnou wrote:
On 27.01.2024 00:52, Lawrence D'Oliveiro wrote:
On Fri, 26 Jan 2024 22:46:25 +0100, Janis Papanagnou wrote:
Also the stream hierarchy offers design and implementation paths that you just don't have with printf().
And that you don’t need, frankly.
Don't be so fast with your judgment. Of course we use it to elegantly
and scalably solve tasks in C++.
But not localization, which is an important issue. printf-style formatting
allows rearrangement of parts of a message to suit grammar purposes,
C++-style output operators do not.
Standard printf formatting also does not allow such re-arrangements.
(My own key dislike about the C++ output streams is the mess of stateful
"IO manipulators".)
On 27.01.2024 00:52, Lawrence D'Oliveiro wrote:
On Fri, 26 Jan 2024 22:46:25 +0100, Janis Papanagnou wrote:
Also the stream hierarchy offers
design and implementation paths that you just don't have with printf().
And that you don’t need, frankly.
Don't be so fast with your judgment. Of course we use it to elegantly
and scalably solve tasks in C++.
Java manages just fine with printf-style formatting and “toString()” methods.
I tried to explain in my other post that it's not just about a format
(or a string-sequencing member function). But I'm sure one must be
deeper in the topic or have experienced (besides any supposed issues)
the sophisticated possibilities that C++ offers to support good design.
Java (as a newer language) has also some advantages, but was in many
respects far behind C++ (IMO).
David Brown <david.brown@hesbynett.no> writes:
On 27/01/2024 01:38, Lawrence D'Oliveiro wrote:
On Sat, 27 Jan 2024 01:27:55 +0100, Janis Papanagnou wrote:
On 27.01.2024 00:52, Lawrence D'Oliveiro wrote:
On Fri, 26 Jan 2024 22:46:25 +0100, Janis Papanagnou wrote:
Also the stream hierarchy offers design and implementation paths that you just don't have with printf().
And that you don’t need, frankly.
Don't be so fast with your judgment. Of course we use it to elegantly
and scalably solve tasks in C++.
But not localization, which is an important issue. printf-style formatting allows rearrangement of parts of a message to suit grammar purposes, C++-style output operators do not.
Standard printf formatting also does not allow such re-arrangements.
Depends on what standard you use. POSIX certainly does.
(My own key dislike about the C++ output streams is the mess of stateful
"IO manipulators".)
Hear! Hear!
The run-time cost of all those stateful manipulators isn't free, either.
On 27/01/2024 18:26, Scott Lurndal wrote:
David Brown <david.brown@hesbynett.no> writes:
(My own key dislike about the C++ output streams is the mess of stateful "IO manipulators".)
Hear! Hear!
The run-time cost of all those stateful manipulators isn't free, either.
For my own use, I've sometimes used classes letting you do :
debug_log << "X = " << x << " = 0x" << hex(x, 8) << "\n";
David Brown <david.brown@hesbynett.no> writes:
Depends on what standard you use. POSIX certainly does.
Standard printf formatting also does not allow such re-arrangements.
What I am saying is that standard output can take integers, floats and strings.
You can of course encode any data format as any other as long as you can write enough. But standard output can't take images or audio, for example.
$ cat a.out | xxd | head -1
00000000: cffa edfe 0c00 0001 0000 0000 0200 0000  ................
On 27/01/2024 21:06, Lawrence D'Oliveiro wrote:
On Sat, 27 Jan 2024 11:13:16 +0000, Malcolm McLean wrote:
What I am saying is that standard output can take integers, floats and
strings.
You forgot booleans. Also enumerations can be useful.
Yes, and we could say fixed point, complex, etc.
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
C and POSIX go together like a horse and carriage; one without the
other is a lot less useful.
Which is why horseless carriages never caught on.
On 27/01/2024 02:10, James Kuyper wrote:
On 1/26/24 12:31, Janis Papanagnou wrote:
All side effects can be a problem (and should be avoided
unless "necessary").
Virtually everything useful that a computer program does qualifies as a
side effect. Side effects cannot be avoided, they can only be controlled.
Try telling that to Haskell programmers :-)
David Brown <david.brown@hesbynett.no> writes:
For my own use, I've sometimes used classes letting you do :
debug_log << "X = " << x << " = 0x" << hex(x, 8) << "\n";
"hex(x, 8)" returns a value of a class holding "x" and the number of
digits 8, and then there is an overload for the << operator on this
class. No extra state needs to be stored in the logging class, I can
make as many of these formatters as I like, and the intermediary
classes all disappear in the optimisation.
Or hex() could just return a std::string.
On 1/27/24 10:44, David Brown wrote:
On 27/01/2024 02:10, James Kuyper wrote:
On 1/26/24 12:31, Janis Papanagnou wrote:
All side effects can be a problem (and should be avoided
unless "necessary").
Virtually everything useful that a computer program does qualifies as a
side effect. Side effects cannot be avoided, they can only be controlled.
Try telling that to Haskell programmers :-)
I was talking very specifically in reference to C's definition of "side-effect". I'm not particularly familiar with Haskell - does it have
a different definition of "side effect", or does it somehow get
something useful done without qualifying under C's definition? If so, how?
On Sat, 27 Jan 2024 17:26:24 GMT, Scott Lurndal wrote:
David Brown <david.brown@hesbynett.no> writes:
Depends on what standard you use. POSIX certainly does.
Standard printf formatting also does not allow such re-arrangements.
C and POSIX go together like a horse and carriage; one without the other
is a lot less useful.
David Brown <david.brown@hesbynett.no> writes:
[...]
Seriously, how hard would it be for you to accept the usage of
"function" to mean "C function" in this group? How difficult would it
be for you to try to speak the same language as the rest of us? Do
you really expect everyone else to adapt to suit your personal choice
of definitions? How often do you need to go round the same circles
again and again, instead of trying to communicate with people in a
sane manner?
You don't really think difficulty is the issue, do you?
On 28/01/2024 01:26, Keith Thompson wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 27/01/2024 21:06, Lawrence D'Oliveiro wrote:
On Sat, 27 Jan 2024 11:13:16 +0000, Malcolm McLean wrote:
What I am saying is that standard output can take integers, floats and
strings.
You forgot booleans. Also enumerations can be useful.
Yes, and we could say fixed point, complex, etc.
Exactly.
It's not inherently a bad idea to extend our little stdout interface
to include booleans. But in fact there are too many output formats you
might need.
Fixed point - in C or C++ there's no standard for that, so now you are
going the OO route. As you would with enumerations as the symbol
doesn't exist at runtime.
It's not that there is no case to be made for the OO approach. What I
am saying is that in practice the locked down restricted interface
will work better.
I think you mean it will work Malcolm-better.
Apparently inflexibility and vulnerability to type errors are
Malcolm-better than the alternative.
Inflexibility can be better. Because in reality most programs work with
a restricted set of data types which it makes sense to pass to a text
stream, and so you only need three atomic types.
Type errors are of course a nuisance with printf(). But that's because
of the quirks of C, not because it takes a restricted set of types, and
you can write a different restricted interface without this problem.
The fact is that printf(), which works basically as I recommend, is
widely used as the interface to standard output, and often OO
alternatives are available and not used for various reasons.
So the world is in fact "Malcolm better".
On 28/01/2024 02:59, Malcolm McLean wrote:
[...]
You mean the Malcolm-world is Malcolm-better with these restrictions,
because in the Malcolm-world the only programming tasks that are done
are Malcolm-tasks, and the programmers are all Malcolm-programmers.
On 28/01/2024 12:00, David Brown wrote:
[...]
You put non-ASCII text on stdout?
I mean, obviously in a program for international use itself. But in a
routine program for general use?
how is
cout << std::hex << std::setw((bits + 3)/4) << value << std::endl;
better than
printf("%*x\n", (bits + 3)/4, value);
Standard output is any sequence of ASCII characters.
printf() is the
main C interface to that, and supports integers, floats and strings, to
a first approximation.
You can of course encode any data format as any other as long as you can write enough. But standard output can't take images or audio, for example.
The OO method is to allow the stream to be extended. So, in one common system, we might have a "decimal" stream which takes floats and outputs in the format 123.456. Then we could derive a different type of stream from
that which outputs floats as 1.23456e2. [...]
[...]
On 28/01/2024 12:00, David Brown wrote:
On 28/01/2024 02:59, Malcolm McLean wrote:You put non-ASCII text on stdout?
On 28/01/2024 01:26, Keith Thompson wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:Exactly.
On 27/01/2024 21:06, Lawrence D'Oliveiro wrote:
On Sat, 27 Jan 2024 11:13:16 +0000, Malcolm McLean wrote:Yes, and we could say fixed point, complex, etc.
What I am saying is that standard output can take integers,You forgot booleans. Also enumerations can be useful.
floats and
strings.
It's not inherently a bad idea to extend our little stdout interface
to include booleans. But in fact there are too many output formats you
might need.
Fixed point - in C or C++ there's no standard for that, so now you are
going the OO route. As you would with enumerations as the symbol
doesn't exist at runtime.
It's not that there is no case to be made for the OO approach. What I
am saying is that in practice the locked down restricted interface
will work better.
I think you mean it will work Malcolm-better.
Apparently inflexibility and vulnerability to type errors are
Malcolm-better than the alternative.
Inflexibility can be better. Because in reality most programs work
with a restricted set of data types which it makes sense to pass to a
text stream, and so you only need three atomic types.
Type errors are of course a nuisance with printf(). But that's
because of the quirks of C, not because it takes a restricted set of
types, and you can write a different restricted interface without
this problem.
The fact is that printf(), which works basically as I recommend, is
widely used as the interface to standard output, and often OO
alternatives are available and not used for various reasons.
So the world is in fact "Malcolm better".
You mean the Malcolm-world is Malcolm-better with these restrictions,
because in the Malcolm-world the only programming tasks that are done
are Malcolm-tasks, and the programmers are all Malcolm-programmers.
At least that's all cleared up nicely, and the rest of the world can
go back to using more than three types, and generating outputs that
are not just ASCII text.
I mean, obviously in a program for international use itself. But in a
routine program for general use?
On 26/01/2024 22:30, Janis Papanagnou wrote:
On 26.01.2024 19:59, David Brown wrote:
A computer scientist or a sophisticated programmer would know that
there are run-times associated in such expressions:
cout << "S1" << f1() << "S2" << f2() << "S3" << f3();
(t1 ... t9 marking the successive evaluation points interleaved through the expression)
The experienced or knowledgable C++ programmer (prior to C++17) would
know that the parts here are not necessarily executed in the order you
give.
[...]
That's why somewhat experienced programmers would not write above
code that way; something like "run_tests()" is (typically) or can be
very time consuming, so they'd do
t0 = get_time(); res = run_tests(); t1 = get_time();
cout << ... etc.
Of course.
In practice, they could still be badly wrong even with that code -
there's a lot of subtle points to consider when trying to time code, and
my experience is that very few programmers get it entirely right.
[...]
On 27/01/2024 21:59, Lawrence D'Oliveiro wrote:
On Sat, 27 Jan 2024 17:26:24 GMT, Scott Lurndal wrote:
David Brown <david.brown@hesbynett.no> writes:
Depends on what standard you use. POSIX certainly does.
Standard printf formatting also does not allow such re-arrangements.
C and POSIX go together like a horse and carriage; one without the
other is a lot less useful.
To the nearest percent, 0% of all systems running C programs support
POSIX (or Windows, or any other "big" system). The world of small
embedded systems totally outweigh "big" systems by many orders of
magnitude. And perhaps 80% of such small systems are programmed in C.
A lot (for some interpretations of "a lot") of embedded systems run
Android. Those aren't the one David was talking about.
On Sun, 28 Jan 2024 14:49:53 -0800, Keith Thompson wrote:
A lot (for some interpretations of "a lot") of embedded systems run
Android. Those aren't the one David was talking about.
They have a POSIX-type C runtime. Which does support “%«n»$” for reordering args to the printf routines.
The point being the prevalence of POSIX is a little larger than you give
it credit for.
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
The point being the prevalence of POSIX is a little larger than you
give it credit for.
Again, David wasn't talking about Android systems.
On 27.01.2024 16:43, David Brown wrote:
On 26/01/2024 22:30, Janis Papanagnou wrote:
That's why somewhat experienced programmers would not write above
code that way; something like "run_tests()" is (typically) or can be
very time consuming, so they'd do
t0 = get_time(); res = run_tests(); t1 = get_time();
cout << ... etc.
Of course.
You can serialize (as I suggested previously as one example) or
embed functions like take_time(run_tests()) as another example.
In practice, they could still be badly wrong even with that code -
there's a lot of subtle points to consider when trying to time code, and
my experience is that very few programmers get it entirely right.
Really? - I mostly dealt with folks, even newbies with a
proper CS education, who had enough experience or knowledge.
Most problems appeared in contexts where the languages used
have inherent design issues; and not in every case could we
avoid use of such languages in the first place.
On Sun, 28 Jan 2024 12:53:36 +0100, David Brown wrote:
On 27/01/2024 21:59, Lawrence D'Oliveiro wrote:
On Sat, 27 Jan 2024 17:26:24 GMT, Scott Lurndal wrote:
David Brown <david.brown@hesbynett.no> writes:
Depends on what standard you use. POSIX certainly does.
Standard printf formatting also does not allow such re-arrangements.
C and POSIX go together like a horse and carriage; one without the
other is a lot less useful.
To the nearest percent, 0% of all systems running C programs support
POSIX (or Windows, or any other "big" system). The world of small
embedded systems totally outweigh "big" systems by many orders of
magnitude. And perhaps 80% of such small systems are programmed in C.
And a lot of those “embedded” systems are running Android.
Android ships as many units per year as the entire installed base of Microsoft Windows.
On Sun, 28 Jan 2024 17:48:53 -0800, Keith Thompson wrote:
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
The point being the prevalence of POSIX is a little larger than you
give it credit for.
Again, David wasn't talking about Android systems.
No, I was, as an example of the sort of POSIX system he thought was too minuscule to worry about.
On Sun, 28 Jan 2024 14:49:53 -0800, Keith Thompson wrote:
A lot (for some interpretations of "a lot") of embedded systems run
Android. Those aren't the one David was talking about.
They have a POSIX-type C runtime. Which does support “%«n»$” for reordering args to the printf routines.
The point being the prevalence of POSIX is a little larger than you give
it credit for.
On 27.01.2024 17:46, David Brown wrote:
[...]
FYI: Too long to read at the moment. (Maybe later, maybe not.)
On 28/01/2024 18:24, David Brown wrote:
On 28/01/2024 17:09, Malcolm McLean wrote:
You put non-ASCII text on stdout?
I mean, obviously in a program for international use itself. But in a
routine program for general use?
I commonly write out in UTF-8 - it does not have to be
"international". (I assume that by "international" you, as a good
Brit, mean "not UK". After all, a program written solely for use in
Norwegian is not international.)
I'd expect that most general purpose programs written by Norwegians use
an English interface, even if it isn't really expected that the program
will find an audience beyond some users in Norway. Except of course for
programs which in some way are about Norway.
Sometimes I will have binary data of some kind on the standard output.
It's a lot less common, but it happens. A common example would be
code for generating images or other files for a webserver.
Most of my "real" programs, rather than small utilities, are for
embedded systems where the concept of "standard output" is not really
the same as for PC's.
I've never used standard output for binary data. It might be necessary
for webservers that serve images. But it strikes me as a poor design
decision.
And in environments like POSIX that don't distinguish between text and
binary output streams, it can be perfectly sensible (though not 100%
portable) to send binary data to stdout.
On 28/01/2024 21:43, Lawrence D'Oliveiro wrote:
On Sun, 28 Jan 2024 12:53:36 +0100, David Brown wrote:
On 27/01/2024 21:59, Lawrence D'Oliveiro wrote:
On Sat, 27 Jan 2024 17:26:24 GMT, Scott Lurndal wrote:
David Brown <david.brown@hesbynett.no> writes:
Depends on what standard you use. POSIX certainly does.
Standard printf formatting also does not allow such re-arrangements.
C and POSIX go together like a horse and carriage; one without the
other is a lot less useful.
To the nearest percent, 0% of all systems running C programs support
POSIX (or Windows, or any other "big" system). The world of small
embedded systems totally outweigh "big" systems by many orders of
magnitude. And perhaps 80% of such small systems are programmed in C.
And a lot of those “embedded” systems are running Android.
No, they are not. Android is Linux, and is included in the 0%.
Android ships as many units per year as the entire installed base of
Microsoft Windows.
Sure. And it is still within the 0%.
Take your car as an example. There's a reasonable chance, if it is
modern, that the entertainment and navigation system is running Android.
You might have a couple of other parts running embedded Linux of other types. And you might have 100 other microcontrollers running programs written in C, but not running a "big" POSIX OS. Some will run RTOS's,
some will be bare metal.
On the computer on your desk, you have a microcontroller in your mouse, keyboard, webcam, screen, harddisk, managed switch. Your printer might
have some kind of embedded Linux for its display and UI, but probably
has many other microcontrollers in it. Your toaster, oven, fridge,
alarm clock, digital thermometer - microcontrollers are everywhere.
Even your typical Android device - a phone or tablet - will have a few separate microcontrollers, and a variety of bits and pieces in its SoC
that are programmed in C but do not have a POSIX system.
On 29/01/2024 12:35, David Brown wrote:
On 28/01/2024 21:43, Lawrence D'Oliveiro wrote:
On Sun, 28 Jan 2024 12:53:36 +0100, David Brown wrote:
On 27/01/2024 21:59, Lawrence D'Oliveiro wrote:
And a lot of those “embedded” systems are running Android.
On Sat, 27 Jan 2024 17:26:24 GMT, Scott Lurndal wrote:
David Brown <david.brown@hesbynett.no> writes:
Depends on what standard you use. POSIX certainly does.
Standard printf formatting also does not allow such re-arrangements.
C and POSIX go together like a horse and carriage; one without the
other is a lot less useful.
To the nearest percent, 0% of all systems running C programs support
POSIX (or Windows, or any other "big" system). The world of small
embedded systems totally outweigh "big" systems by many orders of
magnitude. And perhaps 80% of such small systems are programmed in C.
No, they are not. Android is Linux, and is included in the 0%.
Android ships as many units per year as the entire installed base of
Microsoft Windows.
Sure. And it is still within the 0%.
Take your car as an example. There's a reasonable chance, if it is
modern, that the entertainment and navigation system is running
Android. You might have a couple of other parts running embedded
Linux of other types. And you might have 100 other microcontrollers
running programs written in C, but not running a "big" POSIX OS. Some
will run RTOS's, some will be bare metal.
On the computer on your desk, you have a microcontroller in your
mouse, keyboard, webcam, screen, harddisk, managed switch. Your
printer might have some kind of embedded Linux for its display and UI,
but probably has many other microcontrollers in it. Your toaster,
oven, fridge, alarm clock, digital thermometer - microcontrollers are
everywhere.
I think this is being disingenuous. Of course there are countless
millions of integrated circuits used everywhere, that will outnumber the packaged consumer devices that everyone knows about.
Some of them may have programmable elements. But, no matter how crude,
how limited, if somebody, somewhere, has configured a program to turn a subset of C into code for that device, that enables you to add that to
the list of systems you claim are programmed in 'C'.
Even if it relies on dedicated extensions or uses lots of inline assembly.
Even your typical Android device - a phone or tablet - will have a few
separate microcontrollers, and a variety of bits and pieces in its SoC
that are programmed in C but do not have a POSIX system.
Maybe you can count each main CPU and each core separately too!
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 27.01.2024 21:17, Malcolm McLean wrote:
[...]
printf() is the
main C interface to that, and supports integers, floats and strings, to
a first approximation.
It's not an approximation; printf() is _restricted_ to these types (and
a few more variants of these few basic types, to be correct).
printf also supports pointer values with "%p". And it supports single
characters, which are not strings.
Strings of course absolutely do not have to be ASCII. Using printf to
print data with embedded null bytes is tricky
-- but of course printf is
not the only interface. We can print arbitrary data with putchar,
fwrite, etc.
And in environments like POSIX that don't distinguish between text and
binary output streams,
it can be perfectly sensible (though not 100%
portable) to send binary data to stdout.
On 27/01/2024 11:36, Janis Papanagnou wrote:
On 27.01.2024 12:02, Malcolm McLean wrote:
On 27/01/2024 10:34, Janis Papanagnou wrote:
On 27.01.2024 04:05, Malcolm McLean wrote:
In many languages, including C, there's a difference between functions
that return a value and functions that don't, in that
if (realloc(ptr, 0))
is allowed
whilst
if (free(ptr))
On 29/01/2024 19:32, Malcolm McLean wrote:
On 27/01/2024 11:36, Janis Papanagnou wrote:
On 27.01.2024 12:02, Malcolm McLean wrote:
On 27/01/2024 10:34, Janis Papanagnou wrote:
On 27.01.2024 04:05, Malcolm McLean wrote:
In many languages, including C, there's a difference between functions
that return a value and functions that don't, in that
In some languages, yes.
if (realloc(ptr, 0))
is allowed
whilst
if (free(ptr))
struct S { int a, b; };
struct S foo(void);
foo() returns a value, but "if (foo())" is not allowed.
C does not make much difference between functions that return a value,
and those that don't. The key distinction is whether the "return"
statement must have an expression or must not have an expression.
On 29/01/2024 16:18, David Brown wrote:
On 28/01/2024 20:49, Malcolm McLean wrote:
On 28/01/2024 18:24, David Brown wrote:
I'd expect that most general purpose programs written by Norwegians
use an English interface, even if it isn't really expected that the
program will find an audience beyond some users in Norway. Except of
course for programs which in some way are about Norway.
Why?
Generally programmers are educated people and educated people use
English for serious purposes.
Not always of course and Norway might be
an exception. But I'd expect that in a Norwegian university, for
example, it would be forbidden to document a program in Norwegian or to
use non-English words for identifiers. And probably the same in a large
Norwegian company. I might be wrong about that and I have never visited
Norway or worked for a Norwegian employer (and obviously I couldn't do
so unless the policy I expect was followed).
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
On 2024-01-29, David Brown <david.brown@hesbynett.no> wrote:
On 29/01/2024 19:32, Malcolm McLean wrote:
On 27/01/2024 11:36, Janis Papanagnou wrote:
On 27.01.2024 12:02, Malcolm McLean wrote:
On 27/01/2024 10:34, Janis Papanagnou wrote:
On 27.01.2024 04:05, Malcolm McLean wrote:
In many languages, including C, there's a difference between functions
that return a value and functions that don't, in that
In some languages, yes.
if (realloc(ptr, 0))
is allowed
whilst
if (free(ptr))
struct S { int a, b; };
struct S foo(void);
foo() returns a value, but "if (foo())" is not allowed.
C does not make much difference between functions that return a value,
and those that don't. The key distinction is whether the "return"
statement must have an expression or must not have an expression.
Don't forget that we can have:
struct S s = foo();
not to mention
struct S bar(void) { return foo(); }
as well as:
extern void bar(struct S);
bar(foo());
none of which patterns is possible if foo returns void.
A void return is qualitatively different. A function which returns
a value can plausibly belong into the functional domain. A function
which returns void is necessarily an imperative procedure.
Even if it does nothing, a void foo() function is a procedure in that
it cannot be planted into a functional expression like bar(foo()).
So we can identify an emergent category there.
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans
printf ("%s\n", "My\0string");
which won't work as some may expect (you will only see "My").
If you exclude obvious cases like
phones, tablets, and smart TVs, there are many more embedded Linux
systems that are not Android, than embedded Android systems. Those are
all POSIX too. And yet they are all part of the 0%.
... I don't think
my use of standard output is all that untypical. It's unacceptable for anything released to customers and is used mainly for debugging.
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it might
have unusual effects if passed through systems designed to handle
human-readable text. For instance in some systems designed to receive
ASCII text, there is no distinction between the nul byte and "waiting
for next data byte". Obviously this will cause difficulties if the data
is binary.
Also many binary formats can't easily be extended, so you can pass one
image and that's all. While it is possible to devise a text format
which is similar, in practice text formats usually have enough
redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and harder to
extend.
On 29/01/2024 20:10, David Brown wrote:
Sure. But Malcolm suggested that the "if" pattern was a special
distinguishing feature. (He has already made it clear that the only
types of interest, in his world, are integers, floats and strings.)
No, Malcolm gave if() as an example of a distinction in a language
grammar between functions that return a value and functions that don't.
Sometimes functions return a value and if() still isn't allowed. Fair
enough point. But it doesn't really detract from the point that Malcolm
is making.
On 30/01/2024 07:27, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it might
have unusual effects if passed through systems designed to handle
human-readable text. For instance in some systems designed to receive
ASCII text, there is no distinction between the nul byte and "waiting
for next data byte". Obviously this will cause difficulties if the data
is binary.
Also many binary formats can't easily be extended, so you can pass one
image and that's all. While it is possible to devise a text format
which is similar, in practice text formats usually have enough
redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and harder to
extend.
Your reasoning is all gobbledygook. Your comments reflect only
limitations in your thinking, not any essential truth about using
standard out for binary data.
I must admit that it's nothing I have ever done or considered doing.
However standard output is designed for text and not binary output.
Whilst there is a "printf()" which operates on standard output by
default, there are no functions which write binary data to standard
output by default, for example. Though of course you can pass stdout to
the regular binary output functions like fwrite().
So I'm obviously not the only person to take the view that passing
binary data to standard output is a rather odd thing to do.
I suspect the truth is that it is a bad design and I am right, but
because for some reason communications have to be via standard output,
people make the best of it and contrive that it shall work, and then
forget that essentially it is a misuse of a text stream. They are
slightly proud of their efforts and intolerant of my point.
That you couldn't actually mount a defence of your position whilst I
could also strongly implies that I am right.
On 30/01/2024 07:27, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it might
have unusual effects if passed through systems designed to handle
human-readable text. For instance in some systems designed to receive
ASCII text, there is no distinction between the nul byte and "waiting
for next data byte". Obviously this will cause difficulties if the data
is binary.
Also many binary formats can't easily be extended, so you can pass one
image and that's all. While it is possible to devise a text format
which is similar, in practice text formats usually have enough
redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and harder to
extend.
Your reasoning is all gobbledygook. Your comments reflect only
limitations in your thinking, not any essential truth about using
standard out for binary data.
I must admit that it's nothing I have ever done or considered doing.
However standard output is designed for text and not binary output.
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
I must admit that it's nothing I have ever done or considered doing.
However standard output is designed for text and not binary output.
That was never the case. stdout is an unformatted stream of bytes
associated by default with file descriptor number one in the
application.
Long before Windows was even a gleam in Gates' eye.
On 30/01/2024 15:00, Scott Lurndal wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
I must admit that it's nothing I have ever done or considered doing.
However standard output is designed for text and not binary output.
That was never the case. stdout is an unformatted stream of bytes
associated by default with file descriptor number one in the
application.
It is a stream of bytes at the level that the file descriptor is used to
generate a write event for a byte which can be arbitrary. But standard
output is often quickly transformed into a stream of characters.
Sometimes within the application executable.
those who use the less common, often more expensive Unix systems.
On 30/01/2024 15:00, Scott Lurndal wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
I must admit that it's nothing I have ever done or considered doing.
However standard output is designed for text and not binary output.
That was never the case. stdout is an unformatted stream of bytes
associated by default with file descriptor number one in the
application.
Long before Windows was even a gleam in Gates' eye.
I don't know what Windows has to do with it.
The difference between text and binary byte streams is something
invented by C, so that conversions could be done for byte '\n' on
systems with alternate line-endings.
On 30/01/2024 13:03, David Brown wrote:
On 30/01/2024 10:13, Malcolm McLean wrote:
There is no standard C library function that takes stderr as the default
stream. Does that mean stderr was not designed to be used at all?
"printf" exists and works the way it does because it is convenient and
useful. It can be viewed as a short-cut for "fprintf(stdout, ...".
Indeed, that is /exactly/ how the C standard describes the function.
That means the C standards acknowledge that people often want to print
out formatted text (which in no way implies plain ASCII) to stdout. This
does not mean they expect this to be the /only/ use of stdout, or that
people will not use binary outputs to stdout, any more than it implies
that text output will always be sent to stdout and not other streams or
files.
Special facilities for text don't necessarily mean that text is the only
output intended to be used, fair enough.
printf has no binary data format specifier.
The fact that there is no
similar function for standard error...
Similarly there is no
function "write" that passes binary data to standard output by default.
bart <bc@freeuk.com> writes:
On 30/01/2024 15:00, Scott Lurndal wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
I must admit that it's nothing I have ever done or considered doing.
However standard output is designed for text and not binary output.
That was never the case. stdout is an unformatted stream of bytes
associated by default with file descriptor number one in the
application.
Long before Windows was even a gleam in Gates' eye.
I don't know what Windows has to do with it.
The difference between text and binary byte streams is something
invented by C, so that conversions could be done for byte '\n' on
systems with alternate line-endings.
No, it was invented to support Windows CRLF line endings.
Regardless of your digression, stdout is still an unformatted
stream of bytes. Any structure on that stream is imposed
by the -consumer- of those bytes.
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 30/01/2024 13:03, David Brown wrote:
On 30/01/2024 10:13, Malcolm McLean wrote:
There is no standard C library function that takes stderr as the default
stream. Does that mean stderr was not designed to be used at all?
"printf" exists and works the way it does because it is convenient and
useful. It can be viewed as a short-cut for "fprintf(stdout, ...".
Indeed, that is /exactly/ how the C standard describes the function.
That means the C standards acknowledge that people often want to print
out formatted text (which in no way implies plain ASCII) to stdout. This
does not mean they expect this to be the /only/ use of stdout, or that
people will not use binary outputs to stdout, any more than it implies
that text output will always be sent to stdout and not other streams or
files.
Special facilities for text don't necessarily mean that text is the only
output intended to be used, fair enough.
Even text is just an unformatted stream of bytes. It is the ultimate
consumer of that text that imposes structure on it (e.g. by treating
it as ASCII, UTF-16, UTF-8, UTF-32, EBCDIC, et cetera, et alia, und so
weiter).
printf has no binary data format specifier.
%s? Simply copies non-nul bytes. That's almost as binary as
one can get; it certainly isn't restricted to printable characters.
And of course, there are putc and putchar.
Not to mention using printf where the format string argument
includes binary data.
<snip>
The fact that there is no
similar function for standard error
fprintf(stderr, "%s", binary_data_with_no_embedded_nul_bytes);
. Similarly there is no
function "write" that passes binary data to standard output by default.
In the real world, and in the world C was created to support,
there are several functions (write, pwrite, mmap, lio_listio, aio_read,
aio_write et cetera, et alia, und so weiter).
Most of which existed before 1989.
On 30/01/2024 13:03, David Brown wrote:
On 30/01/2024 10:13, Malcolm McLean wrote:
Special facilities for text don't necessarily mean that text is the only
output intended to be used, fair enough.
There is no standard C library function that takes stderr as the
default stream. Does that mean stderr was not designed to be used at
all?
"printf" exists and works the way it does because it is convenient and
useful. It can be viewed as a short-cut for "fprintf(stdout, ...".
Indeed, that is /exactly/ how the C standard describes the function.
That means the C standards acknowledge that people often want to print
out formatted text (which in no way implies plain ASCII) to stdout.
This does not mean they expect this to be the /only/ use of stdout, or
that people will not use binary outputs to stdout, any more than it
implies that text output will always be sent to stdout and not other
streams or files.
printf has no binary data format specifier.
And as you say, the fact
that it is provided is an acknowledgement that programmers often want to
pass formatted text to standard output.
The fact that there is no
similar function for standard error suggests that wanting to pass
formatted text to error is a less common requirement.
Which is my experience for the sort of programming that I do. Similarly
there is no function "write" that passes binary data to standard output
by default.
So this suggests that passing binary data to standard output is a less
common requirement. And in fact on many systems standard output will
corrupt such data in its default mode.
So these three things together - no binary data format specifer for
printf(), no binary equivalent function to printf that defaults to
standard output, and the fact that standard output will corrupt binary
data in default mode on some systems, adds up to a pretty powerful
argument for my position.
Do learn to think. I've given coherent, reasonable, justifications that are open to dispute on their own terms.
If I claim grass is pink, and I know this because it is the same
colour as the sea which is also pink, then I have given a
justification and a defence of my position. That doesn't mean it is
worth the pixels it is written with, or that anyone needs to elaborate
when they say I am talking nonsense.
It is so blindingly clear and obvious that stdout is regularly used
for non-text data, and so many undeniably accurate and common examples
have been given, that your position is entirely untenable.
That you are capable of inventing an
incoherent argument on a different topic proves nothing even by analogy except, to be fair, that it is plausible that people will make bad
arguments.
And apart from "that's how you have to do it to make a web server work
under Unix", I haven't seen much of anything in this sub thread which constitutes a good argument for passing binary data to standard output.
On 30/01/2024 15:54, Scott Lurndal wrote:
bart <bc@freeuk.com> writes:
On 30/01/2024 15:00, Scott Lurndal wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
I must admit that it's nothing I have ever done or considered doing.
However standard output is designed for text and not binary output.
That was never the case. stdout is an unformatted stream of bytes
associated by default with file descriptor number one in the
application.
Long before Windows was even a gleam in Gates' eye.
I don't know what Windows has to do with it.
The difference between text and binary byte streams is something
invented by C, so that conversions could be done for byte '\n' on
systems with alternate line-endings.
No, it was invented to support Windows CRLF line endings.
You just want to have a go at Windows don't you?
I was using CRLF line-endings in the 1970s; they weren't an invention of
Windows, which didn't exist until the mid-80s and didn't become popular
until the mid-90s.
So, how did C deal with CRLF in all those non-Windows settings?
Regardless of your digression, stdout is still an unformatted
stream of bytes. Any structure on that stream is imposed
by the -consumer- of those bytes.
Of course. But it still a bad idea to write actual output that you KNOW
does not represent text, to a consumer that will expect text.
For example to a terminal window, which can happen if you forget to
redirect it.
On 30/01/2024 16:06, Scott Lurndal wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
Special facilities for text don't necessarily mean that text is the only
output intended to be used, fair enough.
Even text is just an unformatted stream of bytes. It is the ultimate
consumer of that text that imposes structure on it (e.g. by treating
it as ASCII, UTF-16, UTF-8, UTF-32, EBCDIC, et cetera, et alia, und so
weiter).
No. If we know that text is ASCII it is not highly structured.
On 30/01/2024 16:49, Malcolm McLean wrote:
The fact that there is no
similar function for standard error suggests that wanting to pass
formatted text to error is a less common requirement.
stderr is a newer invention than stdout and stdin.
On 30/01/2024 16:25, David Brown wrote:
On 30/01/2024 16:49, Malcolm McLean wrote:
[stderr less used than stdout]
Which is my experience for the sort of programming that I do.
Similarly there is no function "write" that passes binary data to
standard output by default.
What would that gain? One fewer parameter to fwrite() ?
Yes. printf() could easily have been omitted and fprintf() only
provided.
You just want to have a go at Windows don't you?
I was using CRLF line-endings in the 1970s; they weren't an invention of
Windows, which didn't exist until the mid-80s and didn't become
popular until the mid-90s.
CRLF line endings were the invention of printers or teletype machines.
It took time to move print heads from the end of one line to the
beginning of the next, and separating the "carriage return" and "line
feed" commands made timings easier. It also let printer implementers
handle the two operations independently - occasionally people would want
to do one but not the other.
The use of CRLF as a standard for line endings in files was, I believe,
from CP/M - which came after Unix and Multics, which had standardised on
LF line endings. (Most OS's before that made things up as they chose, rather than being "standard", or used record-based files, punched cards, etc.)
So CRLF precedes Windows quite significantly.
(I have no idea why Macs picked CR - perhaps they just wanted to be different.)
So, how did C deal with CRLF in all those non-Windows settings?
The difference between "text" and "binary" streams in C is, in practice,
up to the implementation. That can be the implementation of the C
library, or the OS functions (or DLLs/SOs) that the C library calls. The
norm is that you use "\n" for line endings in the C code - what happens
after that, for text streams, is beyond C.
The reason C distinguishes between text and binary streams is that some
OS's distinguish between them.
Regardless of your digression, stdout is still an unformatted
stream of bytes. Any structure on that stream is imposed
by the -consumer- of those bytes.
Of course. But it still a bad idea to write actual output that you
KNOW does not represent text, to a consumer that will expect text.
That's just a specific example of "it's a bad idea for a program to
behave in a way that a reasonable user would not expect". Which is, of course, true - but not a big surprise.
For example to a terminal window, which can happen if you forget to
redirect it.
If the program can reasonably be expected to generate binary output,
then it is the user's fault if they do this accidentally. Examples
shown in this thread include cat and zcat - that's what these programs do.
Sometimes people make mistakes, and try to "cat" (or "type") non-text files. Mistakes happen.
On 30/01/2024 16:49, David Brown wrote:
Sometimes people make mistakes, and try to "cat" (or "type") non-text
files. Mistakes happen.
If you routinely write pure binary data to stdout, then users are going
to see garbage a lot more often.
On 29/01/2024 21:00, Keith Thompson wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 16:18, David Brown wrote:
On 28/01/2024 20:49, Malcolm McLean wrote:
On 28/01/2024 18:24, David Brown wrote:
I'd expect that most general purpose programs written by Norwegians
use an English interface, even if it isn't really expected that the
program will find an audience beyond some users in Norway. Except
of course for programs which in some way are about Norway.
Why?
Generally programmers are educated people and educated people use
English for serious purposes. Not always of course and Norway might be
an exception. But I'd expect that in a Norwegian university, for
example, it would be forbidden to document a program in Norwegian or
to use non-English words for identifiers. And probably the same in a
large Norwegian company. I might be wrong about that and I have never
visited Norway or worked for a Norwegian employer (and obviously I
couldn't do so unless the policy I expect was followed).
You assert that "educated people use English for serious purposes".
I don't have the experience to refute that claim, but I suspect
it's arrogant nonsense. I could be wrong, of course.
Everything which is at all intellectually serious is these days written
in English. It's the new Latin. It's the language all educated people
use to communicate with each other when discussing scientific,
philosophical, or scholarly matters. And also technical matters to a
large extent.
There are a few exceptions but very few. I remember a discussion about whether you could get away with organising a scientific conference in
French, in France, and the conclusion was that you could not. Even in
France. However the French are very reluctant to concede, which is why
the discussion took place at all.
If a large Norwegian company allows programmers to document software in Norwegian, then it cannot employ non-Norwegian programmers to work on
it. So I would imagine that this would be forbidden. But I've never
actually worked for a Norwegian company and I don't actually know. David Brown, to be fair, does work for a Norwegian company so he might know
better. But he asks "why?" and I gave the reason.
David Brown <david.brown@hesbynett.no> writes:
On 30/01/2024 16:49, Malcolm McLean wrote:
The fact that there is no
similar function for standard error suggests that wanting to pass
formatted text to error is a less common requirement.
stderr is a newer invention than stdout and stdin.
c'est what?
C plays fast and loose with the char type. But you can't pass embedded
nuls. These are so common in binary data that in practice you can't
use %s for binary data at all.
On 30/01/2024 16:49, David Brown wrote:
Elsethread [David Brown]
If the program can reasonably be expected to generate binary output,
then it is the user's fault if they do this accidentally. Examples
shown in this thread include cat and zcat - that's what these programs
do.
Sometimes people make mistakes, and try to "cat" (or "type") non-text
files. Mistakes happen.
I wonder if there is any *nix program older or simpler than "cat" - a program that simply passes its input files or the stdin to stdout.
On 30/01/2024 16:49, David Brown wrote:
You just want to have a go at Windows don't you?
I was using CRLF line-endings in the 1970s; they weren't an invention of
Windows, which didn't exist until the mid-80s and didn't become
popular until the mid-90s.
CRLF line endings were the invention of printers or teletype machines.
It took time to move print heads from the end of one line to the
beginning of the next, and separating the "carriage return" and "line
feed" commands made timings easier. It also let printer implementers
handle the two operations independently - occasionally people would
want to do one but not the other.
The use of CRLF as a standard for line endings in files was, I
believe, from CP/M - which came after Unix and Multics, which had
standardised on LF line endings. (Most OS's before that made things
up as they chose, rather than being "standard", or used record-based
files, punched cards, etc.)
So CRLF precedes Windows quite significantly.
(I have no idea why Macs picked CR - perhaps they just wanted to be
different.)
So, how did C deal with CRLF in all those non-Windows settings?
The difference between "text" and "binary" streams in C is, in
practice, up to the implementation. That can be the implementation of
the C library, or the OS functions (or DLLs/SOs) that the C library
calls. The norm is that you use "\n" for line endings in the C code -
what happens after that, for text streams, is beyond C.
The reason C distinguishes between text and binary streams is that
some OS's distinguish between them.
Regardless of your digression, stdout is still an unformatted
stream of bytes. Any structure on that stream is imposed
by the -consumer- of those bytes.
Of course. But it still a bad idea to write actual output that you
KNOW does not represent text, to a consumer that will expect text.
That's just a specific example of "it's a bad idea for a program to
behave in a way that a reasonable user would not expect". Which is,
of course, true - but not a big surprise.
For example to a terminal window, which can happen if you forget to
redirect it.
If the program can reasonably be expected to generate binary output,
then it is the user's fault if they do this accidentally. Examples
shown in this thread include cat and zcat - that's what these programs
do.
Sometimes people make mistakes, and try to "cat" (or "type") non-text
files. Mistakes happen.
If you routinely write pure binary data to stdout, then users are going
to see garbage a lot more often.
I gave an example earlier where displaying a binary file with 'type' was
better-behaved than with 'cat', since 'type' stops at the first 1A byte.
I used this in my binary formats by adding 1A after the signature, so
that if you attempted to type the file out, it wouldn't go mad. Here's
another example:
c:\sc>type tree.scd
SCD
(.scd is a binary file containing CAD drawing data.)
If I again do that with 'cat' under WSL, it goes even crazier. It
starts to try to interpret some of the output as commands (with what
program, I don't know), with lots of bell sounds, and I can't get back
to the WSL prompt.
It's just very, very sloppy.
On 30/01/2024 17:55, Scott Lurndal wrote:
David Brown <david.brown@hesbynett.no> writes:
On 30/01/2024 16:49, Malcolm McLean wrote:
The fact that there is no
similar function for standard error suggests that wanting to pass
formatted text to error is a less common requirement.
stderr is a newer invention than stdout and stdin.
c'est what?
According to Wikipedia (it's not infallible, but it knows better than me here) :
"""
Standard error was added to Unix in the 1970s after several wasted phototypesetting runs ended with error messages being typeset instead of displayed on the user's terminal.[4]
"""
<https://web.archive.org/web/20200925010614/https://minnie.tuhs.org/pipermail/tuhs/2013-December/006113.html>
"""
One of the most amusing and unexpected consequences of phototypesetting
was the Unix standard error file (!). After phototypesetting, you had to take a long wide strip of paper and feed it carefully into a smelly, icky machine which eventually (several minutes later) spat out the paper with
the printing visible.
One afternoon several of us had the same experience -- typesetting
something, feeding the paper through the developer, only to find a single, beautifully typeset line: "cannot open file foobar" The grumbles were loud enough and in the presence of the right people, and a couple of days later the standard error file was born...
"""
stdout and stdin were apparently available in FORTRAN in the 1950's.
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 30/01/2024 16:25, David Brown wrote:
On 30/01/2024 16:49, Malcolm McLean wrote:[stderr less used than stdout]
Which is my experience for the sort of programming that I do.
Yes. printf() could easily have been omitted and fprintf() only
provided.
Similarly there is no function "write" that passes binary data to
standard output by default.
What would that gain? One fewer parameter to fwrite()?
IIRC, printf() existed even before fprintf was invented and
it was used by a whole lot of code when the C standardization
efforts began.
The use of CRLF as a standard for line endings in files was, I believe,
from CP/M ...
VMS (now OpenVMS) was also a significant system at the time, and it had
some rather complex file formats ...
Linus Torvalds's native language is Finnish, for example.
Most systems run Windows where the model of piping from standard output
to standard input of the next program is much less used than in Unix,
this is true. That sometimes generates a feeling of superiority amongst
those who use the less common, often more expensive Unix systems. It's
very silly, but that's how people think.
However standard output is designed for text and not binary output.
Whilst there is a "printf()" which operates on standard output by
default, there are no functions which write binary data to standard
output by default, for example.
stdout and stdin were apparently available in FORTRAN in the 1950's.
Nobody uses printf to output binary data.
Mixing binary data with formatted text data is very unlikely to be
useful.
On Tue, 30 Jan 2024 19:39:24 +0000, Richard Harnden wrote:
Nobody uses printf to output binary data.
Do terminal-control escape sequences count?
On Tue, 30 Jan 2024 21:04:56 +0000, Malcolm McLean wrote:
Linus Torvalds's native language is Finnish, for example.
No, it would be Swedish. He’s an ethnic Swede, from Finland.
On 30/01/2024 07:27, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it might
have unusual effects if passed through systems designed to handle
human-readable text. For instance in some systems designed to receive
ASCII text, there is no distinction between the nul byte and "waiting
for next data byte". Obviously this will cause difficulties if the data
is binary.
Also many binary formats can't easily be extended, so you can pass one
image and that's all. While it is possible to devise a text format
which is similar, in practice text formats usually have enough
redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and harder to
extend.
Your reasoning is all gobbledygook. Your comments reflect only
limitations in your thinking, not any essential truth about using
standard out for binary data.
I must admit that it's nothing I have ever done or considered doing.
[...]
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 30/01/2024 07:27, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it
might have unusual effects if passed through systems designed to
handle human-readable text. For instance in some systems
designed to receive ASCII text, there is no distinction between
the nul byte and "waiting for next data byte". Obviously this
will cause difficulties if the data is binary.
Also many binary formats can't easily be extended, so you can
pass one image and that's all. While it is possible to devise a
text format which is similar, in practice text formats usually
have enough redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and
harder to extend.
Your reasoning is all gobbledygook. Your comments reflect only
limitations in your thinking, not any essential truth about using
standard out for binary data.
I must admit that it's nothing I have ever done or considered doing.
[...]
Simple example (disclaimer: not tested):
ssh foo 'cd blah ; tar -cf - . | gzip -c' | \
(mkdir foo.blah ; cd foo.blah ; gunzip -c | tar -xf -)
Of the five main programs in this command, four are using
standard out to send binary data:
tar -cf - .
gzip -c
ssh foo [...]
gunzip -c
The tar -xf - at the end reads binary data on standard in
but doesn't output any (or anything else for that matter).
It is FAR more cumbersome to accomplish what this command
is doing without sending binary data through standard out.
Anyone who doesn't understand this doesn't understand Unix.
On 31/01/2024 07:04, Lawrence D'Oliveiro wrote:
On Tue, 30 Jan 2024 21:04:56 +0000, Malcolm McLean wrote:
Linus Torvalds's native language is Finnish, for example.
No, it would be Swedish. He’s an ethnic Swede, from Finland.
He is Finnish, but has Swedish as his mother tongue (like about 5% of
Finns). Speaking Swedish as your main language does not make you
ethnically Swedish.
As a university-educated Finn, brought up in
Helsinki, he will also speak Finnish quite fluently.
On 1/30/24 11:49, David Brown wrote:
...
If the program can reasonably be expected to generate binary output,
then it is the user's fault if they do this accidentally. Examples
shown in this thread include cat and zcat - that's what these programs do.
? There's no problem using cat to concatenate binary files. I've used
'split' to split binary files into smaller pieces, and then used 'cat'
to recombine them, and it worked fine. I don't remember why, but I had
to transfer the files from one place to another by a method that imposed
an upper limit on the size of individual files.
On Tue, 30 Jan 2024 11:50:35 -0800, Keith Thompson wrote:
VMS (now OpenVMS) was also a significant system at the time, and it had
some rather complex file formats ...
[...]
Apparently Linus Torvalds used VMS for a while, and hated it.
On 31.01.2024 07:02, Lawrence D'Oliveiro wrote:
On Tue, 30 Jan 2024 11:50:35 -0800, Keith Thompson wrote:
VMS (now OpenVMS) was also a significant system at the time, and
it had some rather complex file formats ...
[...]
Apparently Linus Torvalds used VMS for a while, and hated it.
I don't understand the intention of this comment.
VMS and Torvalds are completely different eras.
And were is the relation?
(Or just meant as anecdotal trivia?)
Janis
On 30/01/2024 19:22, bart wrote:
[...]
If you have a program like that, then it probably makes sense to have a
flag to say "output the data to stdout" and the default being writing to
a file.
On Wed, 31 Jan 2024 12:43:04 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 31.01.2024 07:02, Lawrence D'Oliveiro wrote:
On Tue, 30 Jan 2024 11:50:35 -0800, Keith Thompson wrote:
VMS (now OpenVMS) was also a significant system at the time, and
it had some rather complex file formats ...
[...]
Apparently Linus Torvalds used VMS for a while, and hated it.
I don't understand the intention of this comment.
VMS and Torvalds are completely different eras.
And were is the relation?
Linus is older than you probably realize.
He entered the University of
Helsinki in 1988. Back then VMS was only slightly behind its peak of popularity.
By value, likely still bigger than all Unixen combined.
On 31/01/2024 10:43, Michael S wrote:
Frankly, the Unix redirection racket looks like something hacked
together rather than designed as the result of a solid thinking process.
As long as there were only standard input and output it was sort of
logical. But when they figured out that it is insufficient, they had
chosen a quick hack instead of constructing a solution that wouldn't
offend engineering senses of any non-preconditioned observer.
It was designed for very memory constrained systems which handled text
on a line by line basis. So one line of a long file would be processed
and passed down the pipeline, and you wouldn't need temporary disk files
or large amounts of memory. I'm sure it worked quite well for that.
Linus Torvalds's native language is Finnish, for example. But git was
released in English. There might be Finnish language bindings for it
now, but I'm pretty sure not in the original version. Similarly Bjarne
Stroustrup is Swedish, but C++ uses keywords like "class" and "friend",
not the Swedish terms.
On 31.01.2024 12:58, Michael S wrote:
On Wed, 31 Jan 2024 12:43:04 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 31.01.2024 07:02, Lawrence D'Oliveiro wrote:
On Tue, 30 Jan 2024 11:50:35 -0800, Keith Thompson wrote:
VMS (now OpenVMS) was also a significant system at the time, and
it had some rather complex file formats ...
[...]
Apparently Linus Torvalds used VMS for a while, and hated it.
I don't understand the intention of this comment.
VMS and Torvalds are completely different eras.
And were is the relation?
Linus is older than you probably realize.
Why do you think that I'd be thinking that?
I know that he's quite some years younger than I am. So what?
He entered the University of
Helsinki in 1988. Back then VMS was only slightly behind its peak of popularity.
What? - I'm not sure where you're coming from.
I associate DEC's VMS with the old DEC VAX-11 system, both
from around the mid-1970's.
I programmed on a DEC's
VAX with VMS obviously before Linus Torvalds started his
studies. And that was at a time when the DEC VAX and VMS
were replaced at our sites by Unix systems.
By value, likely still bigger than all Unixen combined.
Not sure what (to me strange sounding) ideas you have here.
Janis
On 30/01/2024 20:06, Keith Thompson wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 21:00, Keith Thompson wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 16:18, David Brown wrote:
On 28/01/2024 20:49, Malcolm McLean wrote:
On 28/01/2024 18:24, David Brown wrote:
I'd expect that most general purpose programs written by Norwegians
use an English interface, even if it isn't really expected that the
program will find an audience beyond some users in Norway. Except
of course for programs which in some way are about Norway.
Why?
Generally programmers are educated people and educated people use
English for serious purposes. Not always of course, and Norway might be
an exception. But I'd expect that in a Norwegian university, for
example, it would be forbidden to document a program in Norwegian or
to use non-English words for identifiers. And probably the same in a
large Norwegian company. I might be wrong about that and I have never
visited Norway or worked for a Norwegian employer (and obviously I
couldn't do so unless the policy I expect was followed).
You assert that "educated people use English for serious purposes".
I don't have the experience to refute that claim, but I suspect
it's arrogant nonsense. I could be wrong, of course.
Things tend to trickle down.
Everything which is at all intellectually serious is these days
written in English. It's the new Latin. It's the language all educated
people use to communicate with each other when discussing scientific,
philosophical, or scholarly matters. And also technical matters to a
large extent.
Even if that's true, your assertion was about user interfaces.
Are you under the impression that mobile phones show their messages only
in English because all their users are scholars?
Teenagers from non-English speaking countries hum along to pop songs in English.
And whilst programmers aren't usually scholars, if they are C
programmers they will use a programming language with keywords based on English.
And you will likely get quite a bit of English coming through the mobile phone.
Linus Torvalds's native language is Finnish, for example.
But git was
released in English.
There might be Finnish language bindings for it
now, but I'm pretty sure not in the original version. Similarly Bjarne Strousup is Swedish, but C++ uses keywords like "class" and "friend",
not the Swedish terms.
Now I think that would rub off on Norwegian programmers. It would be surprising if it did not.
Are you aware of the existence of medical devices (my current $DAYJOB)
that can be configured to display messages in any of a number of
languages?
Some software is internationalised. But it takes quite a lot of
resources to translate software. With medical device software, the
software is likely so expensive to develop anyway because of all the
safety critical portions that the cost is tolerable.
Our software has
purely English user interfaces. It was something we looked at, but it
would have been expensive and made the code base harder to manage, and
the users said that the benefit to them was marginal as they spoke
enough English to understand a few simple GUI labels. I think our
experience is more typical, but some people will no doubt make out that
it is narrow and parochial.
[...]
Correct, you don't actually know. Why doesn't that prevent you from
making assertions rather than asking questions, so that you can learn
something from people who know more than you do?
I'm a qualified scientist, amongst other things.
In science, the things
that you know are usually either quite basic and covered in the first
degree, or they are not terribly interesting.
What matters is what you
don't actually know, but believe to be the case, based on sound evidence
and reasoning.
And I believe it to be the case that English is used very
widely in Norway. And in fact, if David Brown, who is in a position to
know, asserts this not to be the case, I'd put it down to his
contentious nature and tend to dismiss it. Now of course I could have
misled myself. But I doubt it.
On Tue, 30 Jan 2024 17:25:31 +0100, David Brown wrote:
Mixing binary data with formatted text data is very unlikely to be
useful.
PDF does exactly that. To the point where the spec suggests putting some random unprintable bytes up front, to distract format sniffers from
thinking they’re looking at a text file.
On Tue, 30 Jan 2024 15:21:01 +0000, Malcolm McLean wrote:
Most systems run Windows where the model of piping from standard
output to standard input of the next program is much less used than
in Unix, this is true. That sometimes generates a feeling of
superiority amongst those who use the less common, often more
expensive Unix systems. It's very silly, but that's how people
think.
Also we can do select/poll on pipes on *nix systems, you can’t on
Windows.
On Wed, 31 Jan 2024 08:59:22 +0100
David Brown <david.brown@hesbynett.no> wrote:
On 31/01/2024 07:04, Lawrence D'Oliveiro wrote:
On Tue, 30 Jan 2024 21:04:56 +0000, Malcolm McLean wrote:
Linus Torvalds's native language is Finnish, for example.
No, it would be Swedish. He’s an ethnic Swede, from Finland.
He is Finnish, but has Swedish as his mother tongue (like about 5% of
Finns). Speaking Swedish as your main language does not make you
ethnically Swedish.
Linus has Swedish as his mother tongue.
Linus has a Swedish family name. Or at least Scandinavian; to me it
sounds more Danish than Swedish, but I am not an expert. It certainly
does not sound Finnish.
When Linus was younger, he used to like to tell stereotypical jokes
about Finns.
When it quacks like a duck...
As a university-educated Finn, brought up in
Helsinki, he will also speak Finnish quite fluently.
That's true.
But it is correct that English has become the main language for
international communication, and is therefore critical for anything that involves cross-border communication, or where there are significant
numbers of foreign workers. That includes academic work. Different
parts of Europe previously used German or Russian for this,
[...]
Nobody uses printf to output binary data. fwrite(3) would be common, as
would write(2).
Maybe you could use printf("%c%c%c" ... but it'd be beyond tedious.
On 30.01.2024 20:46, David Brown wrote:
On 30/01/2024 19:22, bart wrote:
[...]
If you have a program like that, then it probably makes sense to have a
flag to say "output the data to stdout" and the default being writing to
a file.
Did you mean to say "output the data to the terminal" here?
(I noticed
that a lot of the posts here have a misconception about what 'stdout'
is; they seem to use it synonymously to "terminal" or "screen/display".
But you are not guaranteed that stdout will produce screen output; it
depends on the environment. Being more accurate with the distinction
might help prevent misconceptions if replying to these people.)
More to the point: I don't think this would be a good design as you've
written it. A default would imply the necessity of some fixed name (or
naming schema) - think of the disputable "a.out" default.
On 1/30/24 11:49, David Brown wrote:
...
If the program can reasonably be expected to generate binary output,
then it is the user's fault if they do this accidentally. Examples
shown in this thread include cat and zcat - that's what these programs do.
? There's no problem using cat to concatenate binary files. I've used
'split' to split binary files into smaller pieces, and then used 'cat'
to recombine them, and it worked fine. I don't remember why, but I had
to transfer the files from one place to another by a method that imposed
an upper limit on the size of individual files.
I would expect that the majority of uses of "cat" are with just one
file,
but certainly it is useful when you want to combine files in
different ways.
On 31/01/2024 12:22, Janis Papanagnou wrote:
On 30.01.2024 20:46, David Brown wrote:
On 30/01/2024 19:22, bart wrote:
[...]
If you have a program like that, then it probably makes sense to have a
flag to say "output the data to stdout" and the default being writing to
a file.
Did you mean to say "output the data to the terminal" here?
No.
I said that if you have a program that sometimes gives binary output on stdout, and sometimes gives text messages,
and this leads people to have
a significant chance of accidentally dumping binary output to their
terminal, [...]
On 31/01/2024 07:18, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 30/01/2024 07:27, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it might
have unusual effects if passed through systems designed to handle
human-readable text. For instance in some systems designed to receive
ASCII text, there is no distinction between the nul byte and "waiting
for next data byte". Obviously this will cause difficulties if the data
is binary.
Also many binary formats can't easily be extended, so you can pass one
image and that's all. While it is possible to devise a text format
which is similar, in practice text formats usually have enough
redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and harder to
extend.
Your reasoning is all gobbledygook. Your comments reflect only
limitations in your thinking, not any essential truth about using
standard out for binary data.
I must admit that it's nothing I have ever done or considered doing.
[...]
Simple example (disclaimer: not tested):
ssh foo 'cd blah ; tar -cf - . | gzip -c' | \
(mkdir foo.blah ; cd foo.blah ; gunzip -c | tar -xf -)
Of the five main programs in this command, four are using
standard out to send binary data:
tar -cf - .
gzip -c
ssh foo [...]
gunzip -c
The tar -xf - at the end reads binary data on standard in
but doesn't output any (or anything else for that matter).
It is FAR more cumbersome to accomplish what this command
is doing without sending binary data through standard out.
Anyone who doesn't understand this doesn't understand Unix.
Yes. I don't do that sort of thing.
Whilst I have used Unix, it is as a platform for interactive programs
which work on graphics, or as a general C compilation environment. I
don't build pipelines to do that sort of data processing. If I had to
download a tar file I'd either use a graphical tool or type several
commands into the shell, each launching a single executable,
interactively.
The reason is that I'd only run the command once, and it's so likely
that there will be either a syntax misunderstanding or a typing error
that I'd have to test to ensure that it was right. And by the time
you've done that any time saved by typing only one commandline is lost.
Of course if you are writing scripts then that doesn't apply. But now
it's effectively a programming language, and, from the example code, a
very poorly designed one which is cryptic and fussy and liable to be
hard to maintain. So it's better to use a language like Perl to achieve
the same thing, and I did have a few Perl scripts handy for repetitive
jobs of that nature in my Unix days.
You admit this with "not tested". Says it all. "Understanding Unix" is
an intellectually useless achievement. You might have to do it if you
have to use the system and debug and trouble shoot. But it's nothing to
be proud about.
On 31/01/2024 10:43, Michael S wrote:
On Tue, 30 Jan 2024 23:18:21 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 30/01/2024 07:27, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it
might have unusual effects if passed through systems designed to
handle human-readable text. For instance in some systems
designed to receive ASCII text, there is no distinction between
the nul byte and "waiting for next data byte". Obviously this
will cause difficulties if the data is binary.
Also many binary formats can't easily be extended, so you can
pass one image and that's all. While it is possible to devise a
text format which is similar, in practice text formats usually
have enough redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and
harder to extend.
Your reasoning is all gobbledygook. Your comments reflect only
limitations in your thinking, not any essential truth about using
standard out for binary data.
I must admit that it's nothing I have ever done or considered
doing.
[...]
Simple example (disclaimer: not tested):
ssh foo 'cd blah ; tar -cf - . | gzip -c' | \
(mkdir foo.blah ; cd foo.blah ; gunzip -c | tar -xf -)
Of the five main programs in this command, four are using
standard out to send binary data:
tar -cf - .
gzip -c
ssh foo [...]
gunzip -c
The tar -xf - at the end reads binary data on standard in
but doesn't output any (or anything else for that matter).
It is FAR more cumbersome to accomplish what this command
is doing without sending binary data through standard out.
If I am not mistaken, tar, gzip and gunzip do not write binary data
to standard output by default. They should be specifically told to
do so. For ssh I don't know. Anyway, ssh is not a "normal" program
so it's not surprising that the textuality of ssh's output is the same
as the textuality of the command it carries.
Anyone who doesn't understand this doesn't understand Unix.
Frankly, the Unix redirection racket looks like something hacked
together rather than designed as the result of a solid thinking
process. As long as there were only standard input and output it
was sort of logical. But when they figured out that it is
insufficient, they had chosen a quick hack instead of constructing
a solution that wouldn't offend the engineering senses of any
non-preconditioned observer.
It was designed for very memory constrained systems which handled
text on a line by line basis. So one line of a long file would be
processed and passed down the pipeline, and you wouldn't need
temporary disk files or large amounts of memory. I'm sure it worked
quite well for that.
On Tue, 30 Jan 2024 17:49:57 +0100, David Brown wrote:
The use of CRLF as a standard for line endings in files was, I believe,
from CP/M ...
Which I think copied it from DEC minicomputer systems.
Fun fact: on some of those DEC systems (which I used when they were
still being made), you could end a line with CR-LF, or LF-CR-NUL.
What was the NUL for? Padding. Why did it need padding? (This was before
CRT terminals.)
On 31.01.2024 12:58, Michael S wrote:
On Wed, 31 Jan 2024 12:43:04 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 31.01.2024 07:02, Lawrence D'Oliveiro wrote:
On Tue, 30 Jan 2024 11:50:35 -0800, Keith Thompson wrote:
VMS (now OpenVMS) was also a significant system at the time, and
it had some rather complex file formats ...
[...]
Apparently Linus Torvalds used VMS for a while, and hated it.
I don't understand the intention of this comment.
VMS and Torvalds are completely different eras.
And were is the relation?
Linus is older than you probably realize.
Why do you think that I'd be thinking that?
I know that he's quite some years younger than I am. So what?
He entered the University of
Helsinki in 1988. Back then VMS was only slightly behind its peak of
popularity.
What? - I'm not sure where you're coming from.
I associate DEC's VMS with the old DEC VAX-11 system, both
from around the mid of the 1970's.
On Tue, 30 Jan 2024 19:39:24 +0000, Richard Harnden wrote:
Nobody uses printf to output binary data.
Do terminal-control escape sequences count?
On Tue, 30 Jan 2024 11:50:35 -0800, Keith Thompson wrote:
VMS (now OpenVMS) was also a significant system at the time, and it had
some rather complex file formats ...
It relied on extra file metadata called "record attributes" in order to
make sense of the file format. It was quite common to transfer files
from other systems, and have them not be readable until you had set
appropriate record attributes on them. Picky, picky, I know.
On 31/01/2024 07:07, Lawrence D'Oliveiro wrote:
On Tue, 30 Jan 2024 17:25:31 +0100, David Brown wrote:
Mixing binary data with formatted text data is very unlikely to be
useful.
PDF does exactly that. To the point where the spec suggests putting some
random unprintable bytes up front, to distract format sniffers from
thinking they’re looking at a text file.
PDF files start with the "magic" indicator "%PDF", which is enough for
many programs to identify them correctly. And they are usually
compressed so that the content text is not directly readable or
identifiable as strings. If they are not compressed, then yes, there
can be text mixed in with everything else. But I would not call that
"mixing binary data and formatted text" - I would just say that some of
the binary data happens to be strings. It's the same as elf files
containing copies of strings from the program, or identifiers for
external linking.
However, I learned a new trick when checking that I was not mistaken
about this - it turns out that "less file.pdf" gives a nice text-only
output from the pdf file (by passing it through "lesspipe"). There's
always something new to learn from inane conversations on Usenet :-)
On 31/01/2024 09:36, Malcolm McLean wrote:
The reason is that I'd only run the command once, and it's so likely
that there will be either a syntax misunderstanding or a typing error
that I'd have to test to ensure that it was right. And by the time
you've done that any time saved by typing only one commandline is lost.
Of course if you are writing scripts then that doesn't apply. But now
it's effectively a programming language, and, from the example code, a
very poorly designed one which is cryptic and fussy and liable to be
hard to maintain. So it's better to use a language like Perl to achieve
the same thing, and I did have a few Perl scripts handy for repetitive
jobs of that nature in my Unix days.
That gave me a laugh! You think bash is cryptic, fussy and poorly
designed, and choose /Perl/ as the alternative :-)
You admit this with "not tested". Says it all. "Understanding Unix" is
an intellectually useless achievement. You might have to do it if you
have to use the system and debug and trouble shoot. But it's nothing to
be proud about.
It is "useless" for people who don't use it. For people who /do/ use
it, is very useful.
I've used sequences like Tim's - it's a way to copy data remotely from a
different machine. I would likely write it slightly differently - I'd
probably do the mkdir and cd first, thus avoiding the need for a
subshell, and I'd use "ssh -C" or "tar -z" to do the compression rather
than "gzip".
There's no doubt that the learning curve is longer for doing this sort
of thing from the command line than using gui programs. There is also
no doubt that when you are used to it, command line utilities and a good
shell are very flexible and efficient.
Learn to use the tools that are conveniently available, and then pick
the right tool for the job - whether it is command line or gui.
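David's "tar -z" variant can be sketched locally like this (directory and file names are invented for the example; in the remote case the first tar's output would be piped through ssh instead of a plain pipe). Note that "-z" and "-C" are widespread GNU/BSD tar extensions rather than strict POSIX:

```shell
# Local sketch: tar does the compression itself, no separate gzip needed.
# Remotely this would be: tar -C srcdir -czf - . | ssh host 'tar -C dstdir -xzf -'
mkdir -p srcdir dstdir
echo "hello" > srcdir/file.txt
tar -C srcdir -czf - . | tar -C dstdir -xzf -   # compress, pipe, decompress
cat dstdir/file.txt                             # prints "hello"
```

The same structure works with "ssh -C" doing the compression at the transport layer instead.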
On Tue, 30 Jan 2024 23:18:21 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Anyone who doesn't understand this doesn't understand Unix.
Frankly, the Unix redirection racket looks like something hacked
together rather than designed as the result of a solid thinking
process.
As long as there were only standard input and output it was sort of
logical. But when they figured out that it was insufficient, they
chose a quick hack instead of constructing a solution that wouldn't
offend the engineering senses of any non-preconditioned observer.
On 31/01/2024 09:36, Malcolm McLean wrote:
[ I snipped a couple of "I actually don't know/need it" things ]
But now it's effectively a programming language, and, from the example
code, a very poorly designed one which is cryptic and fussy and liable
to be hard to maintain. So it's better to use a language like Perl to
achieve the same thing, and I did have a few Perl scripts handy for
repetitive jobs of that nature in my Unix days.
That gave me a laugh! You think bash is cryptic, fussy and poorly
designed, and choose /Perl/ as the alternative :-)
There's no doubt that the learning curve is longer for doing this sort
of thing from the command line than using gui programs. There is also
no doubt that when you are used to it, command line utilities and a
good shell are very flexible and efficient.
Learn to use the tools that are conveniently available, and then pick
the right tool for the job - whether it is command line or gui.
On 30/01/2024 22:16, James Kuyper wrote:
On 1/30/24 11:49, David Brown wrote:
...
If the program can reasonably be expected to generate binary output,
then it is the user's fault if they do this accidentally. Examples
shown in this thread include cat and zcat - that's what these programs
do.
There's no problem using cat to concatenate binary files. I've used
'split' to split binary files into smaller pieces, and then used 'cat'
to recombine them, and it worked fine. I don't remember why, but I had
to transfer the files from one place to another by a method that imposed
an upper limit on the size of individual files.
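James's split-and-recombine round trip can be sketched like this (the sizes and file names are arbitrary):

```shell
# Split a binary file into 16 KiB pieces, then restore it with cat.
dd if=/dev/urandom of=big.bin bs=1k count=100 2>/dev/null
split -b 16k big.bin piece.       # produces piece.aa, piece.ab, ...
cat piece.* > rejoined.bin        # the glob sorts; cat just concatenates bytes
cmp big.bin rejoined.bin && echo "identical"
```

Because cat treats its input as an opaque byte stream, the binary content survives the round trip untouched.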
I think there's a misunderstanding here - I gave "cat" as an example of
a program that /can/ be expected to produce binary output. (It can
also produce text output - you get what you put in.) So it is the
user's fault if they type "cat /bin/cat" and are surprised by a mess
in their terminal.
Michael S <already5chosen@yahoo.com> writes:
[...]
You mean like
exec 3< /path/to/input/file
read -u3 line_from_input file
How does that offend your engineering senses?
Michael S <already5chosen@yahoo.com> writes:
On Tue, 30 Jan 2024 23:18:21 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Anyone who doesn't understand this doesn't understand Unix.
Frankly, the Unix redirection racket looks like something hacked
together rather than designed as the result of a solid thinking
process.
It seems you don't understand Unix.
As long as there were only standard input and output it was sort of
logical. But when they figured out that it was insufficient, they
chose a quick hack instead of constructing a solution that wouldn't
offend the engineering senses of any non-preconditioned observer.
You mean like
exec 3< /path/to/input/file
read -u3 line_from_input file
How does that offend your engineering senses?
On Wed, 31 Jan 2024 15:25:00 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
You mean like
exec 3< /path/to/input/file
read -u3 line_from_input file
How does that offend your engineering senses?
That was not in the 2-3 books that I had read. I can't say that I
understand what is going on, what environment we are in, and whether
what you show is generic or specific to 'exec' and 'read'.
On Wed, 31 Jan 2024 12:15:23 +0000
Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
It was designed for very memory constrained systems which handled
text on a line by line basis. So one line of a long file would be
processed and passed down the pipeline, and you wouldn't need
temporary disk files or large amounts of memory. I'm sure it worked
quite well for that.
A concept of pipes is fine. I was not talking about that side.
My objection is with each program having exactly 1 special input and
exactly 2 special outputs. Instead of having, say, up to 5 of each,
fully interchangeable with the first of the five being special only in
that it is a default and as such allows for shorter syntax in the
shell.
On 31.01.2024 15:21, David Brown wrote:
On 31/01/2024 09:36, Malcolm McLean wrote:
[ I snipped a couple of "I actually don't know/need it" things ]
But now it's effectively a programming language, and, from the example
code, a very poorly designed one which is cryptic and fussy and liable
to be hard to maintain. So it's better to use a language like Perl to
achieve the same thing, and I did have a few Perl scripts handy for
repetitive jobs of that nature in my Unix days.
That gave me a laugh! You think bash is cryptic, fussy and poorly
designed, and choose /Perl/ as the alternative :-)
I don't think it's that clear a joke. The Unix shell is extremely
error prone to program, and you should not let a newbie write shell
programs without careful supervision.
Michael S <already5chosen@yahoo.com> writes:
On Wed, 31 Jan 2024 12:15:23 +0000
Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
It was designed for very memory constrained systems which handled
text on a line by line basis. So one line of a long file would be
processed and passed down the pipeline, and you wouldn't need
temporary disk files or large amounts of memory. I'm sure it worked
quite well for that.
A concept of pipes is fine. I was not talking about that side.
My objection is with each program having exactly 1 special input and
exactly 2 special outputs. Instead of having, say, up to 5 of each,
fully interchangeable with the first of the five being special only
in that it is a default and as such allows for shorter syntax
in the shell.
Each program has 1024 (on my system - it's configurable on a
per-process basis) fully interchangeable "inputs" and "outputs" (also
known as files).
$ application 5> /tmp/file5
will redirect file descriptor five to the specified file.
There's nothing special about stdin, stdout or stderr other than
that they are tags applied to the first three file descriptors.
There is a convention that the first file descriptor
is used for input, the second for output and the third
for diagnostic output. But it's just a convention.
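Scott's point can be demonstrated in a couple of lines: the shell opens descriptor 5 with '5>', and the command writes to it exactly as it would to stdout or stderr (the file name here is arbitrary):

```shell
# Nothing special about fd 5: the shell opens it, the commands inherit it.
{ echo "to fd 5" >&5; echo "to stdout"; } 5> /tmp/fd5.out
cat /tmp/fd5.out    # prints "to fd 5"
```

The brace group runs with descriptor 5 already connected to /tmp/fd5.out, just as descriptors 0-2 are pre-connected by convention.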
On Wed, 31 Jan 2024 15:25:00 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
You mean like
exec 3< /path/to/input/file
read -u3 line_from_input file
How does that offend your engineering senses?
That was not in the 2-3 books that I had read. I can't say that I
understand what is going on, what environment we are in, and whether
what you show is generic or specific to 'exec' and 'read'.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 31.01.2024 15:21, David Brown wrote:
On 31/01/2024 09:36, Malcolm McLean wrote:
[ I snipped a couple of "I actually don't know/need it" things ]
But now it's effectively a programming language, and, from the example
code, a very poorly designed one which is cryptic and fussy and liable
to be hard to maintain. So it's better to use a language like Perl to
achieve the same thing, and I did have a few Perl scripts handy for
repetitive jobs of that nature in my Unix days.
That gave me a laugh! You think bash is cryptic, fussy and poorly
designed, and choose /Perl/ as the alternative :-)
I don't think it's that clear a joke. The Unix shell is extremely
error prone to program, and you should not let a newbie write shell
programs without careful supervision.
Nonsense.
I associate DEC's VMS with the old DEC VAX-11 system, both
from around the mid of the 1970's. I programmed on a DEC's
VAX with VMS obviously before Linus Torvalds started his
studies. And that was at a time when the DEC VAX and VMS
were replaced at our sites by Unix systems.
The big advantage of non-GUI is for process automation. With
GUI-oriented applications you can mainly only do what they provide
interactively (= slow and cumbersome). Rarely do GUI applications
support a scripting interface, and if so it's typically some
proprietary non-standard language.
On 31.01.2024 16:42, Michael S wrote:
On Wed, 31 Jan 2024 15:25:00 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
You mean like
exec 3< /path/to/input/file
read -u3 line_from_input file
How does that offend your engineering senses?
That was not in the 2-3 books that I had read. I can't say that I
understand what is going on, what environment we are in, and whether
what you show is generic or specific to 'exec' and 'read'.
'-u' is obviously an option of read. Various shells support it; ksh
has for at least 30 years, and bash meanwhile as well. But it's not in
POSIX.
Other redirections are standard, and these should certainly be known
by anyone who has taken a course or read any book on the Unix shell.
The syntax is not difficult, follows rules, and is certainly not
arbitrary.
The one in the above code is assigning file descriptor 3 to the given
file for reading. You can let an FD point to the channel another one
is pointing to, as in the "well known" '2>&1' (where stderr is
connected to the same channel that stdout currently points to).
Similarly, in the above example you can use the standard form
'read line_from_input_file <&3', which may certainly appear more
cryptic than the option '-u3', but it's essential to any shell
programmer.
Janis
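The fd-3 redirection being discussed can be sketched in a few lines of POSIX shell (the file name is made up for the example):

```shell
# Open a file on descriptor 3, read from it line by line, then close it.
printf 'alpha\nbeta\n' > /tmp/in3
exec 3< /tmp/in3      # fd 3 now reads from /tmp/in3
read line <&3         # portable equivalent of ksh/bash "read -u3 line"
echo "first: $line"
read line <&3
echo "second: $line"
exec 3<&-             # close descriptor 3 again
```

Because the descriptor stays open across commands, successive reads continue where the previous one stopped - which is the whole point of 'exec 3<' over plain '< file' on each command.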
On Wed, 31 Jan 2024 16:25:30 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
The big advantage of non-GUI is for process automation. With
GUI-oriented applications you can mainly only do what they provide
interactively (= slow and cumbersome). Rarely do GUI applications
support a scripting interface, and if so it's typically some
proprietary non-standard language.
I'd take almost any proprietary non-standard GUI macro language over
non-proprietary non-standard tcl. They say Lua is better. I never had
the motivation to look at it more closely.
My objection is with each program having exactly 1 special input and
exactly 2 special outputs. Instead of having, say, up to 5 of each,
fully interchangeable with the first of the five being special only in
that it is a default and as such allows for shorter syntax in the
shell.
On Wed, 31 Jan 2024 17:05:23 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
The books were talking about the Bourne shell and the C shell. They
acknowledged the existence of ksh, but didn't go into details. I don't
remember if bash was mentioned at all.
Of course, in practice in this century I used bash almost exclusively,
but never learned it formally, by book, from start to finish.
The same as over 90% of bash users, I'd guess.
I did understand '3<' by association with '2>' that was in the book,
but more importantly, is something I use regularly.
However I had never seen '3<' in the books.
On Wed, 31 Jan 2024 15:49:10 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
Michael S <already5chosen@yahoo.com> writes:
On Wed, 31 Jan 2024 12:15:23 +0000
Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
It was designed for very memory constrained systems which handled
text on a line by line basis. So one line of a long file would be
processed and passed down the pipeline, and you wouldn't need
temporary disk files or large amounts of memory. I'm sure it worked
quite well for that.
A concept of pipes is fine. I was not talking about that side.
My objection is with each program having exactly 1 special input and
exactly 2 special outputs. Instead of having, say, up to 5 of each,
fully interchangeable with the first of the five being special only
in that it is a default and as such allows for shorter syntax
in the shell.
Each program has 1024 (on my system - it's configurable on a
per-process basis) fully interchangeable "inputs" and "outputs" (also
known as files).
$ application 5> /tmp/file5
will redirect file descriptor five to the specified file.
There's nothing special about stdin, stdout or stderr other than
that they are tags applied to the first three file descriptors.
There is a convention that the first file descriptor
is used for input, the second for output and the third
for diagnostic output. But it's just a convention.
I don't understand.
Are not descriptors 0, 1 and 2 special in that they are already
open (I don't know if by the OS or by the shell) when the program
starts, while the rest of them, if ever used, have to be opened by
the program code?
On only remotely related note, what happens on your system when you
want more than 1024 files to be open by one program simultaneously?
scott@slp53.sl.home (Scott Lurndal) writes:
[...]
Quick and dirty editor:
$ cat > /tmp/file < /dev/tty
line1
line2
line3
^D
$
$ cat /tmp/file
line1
line2
line3
$
You probably don't need the "< /dev/tty".
On 31.01.2024 16:58, Scott Lurndal wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 31.01.2024 15:21, David Brown wrote:
On 31/01/2024 09:36, Malcolm McLean wrote:
[ I snipped a couple of "I actually don't know/need it" things ]
But now it's effectively a programming language, and, from the example >>>>> code, a very poorly designed one which is cryptic and fussy and liable >>>>> to be hard to maintain. So it's better to use a language like Perl to >>>>> achieve the same thing, and I did have a few Perl scripts handy for
repetitive jobs of that nature in my Unix days.
That gave me a laugh! You think bash is cryptic, fussy and poorly
designed, and choose /Perl/ as the alternative :-)
I don't think it's that clear a joke. The Unix shell is extremely
error prone to program, and you should not let a newbie write shell
programs without careful supervision.
Nonsense.
Not the least. - I'm not sure about your background in shell.
On 1/31/24 07:35, Janis Papanagnou wrote:
...
I associate DEC's VMS with the old DEC VAX-11 system, both
from around the mid of the 1970's. I programmed on a DEC's
VAX with VMS obviously before Linus Torvalds started his
studies. And that was at a time when the DEC VAX and VMS
were replaced at our sites by Unix systems.
OK - so it's that association you've got wrong. I know VMS was still
going strong around 1990 when I was introduced to it. It might have been
in decline at the time, but it was very far from being gone.
scott@slp53.sl.home (Scott Lurndal) writes:
Michael S <already5chosen@yahoo.com> writes:[...]
On only remotely related note, what happens on your system when you
want more than 1024 files to be open by one program simultaneously?
$ ulimit -f 2048
Will increase the limit, to any arbitrary value, subject to system
wide limits configured by the superuser (system manager).
That sets the limit for file size. I think you mean "ulimit -n 2048".
On 31.01.2024 17:18, Michael S wrote:
On Wed, 31 Jan 2024 17:05:23 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
(See for example 'man ksh' Section "Input/Output". But careful; ksh
has additional non-standard additions. So a peek into the POSIX docs
might serve you better.)
[ DEC's VMS ]
Released in 1977.
Reached the peak of popularity in the mid 1980s, when DEC decided to
use VAX not just as a mini/super-mini, but also as a competitor to
mainframes, effectively killing their earlier mainframe line
(PDP-6/10/20).
[...]
By value, likely still bigger than all Unixen combined.
Not sure what (to me strange sounding) ideas you have here.
I can say the same.
I'm not sure why you mentioned 5, whether that's better or worse.
There's naturally some limit on OS level on the number of parallel
open file descriptors, but that limit is very high. Mind that you
can always close unused ones.
Janis
David Brown <david.brown@hesbynett.no> writes:
[...]
However, I learned a new trick when checking that I was not mistaken
about this - it turns out that "less file.pdf" gives a nice text-only
output from the pdf file (by passing it through "lesspipe"). There's
always something new to learn from inane conversations on Usenet :-)
It doesn't necessarily do this by default. See the documentation for
details (which are of course off-topic here).
On 31.01.2024 14:10, Michael S wrote:
[ DEC's VMS ]
Released in 1977.
Reached the peak of popularity in mid 1980s, when DEC decided to use
VAX not just as mini/super-mini, but also as competitor to
mainframes, effectively killing their earlier mainframe line
(PDP-6/10/20).
This is interesting. These days all major players here switched to
Unix systems (in our context specifically AIX and HP-UX), exactly
to exchange the huge sports halls full of mainframe computers to
just a small room full of Unix servers.
[...]
By value, likely still bigger than all Unixen combined.
Not sure what (to me strange sounding) ideas you have here.
I can say the same.
Sure, so let me expand. The "By value" was what made me doubt. The
"values" (Real Money) I experienced in the legacy mainframe areas were
in the financial sector (banks and insurance companies); these were
not DECs here, and they were hard to replace. - I know that every
couple of years they made their business cases about how they could
get rid of the mainframes, to no avail. (Don't know how it evolved the
past 20 years, though.) And later all the ISP computing power went
into Linux plants, where the money was made. I never observed that
DEC/VMS was of any importance "by value". If it had some value by
means I'm not aware of, I take your word for it.
Janis
On Wed, 31 Jan 2024 16:25:30 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
The big advantage of non-GUI is for process automation. With
GUI-oriented applications you can mainly only do what they provide
interactively (= slow and cumbersome). Rarely do GUI applications
support a scripting interface, and if so it's typically some
proprietary non-standard language.
I'd take almost any proprietary non-standard GUI macro language over
non-proprietary non-standard tcl. They say Lua is better. I never had
the motivation to look at it more closely.
On 31.01.2024 15:09, David Brown wrote:
I would expect that the majority of uses of "cat" are with just one
file,
And of course just because of ignorance; the majority of (but not all)
uses with just one file are UUOCs.
but certainly it is useful when you want to combine files in
different ways.
I don't know of any concatenations in "different" ways, but of course
there's some more of the other usages that are supported by options.
On 31.01.2024 15:21, David Brown wrote:
On 31/01/2024 09:36, Malcolm McLean wrote:
[ I snipped a couple of "I actually don't know/need it" things ]
But now it's effectively a programming language, and, from the example
code, a very poorly designed one which is cryptic and fussy and liable
to be hard to maintain. So it's better to use a language like Perl to
achieve the same thing, and I did have a few Perl scripts handy for
repetitive jobs of that nature in my Unix days.
That gave me a laugh! You think bash is cryptic, fussy and poorly
designed, and choose /Perl/ as the alternative :-)
I don't think it's that clear a joke. The Unix shell is extremely
error prone to program, and you should not let a newbie write shell
programs without careful supervision. ("newbie" [in shell context]
= less than 10 years of practical experience. - Am I exaggerating?
Maybe. But not much.)
David Brown <david.brown@hesbynett.no> writes:
On 31/01/2024 09:36, Malcolm McLean wrote:
The reason is that I'd only run the command once, and it's so likely
that there will be either a syntax misunderstanding or a typing error
that I'd have to test to ensure that it was right. And by the time
you've done that any time saved by typing only one commandline is lost.
Of course if you are writing scripts then that doesn't apply. But now
it's effectively a programming language, and, from the example code, a
very poorly designed one which is cryptic and fussy and liable to be
hard to maintain. So it's better to use a language like Perl to achieve
the same thing, and I did have a few Perl scripts handy for repetitive
jobs of that nature in my Unix days.
That gave me a laugh! You think bash is cryptic, fussy and poorly
designed, and choose /Perl/ as the alternative :-)
You admit this with "not tested". Says it all. "Understanding Unix" is
an intellectually useless achievement. You might have to do it if you
have to use the system and debug and trouble shoot. But it's nothing to
be proud about.
It is "useless" for people who don't use it. For people who /do/ use
it, is very useful.
I've used sequences like Tim's - it's a way to copy data remotely from a
different machine. I would likely write it slightly differently - I'd
probably do the mkdir and cd first, thus avoiding the need for a
subshell, and I'd use "ssh -C" or "tar -z" to do the compression rather
than "gzip".
There's no doubt that the learning curve is longer for doing this sort
of thing from the command line than using gui programs. There is also
no doubt that when you are used to it, command line utilities and a good
shell are very flexible and efficient.
Learn to use the tools that are conveniently available, and then pick
the right tool for the job - whether it is command line or gui.
And there are often more than one tool for the job. e.g. rsync(1)
for copying data remotely.
It's just the numbers of file descriptors and whether it's an input
'<' or output '>' channel, or even a read/write channel '<>'.
<     input
>     output
<>    in/out
<<    here-doc
>>    append
On 31.01.2024 17:26, Michael S wrote:
On Wed, 31 Jan 2024 16:25:30 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
The big advantage of non-GUI is for process automation. With
GUI-oriented applications you can mainly only do what they provide
interactively (= slow and cumbersome). Rarely do GUI applications
support a scripting interface, and if so it's typically some
proprietary non-standard language.
I'd take almost any proprietary non-standard GUI macro language over
non-proprietary non-standard tcl. They say Lua is better. I never had
the motivation to look at it more closely.
I don't recall ever having used tcl (maybe once, long ago?),
and I never stumbled across an application (GUI or otherwise)
where I needed scripting and it would have provided Lua. Thus
I cannot help you here; I don't know it either.
All I can say is that the Unix shell was a reliable companion
wherever we had to automate tasks on Unix systems or on Cygwin
enhanced Windows.
On Wed, 31 Jan 2024 17:29:30 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
I'm not sure why you mentioned 5, whether that's better or worse.
There's naturally some limit on OS level on the number of parallel
open file descriptors, but that limit is very high. Mind that you
can always close unused ones.
Five of each sort, i.e. five inputs and five outputs, sound good to me.
Much more than five of each sort pre-opened by shell sound like too
much. If there exists a need for more than five channels for
communication between a complex of programs, then this complex of
programs very likely was designed to work together and only together.
And then any intervention by the user into the communication between
them will likely do more harm than good.
Of course, I fully expect that usefully using more than three
predefined channels of any particular direction would be very rare, but
I still like five, or at least four, better than three.
As to not using predefined directions and instead just providing a
pool of up to 10 pre-opened descriptors, this idea didn't cross my
mind in those particular five minutes that I was writing my initial
(yes provocative, yes intentionally so) post. Right now I don't want
to think about whether I like it or not, because I see no good reason
to think deeply about this particular water under the bridge.
On 30.01.2024 20:39, Richard Harnden wrote:
Nobody uses printf to output binary data. fwrite(3) would be common, as
would write(2).
Right. I'm using the OS'es write(2), but also printf with ANSI escapes,
e.g. sprintf (buf, "\033[%d;%dH", ...
Maybe you could use printf("%c%c%c" ... but it'd be beyond tedious.
Since I recall to have used it in some thread I want to clarify that
it was just meant as an example countering an incorrect argument of
"not being able to output binary data on stdout", or some such.
Janis
Terminal control sequences (almost always based on VT100 these days)
are typically not printable, but tend to avoid null characters, which
means you can very probably use printf to print them (assuming you're
on a POSIX-like system).
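For example, a VT100 cursor-positioning sequence contains only ESC plus printable characters, so printf handles it without trouble. Here the bytes are piped through od instead of a terminal so they can be inspected:

```shell
# A VT100 "move cursor to row 13, column 17" sequence: ESC [ 13 ; 17 H.
# Everything except the leading ESC (octal 033) is printable text.
printf '\033[%d;%dH' 13 17 | od -An -c
```

No null bytes are involved, which is why printf (whose %s and format handling stop at NUL) is safe for these sequences.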
On 30/01/2024 07:27, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it might
have unusual effects if passed though systems designed to handle
human-readable text.
For instance in some systems designed to receive ASCII text, there is
no distinction between the nul byte and "waiting for next data byte".
Obviously this will cause difficulties if the data is binary.
Also many binary formats can't easily be extended, so you can pass one
image and that's all. While it is possible to devise a text format
which is similar, in practice text formats usually have enough
redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and harder to
extend.
I must admit that it's nothing I have ever done or considered doing.
Your reasoning is all gobbledygook. Your comments reflect only
limitations in your thinking, not any essential truth about using
standard out for binary data.
However standard output is designed for text and not binary output.
That you couldn't actually mount a defence of your position whilst I
could also strongly implies that I am right.
On 31/01/2024 23:36, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
I think they're poorly designed too.
From the POV of interactive console programs, they /are/ poor.
But the mistake is thinking that they are actual programs or commands,
when really they are just filters. They are not designed to be
standalone commands.
Even 'cat', if I type it by itself, just sits there.
(I wonder what use it has in a sequence like ... | cat | ...; what
does it add to the data?)
AFAICS, this stuff mainly works inside scripts. Or do people here spend
all day manually piping stuff between programs?
As for alternatives, I don't know.
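bart's parenthetical question has a short answer: a mid-pipeline cat adds nothing to the data - it is a pure pass-through (it does add an extra process and pipe buffer). A quick check:

```shell
# cat in the middle of a pipeline leaves the byte stream unmodified.
printf 'a\nb\nc\n' | cat | wc -l
```

The count is the same with or without the interposed cat, which is exactly the "useless use of cat" pattern mentioned elsewhere in the thread.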
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 30/01/2024 07:27, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it might
have unusual effects if passed though systems designed to handle
human-readable text.
Maybe you are not used to a system where it's trivial to inspect such
data. When "some_prog" produces data that are not compatible with the
current terminal settings, "some_prog | hd" shows a hex dump instead.
The need to do this does not make "some_prog" poorly designed. It may
simply mean that the output is /intended/ for further processing.
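The "pipe it into a hex dump" trick can be sketched with od, which unlike hd is specified by POSIX ('hd' is commonly an alias for 'hexdump -C' where available); the byte values here are arbitrary stand-ins for some program's binary output:

```shell
# Inspect "binary" output safely: dump it as hex instead of letting it
# hit the terminal. The input simulates a program emitting raw bytes.
printf '\000\001ABC' | od -An -tx1
```

The nul and control bytes that would confuse a terminal are rendered as harmless hex digits.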
For instance in some systems designed to receive ASCII text, there is
no distinction between the nul byte and "waiting for next data byte".
Obviously this will cause difficulties if the data is binary.
Also many binary formats can't easily be extended, so you can pass one
image and that's all. While it is possible to devise a text format
which is similar, in practice text formats usually have enough
redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and harder to
extend.
I must admit that it's nothing I have ever done or considered doing.
Your reasoning is all gobbledygook. Your comments reflect only
limitations in your thinking, not any essential truth about using
standard out for binary data.
However standard output is designed for text and not binary output.
What is your evidence? stdout was just designed for output (as far as
I can tell) and, anyway, what is the distinction you are making
between binary and text? "iconv -f ASCII -t EBCDIC-UK" will produce
something that is "logically" text on stdout, but it might look like
binary to you.
An example where it's really useful not to care: I have a suite of
tools for doing toy cryptanalysis. Some apply various transformations
and/or filters to byte streams and others collect and output (on
stderr) various statistics. Plugging them together in various
pipelines is very handy when investigating an encrypted text. The
output is almost always "binary" in the sense that there would be no
point in looking at it on a terminal.
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
On 31/01/2024 16:33, Keith Thompson wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 31/01/2024 07:18, Tim Rentsch wrote:
[...]
It is FAR more cumbersome to accomplish what this command
is doing without sending binary data through standard out.
Anyone who doesn't understand this doesn't understand Unix.
Yes. I don't do that sort of thing. There's kind of an implication
that it's something I ought to be doing.
You don't. Others do. What was your point again?
bart <bc@freeuk.com> writes:
On 31/01/2024 23:36, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
I think they're poorly designed too.
Of course you do. They're not bart programs.
From the POV of interactive console programs, they /are/ poor.
You don't provide any reason why - do elucidate!
AFAICS, this stuff mainly works inside scripts. Or do people here spend
all day manually piping stuff between programs?
Yes and Yes.
Kaz Kylheku <433-929-6894@kylheku.com> writes:
On 2024-01-31, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
Terminal control sequences (almost always based on VT100 these days)
are typically not printable, but tend to avoid null characters, which
means you can very probably use printf to print them (assuming you're
on a POSIX-like system).
They use text. For instance, a cursor position is both accepted and
reported in a decimal format like 13;17. All the commands and
delimiting characters are textual, except for part of the CSI (control
sequence introducer). The 7 bit CSI uses two characters, ESC and [.
Except for that one ESC, everything is printable.
I'd describe "printable except for ESC" as binary. And some sequences
use other non-printable characters like ASCII BEL (Ctrl-G) (perhaps
not VT100 standard, but for example commands to change fonts and
colors for xterm).
On 2024-01-31, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
Kaz Kylheku <433-929-6894@kylheku.com> writes:
On 2024-01-31, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
Terminal control sequences (almost always based on VT100 these days)
are typically not printable, but tend to avoid null characters, which
means you can very probably use printf to print them (assuming you're
on a POSIX-like system).
They use text. For instance, a cursor position is both accepted and
reported in a decimal format like 13;17. All the commands and
delimiting characters are textual, except for part of the CSI (control
sequence introducer). The 7 bit CSI uses two characters, ESC and [.
Except for that one ESC, everything is printable.
I'd describe "printable except for ESC" as binary. And some sequences
use other non-printable characters like ASCII BEL (Ctl-G) (perhaps not
VT100 standard, but for example commands to change fonts and colors for
xterm).
But ESC is related text; it's a character described in ASCII used for
signaling in the middle of text, which is what it's doing here.
On 31/01/2024 13:47, Janis Papanagnou wrote:
On 30.01.2024 20:39, Richard Harnden wrote:
Nobody uses printf to output binary data. fwrite(3) would be common, as
would write(2).
Right. I'm using the OS'es write(2), but also printf with ANSI escapes,
e.g. sprintf (buf, "\033[%d;%dH", ...
I meant 'binary' as in has \0s
It seems to work fine with ESC's and utf8 (and I abuse it thus often)
... but, from what James said, that is not actually guaranteed.
Richard Harnden <richard.nospam@gmail.invalid> writes:
On 31/01/2024 13:47, Janis Papanagnou wrote:
On 30.01.2024 20:39, Richard Harnden wrote:
Nobody uses printf to output binary data. fwrite(3) would be common, as
would write(2).
Right. I'm using the OS'es write(2), but also printf with ANSI escapes,
e.g. sprintf (buf, "\033[%d;%dH", ...
I meant 'binary' as in has \0s
I don't think that's what "binary" means.
David Brown <david.brown@hesbynett.no> writes:
On 31/01/2024 15:46, Janis Papanagnou wrote:
On 31.01.2024 15:09, David Brown wrote:
I would expect that the majority of uses of "cat" are with just one
file,
And of course just because of ignorance; the majority of (but not all)
uses with just one file are UUOCs.
I regularly see it as more symmetrical and clearer to push data left
to right. So I might write "cat infile | grep foo | sort > outfile".
Of course I could use "<" redirection, but somehow it seems more
natural to me to have this flow. I'll use "<" for simpler cases.
But perhaps this is just my habit, and makes little sense to other people.
You can also use:
< infile grep foo | sort > outfile
Redirections don't have to be written after a command.
On 31.01.2024 14:05, David Brown wrote:
But it is correct that English has become the main language for
international communication, and is therefore critical for anything that
involves cross-border communication, or where there are significant
numbers of foreign workers. That includes academic work. Different
parts of Europe previously used German or Russian for this,
Don't forget the importance of French! - The whole postal and telecommunication sectors were (and probably still are) massively
influenced by France.
(You're always writing so much text, so I'll skip it and avoid
more comments.)
Just two (unrelated) notes concerning statements I've seen
somewhere in the thread (maybe here as well)...
First; the EU publishes in all languages of the member states,
for example. (There's no single lingua franca.)
And the second note; we have to distinguish the language of the
programming language's keywords, the comments in the source
code, and the language(s) used for user-interaction.
I don't know whether there's some native language that use
non-English keywords, but I'd suppose so, since in the past
I've seen some using preprocessors for a "native language"
source code. So while not typical, probably a demand at some
places. (Elsethread I mentioned the German TR440 commands,
but a [primitive] command language, as opposed to, say, the
Unix shell, I don't consider much as a language.)
The comments' languages varies, in my experience. Sometimes
there's coding standards (that demand the native language, or
that demand English), sometimes it's not defined. Myself I'm
reluctant to switch between languages and stay with English.
But there were also other cases with longer descriptions on
a conceptual basis; if you come from a native language's
perspective it can be better to stay with the language of the
specification instead of introducing sources of misunderstanding.
The user interface, finally, is of course as specified, and can
be anything, or even multi-lingual.
On 31/01/2024 23:36, Ben Bacarisse wrote:
Maybe you are not used to a system where it's trivial to inspect such
data. When "some_prog" produces data that are not compatible with the
current terminal settings, "some_prog | hd" shows a hex dump instead.
The need to do this does not make "some_prog" poorly designed. It may
simply mean that the output is /intended/ for further processing.
Well almost by definition binary output is intended for further
processing. Binary audio files must ultimately be converted to analogue
if anyone is to listen to them, for example.
I had to check how to do a hex dump on the system I'm typing this on.
The name of the hex dumper is xxd instead of hd, but otherwise it works
the same way and will accept piped data. But the fact I had to look it
up tells you that I've never actually used it.
The two problems with hex
dumps are that you've got to do mental arithmetic to convert 8 bit hex
values into 16 or 32 bit fields,
and that once you get a variable length
field, it's virtually impossible to keep track of and match up the
following fields.
So in reality what I do when troubleshooting binary
data is to write a scratch program, or, more often because the trouble
is in the existing parser, put diagnostics in an existing parser to print
out a few fields and inspect them that way.
Of course to check that
audio or image data is right you have to listen to it or view it - you
can't tell from looking at the individual samples.
On Tue, 30 Jan 2024 23:18:21 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Simple example (disclaimer: not tested):
ssh foo 'cd blah ; tar -cf - . | gzip -c' | \
(mkdir foo.blah ; cd foo.blah ; gunzip -c | tar -xf -)
Of the five main programs in this command, four are using
standard out to send binary data:
tar -cf - .
gzip -c
ssh foo [...]
gunzip -c
The tar -xf - at the end reads binary data on standard in
but doesn't output any (or anything else for that matter).
It is FAR more cumbersome to accomplish what this command
is doing without sending binary data through standard out.
If I am not mistaken, tar, gzip and gunzip do not write binary
data to standard output by default. [...]
Anyone who doesn't understand this doesn't understand Unix.
Frankly, Unix redirection racket looks like something hacked
together rather than designed as result of the solid thinking
process. As long as there were only standard input and output it
was sort of logical. But when they figured out that it is
insufficient, they had chosen a quick hack instead of
constructing a solution that wouldn't offend engineering senses
of any non-preconditioned observer.
On 31/01/2024 07:18, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 30/01/2024 07:27, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it might
have unusual effects if passed through systems designed to handle
human-readable text. For instance in some systems designed to receive
ASCII text, there is no distinction between the nul byte and "waiting
for next data byte". Obviously this will cause difficulties if the data
is binary.
Also many binary formats can't easily be extended, so you can pass one
image and that's all. While it is possible to devise a text format
which is similar, in practice text formats usually have enough
redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and harder to
extend.
Your reasoning is all gobbledygook. Your comments reflect only
limitations in your thinking, not any essential truth about using
standard out for binary data.
I must admit that it's nothing I have ever done or considered doing.
[...]
Simple example (disclaimer: not tested):
ssh foo 'cd blah ; tar -cf - . | gzip -c' | \
(mkdir foo.blah ; cd foo.blah ; gunzip -c | tar -xf -)
Of the five main programs in this command, four are using
standard out to send binary data:
tar -cf - .
gzip -c
ssh foo [...]
gunzip -c
The tar -xf - at the end reads binary data on standard in
but doesn't output any (or anything else for that matter).
It is FAR more cumbersome to accomplish what this command
is doing without sending binary data through standard out.
Anyone who doesn't understand this doesn't understand Unix.
Yes. I don't do that sort of thing.
While I have used Unix, it is as a platform for interactive programs
which work on graphics, or a general C compilation environment. I
don't build pipelines to do that sort of data processing. If I had to
download a tar file I'd either use a graphical tool or type several
commands into the shell, each launching a single executable,
interactively.
The reason is that I'd only run the command once, and it's so likely
that there will be either a syntax misunderstanding or a typing error
that I'd have to test to ensure that it was right. And by the time
you've done that any time saved by typing only one commandline is
lost. Of course if you are writing scripts then that doesn't
apply. But now it's effectively a programming language, and, from the example code, a very poorly designed one which is cryptic and fussy
and liable to be hard to maintain. So it's better to use a language
like Perl to achieve the same thing, and I did have a few Perl scripts
handy for repetitive jobs of that nature in my Unix days.
You admit this with "not tested". Says it all. "Understanding Unix" is
an intellectually useless achievement. You might have to do it if you
have to use the system and debug and troubleshoot. But it's nothing
to be proud about.
On 01/02/2024 00:47, Scott Lurndal wrote:
bart <bc@freeuk.com> writes:
On 31/01/2024 23:36, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
I think they're poorly designed too.
Of course you do. They're not bart programs.
From the POV of interactive console programs, they /are/ poor.
You don't provide any reason why - do elucidate!
They only do one thing, like you can't first do A, then B. They don't
give any prompts. They often apparently do nothing (so you can't tell if
they're busy, waiting for input, or hanging). There is no dialog.
On 31/01/2024 23:36, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
I'd write a monolithic program.
On 30/01/2024 07:27, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it might
have unusual effects if passed through systems designed to handle
human-readable text.
Maybe you are not used to a system where it's trivial to inspect such
data. When "some_prog" produces data that are not compatible with the
current terminal settings, "some_prog | hd" shows a hex dump instead.
The need to do this does not make "some_prog" poorly designed. It may
simply mean that the output is /intended/ for further processing.
For instance in some systems designed to receive
ASCII text, there is no distinction between the nul byte and "waiting
for next data byte". Obviously this will cause difficulties if the data
is binary.
Also many binary formats can't easily be extended, so you can pass one
image and that's all. While it is possible to devise a text format
which is similar, in practice text formats usually have enough
redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and harder to
extend.
Your reasoning is all gobbledygook. Your comments reflect only
limitations in your thinking, not any essential truth about using
standard out for binary data.
I must admit that it's nothing I have ever done or considered doing.
However standard output is designed for text and not binary output.
What is your evidence? stdout was just designed for output (as far as I
can tell) and, anyway, what is the distinction you are making between
binary and text? iconv --from ASCII --to EBCDIC-UK will produce
something that is "logically" text on stdout, but it might look like
binary to you.
An example where it's really useful not to care: I have a suite of tools
for doing toy cryptanalysis. Some apply various transformations and/or
filters to byte streams and others collect and output (on stderr)
various statistics. Plugging them together in various pipelines is very
handy when investigating an encrypted text. The output is almost always
"binary" in the sense that there would be no point in looking at it on a
terminal.
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
Load the encrypted text into memory, and then pass it to subroutines to
do the various analyses.
On 01/02/2024 00:47, Scott Lurndal wrote:
bart <bc@freeuk.com> writes:
On 31/01/2024 23:36, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
I think they're poorly designed too.
Of course you do. They're not bart programs.
From the POV of interactive console programs, they /are/ poor.
You don't provide any reason why - do elucidate!
They only do one thing, like you can't first do A, then B. They don't
give any prompts. They often apparently do nothing (so you can't tell if they're busy, waiting for input, or hanging). There is no dialog.
On 31/01/2024 23:34, Keith Thompson wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 31/01/2024 20:14, Keith Thompson wrote:
Terminal control sequences (almost always based on VT100 these days)
are typically not printable, but tend to avoid null characters, which
means you can very probably use printf to print them (assuming you're
on a POSIX-like system).
[...]
The standard is ASCII (American Standard Code for Information
Interchange). In ASCII, byte zero is NUL, which means "ignore". So an
ASCII sequence may contain any number of embedded zero bytes, which the
receiver ignores. That's because for technical reasons some
communications channels have to send data every cycle, and if there is
no data, they will send a signal indistinguishable from all bits zero.
Not particularly relevant. A quick experiment with xterm indicates that
embedding null bytes in a control sequence prevents it from being
recognized. There may be some standards that require embedded zero
bytes to be ignored, but xterm doesn't follow any such standard.
Similarly, if you embed null bytes in text written to a file, the
result is a corrupted text file.
On 31/01/2024 23:36, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 30/01/2024 07:27, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it might
have unusual effects if passed through systems designed to handle
human-readable text.
Maybe you are not used to a system where it's trivial to inspect such
data. When "some_prog" produces data that are not compatible with the
current terminal settings, "some_prog | hd" shows a hex dump instead.
The need to do this does not make "some_prog" poorly designed. It may
simply mean that the output is /intended/ for further processing.
For instance in some systems designed to receive
ASCII text, there is no distinction between the nul byte and "waiting
for next data byte". Obviously this will cause difficulties if the
data is binary.
Also many binary formats can't easily be extended, so you can pass one
image and that's all. While it is possible to devise a text format
which is similar, in practice text formats usually have enough
redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and harder to
extend.
Your reasoning is all gobbledygook. Your comments reflect only
limitations in your thinking, not any essential truth about using
standard out for binary data.
I must admit that it's nothing I have ever done or considered doing.
However standard output is designed for text and not binary output.
What is your evidence? stdout was just designed for output (as far as I
can tell) and, anyway, what is the distinction you are making between
binary and text? iconv --from ASCII --to EBCDIC-UK will produce
something that is "logically" text on stdout, but it might look like
binary to you.
An example where it's really useful not to care: I have a suite of tools
for doing toy cryptanalysis. Some apply various transformations and/or
filters to byte streams and others collect and output (on stderr)
various statistics. Plugging them together in various pipelines is very
handy when investigating an encrypted text. The output is almost always
"binary" in the sense that there would be no point in looking at it on a
terminal.
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
I think they're poorly designed too.
From the POV of interactive console programs, they /are/ poor. But the mistake is thinking that they are actual programs or commands, when
really they are just filters. They are not designed to be standalone commands.
Even 'cat', if I type it by itself, just sits there. (I wonder what use
it has in a sequence like ... | cat | ...; what does it add to the data?)
AFAICS, this stuff mainly works inside scripts. Or do people here spend
all day manually piping stuff between programs?
As for alternatives, I don't know. There are any number of ways this
could be done. But if everyone has become inured to this piping
business, they will not be receptive to anything different.
Here however are some ideas:
* Have versions of these tools for use as filters with no UI, just
a default input and output, and versions for interactive use with
helpful prompts. Or even just a sensibly named output! Instead of
every program writing a.out.
* Have a concept of a current block of data, analogous to a clipboard.
Then separate commands can load data, sort it, count it, display it,
write it etc, with no need for intermediate named files.
But I'd be happier if this was all contained within a separate
application from an OS shell program.
> load fred
Data loaded
> lc
4 lines
> list
1 one
2 two
3 three
4 four
> rev
Reversed
> list
1 four
2 three
3 two
4 one
> sort
Sorted
> upper
> list
1 FOUR
2 ONE
3 THREE
4 TWO
> save bill
Written to bill
> q
On 01.02.2024 06:24, Malcolm McLean wrote:
On 31/01/2024 23:36, Ben Bacarisse wrote:
Well almost by definition binary output is intended for further
Maybe you are not used to a system where it's trivial to inspect such
data. When "some_prog" produces data that are not compatible with the
current terminal settings, "some_prog | hd" shows a hex dump instead.
The need to do this does not make "some_prog" poorly designed. It may
simply mean that the output is /intended/ for further processing.
processing. Binary audio files must ultimately be converted to analogue
if anyone is to listen to them, for example.
Well, not necessarily. Let's leave the typical use case for a moment...
It might also be analyzed and converted to a digitally represented
formula, say some TeX code, or e.g. like the formal syntax that the
lilypond program uses.
I had to check how to do a hex dump on the system I'm typing this on.
The name of the hex dumper is xxd instead of hd, but otherwise it works
the same way and will accept piped data. But the fact I had to look it
up tells you that I've never actually used it.
Well, there's always the old Unix standard tool, 'od'.
I use that without thinking or looking it up, since it has always been
there, even though I only rarely use it.
And you observed correctly that nowadays there's typically even more
than one tool available. (And Bart will probably write his own tool. :-)
The two problems with hex
dumps are that you've got to do mental arithmetic to convert 8 bit hex
values into 16 or 32 bit fields,
Hmm.. - have you inspected the man pages of the tools?
At least for 'od' I know it's easy per option...
od -c file # characters (or escapes and octals)
od -t x1 file # hex octets
od -t x2 file # words (two octets)
od -c -t x1 file # characters and octets
On 31/01/2024 23:36, Ben Bacarisse wrote:
I'd write a monolithic program.
An example where it's really useful not to care: I have a suite of tools
for doing toy cryptanalysis. Some apply various transformations and/or
filters to byte streams and others collect and output (on stderr)
various statistics. Plugging them together in various pipelines is very
handy when investigating an encrypted text. The output is almost always
"binary" in the sense that there would be no point in looking at it on a
terminal.
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
On 01/02/2024 13:02, Janis Papanagnou wrote:
Well, not necessarily. Let's leave the typical use case for a moment...
It might also be analyzed and converted to a digitally represented
formula, say some TeX code, or e.g. like the formal syntax that the
lilypond program uses.
And ultimately converted to a non-binary form. A list of 1s and 0s is
seldom any use to the final consumer of the data.
The two problems with hex dumps are that you've got to do mental
arithmetic to convert 8 bit hex values into 16 or 32 bit fields,
Hmm.. - have you inspected the man pages of the tools?
I just ran "man xxd". The man page contains this statement:
The tool's weirdness matches its creator's brain. Use entirely at your
own risk. Copy files. Trace it. Become a wizard.
At least for 'od' I know it's easy per option...
od -c file # characters (or escapes and octals)
od -t x1 file # hex octets
od -t x2 file # words (two octets)
od -c -t x1 file # characters and octets
So a JPEG file starts with
FF D8
FF E0
hi lo (length of the FF E0 segment)
So we want the output FF D8 FF E0 [1000] to check that the segment
markers are correct and the FF E0 segment is genuinely a thousand bytes
(or whatever it is). This isn't easy to achieve with a hex dump utility.
On 01/02/2024 14:45, Scott Lurndal wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
I'd write a monolithic program.
Even a monolithic program is decomposed into subroutines (or malcolm
functions).
A pipeline is the same concept at a higher level.
Exactly. So whilst it might have some advantages, they aren't going to
be very large, because as you say, it's the same basic concept.
I just ran "man xxd". The man page contains this statement.
The tool's weirdness matches its creator's brain. Use entirely at your
own risk. Copy files. Trace it. Become a wizard.
On 01/02/2024 01:21, bart wrote:
* Have versions of these tools for use as filters with no UI, just
a default input and output, and versions for interactive use with
helpful prompts. Or even just a sensibly named output! Instead of
every program writing a.out.
Sometimes a "front end" with a nice UI /is/ useful. So people write
front ends with nice UI's - text-based or gui. Typically these are
great for common tasks, and are cumbersome or useless for rarer or more advanced stuff. And that's fine - use whatever works best for you at
the time. If you think "ssh" is complicated from the command line, use "putty" - but that won't handle all the uses that some people need.
But I most certainly don't want interactive and "helpful" prompts for
most of my command-line tools. It's fine on occasion if it is
necessary, useful if something out of the ordinary happens, and
appropriate for things like passwords. But when I know what I am doing,
why would I want to see help messages repeated? And if I don't know
what I am doing - it's the first time using a command, or I've forgotten
some details, then I have "prog --help", "man prog", or "google prog"
that will all give much more useful information than interactive prompts
ever could.
Can you imagine typing "ls" and being asked :
* Did you want to list files in the current directory, or elsewhere?
* Did you want to list all files, including hidden files?
* Did you want to use colour?
and then twenty more questions for the other common options for ls?
What would be the benefits of "sort" using command line options for
"filter" or "script" usage and then asking a dozen questions for "interactive" use?
* Have a concept of a current block of data, analogous to a clipboard.
Then separate commands can load data, sort it, count it, display it,
write it etc, with no need for intermediate named files.
You mean, a convenient way of moving data between programs? Sort of
like a pipe?
But I'd be happier if this was all contained within a separate
application from an OS shell program.
Yes, because it is /so/ much better if it is limited to a few commands
that you think of when writing this special application, than having a general system that works with any commands.
Basically, all you are saying is that you'd like command line utilities
to work with a default file name "/tmp/clipboard" - something you didn't
want earlier on.
Let's use /tmp/x for convenience....
/tmp$ cat > fred
one
two
three
four
<ctrl-D>
> load fred
Data loaded
$ cp fred x
$ cat -n x
1 FOUR
2 ONE
3 THREE
4 TWO
> save bill
Written to bill
> q
$ cp x bill
With pipes, this is all vastly simpler:
$ cat fred | tac | sort | awk '{print toupper($0)}' > bill
The "sponge" utility reads all of its stdin, then writes the file.
Otherwise, since Unix is inherently multi-tasking and runs the programs
in parallel (unlike your utility), trying to redirect output back into
the same file you are reading as input is a race condition. Utilities
are generally designed for pipes, not destructive changes to a single
file.
So could you list one or two reasons why you might prefer a program with
five subroutines, and one or two reasons why you might prefer to write
five programs which communicate via piped data?
On 01.02.2024 15:57, Malcolm McLean wrote:
On 01/02/2024 14:45, Scott Lurndal wrote:
Exactly. So whilst it might have some advantages, they aren't going to
be very large, because as you say, it's the same basic concept.
I think that you draw the wrong conclusion (on a statement that is
prone to misunderstandings or even wrong).
Pipelines are a very useful method to let processes communicate in
a one-way direction (as the name already suggests). From that it's
immediately recognizable that filters are a natural element in that
OS-architectural glue.
One original Unix philosophy was to have specialized commands that
do one thing well, and to combine such tasks as necessary. (To some
degree there was a similar statement concerning C function design.)
Unfortunately some popular GNU tools deviate from that. Features get
incorporated (as duplicates) in many tools (instead of using the
existing specialized one).
On 31/01/2024 19:35, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
I regularly see it as more symmetrical and clearer to push data left
to right. So I might write "cat infile | grep foo | sort > outfile".
Of course I could use "<" redirection, but somehow it seems more
natural to me to have this flow. I'll use "<" for simpler cases.
But perhaps this is just my habit, and makes little sense to other
people.
You can also use:
< infile grep foo | sort > outfile
Redirections don't have to be written after a command.
I did not know you could write it that way - thanks for another
off-topic, but useful, tip.
Kaz Kylheku <433-929-6894@kylheku.com> writes:
On 2024-01-31, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
Kaz Kylheku <433-929-6894@kylheku.com> writes:
But ESC is related text; it's a character described in ASCII used for
signaling in the middle of text, which is what it's doing here.
So are most of the other ASCII codes less than 0x20. Including
file and record delimiters and shift-in/shift-out.
On 01/02/2024 15:07, Janis Papanagnou wrote:
On 01.02.2024 14:26, Malcolm McLean wrote:
On 01/02/2024 13:02, Janis Papanagnou wrote:
Well, not necessarily. Let's leave the typical use case for a moment...
It might also be analyzed and converted to a digitally represented
formula, say some TeX code, or e.g. like the formal syntax that the
lilypond program uses.
And ultimately converted to a non binary form. A list of 1s and 0s is
seldom any use to the final consumer of the data.
No, I was speaking about an application that creates lilypond _input_,
which is a formal language to write notes, e.g. for evaluation by the
lilypond software, but not excluding other usages.
The two problems with hex dumps are that you've got to do mental
arithmetic to convert 8 bit hex values into 16 or 32 bit fields,
Hmm.. - have you inspected the man pages of the tools?
I just ran "man xxd". The man page contains this statement:
The tool's weirdness matches its creator's brain. Use entirely at your
own risk. Copy files. Trace it. Become a wizard.
This statement repelled you? (Can't help you here.)
At least for 'od' I know it's easy per option...
od -c file # characters (or escapes and octals)
od -t x1 file # hex octets
od -t x2 file # words (two octets)
od -c -t x1 file # characters and octets
So a JPEG file starts with
FF D8
FF E0
hi lo (length of the FF E0 segment)
So we want the output FF D8 FF E0 [1000] to check that the segment
markers are correct and the FF E0 segment is genuinely a thousand bytes
(or whatever it is). This isn't easy to achieve with a hex dump utility.
I don't know binary format details about jpg, so I cannot help you here.
JPEG is an extremely common binary file format and JPEG files will be
found on most general purpose computers.
All you need to know for the purposes of the discussion is that the
first four bytes are segment identifiers and must have the values I
gave, whilst bytes five and six are a big-endian 16 bit number that
represents a segment length, and that potentially any of those values
could be unexpected and you might want to inspect them.
So how would you achieve that in a convenient and non-error prone way?
On 01/02/2024 02:29, bart wrote:
On 01/02/2024 00:47, Scott Lurndal wrote:
bart <bc@freeuk.com> writes:
On 31/01/2024 23:36, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
I think they're poorly designed too.
Of course you do. They're not bart programs.
From the POV of interactive console programs, they /are/ poor.
You don't provide any reason why - do elucidate!
They only do one thing, like you can't first do A, then B. They don't
give any prompts. They often apparently do nothing (so you can't tell
if they're busy, waiting for input, or hanging). There is no dialog.
That's the whole point!
If you want to do A, then B, then you do "A | B", or "A; B", or "A && B"
or "A || B". And if you want to do A, then B twice, then C, then A
again, you write "A | B | B | C | A". Other operator choices let you
say "do this then that", or "do this, and if successful do that", etc.
Your monolithic AB program fails when you want to do C, or want to do A
and B in a way the AB author didn't envisage.
You have a Transformer - a toy that can be either a car or a robot. I've
got a box of Lego. Sometimes I need instructions and a bit of time, but
I can have a car, a robot, a plane, an alien, a house, and anything else
I might want.
On 31/01/2024 14:35, Janis Papanagnou wrote:
First; the EU publishes in all languages of the member states,
for example. (There's no single lingua franca.)
Weirdly, while Norway is not in the EU but Sweden and Denmark are, they publish (for some things at least) in Norwegian but not in Swedish or
Danish. [...]
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 01/02/2024 15:07, Janis Papanagnou wrote:
On 01.02.2024 14:26, Malcolm McLean wrote:
JPEG is an extremely common binary file format and JPEG files will be
On 01/02/2024 13:02, Janis Papanagnou wrote:
Well, not necessarily. Let's leave the typical use case for a moment...
And ultimately converted to a non binary form. A list of 1s and 0s is
It might also be analyzed and converted to a digitally represented
formula, say some TeX code, or e.g. like the formal syntax that the
lilypond program uses.
seldom any use to the final consumer of the data.
No, I was speaking about an application that creates lilypond _input_,
which is a formal language to write notes, e.g. for evaluation by the
lilypond software, but not excluding other usages.
I just ran "man xxd". The man page contains this statement.
The two problems with hex
dumps are that you've got to do mental arithmetic to convert 8 bit hex
values into 16 or 32 bit fields,
Hmm.. - have you inspected the man pages of the tools?
The tool's weirdness matches its creator's brain. Use entirely at your
own risk. Copy files. Trace it. Become a wizard.
This statement repelled you? (Can't help you here.)
At least for 'od' I know it's easy per option...
So a JPEG file starts with
od -c file # characters (or escapes and octals)
od -t x1 file # hex octets
od -t x2 file # words (two octets)
od -c -t x1 file # characters and octets
FF D8
FF E0
hi lo (length of the FF E0 segment)
So we want the output
FF D8 FF E0 [1000] to check that the segment markers are correct and the FF
E0 segment is genuinely a thousand bytes (or whatever it is). This isn't
easy to achieve with a hex dump utility.
I don't know binary format details about jpg, so I cannot help you here.
found on most general purpose computers.
All you need to know for the purposes of the discussion is that the
first four bytes are segment identifiers and must have the values I
gave, whilst bytes five and six are a big endian 16 bit number that
represents a segment length, and that potentially any of those values
could be unexpected and you might want to inspect them.
So how would you achieve that in a convenient and non-error prone way?
$ if file /tmp/garage.jpg | grep JPEG > /dev/null^Jthen^Jecho "it is a jpeg"^Jfi
it is a jpeg
On 01.02.2024 16:41, Malcolm McLean wrote:
So could you list one or two reasons why you might prefer a program with
five subroutines, and one or two reasons why you might prefer to write
five programs which communicate via piped data?
A quite appealing and naturally appearing task (from the past) to use
pipes was to model communication cascades. Something like (off the top
of my head)...
data-source | sign | compress | crc | encrypt | channel-enc |
interleaver | channel-simulator | deinterleaver | channel-dec |
decrypt | crc-check | uncompress | check-sign | data-sink
Component-pairs can be omitted, say you may leave out the un-/compress
function. And every component may be either special purpose or general.
A special purpose entity could be BCH-enc and RCPC-enc, or it can also
be (if better suited) a combined module, say 'crc -16' vs. 'crc -32'
with the function realized as option argument.
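The stage names in that cascade are of course hypothetical, but the same composition works with real tools. A runnable analogue of "source | compress | channel | uncompress | sink", with `cat` standing in for the channel-simulator:

```shell
# data-source | compress | channel-simulator | uncompress | data-sink
# (gzip plays both the compress and uncompress roles; cat is the channel)
printf 'hello, channel\n' \
  | gzip -c \
  | cat \
  | gzip -dc
```

Any stage can be dropped or swapped without touching the others, which is the point of building the cascade from pipes in the first place.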
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 01.02.2024 16:41, Malcolm McLean wrote:
So could you list one or two reasons why you might prefer a program with
five subroutines, and one or two reasons why you might prefer to write
five programs which communicate via piped data?
A quite appealing and naturally appearing task (from the past) to use
pipes was to model communication cascades. Something like (off the top
of my head)...
data-source | sign | compress | crc | encrypt | channel-enc |
interleaver | channel-simulator | deinterleaver | channel-dec |
decrypt | crc-check | uncompress | check-sign | data-sink
Component-pairs can be omitted, say you may leave out the un-/compress
function. And every component may be either special purpose or general.
A special purpose entity could be BCH-enc and RCPC-enc, or it can also
be (if better suited) a combined module, say 'crc -16' vs. 'crc -32'
with the function realized as option argument.
There was also the widely used netpbm package for translating
between different image formats.
https://en.wikipedia.org/wiki/Netpbm
$ giftopnm somepic.gif | ppmtobmp > somepic.bmp
$ for i in *.png; do pngtopam $i | ppmtojpeg >`basename $i .png`.jpg; done
On 2024-02-01, Scott Lurndal <scott@slp53.sl.home> wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 01.02.2024 16:41, Malcolm McLean wrote:
So could you list one or two reasons why you might prefer a program with
five subroutines, and one or two reasons why you might prefer to write
five programs which communicate via piped data?
A quite appealing and naturally appearing task (from the past) to use
pipes was to model communication cascades. Something like (off the top
of my head)...
data-source | sign | compress | crc | encrypt | channel-enc |
interleaver | channel-simulator | deinterleaver | channel-dec |
decrypt | crc-check | uncompress | check-sign | data-sink
Component-pairs can be omitted, say you may leave out the un-/compress
function. And every component may be either special purpose or general.
A special purpose entity could be BCH-enc and RCPC-enc, or it can also
be (if better suited) a combined module, say 'crc -16' vs. 'crc -32'
with the function realized as option argument.
There was also the widely used netpbm package for translating
between different image formats.
https://en.wikipedia.org/wiki/Netpbm
$ giftopnm somepic.gif | ppmtobmp > somepic.bmp
$ for i in *.png; do pngtopam $i | ppmtojpeg >`basename $i .png`.jpg; done
Also, in regard to some silly objections upthread about the danger of
binary data on standard output, programs in Unix can easily do the
following (and arguably should):
if (isatty(STDOUT_FILENO)) {
fprintf(stderr, "Cowardly refusing to dump binary data to a terminal.\n");
exit(EXIT_FAILURE);
}
bart <bc@freeuk.com> writes:
On 01/02/2024 16:30, Scott Lurndal wrote:[...]
$ if file /tmp/garage.jpg | grep JPEG > /dev/null^Jthen^Jecho "it is a jpeg"^Jfi
it is a jpeg
That doesn't work for me:
Not if you type the "^J"s as '^' and 'J'. They were intended to
represent newlines. I would use semicolons instead:
$ if file /tmp/garage.jpg | grep JPEG > /dev/null ; then echo "it is a jpeg" ; fi
it is a jpeg
(I might also use "grep -q" rather than redirecting to /dev/null.)
[...]
I think anyway that you need to grep for JFIF not JPEG, but that is a
really poor way to check for a JPEG file. Any text or binary file can
have a JFIF byte sequence.
That's not an issue. "file" doesn't just look for "JFIF" to determine
that a file is a JPEG.
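Indeed, file(1) inspects the magic bytes at the start of the file, not just the "JFIF" string. A sketch of the check as a reusable helper (`is_jpeg` is an invented name; `-b` suppresses the filename so only the description is matched, and `grep -q` avoids the /dev/null redirect):

```shell
# is_jpeg FILE: succeed if file(1) classifies FILE as JPEG image data.
is_jpeg() {
    file -b "$1" | grep -q 'JPEG'
}
```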
On 01.02.2024 11:30, David Brown wrote:
On 31/01/2024 14:35, Janis Papanagnou wrote:
First; the EU publishes in all languages of the member states,
for example. (There's no single lingua franca.)
Weirdly, while Norway is not in the EU but Sweden and Denmark are, they
publish (for some things at least) in Norwegian but not in Swedish or
Danish. [...]
Hmm.. - in my ears this sounds strange. I've looked it up and found...
"The EU has 24 official languages:
Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish,
French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian,
Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and
Swedish."
On 01.02.2024 11:34, David Brown wrote:
On 31/01/2024 19:35, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
I regularly see it as more symmetrical and clearer to push data left
to right. So I might write "cat infile | grep foo | sort > outfile".
Of course I could use "<" redirection, but somehow it seems more
natural to me to have this flow. I'll use "<" for simpler cases.
But perhaps this is just my habit, and makes little sense to other
people.
I completely understand that.
You can also use:
< infile grep foo | sort > outfile
Redirections don't have to be written after a command.
Indeed. And if we also respect that 'grep' accepts arguments,
then it's even more compact and yet probably better legible... :-)
grep foo infile | sort > outfile
I did not know you could write it that way - thanks for another
off-topic, but useful, tip.
Yes. We certainly should instead have written
grep foo iso646.h | sort > outfile
On 01/02/2024 14:50, David Brown wrote:
On 01/02/2024 02:29, bart wrote:
On 01/02/2024 00:47, Scott Lurndal wrote:
bart <bc@freeuk.com> writes:
On 31/01/2024 23:36, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
I think they're poorly designed too.
Of course you do. They're not bart programs.
From the POV of interactive console programs, they /are/ poor.
You don't provide any reason why - do elucidate!
They only do one thing, like you can't first do A, then B. They don't
give any prompts. They often apparently do nothing (so you can't tell
if they're busy, waiting for input, or hanging). There is no dialog.
That's the whole point!
If you want to do A, then B, then you do "A | B", or "A; B", or "A &&
B" or "A || B". And if you want to do A, then B twice, then C, then A
again, you write "A | B | B | C | A". Other operator choices let you
say "do this then that", or "do this, and if successful do that", etc.
Your monolithic AB program fails when you want to do C, or want to do
A and B in a way the AB author didn't envisage.
You have a Transformer - a toy that can be either a car or a robot.
I've got a box of Lego. Sometimes I need instructions and a bit of
time, but I can have a car, a robot, a plane, an alien, a house, and
anything else I might want.
You can only do one thing, as you can only have one unbroken byte
sequence as output sent to stdout.
You can't send output A to stdout, then B to stdout, and certainly can't interleave messages to the console on stdout, as that would then be all
mixed up with the possibly binary data, and if redirected, you won't see
it.
I can see the idea of having one permanently open channel, but call it stdbinout or stdpipeout. But you still won't be able to generate a
sequence of distinct data blocks along that one channel because it is continuous.
This why 'as' only ever produces one object file, even for multiple
input source files.
And explains why 'as' treats multiple .s input files as though they were
all part of the same single source file: you can take one .s file, chop
it up into multiple .s files, and submit them all to 'as' (keeping the
right order).
It's a feature! It's also the wackiest assembler I've encountered, this
century anyway. The fact that it's implemented as a crude filter with
one input stream and one output stream helps explain it.
Although it works differently from most such filters, because if its
output is not piped, and not redirected, it is sent to a file (always
called a.out). It's not quite crazy enough to send binary object file
data to the terminal; I wonder why not?
On 01/02/2024 14:55, David Brown wrote:
On 01/02/2024 02:53, Malcolm McLean wrote:
By breaking down the problem into several parts e.g. "collect
On 31/01/2024 23:36, Ben Bacarisse wrote:
I'd write a monolithic program.
An example where it's really useful not to care: I have a suite of tools
for doing toy cryptanalysis. Some apply various transformations and/or
filters to byte streams and others collect and output (on stderr)
various statistics. Plugging them together in various pipelines is very
handy when investigating an encrypted text. The output is almost always
"binary" in the sense that there would be no point in looking at it on a
terminal.
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
It's very strange to me to see people that consider themselves
programmers talk about having multiple small functions to do specific
tasks and combining them into bigger functions to solve bigger
problems, yet are reduced to quivering jellies at the thought of
multiple small programs to do specific tasks that can be combined to
solve bigger tasks.
Do you think the C standard library would be improved by a single
function "flubadub" that takes 20 parameters and can calculate
logarithms, print formatted text, allocate memory and write it all to
a file?
statistical data, analyse statistics, form hypothesis, attempt
decryption, check decrypt for plausible plaintext" we can usually attack
it better. And you're right, there's not a fundamental difference
between writing one program with five subroutines, or five programs
which pass data to each other via pipelines.
But that doesn't mean that decision must not be made, or that you can't
give reasons for and against each option.
So could you list one or two reasons why you might prefer a program with
five subroutines, and one or two reasons why you might prefer to write
five programs which communicate via piped data?
On 2/1/2024 1:25 PM, bart wrote:
On 01/02/2024 20:09, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 01/02/2024 16:30, Scott Lurndal wrote:[...]
$ if file /tmp/garage.jpg | grep JPEG > /dev/null^Jthen^Jecho "it is a jpeg"^Jfi
it is a jpeg
That doesn't work for me:
Not if you type the "^J"s as '^' and 'J'. They were intended to
represent newlines. I would use semicolons instead:
$ if file /tmp/garage.jpg | grep JPEG > /dev/null ; then echo "it is a jpeg" ; fi
it is a jpeg
(I might also use "grep -q" rather than redirecting to /dev/null.)
[...]
I think anyway that you need to grep for JFIF not JPEG, but that is a
really poor way to check for a JPEG file. Any text or binary file can
have a JFIF byte sequence.
That's not an issue. "file" doesn't just look for "JFIF" to determine
that a file is a jpg.
I see, so 'file' is a special command that does all the work. grep
checks whether the description contains JPEG. Although it won't work for
any of my private formats.
Why would it work with your private formats? ;^)
On 01/02/2024 20:09, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 01/02/2024 16:30, Scott Lurndal wrote:[...]
$ if file /tmp/garage.jpg | grep JPEG > /dev/null^Jthen^Jecho "it is a jpeg"^Jfi
it is a jpeg
That doesn't work for me:
Not if you type the "^J"s as '^' and 'J'. They were intended to
represent newlines. I would use semicolons instead:
$ if file /tmp/garage.jpg | grep JPEG > /dev/null ; then echo "it is a jpeg" ; fi
it is a jpeg
(I might also use "grep -q" rather than redirecting to /dev/null.)
[...]
I think anyway that you need to grep for JFIF not JPEG, but that is a
really poor way to check for a JPEG file. Any text or binary file can
have a JFIF byte sequence.
That's not an issue. "file" doesn't just look for "JFIF" to determine
that a file is a jpg.
I see, so 'file' is a special command that does all the work. grep
checks whether the description contains JPEG. Although it won't work for
any of my private formats.
On 01/02/2024 18:06, bart wrote:
You can't send output A to stdout, then B to stdout, and certainly
can't interleave messages to the console on stdout, as that would then
be all mixed up with the possibly binary data, and if redirected, you
won't see it.
$ cat A
one
two
three
$ cat B
cat
dog
cow
$ (cat A; cat B) | wc -l
6
That's the output of two commands, "cat A" and "cat B", each going to
their stdout, and they are concatenated into a single pipe going to the
"wc -l" command to count the lines.
I'm not sure we are getting anywhere with you trying to invent more and
more complex situations in an attempt to find something that can't be
done from a Linux bash shell.
And explains why 'as' treats multiple .s input files as though they
were all part of the same single source file: you can take one .s
file, chop it up into multiple .s files, and submit them all to 'as'
(keeping the right order).
It does that because that's what makes sense.
Having it generate multiple .o files for multiple .s inputs
would restrict that choice.
You really are scraping the bottom of the barrel to try to justify your irrational hatreds, aren't you? You put a lot of effort into
desperately trying to dislike programs that don't work exactly the way
your programs work.
It's a very strange hobby you have.
their earlier mainframe line (PDP-6/10/20).
Yet I don't understand the relation to Linus Torvalds that was the
source of mentioning VMS. - I mean; only that he dislikes it is not much
of a news.
On 31/01/2024 07:07, Lawrence D'Oliveiro wrote:
On Tue, 30 Jan 2024 17:25:31 +0100, David Brown wrote:
Mixing binary data with formatted text data is very unlikely to be
useful.
PDF does exactly that. To the point where the spec suggests putting
some random unprintable bytes up front, to distract format sniffers
from thinking they’re looking at a text file.
PDF files start with the "magic" indicator "%PDF", which is enough for
many programs to identify them correctly.
On 01/02/2024 18:35, Janis Papanagnou wrote:
On 01.02.2024 11:30, David Brown wrote:
On 31/01/2024 14:35, Janis Papanagnou wrote:
First; the EU publishes in all languages of the member states,
for example. (There's no single lingua franca.)
Weirdly, while Norway is not in the EU but Sweden and Denmark are, they
publish (for some things at least) in Norwegian but not in Swedish or
Danish. [...]
Hmm.. - in my ears this sounds strange. I've looked it up and found...
"The EU has 24 official languages:
Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish,
French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian,
Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and
Swedish."
I can't say I have looked this up myself, or particularly care what
languages are used there. Maybe it only applied to some documents, or
used to apply but no longer does. Maybe some things don't stick rigidly
to the official languages, or maybe different guidelines are used for internal documents.
In ASCII, 0 means NUL, or "ignore".
But consider the more generic case of file-transfer tools that try to automatically convert between line-endings for text files on different platforms: if they mistook a PDF file for text, they could screw it up royally.
But it's not difficult to have intermediary files if you want to do more complicated things.
On 31/01/2024 19:01, Janis Papanagnou wrote:
All I can say is that the Unix shell was a reliable companion
wherever we had to automate tasks on Unix systems or on Cygwin
enhanced Windows.
Automation is certainly easier with good scripting - whatever the
language or shell.
On 31/01/2024 16:25, Janis Papanagnou wrote:
On 31.01.2024 15:21, David Brown wrote:
On 31/01/2024 09:36, Malcolm McLean wrote:
[ I snipped a couple of "I actually don't know/need it" things ]
But now it's effectively a programming language, and, from the example
code, a very poorly designed one which is cryptic and fussy and liable
to be hard to maintain. So it's better to use a language like Perl to
achieve the same thing, and I did have a few Perl scripts handy for
repetitive jobs of that nature in my Unix days.
That gave me a laugh! You think bash is cryptic, fussy and poorly
designed, and choose /Perl/ as the alternative :-)
I don't think it's that clear a joke. The Unix shell is extremely
error prone to program, and you should not let a newbie write shell
programs without careful supervision. ("newbie" [in shell context]
= less than 10 years of practical experience. - Am I exaggerating?
Maybe. But not much.)
I'm not a great fan of shell programming - anything advanced, and I tend
to reach for Python. But I think that is a matter of familiarity and practice. But if you consider bash programming as difficult to get
right, I'll not argue.
Perl is famously known as a "write-only" language. Sure, it is possible
to write good, clear, maintainable Perl code - but few people do that.
Thus the idea that finding bash cryptic or difficult and using Perl
instead is the joke.
On Wed, 31 Jan 2024 14:45:49 +0100, David Brown wrote:
On 31/01/2024 07:07, Lawrence D'Oliveiro wrote:
On Tue, 30 Jan 2024 17:25:31 +0100, David Brown wrote:
Mixing binary data with formatted text data is very unlikely to be
useful.
PDF does exactly that. To the point where the spec suggests putting
some random unprintable bytes up front, to distract format sniffers
from thinking they’re looking at a text file.
PDF files start with the "magic" indicator "%PDF", which is enough for
many programs to identify them correctly.
Sure, if you were looking for PDF files specifically.
But consider the more generic case of file-transfer tools that try to
automatically convert between line-endings for text files on different
platforms: if they mistook a PDF file for text, they could screw it up
royally.
On Wed, 31 Jan 2024 23:25:25 +0000, Malcolm McLean wrote:
In ASCII, 0 means NUL, or "ignore".
Fun fact: one of the names for hex 7F was “rubout”.
On 01/02/2024 22:02, David Brown wrote:
On 01/02/2024 18:06, bart wrote:
You can't send output A to stdout, then B to stdout, and certainly
can't interleave messages to the console on stdout, as that would
then be all mixed up with the possibly binary data, and if
redirected, you won't see it.
$ cat A
one
two
three
$ cat B
cat
dog
cow
$ (cat A; cat B) | wc -l
6
That's the output of two commands, "cat A" and "cat B", each going to
their stdout, and they are concatenated into a single pipe going to
the "wc -l" command to count the lines.
I see you don't get it. This is the equivalent of a program which is
supposed to do this:
print A to TTY
print B to LPT
print C to TTY
print D to LPT
but instead is written as this:
print A to TTY
print B to TTY
print C to TTY
print D to TTY
and you are expected to redirect all TTY output to LPT.
At least, on LPT, B and D can each start with a separate title page; on stdout directed to a file, it will be all mixed up.
bart <bc@freeuk.com> writes:
[...]
I've just realised why it is that your filter programs don't show
prompts or any kinds of messages: because those are sent to stdout,
and therefore will screw up any data that is being sent there as the
primary output.
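That observation points at the usual Unix convention: stdout carries the data, and anything meant for a human goes to stderr, where it reaches the terminal even when stdout is piped or redirected. A minimal sketch (the function name is made up):

```shell
# upcase_with_progress: data passes through on stdout, progress goes
# to stderr so it never corrupts the piped output.
upcase_with_progress() {
    echo "working..." >&2   # user-facing message: stderr
    tr 'a-z' 'A-Z'          # data: stdout, safe to pipe or redirect
}
```

With that split, "upcase_with_progress < in > out" still shows the message on the terminal, and the output file contains only the data.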
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
But consider the more generic case of file-transfer tools that try to
automatically convert between line-endings for text files on different
platforms: if they mistook a PDF file for text, they could screw it up
royally.
Just another reason not to use the system with two-byte line endings.
Not a problem on unix.
On 02.02.2024 03:12, Scott Lurndal wrote:
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
But consider the more generic case of file-transfer tools that try to
automatically convert between line-endings for text files on different
platforms: if they mistook a PDF file for text, they could screw it up
royally.
Just another reason not to use the system with two-byte line endings.
That cannot always be avoided.
Not a problem on unix.
There are several situations where it matters to consider CR/LF or when
some OS setting may handle these line terminators. Even if you're only
staying in your Unix universe. The "funniest" thing is if you process
files that have been edited by different people on different platforms.
(I know that I am not the first one who has written a CR-LF-CRLF tool
to check and fix (in some consistent way) the line endings of files.)
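The fixing half of such a tool can be very small. A sketch that normalizes CRLF endings to plain LF (assumes GNU sed's `\r` escape; lone-CR "classic Mac" endings and a consistency check would need more work):

```shell
# crlf_to_lf: strip a carriage return that immediately precedes the
# line feed, leaving Unix-style LF-only line endings.
crlf_to_lf() {
    sed -e 's/\r$//'
}
```

Used as a filter, "crlf_to_lf < dos.txt > unix.txt" does the conversion without touching bytes elsewhere on the line.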
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
On Wed, 31 Jan 2024 23:25:25 +0000, Malcolm McLean wrote:
Fun fact: one of the names for hex 7F was “rubout”.
Additional fun fact. Rubout was the legend on the keycap on the ASR-33
used to rub out the prior character (the A in ASR means it has the reader/punch). On paper tape, it means ignore the prior character.
On 01/02/2024 22:02, David Brown wrote:
I'm not sure we are getting anywhere with you trying to invent more
and more complex situations in an attempt to find something that can't
be done from a Linux bash shell.
They're remarkably simple situations!
I'm sure you're capable of going through the exercise and then you might
gain a bit of insight on how to design such software systems. And, no, arguing that you'd go for a monolithic program doesn't necessarily mean
that you are a "quivering jelly" at the thought of writing several
simpler ones. And in fact to start you off I actually mentioned a few advantages of the pipeline approach.
There are advantages and drawbacks to both. But I can't force you to
think about what those might be if you won't, and from experience just telling you provokes your natural contentiousness and isn't very effective.
On 01/02/2024 15:07, Janis Papanagnou wrote:
JPEG is an extremely common binary file format and JPEG files will be
I don't know binary format details about jpg, so I cannot help you here.
found on most general purpose computers.
All you need to know for the purposes of the discussion is that the
first four bytes are segment identifiers and must have the values I
gave, whilst bytes five and six are a big endian 16 bit number that represents a segment length, and that potentially any of those values
could be unexpected and you might want to inspect them.
So how would you achieve that in a convenient and non-error prone way?
The difference is that the syntax for redirecting output in the UNIX
shell is ony of the slightest use if you happen to run that particular
type of system.
[...] And whilst pipes are a concept, they are no way
comparable in depth and fundamental importance to the concept of
functions of functions.
The point is the two are not comparable. [...]
In Perl you have an implicit variable called $_. Some Perl statements
will operate on $_ without it actually being specified, and you then
have to reference $_ explicitly to obtain the result. It's highly
confusing for anyone used to a conventional language with only one type
of named variables. And that's one of the main decisions which makes
Perl hard to read.
However often you can write slightly less idiomatic Perl code which
doesn't make use of this feature, and then it's clearer. Or you can lay
the code out so that all the places where $_ are used in the same way
are together and make it a bit easier to work out what is going on.
There are thing you can do and Perl doesn't have to look like a
confusing mess.
On 02/02/2024 08:51, David Brown wrote:
On 01/02/2024 23:38, Malcolm McLean wrote:Well it's kind of poroof of the pudding. Ben has several programs
I'm sure you're capable of going through the exercise and then you
might gain a bit of insight on how to design such software systems.
And, no, arguing that you'd go for a monolithic program doesn't
necessarily mean that you are a "quivering jelly" at the thought of
writing several simpler ones. And in fact to start you off I actually
mentioned a few advantages of the pipeline approach.
I am perfectly aware of the advantages and disadvantages of monolithic
approaches.
connected by pipelines and asked me what I thought of the design. I said
I'd go for a monolithic approach. You criticised me, giving no reason
other than that my preferred approach was monolithic. So any reasonable
person would assume that you think that a monolithic approach is in and
of itself bad.
When invited to list the advantages and disadvantages of either, you
refused to do so. I am sure that you are capable of doing this, and you
are basically right. But you haven't actually done so. And it's proof of
the pudding.
The fact is there is a case for Ben's approach, there's a case for my
approach, and maybe Ben's case is better. I've no objection to anyone
weighing in on that. But fundamentally you do not understand what it
means to offer an argument or how to make a case.
On Fri, 02 Feb 2024 02:15:05 GMT, Scott Lurndal wrote:
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
On Wed, 31 Jan 2024 23:25:25 +0000, Malcolm McLean wrote:
Fun fact: one of the names for hex 7F was “rubout”.
Additional fun fact. Rubout was the legend on the keycap on the ASR-33
used to rub out the prior character (the A in ASR means it has the
reader/punch). On paper tape, it means ignore the prior character.
No, you had to overpunch the character to be ignored. Did that key
automatically backspace the tape for you, or did you have to do it
manually?
On Wed, 31 Jan 2024 23:25:25 +0000, Malcolm McLean wrote:
In ASCII, 0 means NUL, or "ignore".
Fun fact: one of the names for hex 7F was “rubout”. On seven-track paper
tape, if you made a mistake typing your program, instead of throwing away
the tape and starting again, you could go back and punch out all the holes
at that position to produce a “rubout” character. The meaning was “ignore
this character”.
On 01/02/2024 15:07, Janis Papanagnou wrote:
On 01.02.2024 14:26, Malcolm McLean wrote:
JPEG is an extremely common binary file format and JPEG files will be
found on most general purpose computers.
On 01/02/2024 13:02, Janis Papanagnou wrote:
No, I was speaking about an application that creates lilypond _input_,
Well, not necessarily. Let's leave the typical use case for a moment...
And ultimately converted to a non binary form. A list of 1s and 0s is
It might also be analyzed and converted to a digitally represented
formula, say some TeX code, or e.g. like the formal syntax that the
lilypond program uses.
seldom any use to the final consumer of the data.
which is a formal language to write notes, e.g. for evaluation by the
lilypond software, but not excluding other usages.
This statement repelled you? (Can't help you here.)
I just ran "man xxd". The man page contains this statement.
The two problems with hex
dumps are that you've got to do mental arithmetic to convert 8 bit hex
values into 16 or 32 bit fields,
Hmm.. - have you inspected the man pages of the tools?
The tool's weirdness matches its creator's brain. Use entirely at your
own risk. Copy files. Trace it. Become a wizard.
I don't know binary format details about jpg, so I cannot help you here.
At least for 'od' I know it's easy per option...
So a JPEG file starts with
od -c file # characters (or escapes and octals)
od -t x1 file # hex octets
od -t x2 file # words (two octets)
od -c -t x1 file # characters and octets
FF D8
FF E0
hi lo (length of the FF E0 segment)
So we want the output
FF D8 FF E0 [1000] to check that the segment markers are correct and the FF
E0 segment is genuinely a thousand bytes (or whatever it is). This isn't
easy to achieve with a hex dump utility.
All you need to know for the purposes of the discussion is that the first four bytes are segment identifiers and must have the values I gave,
So how would you achieve that in a convenient and non-error prone way?
I've commented elsewhere on why I think a monolithic program is not a
good design, so I won't repeat that here.
On 31/01/2024 23:36, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 30/01/2024 07:27, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 29/01/2024 20:10, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
I've never used standard output for binary data.
[...] it strikes me as a poor design decision.
How so?
Because the output can't be inspected by humans, and because it might
have unusual effects if passed through systems designed to handle
human-readable text.
Maybe you are not used to a system where it's trivial to inspect such
data. When "some_prog" produces data that are not compatible with the
current terminal settings, "some_prog | hd" shows a hex dump instead.
The need to do this does not make "some_prog" poorly designed. It may
simply mean that the output is /intended/ for further processing.
For instance in some systems designed to receive ASCII text, there is
no distinction between the nul byte and "waiting for next data byte".
Obviously this will cause difficulties if the data is binary.
I must admit that it's nothing I have ever done or considered doing.
Also many binary formats can't easily be extended, so you can pass one
image and that's all. While it is possible to devise a text format
which is similar, in practice text formats usually have enough
redundancy to be easily extended.
So it's harder to correct errors, more prone to errors, and harder to
extend.
Your reasoning is all gobbledygook. Your comments reflect only
limitations in your thinking, not any essential truth about using
standard out for binary data.
However standard output is designed for text and not binary output.
What is your evidence? stdout was just designed for output (as far as I
can tell) and, anyway, what is the distinction you are making between
binary and text? iconv --from ASCII --to EBCDIC-UK will produce
something that is "logically" text on stdout, but it might look like
binary to you.
An example where it's really useful not to care: I have a suite of tools
for doing toy cryptanalysis. Some apply various transformations and/or
filters to byte streams and others collect and output (on stderr)
various statistics. Plugging them together in various pipelines is very
handy when investigating an encrypted text. The output is almost always
"binary" in the sense that there would be no point in looking at it on
a terminal.
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
I think they're poorly designed too.
From the POV of interactive console programs, they /are/ poor. But the
mistake is thinking that they are actual programs or commands, when
really they are just filters. They are not designed to be standalone
commands.
On 31/01/2024 23:36, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
[...]
According to you, these tools are poorly designed. I don't think so.
How would you design them? Endless input and output file names to be
juggled and tidied up afterwards?
I'd write a monolithic program.
Load the encrypted text into memory, and then pass it to subroutines to
do the various analyses.
You can of course process it, and then pass the processed output to
other programs. And that does have a point if the program which is
accepting the processed output is doing something which has no
necessary connection to cryptanalysis. So for example a program to
produce a pie chart from a list of letter frequencies. But if it's
transforming the encrypted text in intricate and specialised ways, then
analysing the transformed text in other specialised and intricate ways,
then firstly you've probably introduced coupling and dependency between
the two programs, and secondly you're probably at some point going to
want to modify the second program in the pipeline to look at the raw
data.
On 01/02/2024 13:24, Tim Rentsch wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
You admit this with "not tested". Says it all. "Understanding Unix" is
an intellectually useless achievement. You might have to do it if you
have to use the system and debug and trouble shoot. But it's nothing
to be proud about.
You're an idiot. As usual trying to have a useful discussion
with you has turned out to be a complete waste of time.
Some things are interesting in themselves and worth talking about at
length. Like how Haskell builds up functions of functions. Other
things really aren't. And how to set up a Unix pipeline is one of
those that really aren't (unless actually faced with such a system
and with a practical need to do it).
I think you have the intelligence to understand this, if you'd just
understand where I am coming from. This arrogant and dismissive
attitude does not become you.
On 1/24/24 16:11, Kaz Kylheku wrote:
On 2024-01-24, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 1/24/24 03:10, Janis Papanagnou wrote:
On 23.01.2024 23:37, Kalevi Kolttonen wrote:
[...] I am
pretty sure that not all computer languages
provide guarantees about the order of evaluation.
What?!
Could you explain what surprises you about that statement? As quoted,
it's a general statement which includes C: "Except as specified later,
side effects and value computations of subexpressions are unsequenced."
Pretty much any language has to guarantee *something* about
order of evaluation, somewhere.
Not the functional languages, I believe - but I've only heard about
such languages, not used them.
Like for instance that calculating output is not possible before a
needed input is available.
Oddly enough, for a long time the C standard never said anything about
that issue. I argued that this was logically necessary, and few people
disagreed with that argument, but I couldn't point to wording in the
standard to support that claim.
That changed when they added support for multi-threaded code to C in
C2011. That required the standard to be very explicit about which
things could happen simultaneously in different threads, and which
things had to occur in a specified order. All of the wording about
"sequenced" was first introduced at that time. [...]
On 30/01/2024 16:49, Malcolm McLean wrote:
[nonsense as usual]
Mixing binary data with formatted text data is very unlikely to be
useful. fwrite() is perfectly good for writing binary data - it would
make no sense to have some awkward printf specifier to do this. (What
would the specifier even be? It would need to take two items of data -
a pointer and a length - and thus be very different from existing
specifiers.)
On Tue, 30 Jan 2024 20:29:17 +0100, David Brown wrote:
stdout and stdin were apparently available in FORTRAN in the 1950's.
There was a convention that channel 5 was the card reader, and 6 was the
line printer.
When interactive systems came along later, this became channel 5 for
keyboard input, and 6 for terminal output.
What happened to channels 1, 2, 3 & 4? Don't know.