Forum: >>> Magnum BBS <<<

Re: text in programming languages, Unicode in strings

From John Levine@21:1/5 to All on Sat May 18 17:16:16 2024

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:

I don't know how Japanese feel about that, but I certainly don't want
to have to use some Germanized form of C or Forth. This kind of
catering for different natural-language programmers has been tried and
has not taken over the world. I guess that's because

1) You need to learn a lot about what "printf" means and how it is
used; remembering the name is only a minor aspect.

2) Having the same names everywhere allows you to read programs from
all over the world, use reference material from all over the world,
etc.

A similar concept was implemented in COBOL, where the designers thought
that having to write

ADD A TO B GIVING C

or somesuch makes programming easier than writing

C = A+B

in FORTRAN. ...

That's a common misconception. The point of having COBOL look like
English wasn't to make it easier to program but to make it easier for non-programmers to read. Think of an auditor looking at the program to
see if its business logic matches what the company says it does.

Also, I'm with you on localized command languages. Back in the 1980s I
worked on a modelling package called Javelin that had a keystroke
macro language. After going to some effort to provide macro names in
various languages, our foreign customers consistently told us that we
needn't have bothered.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Schultz@21:1/5 to John Levine on Sat May 18 13:25:59 2024

On 5/18/24 12:16 PM, John Levine wrote:

That's a common misconception. The point of having COBOL look like
English wasn't to make it easier to program but to make it easier for non-programmers to read. Think of an auditor looking at the program to
see if its business logic matches what the company says it does.

It certainly didn't make it easier for people learning to use it.

I remember way back in my undergrad days, when keypunch was king, that
every semester at some point someone would hang out a shingle: COBOL
Help Desk.

Never saw the equivalent for FORTRAN, etc.

--
http://davesrocketworks.com
David Schultz

From MitchAlsup1@21:1/5 to David Schultz on Sun May 19 01:53:36 2024

David Schultz wrote:

It certainly didn't make it easier for people learning to use it.

I remember way back in my undergrad days, when keypunch was king, that
every semester at some point someone would hang out a shingle: COBOL
Help Desk.

Never saw the equivalent for FORTRAN, etc.

I myself played the Fortran part. But instead of having a help desk,
I could do almost anything a Fortran programmer wanted over the
telephone. Once I was sitting at my soldering bench, disassembling and
repairing a stereo amplifier, while teaching someone how to write a
simple sort program.

From Terje Mathisen@21:1/5 to John Levine on Sun May 19 12:55:13 2024

John Levine wrote:

Also, I'm with you on localized command languages. Back in the 1980s I
worked on a modelling package called Javelin that had a keystroke
macro language. After going to some effort to provide macro names in
various languages, our foreign customers consistently told us that we
needn't have bothered.

With one glaring exception: a financial package we used worldwide in
Hydro had such single-key macros, based on the first letter of each
command. This worked great everywhere, since we used the US English
version worldwide.

The exception was France, where you have to pay a pretty significant
fine (in the form of substantial pay rises) for any worker who has to
interact with anything not written in French, so there we had the
localized version of the package, and none of those key macros
worked as-is.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From Terje Mathisen@21:1/5 to All on Sun May 19 16:32:29 2024

MitchAlsup1 wrote:

I myself played the Fortran part. But instead of having a help desk,
I could do almost anything a Fortran programmer wanted over the
telephone. Once I was sitting at my soldering bench, disassembling and
repairing a stereo amplifier, while teaching someone how to write a
simple sort program.

Reminds me of talking to a Hydro guy in southern Iran, close to the Iraq border, who had a non-bootable laptop. I was able to tell him on the
phone how to run debug to load the master boot record and hex-edit it to
make it bootable again and then write it back, without seeing his screen
at any point.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From Anton Ertl@21:1/5 to John Levine on Sun May 19 15:52:49 2024

John Levine <johnl@taugh.com> writes:

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:

A similar concept was implemented in COBOL, where the designers thought
that having to write

ADD A TO B GIVING C

or somesuch makes programming easier than writing

C = A+B

in FORTRAN. ...

That's a common misconception. The point of having COBOL look like
English wasn't to make it easier to program but to make it easier for non-programmers to read. Think of an auditor looking at the program to
see if its business logic matches what the company says it does.

That may have been the idea, but I think the idea was wrong. People
learn in primary school what + means. Variables are somewhat later,
but COBOL does not spare auditors from understanding those. The case
of = is interesting in that FORTRAN's usage is decidedly
non-mathematical in statements such as

I = I+1

However, my experience is that beginning programmers have no problems
understanding the idea of imperative programming and assignment (fewer
problems than with, e.g., the use of = in Prolog, which is much more
mathematical), even after many years of being taught to use = for
equations.

In any case, I think that even an auditor who does not know
programming at all will find the FORTRAN syntax for a computation like
the one above easier to read than the COBOL syntax once he has to read
more than a dozen such statements. Hasn't COBOL also included a more mathematically oriented way of writing computations?

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From Stephen Fuld@21:1/5 to Anton Ertl on Sun May 19 18:08:18 2024

Anton Ertl wrote:

John Levine <johnl@taugh.com> writes:

That's a common misconception. The point of having COBOL look like
English wasn't to make it easier to program but to make it easier
for non-programmers to read. Think of an auditor looking at the
program to see if its business logic matches what the company says
it does.

That may have been the idea, but I think the idea was wrong.

I think few would disagree with both parts of that. I certainly
wouldn't. But I give the designers some slack as, in the late 1950s,
there was little knowledge about programming languages to go on. Now,
the mistake is obvious.

snip

Hasn't COBOL also included a more
mathematically oriented way of writing computations?

Yes, the COMPUTE statement, e.g. COMPUTE I = I + 1

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From John Levine@21:1/5 to All on Sun May 19 19:29:06 2024

According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:

That may have been the idea, but I think the idea was wrong.

I think few would disagree with both parts of that. I certainly
wouldn't. But I give the designers some slack as, in the late 1950s,
there was little knowledge about programming languages to go on. Now,
the mistake is obvious.

COBOL is older than Fortran, and back in the day there were plenty of
people who were outraged at I=I+1, which is mathematically absurd to the physicists and mathematicians who were Fortran's early users.

Algol gave us various kinds of := which were supposed to be better.

Yes, the COMPUTE statement, e.g. COMPUTE I = I + 1

You could do that, but I think this is at least as clear:

ADD 1 TO PRODUCT-INDEX.

Don't forget that while COBOL's control structures were quite weak,
its data structures still look pretty good. Everything in a C or C++
structure comes from COBOL by way of PL/I.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From MitchAlsup1@21:1/5 to John Levine on Sun May 19 20:48:33 2024

John Levine wrote:

Don't forget that while COBOL's control structures were quite weak,
its data structures still look pretty good. Everything in a C or C++ structure comes from COBOL by way of PL/I.

Picture data structures ??

From Stephen Fuld@21:1/5 to John Levine on Sun May 19 23:02:29 2024

John Levine wrote:

Yes, the COMPUTE statement, e.g. COMPUTE I = I + 1

You could do that, but I think this is at least as clear:

ADD 1 TO PRODUCT-INDEX.

Note that my statement on the COMPUTE statement was in direct answer to
Anton's question. I have no problem with your statement, but note that
things get messy quickly when doing more complex stuff. For example,
the non-COMPUTE syntax for something like

E = (A*B) + (C*D)

requires three separate statements, and, if you need to keep the
original values of A through D, you must define two additional
variables for the intermediate results.
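The statement count above can be made concrete with a small C analogy (a sketch, not COBOL itself). The first function is the single-expression form that COMPUTE allows; the second splits the same formula into one operation per statement, with two temporaries so the inputs keep their values. The COBOL lines in the comments and the names T1/T2 are illustrative assumptions, not code from the thread.

```c
/* COMPUTE form: the whole formula in one statement, roughly
       COMPUTE E = (A * B) + (C * D).                            */
long compute_form(long a, long b, long c, long d)
{
    return (a * b) + (c * d);
}

/* Verb-per-statement form: one operation per statement, with two
   temporaries so that a..d keep their original values.  Roughly
       MULTIPLY A BY B GIVING T1.
       MULTIPLY C BY D GIVING T2.
       ADD T1 T2 GIVING E.                                        */
long verb_form(long a, long b, long c, long d)
{
    long t1 = a * b;
    long t2 = c * d;
    long e  = t1 + t2;
    return e;
}
```

Both give 26 for A=2, B=3, C=4, D=5; the difference lies only in how many statements and named intermediates the source needs.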

Don't forget that while COBOL's control structures were quite weak,
its data structures still look pretty good.

Absolutely! The power of a MOVE CORRESPONDING with properly defined
data structures was a joy to behold.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From MitchAlsup1@21:1/5 to Stephen Fuld on Mon May 20 00:12:36 2024

Stephen Fuld wrote:

Don't forget that while COBOL's control structures were quite weak,
its data structures still look pretty good.

Absolutely! The power of a MOVE CORRESPONDING with properly defined
data structures was a joy to behold.

The power of PL/I's DCL ... LIKE was similarly a joy to use.

From Stephen Fuld@21:1/5 to All on Mon May 20 01:14:38 2024

MitchAlsup1 wrote:

John Levine wrote:

Don't forget that while COBOL's control structures were quite weak,
its data structures still look pretty good. Everything in a C or
C++ structure comes from COBOL by way of PL/I.

Picture data structures ??

I'm not sure what you are saying here. While picture clauses are in
neither C nor C++, John never claimed they were. His claim was that the
features that were included came from COBOL, e.g. nested structs,
arrays of structs, structs of arrays, etc.
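For readers who have not used COBOL, a rough C rendering of those features (a nested struct, an array inside a struct, an array of structs) may help. The record, its field names and the COBOL group item in the comment are hypothetical and serve only as illustration; note that C keeps the layout but has nothing corresponding to the PICTURE editing.

```c
#include <string.h>

/* Hypothetical record, loosely modelled on a COBOL group item such as
     01 CUSTOMER.
        05 CUST-NAME      PIC X(20).
        05 CUST-ADDRESS.
           10 STREET      PIC X(30).
           10 CITY        PIC X(20).
        05 MONTHLY-TOTAL  PIC 9(7)V99 OCCURS 12 TIMES.
   C keeps the nesting and the repeated items but not the PICTURE
   editing; amounts are held here as whole cents. */
struct address {
    char street[31];              /* +1 byte for C's terminating NUL */
    char city[21];
};

struct customer {
    char name[21];
    struct address addr;          /* nested struct           */
    long monthly_cents[12];       /* array inside a struct   */
};

static struct customer customers[3];   /* array of structs (a table) */

/* Sum one customer's twelve monthly totals. */
long yearly_cents(const struct customer *c)
{
    long sum = 0;
    for (int m = 0; m < 12; m++)
        sum += c->monthly_cents[m];
    return sum;
}

/* Store a city name, truncating it to fit the fixed-width field. */
void set_city(struct customer *c, const char *city)
{
    strncpy(c->addr.city, city, sizeof c->addr.city - 1);
    c->addr.city[sizeof c->addr.city - 1] = '\0';
}
```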

And I miss some equivalent of picture clauses in C every time I see,
including in this NG, a number consisting of a string of, say, 8 or 9 or
more digits without a separator character every three digits, which
would sure make reading such numbers easier. :-(

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From John Levine@21:1/5 to All on Mon May 20 01:51:22 2024

According to MitchAlsup1 <mitchalsup@aol.com>:

Don't forget that while COBOL's control structures were quite weak,
its data structures still look pretty good.

Absolutely! The power of a MOVE CORRESPONDING with properly defined
data structures was a joy to behold.

The power of PL/I's DCL ... LIKE was similarly a joy to use.

That was of course no coincidence. Considering how quickly PL/I was
thrown together, it was a remarkably good language.

I see Iron Spring still sells PL/I compilers if you want to run PL/I on Linux.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From Anton Ertl@21:1/5 to John Levine on Mon May 20 11:10:55 2024

John Levine <johnl@taugh.com> writes:

According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:

That may have been the idea, but I think the idea was wrong.

I think few would disagree with both parts of that. I certainly
wouldn't. But I give the designers some slack as, in the late 1950s,
there was little knowledge about programming languages to go on.

Certainly.

Now, the mistake is obvious.

Maybe not so obvious. Certainly, as the start of this discussion
shows, the idea that a programming language should orient itself
towards the native language of a person is not yet universally
considered a mistake.

Anyway, such mistakes are valuable, as we can now say that this idea
was tried and did not catch on. OK, this might be due to programming
language designers not liking the idea while it was popular with
programmers, but given that programming language designers tend to
also be programmers, and that many programmers have designed another
programming language when they did not like what they were given, I
doubt that.

COBOL is older than Fortran

According to Wikipedia, COBOL was designed in 1959. A draft of the
FORTRAN specification was completed in 1954, a manual appeared in
1956, and the compiler was delivered in 1957. COBOL also looks
syntactically more modern, with something BNF-like already leading to
excessive syntax, whereas Fortran's approach to white space makes it
obvious that the modern (i.e., post-FORTRAN) division into scanning and
parsing had not been developed yet and had not affected the syntax.

Don't forget that while COBOL's control structures were quite weak,
its data structures still look pretty good. Everything in a C or C++ structure comes from COBOL by way of PL/I.

And Algol 68.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From Anton Ertl@21:1/5 to Stephen Fuld on Mon May 20 12:53:16 2024

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

And I miss some equivalent of picture clauses in C every time I see,
including in this NG, a number consisting of a string of, say, 8 or 9 or
more digits without a separator character every three digits, which
would sure make reading such numbers easier. :-(

Why would you need a picture clause for that? For output the Single
Unix Specification v3 specifies the flag "'" for printf(), which means
(from man 3 printf):

For decimal conversion (i, d, u, f, F, g, G) the output is to be
grouped with thousands' grouping characters if the locale
information indicates any. (See setlocale(3).) Note that many
versions of gcc(1) cannot parse this option and will issue a
warning. (SUSv2 did not include %'F, but SUSv3 added it.)
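A minimal C sketch of the output flag, assuming a C library that implements it (glibc does; ISO C alone does not). The helper names are made up for illustration. Which separator appears, if any, depends entirely on the LC_NUMERIC locale in effect: none in the default "C" locale, commas in en_US.UTF-8 if that locale is installed.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format n using the POSIX/SUS "'" flag, which asks
   printf to insert the thousands' grouping character of the current
   LC_NUMERIC locale.  Returns the number of characters written, as
   snprintf does.  Compilers may warn about the flag under -pedantic,
   since it is not part of ISO C. */
int format_with_locale_grouping(char *buf, size_t size, long n)
{
    return snprintf(buf, size, "%'ld", n);
}

/* Adopt the user's locale (from LC_ALL / LC_NUMERIC / LANG) for number
   formatting; until a program does this, it runs in the "C" locale. */
void use_environment_numeric_locale(void)
{
    setlocale(LC_NUMERIC, "");
}
```

Because a C program starts in the "C" locale, the flag has no visible effect unless the program calls setlocale(); a shell's printf command typically does that itself based on LANG.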

You can also specify "'" in scanf():

For decimal conversions, an optional quote character ('). This
specifies that the input number may include thousands'
separators as defined by the LC_NUMERIC category of the current
locale. (See setlocale(3).) The quote character may precede or
follow the '*' assignment-suppression character.
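A small round-trip sketch combining the two flags, assuming glibc, whose scanf implements "'" as the man page describes (other C libraries may reject the flag). The function name is hypothetical. It writes a number with grouping and reads it back with the matching input flag, which only works when writer and reader use the same locale.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: print n with "%'ld", then parse the text back with
   "%'ld" in sscanf.  With the "'" flag, glibc's scanf accepts the current
   LC_NUMERIC locale's thousands separator between digits.  Returns 1 and
   stores the parsed value in *out if the round trip reproduced n, else 0. */
int grouped_round_trip(long n, long *out)
{
    char text[64];
    if (snprintf(text, sizeof text, "%'ld", n) < 0)
        return 0;
    if (sscanf(text, "%'ld", out) != 1)
        return 0;
    return *out == n;
}
```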

I am not convinced that the locale-specific input is a good idea, though.

Python 3 supports _ as a digit-group separator on input, and we have
implemented it in Gforth's development version, too. For getting _ in
output from C programs with the feature above, I have created a locale,
"prog": <https://www.complang.tuwien.ac.at/anton/locale-prog/>

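Locale-independent input of the kind Python 3 accepts can be sketched in a few lines of C. The function below is hypothetical (it is neither Gforth's nor Python's code) and follows Python's rule that a single underscore may appear only between two digits; overflow checking is left out to keep it short.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: parse a non-negative decimal integer that may use
   '_' as a digit-group separator, e.g. "1_234_567".  An underscore is
   accepted only between two digits, so "_1", "1_" and "1__2" are
   rejected.  Overflow is not checked.  On success stores the value in
   *out and returns true. */
bool parse_underscored(const char *s, unsigned long long *out)
{
    unsigned long long value = 0;
    bool prev_digit = false;          /* was the previous character a digit? */

    if (*s == '\0')
        return false;
    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s)) {
            value = value * 10 + (unsigned long long)(*s - '0');
            prev_digit = true;
        } else if (*s == '_' && prev_digit && isdigit((unsigned char)s[1])) {
            prev_digit = false;       /* separator: skip it */
        } else {
            return false;
        }
    }
    *out = value;
    return true;
}
```

Using '_' rather than ',' or '.' sidesteps the locale question entirely, since no locale uses the underscore as a decimal point.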
- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From John Dallman@21:1/5 to Anton Ertl on Mon May 20 14:44:00 2024

In article <2024May20.145316@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

I am not convinced that the locale-specific input is a good idea,
though.

You look pretty silly if your input function can't read the products of
your output function, and automatically figuring out which separators
have been used is not foolproof.

John

From Scott Lurndal@21:1/5 to Stephen Fuld on Mon May 20 15:35:15 2024

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

MitchAlsup1 wrote:

And I miss some equivalent of picture clauses in C every time I see,
including in this NG, a number consisting of a string of, say, 8 or 9 or
more digits without a separator character every three digits, which
would sure make reading such numbers easier. :-(

printf has supported that capability for decades.

$ LANG=en_US.utf8 printf "%'10.2f\n" $(( 1540.0 * 179.47 + (7928 + 401 - 535) * 67.61 + 2295 * 173.35 + 3230 * 191.00 + 192 * 750.15 + 150 * 509.85 + 254 * 6>
2,505,496.61

From Stephen Fuld@21:1/5 to Scott Lurndal on Mon May 20 15:51:34 2024

Scott Lurndal wrote:

printf has supported that capability for decades.

$ LANG=en_US.utf8 printf "%'10.2f\n" $(( 1540.0 * 179.47 + (7928 + 401 - 535) * 67.61 + 2295 * 173.35 + 3230 * 191.00 + 192 * 750.15 + 150 * 509.85 + 254 * 6>
2,505,496.61

If the mechanisms that you and Anton present were so easily available,
why do I see so many instances of "non separated" outputs? Are they
too recent, or are programmers too lazy, or do they believe that users
don't care (i.e. that I am in a minority)?

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From Michael S@21:1/5 to Scott Lurndal on Mon May 20 19:05:15 2024

On Mon, 20 May 2024 15:35:15 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:

printf has supported that capability for decades.

$ LANG=en_US.utf8 printf "%'10.2f\n" $(( 1540.0 * 179.47 + (7928 + 401 - 535) * 67.61 + 2295 * 173.35 + 3230 * 191.00 + 192 * 750.15 + 150 * 509.85 + 254 * 6>
2,505,496.61

That's POSIX rather than C. Even with the gcc compiler, I don't think it
would work on the majority of embedded targets. It certainly does not
work with gcc under any Windows/msys2 operating environment.
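Where the "'" flag is unavailable, the grouping can be done by hand in ISO C. A minimal sketch with a made-up function name; it takes the separator as a parameter instead of consulting a locale, so it should behave the same on POSIX, Windows and embedded targets.

```c
#include <stdio.h>

/* Hypothetical helper: write n into buf with `sep` between every group of
   three digits, e.g. 1234567 -> "1,234,567" or "1_234_567".  Uses only
   ISO C.  Returns the length of the result, or -1 if buf is too small. */
int format_grouped(char *buf, size_t size, long long n, char sep)
{
    char digits[32];
    /* Take the magnitude in unsigned arithmetic so LLONG_MIN is safe. */
    unsigned long long mag = n < 0 ? 0ULL - (unsigned long long)n
                                   : (unsigned long long)n;
    int len = snprintf(digits, sizeof digits, "%llu", mag);
    int total = (n < 0) + len + (len - 1) / 3;   /* sign + digits + separators */

    if ((size_t)total + 1 > size)
        return -1;

    char *p = buf;
    if (n < 0)
        *p++ = '-';
    for (int i = 0; i < len; i++) {
        *p++ = digits[i];
        int left = len - 1 - i;                  /* digits still to be written */
        if (left > 0 && left % 3 == 0)
            *p++ = sep;
    }
    *p = '\0';
    return total;
}
```

Passing the separator explicitly trades locale awareness for predictability, which is arguably what one wants for output that other programs will read.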

From Stephen Fuld@21:1/5 to Anton Ertl on Mon May 20 16:04:29 2024

Anton Ertl wrote:

Now, the mistake is obvious.

Maybe not so obvious. Certainly, as the start of this discussion
shows, the idea that a programming language should orient itself
towards the native language of a person is not yet universally
considered a mistake.

Subtle difference. To me, the "mistake" was trying to make the
programming language as similar as possible to "natural" language (as
opposed to the relative terseness of a more "mathematical" language),
not to any particular natural language. On the other hand, I think APL
showed that one can go too far in the terse direction as well. :-(

Anyway, such mistakes are valuable as we now can say that this idea
was tried, and did not catch on.

Agreed.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From Michael S@21:1/5 to Stephen Fuld on Mon May 20 19:13:44 2024

On Mon, 20 May 2024 15:51:34 -0000 (UTC)
"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> wrote:

If the mechanisms that you and Anton present were so easily available,
why do I see so many instances of "non separated" outputs? Are they
too recent, or are programmers too lazy, or do they believe that users
don't care (i.e. that I am in a minority)?

All three and more. It's not recent, but it's not a universally
available extension.
A 4th reason: output formatted this way is less machine-readable, which
is an important downside on comp.arch.
A 5th reason is that separators are culture-dependent. How many European
countries use comma-separated triads? I think very few.
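The machine-readability point is easy to demonstrate with ISO C's own number parser: strtol() stops at the first character that cannot be part of the number, so grouped output is silently truncated unless the caller checks where parsing stopped. The wrapper name is hypothetical; the behaviour shown is that of strtol() in the default "C" locale.

```c
#include <stdlib.h>

/* Hypothetical wrapper around strtol(): returns the parsed value and
   reports, through *stop, the first character strtol() did not consume. */
long parse_decimal(const char *s, const char **stop)
{
    char *end;
    long value = strtol(s, &end, 10);
    *stop = end;
    return value;
}
```

Fed "2,505,496", it returns 2 with *stop pointing at the first ','; fed "2.505.496" (a German-style grouping), it likewise returns 2.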

From Scott Lurndal@21:1/5 to Stephen Fuld on Mon May 20 16:32:31 2024

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

If the mechanisms that you and Anton present were so easily available,
why do I see so many instances of "non separated" outputs? Are they
too recent, or are programmers too lazy, or do they believe that users
don't care (i.e. that I am in a minority)?

The separation character is defined by the designated locale. The
C locale doesn't define a separation character. The programmer
needs to be aware of the apostrophe flag in the format string and
use it.

From EricP@21:1/5 to Anton Ertl on Mon May 20 12:22:23 2024

Anton Ertl wrote:

John Levine <johnl@taugh.com> writes:

COBOL is older than Fortran

According to Wikipedia, COBOL was designed in 1959. A draft of the
FORTRAN specification was completed in 1954, a manual appeared in
1956, and the compiler was delivered in 1957. COBOL also looks
syntactically more modern, with something BNF-like already leading to excessive syntax, whereas Fortran's approach to white space makes it
obvious that the modern (i.e., post-FORTRAN) division into scanning and parsing had not been developed yet and had not affected the syntax.

Fortran of old has a context sensitive lex - what constitutes a "token"
depends on what else is in the same statement and/or where it is
located in the punch card (even if the "punch card" is a disk file).
E.g.

do x = 1,5
do x = 1.5

When one removes the white space they become dox=1,5 and dox=1.5, so to
determine whether the first token is "DO", a keyword token, or "DOX", an
identifier token, the lexer must scan ahead and see the ',' or '.'.
If it sees ',' then it must be a DO loop so 'X' must be an identifier,
but if it sees a '.' then this is an assignment to an identifier "DOX".
And that scan ahead can cross many continuation cards.
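The lookahead can be sketched in C as follows. This is a much-simplified illustration, not Sale's algorithm. It squeezes out blanks and, for a statement beginning with "DO", looks for a comma at parenthesis depth 0 after the '='. Labels are simply absorbed into the name; continuation cards and all other statement types are ignored; the function and enum names are made up.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { STMT_DO_LOOP, STMT_ASSIGNMENT, STMT_OTHER } stmt_kind;

/* Hypothetical classifier for one statement that fits on a single card. */
stmt_kind classify_do(const char *card)
{
    char s[128];
    size_t n = 0;

    /* Old Fortran ignores blanks, so squeeze them out and fold to
       upper case: "do x = 1,5" becomes "DOX=1,5". */
    for (; *card != '\0' && n + 1 < sizeof s; card++)
        if (*card != ' ')
            s[n++] = (char)toupper((unsigned char)*card);
    s[n] = '\0';

    const char *eq = strchr(s, '=');
    if (strncmp(s, "DO", 2) != 0 || eq == NULL)
        return STMT_OTHER;          /* not a DO... = ... candidate */

    /* Scan ahead past the '=' for a comma outside any parentheses. */
    int depth = 0;
    for (const char *p = eq + 1; *p != '\0'; p++) {
        if (*p == '(')
            depth++;
        else if (*p == ')')
            depth--;
        else if (*p == ',' && depth == 0)
            return STMT_DO_LOOP;    /* "DOX=1,5": keyword DO, loop variable X */
    }
    return STMT_ASSIGNMENT;         /* "DOX=1.5": assigns 1.5 to variable DOX */
}
```

The parenthesis depth matters because "DOX = A(1,2)" contains a comma yet is still an assignment.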

There is a set of rules called "Sale's Algorithm" published by someone
named Sale in CACM in the sixties that covers some of this.

Fortran lex/parse gets worse with the DEC Fortran 77 enhancements
as the interpretation of tokens can change depending on declarations
made elsewhere in the program. E.G.

if (a .eq. b) then

(".eq." is the Fortran compare-equal operator)
This could be a compare if "A" equals "B",
or a record named "A" containing a sub-record named "EQ",
containing a logical (boolean) field named "B" to test TRUE/FALSE.
Here the lexer has to consult the symbol table to determine
what kind of token ".eq." is.
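The dependence on declarations can be shown with a toy decision function. The symbol table here is a hypothetical list of "RECORD.FIELD" names assumed to have been collected from earlier STRUCTURE/RECORD declarations; a real DEC Fortran front end is far more involved.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { EQ_IS_COMPARISON, EQ_IS_FIELD_REF } eq_meaning;

/* Hypothetical lexer decision for text of the form <lhs>.EQ.<rhs>.
   symtab holds "RECORD.FIELD" strings for fields declared so far.
   If lhs is a record with a field called EQ, ".EQ." is part of a field
   reference (lhs.EQ.rhs); otherwise it is the comparison operator. */
eq_meaning classify_dot_eq(const char *lhs, const char *const *symtab, size_t n)
{
    char key[64];
    snprintf(key, sizeof key, "%s.EQ", lhs);
    for (size_t i = 0; i < n; i++)
        if (strcmp(symtab[i], key) == 0)
            return EQ_IS_FIELD_REF;
    return EQ_IS_COMPARISON;
}
```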

From Scott Lurndal@21:1/5 to Michael S on Mon May 20 16:34:50 2024

Michael S <already5chosen@yahoo.com> writes:

All three and more. It's not recent, but it is an extension that is not universally available.
4th reason: output formatted this way is less
machine-readable, which is an important downside on comp.arch.
5th reason is that separators are culture-dependent. How many European countries use comma-separated triads? I think, very few.

The separator character is defined in the locale database,
so for a european language, the period would be used instead of
the american comma.
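For comparison, digit grouping does not have to go through the locale database at all. The Python sketch below is an illustration, not what printf does: it uses Python's format specification (the "," grouping option), and the caller passes the separators explicitly instead of having them looked up from LC_NUMERIC. `group` is a hypothetical helper name.

```python
def group(value, thousands=",", decimal="."):
    """Format value with 2 decimals, grouping digits by threes with the given separators."""
    text = f"{value:,.2f}"          # Python always produces ',' for thousands and '.' for decimal
    # swap both characters in one pass so ',' -> '.' and '.' -> ',' do not interfere
    table = str.maketrans({",": thousands, ".": decimal})
    return text.translate(table)
```

For example, group(2505496.61) gives "2,505,496.61", group(2505496.61, ".", ",") gives the European-style "2.505.496,61", and group(2505496.61, "_") gives "2_505_496.61".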

From Anton Ertl@21:1/5 to Stephen Fuld on Mon May 20 16:23:36 2024

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

Scott Lurndal wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

MitchAlsup1 wrote:

And I miss some equivalent of picture clauses in C every time I see,
including in this NG, a number consisting of a string of say 8 or 9
or more digits without the every three digit separator character,
which sure makes reading such numbers easier. :-(

printf has supported that capability for decades.

$ LANG=en_US.utf8 printf "%'10.2f\n" $(( 1540.0 * 179.47 + (7928 +
401 - 535) * 67.61 + 2295 * 173.35 + 3230 * 191.00 + 192 * 750.15 +
150 * 509.85 + 254 * 6
2,505,496.61

If the mechanisms that you and Anton present were so easily available,
why do I see so many instances of "non separated" outputs?

What locale are you using?

Are they
too recent,

SUSv3 is from 2001. But have Windows implementations of printf and
scanf implemented the "'" flag? If so, when?

or are programmers too lazy, or do they believe that users
don't care (i.e. that I am in a minority)?

An alternative explanation is that programmers believe that users care
that the output of the program can be used as input to programs with a
minimum of fuss. There are two difficulties here:

* There are likely programs that do not understand input with
thousands separators. This includes the printf program shown above
at least as supplied with Debian 12. So if a program outputs
thousands separators, there may be difficulties in feeding that to
downstream programs.

* The text representation becomes locale-specific:

[b3:~:105507] LC_NUMERIC=en_US printf "%'10d\n" 505496
505,496
[b3:~:105508] LC_NUMERIC=de_AT printf "%'10d\n" 505496
505.496

Note that the first output would be interpreted as 505496E-3 in the
de_AT locale, and the second would also be interpreted as 505496E-3 in
the en_US locale.
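The misreading Anton describes can be reproduced with a few lines. This Python sketch assumes a reader that simply drops its locale's thousands separator and maps its decimal separator to "."; `parse_number` is a hypothetical name, and the separator arguments stand in for the en_US and de_AT conventions rather than being read from the locale database.

```python
def parse_number(text, thousands, decimal):
    """Read a number the way a locale-aware reader would, given that locale's separators."""
    return float(text.replace(thousands, "").replace(decimal, "."))
```

So parse_number("505,496", thousands=".", decimal=",") (en_US output read with de_AT rules) returns 505.496, and parse_number("505.496", thousands=",", decimal=".") (de_AT output read with en_US rules) also returns 505.496, while the writer meant 505496 in both cases.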

Therefore I think that we need some locale-independent thousands (and
decimal) separators. For the decimal separator every programming
language uses ".". For the thousands separator, Ada, C#, D, Eiffel,
Go, Haskell, Java, Julia, Perl, Python3, Ruby, Rust, and Swift use _.
And Gforth, too.

In C++, there is a conflict of _ with another feature, so they instead
chose '. However, C++ is not used interactively, and in a code
generator that generates C++ code, having to replace _ with ' for
generating C++ code is a minor pain.
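As a small illustration of the locale-independent "_" convention in one of the languages listed above, here is a Python 3 sketch (underscores in numeric literals and in int()/float() input have been accepted since Python 3.6). The point is that the same text can be produced for humans and read back by programs without consulting any locale.

```python
n = 505_496                        # '_' is allowed between digits in Python 3 literals
text = f"{n:_}"                    # formatting with '_' grouping ignores the locale entirely
back = int(text)                   # int() also accepts '_' between digits
x = float("2_505_496.61")          # and so does float()
y = f"{2505496.61:_.2f}"           # '_' grouping works for fixed-point output as well
```

Here text is "505_496", back equals n, and y is "2_505_496.61", so the output round-trips through the standard input functions.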

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From Anton Ertl@21:1/5 to John Dallman on Mon May 20 17:24:03 2024

jgd@cix.co.uk (John Dallman) writes:

In article <2024May20.145316@mips.complang.tuwien.ac.at>,
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

I am not convinced that the locale-specific input is a good idea,
though.

You look pretty silly if your input function can't read the products of
your output function, and figuring out what separators have been used
automatically is not foolproof.

Yes and yes. Especially given the "," vs. "." roles in various
locales.

But OTOH, not being able to read or, worse, misinterpreting the output
produced by someone else just because that output was produced under a different locale is pretty silly, too.

For reserved words and builtin names of programming languages, the
solution has been to make them independent of the locale and ignore
Algol 60 and Algol 68 for programming, which suggested something else.

We already do the same for the decimal separator in the usual output
functions (it uses "."), we should introduce thousands separators that
are also locale-independent.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From MitchAlsup1@21:1/5 to Anton Ertl on Mon May 20 17:50:15 2024

Anton Ertl wrote:

John Levine <johnl@taugh.com> writes:

According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:

That may have been the idea, but I think the idea was wrong.

I think few would disagree with both parts of that. I certainly
wouldn't. But I give the designers some slack as, in the late 1950s,
there was little knowledge about programming languages to go on.

Certainly.

Now, the mistake is obvious.

Maybe not so obvious. Certainly, as the start of this discussion
shows, the idea that a programming language should orient itself
towards the native language of a person is not yet universally
considered a mistake.

Anyway, such mistakes are valuable, as we can now say that this idea
was tried and did not catch on. OK, this might be due to programming
language designers not liking the idea while it was popular with
programmers, but given that programming language designers tend to
also be programmers, and that many programmers have designed another
programming language when they did not like what they were given, I
doubt that.

COBOL is older than Fortran

According to Wikipedia, COBOL was designed in 1959. A draft of the
FORTRAN specification was completed in 1954, a manual appeared in
1956, and the compiler was delivered in 1957. COBOL also looks
syntactically more modern, with something BNF-like already leading to
excessive syntax, whereas Fortran's approach to white space makes it
obvious that the modern (i.e., post-FORTRAN) division into scanning and
parsing had not been developed yet and had not affected the syntax.

DO 400 I = 10

Is an assignment statement assigning the variable DO400I the value of
10

Don't forget that while COBOL's control structures were quite weak,
its data structures still look pretty good. Everything in a C or C++
structure comes from COBOL by way of PL/I.

And Algol 68.

- anton

From MitchAlsup1@21:1/5 to Stephen Fuld on Mon May 20 17:46:49 2024

Stephen Fuld wrote:

MitchAlsup1 wrote:

John Levine wrote:

According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:

That may have been the idea, but I think the idea was wrong.

I think few would disagree with both parts of that. I certainly
wouldn't. But I give the designers some slack as, in the late
1950s, there was little knowledge about programming languages to
go on. Now, the mistake is obvious.

COBOL is older than Fortran, and back in the day there were plenty
of people who were outraged at I=I+1, which seemed mathematically
absurd to the physicists and mathematicians who were Fortran's
early users.

Algol gave us various kinds of := which were supposed to be better.

Yes, the COMPUTE statement. i.e. COMPUTE I = I + 1

You could do that, but I think this is at least as clear:

ADD 1 TO PRODUCT-INDEX.

Don't forget that while COBOL's control structures were quite weak,
its data structures still look pretty good. Everything in a C or
C++ structure comes from COBOL by way of PL/I.

Picture data structures ??

I'm not sure what you are saying here.

PICTURE x $999,999,999.00;

While picture clauses are not
in C or C++, John never claimed they were. His claim was that those features that were included came from COBOL, e.g. nested structs,
arrays of structs, structs of arrays, etc.

And I miss some equivalent of picture clauses in C every time I see, including in this NG, a number consisting of a string of say 8 or 9 or
more digits without the every three digit separator character, which
sure makes reading such numbers easier. :-(
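To make the picture-clause idea concrete, here is a toy formatter. It is a minimal Python sketch of a much simplified subset of COBOL editing, not real COBOL: 'Z' is a digit position whose leading zeros become spaces, '9' always prints a digit, ',' is blanked while still inside the leading zeros, '.' marks the decimal point, and any other character (such as '$') is copied literally. `edit_picture` is a hypothetical name; signs, floating insertion, CR/DB and the other PICTURE symbols are not handled.

```python
def edit_picture(value, picture):
    """Format a number with a simplified COBOL-style edit picture (sign is ignored)."""
    int_pic, _, frac_pic = picture.partition(".")
    n_int = sum(c in "9Z" for c in int_pic)
    n_frac = sum(c in "9Z" for c in frac_pic)
    scaled = round(abs(value) * 10 ** n_frac)       # integer holding all digit positions
    digits = str(scaled).rjust(n_int + n_frac, "0")
    if len(digits) > n_int + n_frac:
        raise ValueError("value does not fit the picture")
    out, pos, suppressing = [], 0, True
    for c in picture:
        if c in "9Z":
            d = digits[pos]
            pos += 1
            if c == "Z" and suppressing and d == "0":
                out.append(" ")                      # Z: a leading zero becomes a space
            else:
                suppressing = False
                out.append(d)                        # 9, or the first significant Z digit
        elif c == ",":
            out.append(" " if suppressing else ",")  # comma inside leading zeros is blanked
        elif c == ".":
            suppressing = False
            out.append(".")
        else:
            out.append(c)                            # other characters are literal
    return "".join(out)
```

For example, edit_picture(2505496.61, "$ZZZ,ZZZ,ZZ9.99") yields "$  2,505,496.61": the separators are part of the declared output format rather than something each print statement has to remember.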

From EricP@21:1/5 to EricP on Mon May 20 14:33:57 2024

EricP wrote:

Fortran of old has a context sensitive lex - what constitutes a "token" depends on what else is in the same statement and/or where it is
located in the punch card (even if the "punch card" is a disk file).
E.g.

do x = 1,5
do x = 1.5

When one removes the white space they become dox=1,5 and dox=1.5 so to determine if the first token is "DO", a language token, or "DOX",
an identifier token, it must scan ahead and see the ',' or '.'.
If it sees ',' then it must be a DO loop so 'X' must be an identifier,
but if it sees a '.' then this is an assignment to an identifier "DOX".
And that scan ahead can cross many continuation cards.

There is a set of rules called "Sale's Algorithm" published by someone
named Sale in CACM in the sixties that covers some of this.

Fortran lex/parse gets worse with the DEC Fortran 77 enhancements
as the interpretation of tokens can change depending on declarations
made elsewhere in the program. E.G.

if (a .eq. b) then

(".eq." is the Fortran compare-equal operator)
This could be a compare if "A" equals "B",
or a record named "A" containing a sub-record named "EQ",
containing a logical (boolean) field named "B" to test TRUE/FALSE.
Here the lexer has to consult the symbol table to determine
what kind of token ".eq." is.

This paper spends about 8 pages covering the ambiguities in
parsing the C language (typedef gets a lot of pages):

A Simple, Possibly Correct LR Parser for C11, 2017 https://hal.science/hal-01633123v1/file/jourdan2017simple.pdf
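The classic typedef ambiguity in C is that "T * x;" declares x as a pointer to T if T names a type, and is an expression multiplying T by x otherwise. A common workaround, often called the "lexer hack", lets the lexer consult the set of typedef names seen so far. The Python sketch below only illustrates that idea; it is not the approach of the cited paper, and `classify_star_statement` and the `typedef_names` set are hypothetical.

```python
import re

def classify_star_statement(stmt, typedef_names):
    """Decide whether 'A * B;' declares B (A is a typedef name) or multiplies A by B."""
    m = re.fullmatch(r"\s*([A-Za-z_]\w*)\s*\*\s*([A-Za-z_]\w*)\s*;\s*", stmt)
    if m is None:
        raise ValueError("only handles statements of the form 'A * B;'")
    left, right = m.groups()
    if left in typedef_names:          # the decision needs information from earlier declarations
        return f"declaration: {right} is a pointer to {left}"
    return f"expression: {left} * {right}"
```

As with the DEC Fortran records example, identical token text parses differently depending on declarations made elsewhere.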

From Terje Mathisen@21:1/5 to Anton Ertl on Mon May 20 21:33:15 2024

Anton Ertl wrote:

jgd@cix.co.uk (John Dallman) writes:

In article <2024May20.145316@mips.complang.tuwien.ac.at>,
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

I am not convinced that the locale-specific input is a good idea,
though.

You look pretty silly if your input function can't read the products of
your output function, and figuring out what separators have been used
automatically is not foolproof.

Yes and yes. Especially given the "," vs. "." roles in various
locales.

But OTOH, not being able to read or, worse, misinterpreting the output produced by someone else just because that output was produced under a different locale is pretty silly, too.

For reserved words and builtin names of programming languages, the
solution has been to make them independent of the locale and ignore
Algol 60 and Algol 68 for programming, which suggested something else.

We already do the same for the decimal separator in the usual output functions (it uses "."), we should introduce thousands separators that
are also locale-independent.

Yeah, this is one of those misfeatures with no good solution. Hydro,
with 130 operating countries (factories in 70+ of them) had lots of
issues with programs that insisted on producing output (or reading
input) in whatever their current locale/country specified, vs those that
would always use US (or even worse: Norwegian) rules.

I've personally written perl scripts to parse/inspect financial
consolidation reports, figure out the locale rules used and then convert
to the company standard.
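Terje's scripts were in Perl and are not shown; the following is only a hypothetical Python sketch of the kind of guessing such a script might do, and of why, as John Dallman said, it is not foolproof. The heuristic is an assumption: if both "," and "." occur, the last one is the decimal separator; a separator that occurs more than once must be a thousands separator; a single separator followed by exactly three digits is reported as ambiguous.

```python
from decimal import Decimal

def guess_number(text):
    """Guess the separator convention of a number string and return it as a Decimal."""
    s = text.strip().replace(" ", "").replace("_", "")
    has_comma, has_dot = "," in s, "." in s
    if has_comma and has_dot:
        # both present: whichever comes last is taken as the decimal separator
        dec_sep = "," if s.rfind(",") > s.rfind(".") else "."
    elif has_comma or has_dot:
        sep = "," if has_comma else "."
        if s.count(sep) > 1:
            dec_sep = None                         # repeated, so it must group thousands
        elif len(s.rpartition(sep)[2]) == 3:
            raise ValueError(f"ambiguous: {text!r}")  # 505,496 or 505.496?
        else:
            dec_sep = sep
    else:
        dec_sep = None
    if dec_sep is None:
        return Decimal(s.replace(",", "").replace(".", ""))
    thousands = "." if dec_sep == "," else ","
    return Decimal(s.replace(thousands, "").replace(dec_sep, "."))
```

"1.234.567,89" and "1,234,567.89" both come out as 1234567.89, but a value like "505,496" cannot be decided from the text alone; only knowledge of the producing locale resolves it.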

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From Stephen Fuld@21:1/5 to Anton Ertl on Mon May 20 19:19:59 2024

Anton Ertl wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

Scott Lurndal wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

MitchAlsup1 wrote:

And I miss some equivalent of picture clauses in C every time I
see, including in this NG, a number consisting of a string of
say 8 or 9 or more digits without the every three digit
separator character, which sure makes reading such numbers
easier. :-(

printf has supported that capability for decades.

$ LANG=en_US.utf8 printf "%'10.2f\n" $(( 1540.0 * 179.47 + (7928 +
401 - 535) * 67.61 + 2295 * 173.35 + 3230 * 191.00 + 192 *
750.15 + 150 * 509.85 + 254 * 6
2,505,496.61

If the mechanisms that you and Anton present were so easily
available, why do I see so many instances of "non separated"
outputs?

What locale are you using?

I happen to be in the US, however . . .

When I mentioned "in this NG", I meant posts that present the outputs
of some simulation program, reduced trace data, GREP output, etc. I
certainly don't want to deprecate such posts, as clearly having actual
data is better than even informed guesses or opinions. And, with regard
to locale, I can handle numbers presented with just about any
reasonable separator just fine; I am not insisting on commas. But I
suspect that most readers in this newsgroup would be happy with commas,
periods, spaces, underscores, etc. The key is something versus
nothing.

Are they
too recent,

SUSv3 is from 2001. But have Windows implementations of printf and
scanf implemented the "'" flag? If so, when?

or are programmers too lazy, or do they believe that users
don't care (i.e. that I am in a minority)?

An alternative explanation is that programmers believe that users care
that the output of the program can be used as input to programs with a minimum of fuss.

Good point.

There are two difficulties here:

* There are likely programs that do not understand input with
thousands separators. This includes the printf program shown above
at least as supplied with Debian 12. So if a program outputs
thousands separators, there may be difficulties in feeding that to
downstream programs.

* The text representation becomes locale-specific:

[b3:~:105507] LC_NUMERIC=en_US printf "%'10d\n" 505496
505,496
[b3:~:105508] LC_NUMERIC=de_AT printf "%'10d\n" 505496
505.496

Note that the first output would be interpreted as 505496E-3 in the
de_AT locale, and the second would also be interpreted as 505496E-3
in the en_US locale.

Therefore I think that we need some locale-independent thousands (and decimal) separators.

Fair enough.

For the decimal separator every programming
language uses ".".

Do they? I vaguely remember that in COBOL (which was defined before
locales were a thing), if you specified "Decimal is Comma" (I may have
the syntax wrong), then the decimal separator became the comma.
Perhaps this wasn't brought forward. But then if you use the period for
both the three-digit separator and the decimal separator, doesn't that
cause confusion?

For the thousands separator, Ada, C#, D, Eiffel,
Go, Haskell, Java, Julia, Perl, Python3, Ruby, Rust, and Swift use _.
And Gforth, too.

In C++, there is a conflict of _ with another feature, so they instead
chose '. However, C++ is not used interactively, and in a code
generator that generates C++ code, having to replace _ with ' for
generating C++ code is a minor pain.

- anton

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From Thomas Koenig@21:1/5 to John Levine on Mon May 20 21:52:14 2024

John Levine <johnl@taugh.com> schrieb:

COBOL is older than Fortran,

Certainly not (unless you mean "Fortran" in the Fortran 90+ sense).
FORTRAN was released 1957, and the first Cobol specification
appears to have been passed in 1960.

From John Levine@21:1/5 to tkoenig@netcologne.de on Tue May 21 00:32:45 2024

It appears that Thomas Koenig <tkoenig@netcologne.de> said:

John Levine <johnl@taugh.com> schrieb:

COBOL is older than Fortran,

Certainly not (unless you mean "Fortran" in the Fortran 90+ sense).
FORTRAN was released 1957, and the first Cobol specification
appears to have been passed in 1960.

I was thinking of Flow-Matic, which is arguably older than
Fortran and is where most of COBOL came from. Grace Hopper
was entirely familiar with mathematical notation and said that
Univac's business customers didn't like it.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From Thomas Koenig@21:1/5 to John Levine on Tue May 21 05:08:08 2024

John Levine <johnl@taugh.com> schrieb:

It appears that Thomas Koenig <tkoenig@netcologne.de> said:

John Levine <johnl@taugh.com> schrieb:

COBOL is older than Fortran,

Certainly not (unless you mean "Fortran" in the Fortran 90+ sense).
FORTRAN was released 1957, and the first Cobol specification
appears to have been passed in 1960.

I was thinking of Flow-Matic, which is arguably older than
Fortran and is where most of COBOL came from. Grace Hopper
was entirely familiar with mathematical notation and said that
Univac's business customers didn't like it.

It seems that Flow-Matic became publicly available in 1958 and was "substantially complete" by 1959, so FORTRAN came earlier.

From John Levine@21:1/5 to All on Tue May 21 15:45:16 2024

According to Thomas Koenig <tkoenig@netcologne.de>:

John Levine <johnl@taugh.com> schrieb:

It appears that Thomas Koenig <tkoenig@netcologne.de> said:

John Levine <johnl@taugh.com> schrieb:

COBOL is older than Fortran,

Certainly not (unless you mean "Fortran" in the Fortran 90+ sense).
FORTRAN was released 1957, and the first Cobol specification
appears to have been passed in 1960.

I was thinking of Flow-Matic, which is arguably older than
Fortran and is where most of COBOL came from. Grace Hopper
was entirely familiar with mathematical notation and said that
Univac's business customers didn't like it.

It seems that Flow-Matic became publicly available in 1958 and was "substantially complete" by 1959, so FORTRAN came earlier.

Hopper said she started working on it in 1953 and there was an
internal version in 1955. Maybe it's not older than Fortran but it
certainly isn't much newer and the two were going on at the same time.
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From Anton Ertl@21:1/5 to Stephen Fuld on Tue May 21 15:51:26 2024

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

Anton Ertl wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

If the mechanisms that you and Anton present were so easily
available, why do I see so many instances of "non separated"
outputs?

What locale are you using?

I happen to be in the US, however . . .

When I mentioned "in this NG", I meant posts that present the outputs
of some simulation program, reduced trace data, GREP output, etc.

So the question is what locale the posters in this newsgroup use. I
typically use the C.utf8 locale, because then ls sorts directories as
Thompson intended, but which does not show thousands separator. If I
want to show thousands separators, I usually do it with

LC_NUMERIC=prog <command>

or (on machines where I have not installed the prog locale):

LC_NUMERIC=en_US <command>

Let's see how that works for some programs:

[c8:~:105615] LC_NUMERIC=prog perf stat true

Performance counter stats for 'true':

0.17 msec task-clock # 0.376 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
42 page-faults # 0.242 M/sec
470_561 cycles # 2.716 GHz
5_214 stalled-cycles-frontend # 1.11% frontend cycles idle
28_375 stalled-cycles-backend # 6.03% backend cycles idle
515_987 instructions # 1.10 insn per cycle
# 0.05 stalled cycles per insn
103_096 branches # 595.157 M/sec
4_973 branch-misses # 4.82% of all branches

0.000460708 seconds time elapsed

0.000522000 seconds user
0.000000000 seconds sys
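Output in the "prog" locale stays machine-readable for some consumers. The Python sketch below is an illustration, not part of perf: `parse_perf_counts` is a hypothetical helper, and it relies on Python 3's int() accepting "_" between digits. The sample lines are copied from the perf output above.

```python
def parse_perf_counts(text):
    """Map each integer counter name to its value; '_' digit separators are accepted."""
    counts = {}
    for line in text.splitlines():
        fields = line.split("#")[0].split()   # drop the trailing '# ...' commentary
        if len(fields) == 2:
            value, name = fields
            try:
                counts[name] = int(value)      # int("470_561") == 470561 in Python 3.6+
            except ValueError:
                pass                           # not an integer counter (e.g. "470,561")
    return counts

SAMPLE = """\
0.17 msec task-clock # 0.376 CPUs utilized
0 context-switches # 0.000 K/sec
42 page-faults # 0.242 M/sec
470_561 cycles # 2.716 GHz
515_987 instructions # 1.10 insn per cycle
4_973 branch-misses # 4.82% of all branches
"""
```

parse_perf_counts(SAMPLE)["cycles"] gives 470561. With LC_NUMERIC=en_US the same line would read "470,561", and int() rejects it, so the counter would be silently skipped by this sketch.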

I don't know what simulation program you have in mind, or what
reduction method you have in mind for trace data, or a grep option
that produces big numbers, but one thing I often do with grep or other
output is to pipe it to wc. So let's see:

[c8:~:105619] LC_NUMERIC=prog wc -c types.bib
21028 types.bib

So, no luck here; one can use printf to reformat the output, e.g.

LC_NUMERIC=prog printf "%'d %s\\n" `wc -c types.bib`

but I have never done that.
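The same reformatting of wc output can be done without any locale at all. A minimal Python sketch, assuming the "COUNT NAME" format shown above; `group_wc_line` is a hypothetical name, and the grouping comes from Python's format specification, not from LC_NUMERIC.

```python
def group_wc_line(line, sep="_"):
    """Turn 'COUNT NAME' (as printed by wc -c) into a line with a grouped count."""
    count, name = line.strip().split(maxsplit=1)
    grouped = f"{int(count):_}".replace("_", sep)   # group digits by threes, then pick the separator
    return f"{grouped} {name}"
```

group_wc_line("21028 types.bib") gives "21_028 types.bib", and passing sep="," gives "21,028 types.bib".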

For the decimal separator every programming
language uses ".".

Do they?

In the source code, every one that I have ever encountered, and likely
most that I have not. Sure, there is the Algol 60 and Algol 68
craziness of making that implementation-defined, but are there any implementations of these languages that do not use "." for the decimal
point?

I vaguely remember that in COBOL (which was defined before
locales were a thing), if you specified "Decimal is Comma" (I may have
the syntax wrong), then the decimal separator became the comma.

In the source code? And was it the default? And how many programs
used this option?

Perhaps this wasn't brought forward. But then if you use the period for
both the three-digit separator and the decimal separator, doesn't that
cause confusion?

There are certainly cases where it would. But I don't think anybody
proposed that.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From Scott Lurndal@21:1/5 to Anton Ertl on Tue May 21 17:27:29 2024

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

Anton Ertl wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

If the mechanisms that you and Anton present were so easily
available, why do I see so many instances of "non separated"
outputs?

What locale are you using?

I vaguely remember that in COBOL (which was defined before
locales were a thing), if you specified "Decimal is Comma" (I may have
the syntax wrong), then the decimal separator became the comma.

In the source code? And was it the default? And how many programs
used this option?

Yes. It would exchange period and comma in both picture clauses
and numeric literals. It was in the CONFIGURATION SECTION of the
program source.

From Stephen Fuld@21:1/5 to Anton Ertl on Tue May 21 17:51:23 2024

Anton Ertl wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

Anton Ertl wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

If the mechanisms that you and Anton present were so easily
available, why do I see so many instances of "non separated"
outputs?

What locale are you using?

I happen to be in the US, however . . .

When I mentioned "in this NG", I meant posts that present the
outputs of some simulation program, reduced trace data, GREP
output, etc.

So the question is what locale the posters in this newsgroup use. I typically use the C.utf8 locale, because then ls sorts directories as Thompson intended, but which does not show thousands separator. If I
want to show thousands separators, I usually do it with

LC_NUMERIC=prog <command>

or (on machines where I have not installed the prog locale):

LC_NUMERIC=en_US <command>

Let's see how that works for some programs:

[c8:~:105615] LC_NUMERIC=prog perf stat true

Performance counter stats for 'true':

0.17 msec task-clock # 0.376 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
42 page-faults # 0.242 M/sec
470_561 cycles # 2.716 GHz
5_214 stalled-cycles-frontend # 1.11% frontend cycles idle
28_375 stalled-cycles-backend # 6.03% backend cycles idle
515_987 instructions # 1.10 insn per cycle
# 0.05 stalled cycles per insn
103_096 branches # 595.157 M/sec
4_973 branch-misses # 4.82% of all branches

0.000460708 seconds time elapsed

0.000522000 seconds user
0.000000000 seconds sys

I would be happier with that (using an underscore for the thousands
separator) than with no separation. Of course, that is a personal
preference, and others may have different ones.

I don't know what simulation program you have in mind, or what
reduction method you have in mind for trace data, or a grep option
that produces big numbers, but one thing I often do with grep or other
output is to pipe it to wc. So let's see:

[c8:~:105619] LC_NUMERIC=prog wc -c types.bib
21028 types.bib

So, no luck here; one can use printf to reformat the output, e.g.

LC_NUMERIC=prog printf "%'d %s\\n" `wc -c types.bib`

but I have never done that.

I agree that it could be done, but it rarely, if ever, is.

For the decimal separator every programming
language uses ".".

Do they?

In the source code, every one that I have ever encountered, and likely
most that I have not. Sure, there is the Algol 60 and Algol 68
craziness of making that implementation-defined, but are there any implementations of these languages that do not use "." for the decimal
point?

I vaguely remember that in COBOL (which was defined before
locales were a thing), if you specified "Decimal is Comma" (I may
have the syntax wrong), then the decimal separator became the
comma.

In the source code?

I had to look this up, as it has been far too long, but yes.

https://www.ibm.com/docs/en/cobol-zos/6.3?topic=section-decimal-point-is-comma-clause

And was it the default?

No, if you wanted to use this, you added the "Decimal point is comma"
statement in the configuration section. Note that this is obsolete, as
COBOL now supports some version of locales.

And how many programs
used this option?

I have no idea. I never did, though all the COBOL programs I wrote
were for US only. On the other hand, IBM supported it in their COBOL
compiler and I believe it was part of the ANSI standard.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From Stephen Fuld@21:1/5 to Michael S on Wed May 22 11:58:55 2024

On 5/20/2024 9:13 AM, Michael S wrote:

On Mon, 20 May 2024 15:51:34 -0000 (UTC)
"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> wrote:

Scott Lurndal wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

MitchAlsup1 wrote:

And I miss some equivalent of picture clauses in C every time I
see, including in this NG, a number consisting of a string of say
8 or 9 or more digits without the every three digit separator
character, which sure makes reading such numbers easier. :-(

printf has supported that capability for decades.

$ LANG=en_US.utf8 printf "%'10.2f\n" $(( 1540.0 * 179.47 + (7928 +
401 - 535) * 67.61 + 2295 * 173.35 + 3230 * 191.00 + 192 * 750.15
+ 150 * 509.85 + 254 * 6
2,505,496.61

If the mechanisms that you and Anton present were so easily available,
why do I see so many instances of "non separated" outputs? Are they
too recent, or are programmers too lazy, or do they believe that users
don't care (i.e. that I am in a minority)?

All three and more. It's not recent, but it is an extension that is not universally available.

OK.

4th reason: output formatted this way is less
machine-readable, which is an important downside on comp.arch.

Good point. Using that format is sort of a least common denominator.
If you knew the output was only going to be read by humans, you would
probably use some separator, etc. (see below) to make it easier to read,
and if you knew that the output was only going to be read by a program, especially one using the same data formats (endianness, FP formats,
etc), you would probably write the output in binary, as it would save
CPU cycles on both the write and the read, and require less space. But
if you don't know these things, "unformatted display" is a reasonable compromise.
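The trade-off Stephen describes can be seen in a few lines. This Python sketch is only an illustration: the counter values are taken from the perf output earlier in the thread, and the "<6i" layout (six little-endian 32-bit signed integers) is an assumption that writer and reader would have to agree on, which is exactly the endianness and format coupling mentioned above.

```python
import struct

values = [470561, 5214, 28375, 515987, 103096, 4973]

# "unformatted display": portable text that humans and programs can both read
text = " ".join(str(v) for v in values) + "\n"

# binary: smaller and cheaper to convert, but both sides must agree on byte order and width
binary = struct.pack("<6i", *values)
```

Here the text form is 37 bytes and the binary form is 24 bytes; struct.unpack("<6i", binary) restores the values, but only for a reader that knows the layout.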

As for machine readability on comp.arch, I don't know how many programs read
comp.arch postings (other than to output them to a screen), but I suspect
the number is tiny. Remember, my original complaint was about postings on
comp.arch and similar things, i.e. those designed to be human-readable.

5th reason is that separators are culture-dependent. How many European countries use comma-separated triads? I think, very few.

While I agree, as I posted elsewhere in this thread, I, and I suspect
others would prefer any reasonable separator to no separator. Of course
YMMV.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From Anton Ertl@21:1/5 to Stephen Fuld on Sat May 25 16:17:02 2024

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

Anton Ertl wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

[...]

So the question is what locale the posters in this newsgroup use. I
typically use the C.utf8 locale, because then ls sorts directories as
Thompson intended, but which does not show thousands separator. If I
want to show thousands separators, I usually do it with

LC_NUMERIC=prog <command>

or (on machines where I have not installed the prog locale):

LC_NUMERIC=en_US <command>

Let's see how that works for some programs:

[c8:~:105615] LC_NUMERIC=prog perf stat true

Performance counter stats for 'true':

0.17 msec task-clock # 0.376 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
42 page-faults # 0.242 M/sec
470_561 cycles # 2.716 GHz
5_214 stalled-cycles-frontend # 1.11% frontend cycles idle
28_375 stalled-cycles-backend # 6.03% backend cycles idle
515_987 instructions # 1.10 insn per cycle
# 0.05 stalled cycles per insn
103_096 branches # 595.157 M/sec
4_973 branch-misses # 4.82% of all branches

0.000460708 seconds time elapsed

0.000522000 seconds user
0.000000000 seconds sys

I would be happier with that (using an underscore for the thousands separator) than with no separation.

The underscore is due to my "prog" locale <https://www.complang.tuwien.ac.at/anton/locale-prog/>. If you use LC_NUMERIC=en_US, you get "," as thousands separator; if you use,
e.g., LC_NUMERIC=de_AT, you get ".". If you use LC_NUMERIC=C, you get
nothing. All assuming you have these locales installed.

I vaguely remember that in COBOL (which was defined before
locales were a thing), if you specified "Decimal is Comma" (I may
have the syntax wrong), then the decimal separator became the
comma.

In the source code?

I had to look this up, as it has been far too long, but yes.

https://www.ibm.com/docs/en/cobol-zos/6.3?topic=section-decimal-point-is-comma-clause

Interesting. Can be seen as another unneeded feature that later
programming languages did not include.

No, if you wanted to use this, you added the "Decimal point is comma"
statement in the configuration section. Note that this is obsolete, as
COBOL now supports some version of locales.

But that's a different feature: decimal-point-is-comma is for the
source code, while the locale is for the input and output of the
resulting program. If you compile a C program with LC_NUMERIC=de_AT,
the decimal separator in the C code is still ".", not what comes from
the locale. But if you then run a printf or scanf with the "'" in the conversion specifier (and you are on a Unix system), you get the
output according to the locale, and the input is scanned according to
the locale.
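Anton's distinction between source-code notation and locale-dependent I/O has a direct analogue in Python, sketched below as an illustration (`parse_both` is a hypothetical helper). float() behaves like a C literal and always uses ".", while locale.atof() follows the current LC_NUMERIC setting, much like scanf with a locale set.

```python
import locale

def parse_both(text, loc):
    """Return (float(text), locale.atof(text)) with LC_NUMERIC temporarily set to loc."""
    saved = locale.setlocale(locale.LC_NUMERIC)      # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, loc)     # raises locale.Error if not installed
        return float(text), locale.atof(text)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)   # restore the caller's setting
```

In the "C" locale, parse_both("505.496", "C") gives (505.496, 505.496). If a de_AT.UTF-8 locale is installed, parse_both("505.496", "de_AT.UTF-8") should give (505.496, 505496.0), since atof then treats "." as the thousands separator; that second result depends on the system's locale data.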

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From Stephen Fuld@21:1/5 to Anton Ertl on Sat May 25 20:30:29 2024

Anton Ertl wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

Anton Ertl wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

[...]

So the question is what locale the posters in this newsgroup use. I
typically use the C.utf8 locale, because then ls sorts directories as
Thompson intended, but which does not show thousands separator. If I
want to show thousands separators, I usually do it with

LC_NUMERIC=prog <command>

or (on machines where I have not installed the prog locale):

LC_NUMERIC=en_US <command>

Let's see how that works for some programs:

[c8:~:105615] LC_NUMERIC=prog perf stat true

 Performance counter stats for 'true':

              0.17 msec task-clock                #    0.376 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                42      page-faults               #    0.242 M/sec
           470_561      cycles                    #    2.716 GHz
             5_214      stalled-cycles-frontend   #    1.11% frontend cycles idle
            28_375      stalled-cycles-backend    #    6.03% backend cycles idle
           515_987      instructions              #    1.10  insn per cycle
                                                  #    0.05  stalled cycles per insn
           103_096      branches                  #  595.157 M/sec
             4_973      branch-misses             #    4.82% of all branches

       0.000460708 seconds time elapsed

       0.000522000 seconds user
       0.000000000 seconds sys

I would be happier with that (using an underscore for the thousands separator) than with no separation.

The underscore is due to my "prog" locale <https://www.complang.tuwien.ac.at/anton/locale-prog/>. If you use LC_NUMERIC=en_US, you get "," as thousands separator; if you use,
e.g., LC_NUMERIC=de_AT, you get ".". If you use LC_NUMERIC=C, you get nothing. All assuming you have these locales installed.

Interesting. As I said, I would be happier with any of those than no
separator at all.

I vaguely remember that in COBOL (which was defined before
locales were a thing), if you specified "Decimal is Comma" (I may
have the syntax wrong), then the decimal separator became the
comma.

In the source code?

I had to look this up, as it has been far too long, but yes.

https://www.ibm.com/docs/en/cobol-zos/6.3?topic=section-decimal-point-is-comma-clause

Interesting. Can be seen as another unneeded feature that later
programming languages did not include.

No, if you wanted to use this, you added the "Decimal point is
comma" statement in the configuration section. Note that this is
obsolete, as COBOL now supports some version of locales.

But that's a different feature: decimal-point-is-comma is for the
source code, while the locale is for the input and output of the
resulting program. If you compile a C program with LC_NUMERIC=de_AT,
the decimal separator in the C code is still ".", not what comes from
the locale. But if you then run a printf or scanf with the "'" in the conversion specifier (and you are on a Unix system), you get the
output according to the locale, and the input is scanned according to
the locale.

Yes and no. While the change to numeric literals in the source code obviously affects only that, changing the meaning of ',' and '.' in the Picture clause
causes the change to be reflected in the output of the program.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From MitchAlsup1@21:1/5 to Anton Ertl on Sat May 25 20:31:13 2024

Anton Ertl wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

Anton Ertl wrote:

No, if you wanted to use this, you added the "Decimal point is comma" statement in the configuration section. Note that this is obsolete, as COBOL now supports some version of locales.

Can a LOCALE specify that ' ' is the separator character ?? and that
':' is the decimal point character ??

From Lawrence D'Oliveiro@21:1/5 to John Levine on Mon May 27 07:22:49 2024

On Sun, 19 May 2024 19:29:06 -0000 (UTC), John Levine wrote:

... back in the day there were plenty of
people who were outraged at I=I+1 which is mathematically absurd for the physicists and mathematicians who were Fortran's early users.

Count me among them.

Algol gave us various kinds of := which were supposed to be better.

I certainly preferred that. And then was dismayed when C popularized the Fortran (mis)usage, which has since become almost universal among currently-popular languages.

Sigh ...

Don't forget that while COBOL's control structures were quite weak,
its data structures still look pretty good. Everything in a C or C++ structure comes from COBOL by way of PL/I.

PL/I and COBOL had level numbers. The name “struct” (and bracketing
symbols for same) as used in C or C++ comes from Algol-68. Which in turn
built on the earlier work in Algol-W (aka “Wirth-Hoare Algol”), which was also the precursor of Pascal.

From Lawrence D'Oliveiro@21:1/5 to All on Mon May 27 07:56:36 2024

On Mon, 20 May 2024 00:12:36 +0000, MitchAlsup1 wrote:

The power of PL/1's DCL LIKE was similarly a joy to use.

The only reason this was needed was that PL/I had no typedefs.

From Lawrence D'Oliveiro@21:1/5 to All on Mon May 27 07:57:12 2024

On Mon, 20 May 2024 17:50:15 +0000, MitchAlsup1 wrote:

DO 400 I = 10

is an assignment statement assigning the value 10 to the variable DO400I.

This is no longer valid in the free-form option of Fortran-90 and later.

From Lawrence D'Oliveiro@21:1/5 to Michael S on Mon May 27 08:21:47 2024

On Mon, 20 May 2024 19:13:44 +0300, Michael S wrote:

5th reason is that separators are culture-dependent.

As is the decimal point.

I remember learning in school that the international standard (at least in science/engineering) was to use a space as the thousands separator, and a centred dot as the decimal point.

From Anton Ertl@21:1/5 to mitchalsup@aol.com on Wed May 29 08:27:22 2024

mitchalsup@aol.com (MitchAlsup1) writes:

Anton Ertl wrote:

"Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:

Anton Ertl wrote:

No, if you wanted to use this, you added the "Decimal point is comma" statement in the configuration section. Note that this is obsolete, as COBOL now supports some version of locales.

Can a LOCALE specify that ' ' is the separator character ?? and that
':' is the decimal point character ??

I expect so. I would not be surprised if locale implementations or
code that uses the locale misbehave if you define thousands_sep as ' '.

- anton
