I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages.
Certainly
Forth can add the type of contractual safety requirements e.g.,
implementing bounds checking, of a "memory-safe language". Do we need to
work on libraries for these provisions?
What if the program writes a float to a byte location?
Do we have to go along and make Forth type-safe then?
Forth by design is as unsafe as any assembler. The only way to tame it
is to run it in a black box.
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages.
Which "renewed recent push" do you mean?
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
of the renewed recent push for use of memory-safe languages.
Which "renewed recent push" do you mean?
https://www.tomshardware.com/software/security-software/white-house-urges-developers-to-avoid-c-and-c-use-memory-safe-programming-languages
https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/
I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages. Certainly
Forth can add the type of contractual safety requirements e.g.,
implementing bounds checking, of a "memory-safe language". Do we need to
work on libraries for these provisions?
Opinions?
#include <stdio.h>
#include <stdlib.h>
void MaliciousCode() {
printf("This code is malicious!\n");
printf("It will not execute normally.\n");
exit(0);
}
void GetInput() {
char buffer[8];
gets(buffer);
// puts(buffer);
}
int main() {
GetInput();
return 0;
}
=== end code ===
It will be a useful exercise to work up a similar example in Forth, as a
step to thinking about automatic hardening techniques (as opposed to
input sanitization).
gcc xxx.c
|xxx.c: In function ‘GetInput’:
I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages. Certainly
Forth can add the type of contractual safety requirements e.g.,
implementing bounds checking, of a "memory-safe language". Do we need to
work on libraries for these provisions?
Opinions?
Krishna Myneni
Groetjes Albert
On 2/03/2024 5:17 am, Paul Rubin wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
of the renewed recent push for use of memory-safe languages.
Which "renewed recent push" do you mean?
https://www.tomshardware.com/software/security-software/white-house-urges-developers-to-avoid-c-and-c-use-memory-safe-programming-languages
https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/
It's good to have an application that works as planned but how does one
that misbehaves translate to 'security risk' and how does 'memory-safe' prevent that?
"ONCD has the belief that better metrics enable technology providers to
better plan, anticipate, and mitigate vulnerabilities before they become
a problem."
That may be their belief (fancy word for hope) but do they have anything
to back it up?
If you want an example, here's one that targets the Gforth version I
am currently working with:
: MaliciousCode ( -- )
." This code is malicious!" cr
." It will not execute normally." cr
bye ;
create buffer1 8 allot
:noname buffer1 96 stdin read-line . ; execute
bye
When I put this into a file xploit.fs and then perform
printf "01234567890123456789012345678901234567890123456789012345678901234567890123456789\x33\x5b\x57\x55\x55\x55\x00\x00\x68\xdc\xed\xe9\xff\x7f\x00\x00"|
setarch `uname -m` -R gforth xploit.fs
I get the following output:
This code is malicious!
It will not execute normally.
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
If you want an example, here's one that targets the Gforth version I
am currently working with:
: MaliciousCode ( -- )
." This code is malicious!" cr
." It will not execute normally." cr
bye ;
create buffer1 8 allot
:noname buffer1 96 stdin read-line . ; execute
bye
When I put this into a file xploit.fs and then perform
printf "01234567890123456789012345678901234567890123456789012345678901234567890123456789\x33\x5b\x57\x55\x55\x55\x00\x00\x68\xdc\xed\xe9\xff\x7f\x00\x00"|
setarch `uname -m` -R gforth xploit.fs
I get the following output:
This code is malicious!
It will not execute normally.
I forgot to give a recipe for the printf above:
insert
' call -2 cells + 8 dump ' MaliciousCode sp@ 8 dump drop
right before the execute, and the dumps contain the bytes you have to
put into the printf after the 80th byte, in that order. I.e.:
: MaliciousCode ( -- )
." This code is malicious!" cr
." It will not execute normally." cr
bye ;
create buffer1 8 allot
:noname buffer1 96 stdin read-line . ;
' call -2 cells + 8 dump ' MaliciousCode sp@ 8 dump drop
execute
bye
and run it with
echo|setarch `uname -m` -R gforth xploit.fs
For the particular Gforth at hand, this produces:
7FFFE9E43160: 33 5B 57 55 55 55 00 00 - 3[WUUU..
7FFFE9AF6FF0: 68 DC ED E9 FF 7F 00 00 - h.......
exactly the bytes in the printf above.
On 3/1/24 09:54, Krishna Myneni wrote:
I'm wondering what the CS Forth users and Forth systems developers
make of the renewed recent push for use of memory-safe languages.
Certainly Forth can add the type of contractual safety requirements
e.g., implementing bounds checking, of a "memory-safe language". Do we
need to work on libraries for these provisions?
Opinions?
I played with a simple buffer overflow attack code in C, based on an
example I found at
https://www.jsums.edu/nmeghanathan/files/2015/05/CSC437-Fall2013-Module-5-Buffer-Overflow-Attacks.pdf
=== begin code ===
/*
Demonstrate buffer overflow exploit.
Adapted from the example at:
https://www.jsums.edu/nmeghanathan/files/2015/05/CSC437-Fall2013-Module-5-Buffer-Overflow-Attacks.pdf
Build with:
gcc -m32 -o exploit_demo exploit_demo.c
Normal run:
printf "abcdefg" | ./exploit_demo
Find the address of MaliciousCode() within the disassembled executable
objdump -S ./exploit_demo
from the listing above, note the 4-byte address of MaliciousCode
and put the address in the input string, from low-byte to high-byte.
Exploit Example: pass a string to overflow the buffer and run
exploit code
printf "abcdefghijklmnopqrst\x96\x91\x04\x08" | ./exploit_demo
replace the address 0x08049186 above with the one you obtained
from objdump command.
The exploit will cause MaliciousCode() to execute.
*/
#include <stdio.h>
#include <stdlib.h>
void MaliciousCode() {
printf("This code is malicious!\n");
printf("It will not execute normally.\n");
exit(0);
}
void GetInput() {
char buffer[8];
gets(buffer);
// puts(buffer);
}
int main() {
GetInput();
return 0;
}
=== end code ===
It will be a useful exercise to work up a similar example in Forth, as a
step to thinking about automatic hardening techniques (as opposed to
input sanitization).
--
Krishna
On 2/03/2024 5:17 am, Paul Rubin wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
of the renewed recent push for use of memory-safe languages.
Which "renewed recent push" do you mean?
https://www.tomshardware.com/software/security-software/white-house-urges-developers-to-avoid-c-and-c-use-memory-safe-programming-languages
https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/
It's good to have an application that works as planned but how does one
that misbehaves translate to 'security risk' and how does 'memory-safe' prevent that?
Harden these without runtime checks:
: RT1 2 3e recurse ;
: RT2 drop fdrop recurse ;
On 3/2/24 10:08, Krishna Myneni wrote:
=== Gforth example ===
: rt1 recurse ; ok
rt1
*the terminal*:2:1: error: Return stack overflow
>>>rt1<<<
=== end example ===
To be clear, if you try to fill up the fp or data stack, as with your
rt1 example, kForth does give a segfault (and hence is susceptible to an exploit), while Gforth still gives the same error.
On 3/2/24 09:39, minforth wrote:
Harden these without runtime checks:
: RT1 2 3e recurse ;
: RT2 drop fdrop recurse ;
Let's see what python does:
def rt1():
return rt1()
rt1()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in rt1
File "<stdin>", line 2, in rt1
File "<stdin>", line 2, in rt1
[Previous line repeated 996 more times]
RecursionError: maximum recursion depth exceeded
Clearly it is doing a runtime check. Similarly one could have RECURSE in Forth perform a runtime check to enforce a recursion depth limit, and
indeed this type of error is caught by several Forth systems:
=== kForth example ===
: rt1 recurse ;
ok
rt1
Line 2: VM Error(-258): Return stack corrupt
rt1
=== end example ===
=== Gforth example ===
: rt1 recurse ; ok
rt1
*the terminal*:2:1: error: Return stack overflow
>>>rt1<<<
=== end example ===
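As a rough illustration of the runtime-check idea above, here is a hedged sketch in standard Forth that uses an explicit counter rather than a modified RECURSE; the limit of 1000 and the names are invented, and -5 is the standard "return stack overflow" throw code:

variable depth#   0 depth# !
: rt1' ( -- )
   1 depth# +!
   depth# @ 1000 > if  0 depth# !  -5 throw  then   \ reset counter, report as return stack overflow
   recurse ;

Running rt1' then throws after 1000 nested calls instead of relying on the system to notice the overflowing return stack.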
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
On 3/2/24 10:08, Krishna Myneni wrote:
=== Gforth example ===
: rt1 recurse ; ok
rt1
*the terminal*:2:1: error: Return stack overflow
>>>rt1<<<
=== end example ===
To be clear, if you try to fill up the fp or data stack, as with your
rt1 example, kForth does give a segfault (and hence is susceptible to an
exploit), while Gforth still gives the same error.
In Gforth on a Unix system, Unix produces a SIGSEGV when a stack runs
into a guard page. The signal handler then looks at the offending
address, and guesses that an access close to the bottom of a stack is
an underflow of that stack, and correspondingly for accesses close to
the top of a stack. This can be seen as follows:
With the gforth engine with the FP stack being empty:
fp@ 32769 - c@
*the terminal*:3:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<
fp@ 1+ c@
*the terminal*:4:8: error: Floating-point stack underflow
fp@ 1+ >>>c@<<<
It's good to have an application that works as planned but how does one
that misbehaves translate to 'security risk'
and how does 'memory-safe' prevent that?
That may be their belief (fancy word for hope) but do they have anything
to back it up?
Does this mean Gforth is
immune to arbitrary code execution attacks for the fp and data stack
overflow and underflow conditions?
In Forth, you have to create asserts yourself.
It's unclear what they mean, but it's certainly the case that studying
the historical corpus of CVE's tells us things about common types of
attacks. That tells us what areas need attention.
What I've found in practice is that there is almost no
slowdown. I suspect that the memory access itself is slower than the
range check, even when it usually is within the cpu cache.
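For illustration, a hedged sketch of such a range check in Forth; the region DATA-LO..DATA-HI and the word names are invented, and -9 is the standard "invalid memory address" throw code:

create data-area 4096 allot
data-area constant data-lo
data-area 4096 + constant data-hi
: ?addr ( addr -- addr )  dup data-lo data-hi within 0= if -9 throw then ;   \ -9: invalid memory address
: c@?   ( addr -- char )  ?addr c@ ;
: c!?   ( char addr -- )  ?addr c! ;

The check is one comparison and a conditional branch per access, which is the kind of cost being weighed against the memory access itself.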
I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages. Certainly
Forth can add the type of contractual safety requirements e.g.,
implementing bounds checking, of a "memory-safe language". Do we need to
work on libraries for these provisions?
Opinions?
--
Krishna Myneni
On 3/2/24 10:43, Anton Ertl wrote:
With the gforth engine with the FP stack being empty:
fp@ 32769 - c@
*the terminal*:3:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<
fp@ 1+ c@
*the terminal*:4:8: error: Floating-point stack underflow
fp@ 1+ >>>c@<<<
In the version of Gforth which I have (0.7.9_20220120),
fp@ 32769 - c@
*the terminal*:5:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<
However,
fp@ 65536 - c@ ok 1
and, worse,
1 fp@ 65536 - c! ok
So the guard pages are not a solution to pointer arithmetic bugs with
the stack pointers.
To make stack access memory safe, there has to be bounds checks on
reading and writing from/to stacks. This suggests that stacks should be arrays and stack operations always involve array read/write from arrays
with enforced bounds checking e.g. something like
: DUP STACK[ tos ]@ ; \ TOS returns an index to the top of the stack
: OVER STACK[ tos 1+ ]@ ;
etc., and ]@ and ]! perform bounds checks.
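A hedged sketch of what such bounds-checked access words might look like; the array STACK, its size, the index cell TOS, and the word names are illustrative only, not a claim about how any existing system implements its stacks:

64 constant #stack
create stack #stack cells allot
variable tos   0 tos !                                   \ cells currently in use
: ?idx ( u -- u )   dup tos @ u< 0= if -4 throw then ;   \ -4: stack underflow
: ]@   ( u -- x )   ?idx cells stack + @ ;
: ]!   ( x u -- )   ?idx cells stack + ! ;
: push ( x -- )     tos @ dup #stack u< 0= if -3 throw then  cells stack + !  1 tos +! ;
: over' ( -- x )    tos @ 2 - ]@ ;                       \ OVER in the style sketched above

After 1 push 2 push, over' . prints 1; an out-of-range index (including a negative one, which ?IDX sees as a huge unsigned value) throws instead of reading or writing outside the array.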
You can compile in DEBUG/RELEASE mode, whereby runtime checks
are no longer included in RELEASE mode. But these are quasi
pre-mortem traps, just like guard pages - they do not make Forth
safer as a language, for that it would need a-priori error traps.
An example:
: TE1 -1 dup c! ;
TE1 contains two errors: -1 is not a char and -1 is not a permitted
memory address. It must be possible to catch these during compilation.
You can run around in circles here, the basic problem is that there is
no formal specification for what a safe programming language is.
Analyses on the subject are dominated by the following: Memory errors,
type errors, range errors, race condition errors.
In order to develop Forth more in this direction, we would first need
a specification on "Hardened Forth" that is dedicated to these error
areas - and also marks UBs with defined exception codes. Ideally
accompanied by a test suite so that every Forth system developer can
check their own system.
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
On 3/2/24 10:43, Anton Ertl wrote:
With the gforth engine with the FP stack being empty:
fp@ 32769 - c@
*the terminal*:3:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<
fp@ 1+ c@
*the terminal*:4:8: error: Floating-point stack underflow
fp@ 1+ >>>c@<<<
In the version of Gforth which I have (0.7.9_20220120),
fp@ 32769 - c@
*the terminal*:5:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<
However,
fp@ 65536 - c@ ok 1
and, worse,
1 fp@ 65536 - c! ok
So the guard pages are not a solution to pointer arithmetic bugs with
the stack pointers.
Yes, that is not their intention and not the intention of these
examples. The intention of these examples is to show that any memory
access will be interpreted as a stack underflow or overflow if it is
to a certain range of addresses.
A more serious issue is that, as implemented in Gforth (in particular, gforth-fast), stack underflows can be undetected in some cases: On
Gforth on an AMD64 system, with the data stack being empty:
600 pick ok 1
On gforth-fast, with the data stack being empty:
: foo 600 0 ?do nip loop cr . ; foo
0
*the terminal*:1:33: error: Stack underflow
: foo 600 0 ?do nip loop cr . ; >>>foo<<<
Backtrace:
kernel/basics.fs:312:27: 0 $7F30E3BDFE10 throw
Note that FOO actually performs the "cr .", so the stack underflow is
not detected by an access to the guard page. Instead, the text interpreter checks the stack pointer and reports a stack underflow.
The non-detection of the stack underflow is because NIP is implemented
as:
$7F30E3C72C90 nip 1->1
7F30E3917557: add r13,$08 #update sp
With the gforth engine, a similar scenario (involving DROP) is avoided because in this engine DROP loads the value being dropped exactly to
trigger stack underflow reports where they happen:
$7F55EBFA6C98 drop 0->0
7F55EBAC51C0: mov $50[r13],r15 #save ip (for accurate backtraces)
7F55EBAC51C4: add r15,$08 #update ip
7F55EBAC51C8: mov rax,[r14] #load dropped value
7F55EBAC51CB: add r14,$08 #update sp
Neither the deep PICK nor the loop that just NIPs or DROPs occurs in
practice.
The motivation for the otherwise unnecessary load in DROP (in gforth)
is code sequences like
drop 1
in cases where the stack is empty. The load in DROP results in
detecting the stack underflow at the DROP rather than at the "1".
Reporting a stack underflow at an operation that just pushes can
produce a WTF moment in the programmer; the gforth engine exists to
make debugging easier, and that includes avoiding such moments.
To make stack access memory safe, there has to be bounds checks on
reading and writing from/to stacks. This suggests that stacks should be
arrays and stack operations always involve array read/write from arrays
with enforced bounds checking e.g. something like
: DUP STACK[ tos ]@ ; \ TOS returns an index to the top of the stack
: OVER STACK[ tos 1+ ]@ ;
etc. and ]@ and ]! performs bounds checks.
With guard pages, that's not necessary. The normal bounded-depth
stack accesses (of words like 2DROP or 2OVER) are sure to hit the
guard pages if the stack is out-of-bounds; you may want to perform an otherwise unnecessary load on words like NIP, DROP, 2DROP etc. that do
not otherwise use (and thus load) the stack values that they consume,
but that's much cheaper than putting bounds checks on every stack
access. For unbounded stack-access words like PICK, a bounds check is appropriate.
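For the PICK case, a hedged sketch of such a check in standard Forth (not Gforth's actual code; -4 is the standard stack-underflow throw code):

: pick? ( xu ... x0 u -- xu ... x0 xu )
   dup depth 2 - u< 0= if -4 throw then   \ u must be smaller than the number of items below it
   pick ;

E.g. 10 20 30 1 pick? leaves 20 on top, while 0 pick? with nothing else on the stack throws -4 instead of reading whatever happens to lie beyond the stack.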
The intent of the stack array access was to avoid stack pointer
arithmetic altogether. Stack array access words provide a safe alternative
to doing stack pointer arithmetic in Forth code. Pointer arithmetic
appears to be the source of a lot of memory safety problems.
You can run around in circles here, the basic problem is that there is
no formal specification for what a safe programming language is.
Analyses on the subject are dominated by the following: Memory errors,
type errors, range errors, race condition errors.
In order to develop Forth more in this direction, we would first need
a specification on "Hardened Forth" that is dedicated to these error
areas - and also marks UBs with defined exception codes.
You can run around in circles here, the basic problem is that there is
no formal specification for what a safe programming language is.
They put out a request for proposals for a new language to be designed. The
eventual winner was Ada, but that choice came with some controversy at
the time. There were competing proposals that some people felt were
less bloated and still fulfilled the intended goals.
Misra-C is an example. There is no language specification, but quite a
number of rules against which a C program can be checked.
That's patchwork, but if it is sufficient for a program,
good for the program. As for language safety....
OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.
On 3/03/2024 4:54 pm, Ron AARON wrote:
One of the criteria for 8th was security -- among other things, making it very difficult to do unsafe memory operations.
Has it paid off - by which I mean completed apps that out of the blue access invalid memory? I'm curious as to what exactly is behind the high rate of 'memory errors' that govt et al is reporting because in my limited experience programming in Forth, I'm just not seeing any. I wonder if it has something to do with the practices employed in those other languages - such as the use of third-party libraries which programmers use essentially on faith.
On 3/3/24 10:08, minforth wrote:
OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.
Perhaps not, but I wrote my Forth system to provide some hand-holding, primarily for my own needs. My expectation is that the demand for Forth systems which don't address safety concerns will rapidly drop to zero.
While I, personally, rarely write code that has those sorts of issues
(at least, not in 30 years), I have worked in places where they were
fairly common. It depends a lot on the expertise and attention to detail
of the programmers, I think.
Ron AARON wrote:
While I, personally, rarely write code that has those sorts of issues
(at least, not in 30 years), I have worked in places where they were
fairly common. It depends a lot on the expertise and attention to
detail of the programmers, I think.
I think it's also a question of the scale of the software. Forth programs
are usually microscopically small and manageable. Typical modern software
can reach gigabytes and must be created by a team of developers who
sometimes don't even work in the same place. The attack surface for errors
is therefore orders of magnitude larger. Then there is a need for many more
a-priori security functions already in the programming language and
development tools, followed by software engineering test procedures.
Has it paid off - by which I mean completed apps that out of the blue access
invalid memory?
I'm curious as to what exactly is behind the high rate of
'memory errors' that govt et al is reporting because in my limited experience
programming in Forth, I'm just not seeing any.
What do I use while developing a recursive function: ?STACK.
: TEST drop depth ; ok
>>>drop<<< depth
Krishna Myneni wrote:
On 3/3/24 10:08, minforth wrote:
OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.
Perhaps not, but I wrote my Forth system to provide some hand-holding,
primarily for my own needs. My expectation is that the demand for
Forth systems which don't address safety concerns will rapidly drop to
zero.
IIRC there have been a few Forth applications at NASA and for astronomy
(e.g. see Forth Inc. web site). I've already wondered how much convincing
had to be done for NASA not to disqualify Forth.
On 3/4/24 01:52, minforth wrote:
Krishna Myneni wrote:
On 3/3/24 10:08, minforth wrote:
OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.
Perhaps not, but I wrote my Forth system to provide some hand-holding,
primarily for my own needs. My expectation is that the demand for
Forth systems which don't address safety concerns will rapidly drop to
zero.
IIRC there have been a few Forth applications at NASA and for astronomy
(e.g. see Forth Inc. web site). I've already wondered how much convincing
had to be done for NASA not to disqualify Forth.
The trend has been to go to "memory-safe" languages. There are many
instances in which simple run-time type checking for addresses has
resulted in saving me considerable debugging time -- usually just stack
order is incorrect, but the error can manifest in more complex ways as well.
I don't have any particular insight into the trends other than following
the news. I think there will be even greater pressure going forward to
use memory-safe languages for internet facing applications. The shift in academia towards those languages appears to have already happened.
Has it paid off - by which I mean completed apps that out of the blue
access invalid memory?
I'm curious as to what exactly is behind the high rate of 'memory
errors' that govt et al is reporting because in my limited experience programming in Forth, I'm just not seeing any.
IIRC there have been a few Forth applications at NASA and for astronomy
(e.g. see Forth Inc. web site).
mhx@iae.nl (mhx) writes:
What if the program writes a float to a byte location?
That's not a safety problem (as long as the location is big enough for
the float), so one can design a Safe Forth variant that allows that.
Yes but asking the system to find errors isn't looking - it's covering
one's butt.
Tristan Wibberley <tristan.wibberley+netnews2@alumni.manchester.ac.uk> writes:
If so, plenty of computers have alignment
requirements,
In general-purpose computers, that used to be the case in the 1990s,
but nowadays it is no longer the case. We have to use really old
hardware to test against alignment errors. ...
mhx@iae.nl (mhx) writes:
What if the program writes a float to a byte location?
That's not a safety problem (as long as the location is big enough for
the float), so one can design a Safe Forth variant that allows that.
I'm not very familiar with Forth yet; does this refer to writing to a
machine-addressed location?
If so, plenty of computers have alignment
requirements, so a DoS can be introduced by the above action.
Also, if you write a byte to a float location, a variety of problems can
be introduced, including running trap callbacks that were insufficiently
tested for the new program state, etc., killing the process and running
restart sequences where less volatile state can now be in an unusual
condition and new side-effects induced, and so on.
Memory safety means maintaining invariant relations wrt. each memory location.
On 05/03/2024 06:35, Anton Ertl wrote:
Tristan Wibberley <tristan.wibberley+netnews2@alumni.manchester.ac.uk> writes:
....
If so, plenty of computers have alignment
requirements,
In general-purpose computers, that used to be the case in the 1990s,
but nowadays it is no longer the case. We have to use really old
hardware to test against alignment errors. ...
Or special purpose computers that are not mass marketed, but I wasn't
aware they'd fixed all the public market computers. Thanks for the info.
AFAIK hacks are opportunistic i.e. could not reasonably be foreseen.
Such "errors" are forgivable. Not so, programmers who either don't
know where something might overflow, or knowing, fail to address it.
If you think you will revive Forth by jumping on that Rust bandwagon,
I think you're wrong.
First and foremost, because I think Rust is the wrong idea. It's been
tried before - Ada, Pascal, Java - in some sense: BASIC.
Good programmers exist because they are good programmers. Bad programs
exist because of bad programmers.
"Ada will not meet its major objective... for it is so complicated
that it defies the unambiguous definition that is essential for these purposes.
"...for it is so complicated...". That is the very definition of Rust.
All the time you're spending getting your code to compile, you're not creating programs.
I'd say that's the reverse of productivity. The higher the
abstraction, the more difficult it is to understand - let alone to
teach.
Lifetimes? Borrowing? Are you kidding me?
So, safety, yes. I like that very much. I ventured into that very
early and I never regretted it. But apart from some basic checks it
should stop at the point where I have to convince a compiler that I
know what I'm doing.
I tend to trust my Forth programs a lot more than my C ones
At no time during its writing did I consider hackers or inept users. Responsible programming was all.
On 6/03/2024 11:54 am, Paul Rubin wrote:
...
Those of us who have to
program in the 21st century, though, need all the help we can get.
"There is no hardware protection. Memory protection can be provided by
the access computer. But I prefer software that is correct by design." - C.M.
On 6/03/2024 7:23 pm, minforth wrote:
dxf wrote:
On 6/03/2024 11:54 am, Paul Rubin wrote:
... Those of us who have to
program in the 21st century, though, need all the help we can get.
"There is no hardware protection. Memory protection can be provided by
the access computer. But I prefer software that is correct by design." - C.M.
Confucius said:
Use program that treats integer wraparound as good feature and find yourself in big heap of dung
A 'memory-safe' system won't detect that. What now?
For example it's my experience one can input an out-of-range integer
into C and Forth compilers and neither will notice.... Programmers
too and I'm no exception.
dxf <dxforth@gmail.com> writes:
For example it's my experience one can input an out-of-range integer
into C and Forth compilers and neither will notice.... Programmers
too and I'm no exception.
These days I'd call C and Forth both niche languages, the niche being
low level systems code and small embedded programs. #1 on TIOBE is
Python, which uses arbitrary precision as the native integer type. That slows arithmetic down but it mostly eliminates the overflow problem.
IMHO that is what all high level languages should do by default. Of
course native machine types and low level languages (C, Forth, Rust,
Ada, etc.) should stay available for cases where you want to or have to program closer to the hardware.
#1 on TIOBE is
Python, which uses arbitrary precision as the native integer type. That slows arithmetic down but it mostly eliminates the overflow problem.
Years ago we had a crash with using old archived data files
in a more recent system. The old file format relied on having
max 64k (16bit) index size, while the evaluating system assumed
24bit, and so the index overflowed the allocated memory space.
On Sat, 09 Mar 2024 11:30:56 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
Don't you also need to first check that both arguments are small
integers ?
If implemented well, the slowdown is small in the common case (small integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very predictable.
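At the Forth level, without access to the hardware overflow flag, the same trap-instead-of-wrap behaviour can be sketched portably; the name is invented and -11 is the standard "result out of range" throw code:

: +? ( n1 n2 -- n1+n2 )
   2dup + >r                        \ compute the wrapped sum
   r@ xor swap r@ xor and 0<        \ overflow iff both operands differ in sign from the sum
   if r> drop -11 throw then
   r> ;

The sign test works for any cell width, so the check does not need to know whether it runs on a 32-bit or 64-bit system.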
Python (particularly CPython), however, does not seem to have gone for efficient implementation;
I don't know what they do for arbitrarily large integers, but the
inner interpreter was pretty monstrous last I looked.
But integer overflow is orthogonal to memory safety.
There are many people who claim that wrapping behaviour for integer
overflow is a problem.
Java defines the basic types int and long to perform wraparound on
overflow,
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
It might be worse for RISC V.
I don't know what they do for arbitrarily large integers, but the
inner interpreter was pretty monstrous last I looked.
CPython has a fairly straightforward bytecode interpreter.
But integer overflow is orthogonal to memory safety.
There are many people who claim that wrapping behaviour for integer
overflow is a problem.
It has a problem because it's wrong! Of course it's deterministic
instead of being UB, and that makes some people feel better, but making
2+2=5 is also deterministic yet wrong.
Imagine x is a 50 element array and for whatever reason you try to
update x[60].
So the implementation might clobber 10 elements past the
end of the array (bad), or it can signal an error (the only thing that
makes sense), or in a feat of Java-like brilliance it might alias x[60]
to x[10] since 60 is 10 mod 50.
Java defines the basic types int and long to perform wraparound on
overflow,
Yes, a mistake IMHO.
2+2=5 is also deterministic yet wrong.
In Java 2+2 gives 4. What do you hope to gain by putting up straw men?
You just have no arguments but "It's wrong!" and straw men to back up
your opinion.
Any number representation has its problems - since there is no way to properly represent infinite precision.
Exclamations like "BUT IT'S WRONG" may be correct, but without a true alternative it's not gonna change much.
It depends a lot on how error checking is handled. You could return it
like "errno" or perror(). You could throw an exception. You could
return some special value - like a NULL pointer.
std::ofstream file("example.txt");
if (!file.is_open()) {
    // handle the error, e.g. report it and bail out
}
I mean, NULL is already a macro, it shouldn't be difficult to
gravitate to a better value.
Paul Rubin <no.email@nospam.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
It might be worse for RISC V.
It is. That's a failure of RISC-V.
In article <2024Mar10.092913@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
Paul Rubin <no.email@nospam.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
It might be worse for RISC V.
It is. That's a failure of RISC-V.
As far as I can tell it was a design choice for DEC Alpha and RISC-V.
Apparently flags are detrimental to parallelism.
You can't call that a failure because you don't like it.
You will need >6 parallel multi-precision additions before the two
carry flags of AMD64 with ADX are theoretically more limiting than the MIPS/Alpha/RISC-V approach. And to be practically more limiting, the
RISC-V implementation needs to be extremely wide (>36 instructions per
cycle) and the precision must be extremely high (to eliminate overlap
between chains as an issue).
No / not yet?
"The requested URL /anton/tmp/opt-ipc-uarch.eps : was not found on this server."
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
2+2=5 is also deterministic yet wrong.
In Java 2+2 gives 4. What do you hope to gain by putting up straw men?
2+2=5 is obviously wrong and Java doesn't go quite that far. Java
instead insists that you can add two positive integers and get a
negative one. That's wrong the same way that 2+2=5 is.
It just doesn't
mess up actual programs as often, because the numbers involved are
bigger.
In what world can it be right for n to be a positive integer and n+1 to
be a negative integer? That's not how integers work.
Tony Hoare in 2009 said about null pointers:
Java-style wraparound
arithmetic is more of the same. A bug magnet,
Java also has null pointers, another possible mistake. Ada doesn't have them,
C++ has them because of its C heritage and
the need to support legacy code, but I believe that in "modern" C++
style you're supposed to use references instead of pointers, so you
can't have a null or uninitialized one.
wget http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps
Not at all. Modular arithmetic is not arithmetic in Z, but it's a commutative ring and has the nice properties of this algebraic
structure.
but even that works surprisingly well, so well that the RISC-V
designers have not seen a need to include an efficient way to detect
those cases where the result deviates from that in Z.
Still, the nice algebraic properties of modular arithmetic can be of
benefit even in such cases....
In what world can it be right for n to be a positive integer and n+1 to
be a negative integer? That's not how integers work.
It's how Java's int and long types work.
And if you want something closer to Z, Java also has BigInteger.
Tony Hoare in 2009 said about null pointers:
And the relevance is?
Java-style wraparound arithmetic is more of the same. A bug magnet,
Unsupported claim.
I think I saw the unintended result on a 32-bit machine
I don't know much about C++, but I would be surprised if they had
given up on uninitialized data. And an uninitialized reference is
certainly not better than a null reference.
The fact that Java idiomatics is to implement trees and linked lists
not in the object-oriented way I outlined above
Paul Rubin <no.email@nospam.invalid> writes:
<SNIP>
Java also has null pointers, another possible mistake. Ada doesn't have them,
Ada certainly has null.
C++ has them because of its C heritage and
the need to support legacy code, but I believe that in "modern" C++
style you're supposed to use references instead of pointers, so you
can't have a null or uninitialized one.
I don't know much about C++, but I would be surprised if they had
given up on uninitialized data. And an uninitialized reference is
certainly not better than a null reference.
- anton
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
#include <stdio.h>
#include <stdlib.h>
void MaliciousCode() {
printf("This code is malicious!\n");
printf("It will not execute normally.\n");
exit(0);
}
void GetInput() {
char buffer[8];
gets(buffer);
// puts(buffer);
}
int main() {
GetInput();
return 0;
}
=== end code ===
It will be a useful exercise to work up a similar example in Forth, as a step to thinking about automatic hardening techniques (as opposed to
input sanitization).
Forth does not have an inherently unbounded input word like C's
gets(). And even typical C environments warn you when you compile
this code; e.g., when I compile it on Debian 11, I get:
gcc xxx.c
|xxx.c: In function ‘GetInput’:
|xxx.c:12:10: warning: implicit declaration of function ‘gets’; did
you mean ‘fgets’? [-Wimplicit-function-declaration]
| 12 | gets(buffer);
| | ^~~~
| | fgets
|/usr/bin/ld: /tmp/ccC9Qbu7.o: in function `GetInput':
|xxx.c:(.text+0x3b): warning: the `gets' function is dangerous and
should not be used.
So, they removed gets() from stdio.h, and added a warning to the
linker. "man gets" tells me:
|_Never use this function_
|[...]
|ISO C11 removes the specification of gets() from the C language, and
|since version 2.16, glibc header files don't expose the function
|declaration if the _ISOC11_SOURCE feature test macro is defined.
- anton
In a perfect world I'd have a word:
- That puts *three* parameters on the stack: limit, start and step;
- That evaluates these three parameters and leaves a flag
- That takes this flag and skips the loop if zero.
Let's call the word that initializes these actions "+DO". +DO equals (
limit index step -- R: limit index step)
Compare: https://rosettacode.org/wiki/Loops/Wrong_ranges#uBasic/4tH
To the rather weak: https://rosettacode.org/wiki/Loops/Wrong_ranges#Forth
Note that 4tH behaves differently here. It catches most of the exceptional situations:
start: -2 stop: 2 inc: 1 | -2 -1 0 1
start: -2 stop: 2 inc: 0 | -2
start: -2 stop: 2 inc: -1 | -2
start: -2 stop: 2 inc: 10 | -2
start: 2 stop: -2 inc: 1 | 2
start: 2 stop: 2 inc: 1 | 2
start: 2 stop: 2 inc: -1 | 2
start: 2 stop: 2 inc: 0 | 2
start: 0 stop: 0 inc: 0 | 0
Versus:
Some of these loop infinitely, and some under/overflow, so for the sake
of brevity long outputs will be truncated by ....
start: -2 stop: 2 inc: 1 | -2 -1 0 1
start: -2 stop: 2 inc: 0 | -2 -2 -2 -2 -2 ...
start: -2 stop: 2 inc: -1 | -2 -3 -4 -5 ... 5 4 3 2
start: -2 stop: 2 inc: 10 | -2
start: 2 stop: -2 inc: 1 | 2 3 4 5 ... -6 -5 -4 -3
start: 2 stop: 2 inc: 1 | 2 3 4 5 ... -2 -1 0 1
start: 2 stop: 2 inc: -1 | 2
start: 2 stop: 2 inc: 0 | 2 2 2 2 2 ...
start: 0 stop: 0 inc: 0 | 0 0 0 0 0 ...
I still don't think 4tH's performance is perfect, but it's a tradeoff
between compatibility and intuitive behavior.
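For comparison, a hedged sketch of that guarded behaviour in plain Forth with BEGIN/WHILE/REPEAT; it pre-tests, so a zero step or a step pointing away from the limit gives an empty loop, and the name is invented:

: guarded-range ( limit start step -- )
   dup 0= if drop 2drop exit then     \ a zero step can never terminate
   begin
     over 3 pick                      \ -- limit index step index limit
     2 pick 0> if < else > then       \ index still on the start side of the limit?
   while
     over .                           \ visit the current index
     tuck + swap                      \ index := index + step
   repeat
   drop 2drop ;

2 -2 1 guarded-range prints -2 -1 0 1; the degenerate parameter sets from the table above all terminate, though a pre-testing loop prints nothing in several cases where 4tH prints the start value once.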
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
which is paired with LOOP. Both produce the same addresses (if ubytes
is a multiple of +nstride), but MEM-DO in reverse order.
.. NEXT and <FOR .. NEXT \ index N for 1-dim vectors
.. NEXT and <<FOR .. NEXT \ indices X Y for 2-dim arrays.
On 13/03/2024 9:00 pm, mhx wrote:
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?
Make one using BEGIN WHILE REPEAT. That's what Forth is for.
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP
and does one point at the start of the area
or at the address of the first item to process?
So [Algol68] nil + reference takes the same place as NULL + pointer in C.
You are supposed to test for this case, but if you fail you get a "Segmentation fault". As far as Forth goes, that is pretty
satisfactory security.
albert@spenarnc.xs4all.nl writes:
So [Algol68] nil + reference takes the same place as NULL + pointer in c.
I'm unfamiliar with Algol68 but if every reference in it can be set to
nil, that sounds like the same error that Algol-W had. The alternative, using an option value, means: 1) if the reference is not wrapped by an
option type, then it is guaranteed to not be null; 2) if it is wrapped
by an option type, then the compiler can stop you (or at least warn you)
if you try to dereference without first checking that it is non-null.
You are supposed to test for this case, but if you fail you get a
"Segmentation fault". As far as Forth goes, that is pretty
satisfactory security.
For sure, it is usually better to crash than to keep running and give nonsense answers. Of course that usually requires a hardware fault on dereferencing a null pointer, rather than giving whatever is at location
0 in memory like on unprotected machines.
Beyond not giving wrong answers, it's usually nice if your program
doesn't crash too often, especially from program bugs. Getting help
from the compiler for that is often useful.
Algol68 doesn't crash. It gives a run time error of the type
You can't get much help from the compiler for uninitialised references
like this. Either it crashes in the first run or it is insidious.
albert@spenarnc.xs4all.nl writes:
Algol68 doesn't crash. It gives a run time error of the type
Well that's what I mean by crashing. The program is terminated "involuntarily", or alternatively there is some way to catch the exception. Either way, the computation doesn't proceed.
You can't get much help from the compiler for uninitialised references
like this. Either it crashes in the first run or it is insidious.
No idea about Algol68 but in (at least some) other languages, the idea
of having references instead of pointers is that it is impossible to
create an uninitialised reference.
In Forth parlance: unless you're doing system programming where you
need it, don't use direct memory operations like @ ! MOVE, etc. This
also prohibits the use of VARIABLE. VARIABLES are uninitialized and
are accessed by @ !.
So I regularly use either xVALUEs (x means different data types) or data objects (for compound or dynamic types) with access methods. This results
in cleaner code and improves memory safety.
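A minimal sketch of the contrast in standard Forth (names invented); the VALUE is created initialized and is never touched with @ or !:

variable counter                 \ an uninitialized cell, read and written with @ and !
: bump-var ( -- )  1 counter +! ;

0 value counter'                 \ initialized at creation, read by name, written with TO
: bump-val ( -- )  counter' 1+ to counter' ;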
minforth@gmx.net (minforth) writes:
In Forth parlance: unless you're doing system programming where you
need it, don't use direct memory operations like @ ! MOVE, etc. This
also prohibits the use of VARIABLE. VARIABLES are uninitialized and
are accessed by @ !.
That helps but I'm sure there are other hazards. What do you do about arrays?
XZ14 (or TO XZ14) writes the top matrix to array value XZ14, et cetera.
What about ALLOT or ALLOCATE?
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
That's something I'd do for VALUEs should I move to omit the numeric
prefix at creation. By automatically initializing VALUEs with 0, I can pretend - if only to myself - that VALUEs are different from VARIABLEs.
Non-standard $VALUEs (for dynamic strings) or
DVALUEs/ZVALUEs can be very practical too.
minforth@gmx.net (minforth) writes:
Non-standard $VALUEs (for dynamic strings) or
DVALUEs/ZVALUEs can be very practical too.
2VALUE is standard.
Tristan Wibberley wrote:
Or special purpose computers that are not mass marketed, but I wasn't
aware they'd fixed all the public market computers. Thanks for the info.
You are still in for some nasty surprises with "public market" ARM CPUs. f.ex.
https://developer.arm.com/documentation/den0013/d/Porting/Alignment
On 05/03/2024 14:03, minforth wrote:
Tristan Wibberley wrote:
Or special purpose computers that are not mass marketed, but I wasn't
aware they'd fixed all the public market computers. Thanks for the info.
You are still in for some nasty surprises with "public market" ARM CPUs.
f.ex.
https://developer.arm.com/documentation/den0013/d/Porting/Alignment
And then we're not even trying to talk about what's in use and for sale
today but rather what will be in use over the next 6 decades. Most of
the historical peculiarities that are eliminated with more complex
hardware instead of longer software can be expected to be present at
some point during that period because more complex hardware is already a difficult problem for information security and I'd expect those
peculiarities wouldn't have been present if there weren't some
efficiency earned.