Forum: >>> Magnum BBS <<<

Proposal: Special memory access words

From Anton Ertl@21:1/5 to All on Fri Jun 14 15:35:23 2024

At the 2023 Forth200x meeting we discussed various proposals for words
like w@, I presented what Gforth has, and the committee tasked me to
write this up as a proposal. I have now done so, and you can find it
at
<https://forth-standard.org/proposals/special-memory-access-words>

Ideally you will comment on the proposal there.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023: https://euro.theforth.net/2023

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul Rubin@21:1/5 to Anton Ertl on Fri Jun 14 17:14:27 2024

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

> <https://forth-standard.org/proposals/special-memory-access-words>
> Ideally you will comment on the proposal there.

I'm not set up for that, but I don't currently have comments besides
"looks ok, and worthwhile". The C library uses htonl, ntohl etc. which
are alternatives you might have considered.

From Krishna Myneni@21:1/5 to Anton Ertl on Fri Jun 14 21:14:16 2024

On 6/14/24 10:35, Anton Ertl wrote:

> At the 2023 Forth200x meeting we discussed various proposals for words
> like w@, I presented what Gforth has, and the committee tasked me to
> write this up as a proposal. I have now done so, and you can find it
> at
> <https://forth-standard.org/proposals/special-memory-access-words>
>
> Ideally you will comment on the proposal there.
>
> - anton

kForth provides special memory access words, largely based on what has
been used in other Forth systems, including Gforth. They are in use in
a substantial base of code.

In particular, I don't like the necessity of two separate steps to fetch
a sign-extended word, preferring instead two separate words, one for
unsigned fetch and one for sign extended fetch.

Below are the special memory access words implemented in kForth (32-bit
and 64-bit versions):

SW@  ( a -- n )      fetch 16-bit word, sign-extended to cell size
UW@  ( a -- u )      fetch 16-bit word as unsigned number
SL@  ( a -- n )      fetch 32-bit word, sign-extended to cell size
UL@  ( a -- u )      fetch 32-bit word as unsigned number
W!   ( n|u a -- )    store 16-bit word
L!   ( n|u a -- )    store 32-bit word

These words are indispensable for writing portable code between 32-bit
and 64-bit systems.

I do not have X@ or X! which are simply @ and ! on a 64-bit system.

W@ used to exist prior to SW@ in kForth, with the same function, but has
been deprecated due to inconsistent meaning in different Forth systems.

--
Krishna

From Anton Ertl@21:1/5 to Paul Rubin on Sat Jun 15 07:09:47 2024

Paul Rubin <no.email@nospam.invalid> writes:

> The C library uses htonl, ntohl etc. which
> are alternatives you might have considered.

Both htonl() and ntohl() correspond to LBE.
Both htons() and ntohs() correspond to WBE.

The C functions are specified in POSIX.1-2001.

- anton

From Anton Ertl@21:1/5 to Krishna Myneni on Sat Jun 15 07:19:52 2024

Krishna Myneni <krishna.myneni@ccreweb.org> writes:

> In particular, I don't like the necessity of two separate steps to fetch
> a sign-extended word, preferring instead two separate words, one for
> unsigned fetch and one for sign extended fetch.

That used to be my position, too, but if we add the need to deal with
different byte orders, this results in

sw@ uw@ be-sw@ be-uw@ le-sw@ le-uw@

and when you have the precomposed words for fetching, you also want
them for storing:

w! be-w! le-w!

And another 9 words for l, and another 9 words for x. And if you also
add stuff like w, etc., precomposing leads to even more words.

That is the memory access proposal from Federico de Ceballos, but the
committee (in particular, Leon Wagner) has experimented with it and
found that the number of words is too high.

One idea I have is to provide a library that defines the precomposed
words in terms of the decomposed ones.
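
A minimal sketch of what such a library could look like for the 16-bit
case. It assumes the host system already provides the decomposed words
w@, w!, wbe, wle and w>s, with the stack effects given in the comments
(those stack effects are an assumption here, not quoted from the
proposal). The precomposed names are the ones listed above; the l and x
variants would follow the same pattern.

```forth
\ Assumed to exist on the host system (decomposed words):
\   w@  ( c-addr -- u )    native-order unsigned 16-bit fetch
\   w!  ( x c-addr -- )    native-order 16-bit store
\   wbe ( u1 -- u2 )       convert between native and big-endian order
\   wle ( u1 -- u2 )       convert between native and little-endian order
\   w>s ( x -- n )         sign-extend the low 16 bits
\ uw@ is then simply w@, and sw@ is w@ w>s.

: be-uw@ ( c-addr -- u )  w@ wbe ;
: be-sw@ ( c-addr -- n )  w@ wbe w>s ;
: le-uw@ ( c-addr -- u )  w@ wle ;
: le-sw@ ( c-addr -- n )  w@ wle w>s ;

: be-w!  ( u c-addr -- )  >r wbe r> w! ;
: le-w!  ( u c-addr -- )  >r wle r> w! ;
```

Each precomposed word is a one-line colon definition, so a program that
prefers the explicit names can load them without the standard having to
carry all of them.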

> These words are indispensable for writing portable code between 32-bit
> and 64-bit systems.

I have good experiences with Forth's cell, char, float model for
portability and bad experiences with the portability of C code, thanks
to its large number of integer types: you can produce portable C code,
but unless you test it on both 32-bit and 64-bit systems, I would not
bet on its portability, while debugged Forth code often is also
portable.

I discovered one exception recently: brainless produced different
results on 32-bit systems and 64-bit systems. I found that the reason
was that it used double-cells on 32-bit systems and single-cells on
64-bit systems, and it sometimes accesses only one cell. If it had
always accessed double-cells with 2@ and 2!, it would have worked fine
(and that's my fix).

> I do not have X@ or X! which are simply @ and ! on a 64-bit system.

@ and ! require aligned addresses, X@ and X! don't. Also X@ and X!
will continue to access 64-bit values even on systems where cells are
larger than 64 bits (if it ever comes to that).

- anton

From minforth@21:1/5 to All on Sat Jun 15 08:28:51 2024

Rarely used, but anyhow: __int128 also require aligned addresses,
afaik at least when used with gcc. But perhaps it has more to
do with the __int128 or pointer implementation within gcc.

From minforth@21:1/5 to All on Sat Jun 15 12:02:06 2024

;-D one never knows what "progress" will show up around the
next corner. But quadmath is already there with __int128 as its
"natural" integer companion.

From mhx@21:1/5 to minforth on Sat Jun 15 11:44:59 2024

minforth wrote:

> Rarely used, but anyhow: __int128 also require aligned addresses,
> afaik at least when used with gcc. But perhaps it has more to
> do with the __int128 or pointer implementation within gcc.

I solve that by making ALLOCATE return 16-byte aligned addresses.
Maybe I should change that to 32-byte aligned for the next release.

-marcel

From Anton Ertl@21:1/5 to minforth on Sat Jun 15 15:36:52 2024

minforth@gmx.net (minforth) writes:

> Rarely used, but anyhow: __int128 also require aligned addresses,
> afaik at least when used with gcc. But perhaps it has more to
> do with the __int128 or pointer implementation within gcc.

Not sure how any of this C discussion is relevant to the proposal or
to clf, but anyway:

C memory accesses usually have alignment requirements, and if you want
unaligned accesses, the solution that gcc's advocates recommend is to
use memcpy() from memory to a variable of the type, or from a variable
of the type to memory, and the gcc and clang optimizers supposedly
create good code out of that; counterevidence:
<2016Jan17.174821@mips.complang.tuwien.ac.at>
<2020May5.102110@mips.complang.tuwien.ac.at>
<2022Dec14.132439@mips.complang.tuwien.ac.at>. I am sure that the
fact that some less sophisticated C compilers suffer extremely from
this recommendation is just a coincidence.

__int128 sounds like a gcc extension. I expect that in many cases on
AMD64 it is loaded into the GPRs with mov instructions that do not
require alignment, but I expect that in some cases __int128 values are
loaded into xmm registers using instructions like movdqa that trap on
unaligned accesses.

- anton

From Krishna Myneni@21:1/5 to Anton Ertl on Sun Jun 16 07:27:24 2024

On 6/15/24 02:19, Anton Ertl wrote:

> Krishna Myneni <krishna.myneni@ccreweb.org> writes:

>> In particular, I don't like the necessity of two separate steps to fetch
>> a sign-extended word, preferring instead two separate words, one for
>> unsigned fetch and one for sign extended fetch.

> That used to be my position, too, but if we add the need to deal with
> different byte orders, this results in

> sw@ uw@ be-sw@ be-uw@ le-sw@ le-uw@

> and when you have the precomposed words for fetching, you also want
> them for storing:

> w! be-w! le-w!

> And another 9 words for l, and another 9 words for x. And if you also
> add stuff like w, etc., precomposing leads to even more words.

> That is the memory access proposal from Federico de Ceballos, but the
> committee (in particular, Leon Wagner) has experimented with it and
> found that the number of words is too high.

IMO, when it comes to memory access, and arithmetic, demanding that one
say explicitly what one wants to do is preferable to reducing the word
count. The two-step process for sign-extended fetches opens up room for
inadvertent programming mistakes, e.g.

sw@ versus w@ w>s

Also, when trying to debug code, one has to question whether or not the
programmer accidentally omitted the sign-extension part.
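
The difference shows up on any stored value with the top bit set. A
small sketch in standard Forth, assuming 8-bit address units; raw16@
and sext16 are hypothetical stand-ins for the unsigned 16-bit fetch and
the separate sign-extension step, named so that they do not clash with
words a system may already have:

```forth
\ Hypothetical stand-ins built only from standard words.
: raw16@ ( c-addr -- u )  \ unsigned 16-bit fetch, little-endian byte order
   dup c@  swap char+ c@ 8 lshift  or ;
: sext16 ( u -- n )       \ sign-extend the low 16 bits to a full cell
   $ffff and  dup $8000 and if $10000 - then ;

create sample  $fe c, $ff c,   \ the 16-bit pattern $fffe

sample raw16@ .          \ prints 65534  (sign extension omitted)
sample raw16@ sext16 .   \ prints -2     (the intended signed value)
```

Both lines are valid code, and nothing in the first one tells a reader
whether the unsigned result was intended.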

If you want to stick with the two-step process, I suggest renaming the
fetch words to explicitly state that the fetches are unsigned:

w@ --> uw@
l@ --> ul@
x@ --> ux@

and the sign-extension conversion words to

s

s

s

This will force the programmer to explicitly state what he is doing. It
also maintains consistency with existing practice in Forths such as
Gforth and kForth, while allowing word count reduction.

The store words do not need a signed/unsigned prefix, so that's not
relevant.

> One idea I have is to provide a library that defines the precomposed
> words in terms of the decomposed ones.

>> These words are indispensable for writing portable code between 32-bit
>> and 64-bit systems.

> I have good experiences with Forth's cell, char, float model for
> portability and bad experiences with the portability of C code, thanks
> to its large number of integer types: you can produce portable C code,
> but unless you test it on both 32-bit and 64-bit systems, I would not
> bet on its portability, while debugged Forth code often is also
> portable.

One of the reasons I'm a proponent of the explicit prefix fetch words is
that I have used them for working with structures provided by C
libraries, and they keep me from making mistakes. In 64-bit libraries,
structures often pack 32-bit fields contiguously to keep 64-bit
alignment with the 64-bit fields.

--
Krishna

From Stephen Pelc@21:1/5 to All on Mon Jun 17 13:37:46 2024

On 16 Jun 2024 at 14:27:24 CEST, "Krishna Myneni" <krishna.myneni@ccreweb.org> wrote:

> One of the reasons I'm a proponent of the explicit prefix fetch words is
> that I have used them for working with structures provided by C
> libraries, and they keep me from making mistakes. In 64-bit libraries,
> structures often pack 32-bit fields contiguously to keep 64-bit
> alignment with the 64-bit fields.

The committee has been round this loop several times. Although I
support your perspective - readable code is good, the countervailing
position has been to minimise the word set, and this position seems
to be more popular at the moment.

I doubt that the supporters of either camp can change their minds, so
it is likely that no generalised memory access word set will achieve
nearly unanimous acceptance. I have been involved in at least two
formal proposals, and I'm not going to do it again.

Stephen
--
Stephen Pelc, stephen@vfxforth.com
MicroProcessor Engineering, Ltd. - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)78 0390 3612, +34 649 662 974
http://www.mpeforth.com
MPE website
http://www.vfxforth.com/downloads/VfxCommunity/
downloads

From minforth@21:1/5 to All on Mon Jun 17 14:31:03 2024

ISTM that the proposed wordset supports bi-endianness because of
some special CPUs, like eg RISC-V. While perhaps useful there,
I am wondering whether such special requirements merit to be part
of a global standard.

So IMO the proposed wordset could be reduced even more, because
a.m. requirements are rather user/application specific.

From Anton Ertl@21:1/5 to minforth on Mon Jun 17 15:10:11 2024

minforth@gmx.net (minforth) writes:

> ISTM that the proposed wordset supports bi-endianness because of
> some special CPUs, like eg RISC-V.

It's unclear what you mean here.

Bi-endianness is a property of several architectures (MIPS, Power, ARM
among them) that you can use both byte orders on the same hardware.
That is typically decided at OS booting; E.g., SGI's IRIX used
big-endian MIPS, whereas DEC's Ultrix used little-endian MIPS. There
has also been the interesting case that Power, while always bi-endian
in theory, has been used with big-endian OSs for a quarter-century, but
they switched Linux (not AIX) to little-endian with the Power ISA 3.0
and Power 9.

RISC-V, however, is not bi-endian, but little-endian.

Maybe you are thinking that words like WBE, WLE etc. are proposed due
to CPUs with different byte orders, but that's not the case. If all
we wanted was native byte order accesses (whether the native order is
big-endian or little-endian), these words would be unnecessary: w@ w!
etc. already perform natively ordered accesses.

These words are there so that you can deal with data coming in from
the outside with a defined byte order; e.g., in many Internet
protocols the network byte order is big-endian. If you want to
process these data, words like WBE LBE XBE are useful. There are also
protocols like X where the server and the client agree on a particular
byte order (and I have seen cases where that was implemented wrongly).
If you process file system data coming from disks, the data may be in
little-endian byte order (especially if the file system was created on
a little-endian platform), and WLE LLE XLE are useful in accessing
them.
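
To make "defined byte order" concrete: the same four bytes yield
different values depending on the order the data format prescribes,
regardless of the host's native order. A sketch in standard Forth,
assuming 8-bit address units; the -bytes names are hypothetical helpers
that assemble the value byte by byte, so they work on either kind of
host and at any alignment. A system with the proposed words would
presumably combine a native fetch with WBE/LBE or WLE/LLE instead.

```forth
\ Hypothetical helpers; only standard words are used.
: be-uw@-bytes ( c-addr -- u )  dup c@ 8 lshift  swap char+ c@  or ;
: le-uw@-bytes ( c-addr -- u )  dup c@  swap char+ c@ 8 lshift  or ;
: be-ul@-bytes ( c-addr -- u )
   dup be-uw@-bytes 16 lshift  swap 2 chars + be-uw@-bytes  or ;
: le-ul@-bytes ( c-addr -- u )
   dup le-uw@-bytes  swap 2 chars + le-uw@-bytes 16 lshift  or ;

create packet  $12 c, $34 c, $56 c, $78 c,   \ four bytes as received

hex
packet be-ul@-bytes u.   \ prints 12345678  (network byte order)
packet le-ul@-bytes u.   \ prints 78563412  (little-endian on-disk data)
decimal
```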

> While perhaps useful there,
> I am wondering whether such special requirements merit to be part
> of a global standard.

If you don't want to exchange binary data with other computers, you
don't need these words. JSON and CSV (or, if you are an old-timer,
XML) have their merits, but I have yet to see a file system that
stores its metadata as JSON. And I expect that protocols like X that
value performance so highly that they negotiate the byte order in
order to avoid conversion at both ends in some cases won't pay for the
JSON overhead.

Of course, these days we see lots of claims that X is dead and that
Wayland is the future, but last I looked Wayland missed important
features, including working across the 'net. But anyway, consider
protocols like RDP or RFB with a similar purpose, but RDP and RFB
move much more data (X moves commands, while RDP and RFB move frame
contents), so efficiency is even more important there than for X.

- anton

From albert@spenarnc.xs4all.nl@21:1/5 to minforth on Mon Jun 17 17:27:26 2024

In article <1a0376ba40f433e70aa7e60484300340@www.novabbs.com>,
minforth <minforth@gmx.net> wrote:

> ISTM that the proposed wordset supports bi-endianness because of
> some special CPUs, like eg RISC-V. While perhaps useful there,
> I am wondering whether such special requirements merit to be part
> of a global standard.
>
> So IMO the proposed wordset could be reduced even more, because
> a.m. requirements are rather user/application specific.

I'm a proponent of the separation of concerns.
Loading and storing of 8 16 32 64 bit entities.

Sign extension if needed explicitly.
Reverse byte order if needed explicitly.
Define extra words for automatic sign extension, maybe in assembler,
if this bothers you, in *your application*.

In a context of an assembler the meaning can be switched between
big endian and little endian. This doesn't need to be reflected
in the words itself. Maybe in the namespace (wordlist on top of
the search order).

In ciasdis (my universal assembler) i86 i386 Pentium AMD 6809 8080
I get by with
B@ W@ L@ Q@ ( byte fetch, word fetch, long fetch, Quadword fetch ).
B! W! L! Q! ( byte store, word store, long store, Quadword store ).

I hate 16@ , because it looks like a number. Has Intel not defined
the suffix @ to mean a number in base 23 ;-)

In short I agree with Pelc.

Groetjes Albert
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat purring. - the Wise from Antrim -

From minforth@21:1/5 to Anton Ertl on Mon Jun 17 16:55:49 2024

Anton Ertl wrote:

> RISC-V, however, is not bi-endian, but little-endian.

Not wanting to be picky, it is claimed here (under Hardware):
https://en.wikipedia.org/wiki/Endianness#Current_architectures

But you seem to be right, a quick search on the net came up with
little-endianness mostly.

From Anton Ertl@21:1/5 to minforth on Mon Jun 17 22:34:25 2024

minforth@gmx.net (minforth) writes:

> Anton Ertl wrote:

>> RISC-V, however, is not bi-endian, but little-endian.

> Not wanting to be picky, it is claimed here (under Hardware):
> https://en.wikipedia.org/wiki/Endianness#Current_architectures

What is claimed there?

From the horse's mouth
<https://riscv.org/wp-content/uploads/2019/06/riscv-spec.pdf>:

Page v:
|The base ISA has been defined to have a little-endian memory system,
|with big-endian or bi-endian as non-standard variants.

Page 9:
|RISC-V base ISAs have little-endian memory systems.

- anton

From Stephen Pelc@21:1/5 to All on Tue Jun 18 11:59:21 2024

On 17 Jun 2024 at 16:31:03 CEST, "minforth" <minforth> wrote:

> ISTM that the proposed wordset supports bi-endianness because of
> some special CPUs, like eg RISC-V. While perhaps useful there,
> I am wondering whether such special requirements merit to be part
> of a global standard.
>
> So IMO the proposed wordset could be reduced even more, because
> a.m. requirements are rather user/application specific.

MPE has both a TCP/IP stack in high level Forth and a USB stack (MSC
and CDC) in high level Forth. One needs some big-endian memory access
and the other needs little-endian. It is common for embedded systems
to use both packages.

Stephen

From Anton Ertl@21:1/5 to All on Sat Jun 22 06:44:55 2024

In the discussion of the special memory access words proposal the
question has come up whether the proposal should make extra effort to
support Forth systems with address units >8 bits.

This question can be decomposed into the following subquestions:

* Are there Forth systems with address units >8 bits?

That's typically Forth systems for word-addressed hardware. Of
course Chuck Moore's hardware (in particular, the cores of the
Greenarrays machines) and the Forth systems that run on it come to
mind. Any others?

* Do these Forth systems implement the standard or at least take the
standard as a guideline?

If not, there is little point in catering to these systems in the
standard. AFAIK the systems for Chuck Moore's hardware ignore the
standard.

* Would programs on these systems actually use the special memory
access words?

I.e. do they exchange data with the wider computing world, and are
they prepared for the inefficiency that results from using these
words for this purpose (only 8 bits are used per address unit, and,
e.g., l@ accesses 4 address units)?

Can you name systems that satisfy all three criteria?

- anton

From mhx@21:1/5 to All on Sat Jun 22 08:51:43 2024

Maybe it is possible to "reverse-the-charges"?
If a non-mainstream or new Forth implementation wants to claim
compatibility (or use standard code), it has to provide the
Standard words.
For that to work, the standard should proclaim that it assumes
bytes, big-endian, and addresses that are cell-sized. Probably
a few things more: division by 0 traps, no exception on overflow,
addresses are unsigned and grow from 0, ...

-marcel

From Anton Ertl@21:1/5 to mhx on Sat Jun 22 10:00:53 2024

mhx@iae.nl (mhx) writes:

> Maybe it is possible to "reverse-the-charges"?
> If a non-mainstream or new Forth implementation wants to claim
> compatibility (or use standard code), it has to provide the
> Standard words.

That is already the case. A standard system has to provide the core
words and implement them such that they behave as specified in the
standard.

> For that to work, the standard should proclaim that it assumes
> bytes

The standard already specifies the properties it requires.

Byte addressing has not been among them. That's why I ask whether
requiring byte addressing for the special memory access words would be
a real problem (of course, it's a theoretical problem, but we don't
complicate the standard to cater for theoretical systems).

The main cost would actually be that the standard would get additional
wording to cover the additional cases that make the text harder to
understand. The cost in terms of additional words would be minimal
(especially for byte-addressed systems).

> big-endian, and addresses that are cell-sized. Probably
> a few things more: division by 0 traps, no exception on overflow,
> addresses are unsigned and grow from 0, ...

Concerning the other aspects:

* Little-endian pretty much has won in general-purpose computing, the
remaining big-endian systems are either at their end-of-life (SPARC)
or are niche systems like AIX on Power and S390x. For embedded
computing, I don't know the market that well, but it seems that ARM
and RISC-V (both little-endian) are gaining market share, and
big-endian platforms like Coldfire are on their way out.

In any case, the benefit of requiring little-endian byte order would
be small: We would not need the words WLE, LLE, XLE, but these words
are noops on little-endian systems anyway. So I don't think that
such a requirement is worth the cost, at least for now (but given
the longevity of S/390x (based on the S/360, which is 60 years old
this year), I would not hold my breath for that changing).

* Addresses are cell-sized (see 3.1).

* Division by 0 results in an ambiguous condition. There would be
little gain from requiring that it traps. OTOH, the cost would also
be small.

* Integer overflow (except for division) is ignored (see 3.2.2.2).
The next standard specifies the result of such an overflow to follow
modulo arithmetic rules
<http://www.forth200x.org/twos-complement.html>

* Addresses are unsigned (see 3.1.1). It's not clear what you mean
with "grow from 0".

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2024: https://euro.theforth.net

From Anton Ertl@21:1/5 to Ruvim on Sat Jun 22 16:58:43 2024

Ruvim <ruvim.pinka@gmail.com> writes:

> On 2024-06-22 10:44, Anton Ertl wrote:

>> In the discussion of the special memory access words proposal the
>> question has come up whether the proposal should make extra effort to
>> support Forth systems with address units >8 bits.
>>
>> This question can be decomposed into the following subquestions:
>>
>> * Are there Forth systems with address units >8 bits?

> ...

> A standard-compliant one:

> - WAForth - WebAssembly-based
> <https://github.com/remko/waforth>
> <https://mko.re/waforth/thurtle/>

This has 8-bit address units (and 1 CHARS = 1)

VARIABLE X
-1 X C!
X C@ .

outputs 255 on WAForth.

> - WASM Forth - WebAssembly-based, Python-based
> <https://github.com/stefano/wasm-forth>

I have not tried this. It only implements the CORE words, so
extension words that have additional requirements (whether it's an FP
stack or byte addressing) are of no concern to this system.

>> * Do these Forth systems implement the standard or at least take the
>> standard as a guideline?

> Yes. Here are some examples.

> - Yoda - Bash-based cell-addressed Forth-system
> <https://github.com/Bushmills/yoda>

Given that the base system is already 100,000 times slower than a more
efficient implementation, paying a factor 10 or so for l@ should not
be a big issue.

> - jsforth - JavaScript-based, cell-addressed Forth-system
> <https://github.com/brendanator/jsforth>
> <https://brendanator.github.io/jsForth/>

Core, Core plus (what's that?) and Core extension words fully
implemented. It has 32-bit address units and cells. This system
seems to communicate to the outside world through JavaScript, and it
does not implement, e.g., open-file. So I doubt that the proposed
words will be implemented by this system even if the standard is
complicated to cater for address units >8 bits.

> Even if the special memory access words will be targeted to
> byte-addressed systems only, it should not mean that any standard system
> shall be byte-addressed.

No. It would only mean that a system that implements these words must
be byte-addressed, just like a system that implements the FP word set
has to implement an FP stack in Forth-2012 (and systems that don't,
don't).

And actually that's not even necessary. My current plan is to
describe the words as working in a byte-addressed system, and put the
explanation of how to implement it on a system with address unit >8
bits in the Rationale. As a result, only those who actually have this
complication need to read this stuff, and I think that even for those
the result will be easier to understand than if everything was
described through abstractions.

- anton

From Paul Rubin@21:1/5 to Anton Ertl on Sun Jun 23 00:03:30 2024

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

> * Division by 0 results in an ambiguous condition. There would be
> little gain from requiring that it traps. OTOH, the cost would also
> be small.

The trap would require some extra code on risc-v.

From Anton Ertl@21:1/5 to Paul Rubin on Sun Jun 23 15:53:19 2024

Paul Rubin <no.email@nospam.invalid> writes:

> anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

>> * Division by 0 results in an ambiguous condition. There would be
>> little gain from requiring that it traps. OTOH, the cost would also
>> be small.

> The trap would require some extra code on risc-v.

Sure, and on ARM T32 and ARM A64, and on Power, and several EOLed
architectures. The cost is still small. The gforth engine on RISC-V
has:

see /s
Code /s
15D42: sd s9,$50(s10)
15D46: addi s9,s9,8
15D48: ld s8,0(s11)
15D4C: li a5,-1
15D4E: ld s6,8(s11)
15D52: addi s7,s11,8
15D56: bne s8,a5,$15D68
15D5A: slli a5,a5,$3F
15D5C: bne s6,a5,$15D72
15D60: li a0,$-B
15D62: jal ra,$24EA6
15D66: j $15D72
15D68: bne s8,zero,$15D72
15D6C: li a0,$-A
15D6E: jal ra,$24EA6
15D72: div s6,s6,s8
15D76: mv s11,s7
15D78: sd s6,0(s7)
15D7C: ld a4,0(s9)
15D80: jr a4
end-code

This checks for both division by zero and overflow on division
(minint/-1). The division by zero check is:

15D68: bne s8,zero,$15D72
15D6C: li a0,$-A
15D6E: jal ra,$24EA6

The overflow check is:

15D4C: li a5,-1
15D56: bne s8,a5,$15D68
15D5A: slli a5,a5,$3F
15D5C: bne s6,a5,$15D72
15D60: li a0,$-B
15D62: jal ra,$24EA6

The jal performs a throw, with the argument in a0. Throw code $-A is
"division by zero", $-B is "result out of range".

This probably would be more efficient on an in-order core like the U74
if the div instruction was moved to before the checks (with
appropriate register renaming of the result), but if we do the
reordering in the C code, there is the danger that some gcc version
will "optimize" one or both checks away (the joys of undefined
behaviour). In theory gcc could do the reordering, but it obviously
does not do so.
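
For readers who do not read RISC-V assembly, the two checks can be
restated in standard Forth. This is only a sketch of the logic, not
Gforth's source: checked/ and min-int are hypothetical names, two's
complement cells are assumed, and -10 and -11 are the standard throw
codes for "division by zero" and "result out of range" (the $-A and $-B
above).

```forth
\ Most negative cell value: invert the largest positive one.
-1 1 rshift invert constant min-int

: checked/ ( n1 n2 -- n3 )
   dup 0= if  -10 throw  then              \ division by zero
   dup -1 = if
      over min-int = if  -11 throw  then   \ min-int / -1 overflows
   then
   / ;
```

A caller can intercept either condition with CATCH, for example
`1 0 ' checked/ catch` leaves -10 on top of the stack.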

- anton

From Anton Ertl@21:1/5 to Ruvim on Sun Jun 23 17:10:33 2024

Ruvim <ruvim.pinka@gmail.com> writes:

On 2024-06-22 20:58, Anton Ertl wrote:

Ruvim <ruvim.pinka@gmail.com> writes:

- jsforth — JavaScript-based, cell-addressed Forth-system
<https://github.com/brendanator/jsforth>
<https://brendanator.github.io/jsForth/>

Core, Core plus (what's that?) and Core extension words fully
implemented. It has 32-bit address units and cells. This system
seems to communicate to the outside world through JavaScript, and it
does not implement, e.g., open-file. So I doubt that the proposed
words will be implemented by this system even if the standard is
complicated to cater for address units >8 bits.

After all, JavaScript has WebSockets, which allow implementing
*binary* network protocols that specify values in bits and endianness.

Just an example of such a binary protocol:
PostgreSQL Frontend/Backend Protocol, Message Data Types
<https://www.postgresql.org/docs/current/protocol-message-types.html>

I expect such a system to do in JavaScript what a more mainstream
Forth system would do with the special memory access words. No need
to complicate the proposal for such a system.

One more idea.

It seems, in almost any system we can have a separate byte-based address
space. For an address in this space, 1+ produces the address of the next
consecutive byte.

For example, let's consider a cell-addressed, little-endian
Forth-system, where one cell is 32 bits, and several most significant
bits of addresses are always 0.

: byte-address ( addr -- b-addr ) #2 lshift ;

The BCPL approach in reverse. Just say No!

Having two incompatible address types was bad in BCPL (and AmigaDOS
programmers can show you their scars from this mistake), and it would
be bad in Forth. Fortunately, Forth has found a better way to deal
with byte-addressed machines, and AFAICT, Forth is more popular than
BCPL despite BCPL having an Algol syntax and Forth not; I think the
BCPL approach to dealing with bytes has had much to do with that. I
won't go there.

If systems like jsforth want to go there, they should implement it and establish common practice about such things. It will be interesting
to see how this approach works out with, e.g., 20-bit cells.

- anton

From Anton Ertl@21:1/5 to Ruvim on Mon Jun 24 06:41:02 2024

Ruvim <ruvim.pinka@gmail.com> writes:

On 2024-06-23 21:10, Anton Ertl wrote:

Ruvim <ruvim.pinka@gmail.com> writes:

It seems, in almost any system we can have a separate byte-based address
space. For an address in this space, 1+ produces the address of the next
consecutive byte.

For example, let's consider a cell-addressed, little-endian
Forth-system, where one cell is 32 bits, and several most significant
bits of addresses are always 0.

: byte-address ( addr -- b-addr ) #2 lshift ;

The BCPL approach in reverse. Just say No!

Having two incompatible address types was bad in BCPL (and AmigaDOS
programmers can show you their scars from this mistake), and it would
be bad in Forth.

Well, it's not obvious to me why this is bad.

It leads to bugs where the wrong kind of address is provided or
expected. It leads to complications in designing the words where you
now have to deal with two kinds of addresses and design your words to
expect or provide the right one. And in cases where the usage of the
word includes both kinds of addresses, perform the conversion before
and/or after the call, or have two functionally parallel words, one
for each kind of address; or maybe more, if you want to support
various combinations for the different parameters and return values.

Actually, it's worse than the BCPL approach: On the Amiga in BCPL the
two kinds of addresses were clearly distinct, i.e., the conversions
were not nops, and any mistakes in usage would be found quickly in
testing.

In the suggested approach, on the widely-used byte-addressed Forth
systems the conversion words would be nops, and having one too many,
too few, or in the wrong direction or in the wrong place would not
become apparent in testing. You would have to test on a system where
the address unit >8 bits to find the mistake, and it's likely that the
program won't test there for other reasons (e.g., because it uses
OPEN-FILE). We have seen with CHARS how well that has worked. Even
those who wanted to write Forth-94 standard programs could not test
that their programs actually complied. We finally accepted reality
and standardized 1 chars = 1.
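
A small C sketch of why such mistakes stay hidden on byte-addressed systems. All names here are hypothetical, and the shift parameter is only a modelling device: 0 stands for a byte-addressed system, 2 for a system with 32-bit address units as in the quoted BYTE-ADDRESS definition.

```c
#include <stdint.h>

/* Hypothetical conversion from a native address to a byte address.
   shift = 0: byte-addressed system, the conversion is a nop.
   shift = 2: 32-bit address units, as in the quoted BYTE-ADDRESS. */
static uint32_t byte_address(uint32_t addr, unsigned shift)
{
    return addr << shift;
}

/* Correct: convert first, then step to the next byte. */
static uint32_t correct_next_byte(uint32_t addr, unsigned shift)
{
    return byte_address(addr, shift) + 1;
}

/* Deliberately buggy: the conversion has been forgotten. */
static uint32_t buggy_next_byte(uint32_t addr, unsigned shift)
{
    (void)shift;   /* unused, which is exactly the bug */
    return addr + 1;
}
```

With shift 0 the buggy and the correct version return the same value for every address, so no test on a byte-addressed system can tell them apart; with shift 2 they differ for every nonzero address.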

My preferred alternative of reading and writing the data in a
byte-per-address-unit format (or maybe converting between a packed and
a byte-per-address-unit format) has the same problem, of course, but
at a smaller scale: if we standardize words for doing this reading and
writing, or this conversion, then on a byte-addressed machine you cannot
determine by testing that you did an OPEN-FILE without a BYTEWISE
fam where one would be appropriate. However, the places where
BYTEWISE would have to be inserted are far fewer, making it much easier
to get right without testing, or to insert missing instances of
BYTEWISE.

For the variant where there is a conversion between packed and
bytewise representations in memory the number of places to consider is
between the fam approach and the two-kinds-of-address approach, so I
would rather recommend the fam approach.

If we do not standardize words for systems with address units >8 bits
(and I currently don't plan to propose such words, because I would
like to see some existing practice before proposing such words), the
situation is actually not that much different from the one where we
standardize them: many programs will not use these words either way,
and it's best to go for the variant that requires the least changes.

[two kinds of addresses]

If systems like jsforth want to go there, they should implement it and
establish common practice about such things. It will be interesting
to see how this approach works out with, e.g., 20-bit cells.

It will not work if addresses use all bits in a cell.

That's also the case for jsforth. jsforth can address 16GB (4G
cells), but 32-bit byte addresses can only address 4GB.
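
The arithmetic behind these two figures, as a minimal C sketch. The function name is made up for illustration, and the 32-bit address units and cells are taken from the description of jsforth earlier in this thread.

```c
#include <stdint.h>

/* Bytes reachable with `bits`-bit addresses when every address unit
   holds `au_bytes` bytes. Hypothetical helper for illustration only;
   keep bits well below 64 so the product does not overflow. */
static uint64_t reachable_bytes(unsigned bits, unsigned au_bytes)
{
    return ((uint64_t)1 << bits) * au_bytes;
}
```

reachable_bytes(32, 4) gives 2^34 bytes (4G cells of 4 bytes each, i.e. 16GB), while reachable_bytes(32, 1) gives 2^32 bytes (4GB), so single-cell byte addresses cannot cover a quarter of what single-cell cell addresses can.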

The only way that I can see is to use double-cell size addresses to
refer to individual bytes (or even bits).

On one hand, requiring double-cell addresses for W@ etc. will
certainly ensure that most or all mistakes in converting between the
address types will be found in testing.

On the other hand, double-cell addresses for W@ etc. conflict with
existing practice, and a proposal that includes them is very likely to
be rejected. I also doubt that the users of systems with address
units >8 bits would prefer them over the fam approach, and that the
implementors of such systems would implement such words.

- anton

From minforth@21:1/5 to All on Mon Jun 24 12:23:43 2024

You asked.... I prefer the word IN
to read an 18 bit word from an ADC

From sjack@21:1/5 to Ruvim on Mon Jun 24 15:26:06 2024

Ruvim <ruvim.pinka@gmail.com> wrote:

A question to all.

In your application code, if you want to read exactly one byte at an
address, do you prefer to use the word "c@" or "b@"?

You didn't go back in time far enough. BYTE started as polymorphic but
quickly became vulgar for 8-bits. OCTATE would be specific for 8-bits,
so O@ .
You can ignore the above; just for grins. I prefer B@ but only
because I can't think of a use for a backtracking fetch.

--
me
