At the 2023 Forth200x meeting we discussed various proposals for words
like w@, I presented what Gforth has, and the committee tasked me to
write this up as a proposal. I have now done so, and you can find it
at
<https://forth-standard.org/proposals/special-memory-access-words>
Ideally you will comment on the proposal there.
- anton
The C library uses htonl, ntohl etc. which
are alternatives you might have considered.
In particular, I don't like the necessity of two separate steps to fetch
a sign-extended word, preferring instead two separate words, one for
unsigned fetch and one for sign extended fetch.
These words are indispensable for writing portable code between 32-bit
and 64-bit systems.
I do not have X@ or X! which are simply @ and ! on a 64-bit system.
Rarely used, but anyhow: __int128 also requires aligned addresses,
afaik at least when used with gcc. But perhaps it has more to
do with the __int128 or pointer implementation within gcc.
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
In particular, I don't like the necessity of two separate steps to fetch
a sign-extended word, preferring instead two separate words, one for
unsigned fetch and one for sign extended fetch.
That used to be my position, too, but if we add the need to deal with different byte orders, this results in
sw@ uw@ be-sw@ be-uw@ le-sw@ le-uw@
and when you have the precomposed words for fetching, you also want
them for storing:
w! be-w! le-w!
And another 9 words for l, and another 9 words for x. And if you also
add stuff like w, etc., precomposing leads to even more words.
That is the memory access proposal from Federico de Ceballos, but the
committee (in particular, Leon Wagner) has experimented with it and
found that the number of words is too high.
One idea I have is to provide a library that defines the precomposed
words in terms of the decomposed ones.
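
A minimal sketch of what such a library could look like for the 16-bit
words listed above. It assumes a system that already provides the
decomposed words roughly as in the proposal: w@ ( c-addr -- u ),
w! ( x c-addr -- ), wbe and wle ( u1 -- u2 ), and w>s ( x -- n ).
The stack effects are my reading of the proposal, not a definitive
library:

```forth
\ Precomposed 16-bit words defined in terms of the decomposed ones.
\ Assumption: w@ w! wbe wle w>s exist with the stack effects given above.
: uw@    ( c-addr -- u )  w@ ;              \ unsigned, native byte order
: sw@    ( c-addr -- n )  w@ w>s ;          \ sign-extended, native byte order
: be-uw@ ( c-addr -- u )  w@ wbe ;          \ unsigned, big-endian
: be-sw@ ( c-addr -- n )  w@ wbe w>s ;      \ sign-extended, big-endian
: le-uw@ ( c-addr -- u )  w@ wle ;          \ unsigned, little-endian
: le-sw@ ( c-addr -- n )  w@ wle w>s ;      \ sign-extended, little-endian
: be-w!  ( x c-addr -- )  >r wbe r> w! ;    \ store as big-endian
: le-w!  ( x c-addr -- )  >r wle r> w! ;    \ store as little-endian
```

The same pattern would repeat for the l and x words, which is where the
word count grows.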
These words are indispensable for writing portable code between 32-bit
and 64-bit systems.
I have good experiences with Forth's cell, char, float model for
portability and bad experiences with the portability of C code, thanks
to its large number of integer types: you can produce portable C code,
but unless you test it on both 32-bit and 64-bit systems, I would not
bet on its portability, while debugged Forth code often is also
portable.
One of the reasons I'm a proponent of the explicit prefix fetch words is
that I have used them for working with structures provided by C
libraries, and they keep me from making mistakes. In 64-bit libraries, structures often pack 32-bit fields contiguously to keep 64-bit
alignment with the 64-bit fields.
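
To illustrate the structure case with the decomposed words, here is a
sketch for a hypothetical C structure (struct sample and its field names
are made up for this example). It assumes l@ ( c-addr -- u ) and
l>s ( x -- n ) as in the proposal, and a 64-bit system where @ fetches
the 64-bit field:

```forth
\ Hypothetical C structure on a 64-bit system:
\   struct sample { int32_t id; uint32_t flags; int64_t size; };
\ Two 32-bit fields are packed into the first 8 bytes, so the
\ 64-bit field starts at offset 8 and stays 64-bit aligned.
0 constant sample-id      \ int32_t  at offset 0
4 constant sample-flags   \ uint32_t at offset 4
8 constant sample-size    \ int64_t  at offset 8

: sample-id@    ( addr -- n )  sample-id    + l@ l>s ;  \ signed 32-bit field
: sample-flags@ ( addr -- u )  sample-flags + l@ ;      \ unsigned 32-bit field
: sample-size@  ( addr -- n )  sample-size  + @ ;       \ assumes 64-bit cells
```

Using @ on the first field instead of l@ would silently read both 32-bit
fields at once, which is the kind of mistake the explicit words guard
against.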
ISTM that the proposed wordset supports bi-endianness because of
some special CPUs, e.g. RISC-V. While perhaps useful there,
I am wondering whether such special requirements merit being part
of a global standard.
So IMO the proposed wordset could be reduced even more, because
the above-mentioned requirements are rather user/application specific.
RISC-V, however, is not bi-endian, but little-endian.
Anton Ertl wrote:
RISC-V, however, is not bi-endian, but little-endian.
Not wanting to be picky, but it is claimed here (under Hardware):
<https://en.wikipedia.org/wiki/Endianness#Current_architectures>
ISTM that the proposed wordset supports bi-endianness because of
some special CPUs, e.g. RISC-V. While perhaps useful there,
I am wondering whether such special requirements merit being part
of a global standard.
So IMO the proposed wordset could be reduced even more, because
the above-mentioned requirements are rather user/application specific.
Maybe it is possible to "reverse-the-charges"?
If a non-mainstream or new Forth implementation wants to claim
compatibility (or use standard code), it has to provide the
Standard words.
For that to work, the standard should proclaim that it assumes
bytes big-endian, and addresses that are cell-sized. Probably
a few things more: division by 0 traps, no exception on overflow,
addresses are unsigned and grow from 0, ...
On 2024-06-22 10:44, Anton Ertl wrote:...
In the discussion of the special memory access words proposal the
question has come up whether the proposal should make extra effort to
support Forth systems with address units >8 bits.
This question can be decomposed in the following subquestions:
* Are there Forth systems with address units >8 bits?
A standard-compliant one:
- WAForth — WebAssembly-based
<https://github.com/remko/waforth>
<https://mko.re/waforth/thurtle/>
- WASM Forth — WebAssembly-based, Python-based
<https://github.com/stefano/wasm-forth>
* Do these Forth systems implement the standard or at least take the
standard as a guideline?
Yes. Here are some examples.
- Yoda — Bash-based cell-addressed Forth-system
<https://github.com/Bushmills/yoda>
- jsforth — JavaScript-based, cell-addressed Forth-system
<https://github.com/brendanator/jsforth>
<https://brendanator.github.io/jsForth/>
Even if the special memory access words are targeted at
byte-addressed systems only, that should not mean that every standard
system shall be byte-addressed.
* Division by 0 results in an ambiguous condition. There would be
little gain from requiring that it traps. OTOH, the cost would also
be small.
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
* Division by 0 results in an ambiguous condition. There would be
little gain from requiring that it traps. OTOH, the cost would also
be small.
The trap would require some extra code on risc-v.
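
For a sense of what that extra code amounts to: the RISC-V divide
instructions do not trap on a zero divisor, so a system that guarantees
a trap has to test the divisor itself. A sketch in Forth terms, where
checked/ is a hypothetical name and -10 is the throw code the standard
assigns to division by zero:

```forth
\ Division that always throws on a zero divisor, whatever the hardware does.
: checked/ ( n1 n2 -- n3 )
  dup 0= if -10 throw then   \ -10: standard throw code for division by zero
  / ;

\ Usage: catch the error instead of relying on a hardware trap.
\ 1 0 ' checked/ catch   leaves -10 on top of the stack
```

The cost is one compare and branch per division.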
On 2024-06-22 20:58, Anton Ertl wrote:
Ruvim <ruvim.pinka@gmail.com> writes:
- jsforth — JavaScript-based, cell-addressed Forth-system
<https://github.com/brendanator/jsforth>
<https://brendanator.github.io/jsForth/>
Core, Core plus (what's that?) and Core extension words fully
implemented. It has 32-bit address units and cells. This system
seems to communicate to the outside world through JavaScript, and it
does not implement, e.g., open-file. So I doubt that the proposed
words will be implemented by this system even if the standard is
complicated to cater for address units >8 bits.
After all, JavaScript has WebSockets, which allow implementing
*binary* network protocols that specify values in bits and endianness.
Just an example of such a binary protocol:
PostgreSQL Frontend/Backend Protocol, Message Data Types
<https://www.postgresql.org/docs/current/protocol-message-types.html>
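
As an illustration of what such code needs: a regular message in that
protocol starts with a one-byte type followed by a 32-bit length in
network byte order (most significant byte first). A sketch using only
c@, so it assumes 8-bit characters and a message already sitting in a
byte buffer; be-l@, msg-type and msg-length are names made up here:

```forth
\ Assemble a big-endian (network order) 32-bit value from four bytes.
: be-l@ ( c-addr -- u )
  0 swap 4 0 do            \ ( acc c-addr )
    swap 8 lshift          \ make room for the next byte
    over c@ or             \ merge the byte at c-addr into acc
    swap char+             \ advance to the next byte
  loop drop ;

\ Hypothetical accessors for a message held in a byte buffer.
: msg-type   ( c-addr -- char )  c@ ;
: msg-length ( c-addr -- u )     char+ be-l@ ;

\ Example buffer: type 'Q', length bytes 00 00 01 02.
create msg  char Q c,  0 c, 0 c, 1 c, 2 c,
\ msg msg-type emit    prints Q
\ msg msg-length .     prints 258
```

With the proposed words, be-l@ would be l@ lbe on a byte-addressed
system.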
One more idea.
It seems, in almost any system we can have a separate byte-based address
space. For an address in this space, 1+ produces the address of the next
consecutive byte.
For example, let's consider a cell-addressed, little-endian
Forth-system, where one cell is 32 bits, and several most significant
bits of addresses are always 0.
: byte-address ( addr -- b-addr ) #2 lshift ;
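
A sketch of how a byte fetch could work on top of this. To keep it
runnable on an ordinary byte-addressed Forth, the cell-addressed memory
is simulated with an array; sim-mem, sim@, sim! and b@ are names made up
for this sketch, and byte-address is repeated so that it is
self-contained:

```forth
\ Simulated cell-addressed memory: 16 cells, each holding a 32-bit value.
create sim-mem 16 cells allot
: sim@ ( addr -- x )  cells sim-mem + @ ;   \ stands in for @ on the target
: sim! ( x addr -- )  cells sim-mem + ! ;   \ stands in for ! on the target

: byte-address ( addr -- b-addr )  2 lshift ;   \ 4 bytes per cell

\ Fetch one byte through a byte address (little-endian within the cell).
: b@ ( b-addr -- u )
  dup 2 rshift sim@            \ fetch the cell that contains the byte
  swap 3 and 3 lshift rshift   \ shift the wanted byte down to bits 0..7
  $ff and ;

\ Usage: store $11223344 in cell 3 and read its bytes in order.
\ $11223344 3 sim!
\ 3 byte-address b@        gives $44
\ 3 byte-address 1+ b@     gives $33
```

The two address bits added on the right are why this only works when the
most significant bits of a cell address are always 0.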
On 2024-06-23 21:10, Anton Ertl wrote:
Ruvim <ruvim.pinka@gmail.com> writes:
It seems, in almost any system we can have a separate byte-based address
space. For an address in this space, 1+ produces the address of the next
consecutive byte.
For example, let's consider a cell-addressed, little-endian
Forth-system, where one cell is 32 bits, and several most significant
bits of addresses are always 0.
: byte-address ( addr -- b-addr ) #2 lshift ;
The BCPL approach in reverse. Just say No!
Having two incompatible address types was bad in BCPL (and AmigaDOS
programmers can show you their scars from this mistake), and it would
be bad in Forth.
Well, it's not obvious to me why this is bad.
If systems like jsforth want to go there, they should implement it and
establish common practice about such things. It will be interesting
to see how this approach works out with, e.g., 20-bit cells.
It will not work if addresses use all bits in a cell.
The only way that I can see is to use double-cell size addresses to
refer individual bytes (or even bits).
A question to all.
In your application code, if you want to read exactly one byte at an
address, do you prefer to use the word "c@" or "b@"?