FHYPOT sqrt(a^2+b^2)
> FHYPOT sqrt(a^2+b^2)
This is a nice one (that iForth does not have) because FHYPOT is not
only more efficient but also documents a tricky numerical problem.
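The tricky problem: for large a or b, squaring overflows even when
sqrt(a^2+b^2) itself is representable. A minimal sketch of the naive
and a scaled variant (names and factoring are illustrative, not
iForth's or any system's actual code):

\ Naive version: FDUP F* can overflow to +inf for large a or b.
: fhypot-naive ( F: a b -- r )  FDUP F* FSWAP FDUP F* F+ FSQRT ;

\ Scaled sketch: divide by the larger magnitude first, so the squared
\ term is at most 1. Caveat: yields NaN for a=b=0; a full version
\ special-cases m=0.
: fhypot ( F: a b -- r )
  FABS FSWAP FABS               \ |b| |a|
  FOVER FOVER FMAX              \ |b| |a| m
  FROT FROT FMIN                \ m n   ( n <= m )
  FOVER F/                      \ m n/m
  FDUP F* 1e F+ FSQRT F* ;      \ m * sqrt(1+(n/m)^2)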
Thanks.
In my apps I added for convenience
FINV alias 1/F
F2* F2/
FHYPOT sqrt(a^2+b^2)
FMA Horner a*b+c
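Plausible high-level definitions for systems that lack these words as
primitives (FMA is the exception: a true fused multiply-add rounds
once and cannot be composed from F* and F+):

: FINV ( F: r -- 1/r )  1e FSWAP F/ ;
: F2*  ( F: r -- r*2 )  FDUP F+ ;
: F2/  ( F: r -- r/2 )  0.5e F* ;
\ Double rounding; a real FMA primitive rounds only once.
: fma-approx ( F: a b c -- a*b+c )  FROT FROT F* F+ ;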
On 11/29/23 08:29, minforth wrote:
> Thanks.
> In my apps I added for convenience
> FINV alias 1/F
> F2* F2/
> FHYPOT sqrt(a^2+b^2)
> FMA Horner a*b+c
FINV is also a commonly needed word; it saves writing
"1.0E0 FSWAP F/".
The other most useful word for vector/matrix code is F+!, which also
improves the efficiency, readability, and compactness of code. Use of
F+! can be found in the FSL modules.
F+! has common usage and is easily comprehensible so it may be time to
enter it formally into the Forth floating point lexicon.
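For systems where F+! is not built in, the obvious high-level
definition (a sketch, not necessarily the FSL's exact formulation):

: F+! ( f-addr -- ) ( F: r -- )  \ add r to the float at f-addr
  DUP F@ F+ F! ;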
Krishna Myneni wrote:
> The other most useful word for vector/matrix code is F+!, which also
> improves the efficiency, readability, and compactness of code. Use of
> F+! can be found in the FSL modules.
> F+! has common usage and is easily comprehensible so it may be time to
> enter it formally into the Forth floating point lexicon.
May I add F*! for scalar operations on vector/matrix elements?
It should make the code for loops which scale arrays more compact, but
typically, it is rarer to loop over a sequence of scalars which
multiply a single array element (a value at a fixed address) than it is
to loop over a sequence of scalars which accumulate into a single array
element, e.g. matrix multiplication.
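A sketch of F*! along the same lines as F+!, together with the kind of
scaling loop it compacts (fscale is a hypothetical helper name):

: F*! ( f-addr -- ) ( F: r -- )  \ multiply the float at f-addr by r
  DUP F@ F* F! ;

\ Scale ucount contiguous floats at f-addr by r.
: fscale ( f-addr ucount -- ) ( F: r -- )
  FLOATS OVER + SWAP ?DO
    FDUP I F*!
  1 FLOATS +LOOP
  FDROP ;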
Krishna Myneni wrote:
> It should make the code for loops which scale arrays more compact, but
> typically, it is rarer to loop over a sequence of scalars which
> multiply a single array element (a value at a fixed address) than it
> is to loop over a sequence of scalars which accumulate into a single
> array element, e.g. matrix multiplication.
Matrix multiplication (if not available as a primitive or from an
external library) is an example.
minforth@gmx.net (minforth) writes:
> Krishna Myneni wrote:
>> It should make the code for loops which scale arrays more compact,
>> but typically, it is rarer to loop over a sequence of scalars which
>> multiply a single array element (a value at a fixed address) than it
>> is to loop over a sequence of scalars which accumulate into a single
>> array element, e.g. matrix multiplication.
> Matrix multiplication (if not available as a primitive or from an
> external library) is an example.
Not in my experience. Matrix multiplication always multiplies one
element of one matrix with one element of the other matrix. Since you
still need both matrices, you do not want to use F*! for that. Matrix
multiplication adds up the products of these multiplications; e.g., for
a 1000x1000 matrix multiply, it sums up 1000 products, resulting in one
element of the target matrix. F+! can be used for that.

But for these kinds of things, it's better to use specialized code,
such as OpenBLAS. E.g., if you look at slides 80 and 87 of
https://www.complang.tuwien.ac.at/anton/lvas/efficient.pdf, you see
that OpenBLAS is >13 times as fast for 1000x1000 matrix multiplication
(on a Tiger Lake CPU) as a straightforward scalar implementation of
matrix multiplication that uses the best loop nesting. Compared to the
naive variant that uses a dot product, the speedup exceeds a factor of
25 (slide 78). Even when the auto-vectorization of gcc kicks in (with
-O3), the result is still >5 times slower than OpenBLAS.

"THP" on these slides means that transparent huge pages are enabled and
kick in (there is no guarantee that they kick in if they are enabled).
- anton
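To make the accumulation pattern concrete, a hedged sketch of a scalar
matrix multiply with i-k-j loop nesting, accumulating into C with F+!
(an illustration, not the code measured on the slides; it assumes
Forth-2012 {: :} locals, row-major n x n matrices of doubles, and a
zero-filled C):

: matmul ( a-addr b-addr c-addr n -- )
  {: a b c n | arow brow crow :}
  n 0 ?DO                               \ i
    a I n * FLOATS + TO arow
    c I n * FLOATS + TO crow
    n 0 ?DO                             \ k
      arow I FLOATS + F@                \ F: a[i][k]
      b I n * FLOATS + TO brow
      n 0 ?DO                           \ j
        FDUP  brow I FLOATS + F@  F*    \ a[i][k] * b[k][j]
        crow I FLOATS + F+!             \ c[i][j] += product
      LOOP
      FDROP
    LOOP
  LOOP ;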
In article <2023Dec2.080651@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> But for these kinds of things, it's better to use specialized code,
> such as OpenBLAS. E.g., if you look at slides 80 and 87 of
> https://www.complang.tuwien.ac.at/anton/lvas/efficient.pdf, you see
> that OpenBLAS is >13 times as fast for 1000x1000 matrix multiplication
> (on a Tiger Lake CPU) as a straightforward scalar implementation of
> matrix multiplication that uses the best loop nesting.
This is an excellent opportunity to introduce a single assembler
routine that gives a huge speedup: approximately a vector-times-vector
multiplication with specified start addresses, specified strides, and a
length.
albert@cherry.(none) (albert) writes:
> This is an excellent opportunity to introduce a single assembler
> routine that gives a huge speedup: approximately a vector-times-vector
> multiplication with specified start addresses, specified strides, and
> a length.
You mean something like:

'v*' ( f-addr1 nstride1 f-addr2 nstride2 ucount -- r ) gforth-0.5 "v-star"
dot-product: r=v1*v2. The first element of v1 is at f_addr1, the
next at f_addr1+nstride1 and so on (similar for v2). Both vectors have
ucount elements.

However, note that the dot-product variant is slower than OpenBLAS by
a factor of 25. The best scalar implementation from slide 80 is quite
a bit faster (a factor of 13 slower than OpenBLAS) and can be
implemented with
- anton
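Whatever followed "implemented with" is not preserved in this digest.
As a stand-in, here is a hypothetical high-level rendering of the v*
glossary entry itself (Gforth's real v* is a primitive; strides are in
address units, as in the entry):

: my-v* ( f-addr1 nstride1 f-addr2 nstride2 ucount -- ) ( F: -- r )
  0e                               \ running sum
  0 ?DO                            \ a1 n1 a2 n2
    2OVER DROP F@                  \ F: r v1[i]
    OVER F@ F* F+                  \ F: r + v1[i]*v2[i]
    TUCK + SWAP                    \ advance a2 by n2
    2SWAP TUCK + SWAP 2SWAP        \ advance a1 by n1
  LOOP
  2DROP 2DROP ;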
Anton Ertl wrote:
> 'v*' ( f-addr1 nstride1 f-addr2 nstride2 ucount -- r ) gforth-0.5 "v-star"
> dot-product: r=v1*v2. The first element of v1 is at f_addr1, the
> next at f_addr1+nstride1 and so on (similar for v2). Both vectors have
> ucount elements.
> However, note that the dot-product variant is slower than OpenBLAS by
> a factor of 25.
It is not only about speed, but also about minimising calculation
errors. For example, naive dot-product summation in a single loop,
which is unfortunately what gforth does, is prone to accumulating
rounding errors. Nothing to blame here, but library functions are often
"very smart".
The BLAS implementations seem to be only about speed. None that I am
aware of uses, e.g., Kahan summation to reduce rounding errors.
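For reference, Kahan (compensated) summation in high-level Forth, a
minimal sketch for a contiguous vector; the serial dependency between
iterations is what makes it slow:

: kahan-sum ( f-addr ucount -- ) ( F: -- r )
  0e 0e                              \ F: sum c
  FLOATS OVER + SWAP ?DO
    I F@ FSWAP F-                    \ y = x - c
    FOVER FOVER F+                   \ t = sum + y
    FROT FOVER FSWAP F- FROT F-      \ c' = (t-sum)-y, sum' = t
  1 FLOATS +LOOP
  FDROP ;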
In article <2023Dec2.174433@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> albert@cherry.(none) (albert) writes:
>> This is an excellent opportunity to introduce a single assembler
>> routine that gives a huge speedup: approximately a
>> vector-times-vector multiplication with specified start addresses,
>> specified strides, and a length.
> You mean something like:
> 'v*' ( f-addr1 nstride1 f-addr2 nstride2 ucount -- r ) gforth-0.5 "v-star"
> dot-product: r=v1*v2. The first element of v1 is at f_addr1, the
> next at f_addr1+nstride1 and so on (similar for v2). Both vectors have
> ucount elements.
> However, note that the dot-product variant is slower than OpenBLAS by
> a factor of 25. The best scalar implementation from slide 80 is quite
> a bit faster (a factor of 13 slower than OpenBLAS).
Losing that much is astonishing, if V* really is implemented in
assembler; imagine using all 8 registers of the 8087 stack.
If you do a more sophisticated version with at least 8 fp registers
available, you can easily prefetch 2 fp numbers in advance for each
stride.
Anton Ertl wrote:
> The BLAS implementations seem to be only about speed. None that I am
> aware of uses, e.g., Kahan summation to reduce rounding errors.
Kahan summation gives good results but can be very slow. As a good
compromise, I prefer recursive summation of vector halves for dot
products, until their size is small enough to fit into vector chunks
ready for CPU-supported vector operations or intrinsics.
Wikipedia has a small article on this called Pairwise Summation.
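A sketch of that pairwise scheme in plain Forth, without the
vector-chunk leaf (the threshold 8 and the name psum are arbitrary):

: psum ( f-addr ucount -- ) ( F: -- r )
  DUP 8 U> IF
    2DUP 2/ RECURSE                  \ F: sum of first half
    DUP 2/ SWAP OVER -               \ addr h u-h
    >R FLOATS + R>                   \ second half: addr+h floats, u-h
    RECURSE F+
  ELSE                               \ small block: direct summation
    0e FLOATS OVER + SWAP ?DO  I F@ F+  1 FLOATS +LOOP
  THEN ;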
In article <e93ff88202425b32916bae8123adf0b2@news.novabbs.com>,
minforth <minforth@gmx.net> wrote:
> Anton Ertl wrote:
>> The BLAS implementations seem to be only about speed. None that I am
>> aware of uses, e.g., Kahan summation to reduce rounding errors.
> Kahan summation gives good results but can be very slow. As a good
> compromise, I prefer recursive summation of vector halves for dot
> products, until their size is small enough to fit into vector chunks
> ready for CPU-supported vector operations or intrinsics.
> Wikipedia has a small article on this called Pairwise Summation.
Summing numbers that mean something results in a sum whose error is
dominated by the maximum error of the summands.
Imagine a fly landing on the top of a church and a flea on top of that.
If you measure the height of the church precise to one mm, the total
height cannot be made more precise by reordering the summands.
So I think it is mostly academic.
That is because OpenBLAS uses AVX2 with all cores working
in parallel.
mhx@iae.nl (mhx) writes:
> That is because OpenBLAS uses AVX2 with all cores working
> in parallel.
I expect that it uses AVX-512 on the Tiger Lake which I measured. My
measurements used only one core. Using more cores increases the CPU
cycles needed (due to parallelization overhead), although it reduces
the elapsed time.
- anton
albert@cherry.(none) (albert) writes:
> Summing numbers that mean something results in a sum whose error is
> dominated by the maximum error of the summands.
1e30 1e f+ -1e30 f+ 1e 0e f~ .
produces 0 (false), even though with exact summation it would produce
true (-1). (The 1e is far below one ulp of 1e30, which is about 1.4e14,
so the first F+ leaves 1e30 unchanged and the final sum is 0e.) Of
course, you may say that these numbers mean nothing to you, but you are
not the only one in the world.
- anton
Krishna Myneni wrote:
> It should make the code for loops which scale arrays more compact, but
> typically, it is rarer to loop over a sequence of scalars which
> multiply a single array element (a value at a fixed address) than it
> is to loop over a sequence of scalars which accumulate into a single
> array element, e.g. matrix multiplication.
Matrix multiplication (if not available as a primitive or from an
external library) is an example. In other numerical matrix algorithms,
pivoting is rather common, which involves scalar column or row
multiplication. Most occurrences in my code involve shifting and
scaling of vectors.
Anton Ertl wrote:
> 1e30 1e f+ -1e30 f+ 1e 0e f~ .
> produces 0 (false), even though with exact summation it would produce
> true (-1). Of course, you may say that these numbers mean nothing to
> you, but you are not the only one in the world.
Take the number of years since the big bang (14.5 billion years),
square it and multiply by the height of Church St. Spirit in meters for
good measure. A photon will travel 1e30 meters in that amount of years.
Now add 1 meter ...
I would be interested to have a comparable time with the examples done
by OpenBLAS with one core.
The example of matrix multiplication was not a good fit for F+!. We
usually accumulate the sum on the stack and then store it at the
destination in the matrix.
mhx@iae.nl (mhx) writes:
> Anton Ertl wrote:
>> 1e30 1e f+ -1e30 f+ 1e 0e f~ .
>> produces 0 (false), even though with exact summation it would produce
>> true (-1). Of course, you may say that these numbers mean nothing to
>> you, but you are not the only one in the world.
> Take the number of years since the big bang (14.5 billion years),
> square it and multiply by the height of Church St. Spirit in meters
> for good measure. A photon will travel 1e30 meters in that amount of
> years. Now add 1 meter ...
So? Yes, it seems that the typical answer to issues of numerical errors
has been to
1) Replace fixed point with floating point, so you don't have to do
analysis for scaling.
2) Use wider FP types, so you may be able to do without numerical
analysis (or, if you still would need it, you have the hope of missing
the cases where you need it). I think that iForth uses 80-bit FP
numbers. Why?
3) Use examples like the above to convince themselves that numerical
analysis is not needed.
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
> The example of matrix multiplication was not a good fit for F+!. We
> usually accumulate the sum on the stack and then store it at the
> destination in the matrix.
Who is "we"?

Looking at
<http://theforth.net/package/matmul/current-view/matmul.4th>, the
fastest version on all systems that does not use a primitive FAXPY
is version 2, and that spends most of its time in:

: faxpy-nostride ( ra f_x f_y ucount -- )
  \ vy=ra*vx+vy
  dup >r 3 and 0 ?do                \ ucount mod 4 leftover elements
    fdup over f@ f* dup f+! float+ swap float+ swap
  loop
  r> 2 rshift 0 ?do                 \ main loop, unrolled 4x
    fdup over f@ f* dup f+! float+ swap float+ swap
    fdup over f@ f* dup f+! float+ swap float+ swap
    fdup over f@ f* dup f+! float+ swap float+ swap
    fdup over f@ f* dup f+! float+ swap float+ swap
    \ better performance on gforth-fast:
    \ fdup swap dup f@ f* float+ swap dup f@ f+ dup f! float+
  loop
  2drop fdrop ;

As you can see, it uses F+!.
- anton
The need for numerical analysis could be reduced in a processor that
allows a data item to be of variable length, or to span multiple cells:
In his book "The End of Error"[1] John Gustafson presents a core
model[2] of his (Type 1) Unums. This data type allows both fields of a
float to be of variable length, so that '*/' is redundant, being
numerically the same as '* /'.
He also claims 50% processing power reduction for inherently compressed
data, and less supervision of data due to all bits in data being valid,
and none being lost by fixed-format constraints.
Might it be significantly simpler to implement variable-length data in
hardware on a zero-operand processor than on a register-based one?
jan Coombs wrote:
> The need for numerical analysis could be reduced in a processor that
> allows a data item to be of variable length, or to span multiple
> cells: In his book "The End of Error"[1] John Gustafson presents a
> core model[2] of his (Type 1) Unums. This data type allows both
> fields of a float to be of variable length, so that '*/' is
> redundant, being numerically the same as '* /'.
> He also claims 50% processing power reduction for inherently
> compressed data, and less supervision of data due to all bits in data
> being valid, and none being lost by fixed-format constraints.
> Might it be significantly simpler to implement variable-length data in
> hardware on a zero-operand processor than on a register-based one?
Thanks for mentioning this. There is indeed a need for reduced,
adaptable fp formats, especially in AI systems. See also the
'Motivation' section in https://github.com/stillwater-sc/universal
There are already some experimental libraries using unum posits for
various programming languages. Is there any Forth code that uses unums?
But development will be slow as long as GPU hardware is cheap and
readily available for faster time-to-market:
https://www.windowscentral.com/microsoft/microsoft-to-spend-dollar32-billion-on-uks-ai-infrastructure-that-should-bring-more-than-20000-of-the-most-advanced-gpus-to-the-uk-by-2026
> But development will be slow as long as GPU hardware is cheap and
> readily available for faster time-to-market:
> https://www.windowscentral.com/microsoft/microsoft-to-spend-dollar32-billion-on-uks-ai-infrastructure-that-should-bring-more-than-20000-of-the-most-advanced-gpus-to-the-uk-by-2026
I doubt the necessity of fp formats in AI. 256 levels of uncertainty
must be plenty.
> ... Take the number of years since the big bang (14.5 billion years),
> square it and multiply by the height of Church St. Spirit in meters
> for good measure. A photon will travel 1e30 meters in that amount of
> years. Now add 1 meter ...
You misinterpret my posting. I find it illuminating when technical
problems are visualized ("2nm line-width means four Si atoms across").
jan Coombs <jan4comp.lang.forth@murray-microft.co.uk> writes:
[...]
> Might it be significantly simpler to implement variable-length data in
> hardware on a zero-operand processor than on a register-based one?
No. Variable-length data is always a pain. E.g., see strings in Forth.
Use of the sequence "FDUP F*" is ubiquitous in Forth scientific code for
lack of a common word which squares an fp number. The sequence is not
only less readable but does not convey as much meaning to anyone who is
reading the code.

I've updated the FSL modules in kForth (32, Win32, and 64) to replace
all instances of "FDUP F*" with the (built-in) word FSQUARE. Some FSL
modules provided definitions of FSQR for the same function (by MHX),
and I replaced these instances with FSQUARE, which I find more readable
and less error-prone due to the proximity of FSQR to FSQRT.
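Where FSQUARE is not built in (kForth provides it as an intrinsic),
the definition is a one-liner, and the name still pays for itself in
readability:

: FSQUARE ( F: r -- r^2 )  FDUP F* ;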
FSL has memory-mapped flocals. Can't be worse than reliance on FPICK
and FROLL.
Krishna Myneni wrote:
> Use of the sequence "FDUP F*" is ubiquitous in Forth scientific code
> for lack of a common word which squares an fp number. The sequence is
> not only less readable but does not convey as much meaning to anyone
> who is reading the code.
> I've updated the FSL modules in kForth (32, Win32, and 64) to replace
> all instances of "FDUP F*" with the (built-in) word FSQUARE. Some FSL
> modules provided definitions of FSQR for the same function (by MHX),
> and I replaced these instances with FSQUARE, which I find more
> readable and less error-prone due to the proximity of FSQR to FSQRT.
Regarding code readability when no fp locals are available:
Standard Forth only defines a reduced number of fp stack operations. I
added
FPICK (like PICK)
FROLL (like ROLL)
-FROLL (like ROLL reversed)
F>R R>F (already discussed)
Of course this only works if the FP stack is fully accessible, e.g.
memory mapped.
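As an illustration of the "fully accessible" requirement: on a system
like Gforth, where fp@ yields the address of the FP stack top and
deeper items lie at higher addresses (an assumption; layouts vary),
FPICK is a one-liner:

: FPICK ( u -- ) ( F: -- r.u )  \ copy the u-th fp item to the top
  FLOATS fp@ + F@ ;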
On 12/11/23 16:42, minforth wrote:
> Standard Forth only defines a reduced number of fp stack operations.
> I added
> FPICK (like PICK)
> FROLL (like ROLL)
> -FROLL (like ROLL reversed)
> F>R R>F (already discussed)
> Of course this only works if the FP stack is fully accessible, e.g.
> memory mapped.
Yes, I have added FPICK as an intrinsic word in kForth-64, and have
source definitions of F>R and FR> (your R>F, which is actually a better
name). But I think FRISE may reduce/eliminate the need for F>R etc.
When the FP stack resides in memory and can be accessed using a
pointer, it's easy to implement FRISE in source to assess its
usefulness.
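Under the same memory-mapped assumption (Gforth-style fp@, deeper
items at higher addresses), a sketch of FRISE that swaps the fp items
at depths u and u+1:

: FRISE ( u -- )
  FLOATS fp@ + DUP FLOAT+    \ addr of item u, addr of item u+1
  DUP F@ OVER F@             \ F: x[u+1] x[u]
  F! F! ;                    \ x[u] into slot u+1, x[u+1] into slot u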
The flocals implementation in the FSL is substantially worse. Unlike
using fp stack operations, one can't write re-entrant words with the FSL implementation of flocals.
Krishna Myneni wrote:
> The flocals implementation in the FSL is substantially worse. Unlike
> using fp stack operations, one can't write re-entrant words with the
> FSL implementation of flocals.
I don't understand. This should be awkward, but ok?

8 CONSTANT /flocals
: (frame) ( n -- ) FLOATS ALLOT ;
: FRAME|
    0 >R
    BEGIN  BL WORD COUNT 1 =
           SWAP C@ [CHAR] | =
           AND 0=
    WHILE  POSTPONE F, R> 1+ >R
    REPEAT
    /FLOCALS R> - DUP 0< ABORT" too many flocals"
    POSTPONE LITERAL POSTPONE (frame) ; IMMEDIATE
: |FRAME ( -- ) [ /FLOCALS NEGATE ] LITERAL (FRAME) ;
: &h HERE [ 1 FLOATS ] LITERAL - ;
: &g HERE [ 2 FLOATS ] LITERAL - ;
: &f HERE [ 3 FLOATS ] LITERAL - ;
: &e HERE [ 4 FLOATS ] LITERAL - ;
: &d HERE [ 5 FLOATS ] LITERAL - ;
: &c HERE [ 6 FLOATS ] LITERAL - ;
: &b HERE [ 7 FLOATS ] LITERAL - ;
: &a HERE [ 8 FLOATS ] LITERAL - ;
: a &a F@ ;
: b &b F@ ;
: c &c F@ ;
: d &d F@ ;
: e &e F@ ;
: f &f F@ ;
: g &g F@ ;
: h &h F@ ;
-marcel
Krishna Myneni wrote:
> On 12/11/23 16:42, minforth wrote:
>> Standard Forth only defines a reduced number of fp stack operations.
>> I added
>> FPICK (like PICK)
>> FROLL (like ROLL)
>> -FROLL (like ROLL reversed)
>> F>R R>F (already discussed)
>> Of course this only works if the FP stack is fully accessible, e.g.
>> memory mapped.
> Yes, I have added FPICK as an intrinsic word in kForth-64, and have
> source definitions of F>R and FR> (your R>F, which is actually a
> better name). But I think FRISE may reduce/eliminate the need for F>R
> etc. When the FP stack resides in memory and can be accessed using a
> pointer, it's easy to implement FRISE in source to assess its
> usefulness.
You have defined RISE as in 2 RISE ( i*x a b c d -- i*x b a c d ), et
cetera. I don't really have an application where a position swap in the
depths of the stack would fit, because Forth operations always only use
the top stack element(s). Then rather something like
2 FLIP ( i*x a b c d -- d b c a )

The depth-2 RISE/FRISE would provide the function I was originally
asking for, but the general version is similar to FPICK. Admittedly,
whether the general FRISE has application for other depths remains to
be seen.
Perhaps an on-fpstack sorting routine?