The analog of */ for single length integers, with intermediate double
length result, can be implemented for floating point arithmetic as
well. The following implementation of F*/ uses the non-standard word,
UD/MOD, although it might be possible to code it with UM/MOD as well.
It also requires FP@ (requiring a separate fp stack in memory). The
code below has been tested on kForth-64 and Gforth.
=== start of fstarslash.4th ===
\ fstarslash.4th
\
\ Implementation of F*/ on 64-bit Forth
\
\ Krishna Myneni, 2024-05-19
\
\ F*/ ( F: r1 r2 r3 -- r )
\ Multiply two IEEE-754 double precision floating point
\ numbers r1 and r2 and divide by r3 to obtain the result,
\ r. The intermediate product, r1*r2, is represented
\ with at least IEEE quad precision for both its
\ significand and exponent.
\
\ F*/ uses an intermediate significand represented by
\ 128 bits
\
\ Tests
1 [IF]
[UNDEFINED] T{ [IF] include ttester.4th [THEN]
1e-17 rel-near f!
1e-17 abs-near f!
TESTING F*/
\ These results differ from those obtained with F* and F/
t{ 1.0e300 -3.0e259 1.0e280 f*/ -> -3.0e279 }t
t{ 1.0e-300 3.0e-259 1.0e-280 f*/ -> 3.0e-279 }t
t{ 1.0e300 -3.0e259 -1.0e280 f*/ -> 3.0e279 }t
t{ -1.0e300 -3.0e259 -1.0e280 f*/ -> -3.0e279 }t
t{ 1.0e302 -3.0e302 DP_+INF f*/ -> -0.0e0 }t
t{ -1.0e-302 -1.0e-302 0.0e0 f*/ -> DP_+INF }t
\ The following should give same results as using F* and F/
t{ 0.0e0 3.0e302 1.0e-100 f*/ -> 0.0e0 }t
t{ 3.0e302 0.0e0 1.0e-100 f*/ -> 0.0e0 }t
t{ 1.0e302 -3.0e259 0.0e0 f*/ -> DP_-INF }t
t{ DP_+INF 3.0e300 1.0e0 f*/ -> DP_+INF }t
t{ DP_+INF 3.0e-300 -1.0e0 f*/ -> DP_-INF }t
[THEN]
Nice work! This failed here:
$FFFFFFFFFFFFF S>F FCONSTANT MANTMAX \ full 52 bit mantissa
t{ mantmax 65535e fdup f*/ -> mantmax }T
But perhaps it is just me not thinking straight anymore...
Krishna Myneni wrote:
\ The following should give same results as using F* and F/
t{ 0.0e0 3.0e302 1.0e-100 f*/ -> 0.0e0 }t
t{ 3.0e302 0.0e0 1.0e-100 f*/ -> 0.0e0 }t
t{ 1.0e302 -3.0e259 0.0e0 f*/ -> DP_-INF }t
t{ DP_+INF 3.0e300 1.0e0 f*/ -> DP_+INF }t
t{ DP_+INF 3.0e-300 -1.0e0 f*/ -> DP_-INF }t
What is the stack diagram of FP@ ?
With IEEE extended floats in iForth:
FORTH> 1.0e300 -3.0e259 F* 1.0e280 F/ e. -3.000000e279 ok
FORTH> 1.0e-300 3.0e-259 F* 1.0e-280 F/ e. 3.000000e-279 ok
FORTH> 1.0e302 -3.0e302 F* +INF F/ e. -0.000000e0 ok
FORTH> -1.0e-302 -1.0e-302 F* 0.0e0 F/ e. +INF ok
etc.
Of course, this won't always work. What is the exponent range
that should be supported?
Of course, you considered F/* instead of F*/. What is your
problem with that?
-marcel
Groetjes Albert
Okay, thanks, I should have looked it up. I was too tired yesterday...
Now even when the mantissa can still digest another bit, this still
fails:
$1FFFFFFFFFFFFF S>F FCONSTANT MANTMAX
t{ mantmax $ffff s>f fdup f*/ -> mantmax }t
What did go wrong?
The test is about expanding the mantissa to more than 53 bits by multiplication
and shrinking it back by division. The intermediate 128-bit format
should allow it.
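In exact integer terms, the test demands (writing $FFFF as 2^16-1):
(2^53-1) * (2^16-1) = 2^69 - 2^53 - 2^16 + 1 = $1FFFDFFFFFFFFF0001,
a 69-bit product, and dividing that by $FFFF must give back exactly
2^53-1 with remainder 0, comfortably within a 128-bit intermediate.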
On 5/20/24 00:20, mhx wrote:...
Of course, you considered F/* instead of F*/. What is your
problem with that?
The issue is that you have three fp numbers on the fp stack, from some
source. If you can write a word which automatically selects between
doing the sequence F/ then F*, or the sequence F* then F/, so that the
intermediate quotient or product does not underflow or overflow, and
which produces a result that fits into the d.p. range, you will have
effectively written F*/ (except for maybe a corner case or two).
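For example, with plain double-precision F* and F/ the evaluation
order decides whether the intermediate blows up (a Gforth-style
session; the exact spelling of infinity in the output varies between
systems):

1.0e300 -3.0e259 f* 1.0e280 f/ fs. \ product overflows first: -inf
1.0e300 1.0e280 f/ -3.0e259 f* fs. \ divide first: -3.0e279, in range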
AFAIK IEEE 754 specifies that arithmetic with integers without
decimal places within the width of the mantissa must be EXACT,
i.e. it behaves like integer arithmetic, i.e. without extra
rounding.
The IEEE extended float uses a 15 bit exponent, while it is only 11 bits
for IEEE double precision. This is why you are not seeing a problem
above. However, it will show up once your arithmetic
overflows/underflows the extended float exponent range.
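For concreteness, the standard limits are:
  binary64 (11-bit exponent):        max finite ~ 1.80e308
  80-bit extended (15-bit exponent): max finite ~ 1.19e4932
so an intermediate like 1.0e300 * -3.0e259 = -3.0e559 overflows a
double but fits easily in an extended float.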
On 5/20/24 00:11, minforth wrote:
[..]
$1FFFFFFFFFFFFF S>F FCONSTANT MANTMAX
t{ mantmax $ffff s>f fdup f*/ -> mantmax }t
What did go wrong?
Good question. I can reproduce it here.
hex
ok
1FFFFFFFFFFFFF S>F FCONSTANT MANTMAX
ok
decimal
ok
MANTMAX dp-components.
significand = 9007199254740991 exponent = 52 sign = 0 ok
hex
ok
MANTMAX FFFF S>F FDUP F*/
ok
decimal dp-components.
significand = 9007199254740990 exponent = 52 sign = 0 ok
The lowest bit of the significand is different. I need to look into
why the lowest fraction bit is off.
It looks like you are truncating the result and not rounding to
nearest, which is what is expected of F* and F/.
You save the remainder but do not use it for anything!
That's why I was curious if you had a specific range / exponents
in mind. Is it possible to formulate which combinations of x,y,z
do NOT give a result that fits a double-precision float, and can NOT
be fixed with x,z,y?
E.g., when x is very large, y/z must be near 1, and therefore there
would be no overflow problem when first doing y/z and then x*(y/z).
Or maybe it is possible to use 'rational numbers' and simply not
evaluate fractions until the very end?
Basically we have a multiplication of three factors: x, y, and 1/z;
and the assumption is that the product of all three is in range, while
the product of some pair of the factors is out of range.
It seems to me that this can be solved by sorting the three factors
into a>b>c. Then you can avoid the intermediate overflow by
performing the computation as (a*c)*b.
- anton
Or maybe it is possible to use 'rational numbers' and simply not
evaluate fractions until the very end?
Rational numbers in programming (e.g., in Prolog II) are just
represented as fractions of unlimited-size integers. I don't think
that you have that in mind.
What you may have in mind and solves the problem: From the numbers x,
y, and z extract the exponents xe, ye, ze, and replace them with 0 in
the numbers, resulting in xm, ym, and zm. Now compute rm=xm*ym/zm,
and re=xe+ye-ze, and finally add re to the exponent of rm, giving the
result r.
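A minimal sketch of this scheme in Forth, assuming non-standard words
FREXP ( F: r -- rm ) ( -- e ) and FSCALBN ( F: r -- r' ) ( e -- );
since FREXP-style mantissas lie in [0.5,1), every intermediate stays
near 1. Peter Fälth posts essentially this word later in the thread:

: f*/-split ( F: x y z -- r )
  frexp               \ F: x y zm           D: ze
  fswap frexp         \ F: x zm ym          D: ze ye
  fswap f/            \ F: x ym/zm
  fswap frexp         \ F: ym/zm xm         D: ze ye xe
  f*                  \ F: rm = xm*(ym/zm)
  + swap -            \ D: xe+ye-ze
  fscalbn ;           \ F: r = rm * 2^(xe+ye-ze)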
Anton Ertl wrote:
[..]
It seems to me that this can be solved by sorting the three factors
into a>b>c. Then you can avoid the intermediate overflow by
performing the computation as (a*c)*b.
Elegant! This F3SORT, enhanced with Albert's comment further down,
looks like a useful and cheap factor in case I ever encounter a
problem with intermediate FP overflow.
[..]
Or maybe it is possible to use 'rational numbers' and simply not
evaluate fractions until the very end?
Rational numbers in programming (e.g., in Prolog II) are just
represented as fractions of unlimited-size integers. I don't think
that you have that in mind.
For some reason, MATLAB / Octave uses 'characters' for this extension
(https://nl.mathworks.com/help/matlab/ref/rat.html). Infinite precision
creates problems where a rational approximation does not exist.
[ BTW: here Octave acts a bit surprisingly (but strictly correctly):
octave:1> R=rat(pi,1e-20)
R = 3 + 1/(7 + 1/(16 + 1/(-294 + 1/(3 + 1/(-4 + 1/(5 + 1/(-15 + 1/(-3))))))))
octave:2> R=rat(pi,1e-32)
R = 3 + 1/(7 + 1/(16 + 1/(-294 + 1/(3 + 1/(-4 + 1/(5 + 1/(-15 + 1/(-3))))))))
]
What you may have in mind and solves the problem: From the numbers x,
y, and z extract the exponents xe, ye, ze, and replace them with 0 in
the numbers, resulting in xm, ym, and zm. Now compute rm=xm*ym/zm,
and re=xe+ye-ze, and finally add re to the exponent of rm, giving the
result r.
That was indeed my first idea when I saw the OP's post. DNW wrote a
nice toolkit to dissect FP numbers. I should check whether it uncovers
the problem encountered in this thread.
It looks like you are truncating the result and not rounding to
nearest, which is what is expected of F* and F/.
You save the remainder but do not use it for anything!
I think you're correct, but the rounding used by F*/ should follow
whatever rounding mode the FPU is set to. In kForth, the FPU is always
initialized to round to nearest. Since we are doing the calculation
for F*/ in integer arithmetic, it is not observing the
round-to-nearest mode of the FPU.
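For illustration, a minimal sketch of using the remainder to round an
integer quotient to nearest (ties round up here; true
round-to-nearest-even needs an extra parity check on the quotient):

: /round ( n d -- q )   \ n,d > 0; quotient rounded to nearest
  dup >r /mod           \ -- rem quot  (R: d)
  swap 2* r> < 0=       \ -- quot flag, flag: 2*rem >= d
  if 1+ then ;

7 2 /round . \ 4 (3.5 rounds up)
7 3 /round . \ 2 (2.33 rounds down)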
On 5/21/24 04:03, mhx wrote:...
Anton Ertl wrote:
[..]
It seems to me that this can be solved by sorting the three factors
into a>b>c. Then you can avoid the intermediate overflow by
performing the computation as (a*c)*b.
Remember that you will also have to deal with IEEE 754 special values
like Inf and NaN.
It will be interesting to compare the efficiency of
both my approach and your sorting approach. I'm skeptical that the
additional sorting will make the equivalent calculation faster.
Krishna Myneni wrote:
[..]
Manual cross-check with quad precision floats:
MinForth 3.6 (64 bit) (fp 128 bit)
# $1FFFFFFFFFFFFF S>F FCONSTANT MANTMAX ok
# mantmax ok
f: 9.0072e+15 | # $FFFF s>f ok
f: 9.0072e+15 65535 | # f* ok
f: 5.90287e+20 | # fdup f>d ok
f: 5.90287e+20 | -9007199254806527 31 # hex ok
f: 5.90287e+20 | FFDFFFFFFFFF0001 1F $ decimal 65535e f/ ok
f: 9.0072e+15 | -9007199254806527 31 # 2drop f>s ok
9007199254740991 # hex ok
1FFFFFFFFFFFFF $
So for the 64-bit F*/ challenge, the interim significand after the
multiplication step should be $1F_FFDF_FFFF_FFFF_0001.
It is now easy to check whether the erroneous 1-digit offset was caused
by the multiplication step or the division step.
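On a 64-bit system the multiplication step can also be cross-checked
with the standard word UM*, which returns the exact unsigned
double-cell product:

hex 1FFFFFFFFFFFFF FFFF um* swap u. u. decimal
\ prints FFDFFFFFFFFF0001 1F, i.e. $1F_FFDF_FFFF_FFFF_0001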
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
[..]
Remember that you will also have to deal with IEEE 754 special values
like Inf and NaN.
Not a problem. If any operand is a NaN, the result will be NaN no
matter how the operations are associated. For infinities (and 0 as
divisor), I would analyse it by looking at all cases, but I don't see
that it makes any difference:
Variable names here represent finite non-zero values:
(inf*y)/z=inf/z=inf
inf*(y/z)=inf*finite=inf
y*(inf/z)=y*inf=inf
Likewise if x is finite and y is infinite. For an infinite or zero divisor:
(x*y)/inf=finite/inf=0
x*(y/inf)=x*0=0
y*(x/inf)=y*0=0
(x*y)/0=finite/0=inf
x*(y/0)=x*inf=inf
y*(x/0)=y*inf=inf
Signs in all these cases follow the same rules whether infinities are
involved or not.
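Two of the cases above, written in the style of the earlier tests
(assuming the DP_+INF constant used there):

t{ DP_+INF 2.0e0 1.0e0 f*/ -> DP_+INF }t  \ infinite factor
t{ 2.0e0 3.0e0 0.0e0 f*/ -> DP_+INF }t    \ zero divisor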
It will be interesting to compare the efficiency of
both my approach and your sorting approach. I'm skeptical that the
additional sorting will make the equivalent calculation faster.
Actually sorting is overkill:
: fsort2 ( F: r1 r2 -- r3 r4 )
  \ order by magnitude: |r3| >= |r4|
  fover fabs fover fabs f< if
    fswap
  then ;

: f*/ ( F: r1 r2 r3 -- r )
  fdup fabs 1e f>      \ flag: |r3| > 1
  fswap frot fsort2    \ F: r3 rlarge rsmall
  if fswap then        \ if |r3| > 1, divide the large factor, else the small
  frot f/ f* ;         \ F: ( rx / r3 ) * ry
I have tested this with your tests from
<v2dqte$3ir32$1@dont-email.me>, but needed to change rel-near (I
changed it to 1e-16) for gforth to pass your tests. I leave
performance testing to you. Here's what vfx64 produces for this F*/:
see f*/
F*/
( 0050A310 D9C0 ) FLD ST
( 0050A312 D9E1 ) FABS
( 0050A314 D9E8 ) FLD1
( 0050A316 E8F5BEFFFF ) CALL 00506210 F>
( 0050A31B D9C9 ) FXCH ST(1)
( 0050A31D D9C9 ) FXCH ST(1)
( 0050A31F D9CA ) FXCH ST(2)
( 0050A321 E88AFFFFFF ) CALL 0050A2B0 FSORT2
( 0050A326 4885DB ) TEST RBX, RBX
( 0050A329 488B5D00 ) MOV RBX, [RBP]
( 0050A32D 488D6D08 ) LEA RBP, [RBP+08]
( 0050A331 0F8402000000 ) JZ/E 0050A339
( 0050A337 D9C9 ) FXCH ST(1)
( 0050A339 D9C9 ) FXCH ST(1)
( 0050A33B D9CA ) FXCH ST(2)
( 0050A33D DEF9 ) FDIVP ST(1), ST
( 0050A33F DEC9 ) FMULP ST(1), ST
( 0050A341 C3 ) RET/NEXT
( 50 bytes, 18 instructions )
ok
see fsort2
FSORT2
( 0050A2B0 D9C1 ) FLD ST(1)
( 0050A2B2 D9E1 ) FABS
( 0050A2B4 D9C1 ) FLD ST(1)
( 0050A2B6 D9E1 ) FABS
( 0050A2B8 E863BEFFFF ) CALL 00506120 F<
( 0050A2BD 4885DB ) TEST RBX, RBX
( 0050A2C0 488B5D00 ) MOV RBX, [RBP]
( 0050A2C4 488D6D08 ) LEA RBP, [RBP+08]
( 0050A2C8 0F8402000000 ) JZ/E 0050A2D0
( 0050A2CE D9C9 ) FXCH ST(1)
( 0050A2D0 C3 ) RET/NEXT
( 33 bytes, 11 instructions )
ok
see f<
F<
( 00506120 E86BFEFFFF ) CALL 00505F90 FCMP2
( 00506125 4881FB00010000 ) CMP RBX, # 00000100
( 0050612C 0F94C3 ) SETZ/E BL
( 0050612F F6DB ) NEG BL
( 00506131 480FBEDB ) MOVSX RBX, BL
( 00506135 C3 ) RET/NEXT
( 22 bytes, 6 instructions )
ok
see fcmp2
FCMP2
( 00505F90 4883ED08 ) SUB RBP, # 08
( 00505F94 48895D00 ) MOV [RBP], RBX
( 00505F98 D9C9 ) FXCH ST(1)
( 00505F9A DED9 ) FCOMPP
( 00505F9C 9B ) FWAIT
( 00505F9D DFE0 ) FSTSW AX
( 00505F9F 66250041 ) AND AX, # 4100
( 00505FA3 480FB7D8 ) MOVZX RBX, AX
( 00505FA7 C3 ) RET/NEXT
( 24 bytes, 9 instructions )
On 5/22/24 11:31, Anton Ertl wrote:
[..]
I have tested this with your tests from
<v2dqte$3ir32$1@dont-email.me>, but needed to change rel-near (I
changed it to 1e-16) for gforth to pass your tests. I leave
performance testing to you. ...
I put together a benchmark test. My implementation of F*/ hangs during
the test, so I have some debugging to do on that. Your implementation
above does execute.
--
Krishna
=== begin bench-fstarslash.4th ===
\ bench-fstarslash.4th
\
\ Benchmark implementation of double-precision F*/ using
\ a large sample of triplet dp floating-point numbers,
\ which would overflow/underflow with the F* F/ sequence.
\
Updated benchmark code also allows for validating the arithmetic of F*/
for all of the 1 million test cases. At the moment, all arithmetic cases
use only "normal" fp numbers, i.e. they do not include zero, infinity,
or subnormal numbers -- special tests are needed for such cases.
...
On 5/28/24 06:11, Krishna Myneni wrote:...
...
Now the benchmark code runs. It shows two things (see below):
1. The execution time is poor compared to Anton's implementation, about
8x slower.
2. Half of the arithmetic results are wrong, with the max relative
error being as high as 2/3.
So, I clearly have some work to do to fix the arithmetic. It was
fortuitous that it passed my test cases, but clearly the test cases in
my original implementation of F*/ were insufficient.
--
Krishna
bench-f*/d
603 ms
ok
check-results
1000000 samples
# tolerance errors: 501374
# sign errors: 0
Max relative error: 0.666667
ok
Krishna Myneni wrote:
Python 3.12.3:[..]
>>> 3.12e306 * 1.1e145 / 2.8e150
inf
Worse.
>>> 3.12e306 * 1.1e2 / 2.8e150
inf
On 5/19/24 16:28, Krishna Myneni wrote:
The analog of */ for single length integers, with intermediate double
length result, can be implemented for floating point arithmetic as well.
[..]
Based on how it handles large integer arithmetic, I half-expected that
double-precision floating point arithmetic overflow/underflow would be
handled automatically by Python. That appears not to be the case:
Python 3.12.3:
>>> 3.12e306 * 1.1e145 / 2.8e150
inf
Using F*/ in Forth:
3.12e306 1.1e145 2.8e150 f*/ fs.
1.22571428571429e+301 ok
Krishna Myneni wrote:
[..]
Of course, in Python you could use parentheses around the division, or
load bigfloat or float128 packages, but...
MicroPython v1.19.1 on 2022-11-05; win32 [GCC 6.3.0] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> 3.12e306 * ( 1.1e145 / 2.8e150 )
1.225714285714286e+301
>>> 3.12e-306*(1.1e150/2.8e-159)
inf
[..]
So for the 64-bit F*/ challenge, the interim significand after the
multiplication step should be $1F_FFDF_FFFF_FFFF_0001.
It is now easy to check whether the erroneous 1-digit offset was caused
by the multiplication step or the division step.
Super!
In a dreamy moment I began to wonder whether these methods would also
be suitable for optimising the standard Forth operator M*/ (with its
triple-wide intermediate result after multiplication).
AFAIR there are very different implementation approaches between
different Forth systems. But presumably M*/ is not used by anyone...
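For reference, standard M*/ ( d1 n1 +n2 -- d2 ) already does exactly
this for integers, e.g. scaling a double-cell number by the ratio
355/113 (~ pi):

1000000. 355 113 m*/ d. \ prints 3141592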
On 6/2/24 07:11, Krishna Myneni wrote:
Using F*/ in Forth:
3.12e306 1.1e145 2.8e150 f*/ fs.
1.22571428571429e+301 ok
The alternative way to calculate this without overflow in Forth is to
add/subtract the logarithms, then take the antilog. However, let's
check the accuracy of this approach compared to the above:
3.12e306 flog 1.1e145 flog f+ 2.8e150 flog f- falog fs.
1.225714285714187e+301 ok
The base 10 significand of the result is
3.12e 1.1e f* 2.8e f/ fs.
1.225714285714286e+00 ok
The result from using F*/ to the same number of decimal digits is
3.12e306 1.1e145 2.8e150 f*/ fs. \ Anton's version of F*/
1.225714285714286e+301 ok
The relative error of the logarithm approach compared with F*/ is
1.225714285714286e+301 1.225714285714187e+301 f- 1.225714285714286e+301
f/ f.
8.07495e-14 ok
We appear to be losing about 3 decimal digits of accuracy with the
logarithm approach.
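That loss is roughly what error propagation predicts: an absolute
error delta in log10(r) becomes a relative error of about
ln(10)*delta in r. Here log10(r) ~ 301, and rounding at double
precision gives delta ~ 301 * 2.2e-16, so:

301e 2.2e-16 f* 10e fln f* fs. \ ~1.5e-13, the order of the observed error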
When I implemented floating point for LXF64 I decided to use a C math library,
fdlibm. As they were available I implemented frexp and scalbn.
For implementing F*/ they finally became useful!
\ F*/ using frexp and fscalbn
\ frexp ( F: f -- mant ) ( -- exp )  split f into mantissa and integer exponent
\ fscalbn ( F: f -- f1 ) ( exp -- )  scale by integer exponent: f1 = f*2^exp
: F*/ ( F: f1 f2 f3 -- f1*f2/f3 )
  frexp fswap frexp fswap f/   \ F: f1 m2/m3        D: e3 e2
  fswap frexp f*               \ F: m1*(m2/m3)      D: e3 e2 e1
  + swap -                     \ D: e1+e2-e3
  fscalbn ;                    \ F: result scaled by 2^(e1+e2-e3)
This is somewhat similar to your approach, but it keeps the mantissa
as a floating-point number. The benefit is that subnormals, INFs, and
NaNs are handled automatically.
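A quick sanity check against the test that failed earlier in the
thread: with f2 = f3, the mantissa division m2/m3 is exactly 1, so the
result should come back exact:

$1FFFFFFFFFFFFF s>f fconstant mantmax
mantmax $FFFF s>f fdup f*/ mantmax f= . \ -1 (true) expected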
Just for fun: I put Peter Fälth's F*/ to good use in one of my locals
test programs. By pure coincidence ;-) the test program is a modified
version of Krishna Myneni's calculation program sph_bes_neu.fth for
spherical Bessel and Neumann functions.
NB: The test program deliberately makes heavy use of locals and of an
embedded function that accesses an upvalue. It should not be taken as
an indication of my normal coding style. ;-)
\ ====== SPH_BES_NEU.MF =============================================
\
\ Spherical Bessel and Neumann functions for large index range
\
\ Reference
\ E. Gillman and H. R. Fiebig, "Accurate recursive generation
\ of spherical Bessel and Neumann functions for a large range
\ of indices," Computers in Physics vol. 2, p. 62 (1988).
\ Original FORTRAN to Forth conversion by Krishna Myneni
\ Adapted to MinForth 3.6 syntax (as locals test):
1000 CONSTANT MAXL MAXL VECTOR RBES MAXL VECTOR RNEU
: SPHFUNCS { f: x -- } \ fill vectors RBES and RNEU
\ --- Initialize
x fsqr x fcos x fsin { f: xx cx sx }
1e RBES{ maxl }! 1e RBES{ maxl 1- }!
cx RNEU{ 1 }! x sx f* cx f+ RNEU{ 2 }!
\ --- Recursion loop
<: LFUN ( l -- r3 ) xx sqr 4 * 1- s>f f/ ;>
maxl 1- 1 { lu lv }
maxl 1+ 3 DO
RBES{ lu } lu lfun RBES{ lu 1+ } f* f- RBES{ lu 1- }!
RNEU{ lv 1+ } lv lfun RNEU{ lv } f* f- RNEU{ lv 2+ }!
-1 += lu 1 += lv
LOOP
\ --- Scale
RBES{ 1 } RNEU{ 2 } f* xx cx f* RBES{ 2 } 3e f*/ f- 1/f *to RBES
\ --- Normalize
x 1/f -1e { f: cu cv | w }
maxl 1+ 1 DO
i 1- 2* s>f := w
cu x w 1e f+ f*/ := cu cu RBES{ i }*
cv 1e w f- x f*/ := cv cv RNEU{ i }*
LOOP ;
\ Test: 1e SPHFUNCS RBES m. RNEU m.
.. I became curious and reordered factors in the main loop
to use f*/ also there:
<: LFUN ( l -- r3 ) sqr 4 * 1- s>f ;>
maxl 1- 1 { n: lu lv }
maxl 1+ 3 DO
RBES( lu ) RBES( lu 1+ ) xx lu lfun f*/ f- RBES( lu 1- )!
RNEU( lv 1+ ) RNEU( lv ) xx lv lfun f*/ f- RNEU( lv 2+ )!
-1 += lu 1 += lv
LOOP
Precision indeed improved by about one decimal digit, in other words
close to the maximal achievable double fp-number resolution. Higher
precision would require calculation with extended fp-numbers.