The analog of */ for single length integers, with intermediate double
length result, can be implemented for floating point arithmetic as
well. The following implementation of F*/ uses the non-standard word,
UD/MOD, although it might be possible to code it with UM/MOD as well.
It also requires FP@ (requiring a separate fp stack in memory). The
code below has been tested on kForth-64 and Gforth.
=== start of fstarslash.4th ===
\ fstarslash.4th
\
\ Implementation of F*/ on 64-bit Forth
\
\ Krishna Myneni, 2024-05-19
\
\ F*/ ( F: r1 r2 r3 -- r )
\ Multiply two IEEE-754 double precision floating point
\ numbers r1 and r2 and divide by r3 to obtain the result,
\ r. The intermediate product, r1*r2, is represented
\ with at least IEEE quad precision for both its
\ significand and exponent.
\
\ F*/ uses an intermediate significand represented by
\ 128 bits
\
\ Tests
1 [IF]
[UNDEFINED] T{ [IF] include ttester.4th [THEN]
1e-17 rel-near f!
1e-17 abs-near f!
TESTING F*/
\ These results differ from those obtained with F* and F/
t{ 1.0e300 -3.0e259 1.0e280 f*/ -> -3.0e279 }t
t{ 1.0e-300 3.0e-259 1.0e-280 f*/ -> 3.0e-279 }t
t{ 1.0e300 -3.0e259 -1.0e280 f*/ -> 3.0e279 }t
t{ -1.0e300 -3.0e259 -1.0e280 f*/ -> -3.0e279 }t
t{ 1.0e302 -3.0e302 DP_+INF f*/ -> -0.0e0 }t
t{ -1.0e-302 -1.0e-302 0.0e0 f*/ -> DP_+INF }t
\ The following should give same results as using F* and F/
t{ 0.0e0 3.0e302 1.0e-100 f*/ -> 0.0e0 }t
t{ 3.0e302 0.0e0 1.0e-100 f*/ -> 0.0e0 }t
t{ 1.0e302 -3.0e259 0.0e0 f*/ -> DP_-INF }t
t{ DP_+INF 3.0e300 1.0e0 f*/ -> DP_+INF }t
t{ DP_+INF 3.0e-300 -1.0e0 f*/ -> DP_-INF }t
[THEN]
Nice work! This failed here:
$FFFFFFFFFFFFF S>F FCONSTANT MANTMAX \ full 52 bit mantissa
t{ mantmax 65535e fdup f*/ -> mantmax }T
But perhaps it is just me not thinking straight anymore...
Krishna Myneni wrote:
\ The following should give same results as using F* and F/
t{ 0.0e0 3.0e302 1.0e-100 f*/ -> 0.0e0 }t
t{ 3.0e302 0.0e0 1.0e-100 f*/ -> 0.0e0 }t
t{ 1.0e302 -3.0e259 0.0e0 f*/ -> DP_-INF }t
t{ DP_+INF 3.0e300 1.0e0 f*/ -> DP_+INF }t
t{ DP_+INF 3.0e-300 -1.0e0 f*/ -> DP_-INF }t
What is the stack diagram of FP@ ?
With IEEE extended floats in iForth:
FORTH> 1.0e300 -3.0e259 F* 1.0e280 F/ e. -3.000000e279 ok
FORTH> 1.0e-300 3.0e-259 F* 1.0e-280 F/ e. 3.000000e-279 ok
FORTH> 1.0e302 -3.0e302 F* +INF F/ e. -0.000000e0 ok
FORTH> -1.0e-302 -1.0e-302 F* 0.0e0 F/ e. +INF ok
etc.
Of course, this won't always work. What is the exponent range
that should be supported?
Of course, you considered F/* instead of F*/. What is your
problem with that?
-marcel
Groetjes Albert
Okay, thanks, I should have looked it up. I was too tired yesterday...
Now even when the mantissa can still digest another bit, this still
fails:
$1FFFFFFFFFFFFF S>F FCONSTANT MANTMAX
t{ mantmax $ffff s>f fdup f*/ -> mantmax }t
What did go wrong?
The test is about expanding the mantissa to more than 53 bits by multiplication
and shrinking it back by division. The intermediate 128-bit format
should allow it.
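In exact integer terms, the test demands (writing $FFFF as 2^16-1):
(2^53-1) * (2^16-1) = 2^69 - 2^53 - 2^16 + 1 = $1FFFDFFFFFFFFF0001,
a 69-bit product, and dividing that by $FFFF must give back exactly
2^53-1 with remainder 0, comfortably within a 128-bit intermediate.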
On 5/20/24 00:20, mhx wrote:...
Of course, you considered F/* instead of F*/. What is your
problem with that?
The issue is that you have three fp numbers on the fp stack, from some
source. If you can write a word which automatically selects between
doing the sequence F/ then F*, or the sequence F* then F/, so that the
intermediate quotient or product does not underflow or overflow, and
which produces a result that fits into the d.p. range, you will have
effectively written F*/ (except for maybe a corner case or two).
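For example, with plain double-precision F* and F/ the evaluation
order decides whether the intermediate blows up (a Gforth-style
session; the exact spelling of infinity in the output varies between
systems):

1.0e300 -3.0e259 f* 1.0e280 f/ fs. \ product overflows first: -inf
1.0e300 1.0e280 f/ -3.0e259 f* fs. \ divide first: -3.0e279, in range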
AFAIK IEEE 754 specifies that arithmetic with integers without
decimal places within the width of the mantissa must be EXACT,
i.e. it behaves like integer arithmetic, i.e. without extra
rounding.
The IEEE extended float uses a 15 bit exponent, while it is only 11 bits
for IEEE double precision. This is why you are not seeing a problem
above. However, it will show up once your arithmetic
overflows/underflows the extended float exponent range.
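For concreteness, the standard limits are:
  binary64 (11-bit exponent):        max finite ~ 1.80e308
  80-bit extended (15-bit exponent): max finite ~ 1.19e4932
so an intermediate like 1.0e300 * -3.0e259 = -3.0e559 overflows a
double but fits easily in an extended float.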
On 5/20/24 00:11, minforth wrote:
[..]
$1FFFFFFFFFFFFF S>F FCONSTANT MANTMAX
t{ mantmax $ffff s>f fdup f*/ -> mantmax }t
What did go wrong?
Good question. I can reproduce it here.
hex
ok
1FFFFFFFFFFFFF S>F FCONSTANT MANTMAX
ok
decimal
ok
MANTMAX dp-components.
significand = 9007199254740991 exponent = 52 sign = 0 ok
hex
ok
MANTMAX FFFF S>F FDUP F*/
ok
decimal dp-components.
significand = 9007199254740990 exponent = 52 sign = 0 ok
The lowest bit of the significand is different. I need to look into
why the lowest fraction bit is off.
It looks like you are truncating the result and not rounding to
nearest, which is what is expected of F* and F/.
You save the remainder but do not use it for anything!
That's why I was curious if you had a specific range / exponents
in mind. Is it possible to formulate which combinations of x,y,z
do NOT give a result that fits a double-precision float, and can NOT
be fixed with x,z,y?
E.g., when x is very large, y/z must be near 1, and therefore there
would be no overflow problem when first doing y/z and then x*(y/z).
Or maybe it is possible to use 'rational numbers' and simply not
evaluate fractions until the very end?
Basically we have a multiplication of three factors: x, y, and 1/z;
and the assumption is that the product of all three is in range, while
the product of some pair of the factors is out of range.
It seems to me that this can be solved by sorting the three factors
into a>b>c. Then you can avoid the intermediate overflow by
performing the computation as (a*c)*b.
- anton
Or maybe it is possible to use 'rational numbers' and simply not
evaluate fractions until the very end?
Rational numbers in programming (e.g., in Prolog II) are just
represented as fractions of unlimited-size integers. I don't think
that you have that in mind.
What you may have in mind and solves the problem: From the numbers x,
y, and z extract the exponents xe, ye, ze, and replace them with 0 in
the numbers, resulting in xm, ym, and zm. Now compute rm=xm*ym/zm,
and re=xe+ye-ze, and finally add re to the exponent of rm, giving the
result r.
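A minimal sketch of this scheme in Forth, assuming non-standard words
FREXP ( F: r -- rm ) ( -- e ) and FSCALBN ( F: r -- r' ) ( e -- );
since FREXP-style mantissas lie in [0.5,1), every intermediate stays
near 1. Peter Fälth posts essentially this word later in the thread:

: f*/-split ( F: x y z -- r )
  frexp               \ F: x y zm           D: ze
  fswap frexp         \ F: x zm ym          D: ze ye
  fswap f/            \ F: x ym/zm
  fswap frexp         \ F: ym/zm xm         D: ze ye xe
  f*                  \ F: rm = xm*(ym/zm)
  + swap -            \ D: xe+ye-ze
  fscalbn ;           \ F: r = rm * 2^(xe+ye-ze)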
Anton Ertl wrote:
[..]
It seems to me that this can be solved by sorting the three factors
into a>b>c. Then you can avoid the intermediate overflow by
performing the computation as (a*c)*b.
Elegant! This F3SORT, enhanced with Albert's comment further down,
looks like a useful and cheap factor in case I ever encounter a
problem with intermediate FP overflow.
[..]
Or maybe it is possible to use 'rational numbers' and simply not
evaluate fractions until the very end?
Rational numbers in programming (e.g., in Prolog II) are just
represented as fractions of unlimited-size integers. I don't think
that you have that in mind.
For some reason, MATLAB / Octave uses 'characters' for this extension
(https://nl.mathworks.com/help/matlab/ref/rat.html). Infinite precision
creates problems where a rational approximation does not exist.
[ BTW: here Octave acts a bit surprisingly (but strictly correctly):
octave:1> R=rat(pi,1e-20)
R = 3 + 1/(7 + 1/(16 + 1/(-294 + 1/(3 + 1/(-4 + 1/(5 + 1/(-15 + 1/(-3))))))))
octave:2> R=rat(pi,1e-32)
R = 3 + 1/(7 + 1/(16 + 1/(-294 + 1/(3 + 1/(-4 + 1/(5 + 1/(-15 + 1/(-3))))))))
]
What you may have in mind and solves the problem: From the numbers x,
y, and z extract the exponents xe, ye, ze, and replace them with 0 in
the numbers, resulting in xm, ym, and zm. Now compute rm=xm*ym/zm,
and re=xe+ye-ze, and finally add re to the exponent of rm, giving the
result r.
That was indeed my first idea when I saw the OP's post. DNW wrote a
nice toolkit to dissect FP numbers. I should check whether it uncovers
the problem encountered in this thread.
It looks like you are truncating the result and not rounding to
nearest, which is what is expected of F* and F/.
You save the remainder but do not use it for anything!
I think you're correct, but the rounding used by F*/ should follow
whatever rounding mode the FPU is set to. In kForth, the FPU is always
initialized to round to nearest. Since we are doing the calculation
for F*/ in integer arithmetic, it is not observing the
round-to-nearest mode of the FPU.
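For illustration, a minimal sketch of using the remainder to round an
integer quotient to nearest (ties round up here; true
round-to-nearest-even needs an extra parity check on the quotient):

: /round ( n d -- q )   \ n,d > 0; quotient rounded to nearest
  dup >r /mod           \ -- rem quot  (R: d)
  swap 2* r> < 0=       \ -- quot flag, flag: 2*rem >= d
  if 1+ then ;

7 2 /round . \ 4 (3.5 rounds up)
7 3 /round . \ 2 (2.33 rounds down)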
On 5/21/24 04:03, mhx wrote:...
Anton Ertl wrote:
[..]
It seems to me that this can be solved by sorting the three factors
into a>b>c. Then you can avoid the intermediate overflow by
performing the computation as (a*c)*b.
Remember that you will also have to deal with IEEE 754 special values
like Inf and NaN.
It will be interesting to compare the efficiency of
both my approach and your sorting approach. I'm skeptical that the
additional sorting will make the equivalent calculation faster.
Krishna Myneni wrote:
[..]
Manual cross-check with quad precision floats:
MinForth 3.6 (64 bit) (fp 128 bit)
# $1FFFFFFFFFFFFF S>F FCONSTANT MANTMAX ok
# mantmax ok
f: 9.0072e+15 | # $FFFF s>f ok
f: 9.0072e+15 65535 | # f* ok
f: 5.90287e+20 | # fdup f>d ok
f: 5.90287e+20 | -9007199254806527 31 # hex ok
f: 5.90287e+20 | FFDFFFFFFFFF0001 1F $ decimal 65535e f/ ok
f: 9.0072e+15 | -9007199254806527 31 # 2drop f>s ok
9007199254740991 # hex ok
1FFFFFFFFFFFFF $
So for the 64-bit F*/ challenge, the interim significand after the
multiplication step should be $1F_FFDF_FFFF_FFFF_0001.
It is now easy to check whether the erroneous 1-digit offset was caused
by the multiplication step or the division step.
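On a 64-bit system the multiplication step can also be cross-checked
with the standard word UM*, which returns the exact unsigned
double-cell product:

hex 1FFFFFFFFFFFFF FFFF um* swap u. u. decimal
\ prints FFDFFFFFFFFF0001 1F, i.e. $1F_FFDF_FFFF_FFFF_0001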
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
[..]
Remember that you will also have to deal with IEEE 754 special values
like Inf and NaN.
Not a problem. If any operand is a NaN, the result will be NaN no
matter how the operations are associated. For infinities (and 0 as
divisor), I would analyse it by looking at all cases, but I don't see
that it makes any difference:
Variable names here represent finite non-zero values:
(inf*y)/z=inf/z=inf
inf*(y/z)=inf*finite=inf
y*(inf/z)=y*inf=inf
Likewise if x is finite and y is infinite. For an infinite or zero divisor:
(x*y)/inf=finite/inf=0
x*(y/inf)=x*0=0
y*(x/inf)=y*0=0
(x*y)/0=finite/0=inf
x*(y/0)=x*inf=inf
y*(x/0)=y*inf=inf
Signs in all these cases follow the same rules whether infinities are
involved or not.
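Two of the cases above, written in the style of the earlier tests
(assuming the DP_+INF constant used there):

t{ DP_+INF 2.0e0 1.0e0 f*/ -> DP_+INF }t  \ infinite factor
t{ 2.0e0 3.0e0 0.0e0 f*/ -> DP_+INF }t    \ zero divisor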
It will be interesting to compare the efficiency of
both my approach and your sorting approach. I'm skeptical that the
additional sorting will make the equivalent calculation faster.
Actually sorting is overkill:
: fsort2 ( F: r1 r2 -- r3 r4 )
  \ order by magnitude: |r3| >= |r4|
  fover fabs fover fabs f< if
    fswap
  then ;

: f*/ ( F: r1 r2 r3 -- r )
  fdup fabs 1e f>      \ flag: |r3| > 1
  fswap frot fsort2    \ F: r3 rlarge rsmall
  if fswap then        \ if |r3| > 1, divide the large factor, else the small
  frot f/ f* ;         \ F: ( rx / r3 ) * ry
I have tested this with your tests from
<v2dqte$3ir32$1@dont-email.me>, but needed to change rel-near (I
changed it to 1e-16) for gforth to pass your tests. I leave
performance testing to you. Here's what vfx64 produces for this F*/:
see f*/
F*/
( 0050A310 D9C0 ) FLD ST
( 0050A312 D9E1 ) FABS
( 0050A314 D9E8 ) FLD1
( 0050A316 E8F5BEFFFF ) CALL 00506210 F>
( 0050A31B D9C9 ) FXCH ST(1)
( 0050A31D D9C9 ) FXCH ST(1)
( 0050A31F D9CA ) FXCH ST(2)
( 0050A321 E88AFFFFFF ) CALL 0050A2B0 FSORT2
( 0050A326 4885DB ) TEST RBX, RBX
( 0050A329 488B5D00 ) MOV RBX, [RBP]
( 0050A32D 488D6D08 ) LEA RBP, [RBP+08]
( 0050A331 0F8402000000 ) JZ/E 0050A339
( 0050A337 D9C9 ) FXCH ST(1)
( 0050A339 D9C9 ) FXCH ST(1)
( 0050A33B D9CA ) FXCH ST(2)
( 0050A33D DEF9 ) FDIVP ST(1), ST
( 0050A33F DEC9 ) FMULP ST(1), ST
( 0050A341 C3 ) RET/NEXT
( 50 bytes, 18 instructions )
ok
see fsort2
FSORT2
( 0050A2B0 D9C1 ) FLD ST(1)
( 0050A2B2 D9E1 ) FABS
( 0050A2B4 D9C1 ) FLD ST(1)
( 0050A2B6 D9E1 ) FABS
( 0050A2B8 E863BEFFFF ) CALL 00506120 F<
( 0050A2BD 4885DB ) TEST RBX, RBX
( 0050A2C0 488B5D00 ) MOV RBX, [RBP]
( 0050A2C4 488D6D08 ) LEA RBP, [RBP+08]
( 0050A2C8 0F8402000000 ) JZ/E 0050A2D0
( 0050A2CE D9C9 ) FXCH ST(1)
( 0050A2D0 C3 ) RET/NEXT
( 33 bytes, 11 instructions )
ok
see f<
F<
( 00506120 E86BFEFFFF ) CALL 00505F90 FCMP2
( 00506125 4881FB00010000 ) CMP RBX, # 00000100
( 0050612C 0F94C3 ) SETZ/E BL
( 0050612F F6DB ) NEG BL
( 00506131 480FBEDB ) MOVSX RBX, BL
( 00506135 C3 ) RET/NEXT
( 22 bytes, 6 instructions )
ok
see fcmp2
FCMP2
( 00505F90 4883ED08 ) SUB RBP, # 08
( 00505F94 48895D00 ) MOV [RBP], RBX
( 00505F98 D9C9 ) FXCH ST(1)
( 00505F9A DED9 ) FCOMPP
( 00505F9C 9B ) FWAIT
( 00505F9D DFE0 ) FSTSW AX
( 00505F9F 66250041 ) AND AX, # 4100
( 00505FA3 480FB7D8 ) MOVZX RBX, AX
( 00505FA7 C3 ) RET/NEXT
( 24 bytes, 9 instructions )
On 5/22/24 11:31, Anton Ertl wrote:
[..]
I have tested this with your tests from
<v2dqte$3ir32$1@dont-email.me>, but needed to change rel-near (I
changed it to 1e-16) for gforth to pass your tests. I leave
performance testing to you. ...
I put together a benchmark test. My implementation of F*/ hangs during
the test, so I have some debugging to do on that. Your implementation
above does execute.
--
Krishna
=== begin bench-fstarslash.4th ===
\ bench-fstarslash.4th
\
\ Benchmark implementation of double-precision F*/ using
\ a large sample of triplet dp floating-point numbers,
\ which would overflow/underflow with the F* F/ sequence.
\
Updated benchmark code also allows for validating the arithmetic of F*/
for all of the 1 million test cases. At the moment, all arithmetic cases
use only "normal" fp numbers, i.e. they do not include zero, infinity,
or subnormal numbers -- special tests are needed for such cases.
...
On 5/28/24 06:11, Krishna Myneni wrote:...
...
Now the benchmark code runs. It shows two things (see below):
1. The execution time is poor compared to Anton's implementation, about
8x slower.
2. Half of the arithmetic results are wrong, with the max relative
error being as high as 2/3.
So, I clearly have some work to do to fix the arithmetic. It was
fortuitous that it passed my test cases, but clearly the test cases in
my original implementation of F*/ were insufficient.
--
Krishna
bench-f*/d
603 ms
ok
check-results
1000000 samples
# tolerance errors: 501374
# sign errors: 0
Max relative error: 0.666667
ok
Krishna Myneni wrote:
Python 3.12.3:[..]
>>> 3.12e306 * 1.1e145 / 2.8e150
inf
Worse.
>>> 3.12e306 * 1.1e2 / 2.8e150
inf
On 5/19/24 16:28, Krishna Myneni wrote:
The analog of */ for single length integers, with intermediate double
length result, can be implemented for floating point arithmetic as well.
[..]
Based on how it handles large integer arithmetic, I half-expected that
double-precision floating point arithmetic overflow/underflow would be
handled automatically by Python. That appears not to be the case:
Python 3.12.3:
>>> 3.12e306 * 1.1e145 / 2.8e150
inf
Using F*/ in Forth:
3.12e306 1.1e145 2.8e150 f*/ fs.
1.22571428571429e+301 ok
Krishna Myneni wrote:
[..]
Of course, in Python you could use parentheses around the division, or
load bigfloat or float128 packages, but...
MicroPython v1.19.1 on 2022-11-05; win32 [GCC 6.3.0] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> 3.12e306 * ( 1.1e145 / 2.8e150 )
1.225714285714286e+301
>>> 3.12e-306*(1.1e150/2.8e-159)
inf
[..]
So for the 64-bit F*/ challenge, the interim significand after the
multiplication step should be $1F_FFDF_FFFF_FFFF_0001.
It is now easy to check whether the erroneous 1-digit offset was caused
by the multiplication step or the division step.
Super!
In a dreamy moment I began to wonder whether these methods would also
be suitable for optimising the standard Forth operator M*/ (with its
triple-wide intermediate result after multiplication).
AFAIR there are very different implementation approaches between
different Forth systems. But presumably M*/ is not used by anyone...
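For reference, standard M*/ ( d1 n1 +n2 -- d2 ) already does exactly
this for integers, e.g. scaling a double-cell number by the ratio
355/113 (~ pi):

1000000. 355 113 m*/ d. \ prints 3141592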
On 6/2/24 07:11, Krishna Myneni wrote:
Using F*/ in Forth:
3.12e306 1.1e145 2.8e150 f*/ fs.
1.22571428571429e+301 ok
The alternative way to calculate this without overflow in Forth is to
add/subtract the logarithms, then take the antilog. However, let's
check the accuracy of this approach compared to the above:
3.12e306 flog 1.1e145 flog f+ 2.8e150 flog f- falog fs.
1.225714285714187e+301 ok
The base 10 significand of the result is
3.12e 1.1e f* 2.8e f/ fs.
1.225714285714286e+00 ok
The result from using F*/ to the same number of decimal digits is
3.12e306 1.1e145 2.8e150 f*/ fs. \ Anton's version of F*/
1.225714285714286e+301 ok
The relative error of the logarithm approach compared with F*/ is
1.225714285714286e+301 1.225714285714187e+301 f- 1.225714285714286e+301
f/ f.
8.07495e-14 ok
We appear to be losing about 3 decimal digits of accuracy with the
logarithm approach.
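That loss is roughly what error propagation predicts: an absolute
error delta in log10(r) becomes a relative error of about
ln(10)*delta in r. Here log10(r) ~ 301, and rounding at double
precision gives delta ~ 301 * 2.2e-16, so:

301e 2.2e-16 f* 10e fln f* fs. \ ~1.5e-13, the order of the observed error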
When I implemented floating point for LXF64 I decided to use a C math library,
fdlibm. As they were available I implemented frexp and scalbn.
For implementing F*/ they finally became useful!
\ F*/ using frexp and fscalbn
\ frexp ( F: f -- mant ) ( -- exp )  split f into mantissa and integer exponent
\ fscalbn ( F: f -- f1 ) ( exp -- )  scale by integer exponent: f1 = f*2^exp
: F*/ ( F: f1 f2 f3 -- f1*f2/f3 )
  frexp fswap frexp fswap f/   \ F: f1 m2/m3        D: e3 e2
  fswap frexp f*               \ F: m1*(m2/m3)      D: e3 e2 e1
  + swap -                     \ D: e1+e2-e3
  fscalbn ;                    \ F: result scaled by 2^(e1+e2-e3)
This is somewhat similar to your approach, but it keeps the mantissa
as a floating-point number. The benefit is that subnormals, INFs, and
NaNs are handled automatically.
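A quick sanity check against the test that failed earlier in the
thread: with f2 = f3, the mantissa division m2/m3 is exactly 1, so the
result should come back exact:

$1FFFFFFFFFFFFF s>f fconstant mantmax
mantmax $FFFF s>f fdup f*/ mantmax f= . \ -1 (true) expected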
Just for fun: I put Peter Fälth's F*/ to good use in one of my locals
test programs. By pure coincidence ;-) the test program is a modified
version of Krishna Myneni's calculation program sph_bes_neu.fth for
spherical Bessel and Neumann functions.
NB: The test program deliberately makes heavy use of locals and of an
embedded function that accesses an upvalue. It should not be taken as
an indication of my normal coding style. ;-)
\ ====== SPH_BES_NEU.MF =============================================
\
\ Spherical Bessel and Neumann functions for large index range
\
\ Reference
\ E. Gillman and H. R. Fiebig, "Accurate recursive generation
\ of spherical Bessel and Neumann functions for a large range
\ of indices," Computers in Physics vol. 2, p. 62 (1988).
\ Original FORTRAN to Forth conversion by Krishna Myneni
\ Adapted to MinForth 3.6 syntax (as locals test):
1000 CONSTANT MAXL MAXL VECTOR RBES MAXL VECTOR RNEU
: SPHFUNCS { f: x -- } \ fill vectors RBES and RNEU
\ --- Initialize
x fsqr x fcos x fsin { f: xx cx sx }
1e RBES{ maxl }! 1e RBES{ maxl 1- }!
cx RNEU{ 1 }! x sx f* cx f+ RNEU{ 2 }!
\ --- Recursion loop
<: LFUN ( l -- r3 ) xx sqr 4 * 1- s>f f/ ;>
maxl 1- 1 { lu lv }
maxl 1+ 3 DO
RBES{ lu } lu lfun RBES{ lu 1+ } f* f- RBES{ lu 1- }!
RNEU{ lv 1+ } lv lfun RNEU{ lv } f* f- RNEU{ lv 2+ }!
-1 += lu 1 += lv
LOOP
\ --- Scale
RBES{ 1 } RNEU{ 2 } f* xx cx f* RBES{ 2 } 3e f*/ f- 1/f *to RBES
\ --- Normalize
x 1/f -1e { f: cu cv | w }
maxl 1+ 1 DO
i 1- 2* s>f := w
cu x w 1e f+ f*/ := cu cu RBES{ i }*
cv 1e w f- x f*/ := cv cv RNEU{ i }*
LOOP ;
\ Test: 1e SPHFUNCS RBES m. RNEU m.
.. I became curious and reordered factors in the main loop
to use f*/ also there:
<: LFUN ( l -- r3 ) sqr 4 * 1- s>f ;>
maxl 1- 1 { n: lu lv }
maxl 1+ 3 DO
RBES( lu ) RBES( lu 1+ ) xx lu lfun f*/ f- RBES( lu 1- )!
RNEU( lv 1+ ) RNEU( lv ) xx lv lfun f*/ f- RNEU( lv 2+ )!
-1 += lu 1 += lv
LOOP
Precision indeed improved by about one decimal digit, in other words
close to the maximal achievable double fp-number resolution. Higher
precision would require calculation with extended fp-numbers.