On Thu, 22 Feb 2024 20:16:25 -0000 (UTC), Steven G. Kargl wrote:
Apparently, you missed the part about argument reduction. For
sinpi(x), it is fairly easy to reduce x = n + r with n an integer
and r in [0,1). For the extended interval, x in [0,2^23], there are
roughly 2^23 values with r = 0.5; and thus, sinpi(x) = 1 exactly.
There are no such values for sin(x), and argument reduction for
sin(x) is much more involved.
You are working with approximations anyway. That those approximations
happen to exactly equal some random value seems irrelevant.
As to real-world use, how about any physical phenomenon where one is
interested in the resonance frequencies of the system. For a simple
example see https://www.feynmanlectures.caltech.edu/I_49.html where
one might write f(x) = sin(kx) = sin(pi * (2*x/L)) with L the length
of, say, a clamped string.
I don’t see a formula anything like that anywhere on that page. On
the other hand, I see lots of use of “ω” symbols, representing frequencies in radians per second.
There are also uses with computing other functions, e.g., the true
gamma function via the Euler reflection formula.
gamma(x) * gamma(1 - x) = pi / sin(pi * x) = pi / sinpi(x)
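As a quick numerical sanity check of that identity, a minimal sketch in C,
using sin(M_PI*x) in place of a true sinpi() since standard C does not
provide one (M_PI is from POSIX <math.h>, not strict ISO C):

#include <math.h>
#include <stdio.h>

/* Spot-check of the reflection formula at x = 0.25:
   gamma(1/4) * gamma(3/4) should equal pi / sin(pi/4) = pi * sqrt(2). */
int main(void)
{
    double x = 0.25;
    double lhs = tgamma(x) * tgamma(1.0 - x);
    double rhs = M_PI / sin(M_PI * x);   /* stand-in for pi / sinpi(x) */
    printf("%.17g\n%.17g\n", lhs, rhs);
    return 0;
}

Both lines print approximately 4.4428829381583662, i.e. pi*sqrt(2).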
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the dozens
or hundreds of examples of the usefulness of radians as an angle unit?
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Thu, 22 Feb 2024 20:16:25 -0000 (UTC), Steven G. Kargl wrote:
Apparently, you missed the part about argument reduction. For
sinpi(x), it is fairly easy to reduce x = n + r with n an integer
and r in [0,1).
For the extended interval, x in [0,2^23], there are
roughly 2^23 values with r = 0.5; and thus, sinpi(x) = 1 exactly.
There are no such values for sin(x), and argument reduction for
sin(x) is much more involved.
You are working with approximations anyway. That those approximations
happen to exactly equal some random value seems irrelevant.
As to real-world use, how about any physical phenomena where one is
interested in the resonance frequencies of the system. For a simple
example see https://www.feynmanlectures.caltech.edu/I_49.html where
one might write f(x) = sin(kx) = sin(pi * (2*x/L)) with L a length
of say a clamped string.
I don’t see a formula anything like that anywhere on that page. On
the other hand, I see lots of use of “ω” symbols, representing
frequencies in radians per second.
There are also uses with computing other functions, e.g., the true
gamma function via the Euler reflection formula.
gamma(x) * gamma(1 - x) = pi / sin(pi * x) = pi / sinpi(x)
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the dozens
or hundreds of examples of the usefulness of radians as an angle unit?
In digital signal processing circle-based units are pretty much always
more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most obvious
choice.
From [binary floating point] numerical properties perspective, 1/8th of
the circle (==pi/4 radians = 45 degrees) is probably the best option
for a library routine, because for sin() its derivative at 0 is closest
to 1 among all powers of two which means that loss of precision
near 0 is very similar for input and for output. But this advantage
does not sound like a particularly big deal.
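A small worked check of that claim: with an angle unit of 1/2^k of a
circle, sin becomes sin(2*pi*x/2^k), whose slope at 0 is 2*pi/2^k; among
the powers of two, k = 3 (1/8th of a circle) gives the slope closest to 1.

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* slope of sin at 0 for angle units of 1/2^k of a circle */
    for (int k = 0; k <= 5; k++)
        printf("1/%-2d circle: slope at 0 = %.6f\n",
               1 << k, 2.0 * acos(-1.0) / (1 << k));
    return 0;
}

The k = 3 line prints 0.785398 (pi/4), the closest of these slopes to 1.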
Michael S wrote:
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Thu, 22 Feb 2024 20:16:25 -0000 (UTC), Steven G. Kargl wrote:
Apparently, you missed the part about argument reduction. For
sinpi(x), it is fairly easy to reduce x = n + r with n an integer
and r in [0,1).
You are better off with r in [-½,+½]
For the extended interval, x in [0,2^23], there are
roughly 2^23 values with r = 0.5; and thus, sinpi(x) = 1 exactly.
There are no such values for sin(x), and argument reduction for
sin(x) is much more involved.
I have a patent on doing argument reduction for sin(x) in 4 cycles...that
is as good as Payne-Hanek argument reduction over -infinity..+infinity
In digital signal processing circle-based units are pretty much always
more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most obvious
choice.
From [binary floating point] numerical properties perspective, 1/8th of
the circle (==pi/4 radians = 45 degrees) is probably the best option
for a library routine, because for sin() its derivative at 0 is closest
to 1 among all powers of two which means that loss of precision
near 0 is very similar for input and for output. But this advantage
does not sound like particularly big deal.
I should remind that my patents include methods for calculating sin()
and cos() in 18 cycles, sinpi() and cospi() in 14 cycles while achieving
an accuracy in the 0.5002 bits of error. All of this is in HW, and all
carried out in precision wider than IEEE 754 with a bunch of HW tricks
not available to SW.
At this performance level, let the algorithms wanting pi-based trigonometry have it and those wanting unit-based trigonometry have it too. Let programmers use what is easiest for them.
On 2/22/2024 4:49 PM, Chris M. Thomasson wrote:
[...] Use some form of arbitrary precision if you want to get to:
A 10^4141 mset zoom!? It must have taken many hours to render... ;^)
https://youtu.be/Xjy_HSUujaw
A 10^275 zoom:
https://youtu.be/0jGaio87u3A
On Thu, 22 Feb 2024 22:57:20 +0000, MitchAlsup1 wrote:
Michael S wrote:
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Thu, 22 Feb 2024 20:16:25 -0000 (UTC), Steven G. Kargl wrote:
Apparently, you missed the part about argument reduction. For
sinpi(x), it is fairly easy to reduce x = n + r with n an integer
and r in [0,1).
You are better off with r in [-½,+½]
Partial implementation details for single precision C code (trying
to keep this on topic for c.l.c) are at https://cgit.freebsd.org/src/tree/lib/msun/src/s_sinpif.c
One uses sinpi(-|x|) = -sinpi(|x|), which naturally gets one to
r = x - floor(x) in [0,1). One then uses kernels for sin() and
cos() to do the actual computation.
r in [0,0.25) is ksin(r).
r in [0.25,0.5) is kcos(0.5-r).
r in [0.5,0.75) is kcos(r - 0.5).
r in [0.75,1) is ksin(1 - r).
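A minimal sketch of that dispatch, using the ksin()/kcos() names from the
description above as placeholders (they are not the actual FreeBSD kernel
names) and assuming a finite, nonnegative x. One detail the table leaves
implicit: since sin(pi*(n+r)) = (-1)^n * sin(pi*r), the parity of
n = floor(x) still determines the final sign.

#include <math.h>

double ksin(double r);   /* placeholder kernel: ~ sin(pi*r) on [0,0.25] */
double kcos(double r);   /* placeholder kernel: ~ cos(pi*r) on [0,0.25] */

/* sketch only: no handling of negative, huge, NaN, or subnormal x */
double sinpi_sketch(double x)
{
    double n = floor(x);
    double r = x - n;                  /* r in [0,1) */
    double s;

    if      (r < 0.25) s = ksin(r);
    else if (r < 0.50) s = kcos(0.5 - r);
    else if (r < 0.75) s = kcos(r - 0.5);
    else               s = ksin(1.0 - r);

    return fmod(n, 2.0) != 0.0 ? -s : s;   /* (-1)^n factor */
}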
For the extended interval, x in [0,2^23], there are
roughly 2^23 values with r = 0.5; and thus, sinpi(x) = 1 exactly.
There are no such values for sin(x), and argument reduction for
sin(x) is much more involved.
I have a patent on doing argument reduction for sin(x) in 4 cycles...that
is as good as Payne-Hanek argument reduction over -infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the
top of stack of papers on my desk. Still need to internalize
the details.
(much trimmed for brevity)
In digital signal processing circle-based units are pretty much always
more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most obvious
choice.
From [binary floating point] numerical properties perspective, 1/8th of
the circle (==pi/4 radians = 45 degrees) is probably the best option
for a library routine, because for sin() its derivative at 0 is closest
to 1 among all powers of two which means that loss of precision
near 0 is very similar for input and for output. But this advantage
does not sound like particularly big deal.
I should remind that my patents include methods for calculating sin()
and cos() in 18 cycles, sinpi() and cospi() in 14 cycles while achieving
an accuracy in the 0.5002 bits of error. All of this is in HW, and all
carried out in precision wider than IEEE 754 with a bunch of HW tricks
not available to SW.
At this performance level, let the algorithms wanting pi-based trigonometry have it and those wanting unit-based trigonometry have it too. Let programmers use what is easiest for them.
Agreed a programmer should use what is required by the problem
that they are solving. I'll note that SW implementations have
their sets of tricks (e.g., use of double-double arithmetic to
achieve double precision).
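For readers unfamiliar with the double-double trick mentioned here, a
minimal sketch of its two building blocks (error-free transformations);
this is illustrative only, not code taken from the FreeBSD sources:

#include <math.h>

/* Each returns the rounded result and stores the exact rounding error in
   *err, so a value can be carried as an unevaluated (hi, lo) pair with
   roughly twice the working precision. */
static double two_sum(double a, double b, double *err)
{
    double s = a + b;
    double t = s - a;
    *err = (a - (s - t)) + (b - t);
    return s;
}

static double two_prod(double a, double b, double *err)
{
    double p = a * b;
    *err = fma(a, b, -p);      /* exact with a fused multiply-add */
    return p;
}

Chains of such pairs are how a final double result can be rounded from an
intermediate that is effectively wider than 53 bits.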
Steven G. Kargl wrote:
Agreed a programmer should use what is required by the problem
that they are solving. I'll note that SW implementations have
their sets of tricks (e.g., use of double-double arithmetic to
achieve double precision).
To get near IEEE desired precision, one HAS TO use more than 754 precision.
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the dozens
or hundreds of examples of the usefulness of radians as an angle unit?
In digital signal processing circle-based units are pretty much always
more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most obvious choice.
From [binary floating point] numerical properties perspective, 1/8th of
the circle (==pi/4 radians = 45 degrees) is probably the best option
for a library routine, because for sin() its derivative at 0 is closest
to 1 among all powers of two which means that loss of precision
near 0 is very similar for input and for output. But this advantage
does not sound like particularly big deal.
Michael S wrote:
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the
dozens or hundreds of examples of the usefulness of radians as an
angle unit?
In digital signal processing circle-based units are pretty much
always more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most
obvious choice.
From [binary floating point] numerical properties perspective,
1/8th of the circle (==pi/4 radians = 45 degrees) is probably the
best option for a library routine, because for sin() its derivative
at 0 is closest to 1 among all powers of two which means that loss
of precision near 0 is very similar for input and for output. But
this advantage does not sound like particularly big deal.
ieee754 defines sinpi() and siblings, but imho it really doesn't
matter if you use circles, half-circles (i.e. sinpi) or some other
binary fraction of a circle: Argument reduction for huge inputs are
just as easy, you might just have to multiply by the corresponding
power of two (i.e. adjust the exponent) before extracting the
fractional term.
For sinpi(x) I could do it like this:
if (abs(x) >= two_to_52nd_power) error("Zero significant bits.");
ix = int(x);
x_reduced = x - (double) (ix & ~1);
if (x_reduced < 0.0) x_reduced += 2.0;
but it is probably better to return a value in the [-1.0 .. 1.0>
range?
Terje
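A runnable rendering of that reduction, as a sketch: since sinpi has
period 2 and is odd, it is enough to fold |x| into [0,2) (which the
subtraction below does exactly for |x| < 2^52) and carry the sign; above
2^52 every double is an integer, so sinpi(x) is simply zero and this
version returns 0 rather than raising the error() above.

#include <math.h>

static double sinpi_reduce(double x)     /* finite x; returns r with
                                            sinpi(r) == sinpi(x), |r| < 2 */
{
    double ax = fabs(x);
    if (ax >= 0x1p52)
        return 0.0;                      /* x is an integer: sinpi(x) == 0 */
    double r = ax - 2.0 * floor(ax / 2.0);   /* exact, r in [0,2) */
    return copysign(r, x);               /* odd symmetry restores the sign */
}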
On Thu, 22 Feb 2024 22:57:20 +0000, MitchAlsup1 wrote:
I have a patent on doing argument reduction for sin(x) in 4
cycles...that is as good as Payne-Hanek argument reduction over -infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the
top of stack of papers on my desk. Still need to internalize
the details.
On Fri, 23 Feb 2024 00:13:02 -0000 (UTC)
"Steven G. Kargl" <sgk@REMOVEtroutmask.apl.washington.edu> wrote:
On Thu, 22 Feb 2024 22:57:20 +0000, MitchAlsup1 wrote:
I have a patent on doing argument reduction for sin(x) in 4
cycles...that is as good as Payne-Hanek argument reduction over
-infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the
top of stack of papers on my desk. Still need to internalize
the details.
Payne-Hanek is a right answer to a wrong question.
In science/engineering a floating-point number x represents a range [x-ULP:x+ULP], with hopefully much higher probability of being in the range [x-ULP/2:x+ULP/2]. However, within this reduced range the probability is flat:
an "exact" x is no worse and no better than any other point within this range.
Which means that when we look for y=sin(x) then for
scientific/engineering purposes any answer in range* [sin(x-ULP/2):sin(x+ULP/2)] is as "right" as any other answer. The
right answer to the question "What is a sin() of IEEE-754 binary64
number 1e17?" is "Anything in range [-1:1]". The wise and useful answer
to the same question is "You're holding it wrong".
* In this paragraph Sin(x) designates the result calculated with infinite
precision.
Michael S wrote:
On Fri, 23 Feb 2024 00:13:02 -0000 (UTC)
"Steven G. Kargl" <sgk@REMOVEtroutmask.apl.washington.edu> wrote:
On Thu, 22 Feb 2024 22:57:20 +0000, MitchAlsup1 wrote:
I have a patent on doing argument reduction for sin(x) in 4
cycles...that is as good as Payne-Hanek argument reduction over
-infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the top of
stack of papers on my desk. Still need to internalize the details.
Payne-Hanek is a right answer to a wrong question.
If someone is looking for fast sin(x) you are correct; for a person looking
for the best possible (correctly rounded) result, you are not. In any
event I have driven that monstrosity down from 50-odd cycles to 4--
making it at least palatable.
In science/engineering floating-point number x represents a range
[x-ULP:x+ULP] with hopefully much higher probability of being in range
[x-ULP/2:x+ULP/2]. However in this reduced range probability is flat,
an "exact" x is no worse and no better than any other point within this
range.
Which means that when we look for y=sin(x) then for
scientific/engineering purposes any answer in range*
[sin(x-ULP/2):sin(x+ULP/2)] is as "right" as any other answer. The
right answer to the question "What is a sin() of IEEE-754 binary64
number 1e17?" is "Anything in range [-1:1]". The wise and useful answer
to the same question is "You're holding it wrong".
There are verification tests that know the exact value of both x and
sin(x).
For these uses your argument is not valid--however I am willing to grant
that outside of verification, and in actual use, your argument holds as
well as the noise in the data provides.
I find it interesting that sinpi() has worse numeric error than sin().
On Fri, 23 Feb 2024 20:02:00 +0000, MitchAlsup1 wrote:
Michael S wrote:
On Fri, 23 Feb 2024 00:13:02 -0000 (UTC)
"Steven G. Kargl" <sgk@REMOVEtroutmask.apl.washington.edu> wrote:
On Thu, 22 Feb 2024 22:57:20 +0000, MitchAlsup1 wrote:
I have a patent on doing argument reduction for sin(x) in 4
cycles...that is as good as Payne-Hanek argument reduction over
-infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the top of
stack of papers on my desk. Still need to internalize the details.
Payne-Hanek is a right answer to a wrong question.
If someone is looking for fast sin(x) you are correct; for a person looking
for the best possible (correctly rounded) result, you are not. In any
event I have driven that monstrosity down from 50-odd cycles to 4--
making it at least palatable.
In science/engineering floating-point number x represents a range
[x-ULP:x+ULP] with hopefully much higher probability of being in range
[x-ULP/2:x+ULP/2]. However in this reduced range probability is flat,
an "exact" x is no worse and no better than any other point within this
range.
Which means that when we look for y=sin(x) then for
scientific/engineering purposes any answer in range*
[sin(x-ULP/2):sin(x+ULP/2)] is as "right" as any other answer. The
right answer to the question "What is a sin() of IEEE-754 binary64
number 1e17?" is "Anything in range [-1:1]". The wise and useful answer
to the same question is "You're holding it wrong".
There are verification tests that know the exact value of both x and
sin(x).
For these uses your argument is not valid--however I am willing to grant
that outside of verification, and in actual use, your argument holds as
well as the noise in the data provides.
Fortunately, for 32-bit single precision floating point, one can
exhaustively test single-argument functions. For the SW implementation
on FreeBSD, exhaustive testing shows
% tlibm sinpi -fPED -x 0 -X 0x1p23 -s 0
Interval tested for sinpif: [0,8.38861e+06]
1000000 calls, 0.015453 secs, 0.01545 usecs/call
ulp <= 0.5: 99.842% 1253631604 | 99.842% 1253631604
0.5 < ulp < 0.6: 0.158% 1989420 | 100.000% 1255621024
Max ulp: 0.583666 at 6.07950642e-05
% tlibm sin -fPED -x 0 -X 0x1p23 -s 0
Interval tested for sinf: [0,8.38861e+06]
1000000 calls, 0.019520 secs, 0.01952 usecs/call
ulp <= 0.5: 99.995% 1249842491 | 99.995% 1249842491
0.5 < ulp < 0.6: 0.005% 60102 | 100.000% 1249902593
Max ulp: 0.500900 at 3.68277281e+05
The speed test is for 1M values evenly distributed over
the interval. The difference in speed for sinpi vs sin
is due to the argument reduction.
Note 1: libm built with clang/llvm with only -O2 on a
AMD Phenom II X2 560 Processor (3300.14-MHz K8-class CPU).
Note 2: subnormal values of x are not included in
the count, as subnormals are tested with a different approach.
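For concreteness, a sketch of what such an exhaustive walk looks like;
tlibm compares against MPFR, but here plain double sin() stands in as the
reference, so this can expose large errors without certifying sub-0.5-ulp
results:

#include <math.h>
#include <stdio.h>

int main(void)
{
    const float hi = 0x1p23f;
    double worst = 0.0;
    float  worst_x = 0.0f;

    for (float x = 0.0f; ; x = nextafterf(x, INFINITY)) {
        double ref = sin((double)x);          /* stand-in reference */
        double err = fabs((double)sinf(x) - ref);
        /* size of one ulp of the float nearest the reference value */
        double ulp = nextafterf((float)ref, INFINITY) - (float)ref;
        if (ulp > 0.0 && err / ulp > worst) {
            worst   = err / ulp;
            worst_x = x;
        }
        if (x == hi)
            break;
    }
    printf("worst observed error ~ %.3f ulp at x = %.9g\n", worst, worst_x);
    return 0;
}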
Steven G. Kargl wrote:
On Fri, 23 Feb 2024 20:02:00 +0000, MitchAlsup1 wrote:
Michael S wrote:
On Fri, 23 Feb 2024 00:13:02 -0000 (UTC)
"Steven G. Kargl" <sgk@REMOVEtroutmask.apl.washington.edu> wrote:
On Thu, 22 Feb 2024 22:57:20 +0000, MitchAlsup1 wrote:
I have a patent on doing argument reduction for sin(x) in 4
cycles...that is as good as Payne-Hanek argument reduction over
-infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the top of
stack of papers on my desk. Still need to internalize the details.
Payne-Hanek is a right answer to a wrong question.
If someone is looking for fast sin(x) you are correct; for a person
looking for the best possible (correctly rounded) result, you are not.
In any event I have driven that monstrosity down from 50-odd cycles to
4-- making it at least palatable.
In science/engineering floating-point number x represents a range
[x-ULP:x+ULP] with hopefully much higher probability of being in
range [x-ULP/2:x+ULP/2]. However in this reduced range probability is
flat, an "exact" x is no worse and no better than any other point
within this range.
Which means that when we look for y=sin(x) then for
scientific/engineering purposes any answer in range*
[sin(x-ULP/2):sin(x+ULP/2)] is as "right" as any other answer. The
right answer to the question "What is a sin() of IEEE-754 binary64
number 1e17?" is "Anything in range [-1:1]". The wise and useful
answer to the same question is "You're holding it wrong".
There are verification tests that know the exact value of both x and
sin(x).
For these uses your argument is not valid--however I am willing to
grant that outside of verification, and in actual use, your argument
holds as well as the noise in the data provides.
Fortunately, for 32-bit single precision floating point, one can
All of the numerics I have spoken about are DP (64-bit) values.
exhaustively test single-argument functions. For the SW implementation
on FreeBSD, exhaustive testing shows
% tlibm sinpi -fPED -x 0 -X 0x1p23 -s 0
Interval tested for sinpif: [0,8.38861e+06]
1000000 calls, 0.015453 secs, 0.01545 usecs/call
ulp <= 0.5: 99.842% 1253631604 | 99.842% 1253631604
0.5 < ulp < 0.6: 0.158% 1989420 | 100.000% 1255621024
Max ulp: 0.583666 at 6.07950642e-05
% tlibm sin -fPED -x 0 -X 0x1p23 -s 0
Interval tested for sinf: [0,8.38861e+06]
1000000 calls, 0.019520 secs, 0.01952 usecs/call
ulp <= 0.5: 99.995% 1249842491 | 99.995% 1249842491
0.5 < ulp < 0.6: 0.005% 60102 | 100.000% 1249902593
Max ulp: 0.500900 at 3.68277281e+05
I find it interesting that sinpi() has worse numeric error than sin().
I also find it interesting that sinpi()'s highest error is in the easy part, polynomial evaluation without argument reduction, whereas sin() has its
worst result in a region where significant argument reduction has
transpired.
{{Seems like another case of faster poor answers topping slower correct answers.}}
The speed test is for 1M values evenly distributed over the interval.
The difference in speed for sinpi vs sin is due to the argument
reduction.
But result precision suffers nonetheless.
On Fri, 23 Feb 2024 22:29:17 +0000, MitchAlsup1 wrote:
I find it interesting that sinpi() has worse numeric error than sin().
Another argument for sticking with the simpler solution? One set of radian-specific functions plus conversion factors, instead of a separate
set of unit-specific functions for every unit?
I can assure you that one of the libm and mpfr values, when x = 5653.00049,
is quite wrong.
On Sat, 24 Feb 2024 04:03:08 -0000 (UTC), Steven G. Kargl wrote:
I can assure one of the libm and mpfr values, when x = 5653.00049,
is quite wrong.
That’s 9 figures, but single-precision IEEE754 floats only allow about
7.
% ./tlibm sinpi -f -a 5.65300049e+03
x = 5.65300049e+03f, /* 0x45b0a801 */
libm = -2.51050433e-03f, /* 0xbb248746 */
mpfr = -1.53398013e-03f, /* 0xbac90fd5 */
ULP = 8388278.50000
I can assure one of the libm and mpfr values, when x = 5653.00049,
is quite wrong.
On Sat, 24 Feb 2024 04:47:29 +0000, Lawrence D'Oliveiro wrote:
On Sat, 24 Feb 2024 04:03:08 -0000 (UTC), Steven G. Kargl wrote:
I can assure one of the libm and mpfr values, when x = 5653.00049,
is quite wrong.
That’s 9 figures, but single-precision IEEE754 floats only allow about
7.
ROTFL. I suggest (re-)reading the entire section
for binary32, Pmin (binary32) = 9
On Sat, 24 Feb 2024 04:03:08 -0000 (UTC), Steven G. Kargl wrote:
% ./tlibm sinpi -f -a 5.65300049e+03
x = 5.65300049e+03f, /* 0x45b0a801 */
libm = -2.51050433e-03f, /* 0xbb248746 */
mpfr = -1.53398013e-03f, /* 0xbac90fd5 */
ULP = 8388278.50000
I can assure one of the libm and mpfr values, when x = 5653.00049,
is quite wrong.
They’re all wrong. Python says (in double precision) that math.sin(5653.00049) is -0.9566595256716378.
Even GCC’s sinf function says the value is -0.9565167, which is only a slight discrepancy in the fourth decimal place, so even in
single-precision it is doing a darn sight better than you have managed above.
So where is there a version of the code you’re using, using different
angle units, that gets closer to the right answer?
On Fri, 23 Feb 2024 11:01:02 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
Michael S wrote:
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the
dozens or hundreds of examples of the usefulness of radians as an
angle unit?
In digital signal processing circle-based units are pretty much
always more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most
obvious choice.
From [binary floating point] numerical properties perspective,
1/8th of the circle (==pi/4 radians = 45 degrees) is probably the
best option for a library routine, because for sin() its derivative
at 0 is closest to 1 among all powers of two which means that loss
of precision near 0 is very similar for input and for output. But
this advantage does not sound like particularly big deal.
ieee754 defines sinpi() and siblings, but imho it really doesn't
matter if you use circles, half-circles (i.e. sinpi) or some other
binary fraction of a circle: Argument reduction for huge inputs are
just as easy, you might just have to multiply by the corresponding
power of two (i.e. adjust the exponent) before extracting the
fractional term.
For sinpi(x) I could do it like this:
if (abs(x) >= two_to_52nd_power) error("Zero significant bits.");
ix = int(x);
x_reduced = x - (double) (ix & ~1);
if (x_reduced < 0.0) x_reduced += 2.0;
but it is probably better to return a value in the [-1.0 .. 1.0>
range?
Terje
Both you and Mitch look at it from wrong perspective.
When we define a library API, an ease of implementation of the library function should be pretty low on our priority scale. As long as
reasonably precise calculation is theoretically possible, we should
give credit to intelligence of implementor, that's all.
The real concern of API designer should be with avoidance of loss of precision in preparation of inputs and use of outputs.
In specific case of y=sin2pi(x), it is x that is more problematic,
because near 0 it starts to lose precision 3 octaves before y. In
subnormal range we lose ~2.5 bits of precision in preparation of the argument. An implementation, no matter how good, can't recover what's
already lost.
sinpi() is slightly better, but only slightly, not enough better to
justify less natural semantics.
My suggestion yesterday was a step in the right direction, but today I think
that it is not a sufficiently radical step. In the specific case of sin/cos
there is no good reason to match loss of precision on input with loss of precision on output.
<Thinking out loud>
There are advantages in matched loss of precision for input and for
output when both input and output occupy the full range of real numbers
(ignoring sign). The output of sin/cos does not occupy a full range, but
for tan() it does. So maybe there is one good reason for matching
input with output for sin/cos - consistency with tan().
</Thinking out loud>
So, ignoring tan(), what is really an optimal input scaling for sin/cos
inputs? Today I think that it is a scaling in which a full circle
corresponds to 2**64. With such a scaling you never lose any input
precision to subnormals before the precision of the result is lost
completely.
Now, one could ask "Why 2**64, why not 2**56 that has the same
property?". My answer is "Because 2**64 is a scaling that is most
convenient for preparation of trig arguments in fixed-point [on a 64-bit computer]." I.e. apart from being a good scaling for avoidance of loss
of precision in tiny range, it happens to be the best scaling for interoperability with fixed point.
That is my answer today. Will it hold tomorrow? Tomorrow will tell.
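For illustration, a sketch of what that scaling looks like in practice
(the names and interfaces here are invented for the example, and it
assumes 0 <= freq < sample rate): a uint64_t phase accumulator holds the
angle as a fraction of a full circle, so wrap-around is free, and
conversion to the half-turn argument a sinpi()-style routine wants is a
single scaling.

#include <stdint.h>

typedef uint64_t phase_t;                 /* 2^64 == one full circle */

/* turns per sample, scaled to 2^64 (assumes 0 <= freq_hz < sample_rate_hz) */
static phase_t phase_step(double freq_hz, double sample_rate_hz)
{
    return (phase_t)(freq_hz / sample_rate_hz * 0x1p64);
}

/* convert the accumulator to signed half-turns in [-1,1), i.e. the
   argument a sinpi()-style routine would take */
static double phase_to_halfturns(phase_t p)
{
    double h = (double)p * 0x1p-63;       /* [0,2) half-turns */
    return (h >= 1.0) ? h - 2.0 : h;
}

An oscillator then just does p += phase_step(f, fs) each sample; the
accumulator never loses low-order phase bits the way an ever-growing
radian argument would.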
Fortunately, for 32-bit single precision floating point, one can
exhaustively test single-argument functions. For the SW implementation
on FreeBSD, exhaustive testing shows
% tlibm sinpi -fPED -x 0 -X 0x1p23 -s 0
Interval tested for sinpif: [0,8.38861e+06]
1000000 calls, 0.015453 secs, 0.01545 usecs/call
ulp <= 0.5: 99.842% 1253631604 | 99.842% 1253631604
0.5 < ulp < 0.6: 0.158% 1989420 | 100.000% 1255621024
Max ulp: 0.583666 at 6.07950642e-05
% tlibm sin -fPED -x 0 -X 0x1p23 -s 0
Interval tested for sinf: [0,8.38861e+06]
1000000 calls, 0.019520 secs, 0.01952 usecs/call
ulp <= 0.5: 99.995% 1249842491 | 99.995% 1249842491
0.5 < ulp < 0.6: 0.005% 60102 | 100.000% 1249902593
Max ulp: 0.500900 at 3.68277281e+05
The speed test is for 1M values evenly distributed over
the interval. The difference in speed for sinpi vs sin
is due to the argument reduction.
Note 1: libm built with clang/llvm with only -O2 on a
AMD Phenom II X2 560 Processor (3300.14-MHz K8-class CPU).
Note 2: subnormal values of x are not included in
the count, as subnormals are tested with a different approach.
sin(x) for subnormal x seems like it should be trivial: if you just
return x, the maximum error of a Taylor series with two terms
(x - x^3/3!)
would be that second term, which is by definition zero since x is already subnormal, right?
If you want to perform well on a platform where subnormal ops trap to a
much slower hardware or software path, then it might make sense to start
by checking for |x| < 2^-340 or so?
Terje
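A sketch of that shortcut, with the threshold chosen conservatively (for
|x| below about 2^-26 the x^3/6 term is already under half an ulp of x,
so returning x is the correctly rounded answer); the 2^-340-ish test
mentioned above would additionally keep subnormal intermediates off any
slow path:

#include <math.h>

static double sin_small(double x)
{
    if (fabs(x) < 0x1p-27)     /* sin(x) rounds to x here */
        return x;
    return sin(x);             /* defer to the library otherwise */
}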
MitchAlsup1 wrote:
Steven G. Kargl wrote:
Agreed a programmer should use what is required by the problem
that they are solving. I'll note that SW implementations have
their sets of tricks (e.g., use of double-double arithmetic to
achieve double precision).
To get near IEEE desired precision, one HAS TO use more than 754
precision.
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
There is a suggestion on the table to make that a (probably optional
imho) feature for an upcoming ieee754 revision.
Terje
sin(x) is not sinpi(x). The conversion factor that you're missing
is M_PI as in sinpi(x) = sin(M_PI*x).
On Fri, 23 Feb 2024 11:10:00 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
MitchAlsup1 wrote:
Steven G. Kargl wrote:
Agreed a programmer should use what is required by the problem
that they are solving. I'll note that SW implementations have
their sets of tricks (e.g., use of double-double arithmetic to
achieve double precision).
To get near IEEE desired precision, one HAS TO use more than 754
precision.
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
At which cost in table sizes?
There is a suggestion on the table to make that a (probably optional
imho) feature for an upcoming ieee754 revision.
Terje
The critical point here is the definition of what is considered exact. If
'exact' is measured only on the y side of y=foo(x), disregarding
possible imprecision on the x side, then you are very likely to end up
with results that are slower to calculate, but not at all more useful
from the point of view of an engineer or physicist. Exactly like Payne-Hanek
or Mitch's equivalent of Payne-Hanek.
The definition of 'exact' should be:
For any finite-precision function foo(x), let's designate the same
mathematical function calculated with infinite precision as Foo(x).
Let's designate an operation of rounding of infinite-precision number to desired finite precision as Rnd(). Rounding is done in to-nearest mode. Unlike in the case of basic operations, ties are allowed to be broken in
any direction.
The result y=foo(x) for a finite-precision number x is considered
exact if *at least one* of the two conditions is true:
(1=Y-clause) Rnd(Foo(x)) == y
(2=X-clause) There exists an infinite-precision number X for which
both Foo(X) == y and Rnd(X) == x.
As follows from (2), it is possible and not uncommon that more
than one finite-precision number y is accepted as an exact result of foo(x).
If the Committee omits the 2nd clause then the whole proposition will be not
just not useful, but harmful.
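As a concrete reading of the Y-clause, here is a sketch that uses MPFR as
the stand-in for "infinite precision" (256 bits is far more than enough
for a final rounding to double, double-rounding corner cases aside);
checking the X-clause would additionally require searching the preimage
[x - ulp/2, x + ulp/2], which is not attempted here:

#include <math.h>
#include <mpfr.h>

/* Does y == sin(x) satisfy the Y-clause, i.e. y == Rnd(Foo(x))? */
static int y_clause_holds(double x, double y)
{
    mpfr_t t;
    mpfr_init2(t, 256);
    mpfr_set_d(t, x, MPFR_RNDN);             /* x converts exactly */
    mpfr_sin(t, t, MPFR_RNDN);               /* Foo(x) to 256 bits */
    double ref = mpfr_get_d(t, MPFR_RNDN);   /* Rnd(Foo(x)) */
    mpfr_clear(t);
    return y == ref;
}

The clause as stated also allows ties to be broken either way, which a
strict equality test like this does not capture.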
Michael S wrote:
On Fri, 23 Feb 2024 11:01:02 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
Michael S wrote:
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the
dozens or hundreds of examples of the usefulness of radians as an
angle unit?
In digital signal processing circle-based units are pretty much
always more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most
obvious choice.
From [binary floating point] numerical properties perspective,
1/8th of the circle (==pi/4 radians = 45 degrees) is probably the
best option for a library routine, because for sin() its derivative
at 0 is closest to 1 among all powers of two which means that loss
of precision near 0 is very similar for input and for output. But
this advantage does not sound like particularly big deal.
ieee754 defines sinpi() and siblings, but imho it really doesn't
matter if you use circles, half-circles (i.e. sinpi) or some other
binary fraction of a circle: Argument reduction for huge inputs are
just as easy, you might just have to multiply by the corresponding
power of two (i.e. adjust the exponent) before extracting the
fractional term.
For sinpi(x) I could do it like this:
if (abs(x) >= two_to_52nd_power) error("Zero significant bits.");
ix = int(x);
x_reduced = x - (double) (ix & ~1);
if (x_reduced < 0.0) x_reduced += 2.0;
but it is probably better to return a value in the [-1.0 .. 1.0>
range?
Terje
Both you and Mitch look at it from wrong perspective.
When we define a library API, an ease of implementation of the library
function should be pretty low on our priority scale. As long as
reasonably precise calculation is theoretically possible, we should
give credit to intelligence of implementor, that's all.
The real concern of API designer should be with avoidance of loss of
precision in preparation of inputs and use of outputs.
In specific case of y=sin2pi(x), it is x that is more problematic,
because near 0 it starts to lose precision 3 octaves before y. In
subnormal range we lose ~2.5 bits of precision in preparation of the
argument. An implementation, no matter how good, can't recover what's
already lost.
sinpi() is slightly better, but only slightly, not enough better to
justify less natural semantics.
My suggestion yesterday was a step in the right direction, but today I think
that it is not a sufficiently radical step. In the specific case of sin/cos
there is no good reason to match loss of precision on input with loss of
precision on output.
<Thinking out loud>
There are advantages in matched loss of precision for input and for
output when both input and output occupy the full range of real numbers
(ignoring sign). The output of sin/cos does not occupy a full range, but
for tan() it does. So maybe there is one good reason for matching
input with output for sin/cos - consistency with tan().
</Thinking out loud>
So, ignoring tan(), what is really an optimal input scaling for sin/cos
inputs? Today I think that it is a scaling in which a full circle
corresponds to 2**64. With such a scaling you never lose any input
precision to subnormals before the precision of the result is lost
completely.
Now, one could ask "Why 2**64, why not 2**56 that has the same
property?". My answer is "Because 2**64 is a scaling that is most
convenient for preparation of trig arguments in fixed-point [on a 64-bit
computer]." I.e. apart from being a good scaling for avoidance of loss
of precision in tiny range, it happens to be the best scaling for
interoperability with fixed point.
That is my answer today. Will it hold tomorrow? Tomorrow will tell.
This is, except for today being a 64-bit world as opposed to 30 years earlier, exactly the same reasoning Garmin's programmers used to decide
that all their lat/long calculations would use 2^32 as the full circle.
With a signed 32-bit int you get a resolution of 40e6 m / 2^32 = 0.0093 m
or 9.3 mm, which they considered was more than good enough back in the
days of SA and its ~100 m RMS noise, and even after Clinton got rid of
that (May 2 2000 or 2001?), sub-cm GPS is very rarely available.
Doing the same with 64-bit means that you get a resolution of 2.17e-12 m which is 2.17 picometers or 0.0217 Å, so significantly smaller than a
single H atom which is about 1 Å in size.
Terje
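For concreteness, a sketch of that encoding (the function names and exact
format are invented for the example, not taken from Garmin): with 2^32
units per circle, one LSB of latitude is 360/2^32 degrees, i.e. roughly
40e6 m / 2^32 = 9.3 mm of arc.

#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* assumes deg in [-180, 180) so the result fits in an int32_t */
static int32_t deg_to_units32(double deg)
{
    return (int32_t)llround(deg / 360.0 * 4294967296.0);
}

static double units32_to_deg(int32_t u)
{
    return (double)u * (360.0 / 4294967296.0);
}

int main(void)
{
    printf("1 LSB ~ %.4f m of arc\n", 40.0e6 / 4294967296.0);
    int32_t u = deg_to_units32(59.913900);
    printf("59.913900 deg -> %ld -> %.7f deg\n", (long)u, units32_to_deg(u));
    return 0;
}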
Michael S wrote:
(2=X-clause) There exist an infinite precision number X for which
both Foo(X) == y and Rnd(X) == x.
In the second clause:: are we guaranteed that RND(Foo(X))= Y ??
On Thu, 14 Mar 2024 17:34:57 +0000, MitchAlsup1 wrote:
Michael S wrote:
(2=X-clause) There exist an infinite precision number X for which
both Foo(X) == y and Rnd(X) == x.
In the second clause:: are we guaranteed that RND(Foo(X))= Y ??
No idea why that’s relevant. Michael S was talking about “Rnd”, not “RND”.
When you say “RND”, I think of a bad random-number generator found in various implementations of BASIC.
On 3/14/2024 10:28 AM, MitchAlsup1 wrote:
The drift rate of the oscillators in the satellites is such that it will never
be.
The clock drift is reset every orbit.
Also note: hackers are now using ground based GPS transmitters to alter
where your GPS calculates where you think you are. This is most annoying
around airports when planes use GPS to auto guide the planes to runways.
GPS Spoofing?
On Fri, 23 Feb 2024 11:01:02 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
Both you and Mitch look at it from wrong perspective.
When we define a library API, an ease of implementation of the library function should be pretty low on our priority scale. As long as
reasonably precise calculation is theoretically possible, we should
give credit to intelligence of implementor, that's all.
The real concern of API designer should be with avoidance of loss of precision in preparation of inputs and use of outputs.
In specific case of y=sin2pi(x), it is x that is more problematic,
because near 0 it starts to lose precision 3 octaves before y. In
subnormal range we lose ~2.5 bits of precision in preparation of the argument.
An implementation, no matter how good, can't recover what's already lost.
sinpi() is slightly better, but only slightly, not enough better to
justify less natural semantics.
My yesterday suggestion is a step in right direction, but today I think
that it is not sufficiently radical step. In specific case of sin/cos
there is no good reason to match loss of precision on input with loss of precision.
<Thinking out load>
There are advantages in matched loss of precision for input and for
output when both input and output occupy full range of real numbers
(ignoring sign). Output of sin/cos does not occupy o full range. But
for tan() it does. So, may be, there is a one good reason for matching
input with output for sin/cos - consistency with tan.
</Thinking out load>
So, ignoring tan(), what is really an optimal input scaling for sin/cos inputs? Today I think that it is a scaling in which full circle
corresponds to 2**64. With such scaling you never lose any input
precision to subnoramals before precision of the result is lost
completely.
On Fri, 23 Feb 2024 11:10:00 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
MitchAlsup1 wrote:
Steven G. Kargl wrote:
Agreed a programmer should use what is required by the problem
that they are solving. I'll note that SW implementations have
their sets of tricks (e.g., use of double-double arithmetic to
achieve double precision).
To get near IEEE desired precision, one HAS TO use more than 754
precision.
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
At which cost in tables sizes?
There is a suggestion on the table to make that a (probably optional
imho) feature for an upcoming ieee754 revision.
Terje
The critical point here is definition of what considered exact. If
'exact' is measured only on y side of y=foo(x), disregarding
possible imprecision on the x side then you are very likely to end up
with results that are slower to calculate, but not at all more useful
from point of view of engineer or physicist. Exactly like Payne-Hanek
or Mitch's equivalent of Payne-Hanek.
The definition of 'exact' should be:
For any finite-precision function foo(x) lets designate the same
mathematical function calculated with infinite precision as Foo(x).
Let's designate an operation of rounding of infinite-precision number to desired finite precision as Rnd(). Rounding is done in to-nearest mode. Unlike in the case of basic operations, ties are allowed to be broken in
any direction.
The result of y=foo(x) for finite-precision number x considered
exact if *at least one* two conditions is true:
(1=Y-clause) Rnd(Foo(x)) == y
(2=X-clause) There exist an infinite precision number X for which
both Foo(X) == y and Rnd(X) == x.
As follows from the (2), it is possible and not uncommon that more
than one finite-precision number y is accepted exact result of foo(x).
If Committee omits the 2nd clause then the whole proposition will be not
just not useful, but harmful.
Terje Mathisen wrote:
This is, except for today being a 64-bit world as opposed to 30 years
earlier, exactly the same reasoning Garmin's programmers used to
decide that all their lat/long calculations would use 2^32 as the full
circle.
With a signed 32-bit int you get a resolution of 40e6m / 2^32 =
0.0093m or 9.3 mm, which they considered was more than good enough
back in the days of SA and its ~100 RMS noise, and even after Clinton
got rid of that (May 2 2000 or 2001?), sub-cm GPS is very rarely
available.
The drift rate of the oscillators in the satellites is such that it will never
be.
The clock drift is reset every orbit.
Also note: hackers are now using ground based GPS transmitters to alter
where your GPS calculates where you think you are. This is most annoying around airports when planes use GPS to auto guide the planes to runways.
Doing the same with 64-bit means that you get a resolution of 2.17e-12
m which is 2.17 picometers or 0.0217 Å, so significantly smaller than
a single H atom which is about 1 Å in size.
And yet, driving by Edwards AFB sometimes my car's GPS shows me 50 m off the interstate-quality road, and sometimes not.
Michael S wrote:
(2=X-clause) There exist an infinite precision number X for which
both Foo(X) == y and Rnd(X) == x.
In the second clause:: are we guaranteed that RND(Foo(X))= Y ??
As follows from the (2), it is possible and not uncommon that more
than one finite-precision number y is accepted exact result of
foo(x).
If Committee omits the 2nd clause then the whole proposition will
be not just not useful, but harmful.
An interesting side effect of greater intermediate precision is the
lack of need to round prior to the final result. Thus, my sin(x) over
its entire calculation suffers exactly 1 rounding. Payne & Hanek does
not have this property.
And thirdly::
Let us postulate that the reduced argument r is indeed calculated with
0.5 ULP error.
What makes you think you can calculate the polynomial without introducing
any more error ??
Michael, I for the main part agree with you here, i.e. calculating
sin(x) with x larger than 2^53 or so, is almost certainly stupid.
Actually using and depending upon the result is more stupid.
OTOH, it is, and has always been, a core principle of ieee754 that
basic operations (FADD/FSUB/FMUL/FDIV/FSQRT) shall assume that the
inputs are exact (no fractional ulp uncertainty), and that we from
that starting point must deliver a correctly rounded version of the infinitely precise exact result of the operation.
Given the latter, it is in fact very tempting to see if that basic
result rule could be applied to more of the non-core operations, but
I cannot foresee any situation where I would use it myself: If I find
myself in a situation where the final fractional ulp is important,
then I would far rather switch to doing the operation in fp128.
Terje
MitchAlsup1 wrote:
Terje Mathisen wrote:
This is, except for today being a 64-bit world as opposed to 30 years
earlier, exactly the same reasoning Garmin's programmers used to
decide that all their lat/long calculations would use 2^32 as the full
circle.
With a signed 32-bit int you get a resolution of 40e6m / 2^32 =
0.0093m or 9.3 mm, which they considered was more than good enough
back in the days of SA and its ~100 RMS noise, and even after Clinton
got rid of that (May 2 2000 or 2001?), sub-cm GPS is very rarely
available.
The drift rate of the oscillators in the satellites is such that it will never
be.
The clock drift is reset every orbit.
Sub-meter (typically 2 cm at 10-20 Hz) GPS requires a nearby (static)
base station which can measure the individual pseudo-range errors from
each sat and send that to the measuring device (the rover) over a
separate channel.
If you record all the pseudorange values, then you can do the same with
post-processing; this can be useful for DIY surveying.
Also note: hackers are now using ground based GPS transmitters to alter
where your GPS calculates where you think you are. This is most annoying
around airports when planes use GPS to auto guide the planes to runways.
The potential for this attack is one of the reasons the military signal
is encrypted, so that it cannot be spoofed.
Doing the same with 64-bit means that you get a resolution of 2.17e-12
m which is 2.17 picometers or 0.0217 Å, so significantly smaller than
a single H atom which is about 1 Å in size.
And yet, driving by Edwards AFB sometimes my car's GPS shows me 50 m off the interstate-quality road, and sometimes not.
50 m is a bit high, but still fairly typical for what commercial
receivers can do in the vicinity of reflecting surfaces like cliffs or buildings.
Modern multi-system receivers are actually getting much better at
detecting and mitigating such problems. Starting with
GPS+Glonass+Galileo having worldwide coverage, then adding in the
Chinese and Japanese sats means that as long as you don't worry too much
about power usage, you can do quite well. My personal target is sub 3 m
when under a wet forest canopy; that is good enough for orienteering map field survey work.
Terje
I can see how computing sin(x) with high precision for "reasonable"
values of x would be useful, but does any of that benefit from being
able to compute sin(2^53) accurately?
To make it less tempting, you could try to push for inclusion of
rsqrt() into basic set. Long overdue, IMHO.
Right now, I can't think of any other transcendental that I really want
to elevate to higher status. It seems to me that elevation of log2(x)
and of 2**x will do no harm, but I am not sure about usefulness.
Michael S wrote:
To make it less tempting, you could try to push for inclusion of
rsqrt() into basic set. Long overdue, IMHO.
Many Newton-Raphson iterations converge more rapidly with RSQRT than
with SQRT, for reasons I never looked that deeply into.
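For reference, the usual Newton step for rsqrt is division-free, which is
at least part of why it is so convenient compared with the sqrt iteration
y = (y + x/y)/2 that needs a divide every step; a sketch (the seed would
normally come from a hardware estimate instruction or a small table):

/* one refinement of y ~ 1/sqrt(x); each step roughly doubles the number
   of correct bits, using only multiplies and adds */
static double rsqrt_step(double x, double y)
{
    return y * (1.5 - 0.5 * x * y * y);
}

sqrt(x) can then be recovered as x * y, again without a division.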
Right now, I can't think of any other transcendental that I really want
to elevate to higher status. It seems to me that elevation of log2(x)
and of 2**x will do no harm, but I am not sure about usefulness.
Ln2 and EXP2 are the basis of My 66000 log and EXP families, performed in
HW function unit.
Ln2 has the property that a binade [1..2) maps directly to another binade [2..4); exp2 has a similar property. Ln2 is easier to get good accuracy
from than Ln or Log as a starting point. But in any event:: you are
always within a high precision multiply of the correct result = round(53×106);
range, or (simplifying the exp logic) over [0.5 - 2.0> so that the last exp bit maps to the first vs second half of the range. Back before
Keith Thompson wrote:
I can see how computing sin(x) with high precision for "reasonable"
values of x would be useful, but does any of that benefit from being
able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Since early 1990s, when
Payne and Hanek argument reduction became widely available, there is
no reason NOT to perform argument reduction with great accuracy.
NOTE: you don't HAVE to use Payne and Hanek when simpler strategies
work adequately; but when they do not, you do not have to "Throw up
your hands" and run away.
On Sat, 16 Mar 2024 01:16:25 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:
Keith Thompson wrote:
I can see how computing sin(x) with high precision for "reasonable"
values of x would be useful, but does any of that benefit from being
able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Not really.
sin(2**53) is not what the programmer wanted, because 2**53 is not the x he really wanted. It has nothing to do with the calculation of sin().
Since early 1990s, when
Payne and Hanek argument reduction became widely available, there is
no reason NOT to perform argument reduction with great accuracy.
NOTE: you don't HAVE to use Payne and Hanek when simpler strategies
work adequately; but when they do not, you do not have to "Throw up
your hands" and run away.
Michael S wrote:
On Sat, 16 Mar 2024 01:16:25 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:
Keith Thompson wrote:
I can see how computing sin(x) with high precision for "reasonable"
values of x would be useful, but does any of that benefit from being
able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Not really.
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
Michael S wrote:
On Sat, 16 Mar 2024 01:16:25 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:
Keith Thompson wrote:
I can see how computing sin(x) with high precision for
"reasonable" values of x would be useful, but does any of that
benefit from being able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Not really.
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
With high precision argument reduction you have to do (wait for it)
NoThInG !!.
Without high precision argument reduction you have to do a lot of programming--programming that I insist should have been done by
the programmer responsible for sin(x).
mitchalsup@aol.com (MitchAlsup1) writes:
Say you are a programmer and you receive a value like 2^53 from an
Input read and you wan the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than just
doing it for the sake of doing it).
If 2^53 represents a physical quantity, how likely is the actual value
to be known within ±π (+/- pi for those who prefer ASCII)?
If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on the application.
Here's a C program that shows how precise sin(2^53) can be for types
float, double, and long double (I used gcc and glibc). The nextafter functions are used to compute the nearest representable number. For
long double, the value of sin() changes by about 1 part in 1600, which
seems decent, but it's not nearly as precise as for values around 1.0.
For float and double, the imprecision of the argument is enough to make
the result practically meaningless.
...
Output:
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
bart <bc@freeuk.com> writes:
On 16/03/2024 23:19, Keith Thompson wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than
just doing it for the sake of doing it).
If 2^53 represents a physical quantity, how likely is the actual
value
to be known within ±π (+/i pi for those who prefer ASCII)?
If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on the
application.
Here's a C program that shows how precise sin(2^53) can be for types
float, double, and long double (I used gcc and glibc). The nextafter
functions are used to compute the nearest representable number. For
long double, the value of sin() changes by about 1 part in 1600, which
seems decent, but it's not nearly as precise as for values around 1.0.
For float and double, the imprecision of the argument is enough to make
the result practically meaningless.
...
Output:
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
Is this output supposed to be different between gcc -O0 and gcc -O3?
No, why do you ask?
On my system, the output is identical with gcc and clang, and tcc at all optimization levels, with both glibc and musl. I do get slightly
different results for long double on Cygwin vs. Ubuntu (Cygwin uses
newlib, not glibc).
mitchalsup@aol.com (MitchAlsup1) writes:
Michael S wrote:
On Sat, 16 Mar 2024 01:16:25 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:
Keith Thompson wrote:
I can see how computing sin(x) with high precision for
"reasonable" values of x would be useful, but does any of that
benefit from being able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Not really.
Say you are a programmer and you receive a value like 2^53 from an
Input read and you wan the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than
just doing it for the sake of doing it).
If 2^53 represents a physical quantity, how likely is the actual value
to be known within ±π (+/i pi for those who prefer ASCII)?
If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on the application.
Here's a C program that shows how precise sin(2^53) can be for types
float, double, and long double (I used gcc and glibc). The nextafter functions are used to compute the nearest representable number. For
long double, the value of sin() changes by about 1 part in 1600, which
seems decent, but it's not nearly as precise as for values around 1.0.
For float and double, the imprecision of the argument is enough to
make the result practically meaningless.
#include <math.h>
#include <stdio.h>
#include <limits.h>
#include <float.h>
int main(void) {
    {
        printf("float (%zu bits, %d mantissa bits)\n",
               CHAR_BIT * sizeof (float), FLT_MANT_DIG);
        const float x = (float)(1LL<<53);
        const float y = nextafterf(x, x*2);
        printf("%.8f %.8f\n", x, sinf(x));
        printf("%.8f %.8f\n", y, sinf(y));
    }
    putchar('\n');
    {
        printf("double (%zu bits, %d mantissa bits)\n",
               CHAR_BIT * sizeof (double), DBL_MANT_DIG);
        const double x = (double)(1LL<<53);
        const double y = nextafter(x, x*2);
        printf("%.8f %.8f\n", x, sin(x));
        printf("%.8f %.8f\n", y, sin(y));
    }
    putchar('\n');
    {
        printf("long double (%zu bits, %d mantissa bits)\n",
               CHAR_BIT * sizeof (long double), LDBL_MANT_DIG);
        const long double x = (long double)(1LL<<53);
        const long double y = nextafterl(x, x*2);
        printf("%.8Lf %.8Lf\n", x, sinl(x));
        printf("%.8Lf %.8Lf\n", y, sinl(y));
    }
}
Output:
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
On Sat, 16 Mar 2024 16:19:11 -0700
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
Michael S wrote:
On Sat, 16 Mar 2024 01:16:25 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:
Keith Thompson wrote:
I can see how computing sin(x) with high precision for
"reasonable" values of x would be useful, but does any of that
benefit from being able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Not really.
Say you are a programmer and you receive a value like 2^53 from an
Input read and you wan the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than
just doing it for the sake of doing it).
If 2^53 represents a physical quantity, how likely is the actual
value to be known within ±π (+/i pi for those who prefer ASCII)?
If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on
the application.
Here's a C program that shows how precise sin(2^53) can be for types
float, double, and long double (I used gcc and glibc). The
nextafter functions are used to compute the nearest representable
number. For long double, the value of sin() changes by about 1
part in 1600, which seems decent, but it's not nearly as precise as
for values around 1.0. For float and double, the imprecision of the argument is enough to make the result practically meaningless.
As written, your example does not emphasize that the problem has
nothing to do with the implementation of the sinX() library routine.
It's best illustrated by the followup conversation with bart, IMHO 100%
O.T.
To make the point more clear I'd rather change it to the following form:
bart <bc@freeuk.com> writes:
On 17/03/2024 01:38, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 16/03/2024 23:19, Keith Thompson wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than
just doing it for the sake of doing it).
If 2^53 represents a physical quantity, how likely is the actual
value
to be known within ±π (+/i pi for those who prefer ASCII)?
If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on the >>>>> application.
Here's a C program that shows how precise sin(2^53) can be for types >>>>> float, double, and long double (I used gcc and glibc). The nextafter >>>>> functions are used to compute the nearest representable number. For >>>>> long double, the value of sin() changes by about 1 part in 1600, which >>>>> seems decent, but it's not nearly as precise as for values around 1.0. >>>>> For float and double, the imprecision of the argument is enough to make >>>>> the result practically meaningless.
...
Output:
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
Is this output supposed to be different between gcc -O0 and gcc -O3?
No, why do you ask?
On my system, the output is identical with gcc and clang, and tcc at all
optimization levels, with both glibc and musl. I do get slightly
different results for long double on Cygwin vs. Ubuntu (Cygwin uses
newlib, not glibc).
Because I get the results shown below. Even if the library is
different from yours, I would have assumed matching results.
------------------------------
C:\c>gcc c.c -O3
C:\c>a
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
C:\c>gcc c.c -O0
C:\c>a
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84893209
9007200328482816.00000000 -0.34160271
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84893209
9007199254740994.00000000 -0.12728505
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84893209
9007199254740992.00097656 -0.84944780
C:\c>gcc --version
gcc (tdm64-1) 10.3.0
I see similar results (possibly identical, I haven't checked) with i686-w64-mingw32-gcc on Cygwin. I think it uses the same, or a similar, runtime library as the tdm64 implementation you're using.
Comparing the generated assembly, at -O0 it generates a call to "sin",
while at -O1 and above it replaces the expression with a compile-time constant. Apparently that compile-time constant matches the run-time computed value for glibc, but not for whatever tdm64 uses.
I don't *think* it's a conformance issue as long as the precision is
within the standard's (fairly vague) requirements. When I modify the
program to start with sin(1.0) rather than sin(2**53), the output is identical at all optimization levels.
Apparently glibc and the library used by tdm64 behave differently when computing the sin of very large values. The result for sin(2**53) in
long double differs by about one part in 2**17 between -O0 and -O1, or between gcc/glibc and tdm64.
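A small sketch of how one might separate the two effects (mine, not from the
thread; whether the second call really gets folded depends on the
optimization level and on how the compiler was built):

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* The volatile load prevents the compiler from folding sin() of a
       known constant at compile time, so this call must go to the
       runtime libm. */
    volatile double xv = 9007199254740992.0;       /* 2^53 */
    double xr = sin(xv);

    /* With optimization enabled, gcc typically folds this one at compile
       time using its own internal arbitrary-precision code, which is
       where the -O0 vs -O3 difference above comes from. */
    double xc = sin(9007199254740992.0);

    printf("runtime libm : %.17g\n", xr);
    printf("maybe folded : %.17g\n", xc);
}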
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
At what cost in table sizes?
The critical point here is the definition of what is considered exact. If
'exact' is measured only on the y side of y = foo(x), disregarding
possible imprecision on the x side, then you are very likely to end up
with results that are slower to calculate, but not at all more useful
from the point of view of an engineer or physicist. Exactly like Payne-Hanek
or Mitch's equivalent of Payne-Hanek.
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
That much? I had the impression it was significantly cheaper.
At what cost in table sizes?
My impression was that it wasn't costly in that respect, but since my recollection seems to be off on the performance, maybe it's off here
as well.
The critical point here is the definition of what is considered exact. If
'exact' is measured only on the y side of y = foo(x), disregarding
possible imprecision on the x side, then you are very likely to end up
with results that are slower to calculate, but not at all more useful
from the point of view of an engineer or physicist. Exactly like Payne-Hanek
or Mitch's equivalent of Payne-Hanek.
I don't know what are/were the motivations for the people working on
exact transcendentals, but they have applications unrelated to the fact
that they're "better": the main benefit (from this here PL guy) is that
it gives them a reliable, reproducible semantics.
Bit-for-bit reproducibility makes several things much easier.
Stefan
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×.
I don't know what are/were the motivations for the people working on
exact transcendentals, but they have applications unrelated to the fact
that they're "better": the main benefit (from this here PL guy) is that
it gives them a reliable, reproducible semantics.
Bit-for-bit reproducibility makes several things much easier.
Consider moving an application which uses libm from machine to machine.
When libm is correctly rounded, there is no issue at all; not so
otherwise.
There are groups who have shown that exactly rounded
transcendental functions are in fact achievable with maybe 3X
reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×.
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project
claims to get much better performance (basically, in the same
ballpark as not-correctly-rounded implementations).
[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact result
and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]
My impression was that their performance was good enough that the case
for not-correctly-rounded implementations becomes very weak.
I don't know what are/were the motivations for the people working
on exact transcendentals, but they have applications unrelated to
the fact that they're "better": the main benefit (from this here
PL guy) is that it gives them a reliable, reproducible semantics.
Bit-for-bit reproducibility makes several things much easier.
Consider moving an application which uses libm from machine to
machine. When libm is correctly rounded, there is no issue at all;
not so otherwise.
Exactly!
[ Or should I say "Correctly rounded!"? ]
Stefan
On Wed, 20 Mar 2024 09:54:36 -0400
Stefan Monnier <monnier@iro.umontreal.ca> wrote:
There are groups who have shown that exactly rounded
transcendental functions are in fact achievable with maybe 3X
reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×.
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project
claims to get much better performance (basically, in the same
ballpark as not-correctly-rounded implementations).
I had only read the 1st page.
It sounds like they are not particularly interested in IEEE binary64
which appears to be the major point of interest of math libs of
conventional languages.
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project
claims to get much better performance (basically, in the same
ballpark as not-correctly-rounded implementations).
I had only read the 1st page.
It sounds like they are not particularly interested in IEEE binary64
which appears to be the major point of interest of math libs of
conventional languages.
It all depends on what you compare against.
For a scalar call to the majority of transcendental functions on the IEEE-754
list, it's probably very easy to get correctly rounded binary32 results
in approximately the same time as results calculated with a max. err of,
say, 0.75 ULP. Especially so if the target machine has fast binary64
arithmetic.
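As an illustration of why the binary32 case is comparatively easy when fast
binary64 is available, here is a minimal test sketch (my own, not from any of
the libraries discussed): it computes sinf by rounding a binary64 sin()
result and counts, over every float in [1, 2), how often that disagrees with
a higher-precision sinl() reference. On a platform where long double is no
wider than double the comparison tells you nothing.

#include <math.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    /* All 2^23 floats in [1, 2); an exhaustive sweep of all finite
       positive floats (~2^31 values) is still tractable, just slower. */
    uint32_t start, stop;
    float one = 1.0f, two = 2.0f;
    memcpy(&start, &one, sizeof start);
    memcpy(&stop, &two, sizeof stop);

    uint64_t mismatches = 0;
    for (uint32_t bits = start; bits < stop; bits++) {
        float x;
        memcpy(&x, &bits, sizeof x);
        float via_double = (float)sin((double)x);        /* candidate   */
        float reference  = (float)sinl((long double)x);  /* "reference" */
        if (via_double != reference)
            mismatches++;
    }
    printf("%llu mismatches out of %u inputs\n",
           (unsigned long long)mismatches, stop - start);
}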
I don't want to pay for correct rounding of transcendental functions.
Neither in speed nor, especially, in table footprint. Not even a
little. Because for me there are no advantages.
Now, there are things that I am ready to pay for. E.g. preservation of
mathematical properties of the original exact function. I.e. if the original
is monotonic on a certain interval then I do want at least weak monotonicity
of the approximation. If the original is even (or odd) I want the same for
the approximation. If the original never exceeds 1, I want the same
for the approximation. Etc... But correct rounding is not on the list.
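These properties are cheap to test directly. A small sketch (mine, applied
here to the stock sinf() purely as an example): weak monotonicity over all
representable floats in [1, 1.5], plus oddness and the |sin(x)| <= 1 bound on
a coarse grid.

#include <math.h>
#include <stdio.h>

int main(void)
{
    int ok = 1;

    /* weak monotonicity: sinf() must never decrease between consecutive
       representable floats on an interval where sin() is increasing */
    float prev = sinf(1.0f);
    for (float x = nextafterf(1.0f, 2.0f); x <= 1.5f;
         x = nextafterf(x, 2.0f)) {
        float cur = sinf(x);
        if (cur < prev) { ok = 0; break; }
        prev = cur;
    }

    /* oddness and boundedness on a coarse sample of inputs */
    for (float x = 0.0f; x < 100.0f; x += 0.001f) {
        if (sinf(-x) != -sinf(x)) ok = 0;
        if (fabsf(sinf(x)) > 1.0f) ok = 0;
    }

    printf("properties %s on the tested inputs\n",
           ok ? "hold" : "are violated");
}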
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×.
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project claims
to get much better performance (basically, in the same ballpark as not-correctly-rounded implementations).
[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact result
and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]
My impression was that their performance was good enough that the case
for not-correctly-rounded implementations becomes very weak.
I don't know what are/were the motivations for the people working on
exact transcendentals, but they have applications unrelated to the fact
that they're "better": the main benefit (from this here PL guy) is that
it gives them a reliable, reproducible semantics.
Bit-for-bit reproducibility makes several things much easier.
Consider moving an application which uses libm from machine to machine.
When libm is correctly rounded, there is no issue at all; not so
otherwise.
Exactly!
[ Or should I say "Correctly rounded!"? ]
Stefan
[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact result
and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]
This sounds like the "Minefield Method"
On Wed, 20 Mar 2024 09:54:36 -0400
Stefan Monnier <monnier@iro.umontreal.ca> wrote:
[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact result
and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]
My impression was that their performance was good enough that the case
for not-correctly-rounded implementations becomes very weak.
It all depends on what you compare against.
For a scalar call to the majority of transcendental functions on the IEEE-754
list, it's probably very easy to get correctly rounded binary32 results
in approximately the same time as results calculated with a max. err of,
say, 0.75 ULP. Especially so if the target machine has fast binary64
arithmetic.
But in practice when we use lower (than binary64) precision we often
care about vector performance rather than scalar.
I.e. we care little about speed of sinf(), but want ippsTone_32f() as
fast as possible. In case you wonder, this function is part of Intel
Performance Primitives and it is FAST. Writing a correctly rounded
function that approaches the speed of this *almost* correctly
rounded routine (I think, for sane input ranges it's better than
0.55 ULP) would not be easy at all!
I don't know what are/were the motivations for the people working
on exact transcendentals, but they have applications unrelated to
the fact that they're "better": the main benefit (from this here
PL guy) is that it gives them a reliable, reproducible semantics.
Bit-for-bit reproducibility makes several things much easier.
Consider moving an application which uses libm from machine to
machine. When libm is correctly rounded, there is no issue at all;
not so otherwise.
Exactly!
[ Or should I say "Correctly rounded!"? ]
Stefan
You like this proposal because you are an implementer of the language/lib.
It makes your regression tests easier. And it's a good challenge.
I don't like it because I am primarily a user of the language/lib. My floating-point tests have zero chance of repeatability of this sort for
a thousand other reasons.
I don't want to pay for correct rounding of transcendental functions.
Neither in speed nor, especially, in table footprint. Not even a
little. Because for me there are no advantages.
Now, there are things that I am ready to pay for. E.g. preservation of
mathematical properties of the original exact function.
I.e. if the original is monotonic on a certain interval then I do want at
least weak monotonicity of the approximation.
If the original is even (or odd) I want the same for the approximation. If
the original never exceeds 1, I want the same
for the approximation. Etc... But correct rounding is not on the list.
On Wed, 20 Mar 2024 18:21:47 +0200, Michael S wrote:
On Wed, 20 Mar 2024 09:54:36 -0400
Stefan Monnier <monnier@iro.umontreal.ca> wrote:
There are groups who have shown that exactly rounded
transcendental functions are in fact achievable with maybe 3X
reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×.
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project
claims to get much better performance (basically, in the same
ballpark as not-correctly-rounded implementations).
I had only read the 1st page.
It sounds like they are not particularly interested in IEEE binary64
which appears to be the major point of interest of math libs of
conventional languages.
I skimmed their logf(x) implementation. Their technique will
fall apart for binary64 (and other higher precisions). With
logf(x), they combine an argument reduction step with table look-up and
a 5th-order polynomial approximation. The polynomial is computed
in double precision, and provides the correct rounding. For
binary64, the polynomial would require many more terms and
double-double or binary128 evaluation.
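For readers who have not seen that style of implementation, here is a rough
structural sketch (mine, not Rlibm's code; logf_sketch is a hypothetical
name): reduce x to 2^e * m, peel off a table entry for the leading bits of m,
and evaluate the small remainder in double precision. The call to log1p()
stands in for the generated polynomial, so this only illustrates the shape of
the computation, not correct rounding.

#include <math.h>
#include <stdio.h>

/* Structure only: table-driven logf evaluated via double.  Assumes x > 0. */
static float logf_sketch(float x)
{
    static double tbl[33];            /* tbl[i] = log(1 + i/32) */
    static int init = 0;
    if (!init) {
        for (int i = 0; i <= 32; i++) tbl[i] = log(1.0 + i / 32.0);
        init = 1;
    }

    int e;
    float m = frexpf(x, &e);          /* x = m * 2^e, m in [0.5, 1) */
    double md = (double)m * 2.0;      /* scale m into [1, 2) */
    e -= 1;

    int i = (int)((md - 1.0) * 32.0); /* index from the leading bits of md */
    double hi = 1.0 + i / 32.0;
    double r  = md / hi - 1.0;        /* remainder, 0 <= r < 1/32 */

    /* log(x) = e*log(2) + log(hi) + log(1 + r); a real implementation
       replaces log1p(r) with a short minimax polynomial in r. */
    double y = e * 0.6931471805599453 + tbl[i] + log1p(r);
    return (float)y;
}

int main(void)
{
    for (float x = 0.5f; x < 10.0f; x *= 1.7f)
        printf("x=%g  sketch=%.8f  logf=%.8f\n", x, logf_sketch(x), logf(x));
}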
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project
claims to get much better performance (basically, in the same
ballpark as not-correctly-rounded implementations).
I had only read the 1st page.
It sounds like they are not particularly interested in IEEE binary64
which appears to be the major point of interest of math libs of
conventional languages.
Indeed, I hoped by now they had attacked the world of 64bit floats, but
it looks like it's not the case yet. In theory the same idea is
applicable there and should give similar results, but of course it's
harder for two reasons:
- The search space to consider is much larger, making it much more
costly to do exhaustive testing (and to collect the info they need to
choose/compute the polynomials).
- You can't cheaply use higher-precision floats for internal computations.
It all depends on what you compare against.
For a scalar call to the majority of transcendental functions on the IEEE-754
list, it's probably very easy to get correctly rounded binary32 results
in approximately the same time as results calculated with a max. err of,
say, 0.75 ULP.
Especially so if the target machine has fast binary64
arithmetic.
IIUC that was not the case before their work: it was "easy" to get the correct result in 99% of the cases, but covering all 100% of the cases
used to be costly because those few cases needed a lot more
internal precision.
I don't want to pay for correct rounding of transcendental functions.
Neither in speed and especially nor in tables footprint. Not even a
little. Because for me there are no advantages.
For that reason, correctly rounded results were nowhere near the menu, indeed. But the proposition of Rlibm is that maybe we can have our cake
and eat it too, because (if all goes well) you don't have to pay for it
in speed nor in table size.
I'm hoping that pans out :-)
Now, there are things that I am ready to pay for. E.g. preservation of
mathematical properties of the original exact function. I.e. if the original
is monotonic on a certain interval then I do want at least weak monotonicity
of the approximation. If the original is even (or odd) I want the same for
the approximation. If the original never exceeds 1, I want the same
for the approximation. Etc... But correct rounding is not on the list.
But correct rounding would give you those properties.
Stefan
Michael S wrote:
On Wed, 20 Mar 2024 09:54:36 -0400
Stefan Monnier <monnier@iro.umontreal.ca> wrote:
[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact
result and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]
My impression was that their performance was good enough that the
case for not-correctly-rounded implementations becomes very weak.
It all depends on what you compare against.
For a scalar call to the majority of transcendental functions on the IEEE-754
list, it's probably very easy to get correctly rounded binary32
results in approximately the same time as results calculated with
a max. err of, say, 0.75 ULP. Especially so if the target machine has
fast binary64 arithmetic.
But in practice when we use lower (than binary64) precision we often
care about vector performance rather than scalar.
I.e. we care little about speed of sinf(), but want ippsTone_32f()
as fast as possible. In case you wonder, this function is part of Intel
Performance Primitives and it is FAST. Writing a correctly rounded
function that approaches the speed of this *almost* correctly
rounded routine (I think, for sane input ranges it's better than
0.55 ULP) would not be easy at all!
I challenge ANY software version of SIN() correctly rounded or not
to compete with my <patented> HW implementations for speed (or even
power).
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×.
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project claims
to get much better performance (basically, in the same ballpark as not-correctly-rounded implementations).
[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact result
and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]
My impression was that their performance was good enough that the case
for not-correctly-rounded implementations becomes very weak.
Stefan Monnier wrote:
IIUC that was not the case before their work: it was "easy" to get the
correct result in 99% of the cases, but covering all 100% of the cases
used to be costly because those few cases needed a lot more
internal precision.
Muller indicates one typically needs 2×n+6 to 2×n+12 bits to get correct rounding 100% of the time. FP128 only has 2×n+3 and is insufficient by itself.
MitchAlsup1 wrote:
Stefan Monnier wrote:
IIUC that was not the case before their work: it was "easy" to get
the correct result in 99% of the cases, but covering all 100% of
the cases used to be costly because those few cases needed a lot
more internal precision.
Muller indicates one typically needs 2×n+6 to 2×n+12 bits to get
correct rounding 100% of the time. FP128 only has 2×n+3 and is insufficient by itself.
I agree with everything else you've written about this subject, but
afair, fp128 is using 1:15:112 while double is of course 1:10:53.
On the one hand, 53*2+6 -> 112, on the other hand (if we include the
hidden bits) we get 54*2+5 -> 113.
So significantly more than 2n+3 but not enough on its own to
guarantee correct rounding.
As Michael S has mentioned, we want these algorithms to work for vector/SIMD calculations, and at that point both lookup tables and
widening the size of the temporaries are very costly.
Terje
On Thu, 21 Mar 2024 08:52:18 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
MitchAlsup1 wrote:
Stefan Monnier wrote:
IIUC that was not the case before their work: it was "easy" to get
the correct result in 99% of the cases, but covering all 100% of
the cases used to be costly because those few cases needed a lot
more internal precision.
Muller indicates one typically needs 2×n+6 to 2×n+12 bits to get
correct rounding 100% of the time. FP128 only has 2×n+3 and is
insufficient by itself.
I agree with everything else you've written about this subject, but
afair, fp128 is using 1:15:112 while double is of course 1:10:53.
IEEE-754 binary64 is 1:11:52 :-)
But anyway I am skeptical about Muller's rules of thumb.
I'd expect that different transcendental functions would exercise non-trivially different behaviors, mostly because they have different relationships between input and output ranges. Some of them compress
wider inputs into narrower output and some do the opposite.
Yet another factor is luck.
Besides, I see nothing special about binary128 as a helper format.
It is not supported on the vast majority of HW, and even when it is
supported, like on IBM POWER, for the majority of operations it is slower
than emulated 128-bit fixed-point. Fixed-point is more work for the coder, but sounds like a more sure path to success (a rough sketch of such a fixed-point multiply follows below).
On the one hand, 53*2+6 -> 112, on the other hand (if we include the
hidden bits) we get 54*2+5 -> 113.
So significantly more than 2n+3 but not enough on its own to
guarantee correct rounding.
As Michael S has mentioned, we want these algorithms to work for
vector/SIMD calculations, and at that point both lookup tables and
widening the size of the temporaries are very costly.
Terje
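To illustrate why emulated fixed-point is attractive as a helper format, here
is a rough sketch (mine; the 64.64 format and the names fx64_64/fx_mul are
made up for illustration) of a 128-bit fixed-point multiply. It needs only
four integer multiplies and a few adds, with no normalization or rounding
logic, which is the kind of operation that stays cheap even without binary128
hardware. It relies on the unsigned __int128 extension of gcc/clang.

#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t hi; uint64_t lo; } fx64_64;  /* value = hi + lo/2^64 */

static fx64_64 fx_mul(fx64_64 a, fx64_64 b)
{
    unsigned __int128 hh = (unsigned __int128)a.hi * b.hi;  /* integer part */
    unsigned __int128 hl = (unsigned __int128)a.hi * b.lo;  /* cross terms  */
    unsigned __int128 lh = (unsigned __int128)a.lo * b.hi;
    unsigned __int128 ll = (unsigned __int128)a.lo * b.lo;

    unsigned __int128 mid = hl + lh + (ll >> 64);  /* align to 2^-64 */
    fx64_64 r;
    r.lo = (uint64_t)mid;
    r.hi = (uint64_t)(hh + (mid >> 64));           /* discard overflow */
    return r;
}

int main(void)
{
    /* 1.5 * 2.25 = 3.375, i.e. hi = 3, lo = 0.375 * 2^64 */
    fx64_64 a = { 1, (uint64_t)1 << 63 };          /* 1.5  */
    fx64_64 b = { 2, (uint64_t)1 << 62 };          /* 2.25 */
    fx64_64 c = fx_mul(a, b);
    printf("%llu + %llu/2^64\n",
           (unsigned long long)c.hi, (unsigned long long)c.lo);
}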