I was just wondering: seeing how most/all recent hardware problems in
Intel CPUs seem to come together with discussions about "microcode
update to avoid the problematic case", I was wondering if the
introduction of such microcode and ability to update it "in the field"
was made as a result of lessons learned from the famous FDIV bug (for
which Intel had to actually replace the CPUs, which seems a lot more
costly).
Or is it just a coincidence?
Stefan
Writeable microcode saved VAX big time when they finally got around to pasting 2 VAXen into one box and found the Enqueue and Dequeue
instructions had not been made ATOMIC.
Stefan Monnier wrote:
I was just wondering: seeing how most/all recent hardware problems in
Intel CPUs seem to come together with discussions about "microcode
update to avoid the problematic case", I was wondering if the
introduction of such microcode and ability to update it "in the field"
was made as a result of lessons learned from the famous FDIV bug (for
which Intel had to actually replace the CPUs, which seems a lot more
costly).
Or is it just a coincidence?
Possibly not:
We did of course discuss this a lot during the FDIV software replacement work. At that time it was impossible to trap on any unprivileged
opcode, so FDIV could never have been fixed that way.
If the microcode patch facility includes a mask to trap any arbitrary
opcode, then it would also be able to fix FDIV-type bugs, but unless
they have significant space in that writeable microcode store, it might not have been possible to make it fit:
A minimal patch working the same way as our SW would need to start by inspecting the top 10 bits of the divisor mantissa, and then fall
through for all except 5 different hits.
So at least 1024 bits for a lookup table, or relatively complicated
logic if implemented directly with gates.
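
To make the size estimate concrete, here is a minimal sketch of such a 1024-entry, one-bit-per-entry lookup table. The five bit patterns below are an assumption (the commonly cited at-risk prefixes, not taken from this thread), and the sketch reads the bits from an IEEE 754 double rather than the x87 extended format, purely for illustration. The names `AT_RISK_PREFIXES`, `build_table`, `top10_fraction_bits` and `divisor_at_risk` are hypothetical.

```python
import struct

# ASSUMPTION (not from this thread): the commonly cited at-risk divisors have
# one of these five 4-bit patterns right after the binary point, followed by
# six 1 bits. Verify against Intel's published analysis before relying on it.
AT_RISK_PREFIXES = (0b0001, 0b0100, 0b0111, 0b1010, 0b1101)


def build_table() -> int:
    """Return a 1024-bit bitmap (as a Python int), one bit per 10-bit index."""
    table = 0
    for prefix in AT_RISK_PREFIXES:
        index = (prefix << 6) | 0b111111  # 4 prefix bits + six 1 bits
        table |= 1 << index
    return table


TABLE = build_table()


def top10_fraction_bits(x: float) -> int:
    """Extract the 10 most significant fraction bits of an IEEE 754 double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> 42) & 0x3FF  # fraction is bits 51..0; keep bits 51..42


def divisor_at_risk(divisor: float) -> bool:
    """True if the divisor hits one of the five set bits in the table."""
    return ((TABLE >> top10_fraction_bits(divisor)) & 1) == 1
```

Under these assumptions, `divisor_at_risk(3145727.0)` is true (the divisor of the well-known 4195835/3145727 example), while everything else falls through: 1019 of the 1024 table entries are zero.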
On Fri, 11 Jul 2025 18:05:45 -0400, Stefan Monnier wrote:
... I was wondering if the introduction of such microcode and
ability to update it "in the field" was made as a result of lessons
learned from the famous FDIV bug (for which Intel had to actually
replace the CPUs, which seems a lot more costly).
It seems to me this would depend a lot on how flexible the microarchitecture really was. How far can you go in redefining the
operation of a fixed set of hardware function units?
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
Stefan Monnier wrote:
I was just wondering: seeing how most/all recent hardware problems in
Intel CPUs seem to come together with discussions about "microcode
update to avoid the problematic case", I was wondering if the
introduction of such microcode and ability to update it "in the field"
was made as a result of lessons learned from the famous FDIV bug (for
which Intel had to actually replace the CPUs, which seems a lot more
costly).
Or is it just a coincidence?
Possibly not:
We did of course discuss this a lot during the FDIV software replacement
work. At that time it was impossible to trap on any unprivileged
opcode, so FDIV could never have been fixed that way.
If the microcode patch facility includes a mask to trap any arbitrary
opcode, then it would also be able to fix FDIV-type bugs, but unless
they have significant space in that writeable microcode store, it
might not have been possible to make it fit:
A minimal patch working the same way as our SW would need to start by
inspecting the top 10 bits of the divisor mantissa, and then fall
through for all except 5 different hits.
So at least 1024 bits for a lookup table, or relatively complicated
logic if implemented directly with gates.
Gates are not possible in Microcode...
https://www.righto.com/2024/12/this-die-photo-of-pentium-shows.html
has a nice explanation of the bug and how they fixed it in silicon
(and simplified the PLA they used as a lookup table, as well -
apparently, it had not been optimized with don't cares, or
the bug might never have bitten).
You can call this a microcode space optimizer, or you can call it gates.
It is a lot closer to gates in the modern sense of the word.
On Fri, 11 Jul 2025 23:17:23 +0000, Lawrence D'Oliveiro wrote:
On Fri, 11 Jul 2025 18:05:45 -0400, Stefan Monnier wrote:
... I was wondering if the introduction of such microcode and
ability to update it "in the field" was made as a result of lessons
learned from the famous FDIV bug (for which Intel had to actually
replace the CPUs, which seems a lot more costly).
It seems to me this would depend a lot on how flexible the
microarchitecture really was. How far can you go in redefining the
operation of a fixed set of hardware function units?
A lot of microcode escape sequences are like ::
Check for special operand
if found trap to software
else run in hardware
Which can be "done" in one microcode "word"
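
As a rough software analogy of that escape sequence (a sketch only, not how any real microcode is written), the check-then-dispatch shape looks like this. All names (`make_escape`, `is_special`, `hardware_op`, `software_handler`) are hypothetical, and the "special operand" used in the example, a zero divisor, is just a stand-in.

```python
from typing import Callable, Tuple


def make_escape(
    is_special: Callable[[float, float], bool],
    hardware_op: Callable[[float, float], float],
    software_handler: Callable[[float, float], float],
) -> Callable[[float, float], Tuple[str, float]]:
    """Wrap an operation so special operands are diverted to a software path."""

    def op(a: float, b: float) -> Tuple[str, float]:
        if is_special(a, b):                         # check for special operand
            return "software", software_handler(a, b)  # if found, trap
        return "hardware", hardware_op(a, b)         # else run in hardware

    return op


# Hypothetical example: treat a zero divisor as the special case.
divide = make_escape(
    is_special=lambda a, b: b == 0.0,
    hardware_op=lambda a, b: a / b,
    software_handler=lambda a, b: float("nan"),
)
```

Here `divide(1.0, 4.0)` takes the "hardware" path and `divide(1.0, 0.0)` takes the "software" path; the common case pays only for the check.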
MitchAlsup1 wrote:
On Fri, 11 Jul 2025 23:17:23 +0000, Lawrence D'Oliveiro wrote:
On Fri, 11 Jul 2025 18:05:45 -0400, Stefan Monnier wrote:
... I was wondering if the introduction of such microcode and
ability to update it "in the field" was made as a result of lessons
learned from the famous FDIV bug (for which Intel had to actually
replace the CPUs, which seems a lot more costly).
It seems to me this would depend a lot on how flexible the
microarchitecture really was. How far can you go in redefining the
operation of a fixed set of hardware function units?
A lot of microcode escape sequences are like ::
Check for special operand
if found trap to software
else run in hardware
Which can be "done" in one microcode "word"
That would actually work perfectly for the original FDIV bug:
Check top 10 mantissa bits, trap if one of the five patterns.
Without the call, store to memory & reload overhead, the actual cost of
such a workaround would have been just a cycle or two for a 40-cycle operation that is performed very rarely in optimized sw.
Terje
On Mon, 14 Jul 2025 14:59:04 +0000, Terje Mathisen wrote:
MitchAlsup1 wrote:
On Fri, 11 Jul 2025 23:17:23 +0000, Lawrence D'Oliveiro wrote:
On Fri, 11 Jul 2025 18:05:45 -0400, Stefan Monnier wrote:
... I was wondering if the introduction of such microcode and
ability to update it "in the field" was made as a result of lessons
learned from the famous FDIV bug (for which Intel had to actually
replace the CPUs, which seems a lot more costly).
It seems to me this would depend a lot on how flexible the
microarchitecture really was. How far can you go in redefining the
operation of a fixed set of hardware function units?
A lot of microcode escape sequences are like ::
   Check for special operand
   if found trap to software
   else run in hardware
Which can be "done" in one microcode "word"
That would actually work perfectly for the original FDIV bug:
Check top 10 mantissa bits, trap if one of the five patterns.
Without the call, store to memory & reload overhead, the actual cost of
such a workaround would have been just a cycle or two for a 40-cycle
operation that is performed very rarely in optimized sw.
At some_small_percent it is rare, but not "very rare" due to the
latency multiplier of ~40 cycles. 0.1% corresponds to 4% of CPU
time being FDIV. At 40 cycles of latency there simply HAS TO BE
some other instruction waiting for its result.
{All I am complaining about is the word "very"}
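
The 0.1% to 4% step can be checked with a back-of-the-envelope model. This sketch assumes every non-FDIV instruction costs one cycle and that the full FDIV latency is exposed (nothing overlaps with it); both are simplifying assumptions, and `fdiv_time_share` is a hypothetical helper name.

```python
def fdiv_time_share(
    fdiv_fraction: float, fdiv_cycles: float = 40.0, other_cycles: float = 1.0
) -> float:
    """Fraction of total cycles spent in FDIV under a no-overlap model."""
    fdiv_time = fdiv_fraction * fdiv_cycles
    other_time = (1.0 - fdiv_fraction) * other_cycles
    return fdiv_time / (fdiv_time + other_time)


# 0.1% of instructions being FDIV, 40 cycles each:
share = fdiv_time_share(0.001)
print(f"{share:.1%}")
```

With these assumptions the share comes out a little under 4%, which matches the rough figure above.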