Forum: >>> Magnum BBS <<<

Re: Spill and Fill Buffers

From MitchAlsup1@21:1/5 to Robert Finch on Sun Feb 11 21:36:41 2024

Robert Finch wrote:

Not being satisfied with current Q+ and the number of rename registers required I decide to start yet another project, this time a CPU with
only 16 GPRs. I know that fewer registers will spill to memory more
often, so, I thought using explicit spill and fill instructions backed
up by appropriate buffers would help.
I found this article, which is related, suggesting ILP may be increased.

http://cva.stanford.edu/classes/ee482a/projects/project_spill.pdf

Interesting

With only 16 regs, some instructions can be reduced to 24-bits.

I have compiled benchmarks where My 66000 with only 32 registers takes
no spill/fill instructions where RISC-V takes spill/fill instructions
even though it has 32 integer and 32 FP registers in its file. In my
case this is down to efficient use of <FP> constants, not wasting
instructions to LD them, and not wasting a register to temporarily hold
them.

In the past I have noted that a 16 register machine with IBM-360-like
ISA performs as if it had about 22 registers; LD-OPs performing most
of the heavy lifting; saving registers from holding temporary and use
once values.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From EricP@21:1/5 to Robert Finch on Mon Feb 12 13:03:37 2024

Robert Finch wrote:

Not being satisfied with current Q+ and the number of rename registers required I decide to start yet another project, this time a CPU with
only 16 GPRs. I know that fewer registers will spill to memory more
often, so, I thought using explicit spill and fill instructions backed
up by appropriate buffers would help.
I found this article, which is related, suggesting ILP may be increased.

http://cva.stanford.edu/classes/ee482a/projects/project_spill.pdf

With only 16 regs, some instructions can be reduced to 24-bits.

That's going to have the same problems as Sparc register windowing.
The problems happen when there is a memory reference to a register that software thinks was spilled but is being held in the register window
that is acting as a hidden non-coherent cache.


From MitchAlsup1@21:1/5 to BGB on Tue Feb 20 17:56:28 2024

BGB wrote:

On 2/11/2024 3:36 PM, MitchAlsup1 wrote:

Robert Finch wrote:

With only 16 regs, some instructions can be reduced to 24-bits.

I have compiled benchmarks where My 66000 with only 32 registers takes
no spill/fill instructions where RISC-V takes spill/fill instructions
even though it has 32 integer and 32 FP registers in its file. In my
case this is down to efficient use of <FP> constants, not wasting
instructions to LD them, and not wasting a register to temporarily hold them.

I have still not entirely eliminated spill/fill, even with 64 GPRs.
Though, this is typically more due to compiler limitations than actually running out of free registers...

Nobody can completely eliminate spill/fill with a finite number of registers.

Then noted in my fiddling that, with superscalar enabled, Dhrystone was faster in RV64G ("GCC -O3") than in BJX2.

Though, more fiddling, I have noted that re-enabling the Compare+Branch
ops (with 2 input registers), and disabling stack-canary checking
(enabled by default in BGBCC), was enough to put BJX2 back in the lead (though, not by a particularly large margin, namely 91k vs 88k).

In the past I have noted that a 16 register machine with IBM-360-like
ISA performs as if it had about 22 registers; LD-OPs performing most
of the heavy lifting; saving registers from holding temporary and use
once values.

It is possible I may need to revisit this, since:
I already have the underlying mechanism as it is needed for the RV 'A' extension;
The competition against RV is tighter than I would like;
Ultimately, my project may be kinda moot if it is only slightly faster
than RISC-V.

This is one of the reasons that one needs a "better" ISA than RISC-V.
My 66000 only requires 70%-72% of the instruction count of RISC-V.

Though, I suspect that performance and code-density are interrelated in
this case (in particular, my compiler is still emitting some amount of unnecessary instructions).

A balance is required: the ISA needs to be high-level enough to achieve a
better instruction count, but not so high-level that the number of cycles
goes way up (VAX).

Though, I guess I still have my GLQuake port on my side.

And on the RISC-V side, the 'P' extension ironically manages to be both
less useful and also needlessly over-complicated.

Me:
PADD.W, PSUB.W
'P':
ADD, SUB, ADDSUB, SUBADD x Wrap/SSat/USat/SHalve/UHalve x Byte/Word
So, where I have 2 instructions, P has 40...
And, it just keeps going on and on like this...
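
The count of 40 follows from multiplying out the three lists above: 4 operations x 5 arithmetic modes x 2 element sizes. A minimal sketch of that arithmetic, assuming every combination exists; the dotted names it builds are illustrative labels only, not actual 'P' extension mnemonics:

```python
from itertools import product

# The three axes exactly as listed in the post.
ops = ["ADD", "SUB", "ADDSUB", "SUBADD"]
modes = ["Wrap", "SSat", "USat", "SHalve", "UHalve"]
widths = ["Byte", "Word"]

# Hypothetical labels, one per combination (not real mnemonics).
variants = [f"{op}.{mode}.{width}" for op, mode, width in product(ops, modes, widths)]
count = len(variants)
print(count)
```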

I get all my SIMD-ness and Vectorization with exactly 2 instructions.

And, it never gets to FPU-SIMD...


From MitchAlsup1@21:1/5 to EricP on Tue Feb 20 18:02:10 2024

EricP wrote:

Robert Finch wrote:

Not being satisfied with current Q+ and the number of rename registers
required I decide to start yet another project, this time a CPU with
only 16 GPRs. I know that fewer registers will spill to memory more
often, so, I thought using explicit spill and fill instructions backed
up by appropriate buffers would help.
I found this article, which is related, suggesting ILP may be increased.

http://cva.stanford.edu/classes/ee482a/projects/project_spill.pdf

With only 16 regs, some instructions can be reduced to 24-bits.

That's going to have the same problems as Sparc register windowing.
The problems happen when there is a memory reference to a register that software thinks was spilled but is being held in the register window
that is acting as a hidden non-coherent cache.

It is similar to SPARC register windows in that it provides a place to
perform spill/fill, and if that place does not "overflow" then the
STs to memory are not performed and fewer cycles are required. It is
different in how the compiler expresses spill/fill: SPARC is implicit,
that paper is explicit.

The problem with SPARC register windows is that it slows down the register
file access because there are at least 4× as many registers in the file
as typical RISCs. Thus, while MIPS, M88K, HP, .. all got register access
time under ½ cycle, SPARCs got 1 full cycle, slowing the pipeline or the frequency.


From Anton Ertl@21:1/5 to mitchalsup@aol.com on Tue Feb 20 22:03:42 2024

mitchalsup@aol.com (MitchAlsup1) writes:

The problem with SPARC register windows is that it slows down the register
file access because there are at least 4× as many registers in the file
as typical RISCs. Thus, while MIPS, M88K, HP, .. all got register access
time under ½ cycle, SPARCs got 1 full cycle, slowing the pipeline or the
frequency.

SPARC64 X+ was available at frequencies up to 3.7GHz.

Sparc M8 was (maybe still is) available at frequencies up to 5GHz.

MIPS, M88K, HP did not produce anything that has even remotely these frequencies.

These days, Intel is making Raptor Cove cores with 280 physical
registers, and they run at up to 6GHz.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>


From Thomas Koenig@21:1/5 to mitchalsup@aol.com on Tue Feb 20 22:15:35 2024

MitchAlsup1 <mitchalsup@aol.com> schrieb:

EricP wrote:

Robert Finch wrote:

Not being satisfied with current Q+ and the number of rename registers
required I decide to start yet another project, this time a CPU with
only 16 GPRs. I know that fewer registers will spill to memory more
often, so, I thought using explicit spill and fill instructions backed
up by appropriate buffers would help.
I found this article, which is related, suggesting ILP may be increased.

http://cva.stanford.edu/classes/ee482a/projects/project_spill.pdf

With only 16 regs, some instructions can be reduced to 24-bits.

That's going to have the same problems as Sparc register windowing.
The problems happen when there is a memory reference to a register that
software thinks was spilled but is being held in the register window
that is acting as a hidden non-coherent cache.

It is similar to SPARC register windows in that it provides a place to perform spill/fill, and if that place does not "overflow" then the
STs to memory are not performed and fewer cycles are required. It is different in how the compiler expresses spill/fill: SPARC is implicit,
that paper is explicit.

The spill/restore step would still happen behind the program's
back, so there is at least some potential issue of inconsistent
memory state.

However, a clear ABI which makes sure that only local variables
which have nothing pointing to them can be spilled/restored in
this way could work. Any registers could be reclaimed when
the stack pointer is adjusted, without having to go through
the cache system.

Hmm... anything that could seriously go wrong with this?


From MitchAlsup1@21:1/5 to Thomas Koenig on Tue Feb 20 23:32:23 2024

Thomas Koenig wrote:

MitchAlsup1 <mitchalsup@aol.com> schrieb:

EricP wrote:

Robert Finch wrote:

Not being satisfied with current Q+ and the number of rename registers
required I decide to start yet another project, this time a CPU with
only 16 GPRs. I know that fewer registers will spill to memory more
often, so, I thought using explicit spill and fill instructions backed
up by appropriate buffers would help.
I found this article, which is related, suggesting ILP may be increased.

http://cva.stanford.edu/classes/ee482a/projects/project_spill.pdf

With only 16 regs, some instructions can be reduced to 24-bits.

That's going to have the same problems as Sparc register windowing.
The problems happen when there is a memory reference to a register that
software thinks was spilled but is being held in the register window
that is acting as a hidden non-coherent cache.

It is similar to SPARC register windows in that it provides a place to
perform spill/fill, and if that place does not "overflow" then the
STs to memory are not performed and fewer cycles are required. It is
different in how the compiler expresses spill/fill: SPARC is implicit,
that paper is explicit.

The spill/restore step would still happen behind the program's
back, so there is at least some potential issue of inconsistent
memory state.

How so ?? If the spilled register has not reached memory, the fill
gets the non-SW-visible flip-flop data, and if it has reached memory
it gets the value in that memory. Some 3rd party reading memory
expecting a spill to be there would be problematic, but this would
be frowned upon programming practice and would have to be interlocked
with ATOMIC guards.
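
To make the fill rule above concrete, here is a toy software model, not a description of the paper's hardware. It assumes a small FIFO buffer that writes its oldest entry to memory only on overflow; the class name `SpillBuffer`, its methods, and the addresses are all hypothetical.

```python
from collections import OrderedDict

class SpillBuffer:
    """Toy model: spills sit in hidden storage until the buffer overflows."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory          # dict standing in for the stack in memory
        self.pending = OrderedDict()  # spills that have not reached memory yet

    def spill(self, addr, value):
        if addr in self.pending:
            # Re-spilling the same slot just replaces the buffered value.
            self.pending.pop(addr)
        elif len(self.pending) == self.capacity:
            # Overflow: only now is the oldest pending spill stored to memory.
            old_addr, old_value = self.pending.popitem(last=False)
            self.memory[old_addr] = old_value
        self.pending[addr] = value

    def fill(self, addr):
        # Buffered data wins; otherwise the value has already reached memory.
        if addr in self.pending:
            return self.pending[addr]
        return self.memory[addr]

mem = {}
buf = SpillBuffer(capacity=2, memory=mem)
buf.spill(0x100, 11)
buf.spill(0x108, 22)
from_buffer = buf.fill(0x100)   # served from the buffer
stores_so_far = len(mem)        # no store has been performed yet
buf.spill(0x110, 33)            # overflow pushes the 0x100 entry to memory
from_memory = buf.fill(0x100)   # now served from memory
```

Either way the fill returns the value that was spilled; the only difference is whether a store to memory ever happened.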

However, a clear ABI which makes sure that only local variables
which have nothing pointing to them can be spilled/restored in
this way could work. Any registers could be reclaimed when
the stack pointer is adjusted, without having to go through
the cache system.

Hmm... anything that could seriously go wrong with this?


From EricP@21:1/5 to All on Wed Feb 21 10:58:20 2024

MitchAlsup1 wrote:

Thomas Koenig wrote:

MitchAlsup1 <mitchalsup@aol.com> schrieb:

EricP wrote:

Robert Finch wrote:

Not being satisfied with current Q+ and the number of rename
registers required I decide to start yet another project, this time
a CPU with only 16 GPRs. I know that fewer registers will spill to
memory more often, so, I thought using explicit spill and fill
instructions backed up by appropriate buffers would help.
I found this article, which is related, suggesting ILP may be
increased.

http://cva.stanford.edu/classes/ee482a/projects/project_spill.pdf

With only 16 regs, some instructions can be reduced to 24-bits.

That's going to have the same problems as Sparc register windowing.
The problems happen when there is a memory reference to a register that
software thinks was spilled but is being held in the register window
that is acting as a hidden non-coherent cache.

It is similar to SPARC register windows in that it provides a place to
perform spill/fill, and if that place does not "overflow" then the
STs to memory are not performed and fewer cycles are required. It is
different in how the compiler expresses spill/fill: SPARC is implicit,
that paper is explicit.

The spill/restore step would still happen behind the program's
back, so there is at least some potential issue of inconsistent
memory state.

How so ?? If the spilled register has not reached memory, the fill
gets the non-SW-visible flip-flop data, and if it has reached memory
it gets the value in that memory. Some 3rd party reading memory
expecting a spill to be there would be problematic, but this would
be frowned upon programming practice and would have to be interlocked
with ATOMIC guards.

Exactly, it would be problematic for a third party like an IO,
interrupts, DMA, other threads.
Or a setjmp/longjmp.
Or a nested routine that is looking backwards in the stack
(remember, the callee doesn't know if the caller has done this).

It doesn't need an atomic guard, but at a minimum it needs a non-privileged
sync stack (syncstk) instruction that flushes all pending spills
*in the privilege mode active at the time the deferred spill was performed*.

And hardware that can handle flushing deferred user mode stack spills
and associated virtual address translates and page table walks
while in kernel mode.

Then the discussion becomes where and how often does syncstk need to be used, and are the rules for using it clear enough that it won't leave land mines
in code all over the place.

However, a clear ABI which makes sure that only local variables
which have nothing pointing to them can be spilled/restored in
this way could work. Any registers could be reclaimed when
the stack pointer is adjusted, without having to go through
the cache system.

Hmm... anything that could seriously go wrong with this?

It is a hidden non-coherent cache of unknown and variable size with
manual synchronization controls that must be invoked any time
there *might* be an access by the current execution context
into some unknown prior deferred spill.

For example, every interrupt, exception, or syscall will start with
a syncstk. So the deferred cost of spilling multiple sets of multiple
registers to user mode stack will be paid at the start of every interrupt.
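
A toy model of the hazard and of the syncstk cost described above. It assumes deferred spills are invisible to anything that reads memory directly, and that syncstk writes back every pending spill at once. The class `LazySpillStack`, the method `observer_read`, and the addresses are hypothetical; only the name syncstk comes from this thread.

```python
class LazySpillStack:
    """Toy model of deferred spills that a third party cannot see."""

    def __init__(self):
        self.memory = {}   # what DMA, a debugger, or another thread sees
        self.pending = {}  # deferred spills held in hidden storage

    def spill(self, addr, value):
        self.pending[addr] = value

    def syncstk(self):
        # Flush every deferred spill; the return value is the number of
        # stores that had to be performed at this point.
        flushed = len(self.pending)
        self.memory.update(self.pending)
        self.pending.clear()
        return flushed

    def observer_read(self, addr):
        # A third party reads memory directly and bypasses the hidden buffer.
        return self.memory.get(addr)

stack = LazySpillStack()
stack.memory[0x200] = 0          # stale contents of the stack slot
stack.spill(0x200, 42)
stack.spill(0x208, 7)
before = stack.observer_read(0x200)  # still the stale value
cost = stack.syncstk()               # e.g. at interrupt entry
after = stack.observer_read(0x200)   # now the spilled value
```

In this model the stores that were avoided earlier are all paid for at the moment syncstk runs, which is the deferred cost at every interrupt.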


From Thomas Koenig@21:1/5 to EricP on Wed Feb 21 22:12:41 2024

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

MitchAlsup1 wrote:

Thomas Koenig wrote:

MitchAlsup1 <mitchalsup@aol.com> schrieb:

EricP wrote:

Robert Finch wrote:

Not being satisfied with current Q+ and the number of rename
registers required I decide to start yet another project, this time
a CPU with only 16 GPRs. I know that fewer registers will spill to
memory more often, so, I thought using explicit spill and fill
instructions backed up by appropriate buffers would help.
I found this article, which is related, suggesting ILP may be
increased.

http://cva.stanford.edu/classes/ee482a/projects/project_spill.pdf

With only 16 regs, some instructions can be reduced to 24-bits.

That's going to have the same problems as Sparc register windowing.
The problems happen when there is a memory reference to a register that
software thinks was spilled but is being held in the register window
that is acting as a hidden non-coherent cache.

It is similar to SPARC register windows in that it provides a place to
perform spill/fill, and if that place does not "overflow" then the
STs to memory are not performed and fewer cycles are required. It is
different in how the compiler expresses spill/fill: SPARC is implicit,
that paper is explicit.

The spill/restore step would still happen behind the program's
back, so there is at least some potential issue of inconsistent
memory state.

How so ?? If the spilled register has not reached memory, the fill
gets the non-SW-visible flip-flop data, and if it has reached memory
it gets the value in that memory. Some 3rd party reading memory
expecting a spill to be there would be problematic, but this would
be frowned upon programming practice and would have to be interlocked
with ATOMIC guards.

Exactly, it would be problematic for a third party like an IO,
interrupts, DMA, other threads.

Make the spills backed up by stack storage only.

Or a setjmp/longjmp.

Not sure what is needed there.

Or a nested routine that is looking backwards in the stack
(remember, the callee doesn't know if the caller has done this).

Never pass a pointer to something that has been spilled. If you
do, it's an ABI violation (same as overwriting the stack
via some other pointer).

It doesn't need an atomic guard, but at a minimum it needs a non-privileged
sync stack (syncstk) instruction that flushes all pending spills
*in the privilege mode active at the time the deferred spill was performed*.

Or spill to memory on privilege change.

It could also be possible to have a background task in the processor
which does the syncing (while keeping the backed-up registers).

And hardware that can handle flushing deferred user mode stack spills
and associated virtual address translates and page table walks
while in kernel mode.

Then the discussion becomes where and how often does syncstk need to be used, and are the rules for using it clear enough that it won't leave land mines
in code all over the place.

However, a clear ABI which makes sure that only local variables
which have nothing pointing to them can be spilled/restored in
this way could work. Any registers could be reclaimed when
the stack pointer is adjusted, without having to go through
the cache system.

Hmm... anything that could seriously go wrong with this?

It is a hidden non-coherent cache of unknown and variable size with
manual synchronization controls that must be invoked any time
there *might* be an access by the current execution context
into some unknown prior deferred spill.

For example, every interrupt, exception, or syscall will start with
a syncstk. So the deferred cost of spilling multiple sets of multiple registers to user mode stack will be paid at the start of every interrupt.

That cost will be non-zero, agreed. But depending on the frequency
of interrupts (and if something has already done some of the work
in the background), there might still be a net gain overall.


From EricP@21:1/5 to Thomas Koenig on Fri Feb 23 13:49:50 2024

Thomas Koenig wrote:

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

MitchAlsup1 wrote:

Thomas Koenig wrote:

MitchAlsup1 <mitchalsup@aol.com> schrieb:

EricP wrote:

Robert Finch wrote:

Not being satisfied with current Q+ and the number of rename
registers required I decide to start yet another project, this time
a CPU with only 16 GPRs. I know that fewer registers will spill to
memory more often, so, I thought using explicit spill and fill
instructions backed up by appropriate buffers would help.
I found this article, which is related, suggesting ILP may be
increased.

http://cva.stanford.edu/classes/ee482a/projects/project_spill.pdf

With only 16 regs, some instructions can be reduced to 24-bits.

That's going to have the same problems as Sparc register windowing.
The problems happen when there is a memory reference to a register that
software thinks was spilled but is being held in the register window
that is acting as a hidden non-coherent cache.

It is similar to SPARC register windows in that it provides a place to
perform spill/fill, and if that place does not "overflow" then the
STs to memory are not performed and fewer cycles are required. It is
different in how the compiler expresses spill/fill: SPARC is implicit,
that paper is explicit.

The spill/restore step would still happen behind the program's
back, so there is at least some potential issue of inconsistent
memory state.

How so ?? If the spilled register has not reached memory, the fill
gets the non-SW-visible flip-flop data, and if it has reached memory
it gets the value in that memory. Some 3rd party reading memory
expecting a spill to be there would be problematic, but this would
be frowned upon programming practice and would have to be interlocked
with ATOMIC guards.

Exactly, it would be problematic for a third party like an IO,
interrupts, DMA, other threads.

Make the spills backed up by stack storage only.

The lazy spills may eventually write to the stack.
It's just that you don't know if or when it will happen.

Or a setjmp/longjmp.

Not sure what is needed there.

I might be being overly paranoid on this one.
The function of the lazy spill instructions is incompatible with
a setjmp or any equivalent register set snapshot function.
So just don't mix the two.

On Sparc this was problematic because the register window creation was automatic so there was no way to avoid it. This meant that setjmp had
to sync the stack which requires flushing all the pending changes.
This was made even more expensive because Sparc used kernel traps
for managing register windows.

Sparc had similar flushing requirements for user mode task switching
as part of the current task context may be stuck in the window cache.
But flushing the window cache required a kernel trap, which kinda defeats
the whole purpose of cheap user mode task switching.

Or a nested routine that is looking backwards in the stack
(remember, the callee doesn't know if the caller has done this).

Never pass a pointer to something that has been spilled. If you
do, it's an ABI violation (same as overwriting the stack
via some other pointer).

My concern is at the hardware level not a language level.
There is no technical reason that you could not have a subroutine that,
say, reads the stack and writes it to a file as part of an error logger,
or a debugger that examines or writes to the stack.

It doesn't need an atomic guard, but at a minimum it needs a non-privileged
sync stack (syncstk) instruction that flushes all pending spills
*in the privilege mode active at the time the deferred spill was performed*.

Or spill to memory on privilege change.

Therein lies another problem because the window is part of the thread
context but may not be spillable to user mode after switching to kernel mode,
because the OS is not allowed to page fault in many places like interrupts.

So it would need a second mechanism so that it can save the pending register windows in non-paged kernel memory, so that kernel code can make calls that create new register windows.

It could also be possible to have a background task in the processor
which does the syncing (while keeping the backed-up registers).

Ugh.

And hardware that can handle flushing deferred user mode stack spills
and associated virtual address translates and page table walks
while in kernel mode.

Then the discussion becomes where and how often does syncstk need to be used,
and are the rules for using it clear enough that it won't leave land mines
in code all over the place.

However, a clear ABI which makes sure that only local variables
which have nothing pointing to them can be spilled/restored in
this way could work. Any registers could be reclaimed when
the stack pointer is adjusted, without having to go through
the cache system.
Hmm... anything that could seriously go wrong with this?

It is a hidden non-coherent cache of unknown and variable size with
manual synchronization controls that must be invoked any time
there *might* be an access by the current execution context
into some unknown prior deferred spill.

For example, every interrupt, exception, or syscall will start with
a syncstk. So the deferred cost of spilling multiple sets of multiple
registers to user mode stack will be paid at the start of every interrupt.

That cost will be non-zero, agreed. But depending on the frequency
of interrupts (and if something has already done some of the work
in the background), there might still be a net gain overall.

After rummaging about for a while I have not been able to find the
papers that outlined all the issues with register windows (RW)
so I'll try to remember some that people have mentioned...

- it requires many more hardware registers but doesn't allow them to be accessed directly. Sparc required 120 physical registers but only 29
were architecturally available to a programmer. This was far more
significant issue back in the 1980's when RW was first introduced.
But still today it could double the number of physical registers.

- Sparc's fixed window size of 8 registers was considered very inefficient.
The number of window save-sets was intended to be model specific but it
turned out that too many algorithms wound up depending on the initial
size of 4 so that's where it stayed.

- Sparc's RW was coupled to CALL and RET so you could not call a routine
without creating a new 8 register window.
A better method would support a variable size save-set that is independent
of CALL & RET so you can call leaf routines which require no registers saved.

- Sparc used traps for overflow/underflow management, which made it
expensive. Also one of the windows had to be reserved for the trap handler
so in practice there were only 3 save-sets.

- Kernel transitions have to save and restore some or all of the user
mode windows so it can use the windows in kernel mode, increasing the
overhead for interrupts and exceptions.


From MitchAlsup1@21:1/5 to EricP on Fri Feb 23 19:48:14 2024

EricP wrote:

Thomas Koenig wrote:

- it requires many more hardware registers but doesn't allow them to be accessed directly. Sparc required 120 physical registers but only 29
were architecturally available to a programmer. This was far more
significant issue back in the 1980's when RW was first introduced.
But still today it could double the number of physical registers.

- Sparc's fixed window size of 8 registers was considered very inefficient. The number of window save-sets was intended to be model specific but it turned out that too many algorithms wound up depending on the initial
size of 4 so that's where it stayed.

Indeed, the median number of registers required to save/restore across a subroutine boundary is between 2 and 3 (depending if you consider return address one of them.)

- Sparc's RW was coupled to CALL and RET so you could not call a routine without creating a new 8 register window.

Losing out on the leaf level procedure's typical lack of need for any
but temporary registers.

A better method would support a variable size save-set that is independent
of CALL & RET so you can call leaf routines which require no registers saved.

This requires too much logic in the register file decoder whereas SPARC RW only required a 2-bit numeric adder.

- Sparc used traps for overflow/underflow management, which made it expensive. Also one of the windows had to be reserved for the trap handler
so in practice there were only 3 save-sets.

SPARC Register windows were "well thought out" only in an academic sense.

- Kernel transitions have to save and restore some or all of the user
mode windows so it can use the windows in kernel mode, increasing the overhead for interrupts and exceptions.

Which is why nobody copied them. I suggest others not copy them either.


From Niklas Holsti@21:1/5 to EricP on Fri Feb 23 22:28:43 2024

On 2024-02-23 20:49, EricP wrote:

[snip]

After rummaging about for a while I have not been able to find the
papers that outlined all the issues with register windows (RW)
so I'll try to remember some that people have mentioned...

- it requires many more hardware registers but doesn't allow them to be accessed directly. Sparc required 120 physical registers but only 29
were architecturally available to a programmer. This was far more
significant issue back in the 1980's when RW was first introduced.
But still today it could double the number of physical registers.

- Sparc's fixed window size of 8 registers was considered very inefficient. The number of window save-sets was intended to be model specific but it turned out that too many algorithms wound up depending on the initial
size of 4 so that's where it stayed.

All the SPARC processors I have used (ERC32, LEON2) have 8 save-sets
(windows), of which one is typically reserved for trap handlers
(including the RW overflow/underflow handler), leaving 7 for the
application. This may be too few for some current SW that uses lots of
very small routines in very deep, rapidly see-sawing call-chains.

What algorithms would depend on the number of save-sets? No application algorithms should. The kernel's RW-handling operations may depend on it,
but that should not be a problem.

- Sparc's RW was coupled to CALL and RET so you could not call a routine without creating a new 8 register window.

No, the RW file is rotated by the SAVE and RESTORE instructions, not by
CALL and RET.
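
A rough sketch of how SAVE rotates the window, assuming the SPARC V8 layout: 8 globals plus, per window, 8 outs (r8-r15), 8 locals (r16-r23) and 8 ins (r24-r31), with a callee's ins overlapping its caller's outs, which leaves 16 unique registers per window. The physical numbering and the function names are illustrative only and do not describe any particular implementation.

```python
NGLOBALS = 8
REGS_PER_WINDOW = 16  # 8 locals + 8 outs; the ins are shared with a neighbour

def physical_registers(nwindows):
    """Total registers in the file for a given number of windows."""
    return NGLOBALS + REGS_PER_WINDOW * nwindows

def physical_index(reg, cwp, nwindows):
    """Map architectural register r0..r31 in window cwp to a file index."""
    if not 0 <= reg < 32:
        raise ValueError("architectural register must be in 0..31")
    if reg < NGLOBALS:
        return reg  # globals are not windowed
    offset = (cwp * REGS_PER_WINDOW + (reg - NGLOBALS)) % (REGS_PER_WINDOW * nwindows)
    return NGLOBALS + offset

def save(cwp, nwindows):
    """SAVE decrements the current window pointer, modulo the window count."""
    return (cwp - 1) % nwindows

nwindows = 8
caller_cwp = 3
callee_cwp = save(caller_cwp, nwindows)
caller_o0 = physical_index(8, caller_cwp, nwindows)   # caller's %o0
callee_i0 = physical_index(24, callee_cwp, nwindows)  # callee's %i0
```

Under this layout 7 windows give the 120 physical registers mentioned elsewhere in the thread, and 8 windows give 136. The caller's %o0 and the callee's %i0 land on the same file index, which is how arguments are passed without copying.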

One of the gcc ports to SPARC (from Gaisler Research) has an option not
to use register windows at all, and instead use the SPARC as a "flat", unwindowed 32-register processor. (I haven't used that mode.)

A better method would support a variable size save-set that is independent
of CALL & RET so you can call leaf routines which require no registers
saved.

Indeed, and a common optimization in SPARC code is not to use
SAVE/RESTORE for leaf routines.

- Sparc used traps for overflow/underflow management, which made it expensive.

The expense of course depends on how the trap is implemented. In the
SPARC applications I worked on (real-time, bare machine or real-time
kernel) the overhead to enter and leave the trap handler was minor
compared to the work to handle the underflow or overflow. With a "real"
OS the case may be different.

Also one of the windows had to be reserved for the trap handler
so in practice there were only 3 save-sets.

With 8 save-sets, reserving one is not a big problem.

- Kernel transitions have to save and restore some or all of the user
mode windows so it can use the windows in kernel mode, increasing the overhead for interrupts and exceptions.

Maybe so, I'm not sure what a "real" OS kernel would do here. I don't
think this was a problem in my SPARC applications with real-time kernels because kernel services were usually not called via traps, but as normal routines. The kernel had to save/restore the register windows of a given process/thread only when switching process/thread.

In my SPARC applications, the major drawback of SPARC register windows
was that every stack frame (for non-leaf routines) had to have space to
store a whole register window, some 100 octets. This made the required
stack size for each thread rather larger than one would expect from the
source code. This should not be a big problem with today's memory sizes,
but it might increase data-cache misses.

