1) 16-bit displacements are _important_. Pretty well *all* microprocessors use 16-bit displacements, rather than anything shorter like 12 bits.
Even
though this meant, in most cases, they had to give up indexing.
2) Index registers were hailed as a great advancement in computers
when they were added to them, as they allowed avoiding self-modifying
code.
Of course, if one just has base registers, one can still have
a special base register, used for array accessing, where the base
address of a segment has had the array offset added to it through
separate arithmetic instructions.
Array accesses are common, and not needing extra instructions for them is therefore beneficial.
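The cost of doing array indexing with base registers alone can be sketched in C: without a base+index mode, each indexed access needs explicit address arithmetic before the load, which a base+index mode folds into one memory instruction. This is an illustrative sketch of the address computation, not any particular ISA:

```c
#include <assert.h>
#include <stddef.h>

/* With base+displacement addressing only, an indexed access costs an
 * explicit ADD to form base+offset, then a LOAD through the result.
 * A base+index addressing mode performs both in one memory operation. */
static double load_base_only(const double *seg_base, size_t offset_bytes)
{
    const char *p = (const char *)seg_base + offset_bytes; /* the extra ADD */
    return *(const double *)(const void *)p;               /* the LOAD */
}
```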
3) At least one major microprocessor manufacturer, Motorola, did have base-index addressing with 16-bit displacements, starting with the 68020.
(3) may not be much of an argument, but it seems to me that (1) and (2)
can reasonably be considered fairly strong arguments. But what about the drawbacks?
The use case
for Ra+Rb+13..16 bit is extremely limited.
I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
above.
On Sat, 26 Jul 2025 08:48:34 +0000, Thomas Koenig wrote:
The use case
for Ra+Rb+13..16 bit is extremely limited.
Maybe so. It covers the case where multiple small arrays are located
in the same kind of 64K-byte segment as simple variables.
But what if arrays are larger than 64K in size? Well, in that case,
I've included Array Mode in the standard form of a memory address.
This is a kind of indirect addressing that uses a table of array
addresses in memory to supply the address to which the index
register contents are added.
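As I read that description, Array Mode's effective-address calculation might look like the following C sketch; the table layout and the way an entry is selected are my assumptions, not details from the post:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of Array Mode: the instruction names an entry in an in-memory
 * table of array base addresses; the hardware fetches that base (an extra
 * memory access) and adds the index register to form the final address. */
static uint64_t array_mode_ea(const uint64_t *array_table, unsigned entry,
                              uint64_t index_reg)
{
    uint64_t base = array_table[entry]; /* the extra memory access */
    return base + index_reg;            /* final effective address */
}
```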
I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
above.
You may not find the arguments strong.
My goal was to provide the instruction set for a very powerful computer, so I included this addressing mode so as not to lack a feature, clearly beneficial to performance, that other computers had.
Base plus index plus displacement saves an instruction or two.
Wait.
Do you really want to have an extra memory access to access an array
element, instead of loading the base address of your array into a
register?
This makes negative sense. Memory accesses, even from L1 cache, are
very expensive these days.
I know that. But computers these days do have such a thing as cache, and giving the array pointer table a higher priority in cache, because it's expected to be used a lot, is doable.
Do you really want to have an extra memory access to access an array
element, instead of loading the base address of your array into a
register?
If you want this to actually be useful, do as Mitch has done, and go a
full 32-bit constant.
On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:
Wait.
Do you really want to have an extra memory access to access an array
element, instead of loading the base address of your array into a
register?
This makes negative sense. Memory accesses, even from L1 cache, are
very expensive these days.
I know that. But computers these days do have such a thing as cache,
and giving the array pointer table a higher priority in cache, because
it's expected to be used a lot, is doable.
Doable how, in such a way that both memory accesses are as fast as a
single one with the obvious scheme?
But leaving that aside for a moment: You also need cycles and
instructions to set up that table. Is that less than loading the base address of an array into a register?
12 bits would be plenty, if it were scaled.
Unscaled displacements effectively lose 2 or 3 bits of range for no real benefit.
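The arithmetic behind that claim can be sketched in C: scaling an n-bit displacement by the operand size multiplies its reach by that size, so a 12-bit displacement scaled by 8-byte operands reaches as far as a 15-bit unscaled one.

```c
#include <assert.h>
#include <stdint.h>

/* Reach in bytes of an n-bit unsigned displacement, optionally scaled by
 * the operand size. Scaling a 12-bit displacement by 8-byte operands
 * recovers the 3 bits of range that unscaled displacements give up. */
static uint64_t reach_bytes(unsigned disp_bits, unsigned scale_bytes)
{
    return ((uint64_t)1 << disp_bits) * scale_bytes;
}
```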
On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:
If you want this to actually be useful, do as Mitch has done, and go a
full 32-bit constant.
Well, I would actually need to use 64 bits if I wanted to include
full-size memory addresses in instructions.
I think that is too long.
On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:
Do you really want to have an extra memory access to access an array
element, instead of loading the base address of your array into a
register?
Obviously, loading the starting address of the array into a register is preferred.
There are seven registers used as base registers with 16-bit displacements, and another seven registers used as base registers with 12-bit
displacements.
So, in a normal program that uses only one each of the former group of base registers for its code and data segments respectively, there are enough registers to handle twelve arrays.
Array Mode is only for use when it's needed: when a program is either dealing with a lot of large arrays, or under extreme register pressure in addition to dealing with some large arrays.
On Sun, 27 Jul 2025 07:15:02 +0000, Thomas Koenig wrote:
Doable how, in such a way that both memory accesses are as fast as a
single one with the obvious scheme?
Cache is storage inside the processor chip. So are registers. Cache will
be slower, but not by as much as a normal memory access.
But leaving that aside for a moment: You also need cycles and
instructions to set up that table. Is that less than loading the base
address of an array into a register?
No, but it only happens once at the beginning.
As I've noted, I agree that putting the start address in a register is better. When you can do that. When there are enough registers available.
But what if you have more arrays than you have registers?
But scaling *displacements* is an idea that had not even occurred to me.
BGB wrote:
One possible downside of scaled displacements is that they can't
directly encode a misaligned load or store. But this is rare.
Scaled displacements are also restricted in what they can address for
RIP-relative addressing if the RIP has alignment restrictions
(instructions aligned on 16 or 32 bits).
I think it would be in keeping with John's tradition to "not choose" and instead use one bit to indicate whether the displacement should be
scaled.
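That one-bit scheme might be decoded as in the following sketch (field names are hypothetical): the bit selects between a raw byte displacement, which keeps misaligned accesses encodable, and one scaled by the operand size.

```c
#include <assert.h>
#include <stdint.h>

/* One mode bit chooses whether the encoded displacement is used as raw
 * bytes or scaled by the operand size (whose log2 comes from the size
 * field), so the misaligned case remains encodable. */
static int64_t effective_disp(int64_t raw, unsigned size_log2, int scaled)
{
    return scaled ? (raw << size_log2) : raw;
}
```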
My current design fuses a max of one memory op into instructions instead
of having a load followed by the instruction (or an instruction followed
by a store). Addressing modes available without adding instruction words are
Rn, (Rn), (Rn)+, -(Rn). After that, 32-bit instruction words are added to support 32- and 64-bit displacements or addresses.
The instructions with the extra displacement words are larger but there
are fewer instructions to execute.
LOAD Rd,[Rb+Disp16]
ADD Rd,Ra,Rd
Requiring two instruction words, and executing as two instructions, gets replaced with:
ADD Rd,Ra,[Rb+Disp32]
Which also takes two instruction words, but only one instruction.
Immediate operands and memory operands are routed according to two two-bit routing fields. I may be able to compress this to a single three-bit field.
Typical instruction encoding:
ADD: oooooo ss xx ii ww mmm rrrrr rrrrr ddddd
oooooo: is the opcode
ss: is the operation size
xx: is two extra opcode bits
ii: indicates which register field represents an immediate value
ww: indicates which register field is a memory operand
mmm: is the addressing mode, similar to a 68k
rrrrr: source register spec (or 4+ bit immediate)
ddddd: destination register spec (or 4+ bit immediate)
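Those fields sum to 32 bits, and extracting them might look like the following sketch. The MSB-first packing in the order listed is my assumption, not something stated in the post:

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction for the 32-bit encoding above, assuming the fields are
 * packed MSB-first in the order listed: 6+2+2+2+2+3+5+5+5 = 32 bits. */
struct insn {
    unsigned opcode, size, xop, imm_sel, mem_sel, mode, rs1, rs2, rd;
};

static struct insn decode(uint32_t w)
{
    struct insn i;
    i.opcode  = (w >> 26) & 0x3f;  /* oooooo */
    i.size    = (w >> 24) & 0x3;   /* ss     */
    i.xop     = (w >> 22) & 0x3;   /* xx     */
    i.imm_sel = (w >> 20) & 0x3;   /* ii     */
    i.mem_sel = (w >> 18) & 0x3;   /* ww     */
    i.mode    = (w >> 15) & 0x7;   /* mmm    */
    i.rs1     = (w >> 10) & 0x1f;  /* rrrrr  */
    i.rs2     = (w >> 5)  & 0x1f;  /* rrrrr  */
    i.rd      = w & 0x1f;          /* ddddd  */
    return i;
}
```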
A 36-bit opcode would work great, allowing operand sign control.
John Savard <quadibloc@invalid.invalid> schrieb:
1) 16-bit displacements are _important_. Pretty well *all* microprocessors use 16-bit displacements, rather than anything shorter like 12 bits.
That is a bit of an exaggeration - SPARC has 13-bit constants, RISC-V
has 12-bit constants.
Even
though this meant, in most cases, they had to give up indexing.
SPARC has both Ra+Rb and Ra+immediate, but not combined. The use
case for Ra+Rb+13..16 bit is extremely limited.
2) Index registers were hailed as a great advancement in computers
when they were added to them, as they allowed avoiding self-modifying
code.
GP registers were an even greater achievement.
Of course, if one just has base registers, one can still have
a special base register, used for array accessing, where the base
address of a segment has had the array offset added to it through
separate arithmetic instructions.
The usual method is Ra+Rb. Mitch also has Ra+Rb<<n+32-bit or Ra+Rb<<n+64-bit, with n from 0 to 3. This is useful for addressing
global data encoded in the constant, which a 12 to 16 bit-offset
is not.
Array accesses are common, and not needing extra instructions for them is therefore beneficial.
Yes, and indexing without offset can do that particular job just
fine.
3) At least one major microprocessor manufacturer, Motorola, did have base-index addressing with 16-bit displacements, starting with the 68020.
Mitch recently explained that they had microarchitectural reasons for it.
(3) may not be much of an argument, but it seems to me that (1) and (2)
can reasonably be considered fairly strong arguments. But what about the drawbacks?
I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
above.
My current design fuses a max of one memory op into instructions instead
of having a load followed by the instruction (or an instruction followed
by a store). Addressing modes available without adding instruction words are
Rn, (Rn), (Rn)+, -(Rn). After that, 32-bit instruction words are added to support 32- and 64-bit displacements or addresses.
The instructions with the extra displacement words are larger but there
are fewer instructions to execute.
LOAD Rd,[Rb+Disp16]
ADD Rd,Ra,Rd
Requiring two instruction words, and executing as two instructions, gets replaced with:
ADD Rd,Ra,[Rb+Disp32]
Which also takes two instruction words, but only one instruction.
Immediate operands and memory operands are routed according to two
two-bit routing fields. I may be able to compress this to a single
three-bit field.
Typical instruction encoding:
ADD: oooooo ss xx ii ww mmm rrrrr rrrrr ddddd
oooooo: is the opcode
ss: is the operation size
xx: is two extra opcode bits
ii: indicates which register field represents an immediate value
ww: indicates which register field is a memory operand
mmm: is the addressing mode, similar to a 68k
rrrrr: source register spec (or 4+ bit immediate)
ddddd: destination register spec (or 4+ bit immediate)
A 36-bit opcode would work great, allowing operand sign control.
On 2025-07-28 1:29 a.m., Stephen Fuld wrote:
On 7/27/2025 3:50 AM, Robert Finch wrote:
big snip
First, thanks for posting this. I don't recall you posting much about your design. Can you talk about its goals, why you are doing it, its status, etc.?
Just started the design. Lots of details to work out. I like some
features of the 68k and 66k. I have some doubts about starting a new
design; I would prefer to use something existing. I am not terribly fond
of RISC designs, though.
Specific comments below
My current design fuses a max of one memory op into instructions
instead of having a load followed by the instruction (or an
instruction followed by a store). Address mode available without
adding instruction words are Rn, (Rn), (Rn)+, -(Rn). After that 32-bit
instruction words are added to support 32 and 64-bit displacements or
addresses.
The combined mem-op instructions used to be popular, but since the RISC revolution are now out of fashion. Their advantage is, as you state, often eliminating an instruction. The disadvantages include that they preclude scheduling the load earlier in the instruction stream. Do you "crack" the instruction into two micro-ops in the decode stage? What drove your decision to "buck" the trend? I am not saying you are wrong; I just want to understand your reasoning.
Instructions will be cracked into micro-ops. My compiler does not do instruction scheduling (yet), relying instead on the processor to schedule instructions. There are explicit load and store instructions which should allow scheduling earlier in the instruction stream.
I am under the impression that with a micro-op based processor the ISA (RISC/CISC) becomes somewhat less relevant, allowing more flexibility in the ISA design.
The only thing in this corner of my ISA I regret is not having more bits
for the scale {to cover complex double, and Quaternions}
MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:
The only thing in this corner of my ISA I regret is not having more bits for the scale {to cover complex double, and Quaternions}
There is a bit of inconvenience, but strength reduction can go a
long way to bridge that gap. Consider something like
void foo (__complex double *c, double *d, long int n)
{
    for (long int i=0; i<n; i++)
        c[i] += d[i];
}
which could be something like (translated by hand, so errors
are likely)
foo:
        ble0    r3,.L_end
        mov     r4,#0
        sll     r3,r3,#3
        vec     r5,{}
        ldd     r6,[r2,r4,0]
        ldd     r7,[r1,r4<<2,0]
        fadd    r7,r7,r6
        std     r7,[r1,r4<<2,0]
        loop1   ne,r4,#4,r3
.L_end:
        ret
which is so close to optimal (just a single extra sll instruction) as
not to matter.
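In C terms, the strength reduction being described amounts to replacing per-iteration index scaling with pointer bumps. The following is a sketch, assuming the usual interleaved real/imaginary layout of complex doubles and, per the follow-up below about the type mismatch, adding d[i] to the real part only:

```c
#include <assert.h>

/* Strength-reduced form of the loop: instead of computing c + 16*i and
 * d + 8*i each iteration, bump a pointer by the complex stride.
 * c_interleaved points at interleaved {real, imag} pairs. */
static void foo_reduced(double *c_interleaved, const double *d, long n)
{
    for (long i = 0; i < n; i++) {
        *c_interleaved += d[i]; /* real part of c[i] */
        c_interleaved += 2;     /* one complex double = 16 bytes */
    }
}
```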
Thomas Koenig <tkoenig@netcologne.de> posted:
MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:
The only thing in this corner of my ISA I regret is not having more bits >> > for the scale {to cover complex double, and Quaternions}
There is a bit of inconvenience, but strength reduction can go a
long way to bridge that gap. Consider something like
void foo (__complex double *c, double *d, long int n)
{
    for (long int i=0; i<n; i++)
        c[i] += d[i];
}
Wondering why "c[i] += d[i];" did not get a type mismatch.
Should be "c[i].real += d[i];"