• Base-Index Addressing in the Concertina II

    From John Savard@21:1/5 to All on Sat Jul 26 00:33:22 2025
    It occurred to me that I never really mounted a full defense of my
    decision to include base-index addressing with 16-bit displacements
    in the Concertina II ISA.

    Doing so definitely came with a cost. Although the Concertina II ISA uses
    banks of 32 registers, and when a block header is not present all of its
    instructions are 32 bits in length - perhaps making it a wolf in RISC
    clothing - only seven registers, rather than all thirty-one registers
    other than register zero, can be used as index registers, and only a
    different seven registers can be used as base registers with 16-bit
    displacements. This makes it abundantly clear that the architecture is
    not RISC.

    Here is my defense for this design decision:

    1) 16-bit displacements are _important_. Pretty well *all* microprocessors
    use 16-bit displacements, rather than anything shorter like 12 bits, even
    though this meant, in most cases, they had to give up indexing.

    2) Index registers were hailed as a great advancement in computers
    when they were added to them, as they allowed avoiding self-modifying
    code. Of course, if one just has base registers, one can still have
    a special base register, used for array accessing, where the base
    address of a segment has had the array offset added to it through
    separate arithmetic instructions.

    Array accesses are common, and not needing extra instructions for them is therefore beneficial.
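    The saving claimed above can be sketched as follows. This is a hypothetical
    model in Python, not Concertina II code; the register values and the flat
    address model are illustrative assumptions.

```python
# Hypothetical sketch of why base + index + displacement saves an
# instruction on an array access, compared to base-register-only
# addressing where the offset must be folded in by a separate ADD.

def ea_base_only(base, offset_reg):
    # Base register only: an ADD must first fold the array offset
    # into a scratch register, then the load uses that register.
    scratch = base + offset_reg   # the extra ADD instruction
    return scratch                # LOAD then uses [scratch]

def ea_base_index_disp(base, index, disp16):
    # Base + index + 16-bit displacement: one load forms the address.
    assert -32768 <= disp16 <= 32767
    return base + index + disp16

# Same effective address, one fewer instruction on the indexed path:
assert ea_base_index_disp(0x10000, 24, 0x200) == ea_base_only(0x10000 + 0x200, 24)
```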

    3) At least one major microprocessor manufacturer, Motorola, did have
    base-index addressing with 16-bit displacements, starting with the 68020.

    (3) may not be much of an argument, but it seems to me that (1) and (2)
    can reasonably be considered fairly strong arguments. But what about the drawbacks?

    If you have to have base registers in order to address memory, those
    base registers will cause register pressure no matter where they're
    put. The usual convention, at least on the System/360, was to put them
    near the end of the register bank (although the last three registers
    were special), so choosing the last seven registers for use as base
    registers with the most common displacement size seemed like it shouldn't confuse register allocation too much.

    Given, though, that one doesn't get absolute addressing - instead, one
    gets other address modes - by specifying base register zero, one can't
    put a pointer into an _index_ register on the Concertina II and get a
    useful result. Hence, instead of the convention being to return pointers
    to arrays in register 1, perhaps I'll have to use register 25. (However,
    there is a good argument for register 17 instead, since some instructions
    don't have 16-bit displacements available, for reasons of compactness,
    and must make do with 12-bit displacements. Thus, using the first base
    register for 12-bit displacements would be more universally useful.)

    John Savard

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to John Savard on Sat Jul 26 08:48:34 2025
    John Savard <quadibloc@invalid.invalid> schrieb:

    1) 16-bit displacements are _important_. Pretty well *all* microprocessors use 16-bit displacements, rather than anything shorter like 12 bits.

    That is a bit of an exaggeration - SPARC has 13-bit constants, RISC-V
    has 12-bit constants.

    Even
    though this meant, in most cases, they had to give up indexing.

    SPARC has both Ra+Rb and Ra+immediate, but not combined. The use
    case for Ra+Rb+13..16 bit is extremely limited.

    2) Index registers were hailed as a great advancement in computers
    when they were added to them, as they allowed avoiding self-modifying
    code.

    GP registers were an even greater achievement.

    Of course, if one just has base registers, one can still have
    a special base register, used for array accessing, where the base
    address of a segment has had the array offset added to it through
    separate arithmetic instructions.

    The usual method is Ra+Rb. Mitch also has Ra+Rb<<n+32-bit or
    Ra+Rb<<n+64-bit, with n from 0 to 3. This is useful for addressing
    global data encoded in the constant, which a 12 to 16 bit-offset
    is not.

    Array accesses are common, and not needing extra instructions for them is therefore beneficial.

    Yes, and indexing without offset can do that particular job just
    fine.

    3) At least one major microprocessor manufacturer, Motorola, did have
    base-index addressing with 16-bit displacements, starting with the 68020.

    Mitch recently explained that they had microarchitectural reasons.

    (3) may not be much of an argument, but it seems to me that (1) and (2)
    can reasonably be considered fairly strong arguments. But what about the drawbacks?

    I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
    above.

  • From John Savard@21:1/5 to Thomas Koenig on Sat Jul 26 13:18:48 2025
    On Sat, 26 Jul 2025 08:48:34 +0000, Thomas Koenig wrote:

    The use case
    for Ra+Rb+13..16 bit is extremely limited.

    Maybe so. It covers the case where multiple small arrays are located
    in the same kind of 64K-byte segment as simple variables.

    But what if arrays are larger than 64K in size? Well, in that case,
    I've included Array Mode in the standard form of a memory address.

    This is a kind of indirect addressing that uses a table of array
    addresses in memory to supply the address to which the index
    register contents are added.
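    The Array Mode lookup described above can be sketched as follows. The
    8-byte table-entry size is an assumption for illustration, not a
    Concertina II specification.

```python
# Illustrative sketch of Array Mode: the instruction selects an entry
# in an in-memory table of array base addresses, and the index register
# is added to the fetched pointer. Note the extra memory access.

WORD = 8  # assumed pointer size in bytes

def array_mode_ea(mem, table_base, entry, index_reg):
    pointer = mem[table_base + entry * WORD]  # indirect fetch from the table
    return pointer + index_reg

mem = {0x1000 + 0 * WORD: 0x200000,   # entry 0: base of array A
       0x1000 + 1 * WORD: 0x340000}   # entry 1: base of array B
assert array_mode_ea(mem, 0x1000, 1, 16) == 0x340010
```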

    I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
    above.

    You may not find the arguments strong.

    My goal was to provide the instruction set for a very powerful computer,
    so I included this addressing mode in order not to lack a feature, present
    on other computers, that clearly benefits performance.

    Base plus index plus displacement saves an instruction or two.

    16-bit displacements instead of 12-bit displacements ease register
    pressure.

    That was good enough for me, but of course others may take a different
    view of things.

    John Savard

  • From Thomas Koenig@21:1/5 to John Savard on Sat Jul 26 13:34:19 2025
    On 2025-07-26, John Savard <quadibloc@invalid.invalid> wrote:
    On Sat, 26 Jul 2025 08:48:34 +0000, Thomas Koenig wrote:

    The use case
    for Ra+Rb+13..16 bit is extremely limited.

    Maybe so. It covers the case where multiple small arrays are located
    in the same kind of 64K-byte segment as simple variables.

    The cost of having a register point to each array is much lower than
    the cost of crippling your ISA with the complicated base + index
    register scheme.


    But what if arrays are larger than 64K in size? Well, in that case,
    I've included Array Mode in the standard form of a memory address.

    This is a kind of indirect addressing that uses a table of array
    addresses in memory to supply the address to which the index
    register contents are added.

    Wait.

    Do you really want to have an extra memory access to access an
    array element, instead of loading the base address of your array
    into a register?

    This makes negative sense. Memory accesses, even from L1 cache,
    are very expensive these days.


    I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
    above.

    You may not find the arguments strong.

    My goal was to provide the instruction set for a very powerful computer,
    so I included this addressing mode in order not to lack a feature, present
    on other computers, that clearly benefits performance.

    Base plus index plus displacement saves an instruction or two.

    How often? Do you have any idea if you're talking about 1%, 0.1%,
    0.01% or 0.001%?

    If you want this to actually be useful, do as Mitch has done, and
    go a full 32-bit constant.

  • From John Savard@21:1/5 to Thomas Koenig on Sat Jul 26 17:43:58 2025
    On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:

    Wait.

    Do you really want to have an extra memory access to access an array
    element, instead of loading the base address of your array into a
    register?

    This makes negative sense. Memory accesses, even from L1 cache, are
    very expensive these days.

    I know that. But computers these days do have such a thing as cache,
    and giving the array pointer table a higher priority to cache because
    it's expected to be used a lot is doable.

    John Savard

  • From John Savard@21:1/5 to John Savard on Sat Jul 26 18:23:35 2025
    On Sat, 26 Jul 2025 17:43:58 +0000, John Savard wrote:

    I know that. But computers these days do have such a thing as cache, and giving the array pointer table a higher priority to cache because it's expected to be used a lot is doable.

    As it happens, you've given me an idea. And in preparing to edit my pages
    to note this new feature, I found some errors that I've corrected as
    well; I had not realized that my assignment of functions to the integer registers when used as base registers was different from what I thought it
    was - apparently, I failed to edit some out-of-date text.

    My idea? Well, I've shrunk the tables in memory used with Array Mode
    from 512 pointers to 384. That way, displacement values starting with 11 now
    indicate that a register - one of the 128 registers in the extended
    integer register bank - is being used to contain the pointer to an array.

    Since I have plenty of those, and they're usually only used in code
    intended to be massively superscalar, making use of VLIW features, and,
    for that matter, the extended floating register bank may also be used as
    a set of eight _string_ registers... it seemed like a good fit.
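    A sketch of the revised selector as described: a field that formerly
    indexed 512 table entries now indexes 384, with the values whose top two
    bits are 11 naming one of the 128 extended registers instead. The 9-bit
    field width is inferred from the 512-entry figure in the text.

```python
# Decode the revised Array Mode selector: values 0..383 index the
# in-memory pointer table; values 384..511 (top two bits 11) select
# one of the 128 extended integer registers as the array pointer.

def decode_array_selector(sel9):
    assert 0 <= sel9 < 512
    if sel9 >> 7 == 0b11:                          # top two bits are 11
        return ("extended_register", sel9 - 384)   # register 0..127
    return ("memory_table_entry", sel9)            # table entry 0..383

assert decode_array_selector(383) == ("memory_table_entry", 383)
assert decode_array_selector(384) == ("extended_register", 0)
assert decode_array_selector(511) == ("extended_register", 127)
```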

    John Savard

  • From John Savard@21:1/5 to Thomas Koenig on Sun Jul 27 00:31:41 2025
    On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:

    Do you really want to have an extra memory access to access an array
    element, instead of loading the base address of your array into a
    register?

    Obviously, loading the starting address of the array into a register is preferred.

    There are seven registers used as base registers with 16-bit displacements,
    and another seven registers used as base registers with 12-bit
    displacements.

    So, in a normal program that uses only one each of the former group of base registers for its code and data segments respectively, there are enough registers to handle twelve arrays.

    Array Mode is only for use when it's needed, because a program is either
    dealing with a lot of large arrays, or is under extreme register
    pressure in addition to dealing with some large arrays.

    John Savard

  • From John Savard@21:1/5 to Thomas Koenig on Sun Jul 27 01:36:57 2025
    On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:

    If you want this to actually be useful, do as Mitch has done, and go a
    full 32-bit constant.

    Well, I would actually need to use 64 bits if I wanted to include
    full-size memory addresses in instructions. I think that is too long.

    However, I _do_ have a class of extra-long instructions which use
    a displacement longer than 16 bits. I followed the example of a major
    computer manufacturer in doing so; thus, the longer displacement is
    still only 20 bits.

    John Savard

  • From Thomas Koenig@21:1/5 to John Savard on Sun Jul 27 07:15:02 2025
    John Savard <quadibloc@invalid.invalid> schrieb:
    On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:

    Wait.

    Do you really want to have an extra memory access to access an array
    element, instead of loading the base address of your array into a
    register?

    This makes negative sense. Memory accesses, even from L1 cache, are
    very expensive these days.

    I know that. But computers these days do have such a thing as cache,
    and giving the array pointer table a higher priority to cache because
    it's expected to be used a lot is doable.

    Doable how, in such a way that both memory accesses are as
    fast as a single one with the obvious scheme? For that,
    you would need zero overhead for loads from your special
    cache, which sounds slightly impossible.

    But leaving that aside for a moment: You also need cycles and
    instructions to set up that table. Is that less than loading the
    base address of an array into a register?

    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.

  • From John Savard@21:1/5 to Thomas Koenig on Sun Jul 27 07:28:57 2025
    On Sun, 27 Jul 2025 07:15:02 +0000, Thomas Koenig wrote:

    Doable how, in such a way that both memory accesses are as fast as a
    single one with the obvious scheme?

    Cache is storage inside the processor chip. So are registers. Cache will
    be slower, but not by as much as a normal memory access.

    But leaving that aside for a moment: You also need cycles and
    instructions to set up that table. Is that less than loading the base address of an array into a register?

    No, but it only happens once at the beginning.

    As I've noted, I agree that putting the start address in a register is
    better. When you can do that. When there are enough registers available.
    But what if you have more arrays than you have registers?

    You did give me an idea, though, as I noted, so I have fixed things so
    that this won't happen nearly as often - by making use of another silly
    feature of the Concertina II architecture. The Itanium had 128 registers
    so as to make the pipeline go faster, so I gave the Concertina II
    extended register banks of 128 registers each, and special instructions
    to use them as well. Since it isn't always practical to generate
    superscalar code that makes full use of all those registers for every
    application... well, now they can also be used as array pointers.

    John Savard

  • From John Savard@21:1/5 to BGB on Sun Jul 27 07:37:50 2025
    On Sat, 26 Jul 2025 23:18:58 -0500, BGB wrote:

    12 bits would be plenty, if it were scaled.
    Unscaled displacements effectively lose 2 or 3 bits of range for no real benefit.

    I had tried, in a few places, to allow the values in index registers to
    be scaled. As this isn't a common feature in CPU designs, though, I didn't
    use the opcode space to indicate this scaling in most instructions.

    But scaling *displacements* is an idea that had not even occurred to me.

    There is _one_ important benefit of not scaling the displacements which is
    made very evident by certain other characteristics of the Concertina II architecture.

    The System/360 had 12-bit displacements that were not scaled. As a result,
    when writing code for the System/360, it was known that each base register
    that was used would provide coverage of 4,096 bytes of memory. No more,
    no less, regardless of what type of data you referenced. So you knew when
    to allocate another base register with the address plus 4,096 in it if
    needed.
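    The allocation rule described above reduces to a ceiling division; a
    minimal sketch, assuming the System/360's unscaled 12-bit displacement:

```python
# Each base register covers exactly 4096 bytes (one unscaled 12-bit
# displacement's worth), so the number of base registers needed to
# cover a region is a simple ceiling division.

COVERAGE = 1 << 12  # 4096 bytes per base register

def base_registers_needed(region_size):
    return -(-region_size // COVERAGE)  # ceiling division

assert base_registers_needed(4096) == 1
assert base_registers_needed(4097) == 2   # one byte over: another base register
assert base_registers_needed(65536) == 16
```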

    In the Concertina II, I actually take the 32 registers, and in addition
    to taking the last seven of a group of eight as base registers for use
    with 16-bit displacements, I go with a different group of eight registers
    for use with 12-bit displacements, and a different one still for 20-bit
    displacements.

    Because there is no value in only being able to access the first 4,096
    bytes of a 65,536-byte region of memory. So instead of having addressing
    modes that do that silly thing, I have more base registers available; if
    you run out of 65,536-byte regions of memory to allocate, at least you can
    also allocate additional 4,096-byte regions of memory, and that might be
    useful.

    John Savard

  • From Thomas Koenig@21:1/5 to John Savard on Sun Jul 27 08:18:08 2025
    John Savard <quadibloc@invalid.invalid> schrieb:
    On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:

    If you want this to actually be useful, do as Mitch has done, and go a
    full 32-bit constant.

    Well, I would actually need to use 64 bits if I wanted to include
    full-size memory addresses in instructions.

    OK, Mitch has this.

    I think that is too long.

    Too long for what? It's simpler than any of the alternatives, and
    even x86_64 (which you are aiming to surpass in complexity, it seems)
    has 64-bit constants.

  • From Thomas Koenig@21:1/5 to John Savard on Sun Jul 27 08:17:00 2025
    John Savard <quadibloc@invalid.invalid> schrieb:
    On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:

    Do you really want to have an extra memory access to access an array
    element, instead of loading the base address of your array into a
    register?

    Obviously, loading the starting address of the array into a register is preferred.

    Then do so, there is no need to add something more complicated than
    necessary.

    There are seven registers used as base registers with 16-bit displacements, and another seven registers used as base registers with 12-bit
    displacements.

    I have not seen the use case for R1+R2+16 bit.

    So, in a normal program that uses only one each of the former group of base registers for its code and data segments respectively, there are enough registers to handle twelve arrays.

    Simultaneously?

    Array Mode is only for use when it's needed, because a program is either dealing with a lot of large arrays, or if it is under extreme register pressure in addition to dealing with some large arrays.

    s/only for use when it's/not/

  • From Thomas Koenig@21:1/5 to John Savard on Sun Jul 27 09:58:45 2025
    John Savard <quadibloc@invalid.invalid> schrieb:
    On Sun, 27 Jul 2025 07:15:02 +0000, Thomas Koenig wrote:

    Doable how, in such a way that both memory accesses are as fast as a
    single one with the obvious scheme?

    Cache is storage inside the processor chip. So are registers. Cache will
    be slower, but not by as much as a normal memory access.

    A "normal memory access" is hundreds of cycles. If you are operating
    on data outside the cache, that is a whole different game.

    What I am looking at is an inner loop.

    But leaving that aside for a moment: You also need cycles and
    instructions to set up that table. Is that less than loading the base
    address of an array into a register?

    No, but it only happens once at the beginning.

    Can you actually provide example code where this would matter?

    As I've noted, I agree that putting the start address in a register is better. When you can do that. When there are enough registers available.
    But what if you have more arrays than you have registers?

    One instruction per loop, one cycle latency per loop startup
    as opposed to 3-5 cycles latency on every loop iteration.

    If there is a universe and time in which this makes sense, it's
    not ours in 2025.

  • From John Savard@21:1/5 to John Savard on Sun Jul 27 10:51:43 2025
    On Sun, 27 Jul 2025 07:37:50 +0000, John Savard wrote:

    But scaling *displacements* is an idea that had not even occurred to me.

    I realize now that I wasn't thinking clearly when I said that. In fact,
    I used scaled displacements extensively in some previous iterations of
    the Concertina II architecture.

    I realized this when I noticed another problem with scaled displacements:
    they only allow aligned data to be addressed.

    But instead of scaling a displacement of constant size, in order to
    maintain the size of the segment to which a given base register points
    as a constant, I varied the size of the displacement based on the
    degree to which it was scaled...

    so if I had a 16-bit displacement for references to bytes, then I had
    a 15-bit one to address 16-bit halfwords, a 14-bit one to address 32-bit
    words, and a 13-bit one to address 64-bit long integers. This let me
    use only two sets of opcodes, rather than one for each variable type,
    for memory-reference instructions on integers.
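    The invariant behind that scheme can be checked directly: shrinking the
    displacement by one bit for each doubling of the operand size keeps every
    base register covering the same 65,536-byte segment.

```python
# Variable-width scaled displacements as described above: the reach in
# bytes (2^bits * operand size) is the same 64K segment for every type.

for disp_bits, operand_bytes in [(16, 1),   # bytes
                                 (15, 2),   # 16-bit halfwords
                                 (14, 4),   # 32-bit words
                                 (13, 8)]:  # 64-bit long integers
    reach = (1 << disp_bits) * operand_bytes
    assert reach == 65536
```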

    John Savard

  • From Thomas Koenig@21:1/5 to EricP on Sun Jul 27 15:31:20 2025
    EricP <ThatWouldBeTelling@thevillage.com> schrieb:
    BGB wrote:

    One possible downside of scaled displacements is that they can't
    directly encode a misaligned load or store. But this is rare.

    Scaled displacements are also restricted in what they can address for
    RIP-relative addressing if the RIP has alignment restrictions
    (instructions aligned on 16 or 32 bits).

    Scaled displacements also create asymmetries. Consider

    struct foo {
        int a;
        char b, c, d, e;
    };

    Accessing a might work, but b, c, d, or e might not.
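    A sketch of that asymmetry, on one reading of the objection: assume
    displacements are scaled by the 8-byte register width. The field offsets
    follow the struct in the text (a 4-byte int and no padding assumed).

```python
# With displacements scaled by a fixed factor, only offsets that are
# multiples of the scale can be expressed directly; other fields need
# an unscaled addressing form or extra address arithmetic.

SCALE = 8  # assumed: displacements scaled by the 8-byte register width
offsets = {"a": 0, "b": 4, "c": 5, "d": 6, "e": 7}

def directly_encodable(offset):
    # A scaled displacement can only express multiples of the scale.
    return offset % SCALE == 0

assert directly_encodable(offsets["a"])                         # a works
assert not any(directly_encodable(offsets[f]) for f in "bcde")  # b..e do not
```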


  • From Stefan Monnier@21:1/5 to All on Sun Jul 27 11:59:00 2025
    One possible downside of scaled displacements is that they can't directly encode a misaligned load or store. But this is rare.

    Much more frequent is misaligned pointers to aligned data, which are
    used quite commonly in dynamically-typed languages when you use
    a non-zero tag for boxed objects.
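    A sketch of that situation: dynamically-typed implementations often keep
    a small type tag in the low bits of a pointer to an aligned object, so the
    pointer value itself is misaligned. The tag value 1 here is an
    illustrative choice, not taken from any particular runtime.

```python
# A tagged pointer to an aligned object: the raw pointer value is
# misaligned, so a scaled displacement cannot be applied to it directly;
# the tag must be stripped (or folded into an unscaled offset) first.

TAG = 1                      # assumed tag for boxed objects
obj_base = 0x4000            # 8-byte-aligned heap object
tagged = obj_base | TAG      # the pointer the program actually carries

assert tagged % 8 != 0                 # misaligned as a raw address
field_offset = 16
ea = (tagged - TAG) + field_offset     # untag, then apply the offset
assert ea % 8 == 0                     # the real access is aligned
```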

    I think it would be in keeping with John's tradition to "not choose" and instead use one bit to indicate whether the displacement should
    be scaled.


    Stefan

  • From EricP@21:1/5 to BGB on Sun Jul 27 11:20:01 2025
    BGB wrote:

    One possible downside of scaled displacements is that they can't
    directly encode a misaligned load or store. But this is rare.

    Scaled displacements are also restricted in what they can address for
    RIP-relative addressing if the RIP has alignment restrictions
    (instructions aligned on 16 or 32 bits).

  • From John Savard@21:1/5 to Stefan Monnier on Sun Jul 27 17:51:39 2025
    On Sun, 27 Jul 2025 11:59:00 -0400, Stefan Monnier wrote:

    I think it would be in keeping with John's tradition to "not choose" and instead use one bit to indicate whether the displacement should be
    scaled.

    Since the way I had used scaled displacements was to shorten the amount
    of opcode space needed by varying the displacement length based on the
    variable type... I would use a bit to indicate scaled displacements, but
    it would be in the header, to note which instruction set I'm using in
    this block.

    In one earlier version of Concertina II, however, along with a full set of aligned load-store instructions, I allocated some opcode space to
    load-store instructions with full-width plain displacements - which could
    only work with the first eight of the thirty-two registers. So the
    programmer did have both alternatives available.

    John Savard

  • From Stephen Fuld@21:1/5 to All on Sun Jul 27 22:29:13 2025
    On 7/27/2025 3:50 AM, Robert Finch wrote:

    big snip

    First, thanks for posting this. I don't recall you posting much about
    your design. Can you talk about its goals, why you are doing it, its
    status, etc.?

    Specific comments below
    My current design fuses a maximum of one memory op into instructions instead
    of having a load followed by the instruction (or an instruction followed
    by a store). Address modes available without adding instruction words are
    Rn, (Rn), (Rn)+, -(Rn). After that, 32-bit instruction words are added to
    support 32- and 64-bit displacements or addresses.

    The combined mem-op instructions used to be popular, but since the RISC revolution, are now out of fashion. Their advantages are, as you state,
    often eliminating an instruction. The disadvantages include that they
    preclude scheduling the load earlier in the instruction stream. Do you
    "crack" the instruction into two micro-ops in the decode stage? What
    drove your decision to "buck" the trend. I am not saying you are wrong.
    I just want to understand your reasoning.



    The instructions with the extra displacement words are larger but there
    are fewer instructions to execute.
      LOAD Rd,[Rb+Disp16]
      ADD Rd,Ra,Rd
    Requiring two instruction words, and executing as two instructions, gets replaced with:
      ADD Rd,Ra,[Rb+Disp32]
    Which also takes two instruction words, but only one instruction.

    Immediate operands and memory operands are routed according to two
    two-bit routing fields. I may be able to compress this to a single
    three-bit field.

    Yes. For example, unless you are doing some special case thing, having
    both registers be immediates probably doesn't make sense. And unless
    you are going to allow two memory references in one instruction, that combination doesn't make sense.


    Typical instruction encoding:
    ADD: oooooo ss xx ii ww mmm rrrrr rrrrr ddddd
    oooooo: is the opcode
    ss: is the operation size
    xx: is two extra opcode bits
    ii: indicates which register field represents an immediate value
    ww: indicates which register field is a memory operand
    mmm: is the addressing mode, similar to a 68k
    rrrrr: source register spec (or 4+ bit immediate)
    ddddd: destination register spec (or 4+ bit immediate)
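    The listed field widths do add up to one 32-bit word, which can be checked
    with a small packing sketch. The left-to-right order and widths are taken
    from the post; the numeric field values below are arbitrary examples.

```python
# Pack the named fields of the ADD encoding into a single 32-bit word,
# in the left-to-right order given: 6+2+2+2+2+3+5+5+5 = 32 bits.

FIELDS = [("oooooo", 6), ("ss", 2), ("xx", 2), ("ii", 2),
          ("ww", 2), ("mmm", 3), ("ra", 5), ("rb", 5), ("rd", 5)]

assert sum(w for _, w in FIELDS) == 32  # the listed fields fill 32 bits

def encode(values):
    word = 0
    for name, width in FIELDS:
        v = values[name]
        assert 0 <= v < (1 << width)    # each value fits its field
        word = (word << width) | v
    return word

ins = encode({"oooooo": 0b000001, "ss": 3, "xx": 0, "ii": 0,
              "ww": 0, "mmm": 0, "ra": 1, "rb": 2, "rd": 3})
assert ins >> 26 == 0b000001  # the opcode lands in the top six bits
assert ins & 0x1F == 3        # rd lands in the bottom five bits
```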

    A 36-bit opcode would work great, allowing operand sign control.

    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From MitchAlsup@21:1/5 to All on Sat Aug 23 18:16:57 2025
    Thomas Koenig <tkoenig@netcologne.de> posted:

    John Savard <quadibloc@invalid.invalid> schrieb:

    1) 16-bit displacements are _important_. Pretty well *all* microprocessors use 16-bit displacements, rather than anything shorter like 12 bits.

    That is a bit of an exaggeration - SPARC has 13-bit constants, RISC-V
    has 12-bit constants.

    Even
    though this meant, in most cases, they had to give up indexing.

    SPARC has both Ra+Rb and Ra+immediate, but not combined. The use
    case for Ra+Rb+13..16 bit is extremely limited.

    2) Index registers were hailed as a great advancement in computers
    when they were added to them, as they allowed avoiding self-modifying
    code.

    GP registers were an even greater achievement.

    Of course, if one just has base registers, one can still have
    a special base register, used for array accessing, where the base
    address of a segment has had the array offset added to it through
    separate arithmetic instructions.

    The usual method is Ra+Rb. Mitch also has Ra+Rb<<n+32-bit or Ra+Rb<<n+64-bit, with n from 0 to 3. This is useful for addressing
    global data encoded in the constant, which a 12 to 16 bit-offset
    is not.

    Array accesses are common, and not needing extra instructions for them is therefore beneficial.

    Yes, and indexing without offset can do that particular job just
    fine.

    3) At least one major microprocessor manufacturer, Motorola, did have base-index addressing with 16-bit displacements, starting with the 68020.

    Mitch recently explained that they had microarchitectural reasons.

    '020 address modes have scaling with base and displacement. They did
    not cut too soon.

    (3) may not be much of an argument, but it seems to me that (1) and (2)
    can reasonably be considered fairly strong arguments. But what about the drawbacks?

    I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
    above.

    [Ra+Rb+Disp16] has the same gate delay as [Rb+Ri<<3+Disp64] and is not
    as powerful, nor can it reach all of the virtual address space.

    The only thing in this corner of my ISA I regret is not having more bits
    for the scale {to cover complex double, and Quaternions}
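    The scaled-index mode described here can be sketched as effective address
    Rb + (Ri << n) + disp, with the two scale bits giving n in 0..3 (1-, 2-,
    4- and 8-byte strides). The regret in the text is that two bits stop at
    8-byte strides, short of 16-byte elements such as complex doubles.

```python
# Effective-address computation for a base + scaled-index + displacement
# mode: the index register is shifted by the element-size scale before
# being added to the base and the displacement.

def ea_scaled(rb, ri, n, disp):
    assert 0 <= n <= 3  # two scale bits: strides of 1, 2, 4, or 8 bytes
    return rb + (ri << n) + disp

# element 5 of an array of 8-byte doubles starting at rb + disp:
assert ea_scaled(0x10000, 5, 3, 0x100) == 0x10000 + 5 * 8 + 0x100
```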

  • From MitchAlsup@21:1/5 to All on Sat Aug 23 18:25:09 2025
    Robert Finch <robfi680@gmail.com> posted:
    <snip>

    My current design fuses a max of one memory op into instructions instead
    of having a load followed by the instruction (or an instruction followed
    by a store). Address mode available without adding instruction words are
    Rn, (Rn), (Rn)+, -(Rn). After that 32-bit instruction words are added to support 32 and 64-bit displacements or addresses.

    The instructions with the extra displacement words are larger but there
    are fewer instructions to execute.
    LOAD Rd,[Rb+Disp16]
    ADD Rd,Ra,Rd
    Requiring two instruction words, and executing as two instructions, gets replaced with:
    ADD Rd,Ra,[Rb+Disp32]
    Which also takes two instruction words, but only one instruction.

    I have been using the term "instruction-specifier" for the first word of
    an instruction, and "instruction" for all of the words of an instruction. Instruction-specifier contains everything about the instruction except
    for the constants.

    Since you and I are the only "RISCs" with VLE, we (WE) should get our terminology aligned.

    Immediate operands and memory operands are routed according to two
    two-bit routing fields. I may be able to compress this to a single
    three-bit field.

    Typical instruction encoding:
    ADD: oooooo ss xx ii ww mmm rrrrr rrrrr ddddd
    oooooo: is the opcode
    ss: is the operation size
    xx: is two extra opcode bits
    ii: indicates which register field represents an immediate value
    ww: indicates which register field is a memory operand
    mmm: is the addressing mode, similar to a 68k
    rrrrr: source register spec (or 4+ bit immediate)
    ddddd: destination register spec (or 4+ bit immediate)

    A 36-bit opcode would work great, allowing operand sign control.

    I came to the same realization ...

  • From MitchAlsup@21:1/5 to All on Sat Aug 23 18:27:57 2025
    Robert Finch <robfi680@gmail.com> posted:

    On 2025-07-28 1:29 a.m., Stephen Fuld wrote:
    On 7/27/2025 3:50 AM, Robert Finch wrote:

    big snip

    First, thanks for posting this.  I don't recall you posting much about your design.  Can you talk about its goals, why you are doing it, its status, etc.?

    Just started the design. Lots of details to work out. I like some
    features of the 68k and 66k. I have some doubts about starting a new
    design. I would prefer to use something existing. I am not terribly
    fond of RISC designs though.

    Specific comments below
    My current design fuses a max of one memory op into instructions
    instead of having a load followed by the instruction (or an
    instruction followed by a store). Addressing modes available without
    adding instruction words are Rn, (Rn), (Rn)+, -(Rn). After that 32-bit
    instruction words are added to support 32 and 64-bit displacements or
    addresses.

    The combined mem-op instructions used to be popular, but since the RISC
    revolution they are now out of fashion. Their advantage is, as you
    state, often eliminating an instruction. The disadvantages include that
    they preclude scheduling the load earlier in the instruction stream. Do
    you "crack" the instruction into two micro-ops in the decode stage?
    What drove your decision to "buck" the trend? I am not saying you are
    wrong; I just want to understand your reasoning.

    Instructions will be cracked into micro-ops. My compiler does not do
    instruction scheduling (yet), relying on the processor to schedule
    instructions. There are explicit load and store instructions which
    should allow scheduling earlier in the instruction stream.
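
    The cracking of the fused form shown earlier in the thread (ADD
    Rd,Ra,[Rb+Disp]) can be sketched as a decoder fragment. Everything here
    is hypothetical: the structure layout, the temp-register convention,
    and the function names are invented for illustration:

    ```c
    #include <assert.h>

    /* Hypothetical micro-op kinds and layout. */
    enum uop_kind { UOP_LOAD, UOP_ADD };

    struct uop {
        enum uop_kind kind;
        int dst, src1, src2;   /* register numbers; -1 = unused */
        long disp;             /* displacement for the load */
    };

    #define TEMP_REG 32        /* assumed temp outside the 0..31 file */

    /* Crack ADD rd, ra, [rb + disp] into a load and an add; returns count. */
    static int crack_fused_add(int rd, int ra, int rb, long disp,
                               struct uop out[2])
    {
        out[0] = (struct uop){ UOP_LOAD, TEMP_REG, rb, -1, disp };
        out[1] = (struct uop){ UOP_ADD,  rd, ra, TEMP_REG, 0 };
        return 2;
    }
    ```

    Because the load micro-op exists separately after the crack, the FU
    schedulers can hoist it just as they would an explicit load, which is
    one answer to the scheduling objection above.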

    Once the Fetch-Issue width is greater than 2, compiler scheduling is
    anathema--just let the GBOoO FU schedulers do it.

    I am under the impression that with a micro-op based processor the ISA
    (RISC/CISC) becomes somewhat less relevant, allowing more flexibility
    in the ISA design.

    There is always the complexity budget ...

  • From Thomas Koenig@21:1/5 to MitchAlsup on Sat Aug 23 18:40:10 2025
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    The only thing in this corner of my ISA I regret is not having more bits
    for the scale {to cover complex double, and Quaternions}

    There is a bit of inconvenience, but strength reduction can go a
    long way to bridge that gap. Consider something like

    void foo (__complex double *c, double *d, long int n)
    {
        for (long int i=0; i<n; i++)
            c[i] += d[i];
    }

    which could be something like (translated by hand, so errors
    are likely)

    foo:
        ble0    r3,.L_end
        mov     r4,#0
        sll     r3,r3,#3
        vec     r5,{}
        ldd     r6,[r2,r4,0]
        ldd     r7,[r1,r4<<2,0]
        fadd    r7,r7,r6
        std     r7,[r1,r4<<2,0]
        loop1   ne,r4,#4,r3
    .L_end:
        ret

    which is as close to optimum (just a single sll instruction) as
    not to matter.
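
    The same strength reduction can be expressed at the source level:
    instead of scaled indexing c[i] (scale 16 for a complex double, which a
    small scale field cannot encode), bump each pointer by its own element
    size every iteration. A sketch; foo_sr is a hypothetical name, written
    with standard C99 complex types:

    ```c
    #include <complex.h>
    #include <assert.h>

    /* Strength-reduced form of foo: no scaled indexing, just pointer
     * bumps. c advances by 16 bytes per iteration, d by 8. */
    static void foo_sr(double complex *c, const double *d, long n)
    {
        for (; n > 0; n--)
            *c++ += *d++;
    }
    ```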

    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.

  • From MitchAlsup@21:1/5 to All on Wed Aug 27 00:19:02 2025
    Thomas Koenig <tkoenig@netcologne.de> posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    The only thing in this corner of my ISA I regret is not having more bits
    for the scale {to cover complex double, and Quaternions}

    There is a bit of inconvenience, but strength reduction can go a
    long way to bridge that gap. Consider something like

    void foo (__complex double *c, double *d, long int n)
    {
        for (long int i=0; i<n; i++)
            c[i] += d[i];
    }

    Wondering why "c[i] += d[i];" did not get a type mismatch.

    Should be "c[i].real += d[i];"


  • From Thomas Koenig@21:1/5 to MitchAlsup on Wed Aug 27 05:08:08 2025
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Wondering why "c[i] += d[i];" did not get a type mismatch.

    Should be "c[i].real += d[i];"

    C's implicit conversion rules.
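
    Concretely: the usual arithmetic conversions promote the double operand
    to double complex (with zero imaginary part), so "c[i] += d[i]" is
    well-typed and adds d[i] to the real part only. A minimal check in
    standard C (the helper name is invented):

    ```c
    #include <complex.h>
    #include <assert.h>

    /* The double operand is implicitly converted to (d + 0.0*I),
     * so only the real part of c changes. */
    static double complex add_real(double complex c, double d)
    {
        c += d;
        return c;
    }
    ```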
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
