I've noted earlier that I felt I had been going around in circles with
Concertina II, changing the instruction format back and forth instead
of making progress in fleshing it out.
Recently, I added a new instruction to facilitate looping.
But the trouble was that it took up too much opcode space.
One thing that occurred to me was that if I went back to an old method
of specifying instructions longer than 32 bits, using a 4-bit pSupp
field to point into the same reserved area in the block as is used for
pseudo-immediates, it would suit this instruction very well.
The reason is that, if that technique were used, I could use the
header that's also an instruction to squeeze in just the three-bit
decode field, and so access to the Loop instruction would be easy, as
befits its importance.
Then I went back and looked up an older version of Concertina II
which had it. It had complicated block headers. But worse than that,
it had _four_ different versions of the complete instruction set!
Which version was used depended on the header. The idea, of course,
was that some headers required a pared-down version of the instruction
set so as to squeeze in more stuff.
It was also interesting to see how much further along I had gotten in fleshing out that older version of the instruction set.
John Savard
As to looping, I faced the same dilemma and came to a different
conclusion::
You don't do it in 1 instruction; instead, you do it in a way where your
2-instruction encoding executes one of the instructions only once. I call
this bookending the loop.
On Mon, 17 Jun 2024 20:20:03 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:
As to looping, I faced the same dilemma and came to a different
conclusion::
You don't do it in 1 instruction; instead, you do it in a way where your
2-instruction encoding executes one of the instructions only once. I call
this bookending the loop.
I considered something like that.
My problem was that encoding the parameters of the loop in one
instruction takes too much space. So the first thing I thought of was
to put some of them in the instruction that repeats the loop.
The problem was, though, that since the instruction that repeats the
loop points to the start of the loop in memory, it's a
memory-reference instruction, so there isn't much extra room left in
it.
However, there is a little room left, so I may indeed go back and
explore that possibility some more.
John Savard
On Mon, 17 Jun 2024 23:17:27 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:
No, it is not a memref--it is a return ! using the register from the
VEC instruction.
As should not surprise you, I was referring to the end-of-loop
instruction in my current Concertina II, not the one in your My 66000.
I try to avoid stacks, and reserving extra registers, as much as I
can.
On Tue, 18 Jun 2024 10:01:20 -0600, John Savard <quadibloc@servername.invalid> wrote:
On Mon, 17 Jun 2024 23:17:27 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:
No, it is not a memref--it is a return ! using the register from the
VEC instruction.
As should not surprise you, I was referring to the end-of-loop
instruction in my current Concertina II, not the one in your My 66000.
I try to avoid stacks, and reserving extra registers, as much as I
can.
Also, this looping instruction is strictly a way to directly encode
the FORTRAN DO loop. It does not attempt any vectorization.
At one point, in the original Concertina, I did have a sort of
loop/vectorize instruction with a functionality that may be somewhat
similar to your VVM. I am definitely going to look at adding that to Concertina II, as this will perhaps clarify the discussion.
John Savard
John Savard <quadibloc@servername.invalid> schrieb:
Also, this looping instruction is strictly a way to directly encode
the FORTRAN DO loop. It does not attempt any vectorization.
Which one, the FORTRAN 66 one or the one since FORTRAN 77?
The semantics of instructions in a loop are subtly altered such
that they can be vectorized and executed multi-lane style.
On Tue, 18 Jun 2024 16:17:33 -0000 (UTC), Thomas Koenig
<tkoenig@netcologne.de> wrote:
John Savard <quadibloc@servername.invalid> schrieb:
Also, this looping instruction is strictly a way to directly encode
the FORTRAN DO loop. It does not attempt any vectorization.
Which one, the FORTRAN 66 one or the one since FORTRAN 77?
FORTRAN IV (or 66) indeed.
John Savard <quadibloc@servername.invalid> schrieb:
On Tue, 18 Jun 2024 16:17:33 -0000 (UTC), Thomas Koenig
<tkoenig@netcologne.de> wrote:
John Savard <quadibloc@servername.invalid> schrieb:
Also, this looping instruction is strictly a way to directly encode
the FORTRAN DO loop. It does not attempt any vectorization.
Which one, the FORTRAN 66 one or the one since FORTRAN 77?
FORTRAN IV (or 66) indeed.
It was actually not defined in the standard; in practice it
was usually implemented by a test at the bottom of the loop,
and programs depended on that.
FORTRAN 77 fixed that, so now
DO 100 I=1,0
...
100 CONTINUE
is executed zero times.
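To make the FORTRAN 66/77 difference concrete, here is a minimal C sketch, assuming integer DO parameters and a nonzero step. f77_trip_count follows the FORTRAN 77 rule that the iteration count MAX(INT((m2 - m1 + m3)/m3), 0) is fixed before the first iteration; bottom_tested_trips models the test-at-the-bottom implementation described above, for a positive step only. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* FORTRAN 77 rule: for DO I = m1, m2, m3 the iteration count is
   MAX(INT((m2 - m1 + m3) / m3), 0), fixed before the first iteration.
   C integer division truncates toward zero, as INT() does.
   Assumes m3 != 0 and no overflow in m2 - m1 + m3. */
int64_t f77_trip_count(int64_t m1, int64_t m2, int64_t m3)
{
    assert(m3 != 0);
    int64_t n = (m2 - m1 + m3) / m3;
    return n > 0 ? n : 0;
}

/* Test-at-the-bottom implementation: the body runs once before the
   limit is ever checked. This sketch handles a positive step only. */
int64_t bottom_tested_trips(int64_t m1, int64_t m2, int64_t m3)
{
    assert(m3 > 0);
    int64_t trips = 0;
    int64_t i = m1;
    do {
        trips++;        /* the loop body would execute here */
        i += m3;
    } while (i <= m2);
    return trips;
}
```

For DO 100 I=1,0 the first function gives 0 iterations and the second gives 1, which is exactly the behaviour that programs relying on the bottom test had come to expect.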
On Tue, 18 Jun 2024 16:54:04 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:
The semantics of instructions in a loop are subtly altered such
that they can be vectorized and executed multi-lane style.
I've decided that I will not be able to use the one from the original Concertina, and will need to design a VVM-like instruction for
Concertina II from scratch.
Unlike yours, it won't be...subtle.
The action of the instruction which begins the loop will, I think, be basically the same as yours. It will issue successive iterations of
the loop, starting them in consecutive cycles.
To do so, though, that instruction will contain a number of fields in
which to specify parameters:
(3 bits) An index register, which is initialized to zero at the start
of the loop, and "incremented" (the quote marks are, of course,
because it won't really be the same register on each iteration) for subsequent iterations.
(3 bits) The power of two which is to serve as the increment.
(8 bits) A register mask, in which a 1 bit corresponds to a register
used for intermediate results within the loop. This will become a
forwarding node rather than a register; all other registers can only
be read, and serve only as constant values. The index register set up previously does not need to be marked in this mask.
(2 bits) This indicates which of the four groups of 8 registers in a
bank of 32 registers the register mask applies to.
(1 bit) This indicates whether we're talking about the integer
registers or the floating-point ones.
In addition, in the long version of the instruction, there's a 16-bit register mask for the short vector registers.
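As a rough illustration of how the fields listed above add up, the following C sketch packs and unpacks them. The field widths come from the text (3 + 3 + 8 + 2 + 1 = 17 bits); the bit positions, the struct and all function names are hypothetical, and the 16-bit short-vector mask of the long form is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout; widths from the text, positions assumed. */
typedef struct {
    unsigned index_reg;  /* 3 bits: register holding the iteration index     */
    unsigned inc_log2;   /* 3 bits: the increment is 1 << inc_log2            */
    unsigned node_mask;  /* 8 bits: 1 = register becomes a forwarding node    */
    unsigned group;      /* 2 bits: which group of 8 in the 32-register bank  */
    bool     fp_bank;    /* 1 bit:  mask applies to FP (true) or integer regs */
} LoopStartFields;

uint32_t pack_loop_start(LoopStartFields f)
{
    assert(f.index_reg < 8 && f.inc_log2 < 8);
    assert(f.node_mask < 256 && f.group < 4);
    return  (uint32_t)f.index_reg
          | (uint32_t)f.inc_log2  << 3
          | (uint32_t)f.node_mask << 6
          | (uint32_t)f.group     << 14
          | (uint32_t)f.fp_bank   << 16;     /* 17 bits in total */
}

LoopStartFields unpack_loop_start(uint32_t w)
{
    LoopStartFields f;
    f.index_reg = w & 0x7;
    f.inc_log2  = (w >> 3) & 0x7;
    f.node_mask = (w >> 6) & 0xFF;
    f.group     = (w >> 14) & 0x3;
    f.fp_bank   = (w >> 16) & 0x1;
    return f;
}

/* Register number (0..31) named by bit b of the mask. */
unsigned masked_register(LoopStartFields f, unsigned b)
{
    assert(b < 8);
    return f.group * 8 + b;
}

/* Is register r (0..31) of the selected bank a forwarding node? */
bool is_forwarding_node(LoopStartFields f, unsigned r)
{
    return r / 8 == f.group && ((f.node_mask >> (r % 8)) & 1u);
}
```

The 2-bit group field is what lets an 8-bit mask reach any of the 32 registers, at the price of only covering one group of eight per loop.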
Because iterations are independent, one can't handle a stride in the
natural efficient manner of adding the stride value to a second
pointer register. This could be a common source of error, so I feel
the need to make some provision for this.
One scheme I am considering would be to include one bit in the
instruction that begins a loop to indicate the loop contains a
preamble. The preambles execute serially, and when they conclude,
everything that follows is issued immediately, to execute in parallel
(but now with a multi-cycle offset) to previous iterations.
Upon reflection, this doesn't waste a huge amount of time, so it is
better to go with it than including fields for stride value and a
second counter register in the loop start instruction.
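A minimal C sketch of the stride problem, under the assumption that each iteration may start before the previous one has finished. The serial version carries an offset from one iteration to the next, which is the kind of update a serially executed preamble would hold; the indexed version derives every address from the index alone. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Serial form: a second counter advances by the stride each iteration.
   Iteration k needs the value left by iteration k-1 (a loop-carried
   dependence), so this update would have to live in the preamble. */
void strided_copy_serial(double *dst, const double *src, size_t n, size_t stride)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t off = 0;
    for (size_t k = 0; k < n; k++) {
        dst[k] = src[off];
        off += stride;
    }
}

/* Independent form: each address comes from the index alone, so
   iteration k needs nothing from earlier iterations, at the cost of a
   multiply (or of a stride field in the loop-start instruction). */
void strided_copy_indexed(double *dst, const double *src, size_t n, size_t stride)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t k = 0; k < n; k++)
        dst[k] = src[k * stride];
}
```

Both produce the same result; they differ only in whether any state flows from one iteration to the next.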
Since the preambles do execute serially, the "end preamble"
instruction would point to the loop start instruction. Instead of a full memory reference, though, it would just include a short value that is
a negative program-relative address.
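One way the short backward offset might be encoded and decoded, as a hypothetical C sketch: the 8-bit width and the unit of 32-bit instruction words are assumptions, not part of the design described above. Since the target is always behind the current instruction, the sketch stores only the magnitude of the offset.

```c
#include <assert.h>
#include <stdint.h>

enum { BACK_OFFSET_BITS = 8 };   /* hypothetical field width */

/* Distance, in 32-bit words, from the "end preamble" instruction at pc
   back to the loop start instruction at target. */
uint32_t encode_back_offset(uint64_t pc, uint64_t target)
{
    assert(target <= pc && (pc - target) % 4 == 0);
    uint64_t words = (pc - target) / 4;
    assert(words < (1u << BACK_OFFSET_BITS));
    return (uint32_t)words;
}

/* Recover the loop start address from the field. */
uint64_t decode_back_offset(uint64_t pc, uint32_t field)
{
    assert(field < (1u << BACK_OFFSET_BITS));
    assert((uint64_t)field * 4 <= pc);
    return pc - (uint64_t)field * 4;
}
```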
Iterations that execute in parallel, though, don't "branch back"
anywhere, so the loop end instruction has no parameters. At least
something is like your VVM.
So this is how I take your VVM concept and mess it up by making it unnecessarily complicated: basically, because I don't want to make an
ISA that requires implementations to be, so to speak, "intelligent"
(i.e., upon the first store into a register in the loop, to categorize
that register as a node reference).
John Savard
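For contrast, a rough C sketch of the "intelligent" alternative mentioned in the parenthesis: scan the loop body once and classify each register by its first access. The instruction representation and names are hypothetical, and the sketch deliberately ignores loop-carried registers (such as an accumulator that is read before it is written), which a real design would have to handle separately.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical three-operand body instruction; -1 means "unused". */
typedef struct {
    int dst;
    int src1, src2;
} BodyInsn;

typedef struct {
    uint32_t node_mask;     /* first access was a write: loop temporary */
    uint32_t live_in_mask;  /* first access was a read: live-in value   */
} RegClasses;

RegClasses classify_registers(const BodyInsn *body, int n)
{
    RegClasses rc = {0, 0};
    uint32_t seen = 0;
    for (int k = 0; k < n; k++) {
        int srcs[2] = { body[k].src1, body[k].src2 };
        for (int s = 0; s < 2; s++) {       /* sources are read first */
            int r = srcs[s];
            if (r < 0) continue;
            assert(r < 32);
            if (!(seen & (1u << r))) {
                rc.live_in_mask |= 1u << r;
                seen |= 1u << r;
            }
        }
        int d = body[k].dst;
        if (d >= 0) {
            assert(d < 32);
            if (!(seen & (1u << d))) {      /* first store: make it a node */
                rc.node_mask |= 1u << d;
                seen |= 1u << d;
            }
        }
    }
    return rc;
}
```

An accumulator such as r1 = r1 + r2 lands in live_in_mask here even though it is also written, which is one reason explicit marking in the instruction is attractive.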
John Savard wrote:
(1 bit) This indicates whether we're talking about the integer
registers or the floating-point ones.
Loops controlled by floating point indexes do not vectorize; however,
the body of the loop can be any mix of int, logical, memory, or FP
instructions.
On Tue, 18 Jun 2024 21:23:57 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:
John Savard wrote:
(1 bit) This indicates whether we're talking about the integer
registers or the floating-point ones.
Loops controlled by floating point indexes do not vectorize; however,
the body of the loop can be any mix of int, logical, memory, or FP
instructions.
Oh no, my index is always an integer. This bit applies to the
"live-in" bits - if the loop performs floating-point computation, it's floating-point registers that I want to mark as forwarding nodes.
And so you indicate this explicitly in VVM as well. I tended to assume
only a limited number of registers would be needed to live in, plus I
have both floating and integer register files, hence the differences.
John Savard
Thomas Koenig wrote:
John Savard <quadibloc@servername.invalid> schrieb:
On Tue, 18 Jun 2024 16:17:33 -0000 (UTC), Thomas Koenig
<tkoenig@netcologne.de> wrote:
John Savard <quadibloc@servername.invalid> schrieb:
Also, this looping instruction is strictly a way to directly encode
the FORTRAN DO loop. It does not attempt any vectorization.
Which one, the FORTRAN 66 one or the one since FORTRAN 77?
FORTRAN IV (or 66) indeed.
It was actually not defined in the standard; in practice it
was usually implemented by a test at the bottom of the loop,
and programs depended on that.
FORTRAN 77 fixed that, so now
DO 100 I=1,0
...
100 CONTINUE
is executed zero times.
How does VVM handle that? It seems you must "waste" some time, not
executing the loop body until the first LOOP instruction tells you
whether to or not, or perhaps not actually updating the values the
first time through the loop. Neither seems optimal. :-(
John Savard wrote:
And so you indicate this explicitly in VVM as well. I tended to assume
only a limited number of registers would be needed to live in, plus I
have both floating and integer register files, hence the differences.
It ends up that the majority of register uses in a loop do not need to
be visible outside of the loop. This is almost the contrapositive of
annotating which registers are temporary in the loop. 90%+ of loops do
not even need the index register to be live outside of the loop.
No, it is not a memref--it is a return ! using the register from the
VEC instruction. You "return" to the top of the loop. There is no
reason to use IP+Disp, and the fact there is no register nor displacement
in LOOP enables it all to fit. In addition, when VEC executes,
IP is pointing at the top of the loop, requiring no calculation
whatsoever.
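A toy, scalar-only C interpreter illustrating the control flow described above, as a sketch rather than My 66000's actual semantics: VEC executes once and saves the address of the following instruction (the top of the loop) in a register, and LOOP, executed every iteration, "returns" to that register, so it needs no displacement. The opcode set, operand layout and the bottom-of-loop test are all hypothetical; vectorization is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_VEC, OP_ADD, OP_LOOP, OP_HALT } Op;

typedef struct {
    Op  op;
    int a, b, c;   /* register numbers; meaning depends on op */
} Insn;

/* OP_VEC  a     : r[a] = ip + 1   (address of the top of the loop)
   OP_ADD  a b c : r[a] = r[b] + r[c]
   OP_LOOP a b c : r[b] += 1; if r[b] < r[c] then ip = r[a]
   OP_HALT       : stop */
void run_loop_demo(const Insn *prog, size_t len, int64_t r[8],
                   size_t *vec_count, size_t *loop_count)
{
    size_t ip = 0;
    *vec_count = 0;
    *loop_count = 0;
    for (;;) {
        assert(ip < len);
        const Insn *in = &prog[ip];
        switch (in->op) {
        case OP_VEC:                        /* executed only once */
            ++*vec_count;
            r[in->a] = (int64_t)(ip + 1);
            ip++;
            break;
        case OP_ADD:
            r[in->a] = r[in->b] + r[in->c];
            ip++;
            break;
        case OP_LOOP:                       /* executed every iteration */
            ++*loop_count;
            r[in->b] += 1;
            ip = (r[in->b] < r[in->c]) ? (size_t)r[in->a] : ip + 1;
            break;
        case OP_HALT:
            return;
        }
    }
}
```

Running a three-instruction loop for ten iterations executes VEC once and LOOP ten times, which is the bookending Mitch describes at the start of the thread.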
John Savard wrote:
On Tue, 18 Jun 2024 21:23:57 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:
John Savard wrote:
(1 bit) This indicates whether we're talking about the integer
registers or the floating-point ones.
Loops controlled by floating point indexes do not vectorize, however
the body of the loop can be any mix of int, logical, memory, or FP
instructions.
Oh no, my index is always an integer. This bit applies to the
"live-in" bits - if the loop performs floating-point computation, it's
floating-point registers that I want to mark as forwarding nodes.
See, I do not have this distinction, there is but one file.
And so you indicate this explicitly in VVM as well. I tended to assume
only a limited number of registers would be needed to live in, plus I
have both floating and integer register files, hence the differences.
It ends up that the majority of register uses in a loop do not need to
be visible outside of the loop. This is almost the contrapositive of annotating which registers are temporary in the loop. 90%+ of loops do
not even need the index register to be live outside of the loop.
MitchAlsup1 wrote:
It ends up that the majority of register uses in a loop do not need to
be visible outside of the loop. This is almost the contrapositive of
annotating which registers are temporary in the loop. 90%+ of loops do
not even need the index register to be live outside of the loop.
This is partly due to programming languages that apply lifetimes to
variables, so that an index register which is defined in the scaffolding
of the loop (i.e. for (i = 0; i < limit; i++) {}) is invisible as soon
as the loop terminates.
Without such a restriction, there are many times when it would be very
natural to inspect the index in order to determine if this was a normal
(counting) exit, or an early exit due to some internal test.
Personally, I have still not settled on my preferred way to handle cases
like this, but I possibly will do so after I retire.
Terje
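Terje's point about early exits, sketched in C (the function name is hypothetical): declaring the index outside the loop keeps it live after the loop, and its final value distinguishes a normal (counting) exit from an early one. Nothing else computed in the body needs to survive the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the position of the first element equal to key,
   or n if the loop ran to completion without finding it. */
size_t find_first(const int *a, size_t n, int key)
{
    assert(a != NULL || n == 0);
    size_t i;                    /* declared outside: outlives the loop */
    for (i = 0; i < n; i++) {
        if (a[i] == key)
            break;               /* early exit: i < n */
    }
    return i;                    /* i == n means a normal (counting) exit */
}
```

Written as for (size_t i = 0; ...), the index would go out of scope at the closing brace and this test would not be possible.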
Ah. I can't include that fix now, as I've changed things so that one
of the parameters is at the end of the loop, so the instruction that
heads the loop doesn't know if the "step" parameter is negative or
not.
The change has not yet been posted.
Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
This is partly due to programming languages that applies lifetimes to
variables, so that an index register which is defined in the
scaffolding
of the loop (i.e. for (i = 0; i < limit; i++) {}) is invisible as soon
as the loop terminates.
This makes things more clear to anybody reading the code (and
unambiguous to the compiler). However, lifetime analysis has
also become very good, and if the value is not used afterwards,
I expect no difference in practice.
Without such a restriction, there are many times when it would be very
natural to inspect the index in order to determine if this was a normal
(counting) exit, or an early exit due to some internal test.
Hmm... do you mean for the programmer, or for the compiler?
Thomas Koenig wrote:
Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
This is partly due to programming languages that applies lifetimes to
variables, so that an index register which is defined in the
scaffolding
of the loop (i.e. for (i = 0; i < limit; i++) {}) is invisible as
soon as the loop terminates.
This makes things more clear to anybody reading the code (and
unambiguous to the compiler). However, lifetime analysis has
also become very good, and if the value is not used afterwards,
I expect no difference in practice.
When one writes::
for( uint64_t i = 0; i < max; i++ )
the lifetime of i is explicit--it terminates with the loop.
Without such a restriction, there are many times when it would be
very natural to inspect the index in order to determine if this was a
normal (counting) exit, or an early exit due to some internal test.
Hmm... do you mean for the programmer, or for the compiler?
Terje Mathisen wrote:
This is partly due to programming languages that applies lifetimes to
variables, so that an index register which is defined in the
scaffolding
of the loop (i.e. for (i = 0; i < limit; i++) {}) is invisible as soon
as the loop terminates.
There are loops for which the last index and the last inbound data
reference want to remain visible--search loops, for example. But in
general, the amount of data wanted outside of the loop is very small indeed.
Without such a restriction, there are many times when it would be very
natural to inspect the index in order to determine if this was a normal
(counting) exit, or an early exit due to some internal test.
The most important thing is that the live-outs of the loop are few
while the loop-temps are many.
Personally, I have still not settled on my preferred way to handle
cases like this, but I possibly will do so after I retire.
Terje
John Savard wrote:
On Tue, 18 Jun 2024 16:54:04 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:
The semantics of instructions in a loop are subtly altered such
that they can be vectorized and to execute multi-lane style.
I've decided that I will not be able to use the one from the original
Concertina, and will need to design a VVM-like instruction for
Concertina II from scratch.
Unlike yours, it won't be...subtle.
LOL
On Tue, 18 Jun 2024 23:57:54 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:
John Savard wrote:
And so you indicate this explicitly in VVM as well. I tended to assume
only a limited number of registers would be needed to live in, plus I
have both floating and integer register files, hence the differences.
It ends up that the majority of register uses in a loop do not need to
be visible outside of the loop. This is almost the contrapositive of
annotating which registers are temporary in the loop. 90%+ of loops do
not even need the index register to be live outside of the loop.
You have convinced me here to learn from your wisdom: I will do two
things. One is to add a bit that decides whether my 1 bits (confined
to a single group of 8 registers) are live-in or live-out bits. The
other is to specify clearly to implementors that if a register is
specified as "live-in" but is never actually used in a loop, this must
not cause any problems.
On Tue, 18 Jun 2024 21:36:06 -0600, John Savard <quadibloc@servername.invalid> wrote:
On Tue, 18 Jun 2024 23:57:54 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:
John Savard wrote:
And so you indicate this explicitly in VVM as well. I tended to assume
only a limited number of registers would be needed to live in, plus I
have both floating and integer register files, hence the differences.
It ends up that the majority of register uses in a loop do not need to
be visible outside of the loop. This is almost the contrapositive of
annotating which registers are temporary in the loop. 90%+ of loops do
not even need the index register to be live outside of the loop.
You have convinced me here to learn from your wisdom: I will do two
things. One is to add a bit that decides whether my 1 bits (confined
to a single group of 8 registers) are live-in or live-out bits. The
other is to specify clearly to implementors that if a register is
specified as "live-in" but is never actually used in a loop, this must
not cause any problems.
I have not yet added my attempt at an imitation of VVM to Concertina
II. However, I have now laid some important groundwork for it.
In my architecture, there are already Cray-style long vectors. They
are intended to be the principal and most efficient way of working
with vector quantities in the architecture. So if my VVM-alike were
disjoint from them, and could only interact with them through memory,
this would be an awkwardness in the ISA that needlessly constrains performance.
So I've added operate instructions that allow operations where one
operand is in a normal register, and the other operand is in a
selected element of a vector register. The element is itself specified
by the contents of an integer register, for convenient use within
loops.
Thus, a VVM-alike loop, instead of going from some vectors in memory
to other vectors in memory, could go from some vector registers to
other vector registers. The vectors aren't virtual any more.
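A hypothetical C model of the operate instruction described above, in which one operand is element r[ri] of a vector register. The register counts, the vector length of 64 and all names are assumptions; iterations run one after another here, so only the data flow (vector register in, no memory traffic) is shown, not VVM-style overlap.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_VREGS = 8, VLEN = 64 };   /* assumed sizes */

typedef struct {
    long   r[32];                /* integer registers */
    double f[32];                /* floating-point registers */
    double v[NUM_VREGS][VLEN];   /* Cray-style vector registers */
} Cpu;

/* f[fd] = f[fs] + v[vn][ r[ri] ]: the element is chosen by an integer register. */
void fadd_velem(Cpu *c, int fd, int fs, int vn, int ri)
{
    long e = c->r[ri];
    assert(vn >= 0 && vn < NUM_VREGS);
    assert(e >= 0 && e < VLEN);
    c->f[fd] = c->f[fs] + c->v[vn][e];
}

/* A loop whose index register ri walks vector register vn,
   accumulating into f[fd] without touching memory. */
void sum_vector(Cpu *c, int fd, int vn, int ri, long n)
{
    c->f[fd] = 0.0;
    for (c->r[ri] = 0; c->r[ri] < n; c->r[ri]++)
        fadd_velem(c, fd, fd, vn, ri);
}
```

Using the loop's own index register as the element selector is what lets such a loop read from, or write to, a vector register in place of memory.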
John Savard
While the vectorizing HW certainly has CRAY-like vector flip-flops
they are not addressable by SW. The code within the VEC--LOOP
brackets reads as if scalar:: So, My 66000 consumes exactly 2
OpCodes to provide an entire vector instruction set--one that
works as well as possible across various implementations.
On Sun, 23 Jun 2024 16:19:27 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:
While the vectorizing HW certainly has CRAY-like vector flip-flops
they are not addressable by SW. The code within the VEC--LOOP
brackets reads as if scalar:: So, My 66000 consumes exactly 2
OpCodes to provide an entire vector instruction set--one that
works as well as possible across various implementations.
Oh, yes, your VVM is wonderful.
My attempt at an imitation of VVM, at least, if not the real thing
that you have in your 66000, would be inferior in one important way to Cray-style vector registers.
A virtual vector loop would take input vector values from memory, and
return results to memory. Yes, there are multiple operations within
the loop, but I am still assuming that the length and complexity of
such loops is constrained.
So if you have Cray-style vector registers, you have a place to store intermediate results _between_ these loops that avoids referring to
memory.
In addition, one potentially catastrophic limitation is that, because
the meaning of register specifications in instructions is changed,
_there can't be any subroutine calls in such loops_. (Now that it's
typical for computers to have instructions that do log and trig
functions, this is slightly _less_ catastrophic, though.) Branches
within the loops and instruction predication, though, would still be permitted.
John Savard
But, yes, VVM <as of now> only vectorizes the inner most loop.
Because it seemed to me that any VVM-alike instruction I had would
have to have at least an alternate form longer than 32 bits, despite
my efforts to squeeze it into much less space than you use, I felt
that I needed to go back to an earlier iteration of Concertina for a
method of making it easier to use long instructions in programs.
Doing that, though, required me to reserve some opcode space, and one
of the consequences is that the instructions referred to above had to
be moved to an alternate instruction set!