Forum: >>> Magnum BBS <<<

Re: VAX encoding

From John Levine@21:1/5 to Waldek Hebisch on Fri Aug 1 15:30:56 2025

It appears that Waldek Hebisch <antispam@fricas.org> said:

> My idea was that the instruction decoder could essentially translate
>
> ADDL (R2)+, R2, R3
>
> into
>
> MOV (R2)+, TMP
> ADDL TMP, R2, R3

But how about this?

ADDL3 (R2)+,(R2)+,(R2)+

Now you need at least two temps, the second of which depends on the
first, and there are instructions with six operands. Or how about
this:

ADDL3 (R2)+,#1234,(R2)+

This is encoded as

OPCODE (R2)+ (PC)+ <1234> (R2)+

The immediate word is in the middle of the instruction. You have to decode
the operands one at a time so you can recognize immediates and skip over them. It must have seemed clever at the time, but ugh.
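A minimal Python sketch of that operand-at-a-time walk, under stated assumptions. It assumes a longword context (as for ADDL3, opcode 0xC1), so an immediate (PC)+ specifier carries 4 data bytes. It models only a subset of the addressing modes and rejects index mode. The names `specifier_length` and `operand_offsets` are hypothetical. The point is that each specifier's starting byte depends on the length of the one before it.

```python
# Sketch only: a subset of VAX operand-specifier modes, longword context assumed.
LONGWORD = 4   # data size of an immediate for ADDL3 (assumption: longword ops only)
PC = 15

def specifier_length(code: bytes, pos: int) -> int:
    """Length in bytes of the operand specifier that starts at code[pos]."""
    mode, reg = code[pos] >> 4, code[pos] & 0xF
    if mode <= 3:
        return 1                   # short literal: value lives in the byte itself
    if mode == 8 and reg == PC:
        return 1 + LONGWORD        # (PC)+ = immediate: data bytes follow inline
    if mode == 9 and reg == PC:
        return 1 + 4               # @(PC)+ = absolute address follows inline
    if 5 <= mode <= 9:
        return 1                   # Rn, (Rn), -(Rn), (Rn)+, @(Rn)+
    if mode >= 0xA:
        # byte, word and longword displacement modes
        return 1 + {0xA: 1, 0xB: 1, 0xC: 2, 0xD: 2, 0xE: 4, 0xF: 4}[mode]
    raise ValueError("index mode (4x) is not handled in this sketch")

def operand_offsets(code: bytes, count: int) -> tuple[list[int], int]:
    """Walk the specifiers serially: each start depends on the previous length."""
    offsets, pos = [], 1           # byte 0 is the opcode
    for _ in range(count):
        offsets.append(pos)
        pos += specifier_length(code, pos)
    return offsets, pos            # pos is now the total instruction length

# ADDL3 (R2)+,#1234,(R2)+  ->  C1 82 8F D2 04 00 00 82
insn = bytes([0xC1, 0x82, 0x8F]) + (1234).to_bytes(4, "little") + bytes([0x82])
offsets, length = operand_offsets(insn, 3)
print(offsets, length)   # [1, 2, 7] 8
```

The third specifier can only be located after the second has been recognized as an immediate and its 4 data bytes skipped.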

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Waldek Hebisch@21:1/5 to John Levine on Fri Aug 1 18:08:30 2025

John Levine <johnl@taugh.com> wrote:

> It appears that Waldek Hebisch <antispam@fricas.org> said:
>
>> My idea was that the instruction decoder could essentially translate
>>
>> ADDL (R2)+, R2, R3
>>
>> into
>>
>> MOV (R2)+, TMP
>> ADDL TMP, R2, R3
>
> But how about this?
>
> ADDL3 (R2)+,(R2)+,(R2)+
>
> Now you need at least two temps, the second of which depends on the
> first,

Three, actually; the translation should be

MOVL (R2)+, TMP1
MOVL (R2)+, TMP2
ADDL TMP1, TMP2, TMP3
MOVL TMP3, (R2)+

Of course, the temporaries exist only within the pipeline, so they probably
do not need real registers. But the instruction would need
4 clocks.
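Here is a toy Python model of the translation above. It assumes one micro-op per clock, which is where the 4 clocks come from. The register names, the `mem` dictionary and the micro-op tuples are hypothetical. Both loads and the store go through the same R2, and each step sees the value the previous step left behind.

```python
# Toy model of the 4-micro-op cracking of ADDL3 (R2)+,(R2)+,(R2)+.
# Assumption: one micro-op per clock; memory is a dict of longwords.

def run(uops, regs, mem):
    for op, *args in uops:
        if op == "load_autoinc":        # MOVL (Rn)+, TMPx
            src, dst = args
            regs[dst] = mem[regs[src]]
            regs[src] += 4
        elif op == "add":               # ADDL TMPa, TMPb, TMPc
            a, b, dst = args
            regs[dst] = (regs[a] + regs[b]) & 0xFFFFFFFF
        elif op == "store_autoinc":     # MOVL TMPx, (Rn)+
            src, dst = args
            mem[regs[dst]] = regs[src]
            regs[dst] += 4
        else:
            raise ValueError(f"unknown micro-op {op}")
    return len(uops)                    # clocks, at one micro-op per clock

uops = [
    ("load_autoinc", "R2", "TMP1"),
    ("load_autoinc", "R2", "TMP2"),
    ("add", "TMP1", "TMP2", "TMP3"),
    ("store_autoinc", "TMP3", "R2"),
]
regs = {"R2": 0x1000}
mem = {0x1000: 5, 0x1004: 7, 0x1008: 0}
clocks = run(uops, regs, mem)
print(clocks, hex(regs["R2"]), mem[0x1008])   # 4 0x100c 12
```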

> and there are instructions with six operands.

Those would be classified as hairy and done by microcode.

> Or how about
> this:
>
> ADDL3 (R2)+,#1234,(R2)+
>
> This is encoded as
>
> OPCODE (R2)+ (PC)+ <1234> (R2)+
>
> The immediate word is in the middle of the instruction. You have to decode the operands one at a time so you can recognize immediates and skip over them.

Actually, the decoder that I propose could decode _this_ one in one
cycle. But for this instruction one-cycle decoding is not needed,
because execution will take multiple clocks. One-cycle decoding
is needed for

ADDL3 R2,#1234,R2

which should be executed in one cycle. And to handle it one needs
7 operand decoders looking at 7 consecutive bytes, so that the last
decoder sees the last register argument.
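A hedged Python sketch of the "7 operand decoders" idea follows. Every byte after the opcode gets its own decoder, which guesses that a specifier starts at that byte and reports its length. These guesses do not depend on each other, so hardware could form them all in the same cycle. A cheap chain then keeps only the guesses that fall on real specifier starts. Only short literal, Rn, (Rn)+ and a longword immediate are modeled. The names `guess_length` and `decode_one_cycle` are hypothetical.

```python
# Sketch: parallel per-byte decoders plus a selection chain.
# Assumption: only short literal, Rn, (Rn)+ and longword #imm are modeled.

def guess_length(b: int) -> int:
    """What a decoder at this byte reports if a specifier starts here (0 = not modeled)."""
    mode, reg = b >> 4, b & 0xF
    if mode <= 3 or mode == 5 or (mode == 8 and reg != 15):
        return 1                      # short literal, Rn, (Rn)+
    if mode == 8 and reg == 15:
        return 5                      # #imm: 8F plus 4 data bytes
    return 0

def decode_one_cycle(insn: bytes, n_operands: int, width: int = 7):
    # Stage 1: one decoder per byte position 1..width; none depends on another.
    last = min(width, len(insn) - 1)
    guesses = {i: guess_length(insn[i]) for i in range(1, last + 1)}
    # Stage 2: chain from byte 1, keeping only guesses at real specifier starts.
    starts, pos = [], 1
    for _ in range(n_operands):
        if guesses.get(pos, 0) == 0:
            return None               # outside the window or not modeled: slow path
        starts.append(pos)
        pos += guesses[pos]
    return starts, pos

# ADDL3 R2,#1234,R2  ->  C1 52 8F D2 04 00 00 52
insn = bytes([0xC1, 0x52, 0x8F]) + (1234).to_bytes(4, "little") + bytes([0x52])
print(decode_one_cycle(insn, 3))            # ([1, 2, 7], 8)
print(decode_one_cycle(insn, 3, width=6))   # None
```

With `width=6` the sketch gives up, because the last register specifier sits at byte 7. That matches the count of seven decoders above.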

> It must have seemed clever at the time, but ugh.

VAX designers clearly had microcode in mind; even small changes
could have made hardware decoding easier.

I have a book by A. Tanenbaum about computer architecture that
was written around the same time as the VAX design. Tanenbaum was
very positive about microcode and advocated adding instructions
that directly correspond to higher-level language constructs.
In a sense, Tanenbaum could see the advantages of RISC. Namely,
he cites a report about compiling Fortran to IBM microcode:
Fortran compiled to microcode could run 45 times faster than
Fortran compiled to native code. So it was implicitly
known to him that a very primitive machine language was
quite adequate for getting high speed from compiled languages.
Yet Tanenbaum still wanted microcode and gave made-up
examples of microcode advantages. No wonder the VAX
designers were in the same camp.

--
Waldek Hebisch


From Scott Lurndal@21:1/5 to Waldek Hebisch on Fri Aug 1 18:33:31 2025

antispam@fricas.org (Waldek Hebisch) writes:

> John Levine <johnl@taugh.com> wrote:

<snip>

>> ADDL3 (R2)+,#1234,(R2)+
>>
>> This is encoded as
>>
>> OPCODE (R2)+ (PC)+ <1234> (R2)+
>>
>> The immediate word is in the middle of the instruction. You have to decode
>> the operands one at a time so you can recognize immediates and skip over them.
>
> Actually, the decoder that I propose could decode _this_ one in one
> cycle.

Assuming it didn't cross a cache line, which is possible with any
variable length instruction encoding.

> But for this instruction one-cycle decoding is not needed,
> because execution will take multiple clocks. One-cycle decoding
> is needed for
>
> ADDL3 R2,#1234,R2
>
> which should be executed in one cycle. And to handle it one needs
> 7 operand decoders looking at 7 consecutive bytes, so that the last
> decoder sees the last register argument.
>
>> It must have seemed clever at the time, but ugh.
>
> VAX designers clearly had microcode in mind; even small changes
> could have made hardware decoding easier.
>
> I have a book by A. Tanenbaum about computer architecture that
> was written around the same time as the VAX design.

That would be:

$ author tanenbaum
Enter password:
artist                title                             format  location
Tanenbaum, Andrew S.  Structured Computer Organization  Hard    A029

It's currently in box A029 in storage, but my recollection is that
it was rather vax-centric.


From Thomas Koenig@21:1/5 to Waldek Hebisch on Fri Aug 1 19:13:53 2025

Waldek Hebisch <antispam@fricas.org> wrote:

> John Levine <johnl@taugh.com> wrote:
>
>> It appears that Waldek Hebisch <antispam@fricas.org> said:
>>
>>> My idea was that the instruction decoder could essentially translate
>>>
>>> ADDL (R2)+, R2, R3
>>>
>>> into
>>>
>>> MOV (R2)+, TMP
>>> ADDL TMP, R2, R3
>>
>> But how about this?
>>
>> ADDL3 (R2)+,(R2)+,(R2)+
>>
>> Now you need at least two temps, the second of which depends on the
>> first,
>
> Three, actually; the translation should be
>
> MOVL (R2)+, TMP1
> MOVL (R2)+, TMP2
> ADDL TMP1, TMP2, TMP3
> MOVL TMP3, (R2)+
>
> Of course, the temporaries exist only within the pipeline, so they probably
> do not need real registers. But the instruction would need
> 4 clocks.

It would be, unoptimized (my VAX assembler is very probably wrong)

MOVL (R2),TMP1
ADDL #4,R2
MOVL (R2),TMP2
ADDL #4,R2
ADDL TMP1,TMP2,TMP2 ! That could be one register,
! or an implied forwarding register
MOVL TMP2,(R2)
ADDL #4,R2

which could better be expressed by

MOVL (R2),TMP1
MOVL 4(R2),TMP2
ADDL TMP1,TMP2,TMP2
MOVL TMP2,8(R2)
ADDL #12,R2
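As a quick check, the Python sketch below runs both expansions on a hypothetical toy machine and confirms they leave the same memory contents and the same final R2. The dict-of-longwords memory is an assumption. The second form replaces three serial updates of R2 with fixed offsets and a single +12 at the end, so the two loads no longer wait on each other's R2 update.

```python
# Toy check: the two expansions of ADDL3 (R2)+,(R2)+,(R2)+ given above.
# Memory is a dict of longwords keyed by byte address (an assumption).

MASK = 0xFFFFFFFF

def step_by_step(r2: int, mem: dict[int, int]):
    mem = dict(mem)
    tmp1 = mem[r2]
    r2 += 4
    tmp2 = mem[r2]
    r2 += 4
    tmp2 = (tmp1 + tmp2) & MASK
    mem[r2] = tmp2
    r2 += 4
    return r2, mem

def with_offsets(r2: int, mem: dict[int, int]):
    mem = dict(mem)
    tmp1 = mem[r2]
    tmp2 = mem[r2 + 4]
    tmp2 = (tmp1 + tmp2) & MASK
    mem[r2 + 8] = tmp2
    r2 += 12            # one update of R2 instead of three serial ones
    return r2, mem

start = {0x1000: 5, 0x1004: 7, 0x1008: 0}
a = step_by_step(0x1000, start)
b = with_offsets(0x1000, start)
print(a == b, hex(a[0]), a[1][0x1008])   # True 0x100c 12
```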

--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.


From Waldek Hebisch@21:1/5 to Scott Lurndal on Fri Aug 1 21:24:42 2025

Scott Lurndal <scott@slp53.sl.home> wrote:

> antispam@fricas.org (Waldek Hebisch) writes:
>
>> John Levine <johnl@taugh.com> wrote:
>
> <snip>
>
>>> ADDL3 (R2)+,#1234,(R2)+
>>>
>>> This is encoded as
>>>
>>> OPCODE (R2)+ (PC)+ <1234> (R2)+
>>>
>>> The immediate word is in the middle of the instruction. You have to decode
>>> the operands one at a time so you can recognize immediates and skip over them.
>>
>> Actually, the decoder that I propose could decode _this_ one in one
>> cycle.
>
> Assuming it didn't cross a cache line, which is possible with any
> variable length instruction encoding.

Assuming that the instruction is in the prefetch buffer. IIUC, the VAX accessed
the cache in 4-byte units, so the length of the cache line did not matter.
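A small Python sketch of the alignment arithmetic behind both remarks. `unit=4` follows the reading above that the VAX accessed its cache 4 bytes at a time. `unit=64` is an arbitrary cache-line size chosen to illustrate the cache-line point, and the start addresses are made up. The 8-byte length is that of ADDL3 R2,#1234,R2.

```python
# Sketch: how many aligned fetch units an instruction spans.
# unit=4 follows the 4-byte cache access described above; unit=64 is an
# arbitrary line size for illustration. Addresses are made up.

def units_touched(addr: int, length: int, unit: int = 4) -> int:
    first = addr // unit
    last = (addr + length - 1) // unit
    return last - first + 1

# ADDL3 R2,#1234,R2 is 8 bytes long.
for addr in range(0x200, 0x204):
    print(hex(addr), units_touched(addr, 8))   # 2 when aligned, 3 otherwise
print(units_touched(60, 8, unit=64))           # 2: bytes 60..67 straddle a 64-byte line
```

Depending on its alignment, the same 8-byte instruction needs two or three longword fetches.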

>> But for this instruction one-cycle decoding is not needed,
>> because execution will take multiple clocks. One-cycle decoding
>> is needed for
>>
>> ADDL3 R2,#1234,R2
>>
>> which should be executed in one cycle. And to handle it one needs
>> 7 operand decoders looking at 7 consecutive bytes, so that the last
>> decoder sees the last register argument.
>>
>>> It must have seemed clever at the time, but ugh.
>>
>> VAX designers clearly had microcode in mind; even small changes
>> could have made hardware decoding easier.
>>
>> I have a book by A. Tanenbaum about computer architecture that
>> was written around the same time as the VAX design.
>
> That would be:
>
> $ author tanenbaum
> Enter password:
> artist                title                             format  location
> Tanenbaum, Andrew S.  Structured Computer Organization  Hard    A029
>
> It's currently in box A029 in storage, but my recollection is that
> it was rather vax-centric.

Maybe you have a later edition. Mine had the IBM-360, PDP-11 and Cyber-6600
as example instruction sets and the IBM-360, PDP-11 and a Burroughs
machine as examples of the microcode level. VAX may be mentioned,
but I am not sure. In the library I saw a later edition with quite
different (post-VAX) examples.

--
Waldek Hebisch


From MitchAlsup@21:1/5 to All on Thu Aug 28 15:10:55 2025

John Levine <johnl@taugh.com> posted:

> It appears that Waldek Hebisch <antispam@fricas.org> said:
>
>> My idea was that the instruction decoder could essentially translate
>>
>> ADDL (R2)+, R2, R3
>>
>> into
>>
>> MOV (R2)+, TMP
>> ADDL TMP, R2, R3
>
> But how about this?
>
> ADDL3 (R2)+,(R2)+,(R2)+
>
> Now you need at least two temps, the second of which depends on the
> first, and there are instructions with six operands. Or how about
> this:
>
> ADDL3 (R2)+,#1234,(R2)+
>
> This is encoded as
>
> OPCODE (R2)+ (PC)+ <1234> (R2)+
>
> The immediate word is in the middle of the instruction. You have to decode the operands one at a time so you can recognize immediates and skip over them.
> It must have seemed clever at the time, but ugh.

What we must all realize is that each address mode in VAX was a microinstruction all unto itself.

And that is why it was not pipelineable in any real sense.


From EricP@21:1/5 to MitchAlsup on Fri Aug 29 10:34:31 2025

MitchAlsup wrote:

> John Levine <johnl@taugh.com> posted:
>
>> It appears that Waldek Hebisch <antispam@fricas.org> said:
>>
>>> My idea was that the instruction decoder could essentially translate
>>>
>>> ADDL (R2)+, R2, R3
>>>
>>> into
>>>
>>> MOV (R2)+, TMP
>>> ADDL TMP, R2, R3
>>
>> But how about this?
>>
>> ADDL3 (R2)+,(R2)+,(R2)+
>>
>> Now you need at least two temps, the second of which depends on the
>> first, and there are instructions with six operands. Or how about
>> this:
>>
>> ADDL3 (R2)+,#1234,(R2)+
>>
>> This is encoded as
>>
>> OPCODE (R2)+ (PC)+ <1234> (R2)+
>>
>> The immediate word is in the middle of the instruction. You have to decode
>> the operands one at a time so you can recognize immediates and skip over them.
>> It must have seemed clever at the time, but ugh.
>
> What we must all realize is that each address mode in VAX was a microinstruction all unto itself.
>
> And that is why it was not pipelineable in any real sense.

Yes. The instructions are designed to be parsed by a byte-code interpreter
in microcode. Even in the NVAX of 1992, Decode can only produce one
operand per clock.

If that operand is one of the complex memory address modes then it
might be possible to dispatch it and let the back end chew on it
while Decode works on the second operand.

But that assumes the operands are in slow memory. If they are in fast
registers, then it stalls waiting for the second and third operands to be decoded, making a pipeline pointless.

And since programs mostly put operands in registers it stalls at Decode.
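The stall argument can be illustrated with a toy Python model. Its parameters are made up, and it is not a model of the NVAX pipeline. Decode hands over `decode_per_clock` operands per clock and treats the opcode byte as free. A register-only ALU op then needs one execute clock once all its operands are decoded. With one operand per clock, a stream of 3-register instructions is Decode-bound and the ALU sits idle two clocks out of three.

```python
# Toy model (made-up parameters): Decode delivers decode_per_clock operands
# per clock; a register-only ALU op executes once its operands are decoded.

def clocks_for(stream: list[int], decode_per_clock: int = 1, exec_clocks: int = 1) -> int:
    decode_done = 0   # clock at which Decode has finished the current instruction
    exec_free = 0     # clock at which the execute stage becomes free
    for n_operands in stream:
        decode_done += -(-n_operands // decode_per_clock)  # ceiling division
        start = max(decode_done, exec_free)
        exec_free = start + exec_clocks
    return exec_free

ten_addl3 = [3] * 10   # ten register-only ADDL3 instructions
print(clocks_for(ten_addl3))                       # 31: Decode-bound
print(clocks_for(ten_addl3, decode_per_clock=3))   # 11: about one instruction per clock
```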

One might say we should just build a fast decoder. But if you look at
the instruction formats, even the simplest 2-register instructions are
3 bytes and would require looking at 24 instruction bits and 3 valid bits,
or 27 bits at once. The 3-operand rs1,rs2,rd instructions need 36 bits.

That decoder has to deal with 2^27 or 2^36 possibilities!
And that just handles 2 and 3 register instructions, no memory references.
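The figures above follow from counting every instruction bit plus one valid bit per byte, as in the preceding paragraph. A two-line Python check of that arithmetic:

```python
# All instruction bits plus one "valid" bit per byte, as counted above.
widths = {n_bytes: 8 * n_bytes + n_bytes for n_bytes in (3, 4)}   # 2-reg: 3 bytes, 3-reg: 4 bytes
for n_bytes, bits in widths.items():
    print(f"{n_bytes} bytes -> {bits} input bits -> {2 ** bits:,} input patterns")
# 3 bytes -> 27 input bits -> 134,217,728 input patterns
# 4 bytes -> 36 input bits -> 68,719,476,736 input patterns
```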

It is hypothetically possible with a pre-decode stage to compact those
down to 17 bits for 2 register and 21 bits for 3 register but that is
still too many possibilities. That just throws transistors at a problem
that never needed to exist in the first place, and would still not be affordable in 1992 NMOS, certainly not in 1975 TTL.

If we look at what the VAX is actually spending most of its time on,
2- and 3-register ALU operations, those can be decoded in parallel by
looking at 10 bits (8 opcode + 2 valid) for 2-register and
15 bits (12 opcode + 3 valid) for 3-register instructions.
That is quite doable in 1975 TTL in 1 clock,
and it allows the pipeline to not stall at Decode.
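Below is a hypothetical Python sketch of such a register-only fast path. It checks the opcode byte and the mode nibble of each specifier, and those checks are independent of each other. Anything else falls back to the operand-at-a-time decoder. The opcode table is a small sample (ADDL2 0xC0, ADDL3 0xC1, SUBL2 0xC2, SUBL3 0xC3), not the full set. The sketch does not reproduce the exact 10-bit and 15-bit budgets counted above.

```python
# Hypothetical register-only fast path: opcode plus mode nibbles, checked at once.
# The opcode table is a small sample, not the full VAX set.

FAST_OPS = {0xC0: 2, 0xC1: 3, 0xC2: 2, 0xC3: 3}   # opcode -> operand count

def fast_decode(window: bytes):
    n = FAST_OPS.get(window[0])
    if n is None or len(window) < 1 + n:
        return None
    specs = window[1:1 + n]
    # Every specifier must be register mode 5n (and not PC); these checks are
    # independent of each other, so hardware can do them in parallel.
    if all((b >> 4) == 5 and (b & 0xF) != 15 for b in specs):
        return window[0], [b & 0xF for b in specs], 1 + n   # opcode, registers, length
    return None   # fall back to the operand-at-a-time decoder

print(fast_decode(bytes([0xC1, 0x51, 0x52, 0x53])))   # (193, [1, 2, 3], 4)
print(fast_decode(bytes([0xC1, 0x52, 0x8F, 0xD2])))   # None: has an immediate
```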

