• Re: VAX encoding

    From John Levine@21:1/5 to Waldek Hebisch on Fri Aug 1 15:30:56 2025
    It appears that Waldek Hebisch <antispam@fricas.org> said:
    My idea was that instruction decoder could essentially translate

    ADDL (R2)+, R2, R3

    into

    MOV (R2)+, TMP
    ADDL TMP, R2, R3

    But how about this?

    ADDL3 (R2)+,(R2)+,(R2)+

    Now you need at least two temps, the second of which depends on the
    first, and there are instructions with six operands. Or how about
    this:

    ADDL3 (R2)+,#1234,(R2)+

    This is encoded as

    OPCODE (R2)+ (PC)+ <1234> (R2)+

    The immediate word is in the middle of the instruction. You have to decode
    the operands one at a time so you can recognize immediates and skip over them. It must have seemed clever at the time, but ugh.
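    A minimal Python sketch of why the operands must be decoded one at a time. It assumes the standard VAX specifier layout (mode in the high nibble, register in the low nibble), a one-byte opcode, and longword operands as for ADDL3 (opcode C1). The names `specifier_length` and `operand_offsets` are hypothetical, and index mode is treated only as a prefix. The start of each specifier is known only after every earlier specifier's length has been worked out:

```python
def specifier_length(instr: bytes, pos: int, operand_size: int = 4) -> int:
    """Bytes occupied by the operand specifier starting at instr[pos]."""
    mode, reg = instr[pos] >> 4, instr[pos] & 0xF
    if mode <= 3:            # short literal: 6-bit value lives in the specifier byte
        return 1
    if mode == 4:            # index prefix [Rx]: a base specifier follows it
        return 1 + specifier_length(instr, pos + 1, operand_size)
    if mode in (5, 6, 7):    # Rn, (Rn), -(Rn): no extension bytes
        return 1
    if mode == 8:            # (Rn)+, or immediate #data when Rn is PC (R15)
        return 1 + (operand_size if reg == 15 else 0)
    if mode == 9:            # @(Rn)+, or absolute @#address when Rn is PC
        return 1 + (4 if reg == 15 else 0)
    # Modes A-F: byte, word and longword displacement, plain and deferred
    return 1 + {0xA: 1, 0xB: 1, 0xC: 2, 0xD: 2, 0xE: 4, 0xF: 4}[mode]


def operand_offsets(instr: bytes, n_operands: int, operand_size: int = 4) -> list[int]:
    """Start offset of each specifier; each start depends on all earlier lengths."""
    offsets = []
    pos = 1                  # byte 0 is the opcode
    for _ in range(n_operands):
        offsets.append(pos)
        pos += specifier_length(instr, pos, operand_size)
    return offsets


# ADDL3 (R2)+,#1234,(R2)+  ->  C1 82 8F D2 04 00 00 82
addl3 = bytes([0xC1, 0x82, 0x8F, 0xD2, 0x04, 0x00, 0x00, 0x82])
print(operand_offsets(addl3, 3))  # [1, 2, 7]
```

    The third specifier cannot be located until the loop has seen that the second one, 8F, is an immediate carrying four more bytes.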

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Waldek Hebisch@21:1/5 to John Levine on Fri Aug 1 18:08:30 2025
    John Levine <johnl@taugh.com> wrote:
    It appears that Waldek Hebisch <antispam@fricas.org> said:
    My idea was that instruction decoder could essentially translate

    ADDL (R2)+, R2, R3

    into

    MOV (R2)+, TMP
    ADDL TMP, R2, R3

    But how about this?

    ADDL3 (R2)+,(R2)+,(R2)+

    Now you need at least two temps, the second of which depends on the
    first,

    Three, actually; the translation should be

    MOVL (R2)+, TMP1
    MOVL (R2)+, TMP2
    ADDL TMP1, TMP2, TMP3
    MOVL TMP3, (R2)+

    Of course, the temporaries exist only within the pipeline, so they probably
    do not need real registers. But the instruction would need
    4 clocks.
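    The translation above can be sketched as a tiny "cracker" in Python. This is a hypothetical illustration, not DEC hardware: the function names, the TMPn naming, and the restriction to ADDL3 with operands given as strings are assumptions, and only general registers written R0 to R15 (not the AP, FP, SP, PC aliases) count as registers. Memory sources become loads into temporaries, the add itself touches only registers, and a memory destination becomes a store:

```python
def is_register(op: str) -> bool:
    """True for plain general registers written as 'Rn' (R0-R15)."""
    return op.startswith("R") and op[1:].isdigit()


def crack_addl3(src1: str, src2: str, dst: str) -> list[str]:
    """Split ADDL3 into loads, one register-only add, and an optional store.

    Source operands are loaded left to right, so autoincrement side
    effects keep their original order.
    """
    uops: list[str] = []
    temps = 0

    def new_temp() -> str:
        nonlocal temps
        temps += 1
        return f"TMP{temps}"

    sources = []
    for op in (src1, src2):
        if is_register(op) or op.startswith("#"):
            sources.append(op)            # usable directly by the ALU
        else:
            tmp = new_temp()              # memory operand: load into a temporary
            uops.append(f"MOVL {op}, {tmp}")
            sources.append(tmp)

    if is_register(dst):
        uops.append(f"ADDL {sources[0]}, {sources[1]}, {dst}")
    else:
        tmp = new_temp()                  # memory destination: add, then store
        uops.append(f"ADDL {sources[0]}, {sources[1]}, {tmp}")
        uops.append(f"MOVL {tmp}, {dst}")
    return uops


for uop in crack_addl3("(R2)+", "(R2)+", "(R2)+"):
    print(uop)
```

    For ADDL3 (R2)+,(R2)+,(R2)+ this prints the four micro-ops listed above, one per clock in the simplest pipeline; for ADDL3 R2,#1234,R2 it emits a single add.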

    and there are instructions with six operands.

    Those would be classified as hairy and done by microcode.

    Or how about
    this:

    ADDL3 (R2)+,#1234,(R2)+

    This is encoded as

    OPCODE (R2)+ (PC)+ <1234> (R2)+

    The immediate word is in the middle of the instruction. You have to decode the operands one at a time so you can recognize immediates and skip over them.

    Actually, the decoder that I propose could decode _this_ one in one
    cycle. But for this instruction, one-cycle decoding is not needed,
    because execution will take multiple clocks. One-cycle decoding
    is needed for

    ADDL3 R2,#1234,R2

    which should be executed in one cycle. And to handle it, one needs
    7 operand decoders looking at 7 consecutive bytes, so that the last
    decoder sees the last register argument.
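    A rough Python sketch of that seven-decoder idea, under stated assumptions: only short literal, register, and autoincrement/immediate specifiers are modeled, operands are longwords, and `spec_len` / `parallel_decode` are hypothetical names. Each per-byte "decoder" works independently, guessing that a specifier starts at its byte; only the final selection of the real starts is a chain:

```python
def spec_len(byte: int, operand_size: int = 4):
    """Length of a specifier starting with `byte`, or None if not modeled here."""
    mode, reg = byte >> 4, byte & 0xF
    if mode <= 3 or mode == 5:        # short literal or register: one byte
        return 1
    if mode == 8:                     # (Rn)+, or #immediate when Rn is PC
        return 1 + (operand_size if reg == 15 else 0)
    return None                       # other modes: left to the slow path


def parallel_decode(instr: bytes, n_operands: int = 3, window: int = 7):
    # Stage 1: one independent decoder per byte position 1..window, each
    # assuming a specifier starts there. In hardware these would run at once.
    guesses = {pos: spec_len(instr[pos])
               for pos in range(1, min(window, len(instr) - 1) + 1)}
    # Stage 2: a short selection chain picks the real starts from the guesses.
    starts, pos = [], 1
    while len(starts) < n_operands:
        if guesses.get(pos) is None:
            return None               # outside the window or unmodeled: slow path
        starts.append(pos)
        pos += guesses[pos]
    return starts


# ADDL3 R2,#1234,R2  ->  C1 52 8F D2 04 00 00 52
print(parallel_decode(bytes([0xC1, 0x52, 0x8F, 0xD2, 0x04, 0x00, 0x00, 0x52])))
```

    The decoders at positions 3 to 6 produce meaningless guesses from the immediate data bytes, but the selection chain skips over them, and the decoder at position 7 is the one that sees the final R2.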

    It must have seemed clever at the time, but ugh.

    VAX designers clearly had microcode in mind; even small changes
    could have made hardware decoding easier.

    I have a book by A. Tanenbaum about computer architecture that
    was written in a similar period to the VAX design. Tanenbaum was
    very positive about microcode and advocated adding instructions
    that directly correspond to higher-level language constructs.
    In a sense, Tanenbaum could see the advantages of RISC. Namely,
    he cites a report about compiling Fortran to IBM microcode:
    Fortran compiled to microcode could run 45 times faster than
    Fortran compiled to native code. So it was implicitly
    known to him that a very primitive machine language was
    quite adequate for getting high speed from compiled languages.
    Yet Tanenbaum still wanted microcode and gave made-up
    examples of microcode advantages. No wonder that VAX
    designers were in the same camp.

    --
    Waldek Hebisch

  • From Scott Lurndal@21:1/5 to Waldek Hebisch on Fri Aug 1 18:33:31 2025
    antispam@fricas.org (Waldek Hebisch) writes:
    John Levine <johnl@taugh.com> wrote:
    <snip>
    ADDL3 (R2)+,#1234,(R2)+

    This is encoded as

    OPCODE (R2)+ (PC)+ <1234> (R2)+

    The immediate word is in the middle of the instruction. You have to decode the operands one at a time so you can recognize immediates and skip over them.

    Actually, the decoder that I propose could decode _this_ one in one
    cycle.

    Assuming it didn't cross a cache line, which is possible with any
    variable length instruction encoding.

    But for this instruction, one-cycle decoding is not needed,
    because execution will take multiple clocks. One-cycle decoding
    is needed for

    ADDL3 R2,#1234,R2

    which should be executed in one cycle. And to handle it, one needs
    7 operand decoders looking at 7 consecutive bytes, so that the last
    decoder sees the last register argument.

    It must have seemed clever at the time, but ugh.

    VAX designers clearly had microcode in mind; even small changes
    could have made hardware decoding easier.

    I have a book by A. Tanenbaum about computer architecture that
    was written in a similar period to the VAX design.

    That would be:

    $ author tanenbaum
    Enter password:
    artist                 title                              format  location
    Tanenbaum, Andrew S.   Structured Computer Organization   Hard    A029

    It's currently in box A029 in storage, but my recollection is that
    it was rather VAX-centric.

  • From Thomas Koenig@21:1/5 to Waldek Hebisch on Fri Aug 1 19:13:53 2025
    Waldek Hebisch <antispam@fricas.org> wrote:
    John Levine <johnl@taugh.com> wrote:
    It appears that Waldek Hebisch <antispam@fricas.org> said:
    My idea was that instruction decoder could essentially translate

    ADDL (R2)+, R2, R3

    into

    MOV (R2)+, TMP
    ADDL TMP, R2, R3

    But how about this?

    ADDL3 (R2)+,(R2)+,(R2)+

    Now you need at least two temps, the second of which depends on the
    first,

    Three, actually; the translation should be

    MOVL (R2)+, TMP1
    MOVL (R2)+, TMP2
    ADDL TMP1, TMP2, TMP3
    MOVL TMP3, (R2)+

    Of course, the temporaries exist only within the pipeline, so they probably
    do not need real registers. But the instruction would need
    4 clocks.

    It would be, unoptimized (my VAX assembler is very probably wrong)

    MOVL (R2),TMP1
    ADDL #4,R2
    MOVL (R2),TMP2
    ADDL #4,R2
    ADDL TMP1,TMP2,TMP2 ! That could be one register,
    ! or an implied forwarding register
    MOVL TMP2,(R2)
    ADDL #4,R2

    which could better be expressed by

    MOVL (R2),TMP1
    MOVL 4(R2),TMP2
    ADDL TMP1,TMP2,TMP2
    MOVL TMP2,8(R2)
    ADDL #12,R2
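    One way to convince oneself that the two sequences compute the same result is to run both on a toy memory model. A hedged Python sketch: memory is a dict of longwords keyed by byte address, condition codes and faults are ignored, and the function names are hypothetical:

```python
MASK32 = 0xFFFFFFFF  # VAX longwords are 32 bits; the add wraps modulo 2**32


def with_increments(mem: dict, r2: int):
    """First sequence: bump R2 by 4 after every memory access."""
    mem = dict(mem)
    tmp1 = mem[r2]
    r2 += 4
    tmp2 = mem[r2]
    r2 += 4
    tmp2 = (tmp1 + tmp2) & MASK32
    mem[r2] = tmp2
    r2 += 4
    return mem, r2


def with_displacements(mem: dict, r2: int):
    """Second sequence: fixed displacements 0, 4, 8, then a single add of 12."""
    mem = dict(mem)
    tmp1 = mem[r2]
    tmp2 = mem[r2 + 4]
    tmp2 = (tmp1 + tmp2) & MASK32
    mem[r2 + 8] = tmp2
    r2 += 12
    return mem, r2


start = {0x100: 5, 0x104: 7, 0x108: 0}
assert with_increments(start, 0x100) == with_displacements(start, 0x100)
print(with_displacements(start, 0x100))  # ({256: 5, 260: 7, 264: 12}, 268)
```

    In the displacement form, none of the three memory accesses waits for an intermediate update of R2, so they no longer form a dependency chain through the pointer.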

    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.

  • From Waldek Hebisch@21:1/5 to Scott Lurndal on Fri Aug 1 21:24:42 2025
    Scott Lurndal <scott@slp53.sl.home> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:
    John Levine <johnl@taugh.com> wrote:
    <snip>
    ADDL3 (R2)+,#1234,(R2)+

    This is encoded as

    OPCODE (R2)+ (PC)+ <1234> (R2)+

    The immediate word is in the middle of the instruction. You have to decode the operands one at a time so you can recognize immediates and skip over them.

    Actually, the decoder that I propose could decode _this_ one in one
    cycle.

    Assuming it didn't cross a cache line, which is possible with any
    variable length instruction encoding.

    Assuming that the instruction is in the prefetch buffer. IIUC, the VAX accessed
    the cache in 4-byte units, so the length of the cache line did not matter.

    But for this instruction, one-cycle decoding is not needed,
    because execution will take multiple clocks. One-cycle decoding
    is needed for

    ADDL3 R2,#1234,R2

    which should be executed in one cycle. And to handle it, one needs
    7 operand decoders looking at 7 consecutive bytes, so that the last
    decoder sees the last register argument.

    It must have seemed clever at the time, but ugh.

    VAX designers clearly had microcode in mind; even small changes
    could have made hardware decoding easier.

    I have a book by A. Tanenbaum about computer architecture that
    was written in a similar period to the VAX design.

    That would be:

    $ author tanenbaum
    Enter password:
    artist                 title                              format  location
    Tanenbaum, Andrew S.   Structured Computer Organization   Hard    A029

    It's currently in box A029 in storage, but my recollection is that
    it was rather VAX-centric.

    Maybe you have a later edition. Mine had IBM-360, PDP-11 and Cyber-6600
    as example instruction sets and IBM-360, PDP-11 and a Burroughs
    machine as examples of the microcode level. VAX may be mentioned,
    but I am not sure. In a library I saw a later edition with quite
    different (post-VAX) examples.

    --
    Waldek Hebisch

  • From MitchAlsup@21:1/5 to All on Thu Aug 28 15:10:55 2025
    John Levine <johnl@taugh.com> posted:

    It appears that Waldek Hebisch <antispam@fricas.org> said:
    My idea was that instruction decoder could essentially translate

    ADDL (R2)+, R2, R3

    into

    MOV (R2)+, TMP
    ADDL TMP, R2, R3

    But how about this?

    ADDL3 (R2)+,(R2)+,(R2)+

    Now you need at least two temps, the second of which depends on the
    first, and there are instructions with six operands. Or how about
    this:

    ADDL3 (R2)+,#1234,(R2)+

    This is encoded as

    OPCODE (R2)+ (PC)+ <1234> (R2)+

    The immediate word is in the middle of the instruction. You have to decode the operands one at a time so you can recognize immediates and skip over them.
    It must have seemed clever at the time, but ugh.


    What we must all realize is that each address mode in VAX was a microinstruction all unto itself.

    And that is why it was not pipelineable in any real sense.

  • From EricP@21:1/5 to MitchAlsup on Fri Aug 29 10:34:31 2025
    MitchAlsup wrote:
    John Levine <johnl@taugh.com> posted:

    It appears that Waldek Hebisch <antispam@fricas.org> said:
    My idea was that instruction decoder could essentially translate

    ADDL (R2)+, R2, R3

    into

    MOV (R2)+, TMP
    ADDL TMP, R2, R3
    But how about this?

    ADDL3 (R2)+,(R2)+,(R2)+

    Now you need at least two temps, the second of which depends on the
    first, and there are instructions with six operands. Or how about
    this:

    ADDL3 (R2)+,#1234,(R2)+

    This is encoded as

    OPCODE (R2)+ (PC)+ <1234> (R2)+

    The immediate word is in the middle of the instruction. You have to decode the operands one at a time so you can recognize immediates and skip over them.
    It must have seemed clever at the time, but ugh.


    What we must all realize is that each address mode in VAX was a microinstruction all unto itself.

    And that is why it was not pipelineable in any real sense.

    Yes. The instructions are designed to be parsed by a byte-code interpreter
    in microcode. Even in the NVAX of 1992, Decode can produce only one
    operand per clock.

    If that operand is one of the complex memory address modes then it
    might be possible to dispatch it and let the back end chew on it
    while Decode works on the second operand.

    But that assumes the operands are in slow memory. If they are in fast
    registers, then it stalls waiting for the second and third operands to be decoded, making a pipeline pointless.

    And since programs mostly put operands in registers it stalls at Decode.

    One might say we should just build a fast decoder. But if you look at
    the instruction formats, even the simplest 2-register instructions are
    3 bytes and would require looking at 24 instruction bits and 3 valid bits,
    or 27 bits at once. The 3-operand rs1,rs2,rd instruction is 36 bits.

    That decoder has to deal with 2^27 or 2^36 possibilities!
    And that just handles 2 and 3 register instructions, no memory references.
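    Spelled out, counting 8 bits plus one valid bit per instruction byte (the opcode plus one byte per register specifier):

```latex
\begin{aligned}
\text{2-register (3 bytes):}\quad & 3 \times 8 + 3 = 27 \text{ bits}, & 2^{27} &= 134{,}217{,}728 \\
\text{3-register (4 bytes):}\quad & 4 \times 8 + 4 = 36 \text{ bits}, & 2^{36} &= 68{,}719{,}476{,}736
\end{aligned}
```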

    With a pre-decode stage it is hypothetically possible to compact those
    down to 17 bits for 2-register and 21 bits for 3-register instructions, but that is
    still too many possibilities. That just throws transistors at a problem
    that never needed to exist in the first place, and would still not be affordable in 1992 NMOS, certainly not in 1975 TTL.

    If we look at what the VAX is actually spending most of its time on,
    2- and 3-register ALU operations, those can be decoded in parallel by
    looking at 10 bits (8 opcode + 2 valid) for 2-register and
    15 bits (12 opcode + 3 valid) for 3-register instructions,
    which is quite doable in 1975 TTL in 1 clock.
    And that allows the pipeline to not stall at Decode.
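    Counted the same way as the 2^27 and 2^36 figures, the input spaces of those narrower decoders are:

```latex
\begin{aligned}
\text{2-register:}\quad & 2^{10} = 1{,}024 \quad\text{instead of}\quad 2^{27} = 134{,}217{,}728 \\
\text{3-register:}\quad & 2^{15} = 32{,}768 \quad\text{instead of}\quad 2^{36} = 68{,}719{,}476{,}736
\end{aligned}
```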
