• Concertina III May Be Returning

    From John Savard@21:1/5 to All on Sun Aug 31 06:17:07 2025
    I have had so much success in adjusting Concertina II to achieve my goals
    more fully than I had thought possible... that I now think that it may be
    possible to proceed from Concertina II to a design which gets rid of the
    one feature of Concertina II that has been the most controversial.

    Yes, I think that I could actually do without block structure.

    What would Concertina III look like?

    Well, the basic instruction set would be similar to that of Concertina II.
    But the P bits would be taken out of the operate instructions, and so
    would the option of replacing a register specification by a pseudo-
    immediate pointer.

    The tiny gaps between the opcodes of some instructions to squeeze out
    space for block headers would be removed.

    But the large regions of opcode space that were set aside for the
    shortest block header prefixes would now be used for the encodings that
    make it possible to do without headers.

    Instead of a block header being used to indicate code consisting of
    variable-length instructions, variable-length instructions would be
    contained within a sequence of pairs of 32-bit instructions of this form:

    11110xx(17 bits)(8 bits)
    11111x(9 bits)(17 bits)

    Instructions could be 17 bits long, 34 bits long, 51 bits long, and so on,
    any multiple of 17 bits in length.

    In the first instruction slot of the pair, the two bits xx indicate, for
    each of the two 17-bit areas of the variable-length instruction region
    that start in it, whether that area is the first 17-bit area of an
    instruction. The second instruction slot only contains the start of one
    17-bit area, so only one bit x is needed. Since 17 is an odd number, this
    meshes perfectly with the fact that the 17-bit area which straddles both
    words isn't split evenly, but rather one extra bit of it is in the second
    32-bit instruction slot.
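    A rough sketch of how such a pair could be unpacked, for illustration
    only. It assumes the fields are laid out most-significant-bit first
    exactly as written above, and that the first x bit of xx refers to the
    first 17-bit area; neither is stated in the post. The names decode_pair
    and split_instructions are made up for this example.

```python
MASK17 = (1 << 17) - 1

def decode_pair(w1: int, w2: int):
    """Unpack one pair of 32-bit words into three (is_start, 17-bit area) tuples.

    Assumed layout (MSB first):
      w1 = 11110 | x x | 17 bits (area A) | 8 bits (top of area B)
      w2 = 11111 | x   | 9 bits (rest of area B) | 17 bits (area C)
    """
    if (w1 >> 27) != 0b11110 or (w2 >> 27) != 0b11111:
        raise ValueError("not a variable-length instruction pair")
    start_a = bool((w1 >> 26) & 1)   # assumption: first x flags area A
    start_b = bool((w1 >> 25) & 1)   # assumption: second x flags area B
    start_c = bool((w2 >> 26) & 1)
    area_a = (w1 >> 8) & MASK17
    area_b = ((w1 & 0xFF) << 9) | ((w2 >> 17) & 0x1FF)  # 8 + 9 = 17 bits
    area_c = w2 & MASK17
    return [(start_a, area_a), (start_b, area_b), (start_c, area_c)]

def split_instructions(areas):
    """Group areas into instructions; every start flag opens a new instruction.

    A leading area without a start flag would be the tail of an instruction
    begun in an earlier pair; here it is simply kept as its own group.
    """
    instructions = []
    for is_start, bits in areas:
        if is_start or not instructions:
            instructions.append([bits])
        else:
            instructions[-1].append(bits)
    return instructions
```

    With start flags 1, 0, 1 the pair holds one 34-bit instruction followed
    by one 17-bit instruction, so 51 of the 64 bits carry instruction data.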

    I had been hoping to use 18-bit areas instead, but after re-checking my
    calculations, I found there just wasn't enough opcode space.

    Long instructions that contain immediates would not be part of variable-
    length instruction code. Instead, their lengths would be multiples of 32
    bits, making them part of ordinary code with 32-bit instructions.

    Their form would be like this:

    32-bit immediate:

    1010(12 bits)(16 bits)
    10111(11 bits)(16 bits)

    where the first parenthesized area belongs to the instruction, and the
    second to the immediate.

    48-bit immediate:

    1010(12 bits)(16 bits)
    10110(11 bits)(16 bits)
    10111(11 bits)(16 bits)

    64-bit immediate:

    1010(12 bits)(16 bits)
    10110(3 bits)(24 bits)
    10111(3 bits)(24 bits)

    Since the instruction, exclusive of the immediate, really only needs 12
    bits - 7 bit opcode, and 5 bit destination register - in each case
    there's enough additional space for the instruction to begin with a few
    bits that indicate its length, so that decoding is simple.

    The scheme is not really space-efficient.

    But the question that I really have is... is this really any better than
    having block headers? Or is it just as bad, just as complicated?

    John Savard

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Savard@21:1/5 to BGB on Tue Sep 2 09:15:59 2025
    On Sun, 31 Aug 2025 13:12:52 -0500, BGB wrote:

    How about, say, 16/32/48/64/96:
    xxxx-xxxx-xxxx-xxx0 //16 bit
    xxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xxyy-yyy1 //32 bit
    xxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xx11-1111 //64/48/96 bit prefix

    Already elaborate enough...
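
    For reference, the length rule in the quoted suggestion can be sketched
    as below. This assumes the rightmost character of each pattern is bit 0
    of the first 16-bit parcel; how a prefix goes on to select 48, 64 or 96
    bits is not given in the quoted text, so it is left undecoded. The name
    bgb_kind is made up for this example.

```python
def bgb_kind(low16: int) -> str:
    """Classify the first 16-bit parcel under the quoted 16/32/prefix scheme."""
    if low16 & 1 == 0:
        return "16-bit"            # ...xxx0
    if low16 & 0x3F == 0x3F:
        return "prefix"            # ...xx11-1111, leads a 48/64/96-bit form
    return "32-bit"                # ...xxyy-yyy1 with yy-yyy not all ones
```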

    Thank you for your interesting suggestions.

    I'm envisaging Concertina III as closely based on Concertina II, with
    only minimal changes.

    Like Concertina II, it is to meet the overriding condition that
    instructions do not have to be decoded sequentially. This means that
    whenever an instruction, or group of instructions, spans more than 32
    bits, the 32 bit areas of the instruction, other than the first, must
    begin with a combination of bits that says "don't decode me".

    The first 32 bits of an instruction get decoded directly, and then trigger
    and control the decoding of the rest of the instruction.
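
    A minimal sketch of what this condition buys: every 32-bit word can be
    classified on its own, without looking at its neighbours. The prefix set
    used here is taken from the encodings given earlier in this thread
    (10110 and 10111 for immediate continuation words, 11111 for the second
    word of a variable-length pair); whether that is the complete set for
    Concertina III is an assumption.

```python
# Assumed "don't decode me" prefixes, from the formats earlier in the thread.
CONTINUATION_PREFIXES = (0b10110, 0b10111, 0b11111)

def is_continuation(word: int) -> bool:
    """True if this 32-bit word is a non-first part of a longer instruction."""
    return (word >> 27) in CONTINUATION_PREFIXES

def find_starts(words):
    """Indices of words that may be decoded directly.

    Each word is tested independently, so in hardware all of these tests
    could run at once rather than one after another.
    """
    return [i for i, w in enumerate(words) if not is_continuation(w)]
```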

    This has the consequence that any immediate value that is 32 bits or more
    in length has to be split up into smaller pieces; this is what I really
    don't like about giving up the block structure.

    John Savard

  • From MitchAlsup@21:1/5 to All on Tue Sep 2 18:40:16 2025
    John Savard <quadibloc@invalid.invalid> posted:

    On Sun, 31 Aug 2025 13:12:52 -0500, BGB wrote:

    How about, say, 16/32/48/64/96:
    xxxx-xxxx-xxxx-xxx0 //16 bit
    xxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xxyy-yyy1 //32 bit
    xxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xx11-1111 //64/48/96 bit prefix

    Already elaborate enough...

    Thank you for your interesting suggestions.

    I'm envisaging Concertina III as closely based on Concertina II, with
    only minimal changes.

    Like Concertina II, it is to meet the overriding condition that
    instructions do not have to be decoded sequentially. This means that
    whenever an instruction, or group of instructions, spans more than 32
    bits, the 32 bit areas of the instruction, other than the first, must
    begin with a combination of bits that says "don't decode me".

    The first 32 bits of an instruction get decoded directly, and then trigger
    and control the decoding of the rest of the instruction.

    This has the consequence that any immediate value that is 32 bits or more
    in length has to be split up into smaller pieces; this is what I really
    don't like about giving up the block structure.

    I found this completely unnecessary.

    Only a small number of Major OpCodes can have constants, denoted by::
    0b'001xxxdd dddsssss D12dsmin orxsssss

    D=0 signifies '1' and '2' specify 5-bit immediates
    D=1 signifies a constant
    d=0 signifies 32-bit constant
    d=1 signifies 64-bit constant
    '1' signifies negation of Src1
    '2' signifies negation of Src2
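
    A sketch of pulling the D12ds flags out of an instruction word, for
    illustration only. It assumes the pattern above is read most significant
    bit first, which would put D, 1, 2, d, s in bits 15 down to 11; the post
    does not give bit numbers, and the meaning of s is not stated, so it is
    extracted but not interpreted. Routing, routing_field and constant_bits
    are hypothetical names.

```python
from typing import NamedTuple

class Routing(NamedTuple):
    D: int    # 1 = a constant follows
    one: int  # the '1' flag
    two: int  # the '2' flag
    d: int    # constant size selector when D = 1
    s: int    # not described in the post; kept as a raw bit

def routing_field(word: int) -> Routing:
    """Extract D12ds, assuming it occupies bits 15..11 of the 32-bit word."""
    f = (word >> 11) & 0x1F
    return Routing((f >> 4) & 1, (f >> 3) & 1, (f >> 2) & 1, (f >> 1) & 1, f & 1)

def constant_bits(r: Routing) -> int:
    """Size of the attached constant: 0 if D = 0, else 32 (d = 0) or 64 (d = 1)."""
    if r.D == 0:
        return 0
    return 64 if r.d else 32
```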

    In effect, D12ds is a routing specifier, telling DECODE what to route
    where in an easy to determine pattern. You could go so far as to call
    it a routing OpCode. This field is a large contributor to how My 66000
    requires fewer instructions than Other ISAs.

    However, I also found that STs need an immediate and a displacement, so,
    Major == 0b'001001 and minor == 0b'011xxx has 4 ST instructions with
    potential displacement (from D12ds above) and the immediate has the
    size of the ST. This provides for::
    std #4607182418800017408,[r3,r2<<3,96]
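
    To make the example concrete: the immediate 4607182418800017408 is
    0x3FF0000000000000, the IEEE 754 double-precision bit pattern for 1.0.
    The sketch below checks that, and computes the store address under the
    assumption that [r3,r2<<3,96] means r3 + (r2 << 3) + 96. The register
    values are invented for the example.

```python
import struct

def immediate_as_double(imm: int) -> float:
    """Reinterpret a 64-bit integer bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", imm))[0]

def effective_address(base: int, index: int, shift: int, disp: int) -> int:
    """Assumed meaning of [base, index<<shift, disp]."""
    return base + (index << shift) + disp

imm = 4607182418800017408
value = immediate_as_double(imm)               # 1.0
addr = effective_address(0x1000, 5, 3, 96)     # hypothetical r3 = 0x1000, r2 = 5
```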

    Lest one think this results in serial decoding, consider that the
    pattern decoder is 40 gates (just larger than 3 flip-flops), so one
    can afford to put this pattern decoder on every word in the
    inst-buffer. Then inst[0] selects inst[1], but inst[1] has already
    selected inst[2], which has selected inst[3], and we have a tree
    pattern that can parse 16 instructions in a 16-gate cycle time
    from a 24-32 word input-buffer to DECODE. I call this stage of
    the pipeline PARSE.
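
    A software analogy of the idea, not a description of the actual circuit:
    every word first computes, independently, how long an instruction would
    be if one started there, and the real instruction starts are then found
    by pointer doubling in about log2(n) rounds instead of a serial walk.
    The per-word lengths are supplied as input because the length rule
    itself is not spelled out here; instruction_starts is a made-up name.

```python
def instruction_starts(lengths):
    """Find instruction start indices from per-word speculative lengths.

    lengths[i] is the length in words of the instruction that would begin
    at word i. Word 0 is assumed to be a real instruction start.
    """
    n = len(lengths)
    # jump[i]: where the next instruction would begin; index n is a sentinel.
    jump = [min(i + lengths[i], n) for i in range(n)] + [n]
    start = [i == 0 for i in range(n)] + [False]
    span = 1
    while span < n:
        # One round: every known start marks the start 'span' hops ahead,
        # then every jump is doubled. Each round is fully parallel.
        new_start = start[:]
        for i in range(n):
            if start[i]:
                new_start[jump[i]] = True
        jump = [jump[j] for j in jump]
        start = new_start
        span *= 2
    return [i for i in range(n) if start[i]]
```

    For 8 words this takes 3 rounds, and for 32 words it takes 5, which is
    the software counterpart of the tree pattern described above.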

    Also note that 1 My 66000 instruction does the work of 1.4 RISC-V
    instructions, so, a 6-wide My 66000 machine is equivalent to an
    8.4-to-9 wide RISC-V machine.

