• Oops (Concertina II Going Around in Circles)

    From John Savard@21:1/5 to All on Wed Apr 24 23:49:25 2024
    I keep changing the basic design of Concertina II, instead of going
    forward and completing the task of fleshing it out.

    The reason for that... has been obvious all along. None of my attempts
    have satisfied me. I had goals for the architecture, some of which
    weren't being met by each iteration. So I kept going back and forth
    between compromising one set of goals, or compromising another set of
    goals.

    If I could make up my mind on what was most important to me, perhaps I
    could stop somewhere.

    Looking back at the various iterations, I did see that two goals were
    very important to me.

    I wanted to be able to have 16-bit instructions, at least in pairs
    within a 32-bit instruction slot, available without the overhead of a
    block header, in the basic instruction set. For this, I need to
    reserve 1/4 of the opcode space.

    Also, I wanted to have the basic load-store memory-reference
    instructions be able to use 16-bit displacements, have a three-bit
    index register field and a three-bit base register field, and be able
    to use all 32 registers in a normal register bank as destinations.
    This takes 3/4 of the opcode space.

    As 3/4 plus 1/4 is _not_ greater than 1, having both of these things
    in a design simultaneously is not impossible.
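
    A minimal sketch of the bit budget behind those two fractions. The field
    widths come from the post; the 32-bit word and the idea that the
    remaining bits form a single opcode field are assumptions made only for
    illustration.

```python
from fractions import Fraction

# Field widths quoted in the post; WORD_BITS is the assumed instruction size.
WORD_BITS = 32
DISPLACEMENT_BITS = 16
INDEX_REG_BITS = 3
BASE_REG_BITS = 3
DEST_REG_BITS = 5  # any of the 32 registers in a bank

operand_bits = DISPLACEMENT_BITS + INDEX_REG_BITS + BASE_REG_BITS + DEST_REG_BITS
opcode_bits = WORD_BITS - operand_bits  # bits left over to select the operation
opcode_values = 2 ** opcode_bits

# Shares of the opcode space claimed in the post.
memory_share = Fraction(3, 4)
short_pair_share = Fraction(1, 4)

memory_opcodes = memory_share * opcode_values
short_pair_opcodes = short_pair_share * opcode_values
unclaimed_share = 1 - memory_share - short_pair_share

print(operand_bits, opcode_bits, memory_opcodes, short_pair_opcodes, unclaimed_share)
```

    Under these assumptions the two goals fit, but with nothing left over,
    which is why everything else has to be found in leftover scraps.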

    And I've found some tiny scraps of opcode space left (in the 3/4 part;
    flexible auto-increment with an odd index register, since only even
    index registers are allowed in that mode) which are barely enough...

    for two-address register to register operate instructions, _and_ for a
    block header.

    The block header, while rudimentary, would be enough to allow...

    indicating some instruction slots as containing instructions from a
    secondary instruction set, so as to allow things like three-address
    operate instructions, multiple-register load and store instructions,

    and also allowing pseudo-immediates...

    and instructions longer than 32 bits.

    I have two unused opcodes in the load/store memory reference
    instructions, so I can use one of them for jump to subroutine (offset
    in the index register field, return address register in the
    destination register field) - and one for conditional jump. Since the
    condition code can go in the destination register field, and it only
    needs four bits, not five... I can also have a Load Address
    instruction, with the limitation that only registers 0-7 and 24-31 can
    be used as destinations (the ones used as index registers and the
    usual base registers).
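
    One way to picture the restricted Load Address destination: a four-bit
    field can name exactly the sixteen registers 0-7 and 24-31. The mapping
    below is hypothetical (the post does not give the actual encoding), and
    the function name is invented for this sketch.

```python
def load_address_destination(field: int) -> int:
    """Map a 4-bit destination field to a register number.

    Hypothetical encoding: values 0-7 select registers 0-7 and
    values 8-15 select registers 24-31.
    """
    if not 0 <= field <= 15:
        raise ValueError("destination field must fit in four bits")
    return field if field < 8 else field + 16


reachable = [load_address_destination(f) for f in range(16)]
print(reachable)
```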

    However, requiring the block header mechanism even for load and store
    multiple registers, basic to subroutine calls, means that the basic
    instruction set is... only _barely_ a complete one.

    So this is unlikely to satisfy me for very long either.

    One other possibility: stick with the current design - 1/4 of the
    opcode space for 16-bit instructions and 1/4 of the opcode space for
    instructions longer than 32 bits, so as to reduce their overhead and
    possibly allow the mechanism to also be used for prefixing
    instructions (not needed, though, if I decide to return to having
    block headers in a less vestigial form)...

    I would have to squeeze the "rest" of the instruction set a bit more
    if I switched from aligned-only load and store instructions to
    using only four base registers for them (the least painful of the
    restrictions I've considered so far), but it should be doable.

    John Savard

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to John Savard on Thu Apr 25 16:00:14 2024
    John Savard wrote:

    I keep changing the basic design of Concertina II, instead of going
    forward and completing the task of fleshing it out.

    The reason for that... has been obvious all along. None of my attempts
    have satisfied me. I had goals for the architecture, some of which
    weren't being met by each iteration. So I kept going back and forth
    between compromising one set of goals, or compromising another set of
    goals.

    If I could make up my mind on what was most important to me, perhaps I
    could stop somewhere.

    Looking back at the various iterations, I did see that two goals were
    very important to me.

    I wanted to be able to have 16-bit instructions, at least in pairs
    within a 32-bit instruction slot, available without the overhead of a
    block header, in the basic instruction set. For this, I need to
    reserve 1/4 of the opcode space.

    Also, I wanted to have the basic load-store memory-reference
    instructions be able to use 16-bit displacements, have a three-bit
    index register field and a three-bit base register field, and be able
    to use all 32 registers in a normal register bank as destinations.
    This takes 3/4 of the opcode space.

    As 3/4 plus 1/4 is _not_ greater than 1, having both of these things
    in a design simultaneously is not impossible.

    Not impossible, sure: but reserving so much for so little is gonna hurt.

    And I've found some tiny scraps of opcode space left (in the 3/4 part;
    flexible auto-increment with an odd index register, since only even
    index registers are allowed in that mode) which are barely enough...

    In my opinion, your first cut at an ISA encoding should not consume more
    than ½ of the available encodings. Concer-tina-tanic is already full to
    the brim and you are still just fleshing it out.

    for two-address register to register operate instructions, _and_ for a
    block header.

    The block header, while rudimentary, would be enough to allow...

    indicating some instruction slots as containing instructions from a
    secondary instruction set, so as to allow things like three-address
    operate instructions, multiple-register load and store instructions,

    and also allowing pseudo-immediates...

    and instructions longer than 32 bits.

    I have two unused opcodes in the load/store memory reference
    instructions, so I can use one of them for jump to subroutine (offset
    in the index register field, return address register in the
    destination register field) - and one for conditional jump. Since the
    condition code can go in the destination register field, and it only
    needs four bits, not five... I can also have a Load Address
    instruction, with the limitation that only registers 0-7 and 24-31 can
    be used as destinations (the ones used as index registers and the
    usual base registers).

    However, requiring the block header mechanism even for load and store
    multiple registers, basic to subroutine calls, means that the basic
    instruction set is... only _barely_ a complete one.

    So this is unlikely to satisfy me for very long either.

    Sigh....

    One other possibility: stick with the current design - 1/4 of the
    opcode space for 16-bit instructions and 1/4 of the opcode space for
    instructions longer than 32 bits, so as to reduce their overhead and
    possibly allow the mechanism to also be used for prefixing
    instructions (not needed, though, if I decide to return to having
    block headers in a less vestigial form)...

    I would have to squeeze the "rest" of the instruction set a bit more
    if I switched from aligned-only load and store instructions to
    using only four base registers for them (the least painful of the
    restrictions I've considered so far), but it should be doable.

    John Savard

  • From John Savard@21:1/5 to All on Thu Apr 25 12:41:23 2024
    On Thu, 25 Apr 2024 16:00:14 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    In my opinion, your first cut at an ISA encoding should not consume more
    than ½ of the available encodings. Concer-tina-tanic is already full to
    the brim and you are still just fleshing it out.

    Basically, I think that the reasonable length that a computer
    instruction should occupy is that which a similar instruction occupied
    on the IBM System/360 - which, in its day, was not regarded highly for
    its code density.

    However, I have banks of 32 registers instead of 16, and 16-bit
    displacements instead of 12 bits. Having only load and store
    memory-reference instructions, of course, helps to make up for this.

    That's why I can only use 8 of the 32 registers as base registers and
    as index registers, too.

    For wanting the impossible, of course I basically deserve what I get.
    If I _could_ manage to pull it off, of course, the result would be of
    some practical use; an instruction set that's plain, clear, and simple
    (at least when compared to monstrosities like Itanium and x86) and
    which is parsimonious in its use of memory is of some value.

    While I'm rearranging the deck chairs, maybe I'll come up with an
    original idea.

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Thu Apr 25 14:50:08 2024
    On Thu, 25 Apr 2024 12:41:23 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    While I'm rearranging the deck chairs, maybe I'll come up with an
    original idea.

    This latest proposal, which does differ from my previous attempts,
    does have _one_ advantage.

    In this case, as in the previous attempts, I will need to use the
    block header to indicate that some 32-bit instruction slots contain
    32-bit instructions in an "alternate" format.

    When the memory-reference instructions in the main format were
    compromised, that alternate format included uncompromised
    memory-reference instructions. So the extended instruction set,
    normal plus alternate, included the normal instructions twice.

    Here, I avoid that. Of course, though, the main format includes a
    severely compromised version of the register-to-register operate
    instructions. The alternate format would include the full version of
    those.

    Same thing, right?

    Well, not really - because the compromised version of
    register-to-register operate instructions contains only *one*
    instruction format. So there _is_ less duplication and waste, the
    instruction decode unit isn't set up to decode both the full version
    of the operate instructions and a second compromised format which is
    equally complex, but just has one bit trimmed off everywhere.

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Sun May 5 00:57:44 2024
    On Wed, 24 Apr 2024 23:49:25 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    So this is unlikely to satisfy me for very long either.

    And, given that, I've thought long and hard about what really is
    needed.

    The main opcode space for 32-bit instructions is now divided as
    follows:

    3/4 for uncompromised memory-reference instructions.

    3/16 for uncompromised register-to-register operate instructions.

    1/16 for the header required for variable-length instructions.

    The variable-length instructions will allow, with 32 bits of overhead
    per block, arbitrary mixing of 17-bit short instructions (the extra
    bit goes into the two-bit prefix field in the header) and 32-bit
    instructions - and longer instructions.

    00 and 01 indicate 17-bit instructions starting with 0 and 1
    respectively.

    10 indicates a 16-bit extent that contains the start of an instruction
    32 bits long or longer.

    11 indicates a 16-bit extent that is not the start of an instruction.
    In addition to the remaining parts of an instruction, space reserved
    for pseudo-immediates can be indicated by this.
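
    A rough sketch of how the four prefix codes could carve a block into
    instructions before any instruction bits are examined. It takes the
    prefix codes as an already-extracted list; the function name, the tuple
    layout, and the rule that a run of 11 codes directly after a 10 belongs
    to that instruction are assumptions for illustration, not the actual
    Concertina II decoder.

```python
def group_extents(prefixes):
    """Group 16-bit extents by their 2-bit prefix codes.

    Returns a list of (kind, first_extent, extent_count) tuples.
    """
    groups = []
    for position, code in enumerate(prefixes):
        if code in (0b00, 0b01):
            # A 17-bit instruction: the prefix supplies the leading bit.
            groups.append(("short17", position, 1))
        elif code == 0b10:
            # First extent of an instruction 32 bits long or longer.
            groups.append(("long", position, 1))
        elif code == 0b11:
            # Not an instruction start: extend the preceding long
            # instruction or reserved run, else open a reserved run
            # (for example, pseudo-immediate space).
            if groups and groups[-1][0] in ("long", "reserved"):
                kind, first, count = groups[-1]
                groups[-1] = (kind, first, count + 1)
            else:
                groups.append(("reserved", position, 1))
        else:
            raise ValueError("prefix codes are two bits wide")
    return groups


example = [0b00, 0b01, 0b10, 0b11, 0b10, 0b11, 0b11, 0b00, 0b11]
layout = group_extents(example)
print(layout)
```

    Because every extent's role is read from the header alone, each
    instruction start is known up front, which is what permits decoding the
    whole block in parallel. A real decoder would use the instruction's own
    length to tell continuation extents from pseudo-immediate space.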

    There will be three forms of header.

    One just has a three-bit field indicating the number of 32-bit
    instruction slots reserved for pseudo-immediates, in a restricted
    register-to-register operate instruction squeezed into an odd bit of
    leftover opcode space.

    The other will provide VLIW functionality for code consisting only of
    32-bit instructions: predication, and explicit indication of
    parallelism.

    The final one is 1111, which allows 17-bit instructions, 48-, 64-, 80-,
    and 96-bit instructions, and their arbitrary mixing.

    This has the advantage of providing all the functionality I'm looking
    for - a large, extensible instruction set, compactness of code through
    16-bit instructions that don't restrict which registers can be used,
    and memory-reference instructions that make full use of a 32-bit
    length being the only version of those instructions, instead of having
    to include both a cut-down form and a full-form, the latter only
    accessible with a header.

    Finally, this seems to be something that I will be forced to admit
    that further restructurings won't be able to improve upon - this will
    be the best way to squeeze everything I want into the 8-bit byte and
    the 32-bit word.

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Mon May 6 11:10:17 2024
    On Sun, 05 May 2024 00:57:44 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    The main opcode space for 32-bit instructions is now divided as
    follows:

    3/4 for uncompromised memory-reference instructions.

    3/16 for uncompromised register-to-register operate instructions.

    1/16 for the header required for variable-length instructions.

    This is not quite right.

    3/4 for uncompromised basic memory-reference instructions.

    1/8 for other memory-reference instructions.

    1/16 for uncompromised register-to-register operate instructions.

    1/16 for the header required for variable-length instructions.

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Mon May 6 11:06:44 2024
    On Sun, 05 May 2024 00:57:44 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    The final one is 1111, which allows 17-bit instructions, 48-, 64-, 80-,
    and 96-bit instructions, and their arbitrary mixing.

    However, one thing I wanted to do was have the 48-bit and longer
    instructions also available outside of the variable-length format.

    Previously, I had done this by having a second format of long
    instructions. I wanted to avoid that, this time.

    I came up with an idea. Just as 1111 _after_ the header in
    variable-length mode was used to indicate long instructions, for other
    modes, let 1111 after the header indicate each of two instruction
    slots in which three 18-bit units from variable-length are
    encapsulated. So a 48-bit instruction, taking up 64 bits, could be
    placed in any of the other modes.
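
    A quick capacity check of that encapsulation, under one possible
    reading: each of the two 32-bit slots begins with the four-bit 1111
    marker, and an 18-bit unit is a 16-bit extent together with its two-bit
    prefix. Both readings are assumptions; the post does not spell out the
    layout.

```python
SLOT_BITS = 32
MARKER_BITS = 4        # the 1111 code, assumed to lead each slot
SLOTS = 2
EXTENT_BITS = 16
PREFIX_BITS = 2
UNIT_BITS = EXTENT_BITS + PREFIX_BITS  # 18-bit unit (assumed composition)
UNITS = 3

payload_capacity = SLOTS * (SLOT_BITS - MARKER_BITS)  # bits left after markers
payload_needed = UNITS * UNIT_BITS                    # three 18-bit units
spare_bits = payload_capacity - payload_needed

instruction_bits = UNITS * EXTENT_BITS  # the instruction proper
footprint_bits = SLOTS * SLOT_BITS      # space it occupies in the block

print(payload_capacity, payload_needed, spare_bits, instruction_bits, footprint_bits)
```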

    But because that code conflicts with the header, these things aren't
    first-class citizens! I tried freeing up 1110 as well, but that was
    clearly not going to work acceptably. So I took other measures that
    only partly addressed that issue but consumed far less opcode space.

    John Savard

  • From MitchAlsup1@21:1/5 to John Savard on Mon May 6 19:45:09 2024
    John Savard wrote:

    On Sun, 05 May 2024 00:57:44 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    The main opcode space for 32-bit instructions is now divided as
    follows:

    3/4 for uncompromised memory-reference instructions.

    3/16 for uncompromised register-to-register operate instructions.

    1/16 for the header required for variable-length instructions.

    This is not quite right.

    3/4 for uncompromised basic memory-reference instructions.

    1/8 for other memory-reference instructions.

    1/16 for uncompromised register-to-register operate instructions.

    1/16 for the header required for variable-length instructions.

    In comparison::

    1/8 for [Rbase+@disp16]
    1/8 for Rd = Rs1 OP imm16
    1/64 for [Rbase,Ri<<scale,#disp]
    1/64 for Rd = Rs1 OP Rs2
    1/64 for Rd = 3OP( Rs1,Rs2,Rs3)
    1/64 for Rd = 1OP( Rs1 )
    1/64 for PRED
    1/64 for <w:o>
    1/8 for branching
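
    Tallying the two allocations side by side makes the contrast concrete.
    The fractions are copied from the two lists above; the dictionary names
    and labels are only for this sketch.

```python
from fractions import Fraction

# Concertina II split as corrected in the quoted post.
concertina = {
    "basic memory-reference": Fraction(3, 4),
    "other memory-reference": Fraction(1, 8),
    "register-to-register operate": Fraction(1, 16),
    "variable-length header": Fraction(1, 16),
}

# The comparison list given in this reply.
comparison = {
    "[Rbase+disp16]": Fraction(1, 8),
    "Rd = Rs1 OP imm16": Fraction(1, 8),
    "[Rbase,Ri<<scale,#disp]": Fraction(1, 64),
    "Rd = Rs1 OP Rs2": Fraction(1, 64),
    "Rd = 3OP(Rs1,Rs2,Rs3)": Fraction(1, 64),
    "Rd = 1OP(Rs1)": Fraction(1, 64),
    "PRED": Fraction(1, 64),
    "<w:o>": Fraction(1, 64),
    "branching": Fraction(1, 8),
}

concertina_used = sum(concertina.values())
comparison_used = sum(comparison.values())
comparison_free = 1 - comparison_used

print(concertina_used, comparison_used, comparison_free)
```

    The first list fills the 32-bit opcode space completely, while the
    second stays just under the one-half ceiling recommended earlier in the
    thread.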

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Tue May 7 02:21:54 2024
    On Mon, 06 May 2024 11:06:44 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    But because that code conflicts with the header, these things aren't
    first-class citizens! I tried freeing up 1110 as well, but that was
    clearly not going to work acceptably. So I took other measures that
    only partly addressed that issue but consumed far less opcode space.

    Although I had limited long vector and short vector operate
    instructions in the basic 32 bit instruction set, I didn't have long
    vector and short vector load and store instructions of any kind. So I
    needed to add them in some form in order for the basic 32 bit
    instruction set to be complete.

    However, if I were to include a 6-bit length field in the long vector
    load and store instructions, once again I would have had to free up
    1/16 of the opcode space. Instead of completely doing without the
    ability to load and store any but full-length vectors, I eventually
    was able to add a two-bit length register field to the long vector
    load and store instructions.

    So this new instruction set has survived another challenge.

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Tue May 7 21:49:10 2024
    On Tue, 07 May 2024 02:21:54 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    On Mon, 06 May 2024 11:06:44 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    But because that code conflicts with the header, these things aren't
    first-class citizens! I tried freeing up 1110 as well, but that was
    clearly not going to work acceptably. So I took other measures that
    only partly addressed that issue but consumed far less opcode space.

    Although I had limited long vector and short vector operate
    instructions in the basic 32 bit instruction set, I didn't have long
    vector and short vector load and store instructions of any kind. So I
    needed to add them in some form in order for the basic 32 bit
    instruction set to be complete.

    However, if I were to include a 6-bit length field in the long vector
    load and store instructions, once again I would have had to free up
    1/16 of the opcode space. Instead of completely doing without the
    ability to load and store any but full-length vectors, I eventually
    was able to add a two-bit length register field to the long vector
    load and store instructions.

    So this new instruction set has survived another challenge.

    And I've finally gotten around, therefore, to updating my web site to
    present this latest incarnation as Concertina II.

    John Savard

  • From John Savard@21:1/5 to All on Wed May 8 01:46:41 2024
    On Thu, 25 Apr 2024 16:00:14 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    In my opinion, your first cut at an ISA encoding should not consume more
    than ½ of the available encodings. Concer-tina-tanic is already full to
    the brim and you are still just fleshing it out.

    This is a point I think I should address.

    Why are my various iterations of Concertina II _all_, consistently,
    "full to the brim"?

    This is true if I compromise the basic load/store instructions, say by
    limiting them to three base registers for 16-bit displacements, so I
    can reserve 1/4 of the opcode space for paired 16-bit short
    instructions - which was one of the most common combinations -

    or if I reserve half the opcode space for two kinds of 16-bit short
    instructions,

    or if I don't compromise the basic load/store instructions, and only
    allow 16-bit instructions with a special header.

    These are the three basic variants of Concertina II that I have been
    vacillating between. But whichever I choose, I use nearly all possible
    opcode space, at least for basic 32-bit instructions.

    That didn't worry me much for two reasons.

    If I need an enormous amount of opcode space for some new kind of
    instructions for some new feature...

    I would still have _enormous_ amounts of opcode space available up in
    the stratosphere of 64-bit instructions and longer. (In some
    iterations, I did manage to use nearly all the 48-bit opcode space,
    because I tried to squeeze a form of string and packed decimal
    instructions there.)

    But what if the new feature was so important that I needed to have
    *short* instructions for the operations using that feature - 32-bit
    long instructions?

    Well, because of the block structure of Concertina II, which is
    primarily present to support pseudo-immediates (my idea of how to
    reconcile immediate values in instructions, which you've pointed out
    are very valuable, with my Concertina II design goal of fully
    independent and parallel decoding of every instruction in a block) and
    secondarily to allow VLIW features...

    I can always add one new type of header which specifies alternate
    instructions with fairly low overhead... and then, at a modest cost,
    even the most enormous new feature can have its own 32-bit
    instructions!

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Wed May 8 02:01:07 2024
    On Wed, 08 May 2024 01:46:41 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    Why are my various iterations of Concertina II _all_, consistently,
    "full to the brim"?

    I can always add one new type of header which specifies alternate
    instructions with fairly low overhead... and then, at a modest cost,
    even the most enormous new feature can have its own 32-bit
    instructions!

    That only answers a part of that question - why I feel I can _get
    away_ with having an ISA that is "full to the brim". But why did I let
    it get that way in the first place?

    Well, the reason for that is actually quite simple: a major
    design goal of Concertina II is to offer as much as possible of the
    basic operations required of a computer in instructions of the
    shortest possible length.

    16-bit displacements are the norm in microprocessor instruction sets,
    so I offer them. I offer base-index addressing - which microprocessors
    usually don't - because I feel it's needed for using arrays. And I
    have register banks of 32 registers because that's what today's RISC
    machines do.

    All of that means that the load and store instructions - particularly
    when integer load and store also include load unsigned and insert -
    take up 3/4 of all 32-bit instructions (approximately; one doesn't
    need unsigned load and insert for the 64-bit integer type, because it
    fills the register). And that's with using only 8 of the 32 registers
    for the base register and the index register each.

    Some parts of the instruction set do have slack. Two-address
    register-to-register operate instructions have a large opcode field,
    so there is some room for future expansion in parts of the instruction
    set.

    But, basically, it takes all the available bits to offer the level of
    functionality I am trying to provide with the basic 32-bit instruction
    set. Since that covers the traditional functionality of a CPU -
    floating-point and integer types - nothing basic is missing.

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Wed May 8 09:17:34 2024
    On Wed, 08 May 2024 01:46:41 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    I can always add one new type of header which specifies alternate
    instructions with fairly low overhead... and then, at a modest cost,
    even the most enormous new feature can have its own 32-bit
    instructions!

    And, naturally, after saying this, I had to go and prove it was
    possible by revising the ISA to add one alternate set of 32-bit
    instructions. Two more such sets are reserved for future expansion,
    however.

    John Savard

  • From MitchAlsup1@21:1/5 to John Savard on Wed May 8 21:46:37 2024
    John Savard wrote:

    On Thu, 25 Apr 2024 16:00:14 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    In my opinion, your first cut at an ISA encoding should not consume more
    than ½ of the available encodings. Concer-tina-tanic is already full to
    the brim and you are still just fleshing it out.

    This is a point I think I should address.

    Why are my various iterations of Concertina II _all_, consistently,
    "full to the brim"?

    This is true if I compromise the basic load/store instructions, say by
    limiting them to three base registers for 16-bit displacements, so I
    can reserve 1/4 of the opcode space for paired 16-bit short
    instructions - which was one of the most common combinations -

    or if I reserve half the opcode space for two kinds of 16-bit short
    instructions,

    or if I don't compromise the basic load/store instructions, and only
    allow 16-bit instructions with a special header.

    These are the three basic variants of Concertina II that I have been
    vacillating between. But whichever I choose, I use nearly all possible
    opcode space, at least for basic 32-bit instructions.

    This should hint that you are long down the dark alley.

    That didn't worry me much for two reasons.

    Perhaps you feel safe down the dark alley....

    If I need an enormous amount of opcode space for some new kind of
    instructions for some new feature...

    I would still have _enormous_ amounts of opcode space available up in
    the stratosphere of 64-bit instructions and longer. (In some
    iterations, I did manage to use nearly all the 48-bit opcode space,
    because I tried to squeeze a form of string and packed decimal
    instructions there.)

    So, why do you need a header AT ALL !!

    {Notice that I get a fully functional ISA where the instruction specifier
    is always 32-bits and I still have room for constants and for extensions.}

    If your bail out position is:: "some instructions can be 64-bits" --
    S T A R T with that as an assumption !!

    But what if the new feature was so important that I needed to have
    *short* instructions for the operations using that feature - 32-bit
    long instructions?

    G A S P ........why do I even try.....

    Well, because of the block structure of Concertina II, which is
    primarily present to support pseudo-immediates (my idea of how to
    reconcile immediate values in instructions, which you've pointed out
    are very valuable, with my Concertina II design goal of fully
    independent and parallel decoding of every instruction in a block) and
    secondarily to allow VLIW features...

    I can always add one new type of header which specifies alternate
    instructions with fairly low overhead... and then, at a modest cost,
    even the most enormous new feature can have its own 32-bit
    instructions!

    John Savard

  • From MitchAlsup1@21:1/5 to John Savard on Wed May 8 21:59:54 2024
    John Savard wrote:

    On Wed, 08 May 2024 01:46:41 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    Why are my various iterations of Concertina II _all_, consistently,
    "full to the brim"?

    I can always add one new type of header which specifies alternate
    instructions with fairly low overhead... and then, at a modest cost,
    even the most enormous new feature can have its own 32-bit
    instructions!

    That only answers a part of that question - why I feel I can _get
    away_ with having an ISA that is "full to the brim". But why did I let
    it get that way in the first place?

    Well, the reason for that is actually quite simple: a major
    design goal of Concertina II is to offer as much as possible of the
    basic operations required of a computer in instructions of the
    shortest possible length.

    May I suggest that sacrificing 16-bit instructions may give you the room
    whereby typical applications require less space without the 16-bit insts
    than with them !?!

    But this begs the question::

    Would your implementations perform better by executing FEWER instructions
    or executing MORE instructions at a faster rate ??? The tradeoffs are
    complicated and subtle. In 1986±, Mark Horowitz stated that <Stanford>
    MIPS executed 1.5× as many instructions as VAX 11/780 at 6× the frequency
    to achieve a 4× performance advantage.
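
    The figures in that statement are mutually consistent under a simple
    model in which run time is instruction count times cycles per
    instruction divided by clock frequency. Holding cycles per instruction
    equal on both machines is an assumption of this sketch, not something
    stated in the post.

```python
from fractions import Fraction

# Ratios quoted in the post (MIPS relative to VAX 11/780).
instruction_ratio = Fraction(3, 2)  # 1.5x as many instructions
frequency_ratio = Fraction(6)       # 6x the clock frequency

# Assumption: cycles per instruction is the same on both machines.
cpi_ratio = Fraction(1)

# time = instructions * CPI / frequency, so speedup is the inverse time ratio.
time_ratio = instruction_ratio * cpi_ratio / frequency_ratio
speedup = 1 / time_ratio

print(speedup)
```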

    My 66000, on the other hand is executing 1.1× as many instructions as
    VAX 11/780 and has a 5% (1/20) per pipeline stage gate overhead compared
    to RISC-V (maybe) for a 35% performance advantage over RISC-V.

    I say (maybe) because the pipeline designs I see for RISC-V use a 2 cycle
    latency LD pipeline with set associative caches. This puts a lot of gates
    between AGEN and LD forwarding to fit in 2 cycles. My pipelines give this
    loop 3 cycles.

    16-bit displacements are the norm in microprocessor instruction sets,
    so I offer them. I offer base-index addressing - which microprocessors
    usually don't - because I feel it's needed for using arrays. And I
    have register banks of 32 registers because that's what today's RISC
    machines do.

    So, you are getting eaten alive by the extra bit of register specifier !!
    which, then, is forcing you into extreme encoding positions--gotcha.

    All of that means that the load and store instructions - particularly
    when integer load and store also include load unsigned and insert -
    take up 3/4 of all 32-bit instructions (approximately; one doesn't
    need unsigned load and insert for the 64-bit integer type, because it
    fills the register). And that's with using only 8 of the 32 registers
    for the base register and the index register each.

    Do not put into ISA that which compiler CANNOT use !!
    Oh, wait, you have no ability to know what the compiler can use--either.

    Some parts of the instruction set do have slack. Two-address
    register-to-register operate instructions have a large opcode field,
    so there is some room for future expansion in parts of the instruction
    set.

    But, basically, it takes all the available bits to offer the level of
    functionality I am trying to provide with the basic 32-bit instruction
    set. Since that covers the traditional functionality of a CPU -
    floating-point and integer types - nothing basic is missing.

    Tisk.

    John Savard

  • From John Savard@21:1/5 to All on Wed May 8 20:39:09 2024
    On Wed, 8 May 2024 21:59:54 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    May I suggest that sacrificing 16-bit instructions may give you the room
    whereby typical applications require less space without the 16-bit insts
    than with them !?!

    Your suggestions are always welcome, given your great breadth of
    knowledge.

    My latest "extreme encoding position" means that 16-bit instructions
    are now relegated to a secondary instruction format that must be
    indicated by a header. However, now they're 17 bits long instead of 15
    bits long, so they can operate on any two registers in a 32-register
    bank.
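    The numbers in this post can be checked with a little arithmetic. The
    sketch below assumes the 256-bit block size mentioned elsewhere in the
    thread and the 14-instructions-per-block figure from this post; that
    the leftover bits are exactly the header is an inference, not something
    the post states.

```python
# Arithmetic on the block layout described in the post.
# BLOCK_BITS comes from elsewhere in the thread; reading the leftover
# bits as the header is an assumption for illustration.
BLOCK_BITS = 256
SHORT_INSTRUCTION_BITS = 17
SHORT_INSTRUCTIONS_PER_BLOCK = 14
REGISTER_FIELD_BITS = 5  # enough to name any of 32 registers

payload_bits = SHORT_INSTRUCTION_BITS * SHORT_INSTRUCTIONS_PER_BLOCK
leftover_bits = BLOCK_BITS - payload_bits

# Two full register fields fit in 17 bits with some opcode bits to spare.
opcode_bits_17 = SHORT_INSTRUCTION_BITS - 2 * REGISTER_FIELD_BITS

# With plain 32-bit instructions and no header, a block holds 8 of them.
plain_instructions_per_block = BLOCK_BITS // 32

print(payload_bits, leftover_bits, opcode_bits_17, plain_instructions_per_block)
```

    So 14 instructions of 17 bits occupy 238 bits, leaving 18 bits in a
    256-bit block, and each short instruction has 7 bits left after two
    5-bit register fields.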

    Having 14 instructions in a block instead of 8 instructions normally
    lets me do more. I know that in your MY 66000 architecture, the
    instructions have extra functionality that lets you combine things
    like negation and increment with an operation. While I certainly could
    try to add such a feature to my architecture - in fact, I did try that
    in one Concertina II iteration - I'm afraid that, not having your
    knowledge, I wouldn't be able to do it in a way that resulted in any
    significant savings in the number of instructions required for a
    program.

    And if I tried to add flexibility, I'd end up with an instruction set
    that looked like that of the VAX 11/780, which is not a direction to
    go in if performance is a concern.

  • From John Savard@21:1/5 to All on Wed May 8 20:48:20 2024
    On Wed, 8 May 2024 21:46:37 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    So, why do you need a header AT ALL !!

    Assuming I don't want to ever allow the circuits of my computer to try
    decoding an instruction that turns out later to be data (unless the
    programmer has made an error, in which case the penalty of the program
    being aborted is no problem)...

    and I want the computer to be able to decode all the instructions in a
    block in parallel, as a way to improve performance,

    then I need a block header to say 'here are the instructions to
    decode' IF I don't want to be absolutely limited to all instructions
    having the same length.

    While I could still have a pair of 16-bit instructions in a 32-bit
    word, without headers I couldn't have immediates (at least not of most
    lengths), or other instructions longer than 32 bits.

    And headers let me add instruction predication, which is also good, as
    branches do cause difficulties which predication partly avoids.
    (There's still a dependency on what is being predicated upon.)

    The header facilitates fast decoding of a flexible instruction set,
    and allows VLIW features allowing the ISA to be used for embedded
    processors.

    John Savard

  • From MitchAlsup1@21:1/5 to John Savard on Thu May 9 03:05:55 2024
    John Savard wrote:

    On Wed, 8 May 2024 21:46:37 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    So, why do you need a header AT ALL !!

    Assuming I don't want to ever allow the circuits of my computer to try
    decoding an instruction that turns out later to be data (unless the
    programmer has made an error, in which case the penalty of the program
    being aborted is no problem)...

    and I want the computer to be able to decode all the instructions in a
    block in parallel, as a way to improve performance,

    What makes you think My 66000 ISA cannot be decoded in parallel ??
    Over the last year I have illustrated how up to 16 instructions,
    each variable length from 1..5 words, can be decoded in parallel.

    First you select a suitable buffer which presents instructions to be
    decoded. A 6-wide machine will be using 16 words.

    Each word (320-gates of flip flops) has a 40-gate size decoder,
    and this size is used to select its successor.

    The first instruction starts at IP, the next is selected from the
    decode of the first instruction (4 gates of delay). Hereafter,
    the selection of the second instruction selects instructions 3 and
    4. Next we select instructions 5 through 8, then 9 through 16.
    8 total gates of logic and several gates of fan-out buffering.

    I happen to call this parse--instructions are parsed into individual
    starting points, and up to 16 instructions are presented to 16
    instruction decoders. Each of these decoders decodes the entire ISA.
    {Although there are ways to route instructions to more specialized
    decoders if desired.}
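    The doubling pattern described above (1, then 2, then 3-4, 5-8, 9-16)
    can be modelled in software. This is only a sketch of the idea, not the
    My 66000 logic itself: the function name, the list-based "jump table",
    and the sample lengths are all made up for illustration. Every word is
    given a speculative length as if it started an instruction, and each
    round doubles the number of known starting points.

```python
def parse_starts(lengths, ip=0, max_insts=16):
    """Find instruction start offsets by repeated doubling.

    lengths[i] is the length in words that word i would have if it were
    the first word of an instruction (decoded speculatively for all words).
    Returns (starts, rounds).
    """
    n = len(lengths)
    # jump[i]: where the following instruction starts if one begins at i.
    # Index n is a sentinel meaning "past the end of the buffer".
    jump = [min(i + lengths[i], n) for i in range(n)] + [n]
    starts = [ip]
    rounds = 0
    while len(starts) < max_insts:
        # Every known start selects one new start, so the count doubles.
        starts = starts + [jump[s] for s in starts]
        # Compose the table with itself so it spans twice as many instructions.
        jump = [jump[j] for j in jump]
        rounds += 1
    # Drop starts that fall past the end of the buffer.
    return [s for s in starts[:max_insts] if s < n], rounds


# Hypothetical 16-word buffer; lengths are 1..5 words as in the post.
sample_lengths = [2, 5, 1, 3, 4, 1, 1, 2, 5, 1, 1, 3, 2, 1, 1, 1]
starts, rounds = parse_starts(sample_lengths)
print(starts, rounds)
```

    For the sample buffer the starts found are words 0, 2, 3, 6, 7, 9, 10,
    11, 14 and 15, after four rounds of selection. A sequential walk would
    give the same answer but needs one step per instruction.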

    You are using a header, I am using logic.

    By using logic, there is no waste of bits in the instruction encoding.

    then I need a block header to say 'here are the instructions to
    decode' IF I don't want to be absolutely limited to all instructions
    having the same length.

    Seems like a horrible plan going forward with your goals in mind.

    While I could still have a pair of 16-bit instructions in a 32-bit
    word, without headers I couldn't have immediates (at least not of most
    lengths), or other instructions longer than 32 bits.

    And headers let me add instruction predication, which is also good, as
    branches do cause difficulties which predication partly avoids.
    (There's still a dependency on what is being predicated upon.)

    I added predication without any such need.

    The header facilitates fast decoding of a flexible instruction set,
    and allows VLIW features allowing the ISA to be used for embedded
    processors.

    The header allows subtracting 1 stage from the 12+ stage k-wide pipeline,
    AND is causing all sorts of "other issues" to remain present.

    John Savard

  • From John Savard@21:1/5 to All on Wed May 8 21:17:00 2024
    On Wed, 8 May 2024 21:46:37 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:
    John Savard wrote:

    But what if the new feature was so important that I needed to have
    *short* instructions for the operations using that feature - 32-bit
    long instructions?

    G A S P ........why do I even try.....

    I'm not sure why what I said _there_ was so shocking.

    But, yes, I do freely admit that Concertina II is _not_ an ISA that
    "makes sense" from your point of view... or, indeed, the point of view
    of many other people who value simplicity and elegance in a computer
    architecture.

    Instead, right from the start, it gives the appearance of having
    accumulated the kind of cruft that usually is acquired through decades
    of maintaining backwards compatibility.

    Still, I know that what I'm leading up to is shocking.

    The ISA looks - at first glance - like a plain old 32-bit RISC
    architecture. With a few little peculiarities... base-index
    addressing, like the 360, but not like any RISC architecture, for
    example.

    And then people notice the headers.

    Code is divided into 256-bit blocks, so that instructions can have
    "pseudo-immediates"; these values can be stacked at the end of a block
    so that they're all aligned, and they don't cause the instructions
    themselves to vary in length, so decoding is simple.

    Could that be regarded as tolerable?

    And the headers also allow... explicit indication of when instructions
    can execute in parallel, and instruction predication. Oh, so it's
    VLIW, too?

    And then they notice the killer. Perhaps they, too, will "gasp" in
    shock.

    There's also a header type that allows code where 16, 32, 48, 64...
    bit instructions can be combined in any order, for traditional
    CISC-like code with a variable instruction size. But there's a 12.5%
    overhead penalty so that fast parallel decoding remains available.

    But that header does something else.

    It changes the instruction stream from being composed of 32-bit words
    to one composed of 36-bit words, divided into 18-bit halfwords.

    And if that isn't enough, the last two header types let you switch to
    38-bit words composed of two 19-bit halfwords. *That's* what I do to
    add a bunch of extra 32-bit instructions to the ISA, if some new
    feature is so important that I don't want the instructions that deal
    with it to have to be 48 bits long at least.

    And, yes, I can indeed understand why you might gasp in horror at that
    stage. But you said I was having problems running out of opcode space,
    so I had to demonstrate that I could pull new opcode space out of thin
    air, as it were, should I feel the need to do so.

    John Savard

  • From John Savard@21:1/5 to All on Wed May 8 23:03:24 2024
    On Thu, 9 May 2024 03:05:55 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    What makes you think My 66000 ISA cannot be decoded in parallel ??
    Over the last year I have illustrated how up to 16 instructions,
    each variable length from 1..5 words, can be decoded in parallel.

    You are using a header, I am using logic.

    One of the things I'm doing is trying to make my ISA capable of
    efficient implementations by implementors who aren't necessarily as
    smart as you are; with headers, it's obvious how instructions can be
    decoded in parallel.

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Wed May 8 23:09:09 2024
    On Wed, 08 May 2024 21:17:00 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    And if that isn't enough, the last two header types let you switch to
    38-bit words composed of two 19-bit halfwords. *That's* what I do to
    add a bunch of extra 32-bit instructions to the ISA, if some new
    feature is so important that I don't want the instructions that deal
    with it to have to be 48 bits long at least.

    I decided to plan ahead, and expand the opcode space even further by
    adding another header type.

    Now, one has access to three alternate instruction sets, but instead
    of those being fixed, the first two can be chosen from a pool of
    sixteen... and the third from a set of 128 different possibilities.

    Also, I've noted that each of those alternate instruction sets, while
    billed as sets of 32-bit instructions, can actually have opcode space
    reserved for longer instructions, just as is done in the primary
    instruction set.

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Thu May 9 07:21:33 2024
    On Thu, 09 May 2024 07:16:58 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    On Wed, 08 May 2024 23:09:09 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    Now, one has access to three alternate instruction sets, but instead
    of those being fixed, the first two can be chosen from a pool of
    sixteen... and the third from a set of 128 different possibilities.

    Of course, this sort of thing may leave you gasping in shock and
    horror. But look at the bright side. While 128 is a somewhat large
    number, it isn't astronomical; I haven't provided for an opcode space
    so large that there isn't enough matter in the whole Universe to
    print a programmer's manual for the architecture.

    Now, _that_ would be genuinely impractical!

    Of course, as these many additional sets of instructions get fleshed
    out, were the ISA to be implemented, such an ISA would lend new
    meaning to the term "dark silicon", since, having so many instructions
    available, they could hardly all be in common use.

    Indeed, the situation could even be described with the catchy book
    title...

    Fifty Shades of Dark Silicon

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Thu May 9 07:16:58 2024
    On Wed, 08 May 2024 23:09:09 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    Now, one has access to three alternate instruction sets, but instead
    of those being fixed, the first two can be chosen from a pool of
    sixteen... and the third from a set of 128 different possibilities.

    Of course, this sort of thing may leave you gasping in shock and
    horror. But look at the bright side. While 128 is a somewhat large
    number, it isn't astronomical; I haven't provided for an opcode space
    so large that there isn't enough matter in the whole Universe to
    print a programmer's manual for the architecture.

    Now, _that_ would be genuinely impractical!

    John Savard

  • From Thomas Koenig@21:1/5 to John Savard on Thu May 9 17:23:08 2024
    John Savard <quadibloc@servername.invalid> schrieb:
    On Thu, 9 May 2024 03:05:55 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    What makes you think My 66000 ISA cannot be decoded in parallel ??
    Over the last year I have illustrated how up to 16 instructions,
    each variable length from 1..5 words, can be decoded in parallel.

    You are using a header, I am using logic.

    One of the things I'm doing is trying to make my ISA capable of
    efficient implementations by implementors who aren't necessarily as
    smart as you are; with headers, it's obvious how instructions can be
    decoded in parallel.

    If you include a description of how to decode things in parallel
    in the description of your ISA, as Mitch has done for his, then
    implementers need not be particularly clever, they only need to
    follow what you write.

  • From John Dallman@21:1/5 to John Savard on Thu May 9 20:28:00 2024
    In article <fajp3j12esafhpn3e27ntfq5f538jmb3q7@4ax.com>, quadibloc@servername.invalid (John Savard) wrote:

    Of course, this sort of thing may leave you gasping in shock and
    horror. But look at the bright side. While 128 is a somewhat large
    number, it isn't astronomical; I haven't provided for an opcode
    space so large that there isn't enough matter in the whole Universe to
    print a programmer's manual for the architecture.

    Now, _that_ would be genuinely impractical!

    Of course, as these many additional sets of instructions get fleshed
    out, were the ISA to be implemented

    I think you've just added another couple of orders of magnitude to the
    odds against that happening.

    John

  • From EricP@21:1/5 to John Savard on Thu May 9 15:46:39 2024
    John Savard wrote:
    On Wed, 8 May 2024 21:46:37 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    So, why do you need a header AT ALL !!

    Assuming I don't want to ever allow the circuits of my computer to try
    decoding an instruction that turns out later to be data (unless the
    programmer has made an error, in which case the penalty of the program
    being aborted is no problem)...

    and I want the computer to be able to decode all the instructions in a
    block in parallel, as a way to improve performance,

    It is ok to *try* decoding a length from a token that might be an
    instruction as long as you toss it away when you later find that it wasn't.

    You use the tail of the first instruction to select the start of the second.
    You use the tail of the first pair to select the start of the second pair.
    You use the tail of the first quad to select the start of the second quad.

    For example, if instructions can be 1..4 tokens long
    then the next instruction comes from one of 4 following tokens,
    the next instruction pair comes from one of 7 following instruction pairs,
    the next instruction quad comes from one of 13 following instruction quads.

    Decode0 Decode1 Decode2 Decode3 Decode4 Decode5...
    | | | | | |
    v v v v v v
    Length0->[--------4:1 Select Mux----------][----------...
    | | | | | |
    v v | | | |
    Inst0 Inst1 v v v v
    Length1->[----------7:1 Select Mux---------------------]
    | | | |
    v v v v
    Inst2 Inst3 [----------13:1 Select Mux-----------]
    | | | |
    v v v v
    Inst4 Inst5 Inst6 Inst7

    <---first pair---><--second pair--><--third pair---><---fourth pair--->
    <-----------first quad------------><--------second quad--------------->


    It's mostly done with wires and muxes, and a little glue logic.
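    The mux widths in the figure (4:1, 7:1, 13:1) follow from counting how
    many places the next group can start. A small sketch of that count,
    assuming instructions of 1..4 tokens as in the example above; the
    function name is made up for illustration:

```python
def candidate_offsets(group_size, min_len=1, max_len=4):
    """Offsets (in tokens) at which the group after this one may start.

    A group of `group_size` instructions, each min_len..max_len tokens long,
    has a total length anywhere from group_size*min_len to group_size*max_len.
    """
    shortest = group_size * min_len
    longest = group_size * max_len
    return list(range(shortest, longest + 1))


for name, size in (("instruction", 1), ("pair", 2), ("quad", 4)):
    offsets = candidate_offsets(size)
    print(f"next {name}: {len(offsets)} candidates, offsets {offsets[0]}..{offsets[-1]}")
```

    A single instruction ends after 1..4 tokens (4 candidates), a pair after
    2..8 tokens (7 candidates), and a quad after 4..16 tokens (13
    candidates), which is where the three mux sizes come from.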

  • From John Savard@21:1/5 to All on Thu May 9 15:09:11 2024
    On Thu, 9 May 2024 20:28 +0100 (BST), jgd@cix.co.uk (John Dallman)
    wrote:

    I think you've just added another couple of orders of magnitude to the
    odds against that happening.

    What, you don't think that an ISA that is capable of handling an
    instruction set two orders of magnitude larger than ordinary
    instruction sets wouldn't have a highly sought-after feature, at
    least for some niches?

    Instructions are multiples of 16 bits in length, like on a Motorola
    68000 or an IBM System/360, not multiples of eight bits like on x86...
    so headers provide a way to add just a few bits to instructions
    instead of adding a whole 16 bits, when that isn't needed.

    And after devising a mechanism to use _three_ extra opcode spaces in
    the instruction set... I merely decided to be proactive, and give the
    architecture room for further expansion, by generalizing it a tad
    more, and allow an additional 123 opcode spaces, potentially of equal
    or larger size. (Larger because an additional opcode space could have
    the bigger than 32 bit instructions all start with 1 instead of 1111,
    and thus have more opcode space because of having a larger proportion
    of longer instructions.)

    John Savard

  • From MitchAlsup1@21:1/5 to EricP on Thu May 9 21:08:48 2024
    EricP wrote:



    It is ok to *try* decoding a length from a token that might be an
    instruction as long as you toss it away when you later find that it wasn't.

    You use the tail of the first instruction to select the start of the second.
    You use the tail of the first pair to select the start of the second pair.
    You use the tail of the first quad to select the start of the second quad.

    For example, if instructions can be 1..4 tokens long
    then the next instruction comes from one of 4 following tokens,
    the next instruction pair comes from one of 7 following instruction pairs,
    the next instruction quad comes from one of 13 following instruction quads.

    Decode0 Decode1 Decode2 Decode3 Decode4 Decode5...
    | | | | | |
    v v v v v v
    Length0->[--------4:1 Select Mux----------][----------...
    | | | | | |
    v v | | | |
    Inst0 Inst1 v v v v
    Length1->[----------7:1 Select Mux---------------------]
    | | | |
    v v v v
    Inst2 Inst3 [----------13:1 Select Mux-----------]
    | | | |
    v v v v
    Inst4 Inst5 Inst6 Inst7

    <---first pair---><--second pair--><--third pair---><---fourth pair--->
    <-----------first quad------------><--------second quad--------------->


    Treeifying::

    Decode0 Decode1 Decode2 Decode3 Decode4 Decode5...
    | | | | | |
    | | | Pinst3->[--------4:1 Select Mux-
    | | | | | |
    | | Pinst2->[--------4:1 Select Mux----------]
    | | | | | |
    | Pinst1->[--------4:1 Select Mux----------]
    | Length1 | | | |
    v v v v v v
    Length0->[--------4:1 Select Mux----------]
    | | | | | |
    v v | | | |
    Inst0 Inst1 v v v v
    Length1->[----------2:1×4 Select Mux----------------]
    | | | |
    v v v v
    Inst2 Inst3 [----------2:1×4 Select Mux-----------]
    | | | |
    v v v v
    Inst4 Inst5 Inst6 Inst7

    <---first pair---><--second pair--><--third pair---><---fourth pair--->
    <-----------first quad------------><--------second quad--------------->


    Where Pinsti is a purported instruction decode which may or may not
    be selected as an instruction starting point. This gets rid of the
    wide multiplexers at the cost of additional 4:1 multiplexers.

    And thanks for taking the time to ASCII-art the figure.

  • From EricP@21:1/5 to All on Thu May 9 19:40:26 2024
    MitchAlsup1 wrote:
    EricP wrote:



    It is ok to *try* decoding a length from a token that might be an
    instruction as long as you toss it away when you later find that it
    wasn't.

    You use the tail of the first instruction to select the start of the
    second.
    You use the tail of the first pair to select the start of the second
    pair.
    You use the tail of the first quad to select the start of the second
    quad.

    For example, if instructions can be 1..4 tokens long
    then the next instruction comes from one of 4 following tokens,
    the next instruction pair comes from one of 7 following instruction
    pairs,
    the next instruction quad comes from one of 13 following instruction
    quads.

    Decode0 Decode1 Decode2 Decode3 Decode4 Decode5...
    | | | | | |
    v v v v v v
    Length0->[--------4:1 Select Mux----------][----------...
    | | | | | |
    v v | | | |
    Inst0 Inst1 v v v v
    Length1->[----------7:1 Select Mux---------------------]
    | | | |
    v v v v
    Inst2 Inst3 [----------13:1 Select Mux-----------]
    | | | |
    v v v v
    Inst4 Inst5 Inst6 Inst7

    <---first pair---><--second pair--><--third pair---><---fourth pair--->
    <-----------first quad------------><--------second quad--------------->


    Treeifying::

    Decode0 Decode1 Decode2 Decode3 Decode4 Decode5...
    | | | | | |
    | | | Pinst3->[--------4:1 Select Mux-
    | | | | | |
    | | Pinst2->[--------4:1 Select Mux----------]
    | | | | | |
    | Pinst1->[--------4:1 Select Mux----------]
    | Length1 | | | |
    v v v v v v
    Length0->[--------4:1 Select Mux----------]
    | | | | | |
    v v | | | |
    Inst0 Inst1 v v v v
    Length1->[----------2:1×4 Select Mux----------------]
    | | | |
    v v v v
    Inst2 Inst3 [----------2:1×4 Select Mux-----------]
    | | | |
    v v v v
    Inst4 Inst5 Inst6 Inst7

    <---first pair---><--second pair--><--third pair---><---fourth pair--->
    <-----------first quad------------><--------second quad--------------->


    Where Pinsti is a purported instruction decode which may or may not
    be selected as an instruction starting point. This gets rid of the
    wide multiplexers at the cost of additional 4:1 multiplexers.

    And thanks for taking the time to ASCII-art the figure.

    I should have mentioned those muxes are replicated horizontally across
    the input token buffer for each offset a pair or quad could start at.
    In the above case, the input buffer has space for 8 instructions * 4 tokens.
    The first token is offset 0, the first possible pair starts at offset 1,
    the last possible pair starts at offset 28, so that's 28 sets of 4:1 muxes
    * 4 tokens per instruction * bits-per-token (plus sundry housekeeping bits).

    Also I used one-hot select muxes, that is the 4:1 mux has a 4-bit
    one-hot select control and the 7:1 mux has a 7-bit select control,
    as it is easier to shift a one-hot enable out to the next position,
    and it eliminates the mux binary decoder and length adders for
    figuring out where the next pair or quad starts from.

    So those wide muxes are really just a layer of AND gates enabled by
    one of the select control bits, and a 4 or 7 or 13 input OR.
    There are no length adders inside the selection routing tree,
    just at the end to sum up the total length of valid instruction bytes
    so we know what to increment the fetch RIP by.
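    To make the one-hot point concrete, here is a small software model of an
    AND-OR mux with a one-hot select, plus a helper that shifts the enable
    bit along. It is a sketch only: the function names and token values are
    invented, and real hardware does this per bit in parallel rather than in
    a loop.

```python
def one_hot_mux(select, inputs):
    """Select one input using a one-hot control: AND each input, then OR."""
    if len(select) != len(inputs) or sum(select) != 1:
        raise ValueError("select must be one-hot and match the input count")
    out = 0
    for enable, value in zip(select, inputs):
        # AND stage: a disabled input contributes all zeros.
        gated = value if enable else 0
        # OR stage: merge the gated inputs into the output.
        out |= gated
    return out


def shift_one_hot(select, positions):
    """Move the enabled bit `positions` places along; no adder or decoder."""
    width = len(select)
    shifted = [0] * width
    for i, bit in enumerate(select):
        if bit and i + positions < width:
            shifted[i + positions] = 1
    return shifted


tokens = [0xA0, 0xB1, 0xC2, 0xD3]       # hypothetical token values
first = one_hot_mux([1, 0, 0, 0], tokens)
next_select = shift_one_hot([1, 0, 0, 0], 2)  # first instruction was 2 tokens
third = one_hot_mux(next_select, tokens)
print(hex(first), next_select, hex(third))
```

    The select never passes through a binary encoding, which is the saving
    described above: the start marker is shifted, not added to.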

  • From John Dallman@21:1/5 to John Savard on Fri May 10 00:19:00 2024
    In article <ofeq3j9ni63e7tmccf2qbkb9t0naui44ei@4ax.com>, quadibloc@servername.invalid (John Savard) wrote:

    On Thu, 9 May 2024 20:28 +0100 (BST), jgd@cix.co.uk (John Dallman)
    wrote:
    I think you've just added another couple of orders of magnitude to
    the odds against that happening.

    What, you don't think that an ISA that is capable of handlling an
    instruction set two orders of magnitude larger than ordinary
    instruction sets wouildn't have a highly sought-after feature, at
    least for some niches?

    Not that justified the costs of implementing such a huge instruction set.
    All the transistors that go into that are not going into performance
    (caches, functional units, and OoO pool size) and are pushing up the size
    of the minimal implementation.

    Also, teaching development tools about vast instruction sets is likely to
    demonstrate the RISC lesson again: compilers only use the simple parts.

    John

  • From MitchAlsup1@21:1/5 to EricP on Fri May 10 01:09:10 2024
    EricP wrote:

    MitchAlsup1 wrote:
    EricP wrote:



    It is ok to *try* decoding a length from a token that might be an
    instruction as long as you toss it away when you later find that it
    wasn't.

    You use the tail of the first instruction to select the start of the
    second.
    You use the tail of the first pair to select the start of the second
    pair.
    You use the tail of the first quad to select the start of the second
    quad.

    For example, if instructions can be 1..4 tokens long
    then the next instruction comes from one of 4 following tokens,
    the next instruction pair comes from one of 7 following instruction
    pairs,
    the next instruction quad comes from one of 13 following instruction
    quads.

    Decode0 Decode1 Decode2 Decode3 Decode4 Decode5...
    | | | | | |
    v v v v v v
    Length0->[--------4:1 Select Mux----------][----------...
    | | | | | |
    v v | | | |
    Inst0 Inst1 v v v v
    Length1->[----------7:1 Select Mux---------------------]
    | | | |
    v v v v
    Inst2 Inst3 [----------13:1 Select Mux-----------]
    | | | |
    v v v v
    Inst4 Inst5 Inst6 Inst7

    <---first pair---><--second pair--><--third pair---><---fourth pair--->
    <-----------first quad------------><--------second quad--------------->


    Treeifying::

    Decode0 Decode1 Decode2 Decode3 Decode4 Decode5...
    | | | | | |
    | | | Pinst3->[--------4:1 Select Mux-
    | | | | | |
    | | Pinst2->[--------4:1 Select Mux----------]
    | | | | | |
    | Pinst1->[--------4:1 Select Mux----------]
    | Length1 | | | |
    v v v v v v
    Length0->[--------4:1 Select Mux----------]
    | | | | | |
    v v | | | |
    Inst0 Inst1 v v v v
    Length1->[----------2:1×4 Select Mux----------------]
    | | | |
    v v v v
    Inst2 Inst3 [----------2:1×4 Select Mux-----------]
    | | | |
    v v v v
    Inst4 Inst5 Inst6 Inst7

    <---first pair---><--second pair--><--third pair---><---fourth pair--->
    <-----------first quad------------><--------second quad--------------->

    Where Pinsti is a purported instruction decode which may or may not
    be selected as an instruction starting point. This gets rid of the
    wide multiplexers at the cost of additional 4:1 multiplexers.

    And thanks for taking the time to ASCII-art the figure.

    I should have mentioned those muxes are replicated horizontally across
    the input token buffer for each offset a pair or quad could start at.
    In the above case, the input buffer has space for 8 instructions * 4 tokens.
    The first token is offset 0, the first possible pair starts at offset 1,
    the last possible pair starts at offset 28, so that's 28 sets of 4:1 muxes
    * 4 tokens per instruction * bits-per-token (plus sundry housekeeping bits).

    Also I used one-hot select muxes,

    To a logic designer, the difference between a 1-hot mux and a binary
    mux is a binary to 1-hot decoder--the part actually doing the muxing
    is identical.

    Also note: to the logic designer, a Find-First circuit produces a
    unary (1-hot) output and if you want binary output you put the 1-hot
    through a 1-hot to binary encoder.

    that is the 4:1 mux has a 4-bit
    one-hot select control and the 7:1 mux has a 7-bit select control,

    A 4:1 mux is 1 gate of delay (and one logic inversion)
    a 7:1 mux is 2 gates of delay (and two logic inversions)
    A 13:1 mux is 3 gates of delay (and 3 logic inversions)

    By treeifying the logic all muxes (as above) become 1 gate delay.

    as it is easier to shift a one-hot enable out to the next position,

    99% of selection logic anywhere in a pipeline is 1-hot.

    and it eliminates the mux binary decoder and length adders for
    figuring out where the next pair or quad starts from.

    Exactly.

    So those wide muxes are really just a layer of AND gates enabled by
    one of the select control bits, and a 4 or 7 or 13 input OR.
    There are no length adders inside the selection routing tree,
    just at the end to sum up the total length of valid instruction bytes
    so we know what to increment the fetch RIP by.

    Basically, you let each word determine its output and you decode the
    LOBs of IP to get your starting point.

  • From John Savard@21:1/5 to All on Thu May 9 21:02:51 2024
    On Fri, 10 May 2024 00:19 +0100 (BST), jgd@cix.co.uk (John Dallman)
    wrote:

    Not that justified the costs of implementing such a huge instruction set.

    Well, having a huge instruction set defined and implementing all of it
    are two different things.

    Look at x86, how MMX got replaced by SSE which got replaced by AVX.

    So if one is going to include instructions that will later become
    obsolete, and be replaced by other instructions, not re-using the same
    opcodes helps with upwards compatibility.

    John Savard

  • From MitchAlsup1@21:1/5 to John Savard on Fri May 10 17:27:10 2024
    John Savard wrote:

    On Fri, 10 May 2024 00:19 +0100 (BST), jgd@cix.co.uk (John Dallman)
    wrote:

    Not that justified the costs of implementing such a huge instruction set.

    Well, having a huge instruction set defined and implementing all of it
    are two different things.

    Look at x86, how MMX got replaced by SSE which got replaced by AVX.

    So if one is going to include instructions that will later become
    obsolete, and be replaced by other instructions, not re-using the same
    opcodes helps with upwards compatibility.

    Or skip to the end and only invent AVX while skipping the soon-to-be
    redundant intermediate stages.

    John Savard

  • From John Dallman@21:1/5 to John Savard on Fri May 10 20:51:00 2024
    In article <be3r3jhr1kf9n1cdsbik5ejsuso7c3pmmk@4ax.com>, quadibloc@servername.invalid (John Savard) wrote:

    On Fri, 10 May 2024 00:19 +0100 (BST), jgd@cix.co.uk (John Dallman)
    wrote:

    Not that justified the costs of implementing such a huge
    instruction set.

    Well, having a huge instruction set defined and implementing all of
    it are two different things.

    Look at x86, how MMX got replaced by SSE which got replaced by AVX.

    So if one is going to include instructions that will later become
    obsolete, and be replaced by other instructions, not re-using the
    same opcodes helps with upwards compatibility.

    Intel did not re-use the opcodes. MMX, SSE, SSE2 and so on are all still
    implemented and usable. Once a hardware feature has been used in software,
    getting rid of it is hard. I'm still building x86-32 software for SSE2
    because AVX[2] doesn't do anything useful for it.

    John

  • From John Savard@21:1/5 to All on Fri May 10 14:22:46 2024
    On Fri, 10 May 2024 17:27:10 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    Or skip to the end and only invent AVX while skipping the soon-to-be
    redundant intermediate stages.

    Well, I went to 256-bit short vectors as a permanent part of the
    architecture, with long vectors as the next step.

    But what about crypto assist instructions, as another example?

    However, I think I will adjust this feature. You complained I used up
    too much of my opcode space, so I demonstrated that Concertina II had
    the potential to have... a _lot_ of opcode space, even to ludicrous
    lengths.

    Now that I think I can finally wrap up Concertina II, having found how
    to achieve its goals as best as possible, I can go on to Concertina
    III... and, given your anguished pleas, I _will_ give up on block
    structure for the next iteration.

    In order to do that, though, it will have to be CISC, not RISC...
    banks of 8 registers, sort of like Concertina I, but much less messy.

    John Savard

  • From MitchAlsup1@21:1/5 to John Savard on Fri May 10 21:06:58 2024
    John Savard wrote:

    On Fri, 10 May 2024 17:27:10 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    Or skip to the end and only invent AVX while skipping the soon-to-be
    redundant intermediate stages.

    Well, I went to 256-bit short vectors as a permanent part of the
    architecture, with long vectors as the next step.

    But what about crypto assist instructions, as another example?

    If used often enough, sure, they make a lot of sense--just make whatever
    you put in applicable to a myriad of crypto functions.

    However, I think I will adjust this feature. You complained I used up
    too much of my opcode space, so I demonstrated that Concertina II had
    the potential to have... a _lot_ of opcode space, even to ludicrous
    lengths.

    Now that I think I can finally wrap up Concertina II, having found how
    to achieve its goals as best as possible, I can go on to Concertina
    III... and, given your anguished pleas, I _will_ give up on block
    structure for the next iteration.

    Would you like to read My 66000 ISA while taking a break between CT II and
    CT III ??

    In order to do that, though, it will have to be CISC, not RISC...
    banks of 8 registers, sort of like Concertina I, but much less messy.

    With MEM-OPs are you not already CISC ??

    Plus, CISC<->RISC is not a proper metric; the terms merely mark points
    along a complexity spectrum--one the RISC camp has used to oversell
    their case.

    My point is that there is a point between CISC and RISC where it takes
    fewer instructions to execute a given workload and simultaneously you
    have not screwed up the pipeline frequency so the ISA gains drop all
    the way to the bottom line.

    John Savard

  • From John Savard@21:1/5 to All on Fri May 10 18:34:09 2024
    On Fri, 10 May 2024 21:06:58 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:
    John Savard wrote:

    Now that I think I can finally wrap up Concertina II, having found how
    to achieve its goals as best as possible, I can go on to Concertina
    III... and, given your anguished pleas, I _will_ give up on block
    structure for the next iteration.

    Would you like to read My 66000 ISA while taking a break between CT II and
    CT III ??

    Oh, yes, indeed, although I don't promise to shamelessly steal all
    your good ideas.

    I am going to try to somehow squeeze immediates in while keeping
    instruction length decoding relatively simple. That, I fear, is not
    going to be easy for me, although I outlined a scheme before which I
    feel is not simple enough.

    In order to do that, though, it will have to be CISC, not RISC...
    banks of 8 registers, sort of like Concertina I, but much less messy.

    With MEM-OPs are you not already CISC ??

    I should have been clearer, but to tell the truth would have taken
    many words.

    What I meant was that while Concertina II indeed is hardly RISC, it
    still contains a near-RISC instruction set in the basic 32-bit
    operations. Unlike typical RISC instruction sets, it has base plus
    index addressing, though.

    Then mem-ops are added in the first supplementary instruction set,
    yes. Concertina II is intended to be "architecture-agnostic", being at
    once sort of like RISC, but also VLIW and CISC.

    What Concertina III would give up, to no longer be RISC at all, would
    be register banks of 32 registers. Changing that to 8 registers
    shortens certain fields, letting me switch to native variable-length
    instructions without the need for any block header mechanism.

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Fri May 10 21:05:39 2024
    On Fri, 10 May 2024 18:34:09 -0600, John Savard
    <quadibloc@servername.invalid> wrote:
    On Fri, 10 May 2024 21:06:58 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    With MEM-OPs are you not already CISC ??

    I should have been clearer, but to tell the truth would have taken
    many words.

    What I meant was that while Concertina II indeed is hardly RISC, it
    still contains a near-RISC instruction set in the basic 32-bit
    operations. Unlike typical RISC instruction sets, it has base plus
    index addressing, though.

    Then mem-ops are added in the first supplementary instruction set,
    yes. Concertina II is intended to be "architecture-agnostic", being at
    once sort of like RISC, but also VLIW and CISC.

    What Concertina III would give up, to no longer be RISC at all, would
    be register banks of 32 registers. Changing that to 8 registers
    shortens certain fields, letting me switch to native variable-length
    instructions without the need for any block header mechanism.

    The headers divide the architecture into its code types.

    If no headers are used, the available instruction set is basically a
    RISC instruction set... with more than the usual amount of
    instructions, and with base + index addressing.

    Using type I headers adds immediates.

    If one is producing VLIW-style code, one will use the type II header.

    If one uses the type III header, then one has a CISC instruction set,
    with different lengths of instructions, memory to register operate
    instructions, string instructions, and so on. This is true of the type
    VI and VIII headers.

    The type VII header combines VLIW with CISC; then one will likely
    also use the encapsulation mechanism to place long instructions within
    blocks with a type II header to use all the VLIW features.

    The type III header extends the instruction set, without otherwise
    departing from a RISC-style instruction set.

    So a compiler, at least on any one code generation setting, would use
    only a subset of the available headers - although one that can
    generate both CISC and VLIW code as requested is certainly a
    possibility.

    John Savard

  • From Thomas Koenig@21:1/5 to John Savard on Sat May 11 07:22:49 2024
    John Savard <quadibloc@servername.invalid> schrieb:
    On Fri, 10 May 2024 17:27:10 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    Or skip to the end and only invent AVX while skipping the soon-to-be
    redundant intermediate stages.

    Well, I went to 256-bit short vectors as a permanent part of the
    architecture, with long vectors as the next step.

    But what about crypto assist instructions, as another example?

    You will probably want to look at AES for this. AES operates on
    16-byte blocks, so having 128-bit registers is natural.

    AES256 also needs 15 separate round keys, which should be kept in
    registers if you are doing things on a CPU. Because you also need
    registers for intermediate results and for loading/storing data,
    32 128-bit registers would be a good fit. Look at POWER's
    vcipher and vcipherlast as an example.
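
    The register arithmetic above can be sketched in a few lines of
    Python. The round counts are those of the AES standard; the helper
    names and the split into "state" and "scratch" registers are
    assumptions made for illustration only, not a description of any
    particular ISA.

```python
# AES round counts per key size (10/12/14 rounds for 128/192/256-bit keys).
AES_ROUNDS = {128: 10, 192: 12, 256: 14}

def round_keys(key_bits: int) -> int:
    """128-bit round keys needed: one initial key plus one per round."""
    return AES_ROUNDS[key_bits] + 1

def registers_needed(key_bits: int, state_regs: int = 1,
                     scratch_regs: int = 2) -> int:
    """Hypothetical budget: round keys + data blocks in flight + scratch."""
    return round_keys(key_bits) + state_regs + scratch_regs

for bits in (128, 192, 256):
    print(bits, round_keys(bits), registers_needed(bits))
```

    Under these assumed numbers, AES256 with a single block in flight
    already exceeds a 16-register file, while 32 registers leave room to
    interleave several independent blocks.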

    These registers would also be a good fit for 128-bit IEEE floating
    point, which only POWER at the moment supports in hardware, plus
    those SIMD things that do not come in loops (aka SLP).

  • From MitchAlsup1@21:1/5 to John Savard on Sat May 11 17:13:33 2024
    John Savard wrote:

    On Fri, 10 May 2024 21:06:58 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    Would you like to read My 66000 ISA while taking a break between CT II and
    CT III ??

    Oh, yes, indeed, although I don't promise to shamelessly steal all
    your good ideas.

    Send me an e-mail I can use as a return address.

  • From Anton Ertl@21:1/5 to John Dallman on Sat May 11 17:48:51 2024
    jgd@cix.co.uk (John Dallman) writes:
    Also, teaching development tools about vast instruction sets is likely to
    demonstrate the RISC lesson again: compilers only use the simple parts.

    That needs some elaboration. There are several potential reasons for
    that:

    1) The compiler writers found it too hard to use the complex
    instructions or addressing modes. For some kinds of instructions that
    is the case (e.g., for the AES instructions in Intel and AMD CPUs), but
    at least these days such instructions are there for use in libraries
    written in assembly language/with intrinsics.

    2) Some instructions are slower than a sequence of simpler
    instructions, so compilers will avoid them even if they would
    otherwise use them. That has been reported by both the IBM 801
    project about some S/370 instructions and by the Berkeley RISC project
    about the VAX. I don't remember any reports about addressing modes
    with that problem.

    3) Some instructions or addressing modes can be selected by compilers
    and are beneficial when they are used, but they are selected rarely
    because they fit the needs of the compiled program rarely.

    IIRC the RISC papers mentioned mainly 2), with a little bit of 3).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

  • From John Savard@21:1/5 to All on Sat May 11 12:14:31 2024
    On Sat, 11 May 2024 17:13:33 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    John Savard wrote:

    On Fri, 10 May 2024 21:06:58 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:

    Would you like to read My 66000 ISA while taking a break between CT II and
    CT III ??

    Oh, yes, indeed, although I don't promise to shamelessly steal all
    your good ideas.

    Send me an e-mail I can use as a return address.

    All right; however, I still have the copy of your MY 66000
    architecture description which you sent me in 2017.

    John Savard

  • From John Savard@21:1/5 to tkoenig@netcologne.de on Sat May 11 12:20:51 2024
    On Sat, 11 May 2024 07:22:49 -0000 (UTC), Thomas Koenig
    <tkoenig@netcologne.de> wrote:

    You will probably want to look at AES for this. AES operates on
    16-byte blocks, so having 128-bit registers is natural.

    Oh, I've taken a very good look at AES, once upon a time...

    http://www.quadibloc.com/crypto/co040401.htm

    back when it was called Rijndael, in fact.

    John Savard

  • From John Savard@21:1/5 to All on Sat May 11 12:18:24 2024
    On Fri, 10 May 2024 21:06:58 +0000, mitchalsup@aol.com (MitchAlsup1)
    wrote:
    John Savard wrote:

    But what about crypto assist instructions, as another example?

    If used often enough, sure, they make a lot of sense--just make whatever
    you put in applicable to a myriad of crypto functions.

    While that seems to be sound advice, often it's not possible to take
    it.

    Many crypto assist features work like this: there are instructions to
    place keys in a secure area on the chip, and then instructions to
    encrypt - or decrypt, depending on which secure area was chosen -
    using one of those keys... of which a copy is no longer retained
    anywhere in the computer.

    The idea behind this is to keep the keys hidden from malicious
    software. But if you don't have access to the key, actually performing
    the encryption operation yourself is out of the question, so the
    entire operation has to be done in a single operation.
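
    A minimal Python sketch of the key-slot idea described above. The
    class and method names (KeySlotUnit, load_key, encrypt_block) are
    hypothetical, and the repeating-key XOR is only a placeholder for
    whatever cipher real hardware would implement. The sketch also omits
    the separate areas for encryption and decryption keys.

```python
class KeySlotUnit:
    """Toy model of an on-chip key store: keys go in but never come out."""

    def __init__(self, slots: int = 4):
        # Stands in for the secure on-chip area.
        self._keys = [None] * slots

    def load_key(self, slot: int, key: bytes) -> None:
        """Write-only: there is deliberately no matching read operation."""
        if not key:
            raise ValueError("empty key")
        self._keys[slot] = bytes(key)

    def encrypt_block(self, slot: int, block: bytes) -> bytes:
        """Whole operation in one call, since software cannot see the key.

        PLACEHOLDER: repeating-key XOR, not a real cipher.
        """
        key = self._keys[slot]
        if key is None:
            raise ValueError("empty key slot")
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(block))


unit = KeySlotUnit()
unit.load_key(0, b"\x0f" * 16)
print(unit.encrypt_block(0, b"\x00" * 16).hex())
```

    Because the caller only ever names a slot, the instruction cannot be
    decomposed into generic building blocks that take the key as an
    ordinary operand.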

    John Savard

  • From MitchAlsup1@21:1/5 to Anton Ertl on Sat May 11 19:16:46 2024
    Anton Ertl wrote:

    jgd@cix.co.uk (John Dallman) writes:
    Also, teaching development tools about vast instruction sets is likely to
    demonstrate the RISC lesson again: compilers only use the simple parts.

    That needs some elaboration. There are several potential reasons for
    that:

    1) The compiler writers found it too hard to use the complex
    instructions or addressing modes. For some kinds of instructions that
    is the case (e.g., for the AES instructions in Intel and AMD CPUs), but
    at least these days such instructions are there for use in libraries
    written in assembly language/with intrinsics.

    The 801 was correct on this::

    The compiler must be developed at the same time as the ISA. If the ISA
    has a feature and the compiler cannot use it, then why is it there?
    {Yes, there are certain privileged instructions lacking this property.}
    Conversely, if the compiler could almost use an instruction but does
    not, then adjust the instruction specification so the compiler can !!

    2) Some instructions are slower than a sequence of simpler
    instructions, so compilers will avoid them even if they would
    otherwise use them.

    VAX CALL instructions did more work than was required; they did
    the work they were specified to perform as rapidly as the HW could
    perform it. It took 10 years to figure out that the CALL/RET
    overhead was excessive and wasteful.

    That has been reported by both the IBM 801
    project about some S/370 instructions and by the Berkeley RISC project
    about the VAX. I don't remember any reports about addressing modes
    with that problem.

    The problem with address modes is their serial decode, not the ability
    to craft any operand the instruction needs. The second problem with
    VAX-like addressing modes is that they are overly expressive: all
    operands can be constants, whereas a good compiler will never need more
    than 1 constant per instruction (because otherwise some constant
    arithmetic could be performed at compile (or link) time.)

    3) Some instructions or addressing modes can be selected by compilers
    and are beneficial when they are used, but they are selected rarely
    because they fit the needs of the compiled program rarely.

    The following constructs are seen often enough that the memory reference
    instructions should support them well::

    *p            LD   Rd,[Rp]
    next          LD   Rd,[Rp,#next]
    array[i]      LD   Rd,[Rp,Ri<<scale,#array]    // RISC-V fails
    p[i]          LD   Rd,[Rp,Ri<<scale]           // RISC-V fails
    p[i].field    LD   Rd,[Rp,Ri<<scale,#field]    // RISC-V fails
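
    As a rough illustration of what the scaled-index forms above compute,
    here is a small Python sketch. The function names, the 64-bit wrap and
    the sample values are assumptions for illustration; the second
    function shows the same address built from separate shift, add, and
    base+displacement steps, as an ISA without a scaled-index mode has to.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit address arithmetic

def effective_address(base, index=0, scale=0, disp=0):
    """One-instruction form: [Rp, Ri<<scale, #disp]."""
    return (base + (index << scale) + disp) & MASK64

def split_sequence(base, index, scale, disp):
    """Same address from separate shift, add, and base+disp steps."""
    t = (index << scale) & MASK64   # shift the index
    t = (base + t) & MASK64         # add the base
    return (t + disp) & MASK64      # memory access with base + displacement

# p[i].field with 8-byte elements (scale = 3) and the field at offset 4.
p, i = 0x1000, 3
print(hex(effective_address(p, i, scale=3, disp=4)))  # 0x101c
```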

    Given the above: local variables just use SP and the variable's offset on
    the stack or frame; global variables resolved by the linker are accessed
    with 32-bit displacements off the IP (instruction pointer); while external
    variables are accessed through GOT as::

    extern[i]     LD   Rp,[IP,,#GOT[i].address-.]
                  LD   Rd,[Rp,#GOT[i].offset]

    or an entry point called with::

    extern f() CALX [IP,,#GOT[f].address-.]

    {{CALX is effectively LD IP,[address] while storing the return address in R0}}
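
    The one-line description of CALX can be modelled as a toy in Python.
    The register dictionary, the memory dictionary and the insn_len
    parameter are hypothetical scaffolding; only the "load IP from memory,
    put the return address in R0" behaviour comes from the description
    above.

```python
def calx(regs: dict, mem: dict, disp: int, insn_len: int) -> None:
    """Toy CALX [IP,,#disp]: load IP from the GOT slot, link in R0."""
    slot = regs["IP"] + disp             # IP-relative address of the GOT entry
    regs["R0"] = regs["IP"] + insn_len   # return address: the next instruction
    regs["IP"] = mem[slot]               # LD IP,[address]

regs = {"IP": 0x400, "R0": 0}
mem = {0x2000: 0x9000}                   # GOT[f].address holds f's entry point
calx(regs, mem, disp=0x2000 - 0x400, insn_len=8)
print(hex(regs["IP"]), hex(regs["R0"]))  # 0x9000 0x408
```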

    CALX was invented/developed specifically to optimize external calling
    of subroutines (and adjusted until it fit the requirements.) Thus, not
    only should the compiler be in development with ISA, so should the
    linker !!

    IIRC the RISC papers mentioned mainly 2), with a little bit of 3).

    - anton

  • From John Savard@21:1/5 to All on Sun May 12 03:57:02 2024
    On Thu, 9 May 2024 20:28 +0100 (BST), jgd@cix.co.uk (John Dallman)
    wrote:
    In article <fajp3j12esafhpn3e27ntfq5f538jmb3q7@4ax.com>,
    quadibloc@servername.invalid (John Savard) wrote:

    Of course, this sort of thing may leave you gasping in shock and
    horror. But look at the bright side. While 128 is a somewhat large
    number, it isn't astronomical; I haven't provided for an opcode
    space so large that there isn't enough matter in the whole Universe to
    print a programmer's manual for the architecture.

    Now, _that_ would be genuinely impractical!

    Of course, as these many additional sets of instructions get fleshed
    out, were the ISA to be implemented

    I think you've just added another couple of orders of magnitude to the
    odds against that happening.

    I've decided to claw some of those orders of magnitude back, even if
    it hardly matters (zero divided by 100 is still zero).

    Now I've changed the applicable header format to provide only _eight_
    additional alternate instruction sets, instead of almost 128 of them,
    using the available bits instead for something much more important -
    allowing the explicit indication of parallelism to be easily combined
    with the use of all of the first four instruction sets, as well as one
    of the eight new ones.

    John Savard

  • From Thomas Koenig@21:1/5 to John Savard on Sun May 12 11:45:27 2024
    John Savard <quadibloc@servername.invalid> schrieb:

    Now I've changed the applicable header format to provide only _eight_
    additional alternate instruction sets,

    Questions/remarks. Please feel free to answer/correct.

    A header introduces a block, correct?

    Once a block has been identified by a header, the format of all
    instructions in that block is set. Correct?

    The compiler (or assembler programmer) must then choose which
    instructions go into which block, correct?

    If a short instruction is followed by a long instruction, and
    both are in a single block, what is the compiler to do?
    I can only see either a) to choose a block for the longer
    instruction or b) fill the rest of the block for the short
    instruction with (short) NOPs.

    How is this supposed to save space?
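
    To make the two options concrete, here is a small Python sketch of
    the space cost under the reading given in the questions above (one
    instruction format per block). The 256-bit block size, the 32-bit
    header and the function names are assumptions for illustration, not
    the actual Concertina II rules.

```python
BLOCK_BITS = 256    # assumed block size
HEADER_BITS = 32    # assumed header size
PAYLOAD = BLOCK_BITS - HEADER_BITS

def blocks_padding(lengths):
    """Option (b): open a new block whenever the instruction length changes
    (or the block is full); the unused tail of each block is NOP padding."""
    blocks, used, fmt = 0, 0, None
    for n in lengths:
        if n != fmt or used + n > PAYLOAD:
            blocks += 1
            used, fmt = 0, n
        used += n
    return blocks

def blocks_widened(lengths):
    """Option (a): encode every instruction in the longest format present."""
    if not lengths:
        return 0
    per_block = PAYLOAD // max(lengths)
    return -(-len(lengths) // per_block)   # ceiling division

mix = [16, 48, 16, 48]                     # alternating short/long, in bits
print(sum(mix), blocks_padding(mix) * BLOCK_BITS, blocks_widened(mix) * BLOCK_BITS)
```

    For this alternating mix the raw instructions total 128 bits, while
    padding costs four blocks and widening costs one, which is the
    concern the question raises.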

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Sun May 12 12:20:33 2024
    On Wed, 24 Apr 2024 23:49:25 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    I keep changing the basic design of Concertina II, instead of going
    forward and completing the task of fleshing it out.

    I have recently added another page to the current iteration, at

    http://www.quadibloc.com/arch/cab0102.htm

    which describes the formats of the instructions longer than 32 bits.

    Since I've been going around in circles, all I had to do was go back
    and grab my files for

    http://www.quadibloc.com/arch/cw0102.htm

    (not currently a valid URL, but it can be found, no doubt, on the
    Wayback Machine)

    and make slight changes. Well, at least at first. I've now
    significantly fleshed out the 48-bit instructions, so as to make the
    extended register banks of 128 registers first-class citizens.

    As befits the "Concertina-tanic", I suppose.

    John Savard

  • From John Savard@21:1/5 to quadibloc@servername.invalid on Sun May 12 14:14:44 2024
    On Sun, 12 May 2024 12:20:33 -0600, John Savard
    <quadibloc@servername.invalid> wrote:

    Since I've been going around in circles, all I had to do was go back
    and grab my files for

    http://www.quadibloc.com/arch/cw0102.htm

    (not currently a valid URL, but it can be found, no doubt, on the
    Wayback Machine)

    I tried, just to check, and, no, it's not there.

    John Savard
