• Pseudo-Immediates as Part of the Instruction

    From John Savard@21:1/5 to All on Fri Aug 1 15:11:49 2025
    I couldn't locate a post I finally felt I was ready to respond to, which
    was in reply to one of my posts about Concertina II, which said that
    immediates ought to be properly considered part of the instruction.

    Well, in nearly all computer architectures, immediates _are_ part of the instruction, and quite obviously so.

    But what Concertina II has are *pseudo* immediates. That is, they're not
    really immediates, but they pretend to be.

    What does this mean? What could this mean?

    Well, in my register-to-register operate instruction, associated with each _source_ register field, there's a bit which, if set, says that the five
    bits in the field aren't a register specifier, but a pointer to a constant.

    A constant that's addressed by an instruction isn't an immediate; it's a constant. So why do I even call these constants "pseudo-immediates" then?

    Well, that pointer - five bits long - is an awfully short pointer. Where
    does it point?

    Instructions are fetched in blocks that are 256 bits long. One of the
    things this allows for is for the block to begin with a header that
    specifies that a certain number of 32-bit instruction slots at the end of
    the current block are to be skipped over in the sequence of instructions
    to be executed; this space can be used for constants.

    So although the constant is fetched in response to a pointer, and thus is
    not an immediate, the constant is located directly in the instruction
    stream. This is particularly true in implementations where the memory bus
    is 256 bits wide, and a block of instructions is fetched in a single
    memory read.
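
    A minimal sketch of that layout, written in Python purely for
    illustration. It assumes the header supplies a simple count of reserved
    trailing slots (the name reserved_slots is hypothetical; the actual header
    encoding is not given here) and treats the block as 32 raw bytes.

```python
BLOCK_BYTES = 32  # one 256-bit block
SLOT_BYTES = 4    # one 32-bit instruction slot


def split_block(block: bytes, reserved_slots: int):
    """Split a block into (executed slots, constant area).

    reserved_slots is the assumed header field: that many 32-bit slots at
    the end of the block are skipped by execution and can hold constants.
    Slot 0 of the executed part would hold the header itself.
    """
    if len(block) != BLOCK_BYTES:
        raise ValueError("block must be exactly 32 bytes")
    total_slots = BLOCK_BYTES // SLOT_BYTES
    if not 0 <= reserved_slots < total_slots:
        raise ValueError("reserved_slots out of range")
    boundary = (total_slots - reserved_slots) * SLOT_BYTES
    slots = [block[i:i + SLOT_BYTES] for i in range(0, boundary, SLOT_BYTES)]
    return slots, block[boundary:]
```

    With reserved_slots = 2, six slots remain for the header and instructions
    and the last eight bytes of the block are free for constants.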

    So the pseudo-immediate value is not part of the _instruction_ in the conventional sense, but if you think of the 256-bit block as being the
    "real" instruction for a VLIW architecture, it's part of *that*.

    Think of the Itanium: the 128-bit thingie is one thing, and each of the
    41-bit thingies that make it up, along with the 5-bit header, is another
    thing.

    The 5-bit header is part of the 128-bit thingy without being part of any
    of the 41-bit thingies. That is the limbo in which my pseudo-immediates
    are found. Data? Or a field in the instruction? It can be either one,
    depending on whether you define each individual 32-bit instruction as an instruction, or the 256-bit block as the "real" instruction the
    architecture executes.

    John Savard

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Savard@21:1/5 to John Savard on Fri Aug 1 16:52:34 2025
    On Fri, 01 Aug 2025 15:11:49 +0000, John Savard wrote:

    The 5-bit header is part of the 128-bit thingy without being part of any
    of the 41-bit thingies. That is the limbo in which my pseudo-immediates
    are found. Data? Or a field in the instruction? It can be either one, depending on whether you define each individual 32-bit instruction as an instruction, or the 256-bit block as the "real" instruction the
    architecture executes.

    ...and if you think that's crazy, in some of the earliest iterations of
    the Concertina II design, I implemented instructions longer than 32 bits
    by having a six-bit pointer in an instruction to the rest of the
    instruction.

    Which, I suppose, argues against the view that pseudo-immediates are not
    part of the instruction, since that which definitely is part of the
    instruction can be pointed to in the same way.

    I stopped doing that because I felt it involved too much overhead.

    John Savard

  • From Thomas Koenig@21:1/5 to John Savard on Fri Aug 1 18:08:17 2025
    John Savard <quadibloc@invalid.invalid> schrieb:
    I couldn't locate a post I finally felt I was ready to respond to, which
    was in reply to one of my posts about Concertina II, which said that immediates ought to be properly considered part of the instruction.

    That was probably mine.

    Well, in nearly all computer architectures, immediates _are_ part of the instruction, and quite obviously so.

    But what Concertina II has are *pseudo* immediates. That is, they're not really immediates, but they pretend to be.

    What does this mean? What could this mean?

    Well, in my register-to-register operate instruction, associated with each _source_ register field, there's a bit which, if set, says that the five
    bits in the field aren't a register specifier, but a pointer to a constant.

    A constant that's addressed by an instruction isn't an immediate; it's a constant. So why do I even call these constants "pseudo-immediates" then?

    Well, that pointer - five bits long - is an awfully short pointer. Where
    does it point?

    Question: Do the pointers point to the same block only, or also
    to other blocks? With 5 bits, you could address others as well.
    Can you give an example of their use, including the block headers?
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.

  • From John Savard@21:1/5 to Thomas Koenig on Fri Aug 1 21:04:11 2025
    On Fri, 01 Aug 2025 18:08:17 +0000, Thomas Koenig wrote:

    Question: Do the pointers point to the same block only, or also to other blocks? With 5 bits, you could address others as well. Can you give an example of their use, including the block headers?

    Actually, no, 5 bits are only enough to point within the same block.
    That's because it's a byte pointer, as it can be used to point to any type
    of constant, including single byte constants.
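
    The arithmetic behind that: five bits give 32 distinct byte offsets, and
    a 256-bit block is exactly 32 bytes. A sketch of the lookup, assuming the
    offset is counted from the start of the block and that multi-byte
    constants are stored big-endian (both are assumptions of this sketch, not
    stated above):

```python
BLOCK_BYTES = 256 // 8  # 32 bytes, so offsets 0..31 fit in five bits


def read_pseudo_immediate(block: bytes, pointer: int, size: int) -> int:
    """Return the constant of `size` bytes at byte offset `pointer`."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("block must be exactly 32 bytes")
    if not 0 <= pointer < 2 ** 5:
        raise ValueError("pointer must fit in five bits")
    if size < 1 or pointer + size > BLOCK_BYTES:
        raise ValueError("constant would run past the end of the block")
    # Byte order is an assumption; the thread does not specify it.
    return int.from_bytes(block[pointer:pointer + size], "big")
```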

    This is despite the fact that I do have an instruction format for
    conventional style byte immediates (and I've just squeezed in one for
    16-bit immediates as well).

    However, they _can_ point to another block, by means of a sixth bit that
    some instructions have... but when this happens, it does not trigger an
    extra fetch from memory. Instead, the data is retrieved from a copy of an earlier block in the instruction stream that's saved in a special
    register... so as to reduce potential NOP-style problems.
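
    A sketch of how such a six-bit pointer might be resolved. It assumes the
    sixth bit is the high-order bit and that the saved copy is simply an
    earlier 32-byte block held in a register; the names resolve and
    saved_block are hypothetical.

```python
def resolve(current_block: bytes, saved_block: bytes, pointer6: int, size: int) -> int:
    """Fetch a constant through a 6-bit pointer with no extra memory fetch."""
    if not 0 <= pointer6 < 64:
        raise ValueError("pointer must fit in six bits")
    use_saved = bool(pointer6 & 0b100000)  # assumed: top bit selects the saved block
    offset = pointer6 & 0b011111           # low five bits: byte offset in that block
    source = saved_block if use_saved else current_block
    if len(source) != 32 or size < 1 or offset + size > 32:
        raise ValueError("constant does not fit in the 32-byte block")
    return int.from_bytes(source[offset:offset + size], "big")  # byte order assumed
```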

    John Savard

  • From John Savard@21:1/5 to Robert Finch on Sat Aug 2 03:22:41 2025
    On Fri, 01 Aug 2025 21:03:17 -0400, Robert Finch wrote:

    I tried something similar to this but without block headers and it
    worked okay. But there were a couple of issues. One was the last
    instruction in a cache line could not have an immediate. Or instructions
    had to stop before the end of the cache line to accommodate immediates.
    This resulted in some wasted space.

    This is interesting. I've tried to keep things simple by making everything explicit.

    Also, it made reading listings more difficult as constants were in the
    middle of sequences of instructions.

    I don't plan on structuring my assembly language that way. It might make reading _core dumps_ more difficult, but pseudo-immediate values would
    appear in the assembler source within the instruction just like
    conventional immediates.

    John Savard

  • From Lawrence D'Oliveiro@21:1/5 to John Savard on Sat Aug 2 03:21:56 2025
    On Fri, 1 Aug 2025 15:11:49 -0000 (UTC), John Savard wrote:

    Well, that pointer - five bits long - is an awfully short pointer. Where
    does it point?

    Instructions are fetched in blocks that are 256 bits long. One of the
    things this allows for is for the block to begin with a header that
    specifies that a certain number of 32-bit instruction slots at the end
    of the current block are to be skipped over in the sequence of
    instructions to be executed; this space can be used for constants.

    Just add a couple of modifier bits: one is the indirect bit, indicating
    that the location referenced contains the address of the value, not the
    value itself, and another “page zero” bit, which indicates that the location is not in the current block, but in another block at a fixed
    address ...

    ... and I start having PDP-8 flashbacks.

  • From Thomas Koenig@21:1/5 to John Savard on Sat Aug 2 09:12:17 2025
    John Savard <quadibloc@invalid.invalid> schrieb:
    On Fri, 01 Aug 2025 18:08:17 +0000, Thomas Koenig wrote:

    Question: Do the pointers point to the same block only, or also to other
    blocks? With 5 bits, you could address others as well. Can you give an
    example of their use, including the block headers?

    Actually, no, 5 bits are only enough to point within the same block.
    That's because it's a byte pointer, as it can be used to point to any type
    of constant, including single byte constants.

    This is despite the fact that I do have an instruction format for conventional style byte immediates (and I've just squeezed in one for
    16-bit immediates as well).

    Is there a reason for that? On the face of it, having both makes
    no sense.

    But even so: Having a single, let's say, 32-bit immediate would require
    a 32-bit header and a 32-bit constant, so 64 bits used instead of
    directly encoding a 32-bit constant.
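
    That accounting can be written out as a small calculation. The sketch
    assumes a 32-bit header, constants packed back to back, and space reserved
    in whole 32-bit slots; alignment rules are ignored, and slots_left is a
    hypothetical helper.

```python
import math

BLOCK_BITS = 256
SLOT_BITS = 32


def slots_left(constant_bits, header_bits=32):
    """Instruction slots remaining in one block after the header and constants."""
    reserved = math.ceil(sum(constant_bits) / SLOT_BITS)  # whole slots for constants
    used = header_bits // SLOT_BITS + reserved
    if used > BLOCK_BITS // SLOT_BITS:
        raise ValueError("header and constants do not fit in one block")
    return BLOCK_BITS // SLOT_BITS - used
```

    Under these assumptions a single 32-bit constant costs two of the eight
    slots (header plus constant), so slots_left([32]) leaves six for
    instructions.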

    However, they _can_ point to another block, by means of a sixth bit that
    some instructions have...

    Try writing an assembler and disassembler for what you have. I have
    written this for Mitch's ISA, and it turned out to be very difficult
    already. Your method, I would guess, would be much more difficult.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.

  • From John Savard@21:1/5 to Thomas Koenig on Sat Aug 2 18:57:43 2025
    On Sat, 02 Aug 2025 09:12:17 +0000, Thomas Koenig wrote:
    John Savard <quadibloc@invalid.invalid> schrieb:

    This is despite the fact that I do have an instruction format for
    conventional style byte immediates (and I've just squeezed in one for
    16-bit immediates as well).

    Is there a reason for that? On the face of it, having both makes no
    sense.

    The option of having a pseudo-immediate pointer instead of a register specification is baked into the format of the operate instructions.
    Removing it for some variable types would be messy.

    But even so: Having a single, let's say, 32-bit immediate would require a
    32-bit header and a 32-bit constant, so 64 bits used instead of directly
    encoding a 32-bit constant.

    And avoiding that for eight and sixteen bit constants is the reason for conventional immediates for them, despite the duplication. (Try fitting
    the other sizes of immediate into a 32-bit instruction.)

    But I'm sneaky. Since this situation dismayed me all along with
    Concertina II, I have what I call a "zero-overhead header". In the first instruction slot of a block, one may have a Type I header, which is a two-address operate instruction which *also* supplies a three-bit _decode_ field, reserving slots for pseudo-immediates.

    Since operate instructions are the most common type of instruction, if one
    can re-arrange instructions a little, one might be able to have these
    pseudo-immediates *without* the crushing burden of a 32-bit overhead!

    John Savard

  • From Thomas Koenig@21:1/5 to John Savard on Sat Aug 2 19:23:01 2025
    John Savard <quadibloc@invalid.invalid> schrieb:

    Since operate instructions are the most common type of instruction, if one
    can re-arrange instructions a little, one might be able to have these
    pseudo-immediates *without* the crushing burden of a 32-bit overhead!

    I read "one might" as "never will".

    You still haven't shown a single piece of code with your header
    scheme, I presume because it is too difficult even for you, the
    author of the ISA.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.

  • From John Savard@21:1/5 to Thomas Koenig on Sun Aug 3 05:30:34 2025
    On Sat, 02 Aug 2025 19:23:01 +0000, Thomas Koenig wrote:

    You still haven't shown a single piece of code with your header scheme,
    I presume because it is too difficult even for you, the author of the
    ISA.

    I can understand how you might feel that way, but if my block structure
    isn't understandable when illustrated by diagrams showing the basic
    essentials of how it works, I fail to realize how making the extra effort
    to smother that information in a mass of irrelevant detail is going to
    make it any clearer to you.

    John Savard

  • From Thomas Koenig@21:1/5 to John Savard on Sun Aug 3 11:25:51 2025
    John Savard <quadibloc@invalid.invalid> schrieb:
    On Sat, 02 Aug 2025 19:23:01 +0000, Thomas Koenig wrote:

    You still haven't shown a single piece of code with your header scheme,
    I presume because it is too difficult even for you, the author of the
    ISA.

    I can understand how you might feel that way, but if my block structure
    isn't understandable when illustrated by diagrams showing the basic essentials of how it works, I fail to realize how making the extra effort
    to smother that information in a mass of irrelevant detail is going to
    make it any clearer to you.

    It is not how something appears in a diagram, it is how an actual
    algorithm is transformed into efficient machine language (I would
    have said assembly language, but you put a massive barrier between
    the two with your block structure).

    You wrote, upthread, that you have never done so. My current
    assumption is that you chose not to do it because this would
    be too complicated for you, the inventor of this ISA, let alone
    anybody else.

    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.

  • From John Savard@21:1/5 to Stephen Fuld on Tue Aug 5 23:49:08 2025
    On Tue, 05 Aug 2025 09:51:11 -0700, Stephen Fuld wrote:

    While I agree that having at least push and pop instructions would be beneficial,

    And I have now added exactly that to the architecture - as I note in the
    new thread titled "By Popular Demand".

    But subroutine calls still don't use them.

    I've also added another requested feature while I was at it; allowing the
    use of a 64-bit displacement without a base register but with an index.

    John Savard

  • From Thomas Koenig@21:1/5 to Stephen Fuld on Wed Aug 6 05:32:41 2025
    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
    On 8/4/2025 9:56 PM, Thomas Koenig wrote:
    John Savard <quadibloc@invalid.invalid> schrieb:

    And... would you like to have a stack in your architecture?

    No.

    OK. I think that is the final nail in the coffin, I will
    henceforth stop reading (and writing) about your architecture.

    While I agree that having at least push and pop instructions would be beneficial, I hardly think that is the most "bizarre" and less than
    useful aspect of John's architecture. After all, both of those
    instructions can be accomplished by two "standard" instructions, a store
    and an add (for push) and a load and subtract (for pop). Interchange
    the add and the subtract if you want the stack to grow in the other direction.
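
    A minimal sketch of that decomposition on a toy machine, assuming an
    upward-growing stack whose pointer names the next free word (the Machine
    class is hypothetical):

```python
class Machine:
    """Toy word-addressed machine with a stack pointer and no PUSH/POP."""

    def __init__(self, words=16):
        self.mem = [0] * words
        self.sp = 0  # points at the next free word

    def push(self, value):
        self.mem[self.sp] = value  # store
        self.sp += 1               # add

    def pop(self):
        value = self.mem[self.sp - 1]  # load (displacement of -1)
        self.sp -= 1                   # subtract
        return value
```

    Swapping the add and the subtract, and adjusting the displacement to
    match, gives the downward-growing variant.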

    What I meant was that, the way he described his addressing modes,
    he was not considering a stack at all, even implemented by
    the usual RISC method (which is better than push/pop, see the
    special hoops that AMD64 has to jump through to fuse several
    push or pop instructions into one - IIRC, it costs them a cycle
    of pipeline length).

    And stacks _are_ extremely efficient, as everybody except one
    person knows, because they save memory and improve cache locality.

    Of course, you are free to stop contributing on this topic, but I, for
    one, will miss your contributions.

    Hm, thanks. Maybe I'll look into it again.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.

  • From John Savard@21:1/5 to BGB on Sun Aug 10 18:07:59 2025
    On Tue, 05 Aug 2025 18:23:36 -0500, BGB wrote:

    That said, a lot of John's other ideas come off to me like straight up absurdity. So, I wouldn't hold up much hope personally for it to turn
    into much usable.

    While I think that not being able to be put to use isn't really one of the faults of the Concertina II ISA, the block structure, especially at its
    current level of complexity, is going to come across as quite weird to
    many, and I don't yet see any hope of achieving a drastic simplification
    in that area.

    Each of the sixteen block types serves one or another functionality which
    I see as necessary to give this ISA the breadth of application that I have
    as my goal.

    But I have introduced "scaled displacements" back in, allowing the
    augmented short instruction mode instruction set to be more powerful.

    John Savard

  • From Stephen Fuld@21:1/5 to John Savard on Mon Aug 11 10:27:08 2025
    On 8/10/2025 11:07 AM, John Savard wrote:
    On Tue, 05 Aug 2025 18:23:36 -0500, BGB wrote:

    That said, a lot of John's other ideas come off to me like straight up
    absurdity. So, I wouldn't hold up much hope personally for it to turn
    into much usable.

    While I think that not being able to be put to use isn't really one of the faults of the Concertina II ISA,

    I am not sure what you are saying here. Is it that, while you agree that
    at least some features cannot be put to use, that isn't the fault of
    the ISA, or that the fault of not being able to be put to use doesn't
    exist in the ISA?


    the block structure, especially at its
    current level of complexity, is going to come across as quite weird to
    many, and I don't yet see any hope of achieving a drastic simplification
    in that area.

    Each of the sixteen block types serves one or another functionality which
    I see as necessary to give this ISA the breadth of application that I have
    as my goal.

    While I agree that they meet your goals (at least as I understand them),
    I think that you have two problems.

    Your goals, even if you meet them, aren't particularly useful, e.g. being
    "nearly" plug compatible with S/360.

    There are *far* simpler ways to accomplish what most people really want
    to do.


    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From John Savard@21:1/5 to Stephen Fuld on Mon Aug 11 18:20:05 2025
    On Mon, 11 Aug 2025 10:27:08 -0700, Stephen Fuld wrote:
    On 8/10/2025 11:07 AM, John Savard wrote:
    On Tue, 05 Aug 2025 18:23:36 -0500, BGB wrote:

    That said, a lot of John's other ideas come off to me like straight up
    absurdity. So, I wouldn't hold up much hope personally for it to turn
    into much usable.

    While I think that not being able to be put to use isn't really one of
    the faults of the Concertina II ISA,

    I am not sure what you are saying here. Is it that, while you agree that
    at least some features cannot be put to use, that isn't the fault of
    the ISA, or that the fault of not being able to be put to use doesn't
    exist in the ISA?

    What I was trying to say was that while the Concertina II ISA no doubt has
    many flaws, not being able to crank out useful work is, in my opinion, not
    one of them.

    On the other hand, driving insane those who attempt to program it or write compilers for it must be admitted to be an obstacle to making use of a
    given CPU, and so I must admit to its usability being limited in that
    manner.

    John Savard

  • From John Savard@21:1/5 to Stephen Fuld on Mon Aug 11 18:33:14 2025
    On Mon, 11 Aug 2025 10:27:08 -0700, Stephen Fuld wrote:

    Your goals, even if you meet them, aren't particularly useful, e.g. being
    "nearly" plug compatible with S/360.

    There are *far* simpler ways to accomplish what most people really want
    to do.

    Being plug-compatible with System/360 is not among the goals of my ISA.
    The term "plug-compatible" refers to... _plugs_, as one might guess.
    Nothing in my ISA talks about stuff like USB ports, Centronics parallel ports... or the kind of port IBM used to connect a 1403 printer to a
    System/360 computer.

    There are certainly far simpler ways to run System/360 code correctly.
    One can just set a mode bit to enter System/360 emulation, for example.

    What I'm doing with the Type V header is to provide a way to imitate the behavior of a System/360 program after code conversion. So one could write
    a special FORTRAN compiler to generate code using this header to allow a FORTRAN program running on the Concertina II to deliver the same results
    as on a System/360.

    And this isn't simple because it's buried deep down in the instruction set
    as an _afterthought_ within an ISA which is primarily designed to do the
    same sort of work as one might do with an x86-64 chip or a PowerPC chip or
    a SPARC chip even. And secondarily designed to be capable of
    implementations which shine at whatever the TMS20C6000 shines at, or even whatever, if anything, the Itanium was good for.

    It may not, however, be lost on implementors that a full implementation of
    the Type V header stuff ends up putting the needed circuitry on the die to *provide* a very nice System/360 emulation or implementation, which they
    might offer as an added feature not defined in the Concertina II
    specification.

    John Savard

  • From John Savard@21:1/5 to John Savard on Mon Aug 11 19:16:06 2025
    On Mon, 11 Aug 2025 18:33:14 +0000, John Savard wrote:

    implementations which shine at whatever the TMS20C6000 shines at, or

    Oops, the TMS320C6000.

    John Savard

  • From MitchAlsup@21:1/5 to All on Sun Aug 24 18:16:12 2025
    John Savard <quadibloc@invalid.invalid> posted:

    On Sun, 03 Aug 2025 13:03:21 -0700, Stephen Fuld wrote:

    I suspect that the purpose of Thomas's suggestion wasn't to make the
    design clearer to him, but to force you to discover/think about the
    utility and ease of use of some of the features you propose *in real
    programs*. If a typical programmer can't figure out how to use some
    CPU feature, it probably won't be used, and thus probably should not be
    in the architecture. The best way to learn about what features are
    useful is to try to use them! and the best way to do that is to write actual code for a real program.

    While I'm not prepared to go to the trouble of creating a fleshed-out example, a very short and trivial example will still indicate what my
    goals are.

    X = Y * 2.78 + Z

    Just playing devil's advocate:: My 66000

    LDD R8,[Y]
    LDD R6,[Z]
    FMAC R7,R8,#2.78D0,R6
    STD R7,[X]

    X, Y, and Z can be anywhere in 64-bit VAS ...
    On the other hand if X, Y, and Z were allocated into registers::

    FMAC Rx,Ry,#2.78D0,Rz

    On a typical RISC architecture, this would involve instructions like this:

    load 18, Y
    load 19, K#0001
    fmul 18, 18, 19
    load 19, Z
    fadd 18, 18, 19
    fsto 18, X

    Six instructions, each 32 bits long.

    On the IBM System/360, though, it would be something like

    le 12, Y
    me 12, K#0001
    ae 12, Z
    ste 12, X

    All four instructions are memory-reference instructions, so they're also
    32 bits long.

    How would I do this on Concertina II?

    Well, since the sequence has to start with a memory-reference, I can't use
    the zero-overhead header (Type I). Instead, a Type XI header is in order;
    that specifies a decode field, so that space can be reserved for a
    pseudo-immediate, and instruction slots can be indicated as containing
    instructions from the alternate instruction set.

    Then the instructions can be

    lf 6,y
    mfr 6,#2.78
    af 6,z
    stf 6,x

    with the instruction "af" coming from the alternate 32-bit instruction set.

    The other tricky precondition that must be met is to store z in a data
    region that is only 4,096 bytes or less in size, prefaced with

    USING *,23

    or another register from 17 to 23 could be used as the base register, so
    that it is addressed with a 12-bit displacement. (Also, register 6, from
    the first eight registers, is used to do the arithmetic to meet the limitations of the "add floating" memory to register operate instruction
    in the alternate instruction set.)

    Because it uses a pseudo-immediate, which gets fetched along with the instruction stream, where the 360 uses a constant, it has an advantage
    over the 360. On the other hand, while the actual code is the same length, there's also the 32-bit overhead of the header.
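
    The sizes being compared can be tallied directly. A small sketch, counting
    only the 32-bit instruction words and the header mentioned above; the
    constant itself is left out because its size is not stated:

```python
WORD_BITS = 32

# (instruction count, header bits) for the three sequences quoted above
sequences = {
    "typical RISC": (6, 0),
    "System/360": (4, 0),
    "Concertina II": (4, 32),
}


def code_bits(name):
    """Total bits of instruction words plus any block header."""
    count, header = sequences[name]
    return count * WORD_BITS + header
```

    This gives 192, 128 and 160 bits respectively.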

    John Savard



  • From MitchAlsup@21:1/5 to All on Sun Aug 24 19:50:44 2025
    BGB <cr88192@gmail.com> posted:

    On 8/5/2025 11:51 AM, Stephen Fuld wrote:
    On 8/4/2025 9:56 PM, Thomas Koenig wrote:
    John Savard <quadibloc@invalid.invalid> schrieb:

    And... would you like to have a stack in your architecture?

    No.

    OK.  I think that is the final nail in the coffin, I will
    henceforth stop reading (and writing) about your architecture.

    While I agree that having at least push and pop instructions would be beneficial, I hardly think that is the most "bizarre" and less than
    useful aspect of John's architecture.  After all, both of those instructions can be accomplished by two "standard" instructions, a store and an add (for push) and a load and subtract (for pop).  Interchange
    the add and the subtract if you want the stack to grow in the other direction.

    Of course, you are free to stop contributing on this topic, but I, for
    one, will miss your contributions.



    The lack of dedicated PUSH/POP instructions IME has relatively little
    direct impact on the usability of an ISA. Either way, one is likely to
    need stack-frame adjustment, in which case PUSH/POP don't tend to offer
    much over normal Load/Store instructions.

    When I looked at this at AMD circa 2000, I found many Pushes/Pops occurred
    in short sequences of 2-4; like:

    Push EAX
    Push EBP
    Push ECX

    a) we should note pushes are serially dependent on the decrement of SP
    b) and so are the memory references

    But we could change these into::

    ST EAX,[SP-8]
    ST EBP,[SP-16]
    ST ECX,[SP-24]
    SUB SP,SP,24

    a) now all the memory references are parallel
    b) there is only one alteration of SP
    c) all 4 instructions can start simultaneously
    So, latency goes from 3 to 1.
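
    The two sequences leave the machine in the same state, which a short
    simulation can confirm. The sketch models memory as a dictionary keyed by
    address and assumes 8-byte stack slots, matching the offsets above; it
    says nothing about timing.

```python
def three_pushes(mem, sp, values):
    """Serial form: each push first decrements SP, then stores at the new SP."""
    for value in values:
        sp -= 8
        mem[sp] = value
    return sp


def stores_then_sub(mem, sp, values):
    """Rewritten form: every store uses the original SP, then one SUB."""
    for i, value in enumerate(values, start=1):
        mem[sp - 8 * i] = value  # no store depends on another
    return sp - 8 * len(values)  # single alteration of SP
```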

    That said, a lot of John's other ideas come off to me like straight up absurdity. So, I wouldn't hold up much hope personally for it to turn
    into much usable.



  • From EricP@21:1/5 to MitchAlsup on Sun Aug 24 16:21:06 2025
    MitchAlsup wrote:
    BGB <cr88192@gmail.com> posted:
    The lack of dedicated PUSH/POP instructions IME has relatively little
    direct impact on the usability of an ISA. Either way, one is likely to
    need stack-frame adjustment, in which case PUSH/POP don't tend to offer
    much over normal Load/Store instructions.

    When I looked at this at AMD circa 2000, I found many Pushes/Pops occurred
    in short sequences of 2-4; like:

    Push EAX
    Push EBP
    Push ECX

    a) we should note pushes are serially dependent on the decrement of SP
    b) and so are the memory references

    But we could change these into::

    ST EAX,[SP-8]
    ST EBP,[SP-16]
    ST ECX,[SP-24]
    SUB SP,SP,24

    a) now all the memory references are parallel
    b) there is only one alteration of SP
    c) all 4 instructions can start simultaneously
    So, latency goes from 3 to 1.

    Except storing below the SP is not interrupt safe without
    something special like defining a safe "red zone" below it.
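
    A sketch of the hazard, assuming an interrupt that saves one 8-byte word
    on the same stack at whatever SP holds when it arrives (the take_interrupt
    model and its red_zone parameter are hypothetical):

```python
def take_interrupt(mem, sp, red_zone=0):
    """Model an interrupt that saves one word below SP, skipping any red zone."""
    frame = sp - red_zone - 8
    mem[frame] = "interrupt state"


def store_below_sp(red_zone):
    """Store EAX below SP, take an interrupt before SUB, report what survives."""
    mem, sp = {}, 1024
    mem[sp - 8] = "EAX"                # ST EAX,[SP-8]
    take_interrupt(mem, sp, red_zone)  # arrives before SUB SP,SP,24
    return mem[sp - 8]
```

    With no red zone the stored value is overwritten; with a red zone at
    least as large as the area written below SP, it survives.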

  • From MitchAlsup@21:1/5 to All on Fri Aug 29 15:31:32 2025
    Robert Finch <robfi680@gmail.com> posted:

    On 2025-08-01 5:04 p.m., John Savard wrote:
    On Fri, 01 Aug 2025 18:08:17 +0000, Thomas Koenig wrote:

    Question: Do the pointers point to the same block only, or also to other
    blocks? With 5 bits, you could address others as well. Can you give an
    example of their use, including the block headers?

    Actually, no, 5 bits are only enough to point within the same block.
    That's because it's a byte pointer, as it can be used to point to any type of constant, including single byte constants.

    This is despite the fact that I do have an instruction format for conventional style byte immediates (and I've just squeezed in one for 16-bit immediates as well).

    However, they _can_ point to another block, by means of a sixth bit that some instructions have... but when this happens, it does not trigger an extra fetch from memory. Instead, the data is retrieved from a copy of an earlier block in the instruction stream that's saved in a special register... so as to reduce potential NOP-style problems.

    John Savard

    I tried something similar to this but without block headers and it
    worked okay. But there were a couple of issues. One was the last
    instruction in a cache line could not have an immediate. Or instructions
    had to stop before the end of the cache line to accommodate immediates.
    This resulted in some wasted space. There would sometimes be a 32-bit
    hole between the last instruction and the first immediate. I used a
    four-bit index and 32-bit immediates, the same size as an instruction
    word. Four bits was enough for a 512-bit cache line. IIRC the wasted
    space was
    about 5%.

    We really don't want to waste space.

    It made the assembler more complex. I had immediates being positioned
    from the far end of the cache line down (like a stack) towards the instructions which began at the lower end. The assembler had to be able
    to keep track of where things were on the cache line and the assembler
    was not built to handle that.
    Also, it made reading listings more difficult as constants were in the
    middle of sequences of instructions.
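
    A sketch of that packing scheme as described: a 512-bit line of sixteen
    32-bit words, instructions filling from word 0 upward and immediates from
    word 15 downward, each immediate named by a four-bit word index. The
    LinePacker class and its tuple encoding are hypothetical.

```python
LINE_WORDS = 512 // 32  # sixteen words, so a four-bit index reaches all of them


class LinePacker:
    """Place instructions from the low end and immediates from the high end."""

    def __init__(self):
        self.words = [None] * LINE_WORDS
        self.next_insn = 0
        self.next_imm = LINE_WORDS - 1

    def free(self):
        """Number of unused words between the two growing regions."""
        return self.next_imm - self.next_insn + 1

    def add(self, opcode, immediate=None):
        """Try to place one instruction; return False if it does not fit."""
        needed = 1 if immediate is None else 2
        if self.free() < needed:
            return False
        index = None
        if immediate is not None:
            index = self.next_imm  # four-bit index of the immediate word
            self.words[index] = ("imm", immediate)
            self.next_imm -= 1
        self.words[self.next_insn] = ("insn", opcode, index)
        self.next_insn += 1
        return True
```

    When a single word remains free, an instruction that needs an immediate
    no longer fits, which is the kind of hole mentioned above.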

    We really don't want to make it any harder to read ASM code.

    Sometimes constants could be shared, but this turned out to be not
    possible in many cases as the assembler needed to emit relocation
    records for some constants and it could not handle having two or more instructions pointing to the same constant.

    All the more reason to place the constant in the instruction stream.
    a) never wastes space*
    b) ASM readability

    (*) never wastes space refers to placement of constant, not that the constant-container is optimal for the placed constant.

  • From MitchAlsup@21:1/5 to All on Fri Aug 29 19:35:15 2025
    Lawrence D'Oliveiro <ldo@nz.invalid> posted:

    On Fri, 1 Aug 2025 15:11:49 -0000 (UTC), John Savard wrote:

    Well, that pointer - five bits long - is an awfully short pointer. Where does it point?

    Instructions are fetched in blocks that are 256 bits long. One of the things this allows for is for the block to begin with a header that specifies that a certain number of 32-bit instruction slots at the end
    of the current block are to be skipped over in the sequence of
    instructions to be executed; this space can be used for constants.

    Just add a couple of modifier bits: one is the indirect bit, indicating
    that the location referenced contains the address of the value, not the
    value itself, and another “page zero” bit, which indicates that the location is not in the current block, but in another block at a fixed
    address ...

    What is the purported advantage of using a header instead of just having
    each instruction define its own length ?? and contain its own constants?

    ... and I start having PDP-8 flashbacks.

    As well you should.
