• Re: Chipsandcheese article on the CDC6600

    From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Sat Jul 20 09:08:47 2024
    On Sat, 20 Jul 2024 08:35:20 -0000 (UTC), Thomas Koenig wrote:

    https://chipsandcheese.com/2024/04/01/inside-control-data-corporations-cdc-6600/

    Note the date. ;)

    It is true, though, that old Seymour Cray had little truck with fancy “operating systems” and “memory management” and “paging” and “protected mode” and all that jazz. He was just after speed, speed,
    speed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to All on Sat Jul 20 08:35:20 2024
    Not sure who follows this web site; it has quite a few interesting
    articles, usually on newer processor designs. Here's one that's
    maybe not quite so serious, but amusing and informative at the
    same time:

    https://chipsandcheese.com/2024/04/01/inside-control-data-corporations-cdc-6600/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Thomas Koenig on Sat Jul 20 19:52:13 2024
    Thomas Koenig wrote:
    Not sure who follows this web site; it has quite a few interesting
    articles, usually on newer processor designs. Here's one that's
    maybe not quite so serious, but amusing and informative at the
    same time:

    https://chipsandcheese.com/2024/04/01/inside-control-data-corporations-cdc-6600/

    I really like their "tongue-in-cheek" delivery. :-)

    "Frontend: Branch Prediction
    There is no branch prediction."

    Terje

    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to All on Sat Jul 20 22:18:42 2024
    Errors in the text::


    a) The use of the word non-blocking in the sense that increment does
    not block MUL is weird; we use the term nowadays within an FU, not
    between FUs.

    b) CDC 6600 is NOT in-order; the whole motivation for the scoreboard
    is to enable Out-of-Order execution.

    c) At its introduction, the word cache had not yet been invented--so
    sure, it had no cache, and thereby had to have a powerful central
    memory (banked and interleaved).

    d) It did have a branch predictor--it predicted backwards branches were
    taken.

    e) The strange notion of self-modifying code needing to be 32
    instructions in advance of the instruction being modified is also
    weird.

    f) It is not In Order; the scoreboard tracks RAW and WAR hazards and
    prevents WAW from passing out of the decoder. Then execution order
    is based on the availability of operands (typical) and the
    availability of a result bus (novel); see the sketch below.

    g) Oh, and BTW, there were 20 ports to the register files (plural).

    h) The 2 increment units were 2 sets of operand registers but 1 ALU,
    and the ALU could not do back-to-back integer ADDs.

    Other than that, not bad.
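
    For concreteness, a minimal C sketch of the issue rules described in
    (b) and (f): stall in decode on WAW, then let each functional unit
    wait for its operands (RAW) and for a result bus. Register and unit
    counts are illustrative, not the 6600's, and this follows the
    description above rather than Thornton's actual design:

        #include <stdbool.h>

        #define NREGS 8   /* illustrative; the 6600 had 24 registers */
        #define NFUS  4   /* illustrative; the 6600 had 10 units     */

        typedef struct {
            bool busy;                    /* FU owns an instruction   */
            int  dest, src1, src2;        /* registers written/read   */
            bool src1_ready, src2_ready;  /* RAW: operands available? */
        } FU;

        static FU  fu[NFUS];
        static int pending_writer[NREGS]; /* FU that will write, or -1 */

        void sb_init(void)
        {
            for (int r = 0; r < NREGS; r++)
                pending_writer[r] = -1;
        }

        /* Decode stalls if the unit is busy or on a WAW hazard:
           some unit already holds a reservation on dest. */
        bool can_issue(int unit, int dest)
        {
            return !fu[unit].busy && pending_writer[dest] == -1;
        }

        void issue(int unit, int dest, int src1, int src2)
        {
            fu[unit] = (FU){ .busy = true,
                             .dest = dest, .src1 = src1, .src2 = src2,
                             /* RAW: ready only if nobody still owes it */
                             .src1_ready = pending_writer[src1] == -1,
                             .src2_ready = pending_writer[src2] == -1 };
            pending_writer[dest] = unit;
        }

        /* Result-bus write-back: release the reservation and wake RAW
           waiters. The real scoreboard also holds a unit off the bus
           while another unit still needs to read the OLD value of dest
           (the WAR check), omitted here for brevity. */
        void writeback(int unit)
        {
            int d = fu[unit].dest;
            for (int u = 0; u < NFUS; u++)
                if (fu[u].busy) {
                    if (fu[u].src1 == d) fu[u].src1_ready = true;
                    if (fu[u].src2 == d) fu[u].src2_ready = true;
                }
            pending_writer[d] = -1;
            fu[unit].busy = false;
        }

    Execution order then falls out of availability: a unit fires once
    both operand-ready bits are set and it can win a result bus.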

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Terje Mathisen on Sun Jul 21 01:06:02 2024
    On Sat, 20 Jul 2024 19:52:13 +0200, Terje Mathisen wrote:

    "Frontend: Branch Prediction There is no branch prediction."

    Is that like an American car? Great on the straights, not so good on
    cornering.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Savard@21:1/5 to ldo@nz.invalid on Sun Jul 21 07:35:55 2024
    On Sat, 20 Jul 2024 09:08:47 -0000 (UTC), Lawrence D'Oliveiro
    <ldo@nz.invalid> wrote:

    On Sat, 20 Jul 2024 08:35:20 -0000 (UTC), Thomas Koenig wrote:

    https://chipsandcheese.com/2024/04/01/inside-control-data-corporations-cdc-6600/

    Note the date. ;)

    That bit at the end, about how the world would need maybe only ten
    computers, because the 6600 is so powerful, and out-of-order machines
    running at 5 GHz would never be needed, of course, is the funny
    part...

    John Savard

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Savard on Sun Jul 21 22:01:59 2024
    On Sun, 21 Jul 2024 07:35:55 -0600, John Savard wrote:

    That bit at the end, about how the world would need maybe only ten computers, because the 6600 is so powerful, and out-of-order machines
    running at 5 GHz would never be needed, of course, is the funny part...

    Also a dig at IBM, from whose (early) boss (Watson Sr?) the quote about
    the world only needing ten computers is supposed to have originated.

    The 6600 (and its siblings) also popularized the term “supercomputer”.
    When people at the time thought “computer”, they thought “IBM”. But here
    was a machine that was so far ahead in performance, it left IBM (and
    everybody else) in the dust.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Sun Jul 21 22:21:41 2024
    On Sun, 21 Jul 2024 22:01:59 +0000, Lawrence D'Oliveiro wrote:

    On Sun, 21 Jul 2024 07:35:55 -0600, John Savard wrote:

    That bit at the end, about how the world would need maybe only ten
    computers, because the 6600 is so powerful, and out-of-order machines
    running at 5 GHz would never be needed, of course, is the funny part...

    Also a dig at IBM, from whose (early) boss (Watson Sr?) the quote about
    the world only needing ten computers is supposed to have originated.

    The 6600 (and its siblings) also popularized the term “supercomputer”. When people at the time thought “computer”, they thought “IBM”. But here
    was a machine that was so far ahead in performance, it left IBM (and everybody else) in the dust.

    IBM CEO said something to the effect:: How can a team of 24 people,
    including the janitor, beat IBM ??

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Mon Jul 22 00:05:40 2024
    On Sun, 21 Jul 2024 22:21:41 +0000, MitchAlsup1 wrote:

    IBM CEO said something to the effect:: How can a team of 24 people,
    including the janitor, beat IBM ??

    From Watson’s memo:

    Last week Control Data had a press conference during which they
    officially announced their 6600 system. I understand that in the
    laboratory developing this system there are only 34 people,
    including the janitor. Of these, 14 are engineers and 4 are
    programmers, and only one person has a Ph.D., a relatively junior
    programmer. Contrasting this modest effort with our own vast
    development activities, I fail to understand why we have lost our
    industry leadership position by letting someone else offer the
    world’s most powerful computer.

    as quoted in Charles J Murray’s 1997 book “The Supermen”, page 93.
    Good description of the history of CDC and Cray Research/Cray
    Computer. Notwithstanding the plural in its title, it primarily
    focuses on Seymour Cray, and is not so complimentary about the work of
    some others, notably Steve Chen.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Niklas Holsti@21:1/5 to Lawrence D'Oliveiro on Mon Jul 22 10:43:44 2024
    On 2024-07-22 3:05, Lawrence D'Oliveiro wrote:
    On Sun, 21 Jul 2024 22:21:41 +0000, MitchAlsup1 wrote:

    IBM CEO said something to the effect:: How can a team of 24 people,
    including the janitor, beat IBM ??

    From Watson’s memo:

    Last week Control Data had a press conference during which they
    officially announced their 6600 system. I understand that in the
    laboratory developing this system there are only 34 people,
    including the janitor. Of these, 14 are engineers and 4 are
    programmers, and only one person has a Ph.D., a relatively junior
    programmer. Contrasting this modest effort with our own vast
    development activities, I fail to understand why we have lost our
    industry leadership position by letting someone else offer the
    world’s most powerful computer.


    Per Wikipedia, Cray's reply was sardonic: "It seems like Mr. Watson has answered his own question."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Niklas Holsti on Mon Jul 22 13:08:27 2024
    On Mon, 22 Jul 2024 10:43:44 +0300
    Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:

    On 2024-07-22 3:05, Lawrence D'Oliveiro wrote:
    On Sun, 21 Jul 2024 22:21:41 +0000, MitchAlsup1 wrote:

    IBM CEO said something to the effect:: How can a team of 24 people,
    including the janitor, beat IBM ??

    From Watson’s memo:

    Last week Control Data had a press conference during which they
    officially announced their 6600 system. I understand that in
    the laboratory developing this system there are only 34 people,
    including the janitor. Of these, 14 are engineers and 4 are
    programmers, and only one person has a Ph.D., a relatively
    junior programmer. Contrasting this modest effort with our own vast
    development activities, I fail to understand why we have lost
    our industry leadership position by letting someone else offer the
    world’s most powerful computer.


    Per Wikipedia, Cray's reply was sardonic: "It seems like Mr. Watson
    has answered his own question."




    In the end, the influence of the 6600 on computers we use today is
    close to zero. On the other hand, the influence of the S/360 Model 85
    is massive, and the influence of the S/360 Model 91 is significant,
    although far less than the credit it is often given in popular
    articles.
    Back in their time the 6600 was a huge success, and both the Model 85
    and the Model 91 were probably considered failures.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Eder@21:1/5 to Michael S on Mon Jul 22 13:46:20 2024
    On Mo 22 Jul 2024 at 13:08, Michael S <already5chosen@yahoo.com> wrote:

    On Mon, 22 Jul 2024 10:43:44 +0300
    Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:

    On 2024-07-22 3:05, Lawrence D'Oliveiro wrote:
    On Sun, 21 Jul 2024 22:21:41 +0000, MitchAlsup1 wrote:

    IBM CEO said something to the effect:: How can a team of 24 people,
    including the janitor, beat IBM ??

    From Watson’s memo:

    Last week Control Data had a press conference during which they
    officially announced their 6600 system. I understand that in
    the laboratory developing this system there are only 34 people,
    including the janitor. Of these, 14 are engineers and 4 are
    programmers, and only one person has a Ph.D., a relatively
    junior programmer. Contrasting this modest effort with our own vast
    development activities, I fail to understand why we have lost
    our industry leadership position by letting someone else offer the
    world’s most powerful computer.


    Per Wikipedia, Cray's reply was sardonic: "It seems like Mr. Watson
    has answered his own question."




    In the end, the influence of the 6600 on computers we use today is
    close to zero. On the other hand, the influence of the S/360 Model 85
    is massive, and the influence of the S/360 Model 91 is significant,
    although far less than the credit it is often given in popular
    articles.
    Back in their time the 6600 was a huge success, and both the Model 85
    and the Model 91 were probably considered failures.

    And what does that tell us about computer architecture? Not much.
    But it says something about market dominance and power.

    'Andreas

    --
    ceterum censeo redmondinem esse delendam

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Michael S on Mon Jul 22 13:41:40 2024
    On Mon, 22 Jul 2024 10:08:27 +0000, Michael S wrote:

    On Mon, 22 Jul 2024 10:43:44 +0300
    Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:

    On 2024-07-22 3:05, Lawrence D'Oliveiro wrote:
    On Sun, 21 Jul 2024 22:21:41 +0000, MitchAlsup1 wrote:

    IBM CEO said something to the effect:: How can a team of 24 people,
    including the janitor, beat IBM ??

    From Watson’s memo:

    Last week Control Data had a press conference during which they
    officially announced their 6600 system. I understand that in
    the laboratory developing this system there are only 34 people,
    including the janitor. Of these, 14 are engineers and 4 are
    programmers, and only one person has a Ph.D., a relatively
    junior programmer. Contrasting this modest effort with our own vast
    development activities, I fail to understand why we have lost
    our industry leadership position by letting someone else offer the
    world’s most powerful computer.


    Per Wikipedia, Cray's reply was sardonic: "It seems like Mr. Watson
    has answered his own question."




    In the end, the influence of the 6600 on computers we use today is
    close to zero.

    CDC 6600 had a RISC instruction set !!

    On the other hand, the influence of the S/360 Model 85 is massive,
    and the influence of the S/360 Model 91 is significant, although far
    less than the credit it is often given in popular articles.
    Back in their time the 6600 was a huge success, and both the Model 85
    and the Model 91 were probably considered failures.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Michael S on Mon Jul 22 12:52:35 2024
    Michael S <already5chosen@yahoo.com> writes:
    In the end, the influence of the 6600 on computers we use today is
    close to zero. On the other hand, the influence of the S/360 Model 85
    is massive, and the influence of the S/360 Model 91 is significant,
    although far less than the credit it is often given in popular
    articles.

    Yes, all modern computers have virtual memory (which started with the
    Atlas, and later the S/360 Model 67), they have caches (which started
    with the Titan, and later the S/360 Model 85), and they have
    reservation stations (which started with the S/360 Model 91).

    However, the main reason why reservation stations won is because
    hardware branch prediction outpaced compiler branch prediction since
    the early 1990s*, and because the reorder buffer was invented, neither
    of which is due to anything done in any S/360 model or the CDC 6600.

    If hardware branch prediction had never been invented or had turned
    out to be a dud, maybe we would all be using EPIC architectures that
    use scoreboards rather than reservation stations; or maybe the
    register interlocks that were used in advanced in-order RISCs (those
    that Mitch Alsup calls OoO) and AFAIK in IA-64 implementations were
    good enough and one would have done without a scoreboard.

    [*] More supercomputing-oriented people may claim that it has to do
    with the number of in-flight memory accesses, but actually IA-64 shone
    on SPEC FP (where in-flight memory accesses are more important than
    for SPECint), so it seems that there are ways to get the needed
    in-flight memory accesses with in-order execution.

    Back in their time the 6600 was a huge success, and both the Model 85
    and the Model 91 were probably considered failures.

    From what I read, the Model 91 was considered a technical (and
    marketing) success, but commercially a failure (sold at a loss, and
    therefore quickly canceled). But apparently the market benefit was
    enough that they then built the 360/195 and 370/195. 15 91s were
    built and about 20 195s. The 195 was withdrawn in 1977, and AFAIK
    that was the end of IBM's supercomputing ambitions for a while. This
    may have had to do with the introduction of the Cray-1 in 1976 or the
    IBM 3033 in 1977. IBM eventually announced the optional vector
    facility for the 3090 in 1985. OoO processing vanished from S/360
    successors with the 195 and only reappeared quite a while after it had
    appeared in Intel and RISC CPUs.

    The Model 85 was built only 30 times, but it was the fastest IBM
    machine until the Model 195 (the CDC 7600 was faster, though, and also
    a bit faster than the 195). And the cache was then included in the
    Model 195, and on the 303x, and I expect all later IBM mainframes. So
    that certainly was a success.

    Concerning the CDC 6600, the barrel-processor features of its PPs can
    be considered predecessors of modern SMT (about as close as the Model
    91's reservation stations are to modern OoO).
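
    The PP "barrel" is simple enough to sketch in a few lines of C: ten
    contexts rotate through one shared execution pipe, one step per
    minor cycle. The context fields here are invented for illustration:

        #define NPP 10  /* ten PPs shared one execution unit */

        typedef struct {
            unsigned pc;   /* per-PP state; fields are illustrative */
            unsigned acc;
        } PPContext;

        static PPContext pp[NPP];

        /* Placeholder for one fetch/execute step of a PP program. */
        static void step(PPContext *c) { c->pc++; }

        /* One minor cycle: the current slot advances one step, then
           the barrel rotates to the next PP -- fixed round-robin
           multithreading, which is why it reads as a precursor of SMT. */
        void barrel_cycle(void)
        {
            static int slot = 0;
            step(&pp[slot]);
            slot = (slot + 1) % NPP;
        }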

    Concerning the Atlas, it's interesting that a project intended as a supercomputer introduced virtual memory, while Cray rejected VM for as
    long as he designed the CPUs.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to mitchalsup@aol.com on Mon Jul 22 17:51:21 2024
    On Mon, 22 Jul 2024 13:41:40 +0000
    mitchalsup@aol.com (MitchAlsup1) wrote:

    On Mon, 22 Jul 2024 10:08:27 +0000, Michael S wrote:

    On Mon, 22 Jul 2024 10:43:44 +0300
    Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:

    On 2024-07-22 3:05, Lawrence D'Oliveiro wrote:
    On Sun, 21 Jul 2024 22:21:41 +0000, MitchAlsup1 wrote:

    IBM CEO said something to the effect:: How can a team of 24
    people, including the janitor, beat IBM ??

    From Watson’s memo:

    Last week Control Data had a press conference during which
    they officially announced their 6600 system. I understand that in
    the laboratory developing this system there are only 34 people,
    including the janitor. Of these, 14 are engineers and 4 are
    programmers, and only one person has a Ph.D., a relatively
    junior programmer. Contrasting this modest effort with our own
    vast development activities, I fail to understand why we have lost
    our industry leadership position by letting someone else offer the
    world’s most powerful computer.


    Per Wikipedia, Cray's reply was sardonic: "It seems like Mr. Watson
    has answered his own question."




    In the end, the influence of the 6600 on computers we use today is
    close to zero.

    CDC 6600 had a RISC instruction set !!


    It has RISC-like features, most importantly a load-store architecture.
    Simple instruction formats and only two instruction widths also sound
    RISCy. But the hard coupling between A registers and X registers does
    not look like RISC to me.
    Maybe I didn't think enough about it.

    Anyway, it seems to me that modern RISC was re-invented more or less
    from scratch in the late 70s, and the influence of the 6600
    architecture likely didn't play a major role in that process.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Anton Ertl on Mon Jul 22 15:09:22 2024
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    Michael S <already5chosen@yahoo.com> writes:
    In the end, the influence of the 6600 on computers we use today is
    close to zero. On the other hand, the influence of the S/360 Model 85
    is massive, and the influence of the S/360 Model 91 is significant,
    although far less than the credit it is often given in popular
    articles.

    Yes, all modern computers have virtual memory (which started with the
    Atlas, and later the S/360 Model 67), they have caches (which started
    with the Titan, and later the S/360 Model 85), and they have
    reservation stations (which started with the S/360 Model 91).

    However, the main reason why reservation stations won is because
    hardware branch prediction outpaced compiler branch prediction since
    the early 1990s*, and because the reorder buffer was invented, neither
    of which is due to anything done in any S/360 model or the CDC 6600.

    FWIW, the Burroughs medium systems had hardware branch
    prediction circa 1979. It had some warts, however,
    when used in SMP configurations.

    Basically, the hardware would modify the branch opcode in
    memory after every branch to track the last two taken/not-taken
    states.
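
    A sketch of that mechanism in C, with the state kept in the opcode
    byte itself. The bit positions and the prediction rule are
    assumptions for illustration; the actual Burroughs encoding isn't
    given here:

        #include <stdbool.h>
        #include <stdint.h>

        #define HIST_MASK 0x03u  /* assumed: low 2 opcode bits = history */

        /* Predict from the stored history; this rule (follow the most
           recent outcome) is an assumption. */
        bool predict(uint8_t opcode)
        {
            return (opcode & 0x01u) != 0;
        }

        /* After resolution, rewrite the instruction in memory, shifting
           in the new outcome. On an SMP sharing code, processors stomp
           on each other's history -- hence the warts. */
        void update(uint8_t *opcode_in_memory, bool taken)
        {
            uint8_t hist = (uint8_t)(((*opcode_in_memory & HIST_MASK) << 1)
                                     | (taken ? 1u : 0u));
            *opcode_in_memory = (uint8_t)((*opcode_in_memory & ~HIST_MASK)
                                          | (hist & HIST_MASK));
        }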

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Anton Ertl on Mon Jul 22 15:10:50 2024
    On Mon, 22 Jul 2024 12:52:35 +0000, Anton Ertl wrote:

    Michael S <already5chosen@yahoo.com> writes:
    In the end, the influence of the 6600 on computers we use today is
    close to zero. On the other hand, the influence of the S/360 Model 85
    is massive, and the influence of the S/360 Model 91 is significant,
    although far less than the credit it is often given in popular
    articles.

    Yes, all modern computers have virtual memory (which started with the
    Atlas, and later the S/360 Model 67), they have caches (which started
    with the Titan, and later the S/360 Model 85), and they have
    reservation stations (which started with the S/360 Model 91).

    However, the main reason why reservation stations won is because
    hardware branch prediction outpaced compiler branch prediction since
    the early 1990s*, and because the reorder buffer was invented, neither
    of which is due to anything done in any S/360 model or the CDC 6600.

    If hardware branch prediction had never been invented or had turned
    out to be a dud, maybe we would all be using EPIC architectures that
    use scoreboards rather than reservation stations; or maybe the

    The CDC 7600 predicted backward branches to be taken, and this was
    worth a handful of % in performance gain:: and it used no storage to
    do it. So an "as dumb as possible" predictor delivered gains.

    It is all uphill from there.
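
    The whole 7600 "predictor", as a C one-liner (addresses are
    illustrative 32-bit values):

        #include <stdbool.h>
        #include <stdint.h>

        /* A branch to a lower address is assumed to be a loop back-edge
           and predicted taken; no tables, no history, no storage. */
        bool predict_taken(uint32_t branch_pc, uint32_t target_pc)
        {
            return target_pc < branch_pc;
        }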

    register interlocks that were used in advanced in-order RISCs (those
    that Mitch Alsup calls OoO) and AFAIK in IA-64 implementations were
    good enough and one would have done without a scoreboard.

    Register interlocks are the means to allow GHW to move instructions
    around in the pipeline--you just have to obey RAW, WAR, and WAW
    hazards.

    [*] More supercomputing-oriented people may claim that it has to do
    with the number of in-flight memory accesses, but actually IA-64 shone
    on SPEC FP (where in-flight memory accesses are more important than
    for SPECint), so it seems that there are ways to get the needed
    in-flight memory accesses with in-order execution.

    IA-64 had 2× the number of pins compared to its x86 brethren.
    No wonder it could consume more BW.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Scott Lurndal on Mon Jul 22 15:42:18 2024
    Scott Lurndal <scott@slp53.sl.home> schrieb:

    Basically, the hardware would modify the branch opcode in
    memory after every branch to track the last two taken/not-taken
    states.

    Ingenious.

    Talk about self-modifying code...


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Scott Lurndal on Mon Jul 22 16:33:28 2024
    scott@slp53.sl.home (Scott Lurndal) writes:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    However, the main reason why reservation stations won is because
    hardware branch prediction outpaced compiler branch prediction since
    the early 1990s*, and because the reorder buffer was invented, neither
    of which is due to anything done in any S/360 model or the CDC 6600.

    FWIW, the Burroughs medium systems had hardware branch
    prediction circa 1979. It had some warts, however,
    when used in SMP configurations.

    Basically, the hardware would modify the branch opcode in
    memory after every branch to track the last two taken/not-taken
    states.

    That sounds like the two-bit scheme that early conditional branch
    predictors used, but inlined in the instructions and thus
    architecturally visible (whereas branch prediction is normally microarchitecture, i.e., architecturally invisible). These two-bit
    schemes were about as good as the better compiler-based schemes (IIRC
    10% mispredictions on typical integer code). It was branch prediction
    schemes with global history tables etc. that made hardware branch
    prediction much more accurate and meant that OoO execution
    outperformed EPIC.
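
    A minimal sketch of the global-history idea, gshare-style: the same
    two-bit counters, but indexed by the branch address XORed with a
    shift register of recent outcomes, so differently-correlated
    executions of a branch land on different counters. Table size and
    hash are illustrative:

        #include <stdbool.h>
        #include <stdint.h>

        #define TABLE_BITS 12
        #define TABLE_SIZE (1u << TABLE_BITS)

        static uint8_t  ctr[TABLE_SIZE];  /* 2-bit counters, 0..3   */
        static uint32_t ghr;              /* global history register */

        static uint32_t idx(uint32_t pc)
        {
            return (pc ^ ghr) & (TABLE_SIZE - 1);  /* fold in history */
        }

        bool predict(uint32_t pc)
        {
            return ctr[idx(pc)] >= 2;  /* 2 or 3 => predict taken */
        }

        void update(uint32_t pc, bool taken)
        {
            uint8_t *c = &ctr[idx(pc)];
            if (taken)  { if (*c < 3) (*c)++; }
            else        { if (*c > 0) (*c)--; }
            ghr = (ghr << 1) | (taken ? 1u : 0u);  /* append outcome */
        }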

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to mitchalsup@aol.com on Mon Jul 22 19:31:24 2024
    On Mon, 22 Jul 2024 15:10:50 +0000
    mitchalsup@aol.com (MitchAlsup1) wrote:


    IA-64 had 2× the number of pins compared to its x86 brethren.
    No wonder it could consume more BW.

    Madison Itanium with a 400 MHz FSB had the same theoretical memory
    bandwidth (6.4 GB/s) as AMD S939/S940 K8 Opteron/Athlon64 processors,
    and somewhat lower practical bandwidth because of the higher latency
    caused by the off-chip memory controller.
    Despite that, its SPECfp2000 scores were slightly (5-6%) higher.

    Contemporary Intel x86 processors (Nocona Xeon) had a data bus half
    as wide, but running at twice the data rate. So, at least
    theoretically, peak bandwidth was the same. Practically, bandwidth
    was somewhat lower because the source-synchronous bus of the Xeon had
    higher latency than the simpler clock-synchronous bus of the
    Itanium 2.
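
    The arithmetic behind "half as wide at twice the rate", using the
    commonly quoted bus parameters (assumptions, not stated above):
    Itanium 2 "Madison" with a 128-bit FSB at 400 MT/s, Nocona with a
    64-bit FSB at 800 MT/s:

        #include <stdio.h>

        int main(void)
        {
            /* peak bandwidth = transfers/s x bytes/transfer */
            double madison = 400e6 * 16;  /* 128-bit bus = 16 bytes */
            double nocona  = 800e6 *  8;  /*  64-bit bus =  8 bytes */
            printf("Madison: %.1f GB/s, Nocona: %.1f GB/s\n",
                   madison / 1e9, nocona / 1e9);  /* both 6.4 GB/s */
            return 0;
        }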

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to mitchalsup@aol.com on Mon Jul 22 16:40:48 2024
    mitchalsup@aol.com (MitchAlsup1) writes:
    If hardware branch prediction had never been invented or had turned
    out to be a dud, maybe we would all be using EPIC architectures that
    use scoreboards rather than reservation stations; or maybe the

    The CDC 7600 predicted backward branches to be taken

    That's a primitive form of compiler branch prediction. More advanced
    schemes had a direction hint in the instruction. These schemes are
    not hardware branch prediction as far as "compiler vs. hardware branch prediction" is concerned.

    register interlocks that were used in advanced in-order RISCs (those
    that Mitch Alsup calls OoO) and AFAIK in IA-64 implementations were
    good enough and one would have done without a scoreboard.

    Register interlocks are the means to allow GHW to move instructions
    around in the pipeline--you just have to obey RAW, WAR, and WAW
    hazards.

    What is GHW? Stanford MIPS and most of the MIPS R2000/R3000 moved
    instructions in the pipeline without interlocks. It's in their name:
    Microprocessor without Interlocked Pipeline Stages.

    [*] More supercomputing-oriented people may claim that it has to do
    with the number of in-flight memory accesses, but actually IA-64 shone
    on SPEC FP (where in-flight memory accesses are more important than
    for SPECint), so it seems that there are ways to get the needed
    in-flight memory accesses with in-order execution.

    IA-64 had 2× the number of pins compared to its x86 brethren.
    No wonder it could consume more BW.

    It did not help it a bit with integer code. If the myth were true
    that only OoO enables many in-flight memory accesses, it would not
    help with bandwidth-hungry code, either. The fact that IA-64
    implementations could make use of the bandwidth busts that myth.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Anton Ertl on Mon Jul 22 18:23:02 2024
    On Mon, 22 Jul 2024 16:40:48 +0000, Anton Ertl wrote:

    What is GHW?

    HW with a G that should not have been there.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to Anton Ertl on Mon Jul 22 13:09:12 2024
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    From what I read, the Model 91 was considered a technical (and
    marketing) success, but commercially a failure (sold at a loss, and
    therefore quickly canceled). But apparently the market benefit was
    enough that they then built the 360/195 and 370/195. 15 91s were
    built and about 20 195s. The 195 was withdrawn in 1977, and AFAIK
    that was the end of IBM's supercomputing ambitions for a while. This
    may have had to do with the introduction of the Cray-1 in 1976 or the
    IBM 3033 in 1977. IBM eventually announced the optional vector
    facility for the 3090 in 1985. OoO processing vanished from S/360
    successors with the 195 and only reappeared quite a while after it had appeared in Intel and RISC CPUs.

    ... also shortly after joining IBM, I was asked if I could help with
    a project to multi-thread the 370/195 .... also from the acs_end page:
    https://people.computing.clemson.edu/~mark/acs_end.html

    Sidebar: Multithreading

    In summer 1968, Ed Sussenguth investigated making the ACS/360 into a multithreaded design by adding a second instruction counter and a second
    set of registers to the simulator. Instructions were tagged with an
    additional "red/blue" bit to designate the instruction stream and
    register set; and, as was expected, the utilization of the functional
    units increased since more independent instructions were available.

    IBM patents and disclosures on multithreading include:

    US Patent 3,728,692, J.W. Fennel, Jr., "Instruction selection in a
    two-program counter instruction unit," filed August 1971, and issued
    April 1973.

    US Patent 3,771,138, J.O. Celtruda, et al., "Apparatus and method for serializing instructions from two independent instruction streams,"
    filed August 1971, and issued November 1973. [Note that John Earle is
    one of the inventors listed on the '138.]

    "Multiple instruction stream uniprocessor," IBM Technical Disclosure
    Bulletin, January 1976, 2pp. [for S/370]

    ... snip ...

    The 370/195 had a 64-instruction pipeline and could do out-of-order
    ... but didn't have branch prediction or speculative execution ... so
    conditional branches drained the pipeline and most codes ran at half
    the 195's rated throughput. Simulating a multiprocessor with red/blue
    instruction streams ... could get two half-rate streams running the
    195 at full speed (modulo MVT/MVS two-processor support only having
    1.2-1.5 times the throughput of a single processor). The whole thing
    was shut down when it was decided to add virtual memory to all 370s
    ... which was decided not practical for the 195.

    z196 (July 2010) documents claim that half of the per-processor
    improvement in MIP rate (compared to the previous z10) is due to the
    introduction of things like out-of-order.

    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to Michael S on Mon Jul 22 12:57:03 2024
    Michael S <already5chosen@yahoo.com> writes:
    In the end, the influence of the 6600 on computers we use today is
    close to zero. On the other hand, the influence of the S/360 Model 85
    is massive, and the influence of the S/360 Model 91 is significant,
    although far less than the credit it is often given in popular
    articles.
    Back in their time the 6600 was a huge success, and both the Model 85
    and the Model 91 were probably considered failures.


    Amdahl wins the battle to make ACS 360-compatible, then shortly
    afterwards ACS was shut down (folklore is executives felt it would
    advance the state of the art too fast and IBM would lose control of
    the market), and Amdahl leaves IBM shortly after ... lots of history,
    including some of the ACS features showing up more than two decades
    later with ES/9000:
    https://people.computing.clemson.edu/~mark/acs_end.html
    Of the 26,000 IBM computer systems in use, 16,000 were S/360 models
    (that is, over 60%). [Fig. 1.311.2]

    Of the general-purpose systems having the largest fraction of total
    installed value, the IBM S/360 Model 30 was ranked first with 12%
    (rising to 17% in 1969). The S/360 Model 40 was ranked second with 11%
    (rising to almost 15% in 1970). [Figs. 2.10.4 and 2.10.5]

    Of the number of operations per second in use, the IBM S/360 Model 65
    ranked first with 23%. The Univac 1108 ranked second with slightly over
    14%, and the CDC 6600 ranked third with 10%. [Figs. 2.10.6 and 2.10.7]

    ... snip ...

    old email:

    To: wheeler
    Date: 04/23/81 09:57:42

    your ramblings concerning the corp(se?) showed up in my reader
    yesterday. like all good net people, i passed them along to 3 other
    people. like rabbits interesting things seem to multiply on the
    net. many of us here in pok experience the sort of feelings your mail
    seems so burdened by: the company, from our point of view, is out of
    control. i think the word will reach higher only when the almighty $$$
    impact starts to hit. but maybe it never will. its hard to imagine one
    stuffed company president saying to another (our) stuffed company
    president i think i'll buy from those inovative freaks down the
    street. '(i am not defending the mess that surrounds us, just trying to understand why only some of us seem to see it).

    bob tomasulo and dave anderson, the two poeple responsible for the model
    91 and the (incredible but killed) hawk project, just left pok for the
    new stc computer company. management reaction: when dave told them he
    was thinking of leaving they said 'ok. 'one word. 'ok. ' they tried to
    keep bob by telling him he shouldn't go (the reward system in pok could
    be a subject of long correspondence). when he left, the management
    position was 'he wasn't doing anything anyway. '

    in some sense true. but we haven't built an interesting high-speed
    machine in 10 years. look at the 85/165/168/3033/trout. all the same
    machine with treaks here and there. and the hordes continue to sweep in
    with faster and faster machines. true, endicott plans to bring the
    low/middle into the current high-end arena, but then where is the
    high-end product development?

    ... snip ...

    In the first part of the 70s, IBM had the Future System effort,
    completely different from 370 and going to completely replace it
    (internal politics during FS was killing off 370 products; the lack
    of new 370s during FS is credited with giving the clone 370 makers
    their market foothold, including Amdahl) ... when FS imploded there
    was a mad rush to get new stuff back into the 370 product pipelines,
    including kicking off the quick&dirty 3033 and 3081 efforts in
    parallel.
    http://www.jfsowa.com/computer/memo125.htm
    https://people.computing.clemson.edu/~mark/fs.html

    370/XA was referred to as "811" for the architecture/design
    documents' Nov 1978 publication date; nearly all of it was to address
    MVS shortcomings (aka the head of POK had shortly before managed to
    convince corporate to kill the VM370 product, shut down the
    development group and have all the people transferred to POK for
    MVS/XA; Endicott did eventually manage to save the VM370 product
    mission ... for the "mid-range").

    trivia: when I joined IBM, one of my hobbies was an enhanced
    production operating system for internal datacenters (including the
    world-wide sales&marketing support HONE). In the original morph of
    CP67->VM370, lots of stuff was dropped and/or simplified (including
    multiprocessor support). In 1974, I start moving a bunch of stuff to
    VM370R2, including a kernel reorg for multiprocessor support, but not
    actual multiprocessor support itself. In 1975, I move my CSC/VM
    system to VM370R3 and add multiprocessor support, originally for the
    US consolidated sales&marketing support HONE datacenters up in Palo
    Alto (the US systems had been consolidated into a single system
    image, a loosely-coupled, shared-DASD operation with load-balancing
    and fall-over, one of the largest such complexes in the world). The
    multiprocessor support allowed them to add a 2nd processor to each
    system (making it the largest in the world; airlines' TPF had similar
    shared-DASD complexes, but TPF didn't get SMP support for another
    decade). I had done some hacks in order to get a two-processor system
    twice the throughput of a single processor (at the time MVS
    documentation said two-processor MVS had 1.2-1.5 times the throughput
    of a single processor).

    With the implosion of FS (and the demise of the VM370 development
    group) ... I got roped into helping with a 16-processor 370 SMP, and
    we con'ed the 3033 processor engineers into working on it in their
    spare time (a lot more interesting than remapping 168 logic to 20%
    faster chips). Everybody thought it was great until somebody tells
    the head of POK it could be decades before the POK favorite-son
    operating system (MVS) would have (effective) 16-processor support
    (POK doesn't ship a 16-processor system until after the turn of the
    century), and the head of POK invites some of us to never visit POK
    again (and tells the 3033 processor engineers: heads down on 3033 and
    no distractions). Some POK executives were also out bullying internal
    datacenters (including HONE) that they had to convert from VM370 to
    MVS. Once 3033 was out the door, they start on trout/3090.

    trivia: in Jan 1979, I was con'ed into doing a 6600 Fortran benchmark
    on an engineering IBM 4341 (mid-range), for a national lab that was
    looking at getting 70 of them for a compute farm (sort of the leading
    edge of the coming cluster supercomputing tsunami) ... the
    engineering 4341 benchmark was slightly slower than the 6600, but the
    production machines that shipped were slightly faster.

    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Tue Jul 23 00:20:46 2024
    On Mon, 22 Jul 2024 13:08:27 +0300, Michael S wrote:

    In the end, the influence of the 6600 on computers we use today is
    close to zero.

    He pioneered pipelining and multiple function units. He went on to pioneer vector processing (long vectors, not the short-vector SIMD stuff that
    infests CPU designs today). He was always very conservative in the
    fabrication technologies he adopted, but he was brilliant at pushing them
    to their limits.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Savard@21:1/5 to niklas.holsti@tidorum.invalid on Mon Jul 22 18:18:52 2024
    On Mon, 22 Jul 2024 10:43:44 +0300, Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:
    On 2024-07-22 3:05, Lawrence D'Oliveiro wrote:
    On Sun, 21 Jul 2024 22:21:41 +0000, MitchAlsup1 wrote:

    IBM CEO said something to the effect:: How can a team of 24 people,
    including the janitor, beat IBM ??

    From Watson’s memo:

    Last week Control Data had a press conference during which they
    officially announced their 6600 system. I understand that in the
    laboratory developing this system there are only 34 people,
    including the janitor. Of these, 14 are engineers and 4 are
    programmers, and only one person has a Ph.D., a relatively junior
    programmer. Contrasting this modest effort with our own vast
    development activities, I fail to understand why we have lost our
    industry leadership position by letting someone else offer the
    world’s most powerful computer.

    Per Wikipedia, Cray's reply was sardonic: "It seems like Mr. Watson
    has answered his own question."

    While IBM did not appear to understand the wisdom in Cray's remark at
    the time - that a large organization can have internal politics and communications overhead and other things that hamper innovation - it
    did _eventually_ learn its lesson.

    So when it came time for IBM to make its mark in the new, emerging
    field of microcomputers, it had a small team, working in isolation
    from the rest of IBM, go and design the IBM Personal Computer in all
    its 4.77 MHz 8088 glory.

    John Savard

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Tue Jul 23 00:21:44 2024
    On Mon, 22 Jul 2024 17:51:21 +0300, Michael S wrote:

    On Mon, 22 Jul 2024 13:41:40 +0000 mitchalsup@aol.com (MitchAlsup1)
    wrote:

    CDC 6600 had a RISC instruction set !!


    It has RISC-like features, most importantly a load-store architecture.
    Simple instruction formats and only two instruction widths also sound
    RISCy. But the hard coupling between A registers and X registers does
    not look like RISC to me.

    RISC-V is adopting a very similar idea, in preference to the widespread
    current SIMD fashion.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Lynn Wheeler on Tue Jul 23 00:22:51 2024
    On Mon, 22 Jul 2024 12:57:03 -1000, Lynn Wheeler wrote:

    Of the general-purpose systems having the largest fraction of total
    installed value ...

    What an odd way to measure market share, that will naturally favour more expensive machines over the more popular ones.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to John Savard on Tue Jul 23 00:38:53 2024
    On Tue, 23 Jul 2024 0:18:52 +0000, John Savard wrote:

    On Mon, 22 Jul 2024 10:43:44 +0300, Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:
    On 2024-07-22 3:05, Lawrence D'Oliveiro wrote:
    On Sun, 21 Jul 2024 22:21:41 +0000, MitchAlsup1 wrote:

    IBM CEO said something to the effect:: How can a team of 24 people,
    including the janitor, beat IBM ??

    From Watson’s memo:

    Last week Control Data had a press conference during which they
    officially announced their 6600 system. I understand that in the
    laboratory developing this system there are only 34 people,
    including the janitor. Of these, 14 are engineers and 4 are
    programmers, and only one person has a Ph.D., a relatively junior
    programmer. Contrasting this modest effort with our own vast
    development activities, I fail to understand why we have lost our
    industry leadership position by letting someone else offer the
    world’s most powerful computer.

    Per Wikipedia, Cray's reply was sardonic: "It seems like Mr. Watson
    has answered his own question."

    While IBM did not appear to understand the wisdom in Cray's remark at
    the time - that a large organization can have internal politics and communications overhead and other things that hamper innovation - it
    did _eventually_ learn its lesson.

    The largest a team should be is 11 + leader--Jesus tried for 12 and
    failed.

    So when it came time for IBM to make its mark in the new, emerging
    field of microcomputers, it had a small team, working in isolation
    from the rest of IBM, go and design the IBM Personal Computer in all
    its 4.77 MHz 8088 glory.

    John Savard

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Tue Jul 23 01:53:13 2024
    On Mon, 22 Jul 2024 12:52:35 GMT, Anton Ertl wrote:

    Yes, all modern computers have virtual memory (which started with the
    Atlas, and later the S/360 Model 67), they have caches (which started
    with the Titan, and later the S/360 Model 85), and they have
    reservation stations (which started with the S/360 Model 91).

    What happened to the 360/90? That was IBM’s long-promised answer to the
    CDC 6600. Was it ever more than vapourware? In the end, it was a classic
    case of over-promising and under-delivering.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Tue Jul 23 01:56:25 2024
    On Tue, 23 Jul 2024 00:38:53 +0000, MitchAlsup1 wrote:

    The largest a team should be is 11 + leader--Jesus tried for 12 and
    failed.

    CDC had a team about 3× that size and succeeded.

    Just as well we don’t take religious texts as serious guides to anything important, do we? ;)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Lynn Wheeler on Tue Jul 23 01:58:21 2024
    On Mon, 22 Jul 2024 13:09:12 -1000, Lynn Wheeler wrote:

    IBM patents and disclosures on multithreading include:

    ...


    The whole thing was shutdown when it was decided to
    add virtual memory to all 370s ... which was decided not practical for
    195.

    Yet another bit of evidence that you don’t need to prove an idea works
    before getting a patent on it.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Savard on Tue Jul 23 01:55:00 2024
    On Mon, 22 Jul 2024 18:18:52 -0600, John Savard wrote:

    While IBM did not appear to understand the wisdom in Cray's remark at
    the time - that a large organization can have internal politics and communications overhead and other things that hamper innovation - it did _eventually_ learn its lesson.

    So when it came time for IBM to make its mark in the new, emerging field
    of microcomputers, it had a small team, working in isolation from the
    rest of IBM, go and design the IBM Personal Computer in all its 4.77 MHz
    8088 glory.

    That turned out to be an exception (a temporary lapse in corporate
    control) rather than a rule. IBM as a whole learned no such lesson, as the subsequent PS/2 and OS/2 development showed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Tue Jul 23 01:59:12 2024
    On Mon, 22 Jul 2024 15:09:22 GMT, Scott Lurndal wrote:

    Basically, the hardware would modify the branch opcode in memory after
    every branch to track the last two taken/not-taken states.

    Did the Burroughs share code between processes/threads?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to mitchalsup@aol.com on Mon Jul 22 17:01:10 2024
    mitchalsup@aol.com (MitchAlsup1) writes:
    The largest a team should be is 11 + leader--Jesus tried for 12 and
    failed.

    trivia: the science center wanted to get a 360/50 to modify for
    virtual memory, but all the extra 360/50s were going to FAA ATC ...
    and so they had to settle for a 360/40 ... they implemented virtual
    memory with an associative array that held the process ID and virtual
    page number for each real page (compared to the Atlas associative
    array, which just had a virtual page number for each real page ...
    effectively just a single large virtual address space).
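
    A sketch of that lookup in C: one tag per real page holding (process
    ID, virtual page number); the Atlas tag lacked the process ID, hence
    a single large address space. Types and sizes are illustrative (a
    256 KB 360/40 would have 64 4 KB pages):

        #include <stdint.h>

        #define REAL_PAGES 64

        typedef struct {
            int      valid;
            uint16_t pid;  /* owning process: the field Atlas lacked */
            uint32_t vpn;  /* virtual page number */
        } FrameTag;

        static FrameTag frame_tag[REAL_PAGES];

        /* Translate (pid, vpn) to a real frame, or -1 for a page fault.
           The hardware searched all tags in parallel; software scans. */
        int translate(uint16_t pid, uint32_t vpn)
        {
            for (int f = 0; f < REAL_PAGES; f++)
                if (frame_tag[f].valid &&
                    frame_tag[f].pid == pid &&
                    frame_tag[f].vpn == vpn)
                    return f;
            return -1;
        }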

    the official IBM operating system for the (standard virtual memory)
    360/67 was TSS/360, which peaked at around 1200 people at a time when
    the science center had 12 people (that included the secretary)
    morphing CP/40 into CP/67.

    Melinda's history website:
    http://www.leeandmelindavarian.com/Melinda#VMHist
    description of CP/40 (for the modified 360/40):
    http://www.leeandmelindavarian.com/Melinda/JimMarch/CP40_The_Origin_of_VM370.pdf

    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Tue Jul 23 12:40:32 2024
    On Tue, 23 Jul 2024 00:21:44 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Mon, 22 Jul 2024 17:51:21 +0300, Michael S wrote:

    On Mon, 22 Jul 2024 13:41:40 +0000 mitchalsup@aol.com (MitchAlsup1)
    wrote:

    CDC 6600 had a RISC instruction set !!


    It has RISC-like features, most importantly a load-store architecture.
    Simple instruction formats and only two instruction widths also sound
    RISCy. But the hard coupling between A registers and X registers does
    not look like RISC to me.

    RISC-V is adopting a very similar idea, in preference to the
    widespread current SIMD fashion.

    WTF are you talking about?
    Sounds like you either never read about CDC 6600 instruction set or
    completely forgot everything you ever read about it.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Tue Jul 23 12:38:21 2024
    On Tue, 23 Jul 2024 00:20:46 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Mon, 22 Jul 2024 13:08:27 +0300, Michael S wrote:

    In the end, the influence of the 6600 on computers we use today is
    close to zero.

    He pioneered pipelining

    It (the 6600), not he (Seymour).
    No, it didn't.
    If instead of 'it' we talked about 'he', then the first Cray-Thornton
    pipelined computer is the 7600. But by then pipelining was hardly
    new. The 7600 can arguably be credited with "pipelining done right",
    but not as a pioneer in that area.

    and multiple function units.

    Multiple functional units existed before. The special thing about the
    6600 was that it had A LOT of them. Having a lot of non-pipelined
    functional units behind a 1-wide or even 2- or 3-wide front end
    sounds like an architectural dead end.

    He went on to
    pioneer vector processing (long vectors, not the short-vector SIMD
    stuff that infests CPU designs today).

    I am talking about the 6600, the computer, not Seymour Cray, the
    person, or Cray-Thornton, the team.
    Cray, the person, was innovative and influential.
    The 6600, the computer, was innovative but not influential in the
    long run.

    He was always very
    conservative in the fabrication technologies he adopted, but he was
    brilliant at pushing them to their limits.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to John Savard on Tue Jul 23 12:53:57 2024
    On Mon, 22 Jul 2024 18:18:52 -0600
    John Savard <quadibloc@servername.invalid> wrote:

    On Mon, 22 Jul 2024 10:43:44 +0300, Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:
    On 2024-07-22 3:05, Lawrence D'Oliveiro wrote:
    On Sun, 21 Jul 2024 22:21:41 +0000, MitchAlsup1 wrote:

    IBM CEO said something to the effect:: How can a team of 24
    people, including the janitor, beat IBM ??

    From Watson’s memo:

    Last week Control Data had a press conference during which
    they officially announced their 6600 system. I understand that in
    the laboratory developing this system there are only 34 people,
    including the janitor. Of these, 14 are engineers and 4 are
    programmers, and only one person has a Ph.D., a relatively
    junior programmer. Contrasting this modest effort with our own vast
    development activities, I fail to understand why we have lost
    our industry leadership position by letting someone else offer the
    world’s most powerful computer.

    Per Wikipedia, Cray's reply was sardonic: "It seems like Mr. Watson
    has answered his own question."

    While IBM did not appear to understand the wisdom in Cray's remark at
    the time - that a large organization can have internal politics and communications overhead and other things that hamper innovation - it
    did _eventually_ learn its lesson.

    So when it came time for IBM to make its mark in the new, emerging
    field of microcomputers, it had a small team, working in isolation
    from the rest of IBM, go and design the IBM Personal Computer in all
    its 4.77 MHz 8088 glory.

    John Savard

    I don't like how IBM screwed up the interrupt architecture of the
    IBM PC, completely ignoring Intel's recommendation to assign hardware
    interrupts to INT #32 and higher.
    The unnecessary mess they created had a negative effect for a long
    time, 15 years at least.
    Also, I am not sure that at the time (1981) the 8250 was the optimal
    choice for the UART chip.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Tue Jul 23 13:20:21 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Mon, 22 Jul 2024 15:09:22 GMT, Scott Lurndal wrote:

    Basically, the hardware would modify the branch opcode in memory after
    every branch to track the last two taken/not-taken states.

    Did the Burroughs share code between processes/threads?

    That depends on which of the three mainframe lines you
    consider.

    Large systems (B6500 descendants) were multithreaded
    from the start.

    While medium systems user code was mostly single-threaded, the
    operating system had full multithreading, with hardware mutexes and
    condition variables via the LOK, UNLK, WAIT and CAUSE instructions.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Michael S on Tue Jul 23 15:22:12 2024
    On Tue, 23 Jul 2024 9:53:57 +0000, Michael S wrote:

    I don't like how IBM screwed up the interrupt architecture of the
    IBM PC, completely ignoring Intel's recommendation to assign hardware
    interrupts to INT #32 and higher.
    The unnecessary mess they created had a negative effect for a long
    time, 15 years at least.
    Also, I am not sure that at the time (1981) the 8250 was the optimal
    choice for the UART chip.

    It is not just IBM: every interrupt architecture prior to MSI-X
    messages is screwed up (and most after MSI-X, too).

    The property one wants is that VMexit has to do nothing to the interrupts/controllers in order to gain full control over the core.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Wed Jul 24 00:17:37 2024
    On Tue, 23 Jul 2024 13:20:21 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Mon, 22 Jul 2024 15:09:22 GMT, Scott Lurndal wrote:

    Basically, the hardware would modify the branch opcode in memory after
    every branch to track the last two taken/not-taken states.

    Did the Burroughs share code between processes/threads?

    Large systems (B6500 descendants) were multithreaded from the start.

    Not quite what I asked. I was wondering how those code patches would
    impact on shared code.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Wed Jul 24 13:22:46 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Tue, 23 Jul 2024 13:20:21 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Mon, 22 Jul 2024 15:09:22 GMT, Scott Lurndal wrote:

    Basically, the hardware would modify the branch opcode in memory
    after every branch to track the last two taken/not-taken states.

    Did the Burroughs share code between processes/threads?

    Large systems (B6500 descendants) were multithreaded from the start.

    Not quite what I asked. I was wondering how those code patches would
    impact on shared code.

    Global branch prediction, of course.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Wed Jul 24 23:45:46 2024
    On Wed, 24 Jul 2024 13:22:46 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    I was wondering how those code patches would
    impact on shared code.

    Global branch prediction, of course.

    But the characteristics of a program run in one thread/process might
    not match those in another. If both runs are modifying the same code,
    the results will be ... sub-optimal.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Lawrence D'Oliveiro on Thu Jul 25 01:42:28 2024
    Lawrence D'Oliveiro wrote:
    On Wed, 24 Jul 2024 13:22:46 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    I was wondering how those code patches would
    impact on shared code.
    Global branch prediction, of course.

    But the characteristics of a program run in one thread/process might
    not match those in another. If both runs are modifying the same code,
    the results will be ... sub-optimal.

    Because they will both keep patching it to be what they want.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Lawrence D'Oliveiro on Thu Jul 25 10:59:16 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Wed, 24 Jul 2024 13:22:46 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    I was wondering how those code patches would
    impact on shared code.

    Global branch prediction, of course.

    But the characteristics of a program run in one thread/process might not match those in another.

    They might not, or they might. When the hardware branch predictor
    researchers looked into it, they found that there is more synergy than interference. Consequently, they did not take measures to avoid
    sharing. And for the approach of patching the hints in the code, the
    results of sharing will be beneficial on average, too, because the
    only difference from the 2-bit/branch predictor is that the latter is
    in microarchitectural state instead of in the code.
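 
    For contrast, the microarchitectural version is the textbook 2-bit
    saturating counter, kept in a table indexed by branch address rather
    than in the instruction itself (a sketch; the table size and the
    indexing are arbitrary here):
 
        /* Textbook 2-bit saturating-counter predictor: a table of
           counters indexed by branch PC, strengthened or weakened
           after each resolved branch. */
        #include <stdint.h>
 
        #define PHT_BITS 12
        #define PHT_SIZE (1u << PHT_BITS)
 
        /* 0,1 = strongly/weakly not-taken; 2,3 = weakly/strongly taken */
        static uint8_t pht[PHT_SIZE];
 
        static int predict(uint32_t pc) {
            return pht[(pc >> 2) & (PHT_SIZE - 1)] >= 2;
        }
 
        static void update(uint32_t pc, int taken) {
            uint8_t *c = &pht[(pc >> 2) & (PHT_SIZE - 1)];
            if (taken) { if (*c < 3) (*c)++; }
            else       { if (*c > 0) (*c)--; }
        }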

    Now somebody will point out that sharing makes it possible for an
    attacker to train branch predictors in one process to attack a
    different process through Spectre and friends. While preventing
    sharing would close that, it does not close training the predictors in
    the same thread.
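 
    The canonical Spectre v1 gadget (after Kocher et al.; the array
    names are the usual illustrative ones) shows what the training buys
    the attacker: once the bounds check is predicted taken, a
    misspeculated out-of-bounds x leaves a secret-dependent line in the
    cache:
 
        /* Spectre v1 bounds-check bypass (illustrative). After many
           in-bounds calls train the predictor, an out-of-bounds x is
           used speculatively: array1[x] fetches a secret byte and the
           dependent load into array2 encodes it in the cache. */
        #include <stdint.h>
        #include <stddef.h>
 
        uint8_t array1[16];
        uint8_t array2[256 * 64];  /* one 64-byte line per byte value */
        size_t  array1_size = 16;
        volatile uint8_t sink;
 
        void victim(size_t x) {
            if (x < array1_size)                /* predicted taken */
                sink = array2[array1[x] * 64];  /* secret-dependent fill */
        }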

    Closing Spectre through invisible speculation (several papers exist
    about that) makes it irrelevant (for Spectre) whether the predictors
    are shared or not. Of course, for invisible speculation the permanent
    branch predictors must not be updated speculatively, but that's
    probably better anyway.
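 
    A sketch of that last discipline (the structures are invented for
    illustration): speculative branch outcomes sit in a side buffer and
    reach the permanent tables only at retirement, so a squash leaves
    the predictor untouched:
 
        /* Illustrative retire-time predictor update. */
        #include <stdint.h>
 
        #define PHT_SIZE 4096
        static uint8_t pht[PHT_SIZE];  /* 2-bit counters */
 
        static void pht_update(uint32_t pc, int taken) {
            uint8_t *c = &pht[(pc >> 2) & (PHT_SIZE - 1)];
            if (taken) { if (*c < 3) (*c)++; }
            else       { if (*c > 0) (*c)--; }
        }
 
        struct pending { uint32_t pc; int taken; int valid; };
        #define ROB_SLOTS 64
        static struct pending rob[ROB_SLOTS];
 
        /* Speculative resolution: note the outcome, touch no tables. */
        static void record_speculative(int slot, uint32_t pc, int taken) {
            rob[slot] = (struct pending){ pc, taken, 1 };
        }
 
        /* Retirement: only now does the outcome reach the predictor. */
        static void retire(int slot) {
            if (rob[slot].valid)
                pht_update(rob[slot].pc, rob[slot].taken);
            rob[slot].valid = 0;
        }
 
        /* Squash: the predictor never sees the wrong-path outcome. */
        static void squash(int slot) {
            rob[slot].valid = 0;
        }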

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Anton Ertl on Thu Jul 25 14:43:44 2024
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Wed, 24 Jul 2024 13:22:46 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    I was wondering how those code patches would
    impact on shared code.

    Global branch prediction, of course.

    But the characteristics of a program run in one thread/process might not match those in another.

    They might not, or they might.

    Quite true. In this particular case, the characteristics were generally similar between tasks sharing the same text (executable code).

    When the hardware branch predictor
    researchers looked into it, they found that there is more synergy than interference.

    Indeed, internal benchmarking showed a definite improvement
    in all workloads.

    This being a 1960s-vintage mainframe architecture, the chances
    of introducing exploit code to attempt a Spectre-style attack
    on the 'branch predictor' were zero.


    Consequently, they did not take measures to avoid
    sharing. And for the approach of patching the hints in the code, the
    results of sharing will be beneficial on average, too, because the
    only difference from the 2-bit/branch predictor is that the latter is
    in microarchitectural state instead of in the code.

    Now somebody will point out that sharing makes it possible for an
    attacker to train branch predictors in one process to attack a
    different process through Spectre and friends. While preventing
    sharing would close that, it does not close training the predictors in
    the same thread.

    Closing Spectre through invisible speculation (several papers exist
    about that) makes it irrelevant (for Spectre) whether the predictors
    are shared or not. Of course, for invisible speculation the permanent
    branch predictors must not be updated speculatively, but that's
    probably better anyway.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Anton Ertl on Thu Jul 25 17:22:44 2024
    On Thu, 25 Jul 2024 10:59:16 +0000, Anton Ertl wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Wed, 24 Jul 2024 13:22:46 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    I was wondering how those code patches would
    impact on shared code.

    Global branch prediction, of course.

    But the characteristics of a program run in one thread/process might not match those in another.

    They might not, or they might. When the hardware branch predictor researchers looked into it, they found that there is more synergy than interference. Consequently, they did not take measures to avoid
    sharing. And for the approach of patching the hints in the code, the
    results of sharing will be beneficial on average, too, because the
    only difference from the 2-bit/branch predictor is that the latter is
    in microarchitectural state instead of in the code.

    Now somebody will point out that sharing makes it possible for an
    attacker to train branch predictors in one process to attack a
    different process through Spectre and friends. While preventing
    sharing would close that, it does not close training the predictors in
    the same thread.

    Not allowing a dependent AGEN to happen when the first AGEN takes
    a fault ALSO prevents Spectre-like attacks {Whether the crack is
    opened up by the BP, IBP, or any other predictor.} Then not modifying
    any cache prior to instruction retirement cements the door closed.
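 
    A sketch of the dependent chain such a rule breaks (Meltdown-style;
    the names are illustrative): the first load's AGEN faults, and the
    second load's address is computed from the doomed result, so
    suppressing dependent AGENs kills the cache side channel:
 
        /* Dependent AGEN chain (illustrative). The first load faults;
           without the rule above, the second load still generates an
           address derived from the doomed value and fills a cache line
           the attacker can probe later. */
        #include <stdint.h>
 
        uint8_t probe[256 * 64];   /* one cache line per byte value */
        volatile uint8_t sink;
 
        void gadget(const uint8_t *kernel_addr) {
            uint8_t secret = *kernel_addr;  /* AGEN #1: faulting load */
            sink = probe[secret * 64];      /* AGEN #2: depends on #1;
                                               never issued if a faulting
                                               AGEN suppresses dependents */
        }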

    Closing Spectre through invisible speculation (several papers exist
    about that) makes it irrelevant (for Spectre) whether the predictors
    are shared or not. Of course, for invisible speculation the permanent
    branch predictors must not be updated speculatively, but that's
    probably better anyway.

    - anton

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to mitchalsup@aol.com on Fri Jul 26 16:36:07 2024
    mitchalsup@aol.com (MitchAlsup1) writes:
    On Thu, 25 Jul 2024 10:59:16 +0000, Anton Ertl wrote:
    Now somebody will point out that sharing makes it possible for an
    attacker to train branch predictors in one process to attack a
    different process through Spectre and friends. While preventing
    sharing would close that, it does not close training the predictors in
    the same thread.

    Not allowing a dependent AGEN to happen when the first AGEN takes
    a fault ALSO prevents Spectre-like attacks

    Spectre does not need a fault. You are probably thinking of Meltdown.
    That, at least, was fixed by Intel (and hopefully also ARM) in its
    original variant fairly soon, although other variants have been
    discovered since then (IIRC including some where the fault has nothing
    to do with addresses).

    Then not modifying
    any cache prior to instruction retirement cements the door closed.

    Not changing microarchitectural state (not just caches) through
    misspeculation (invisible speculation) is a proper fix for Spectre,
    and looks like the best fix to me.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)