• Architectural implications of locate mode I/O

    From John Levine@21:1/5 to All on Tue Jul 2 19:12:17 2024
    After our recent silly arguments about locate vs move mode I/O, I got
    to thinking about what a computer needs for locate mode even to be
    interesting.

    Early computers had tiny memories and rudimentary I/O. When doing
    an I/O operation the CPU was too busy servicing the I/O device to
    do much computing. Hence the normal approach was to do the read
    or write directly into the memory where the program used it,
    no buffering needed. OS/360 still had that with BSAM, in which
    you did a READ or WRITE macro to transfer data directly to or from
    your own record area (and then waited for it to complete).

    The 709 introduced data channels in 1958 which allowed the CPU to do
    other stuff while the channel did the I/O. Wikipedia says the first
    I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
    an I/O interrupt would be of much use before channels. Once you had a
    channel, I/O buffering made sense, have the channel read or write one
    area while you're working on the other.
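
    A minimal sketch of that double-buffering pattern in modern terms, with
    POSIX aio standing in for the channel (process() and the open file
    descriptor are assumptions for illustration, not part of any historical API):

    #include <aio.h>
    #include <errno.h>
    #include <unistd.h>

    #define BUFSZ 4096
    static char buf[2][BUFSZ];

    void process(const char *p, ssize_t n);    /* application work, assumed */

    void copy_loop(int fd)
    {
        struct aiocb cb = {0};
        off_t off = 0;
        int cur = 0;

        cb.aio_fildes = fd;                    /* prime the first read */
        cb.aio_buf    = buf[cur];
        cb.aio_nbytes = BUFSZ;
        cb.aio_offset = off;
        aio_read(&cb);

        for (;;) {
            while (aio_error(&cb) == EINPROGRESS)
                ;                              /* or aio_suspend() */
            ssize_t n = aio_return(&cb);
            if (n <= 0)
                break;
            int done = cur;
            cur ^= 1;                          /* switch areas */
            off += n;
            cb.aio_fildes = fd;                /* start filling the other area */
            cb.aio_buf    = buf[cur];
            cb.aio_nbytes = BUFSZ;
            cb.aio_offset = off;
            aio_read(&cb);
            process(buf[done], n);             /* compute overlapped with I/O */
        }
    }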

    The other thing you need to make locate mode useful is index
    registers, since without them, it's about as hard to change the
    instructions to point to the buffer as to move the data so you might
    as well move the data.

    The IBM 7070 was a short lived machine with 10 digit fixed length
    decimal words. It had channels and index registers in locations in low
    memory which it called index words. Its IOCS could run several devices
    at the same time, e.g. two tape drives. Once you had your tape file
    open, you used GET and PUT macros. Each had a locate form that set an
    index word to point to the record, and a move form that copied the
    data to or from your own work area. PUT also had some other
    options like writing a record just read from one file onto
    another.

    https://bitsavers.org/pdf/ibm/7080/C28-6237_7080_IOCS80_1962.pdf
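
    In rough C terms (hypothetical names, not the actual IOCS macros;
    ioc_next_record() is an assumed library helper that returns the next
    record sitting in the library's own buffer), the two GET forms differ
    only in who ends up holding the data:

    #include <stddef.h>
    #include <string.h>

    struct file;                                   /* opaque library state */
    const char *ioc_next_record(struct file *f, size_t *len);   /* assumed */

    /* Locate mode: hand the caller a pointer into the library's buffer,
       much like setting an index word to point at the record. */
    const char *get_locate(struct file *f, size_t *len)
    {
        return ioc_next_record(f, len);            /* no copy */
    }

    /* Move mode: copy the record into the caller's own work area. */
    size_t get_move(struct file *f, void *work, size_t max)
    {
        size_t len;
        const char *rec = ioc_next_record(f, &len);
        if (rec == NULL)
            return 0;
        if (len > max)
            len = max;
        memcpy(work, rec, len);                    /* the copy locate mode avoids */
        return len;
    }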

    The 7090 was a 36 bit binary machine with index registers and indirect addressing and channels and interrupts. Its IOCS had read and write
    calls that patched the addresses of the record areas into words
    following the calls. You could then use those as indirect addresses or
    load them into index registers. That makes sense since there were only
    three index registers but you usually had more than three tapes going.

    https://bitsavers.org/pdf/ibm/7090/C28-6100-2_7090_IOCS.pdf

    So it looks like as soon as the machine architectures made it
    practical to have overlapped I/O and efficient ways to do indirect or
    indexed addressing, I/O systems took advantage of it by passing buffer
    pointers to user code.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Thomas Koenig on Tue Jul 2 21:05:43 2024
    Thomas Koenig <tkoenig@netcologne.de> writes:
    John Levine <johnl@taugh.com> schrieb:

    The 709 introduced data channels in 1958 which allowed the CPU to do
    other stuff while the channel did the I/O. Wikipedia says the first
    I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
    an I/O interrupt would be of much use before channels. Once you had a
    channel, I/O buffering made sense, have the channel read or write one
    area while you're working on the other.

    Not sure what you mean by "channel" in this context - hardware
    channels like the /360 had, or any asynchronous I/O in general,
    even without hardware support?

    Sending the next character to a teletype after the user program
    fills a buffer, and waiting for the next interrupt to tell you it's
    ready, makes sense anyway, with no busy loop needed.

    Although in the mainframe era, most terminals were block-mode
    rather than character-by-character, which reduced the interrupt
    frequency on the host (often via a front-end data communications
    processor) at the expense of more logic in the terminal device.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to John Levine on Tue Jul 2 20:36:24 2024
    John Levine <johnl@taugh.com> schrieb:

    The 709 introduced data channels in 1958 which allowed the CPU to do
    other stuff while the channel did the I/O. Wikipedia says the first
    I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
    an I/O interrupt would be of much use before channels. Once you had a channel, I/O buffering made sense, have the channel read or write one
    area while you're working on the other.

    Not sure what you mean by "channel" in this context - hardware
    channels like the /360 had, or any asynchronous I/O in general,
    even without hardware support?

    Sending the next character to a teletype after the user program
    fills a buffer, and waiting for the next interrupt to tell you it's
    ready, makes sense anyway, with no busy loop needed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Thomas Koenig on Tue Jul 2 22:14:51 2024
    Thomas Koenig wrote:

    John Levine <johnl@taugh.com> schrieb:

    The 709 introduced data channels in 1958 which allowed the CPU to do
    other stuff while the channel did the I/O. Wikipedia says the first
    I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
    an I/O interrupt would be of much use before channels. Once you had a
    channel, I/O buffering made sense, have the channel read or write one
    area while you're working on the other.

    Not sure what you mean by "channel" in this context - hardware
    channels like the /360 had, or any asynchronous I/O in general,
    even without hardware support?

    I think he is talking about anything between the PPs of CDC 6600
    through that of System 360 channels, where there is enough smarts
    in the channel to perform the mundane tasks of shuttling data to
    or from the device (DMA) and sending an interrupt at the end
    (exchange Jump in 6600 parlance).

    Sending the next character to a teletype after the user program
    fills a buffer, and waiting for the next interrupt to tell you it's
    ready, makes sense anyway, with no busy loop needed.

    At that time, TTYs were slow enough to poll (PDP/8) or interrupt
    per character.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Scott Lurndal on Tue Jul 2 22:17:20 2024
    Scott Lurndal wrote:

    Thomas Koenig <tkoenig@netcologne.de> writes:
    John Levine <johnl@taugh.com> schrieb:

    The 709 introduced data channels in 1958 which allowed the CPU to do
    other stuff while the channel did the I/O. Wikipedia says the first
    I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
    an I/O interrupt would be of much use before channels. Once you had a
    channel, I/O buffering made sense, have the channel read or write one
    area while you're working on the other.

    Not sure what you mean by "channel" in this context - hardware
    channels like the /360 had, or any asynchronous I/O in general,
    even without hardware support?

    Sending the next character to a teletype after the user program
    fills a buffer, and waiting for the next interrupt to tell you it's
    ready, makes sense anyway, with no busy loop needed.

    Although in the mainframe era, most terminals were block-mode
    rather than character-by-character, which reduced the interrupt
    frequency on the host (often via a front-end data communications
    processor) at the expense of more logic in the terminal device.


    Once you recognize that I/O is eating up your precious CPU, and you
    get to the point you are willing to expend another fixed programmed
    device to make the I/O burden manageable, then you basically have
    CDC 6600 Peripheral Processors, programmed in code or microcode.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Wed Jul 3 01:35:02 2024
    On Tue, 2 Jul 2024 19:12:17 -0000 (UTC), John Levine wrote:

    After our recent silly arguments about locate vs move mode I/O, I got to thinking about what a computer needs for locate mode even to be
    interesting.

    Should there be some kind of flashing indicator saying “LOCATE MODE I/O IN EFFECT”, or should it be the opposite “LOCATE MODE I/O NOT IN EFFECT”? Maybe the latter, to remind those who think they’ve engaged it, that they haven’t.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Wed Jul 3 03:14:37 2024
    According to Thomas Koenig <tkoenig@netcologne.de>:
    John Levine <johnl@taugh.com> schrieb:

    The 709 introduced data channels in 1958 which allowed the CPU to do
    other stuff while the channel did the I/O. Wikipedia says the first
    I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
    an I/O interrupt would be of much use before channels. Once you had a
    channel, I/O buffering made sense, have the channel read or write one
    area while you're working on the other.

    Not sure what you mean by "channel" in this context - hardware
    channels like the /360 had, or any asynchronous I/O in general,
    even without hardware support?

    Something that does the I/O sufficiently independently that the CPU
    can do something else at the same time. The first channel is generally
    agreed to be the 766 Data Synchronizer on the IBM 709.

    Before there were channels, I/O worked a word at a time and a CPU had
    to issue I/O instructions to read or write those words under tight
    time constraints so it was in a busy loop and couldn't do anything
    else. The channel had a direct path to memory separate from the CPU's
    and enough logic to do an entire operation like read a tape block or
    print a line on the printer without the CPU's help. This may seem
    obvious now but it was a big advance.

    There's been lots of variations on the channel theme. I would agree
    that the CDC peripheral processors served as channels. On the smallest
    360s, the channel was implemented in CPU microcode. When running fast
    devices like disks the channel used so much of the CPU that the
    program stalled, but it was worth it to be compatible with faster
    machines. Even then, disk seeks or tape rewinds or reading cards or
    printing on printers let the CPU do useful work while the channel and
    device did its thing.

    IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
    not granted until 1974. The patent is 488 pages long including 409
    pages of figures, 130 columns of narrative text, and 91 claims.

    https://patents.google.com/patent/US3812475A/en

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to mitchalsup@aol.com on Tue Jul 2 17:36:50 2024
    mitchalsup@aol.com (MitchAlsup1) writes:
    Once you recognize that I/O is eating up your precious CPU, and you
    get to the point you are willing to expend another fixed programmed
    device to make the I/O burden manageable, then you basically have
    CDC 6600 Peripheral Processors, programmed in code or microcode.

    QSAM library does serialization for the application ... get/put calls
    does "wait" operations inside the library for I/O complete. BSAM library
    has the applications performing serialization with their own "wait"
    operations for read/write calls (application handling overlap of
    possible processing with I/O).

    Recent IBM articles mention that the QSAM default multiple buffering was
    established years ago as "five" ... but current recommendations are for
    more like 150 (for QSAM to have high processing overlapped with
    I/O). Note that while they differentiate between application buffers and
    "system" buffers (for move & locate mode), QSAM (system) buffers run as
    part of the application address space but are managed as part of the QSAM
    library code.

    Both QSAM & BSAM libraries build the (application) channel programs
    ... and since OS/360 moved to virtual memory for all 370s, they all have
    (application address space) virtual addresses. When the library code
    passes the channel program to EXCP/SVC0, a copy of the passed channel
    program is made, replacing the virtual addresses in the CCWs with
    "real addresses". QSAM GET can return the address within its buffers
    (involved in the actual I/O, "locate" mode) or copy data from its
    buffers to the application buffers ("move" mode). The references on the
    web all seemed to reference "system" and "application" buffers, but I
    think it would be more appropriate to reference them as "library" and
    "application" buffers.
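
    As a sketch of the shape of that EXCP copying step (illustrative only,
    not IBM's actual data structures or code; virt_to_real() stands in for
    the page-table lookup and page fixing):

    #include <stdint.h>
    #include <stddef.h>

    struct ccw {                    /* simplified S/360-style CCW */
        uint8_t  cmd;               /* command code: read, write, TIC, ... */
        uint32_t data_addr;         /* 24-bit data address, held in 32 bits here */
        uint8_t  flags;             /* chain-data, chain-command, ... */
        uint16_t count;             /* byte count */
    };

    uint32_t virt_to_real(uint32_t vaddr);   /* translate + fix the page, assumed */

    /* Copy the application's channel program, translating buffer addresses;
       the shadow copy is what actually gets started on the channel. */
    void build_real_channel_program(const struct ccw *user, struct ccw *shadow,
                                    size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            shadow[i] = user[i];
            shadow[i].data_addr = virt_to_real(user[i].data_addr);
        }
    }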

    370/158 had "integrated channels" ... the 158 engine ran both 370
    instruction set microcode and the integrated channel microcode.

    when future system imploded, there was a mad rush to get stuff back into
    the 370 product pipelines, including kicking off the quick&dirty
    303x & 3081 efforts in parallel.

    for 303x they created "external channels" by taking a 158 engine with
    just the integrated channel microcode (and no 370 microcode) for the
    303x "channel director".

    a 3031 was two 158 engines, one with just the 370 microcode and a 2nd
    with just the integrated channel microcode

    a 3032 was 168-3 remapped to use channel director for external channels.

    a 3033 started with 168-3 logic remapped to 20% faster chips.

    Jan1979, I had lots of use of an early engineering 4341 and was con'ed
    into doing a (cdc6600) benchmark for a national lab that was looking for
    70 4341s for a computer farm (sort of leading edge of the coming cluster
    supercomputing tsunami). The benchmark was Fortran compute doing no I/O
    and executed with nothing else running.

    4341: 36.21secs, 3031: 37.03secs, 158: 45.64secs

    now integrated channel microcode ... 158 even with no I/O running was
    still 45.64secs compared to the same hardware in 3031 but w/o channel microcode: 37.03secs.

    I had a channel efficiency benchmark ... basically how fast can the
    channel handle each channel command word (CCW) in a channel program
    (channel architecture required that each be fetched, decoded and executed
    purely sequentially/synchronously). The test was to have two disk read
    ("chained") CCWs for two consecutive records. Then add a CCW between the
    two disk read CCWs (in the channel program) ... which results in a
    complete revolution to read the 2nd data record (because of the latency,
    while the disk is spinning, in handling the extra CCW separating the two
    record-read CCWs).

    Then reformat the disk to add a dummy record between each data record,
    gradually increasing the size of the dummy record until the two data
    records can be read in a single revolution.

    The size of the dummy record required for single revolution reading the
    two records was the largest for 158 integrated channel as well as all
    the 303x channel directors. The original 168 external channels could do
    single revolution with the smallest possible dummy record (but a 168
    with channel director, aka 3032, couldn't, nor could 3033) ... also the
    4341 integrated channel microcode could do it with smallest possible
    dummy record.
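
    For a rough feel for what that gap size measures, assuming 3330-class
    numbers that are not in the post (3,600 RPM, roughly 13,000 data bytes
    per track), the track moves under the head at about 0.8 bytes per
    microsecond, so the dummy-record size converts directly into CCW-handling
    latency:

    #include <stdio.h>

    int main(void)
    {
        double rpm = 3600.0, bytes_per_track = 13000.0;   /* assumed 3330-class */
        double us_per_rev = 60.0e6 / rpm;                 /* ~16,667 us */
        double bytes_per_us = bytes_per_track / us_per_rev;

        double dummy_bytes = 200.0;                       /* example gap size */
        printf("a %.0f-byte dummy record hides about %.0f us of CCW latency\n",
               dummy_bytes, dummy_bytes / bytes_per_us);
        return 0;
    }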

    The 3081 channel command word (CCW) processing latency was more like the
    158 integrated channel (and the 303x channel directors).

    Second half of the 80s, I was a member of Chesson's XTP TAB ... found
    that typical UNIX TCP/IP at the time took on the order of 5k
    instructions and five buffer copies ... while the comparable
    mainframe protocol in VTAM took 160k instructions and 15 buffer copies
    (larger buffers on high-end cache machines ... the cache misses for the
    15 buffer copies could exceed the processor cycles for the 160k
    instructions).

    XTP was working on no buffer copies and streaming I/O ... attempting to
    process TCP as close as possible to no-buffer-copy disk I/O.
    Scatter/gather I/O for separate header and data ... also a move from a
    header CRC protocol .... to a trailer CRC protocol ... instead of software
    prescanning the buffer to calculate the CRC (for placing in the header)
    ... the outboard hardware processes the data as it streams through,
    computing the CRC and then appending it to the end of the record.
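
    A sketch of the no-copy send side of that idea (not XTP itself): gather
    the header, the payload in place, and a trailer CRC in one writev() call,
    so the payload never gets staged into a contiguous buffer. crc32() is
    assumed; the intent described above was for outboard hardware to compute
    it as the data streams past.

    #include <stdint.h>
    #include <stddef.h>
    #include <sys/uio.h>
    #include <unistd.h>

    struct hdr { uint32_t len; uint32_t type; };    /* hypothetical header */

    uint32_t crc32(const void *p, size_t n);        /* assumed helper */

    ssize_t send_record(int fd, struct hdr *h, const void *payload, size_t n)
    {
        uint32_t trailer = crc32(payload, n);       /* trailer CRC, not header CRC */
        struct iovec iov[3] = {
            { .iov_base = h,               .iov_len = sizeof *h      },
            { .iov_base = (void *)payload, .iov_len = n              },
            { .iov_base = &trailer,        .iov_len = sizeof trailer },
        };
        return writev(fd, iov, 3);                  /* one gather operation */
    }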

    When doing IBM's HA/CMP and working with major RDBMS vendors on cluster
    scaleup in the late 80s/early 90s, there were lots of references to POSIX
    light-weight threads and asynchronous I/O for RDBMS (with no buffer
    copies) and the RDBMS managing large record cache.

    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Lynn Wheeler on Wed Jul 3 04:57:04 2024
    On Tue, 02 Jul 2024 17:36:50 -1000, Lynn Wheeler wrote:

    When doing IBM's HA/CMP and working with major RDBMS vendors on cluster scaleup in the late 80s/early 90s, there were lots of references to POSIX light-weight threads ...

    Threads were all the rage in the 1990s. People were using them for
    everything. One language (Java) absorbed threading right into its core DNA (where is the locking API? Oh, it’s attached to the base “Object” type itself!).

    People backed off a bit after that. Nowadays we see a revival of the “coroutine” idea, where switching only happens at explicit “await” points. For non-CPU-intensive workloads, this is much easier to cope with.

    ... and asynchronous I/O for RDBMS (with no buffer copies) and the
    RDBMS managing large record cache.

    This is why POSIX has the disk-oriented “aio” API, for the diehard DBMS folks. Linux also added “io_uring”, for high-performance but not disk-specific I/O.
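
    A minimal io_uring read with liburing, just to show the submit/complete
    flow (assumes liburing is installed and fd is already open; real users
    keep the ring around and queue many operations at once):

    #include <liburing.h>

    int read_block(int fd, void *buf, unsigned len, off_t off)
    {
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        int ret;

        if (io_uring_queue_init(8, &ring, 0) < 0)    /* small submission queue */
            return -1;

        sqe = io_uring_get_sqe(&ring);               /* grab a submission entry */
        io_uring_prep_read(sqe, fd, buf, len, off);  /* describe the read */
        io_uring_submit(&ring);                      /* hand it to the kernel */

        ret = io_uring_wait_cqe(&ring, &cqe);        /* wait for completion */
        if (ret == 0) {
            ret = cqe->res;                          /* bytes read, or -errno */
            io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return ret;
    }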

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to John Levine on Wed Jul 3 05:52:07 2024
    John Levine <johnl@taugh.com> writes:
    On the smallest
    360s, the channel was implemented in CPU microcode. When running fast
    devices like disks the channel used so much of the CPU that the
    program stalled, but it was worth it to be compatible with faster
    machines. Even then, disk seeks or tape rewinds or reading cards or
    printing on printers let the CPU do useful work while the channel and
    device did its thing.

    This sounds very much like hardware multi-threading to me: The CPU had
    separate state for the channel and used its hardware for doing the
    channel stuff when there was I/O to do, while running the non-channel
    stuff the rest of the time, all without OS-level context switching.

    The barrel processors implemented in the CDC 6600's PPs are another
    variant of the same principle from around the same time, but using the
    same hardware for such different tasks is a new twist.

    Interestingly, this is one development that has not been repeated in microprocessors AFAIK. If they did not want to spend hardware on a
    separate DMA device, they just let the software use polling of the I/O
    device. For the 8086 and 68000, I guess that patents may have
    discouraged adopting this idea; when the patents ran out, they had
    established an ecosystem with separate DMA devices. And of course for
    the early RISCs there was no way to do that in microcode.

    IIRC some microcomputers (IBM PC I think) had dedicated central DMA
    processors (but not on the CPU chip at first IIRC), but these fell
    into disuse soon when the I/O devices that do lots of I/O (like disk controllers) included their own DMA circuits. Having the DMA on the
    I/O device eliminates the overhead of first requiring the bus for
    getting the data from the I/O device, and then another bus cycle for
    storing it into memory (or the other way round).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bill Findlay@21:1/5 to All on Wed Jul 3 13:08:31 2024
    On 2 Jul 2024, MitchAlsup1 wrote
    (in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):

    Once you recognize that I/O is eating up your precious CPU, and you
    get to the point you are willing to expend another fixed programmed
    device to make the I/O burden manageable, then you basically have
    CDC 6600 Peripheral Processors, programmed in code or microcode.

    The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
    They all did DMA, interrupting only at the end of the transfer.
    Slow devices accessed the core store for each character,
    fast devices did so for each word.

    This was mediated by one of the KDF9's many state machines,
    I/O Control, which multiplexed core requests from devices
    and interrupted the CPU at the end of a transfer
    if the transfer had been initiated by a program
    of higher CPU priority than the one currently running,
    or if there was a possibility of priority inversion.

    I/O Control also autonomously re-issued an I/O command
    to a device that reported a parity error
    if that device was capable of retrying the transfer
    (e.g. MT controllers could backspace a block and re-read).

    --
    Bill Findlay

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Bill Findlay on Wed Jul 3 15:34:06 2024
    On Wed, 03 Jul 2024 13:08:31 +0100
    Bill Findlay <findlaybill@blueyonder.co.uk> wrote:

    On 2 Jul 2024, MitchAlsup1 wrote
    (in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):

    Once you recognize that I/O is eating up your precious CPU, and you
    get to the point you are willing to expend another fixed programmed
    device to make the I/O burden manageable, then you basically have
    CDC 6600 Peripheral Processors, programmed in code or microcode.

    The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
    They all did DMA, interrupting only at the end of the transfer.
    Slow devices accessed the core store for each character,
    fast devices did so for each word.

    This was mediated by one of the KDF9's many state machines,
    I/O Control, which multiplexed core requests from devices
    and interrupted the CPU at the end of a transfer
    if the transfer had been initiated by a program
    of higher CPU priority than the one currently running,
    or if there was a possibility of priority inversion.

    I/O Control also autonomously re-issued an I/O command
    to a device that reported a parity error
    if that device was capable of retrying the transfer
    (e.g. MT controllers could backspace a block and re-read).


    That sounds quite advanced.
    But when I try to compare with contemporaries, like the S/360 Model 65, it
    appears that despite its advances the KDF9 was not competitive with a
    maximally configured 65 because of a shortage of main memory.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Anton Ertl on Wed Jul 3 08:44:01 2024
    Anton Ertl wrote:
    John Levine <johnl@taugh.com> writes:
    On the smallest
    360s, the channel was implemented in CPU microcode. When running fast
    devices like disks the channel used so much of the CPU that the
    program stalled, but it was worth it to be compatible with faster
    machines. Even then, disk seeks or tape rewinds or reading cards or
    printing on printers let the CPU do useful work while the channel and
    device did its thing.

    This sounds very much like hardware multi-threading to me: The CPU had separate state for the channel and used its hardware for doing the
    channel stuff when there was I/O to do, while running the non-channel
    stuff the rest of the time, all without OS-level context switching.

    The barrel processors implemented in the CDC 6600's PPs are another
    variant of the same principle from around the same time, but using the
    same hardware for such different tasks is a new twist.

    Interestingly, this is one development that has not been repeated in microprocessors AFAIK. If they did not want to spend hardware on a
    separate DMA device, they just let the software use polling of the I/O device. For the 8086 and 68000, I guess that patents may have
    discouraged adopting this idea; when the patents ran out, they had established an ecosystem with separate DMA devices. And of course for
    the early RISCs there was no way to do that in microcode.

    DMA support devices have been available since the 8080 and 6800.
    They are just an up address counter and a down byte counter
    with some logic to diddle the bus control lines.
    Or you can easily build one from TTL.

    The RCA 1802 microprocessor had DMA IN & OUT pins for triggering the
    processor to use its register R0 as an address counter and perform a
    write (IN) or read (OUT) bus cycle. The I/O device just had to drive
    or latch its data onto the system bus and count down the bytes.

    The Intel 8086 also had the 8089 I/O Coprocessor which was like a
    channel processor in that it had a completely different ISA.
    It was not in the PC and I guess little used as there were cheaper ways
    to accomplish IO.

    https://en.wikipedia.org/wiki/Intel_8089

    IIRC some microcomputers (IBM PC I think) had dedicated central DMA processors (but not on the CPU chip at first IIRC), but these fell
    into disuse soon when the I/O devices that do lots of I/O (like disk controllers) included their own DMA circuits. Having the DMA on the
    I/O device eliminates the overhead of first requiring the bus for
    getting the data from the I/O device, and then another bus cycle for
    storing it into memory (or the other way round).

    - anton

    It is a system design issue whether to have a single, central DMA
    controller, or to have each device that wants to DMA have its own.
    The single, central controller increases the base cost of the cpu,
    and needs to be more complex as it has to support general features,
    but allows add-on devices to be lower cost as they use that DMA.
    But it's also inflexible and locks you into that mechanism.

    An alternative is each device has its own DMA controller
    which is just a couple of counters.
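
    A toy model of that "couple of counters" engine (illustrative only;
    memory[], the device-side byte functions and raise_interrupt() are
    assumed): an address register counting up and a byte count counting
    down, stepped once per bus cycle granted to the device.

    #include <stdint.h>
    #include <stdbool.h>

    struct dma {
        uint32_t addr;        /* next memory address */
        uint32_t count;       /* bytes remaining */
        bool     to_memory;   /* transfer direction */
    };

    extern uint8_t memory[1 << 20];     /* system memory, assumed */
    uint8_t device_get_byte(void);      /* device data path, assumed */
    void    device_put_byte(uint8_t b);
    void    raise_interrupt(void);

    /* One stolen bus cycle: move a byte and bump the counters. */
    void dma_cycle(struct dma *d)
    {
        if (d->count == 0)
            return;
        if (d->to_memory)
            memory[d->addr] = device_get_byte();
        else
            device_put_byte(memory[d->addr]);
        d->addr++;
        if (--d->count == 0)
            raise_interrupt();          /* end-of-transfer interrupt */
    }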

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Robert Swindells@21:1/5 to Anton Ertl on Wed Jul 3 13:09:28 2024
    On Wed, 03 Jul 2024 05:52:07 GMT, Anton Ertl wrote:

    Interestingly, this is one development that has not been repeated in microprocessors AFAIK. If they did not want to spend hardware on a
    separate DMA device, they just let the software use polling of the I/O device. For the 8086 and 68000, I guess that patents may have
    discouraged adopting this idea; when the patents ran out, they had established an ecosystem with separate DMA devices. And of course for
    the early RISCs there was no way to do that in microcode.

    There was the 8089 coprocessor for the 8086, it was used in the Apricot
    PC.

    <https://en.wikipedia.org/wiki/Intel_8089>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to EricP on Wed Jul 3 16:00:27 2024
    On Wed, 03 Jul 2024 08:44:01 -0400
    EricP <ThatWouldBeTelling@thevillage.com> wrote:


    An alternative is each device has its own DMA controller
    which is just a couple of counters.


    The main cost is not counters, but bus mastering logic.
    For PCI it was a non-trivial cost even as late as year 2000. For example,
    PLX PCI to local bus bridges with bus mastering capability, like the PCI
    9080, cost non-trivially more than the slave-only 9060.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bill Findlay@21:1/5 to Michael S on Wed Jul 3 15:02:58 2024
    On 3 Jul 2024, Michael S wrote
    (in article <20240703153406.00006ebe@yahoo.com>):

    On Wed, 03 Jul 2024 13:08:31 +0100
    Bill Findlay <findlaybill@blueyonder.co.uk> wrote:

    On 2 Jul 2024, MitchAlsup1 wrote
    (in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):

    Once you recognize that I/O is eating up your precious CPU, and you
    get to the point you are willing to expend another fixed programmed device to make the I/O burden manageable, then you basically have
    CDC 6600 Peripheral Processors, programmed in code or microcode.

    The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
    They all did DMA, interrupting only at the end of the transfer.
    Slow devices accessed the core store for each character,
    fast devices did so for each word.

    This was mediated by one of the KDF9's many state machines,
    I/O Control, which multiplexed core requests from devices
    and interrupted the CPU at the end of a transfer
    if the transfer had been initiated by a program
    of higher CPU priority than the one currently running,
    or if there was a possibility of priority inversion.

    I/O Control also autonomously re-issued an I/O command
    to a device that reported a parity error
    if that device was capable of retrying the transfer
    (e.g. MT controllers could backspace a block and re-read).

    That sounds quite advanced.
    But when I try to compare with contemporaries, like the S/360 Model 65,
    it appears that despite its advances the KDF9 was not competitive with a
    maximally configured 65 because of a shortage of main memory.

    Yup.

    --
    Bill Findlay

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Lynn Wheeler on Wed Jul 3 14:53:06 2024
    Lynn Wheeler wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    Once you recognize that I/O is eating up your precious CPU, and you
    get to the point you are willing to expend another fixed programmed
    device to make the I/O burden manageable, then you basically have
    CDC 6600 Peripheral Processors, programmed in code or microcode.

    Jan1979, I had lots of use of an early engineering 4341 and was con'ed
    into doing a (cdc6600) benchmark for a national lab that was looking for
    70 4341s for a computer farm (sort of leading edge of the coming cluster
    supercomputing tsunami). The benchmark was Fortran compute doing no I/O
    and executed with nothing else running.

    4341: 36.21secs, 3031: 37.03secs, 158: 45.64secs

    Do you have data on how the CDC 6600 did ?

    now integrated channel microcode ... 158 even with no I/O running was
    still 45.64secs compared to the same hardware in 3031 but w/o channel microcode: 37.03secs.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to mitchalsup@aol.com on Wed Jul 3 18:47:30 2024
    On Wed, 3 Jul 2024 14:53:06 +0000
    mitchalsup@aol.com (MitchAlsup1) wrote:

    Lynn Wheeler wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    Once you recognize that I/O is eating up your precious CPU, and you
    get to the point you are willing to expend another fixed programmed
    device to make the I/O burden manageable, then you basically have
    CDC 6600 Peripheral Processors, programmed in code or microcode.

    Jan1979, I had lots of use of an early engineering 4341 and was
    con'ed into doing a (cdc6600) benchmark for a national lab that was
    looking for 70 4341s for a computer farm (sort of leading edge of the
    coming cluster supercomputing tsunami). The benchmark was Fortran
    compute doing no I/O and executed with nothing else running.

    4341: 36.21secs, 3031: 37.03secs, 158: 45.64secs

    Do you have data on how the CDC 6600 did ?

    now integrated channel microcode ... 158 even with no I/O running
    was still 45.64secs compared to the same hardware in 3031 but w/o
    channel microcode: 37.03secs.


    A little bit of googling easily gets the answer: 35.77 s

    https://www.garlic.com/~lynn/2006y.html#21

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Michael S on Wed Jul 3 13:34:39 2024
    Michael S wrote:
    On Wed, 03 Jul 2024 08:44:01 -0400
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    An alternative is each device has its own DMA controller
    which is just a couple of counters.


    The main cost is not counters, but bus mastering logic.
    For PCI it was a non-trivial cost even as late as year 2000. For example,
    PLX PCI to local bus bridges with bus mastering capability, like the PCI
    9080, cost non-trivially more than the slave-only 9060.

    PCI is a different matter. I think they shot themselves in the foot.
    That is because the PCI design used FAST TTL and was ridiculously
    complex and had all sorts of unnecessary optional features like bridges.

    To my eye the choice of FAST TTL looks wrong headed. They needed FAST
    because they wanted to run at 33 MHz which is beyond LS TTL limit.
    With a bus at 33 MHz and 4 bytes it superficially sounds like 133 MB/s.
    But 33 MHz was too fast to decode or latch the address and data,
    plus it multiplexes address and data, and took 5 cycles to do a transfer.
    So the bus actual data transfer rate was more like 133/5 = 26 MB/s.
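
    The arithmetic, spelled out (assuming the 5-clocks-per-transfer figure
    above):

    #include <stdio.h>

    int main(void)
    {
        double clock_mhz = 33.0, bytes_per_xfer = 4.0, clocks_per_xfer = 5.0;
        double peak = clock_mhz * bytes_per_xfer;       /* ~132-133 MB/s headline */
        double effective = peak / clocks_per_xfer;      /* ~26 MB/s sustained */
        printf("peak %.0f MB/s, effective %.0f MB/s\n", peak, effective);
        return 0;
    }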

    I looked into PCI bus interface chips when it first came out and there
    were just *TWO* manufacturers for them on the planet, and they charged
    $50,000 just to answer the phone, and you had to pay them to design a
    custom chip for you even though it was supposed to be a standard design.
    This all pushed the price of PCI cards way up so, for example,
    an ISA bus modem card cost $50 but the exact same modem on PCI was $250.
    No wonder most people stuck with ISA bus cards.

    Whereas 74LS TTL was cheap and available from manufacturers everywhere.
    I would have used 74LS TTL and done a straight 32-bit bus with no options, multiplexed address and data to keep down the connector pin count and cost. That could have run at 20 MHz which leaves 50 ns for address and data
    decode and latch, and driven 8 bus loads with no bridges.
    That gives 10 MT/s = 40 MB/s. Plus cheap and widely available.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stephen Fuld@21:1/5 to All on Wed Jul 3 17:41:26 2024
    MitchAlsup1 wrote:

    Scott Lurndal wrote:

    Thomas Koenig <tkoenig@netcologne.de> writes:
    John Levine <johnl@taugh.com> schrieb:

    The 709 introduced data channels in 1958 which allowed the CPU
    to do other stuff while the channel did the I/O. Wikipedia says
    the first I/O interrupt was on the NBS DYSEAC in 1954 but it's
    hard to see how an I/O interrupt would be of much use before
    channels. Once you had a channel, I/O buffering made sense,
    have the channel read or write one area while you're working on
    the other.

    Not sure what you mean by "channel" in this context - hardware
    channels like the /360 had, or any asynchronous I/O in general,
    even without hardware support?

    Sending the next character to a teletype after the user program
    fills a buffer, and waiting for the next interrupt to tell you it's
    ready, makes sense anyway, with no busy loop needed.

    Although in the mainframe era, most terminals were block-mode
    rather than character-by-character, which reduced the interrupt
    frequency on the host (often via a front-end data communications
    processor) at the expense of more logic in the terminal device.


    Once you recognize that I/O is eating up your precious CPU, and you
    get to the point you are willing to expend another fixed programmed
    device to make the I/O burden manageable, then you basically have
    CDC 6600 Peripheral Processors, programmed in code or microcode.


    There is a lot of design space between having the CPU do the I/O itself
    and having a separate programmable processor. Since the minimum
    requirements are a DMA, a way to request memory access, and a little
    logic to differentiate peripheral commands and status from normal data,
    and a way to indicate I/O completion (e.g. generate an interrupt), you
    could do all this in a fairly small amount of dedicated hardware.

    While the PPs were an elegant solution, they were more than what was
    required. Similarly, the IBM S/360 channels were way overkill for
    pretty much all I/O except for using CKD disks. In the day, different mainframes used various implementations of this idea to do I/O.



    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to All on Wed Jul 3 07:42:42 2024
    little "dependable" I/O drift

    1980, IBM STL (since renamed SVL) was bursting at the seams and they
    were moving 300 people (and their 3270 terminals) from the IMS (DBMS)
    group to an offsite bldg with dataprocessing service back to the STL
    datacenter. They had tried "remote 3270", but found the human factors
    unacceptable. I got con'ed into implementing channel extender support
    (A220, A710/A715/A720, A510/A515) ... allowing channel-attached 3270
    controllers to be located at the offsite bldg, connected to mainframes
    back in STL ... with no perceived difference in human factors (quarter
    second or better trivial response).
    https://en.wikipedia.org/wiki/Network_Systems_Corporation
    https://en.wikipedia.org/wiki/HYPERchannel

    STL had spread 3270 controller boxes across all the channels with 3830
    disk controller boxes. Turns out the A220 mainframe channel-attach boxes
    (used for channel extender) had significantly lower channel busy for the
    same amount of 3270 terminal traffic (as 3270 channel-attach
    controllers) and as a result the throughput for IMS group 168s (with NSC
    A220s) increased by 10-15% ... and STL considered using NSC HYPERChannel
    A220 channel-extender configuration, for all 3270 controllers (even
    those within STL). NSC tried to get IBM to release my support, but a
    group in POK playing with some fiber stuff got it vetoed (concerned that
    if it was in the market, it would make it harder to release their
    stuff).

    trivia: The vendor eventually duplicated my support and then the 3090
    product administrator tracked me down. He said that 3090 channels were
    designed to have an aggregate total of 3-5 channel errors (EREP reported)
    for all systems & customers over a year period and there were instead 20
    (extra, which turned out to be the channel-extender support). When I got
    an unrecoverable telco transmission error, I would reflect a CSW
    "channel-check" to the host software. I did some research and found that
    if an IFCC (interface control check) was reflected instead, it basically
    resulted in the same system recovery activity (and got the vendor to
    change their software from "CC" to "IFCC").

    I was asked to give a talk at a NASA dependable computing workshop and
    used the 3090 example as part of the talk:
    https://web.archive.org/web/20011004023230/http://www.hdcc.cs.cmu.edu/may01/index.html

    About the same time, the IBM communication group was fighting off the
    release of mainframe TCP/IP ... and when that got reversed, they changed
    their tactic and claimed that since they had corporate ownership of
    everything that crossed datacenter walls, TCP/IP had to be released
    through them; what shipped got 44kbytes/sec aggregate using nearly a
    whole 3090 processor. I then did RFC1044 support and in some tuning tests
    at Cray Research between a Cray and an IBM 4341, got sustained 4341
    channel throughput using only a modest amount of 4341 CPU (something like
    a 500 times improvement in bytes moved per instruction executed).

    other trivia: 1988, the IBM branch office asks me if I could help LLNL (national lab) "standardize" some fiber stuff they were playing with,
    which quickly becomes FCS (fibre-channel standard, including some stuff
    I had done in 1980), initially 1gbit/sec, full-duplex, aggregate
    200mbyte/sec. Then the POK "fiber" group gets their stuff released in
    the 90s with ES/9000 as ESCON, when it was already obsolete,
    17mbytes/sec. Then some POK engineers get involved with FCS and define a heavy-weight protocol that drastically cuts the native throughput which eventually ships as FICON. Most recent public benchmark I've found is
    z196 "Peak I/O" getting 2M IOPS using 104 FICON (over 104 FCS). About
    the same time a FCS was announced for E5-2600 server blades claiming
    over million IOPS (two such FCS having higher throughput than 104
    FICON). Note also, IBM documents keeping SAPs (system assist processors
    that do I/O) to 70% CPU (which would be more like 1.5M IOPS).

    after leaving IBM in the early 90s, I was brought in as a consultant to a
    small client/server company; two former Oracle employees (that I had
    worked with on cluster scale-up for IBM HA/CMP) were there, responsible
    for something called "commerce server" doing credit card transactions.
    The startup had also done this invention called "SSL" they were using;
    it is now frequently called "electronic commerce". I had responsibility
    for everything between the webservers and the financial payment networks.
    I then did a talk on "Why The Internet Wasn't Business Critical
    Dataprocessing" (that Postel sponsored at ISI/USC), based on the
    reliability, recovery & diagnostic software, procedures, etc I did for
    e-commerce. Payment networks had a requirement that their trouble desks
    do first level problem determination within five minutes. Early
    trials had a major sports store chain doing internet e-commerce ads
    during week-end national football game half-times, and there were
    problems being able to connect to the payment networks for credit-card
    transactions ... after three hrs, it was closed as "NTF" (no trouble
    found).

    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to EricP on Wed Jul 3 17:56:34 2024
    An Historical Perspective::

    EricP wrote:

    Michael S wrote:
    On Wed, 03 Jul 2024 08:44:01 -0400
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    An alternative is each device has its own DMA controller
    which is just a couple of counters.


    The main cost is not counters, but bus mastering logic.
    For PCI it was a non-trivial cost even as late as year 2000. For example,
    PLX PCI to local bus bridges with bus mastering capability, like the PCI
    9080, cost non-trivially more than the slave-only 9060.

    PCI is a different matter. I think they shot themselves in the foot.
    That is because the PCI design used FAST TTL and was ridiculously
    complex and had all sorts of unnecessary optional features like bridges.

    I don't think it was as much "shot themselves in the foot" as it was
    not looking forward enough. CPUs had just dropped from 5.0V to 3.3V
    and few peripherals were going 3.3V yet.

    There were no real "popcorn" parts at 3.3V. CMOS was gradually taking
    over, but was "essentially" compatible voltage-wise with TTL.

    To my eye the choice of FAST TTL looks wrong headed. They needed FAST
    because they wanted to run at 33 MHz which is beyond LS TTL limit.

    Was also faster than popcorn CMOS of the day.

    With a bus at 33 MHz and 4 bytes it superficially sounds like 133 MB/s.
    But 33 MHz was too fast to decode or latch the address and data,
    plus it multiplexes address and data, and took 5 cycles to do a
    transfer.
    So the bus actual data transfer rate was more like 133/5 = 26 MB/s.

    Welcome to "back when computers were hard".

    I looked into PCI bus interface chips when it first came out and there
    were just *TWO* manufacturers for them on the planet, and they charged $50,000 just to answer the phone, and you had to pay them to design a
    custom chip for you even though it was supposed to be a standard design.

    When PCs shipped in the thousands per month, this was the way things were.
    When PCs started to ship in the hundreds of thousands per month, things
    changed (early-mid 90s).

    This all pushed the price of PCI cards way up so, for example,
    an ISA bus modem card cost $50 but the exact same modem on PCI was $250.
    No wonder most people stuck with ISA bus cards.

    Exacerbating the above.

    Whereas 74LS TTL was cheap and available from manufacturers everywhere.
    I would have used 74LS TTL and done a straight 32-bit bus with no
    options,
    multiplexed address and data to keep down the connector pin count and
    cost.
    That could have run at 20 MHz which leaves 50 ns for address and data
    decode and latch, and driven 8 bus loads with no bridges.
    That gives 10 MT/s = 40 MB/s. Plus cheap and widely available.

    It would take "too many" TTL parts to implement a small form factor
    initerface, so integration was needed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Wed Jul 3 18:13:22 2024
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    John Levine <johnl@taugh.com> writes:
    On the smallest
    360s, the channel was implemented in CPU microcode. When running fast
    devices like disks the channel used so much of the CPU that the
    program stalled, but it was worth it to be compatible with faster
    machines. Even then, disk seeks or tape rewinds or reading cards or
    printing on printers let the CPU do useful work while the channel and
    device did its thing.

    This sounds very much like hardware multi-threading to me: The CPU had
    separate state for the channel and used its hardware for doing the
    channel stuff when there was I/O to do, while running the non-channel
    stuff the rest of the time, all without OS-level context switching.

    More likely it was just polling between instructions to switch between
    the CPU code and the channel code. The 360/30 was a very small machine
    by modern standards.

    The DEC 12 and 18 bit machines offered DMA in two flavors called
    one-cycle and three-cycle data break. The one-cycle would be familiar
    now, the device grabbed the bus, provided the address and control bits
    and then read or wrote a word. But in that era flip flops and counters
    were expensive, so they had three-cycle. The device sent a fixed
    address to the CPU, which then decremented the word at that address
    and sent the device a "done" signal if the result was zero, otherwise incremented the word at the next address and used its contents as the
    address to read or write the data provided from or to the device.

    It was like a very primitive channel that only knew how to do block
    transfers. Three cycle was of course three times as slow, but for most
    of the devices at the time, it was adequate.
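
    A toy model of that three-cycle sequence as described above (the fixed
    core locations 010/011 and the 12-bit word size are illustrative
    assumptions, and it is not cycle-accurate):

    #include <stdint.h>
    #include <stdbool.h>

    #define WC_ADDR 010             /* fixed word-count location (assumed) */
    #define CA_ADDR 011             /* current-address location, the next word */

    extern uint16_t core[4096];     /* 12-bit words in a 4K field, assumed */

    /* One data break; returns true ("done") when the block is finished. */
    bool three_cycle_break(uint16_t data_in, uint16_t *data_out, bool write)
    {
        core[WC_ADDR] = (core[WC_ADDR] - 1) & 07777;    /* cycle 1: count down */
        if (core[WC_ADDR] == 0)
            return true;                                /* signal "done" */

        core[CA_ADDR] = (core[CA_ADDR] + 1) & 07777;    /* cycle 2: bump address */
        uint16_t addr = core[CA_ADDR];

        if (write)                                      /* cycle 3: the transfer */
            core[addr] = data_in & 07777;
        else
            *data_out = core[addr];
        return false;
    }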

    The PDP-6 and KA10 had their own version of three cycle data break
    which was too ugly to describe here.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to EricP on Wed Jul 3 21:25:50 2024
    On Wed, 03 Jul 2024 13:34:39 -0400
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    Michael S wrote:
    On Wed, 03 Jul 2024 08:44:01 -0400
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    An alternative is each device has its own DMA controller
    which is just a couple of counters.


    The main cost is not counters, but bus mastering logic.
    For PCI it was a non-trivial cost even as late as year 2000. For
    example, PLX PCI to local bus bridges with bus mastering
    capability, like the PCI 9080, cost non-trivially more than the
    slave-only 9060.

    PCI is a different matter. I think they shot themselves in the foot.
    That is because the PCI design used FAST TTL and was ridiculously
    complex and had all sorts of unnecessary optional features like
    bridges.


    Bridges were needed for the high end. How else would you go over 4 or
    5 slots with the crappy edge connector of standard PCI? How else would
    you go over 8-10 slots even with the much, much better Compact PCI
    connectors? Bridges do not work very well in the read direction, but in
    the write direction they do not impact performance at all.

    To my eye the choice of FAST TTL looks wrong headed. They needed FAST
    because they wanted to run at 33 MHz which is beyond LS TTL limit.
    With a bus at 33 MHz and 4 bytes it superficially sounds like 133
    MB/s. But 33 MHz was too fast to decode or latch the address and data,
    plus it multiplexes address and data, and took 5 cycles to do a
    transfer. So the bus actual data transfer rate was more like 133/5 =
    26 MB/s.


    But bursts work as advertised.
    We designed many PCI 32b x 33MHz boards that sustained over 90 MB/s in
    the host memory to device direction and over 100 MB/s in the device to
    host memory direction.
    A few are still in production, although we will eventually move away from
    this architecture for reasons unrelated to the system bus.

    I looked into PCI bus interface chips when it first came out and there
    were just *TWO* manufacturers for them on the planet, and they charged $50,000 just to answer the phone, and you had to pay them to design a
    custom chip for you even though it was supposed to be a standard
    design. This all pushed the price of PCI cards way up so, for example,
    an ISA bus modem card cost $50 but the exact same modem on PCI was
    $250. No wonder most people stuck with ISA bus cards.


    Sounds like very early days.

    Whereas 74LS TTL was cheap and available from manufacturers
    everywhere. I would have used 74LS TTL and done a straight 32-bit bus
    with no options, multiplexed address and data to keep down the
    connector pin count and cost. That could have run at 20 MHz which
    leaves 50 ns for address and data decode and latch, and driven 8 bus
    loads with no bridges. That gives 10 MT/s = 40 MB/s. Plus cheap and
    widely available.


    Pay attention that nothing of what you wrote above has anything to do
    with the difference between bus-mastering PCI devices and slave-only PCI
    devices.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Michael S on Wed Jul 3 19:09:23 2024
    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 03 Jul 2024 13:08:31 +0100
    Bill Findlay <findlaybill@blueyonder.co.uk> wrote:

    On 2 Jul 2024, MitchAlsup1 wrote
    (in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):

    Once you recognize that I/O is eating up your precious CPU, and you
    get to the point you are willing to expend another fixed programmed
    device to make the I/O burden manageable, then you basically have
    CDC 6600 Peripheral Processors, programmed in code or microcode.

    The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
    They all did DMA, interrupting only at the end of the transfer.
    Slow devices accessed the core store for each character,
    fast devices did so for each word.

    This was mediated by one of the KDF9's many state machines,
    I/O Control, which multiplexed core requests from devices
    and interrupted the CPU at the end of a transfer
    if the transfer had been initiated by a program
    of higher CPU priority than the one currently running,
    or if there was a possibility of priority inversion.

    I/O Control also autonomously re-issued an I/O command
    to a device that reported a parity error
    if that device was capable of retrying the transfer
    (e.g. MT controllers could backspace a block and re-read).


    That sounds quite advanced.
    But when I try to compare with contemporaries, like the S/360 Model 65,
    it appears that despite its advances the KDF9 was not competitive with a
    maximally configured 65 because of a shortage of main memory.

    The contemporaneous Burroughs B3500 I/O subsystem
    fully supported asynchronous DMA transfers with no
    CPU intervention.

    The operating system issued an I/O request using the
    IIO (Initiate I/O Instruction). The data payload
    was a 16-bit field - the first four bits were
    a one-hot field that selected one of:
    - Echo /* Diagnostic test of DMA hardware */
    - Read /* Move information from device to memory */
    - Write /* Move information from memory to device */
    - Test /* Miscellaneous operations */

    The remaining 12 bits encoded controller-specific
    options (like a space count for tapes, channel
    number for line printers, card data format (BCL, EBCDIC),
    etc.)

    The instruction provided a buffer start and buffer end
    address and an optional 24-bit or 32-bit field that
    would select disk sectors, etc. These were kept by
    the IOP (I/O Processor); the base address would
    be updated as each transaction consumed or filled
    the buffer, and the final addresses were available to the CPU via a
    RAD instruction after the I/O complete interrupt
    (e.g. to determine a short read, or partial write).

    When the operation was complete, the IOP would
    store a 16 to 48 bit 'Result Descriptor' in
    memory and generate an interrupt to the CPU.

    The CPU would check the R/D for errors and
    reinstate the process waiting for the I/O.
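
    An illustrative C rendering of those fields (names and widths paraphrased
    from the description above, not taken from Burroughs documentation):

    #include <stdint.h>

    enum iio_op {                       /* one-hot in the top four bits */
        IIO_ECHO  = 1 << 3,             /* diagnostic test of DMA hardware */
        IIO_READ  = 1 << 2,             /* device -> memory */
        IIO_WRITE = 1 << 1,             /* memory -> device */
        IIO_TEST  = 1 << 0,             /* miscellaneous operations */
    };

    struct iio_request {
        uint16_t op_and_variant;        /* 4-bit one-hot op + 12 controller bits */
        uint32_t buf_begin;             /* buffer start address */
        uint32_t buf_end;               /* buffer end address */
        uint32_t key;                   /* optional 24/32-bit field, e.g. sector */
    };

    struct result_descriptor {          /* 16-48 bits stored by the IOP */
        uint64_t status;                /* completion / error flags */
        uint32_t final_addr;            /* read via RAD to detect short transfers */
    };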

    Host configurations ranged from 8 to 64 channels,
    each of which could have one or more attached
    - Card Reader/Punch
    - Magnetic Tape Drive
    - Disk/HPT/Pack
    - Data Comm processor
    - Line Printer
    - et alia

    Disk channels attached to disk pack
    controllers, each of which had a string
    of up to 16 drives. Multiple channels
    could be multiplexed with multiple disk
    pack controllers sharing a string of
    drives through a disk exchange controller
    supporting 8 hosts sharing 16 drives; with
    operating system and hardware support
    for loosely coupled shared filesystems.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stephen Fuld@21:1/5 to Scott Lurndal on Wed Jul 3 19:32:20 2024
    Scott Lurndal wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 03 Jul 2024 13:08:31 +0100
    Bill Findlay <findlaybill@blueyonder.co.uk> wrote:

    On 2 Jul 2024, MitchAlsup1 wrote
    (in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):

    Once you recognize that I/O is eating up your precious CPU, and you
    get to the point you are willing to expend another fixed programmed
    device to make the I/O burden manageable, then you basically have
    CDC 6600 Peripheral Processors, programmed in code or microcode.
    The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
    They all did DMA, interrupting only at the end of the transfer.
    Slow devices accessed the core store for each character,
    fast devices did so for each word.

    This was mediated by one of the KDF9's many state machines,
    I/O Control, which multiplexed core requests from devices
    and interrupted the CPU at the end of a transfer
    if the transfer had been initiated by a program
    of higher CPU priority than the one currently running,
    or if there was a possibility of priority inversion.

    I/O Control also autonomously re-issued an I/O command
    to a device that reported a parity error
    if that device was capable of retrying the transfer
    (e.g. MT controllers could backspace a block and re-read).


    That sounds quite advanced.
    But when I try to compare with contemporaries, like the S/360 Model 65,
    it appears that despite its advances the KDF9 was not competitive with a
    maximally configured 65 because of a shortage of main memory.

    The contemporaneous Burroughs B3500 I/O subsystem
    fully supported asynchronous DMA transfers with no
    CPU intervention.


    snipped description

    Yes, that is an example of the kind of thing to which I was referring
    in my response to Mitch's post. A question. Was all of this pure
    hardware, or was it microcoded?



    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Stephen Fuld on Wed Jul 3 20:59:04 2024
    "Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:
    Scott Lurndal wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 03 Jul 2024 13:08:31 +0100
    Bill Findlay <findlaybill@blueyonder.co.uk> wrote:

    On 2 Jul 2024, MitchAlsup1 wrote
    (in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):

    Once you recognize that I/O is eating up your precious CPU, and
    you get to the point you are willing to expend another fixed
    programmed device to make the I/O burden manageable, then you
    basically have CDC 6600 Peripheral Processors, programmed in
    code or microcode.
    The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
    They all did DMA, interrupting only at the end of the transfer.
    Slow devices accessed the core store for each character,
    fast devices did so for each word.

    This was mediated by one of the KDF9's many state machines,
    I/O Control, which multiplexed core requests from devices
    and interrupted the CPU at the end of a transfer
    if the transfer had been initiated by a program
    of higher CPU priority than the one currently running,
    or if there was a possibility of priority inversion.

    I/O Control also autonomously re-issued an I/O command
    to a device that reported a parity error
    if that device was capable of retrying the transfer
    (e.g. MT controllers could backspace a block and re-read).


    That sounds quite advanced.
    But when I try to compare with contemporaries, like S/360 Model 65,
    it appears that despite its advances the KDF9 was not competitive with a
    maximally configured 65 because of a shortage of main memory.

    The contemporaneous Burroughs B3500 I/O subsystem
    fully supported asynchronous DMA transfers with no
    CPU intervention.


    snipped description

    Yes, that is an example of the kind of thing to which I was referring
    in my response to Mitch's post. A question. Was all of this pure
    hardware, or was it microcoded?

    In the 60's, mostly pure hardware (although that was before
    I worked for Burroughs). In the 70-90's, they used commodity
    microprocessors for the IOPs (8080, 8085 and 80186).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Stephen Fuld on Wed Jul 3 21:18:19 2024
    Stephen Fuld wrote:

    Scott Lurndal wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 03 Jul 2024 13:08:31 +0100
    Bill Findlay <findlaybill@blueyonder.co.uk> wrote:

    On 2 Jul 2024, MitchAlsup1 wrote
    (in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):

    Once you recognize that I/O is eating up your precious CPU, and
    you get to the point you are willing to expend another fixed
    programmed device to make the I/O burden manageable, then you
    basically have CDC 6600 Peripheral Processors, programmed in
    code or microcode.
    The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
    They all did DMA, interrupting only at the end of the transfer.
    Slow devices accessed the core store for each character,
    fast devices did so for each word.

    This was mediated by one of the KDF9's many state machines,
    I/O Control, which multiplexed core requests from devices
    and interrupted the CPU at the end of a transfer
    if the transfer had been initiated by a program
    of higher CPU priority than the one currently running,
    or if there was a possibility of priority inversion.

    I/O Control also autonomously re-issued an I/O command
    to a device that reported a parity error
    if that device was capable of retrying the transfer
    (e.g. MT controllers could backspace a block and re-read).


    That sounds quite advanced.
    But when I try to compare with contemporaries, like S/360 Model 65,
    it appears that despite its advances the KDF9 was not competitive with a
    maximally configured 65 because of a shortage of main memory.

    The contemporaneous Burroughs B3500 I/O subsystem
    fully supported asynchronous DMA transfers with no
    CPU intervention.


    snipped description

    Yes, that is an example of the kind of thing to which I was referring
    in my response to Mitch's post. A question. Was all of this pure
    hardware, or was it microcoded?

    S.E.L created a thing they called the RCU (Remote Control Unit).
    It was basically a channel with writable microcode. NASA bought
    a bunch of them because they had tapes from the deep space radio
    telescopes where an entire 9-track tape contained 1 record. NASA
    just started the tape and recorded satellite data until the end
    of the tape, where they would start the next tape just before the
    end of the previous tape.

    So we programmed the RCU to read as much as the system memory
    allowed, then backed the tape up 1 second while dumping the data
    to disk. Then we started the tape forward with the RCU watching
    the pattern on the tape; when it detected the last 4096 bytes of
    the previous read, it would start streaming data into memory again.

    No other company could demonstrate that they could read one
    of those tapes.

    Presto, reading a whole 9-track tape with no inter record gaps !!

    {{I may have the name of the unit wrong}}

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to John Levine on Thu Jul 4 15:48:45 2024
    John Levine <johnl@taugh.com> schrieb:

    IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
    not granted until 1974. The patent is 488 pages long including 409
    pages of figures, 130 columns of narrative text, and 91 claims.

    https://patents.google.com/patent/US3812475A/en

    What a monster.

    I've written long patents myself, but this one surely takes the
    biscuit.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Thomas Koenig on Thu Jul 4 16:56:29 2024
    Thomas Koenig wrote:

    John Levine <johnl@taugh.com> schrieb:

    IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
    not granted until 1974. The patent is 488 pages long including 409
    pages of figures, 130 columns of narrative text, and 91 claims.

    https://patents.google.com/patent/US3812475A/en

    What a monster.

    I've written long patents myself, but this one surely takes the
    biscuit.

    The amalgamation of the figures and the placement of the figures
    via the figure placement "figure" enable one to directly implement
    the device in logic.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Michael S on Thu Jul 4 12:32:05 2024
    Michael S wrote:
    On Wed, 03 Jul 2024 13:34:39 -0400
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    Michael S wrote:
    On Wed, 03 Jul 2024 08:44:01 -0400
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    An alternative is each device has its own DMA controller
    which is just a couple of counters.

    The main cost is not counters, but bus mastering logic.
    For PCI it was non-trivial cost even as late as year 2000. For
    example, PLX PCI to local bus bridges with bus mastering
    capability, like PCI 9080 costed non-trivially more than slave-only
    9060.
    PCI is a different matter. I think they shot themselves in the foot.
    That is because the PCI design used FAST TTL and was ridiculously
    complex and had all sorts of unnecessary optional features like
    bridges.


    Bridges were needed for the high-end. How else would you go over 4 or
    5 slots with the crappy edge connector of standard PCI? How else would you
    go over 8-10 slots even with much much better Compact PCI connectors?
    Bridges do not work very well in read direction, but in write direction
    they do not impact performance at all.

    Bridges were needed because PCI's design could only drive 3 or 4 slots.
    In 1992 when PCI launched, most people needed more than 3 to 4 card slots because almost everything required a plug-in card.

    PCI could only drive 3 to 4 slots because they were running at 33 MHz and,
    as I understand it, signal reflections limit the number of bus loads.

    To my eye the choice of FAST TTL looks wrong headed. They needed FAST
    because they wanted to run at 33 MHz which is beyond LS TTL limit.
    With a bus at 33 MHz and 4 bytes it superficially sounds like 133
    MB/s. But 33 MHz was too fast to decode or latch the address and data,
    plus it multiplexes address and data, and took 5 cycles to do a
    transfer. So the bus actual data transfer rate was more like 133/5 =
    26 MB/s.


    But bursts work as advertised.
    We designed many 32-bit, 33 MHz PCI boards that sustained over 90 MB/s
    in the host-memory-to-device direction and over 100 MB/s in the
    device-to-host-memory direction.
    A few are still in production, although we will eventually move away
    from this architecture for reasons unrelated to the system bus.

    With one card doing a burst on a bus with no bridges.
    But as soon as you have to turn around bus ownership, with multiple
    masters competing for bus access, then you have to pay the protocol
    overhead and the effective data transfer rate drops considerably.
    And if you add a bridge because you need more than 3-4 cards, it
    drops even lower.

    I looked into PCI bus interface chips when it first came out and there
    were just *TWO* manufacturers for them on the planet, and they charged
    $50,000 just to answer the phone, and you had to pay them to design a
    custom chip for you even though it was supposed to be a standard
    design. This all pushed the price of PCI cards way up so, for example,
    an ISA bus modem card cost $50 but the exact same modem on PCI was
    $250. No wonder most people stuck with ISA bus cards.


    Sounds like very early days.

    Whereas 74LS TTL was cheap and available from manufacturers
    everywhere. I would have used 74LS TTL and done a straight 32-bit bus
    with no options, multiplexed address and data to keep down the
    connector pin count and cost. That could have run at 20 MHz which
    leaves 50 ns for address and data decode and latch, and driven 8 bus
    loads with no bridges. That gives 10 MT/s = 40 MB/s. Plus cheap and
    widely available.


    Note that nothing of what you wrote above has anything to do
    with the difference between bus-mastering PCI devices and slave-only
    PCI devices.

    As I described, the simpler bus design can drive 8 to 10 slots directly,
    so most systems would not need more card slots, and hence no bridges.

    It also has no interface cost difference for a bus mastering device.
    Of course as bus master cards have extra functionality they have extra
    logic, like two 32-bit counters for each DMA they wish to perform
    and a little logic to handshake the bus request-grant lines.
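
    A minimal sketch of that style of per-device DMA engine, in C for
    concreteness (the names and word size are assumptions, not a real
    PCI or ISA interface):

        #include <stdint.h>
        #include <stdbool.h>

        /* Hypothetical per-device DMA channel: an address counter, a word
           counter, and a request flag for the bus request/grant handshake. */
        struct dma_channel {
            uint32_t addr;     /* next memory address to transfer      */
            uint32_t count;    /* 32-bit words remaining               */
            bool     bus_req;  /* asserted while a transfer is pending */
        };

        void dma_start(struct dma_channel *ch, uint32_t addr, uint32_t words)
        {
            ch->addr = addr;
            ch->count = words;
            ch->bus_req = (words != 0);
        }

        /* Called once per granted bus cycle; returns true when the block is
           done and the device should raise its completion interrupt.       */
        bool dma_bus_grant(struct dma_channel *ch)
        {
            if (ch->count == 0)
                return true;
            /* ...device drives ch->addr onto the bus and moves one word... */
            ch->addr  += 4;
            ch->count -= 1;
            ch->bus_req = (ch->count != 0);
            return ch->count == 0;
        }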

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to All on Thu Jul 4 13:33:36 2024
    MitchAlsup1 wrote:
    An Historical Perspective::

    EricP wrote:

    Michael S wrote:
    On Wed, 03 Jul 2024 08:44:01 -0400
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    An alternative is each device has its own DMA controller
    which is just a couple of counters.


    The main cost is not counters, but bus mastering logic.
    For PCI it was non-trivial cost even as late as year 2000. For example,
    PLX PCI to local bus bridges with bus mastering capability, like PCI
    9080 costed non-trivially more than slave-only 9060.

    PCI is a different matter. I think they shot themselves in the foot.
    That is because the PCI design used FAST TTL and was ridiculously
    complex and had all sorts of unnecessary optional features like bridges.

    I don't think it was as much "shot themselves in the foot" as it was
    not looking forward enough. CPUs had just dropped from 5.0V to 3.3V
    and few peripherals were going 3.3--yet.

    There were no real "popcorn" parts on 3.3V. CMOS was gradually taking
    over, but was "essentially" compatible voltage wise with TTL.

    Except this is an I/O bus, not a system bus. A BIG part of its job
    is to be a marketplace, the basis of a standard ecosystem.
    So no, I would not have a 3.3V option, as that just fragments the market, increases parts inventory, lowers production runs, and drives up cost.
    3.3V was another PCI option.

    The internal system bus is of course free to do whatever it wants.

    To my eye the choice of FAST TTL looks wrong headed. They needed FAST
    because they wanted to run at 33 MHz which is beyond LS TTL limit.

    Was also faster than popcorn CMOS of the day.

    I worked with 4000 series CMOS in late 1970's.
    One used RCA 1802 processor in a data acquisition and recording system
    for a towed side-scan sonar. I also designed a digital tape controller
    using 4000 CMOS for an airborne data acquisition system.
    In both cases power was the primary consideration.

    But for an I/O bus circa 1990, LS TTL seems like the best trade off
    at that time.

    With a bus at 33 MHz and 4 bytes it superficially sounds like 133 MB/s.
    But 33 MHz was too fast to decode or latch the address and data,
    plus it multiplexes address and data, and took 5 cycles to do a
    transfer.
    So the bus actual data transfer rate was more like 133/5 = 26 MB/s.

    Welcome to "back when computers were hard".

    I looked into PCI bus interface chips when it first came out and there
    were just *TWO* manufacturers for them on the planet, and they charged
    $50,000 just to answer the phone, and you had to pay them to design a
    custom chip for you even though it was supposed to be a standard design.

    When PCs shipped in the thousands per month, this was the way things were.
    When PCs started to ship in the hundreds of thousands per month, things changed (early-mid 90s).

    This all pushed the price of PCI cards way up so, for example,
    an ISA bus modem card cost $50 but the exact same modem on PCI was $250.
    No wonder most people stuck with ISA bus cards.

    Exacerbating the above.

    Exactly. Because the I/O bus is a meeting place.
    You have to design it to be inviting, which includes implementation cost.

    This was back just after the "bus wars" era where many big players
    were trying to grab control of the PC market with their proprietary
    (and patented) next generation bus. PCI was hoped to resolve this
    and provide that standard, open market, but failed because it didn't
    address what card suppliers and customers wanted.

    Whereas 74LS TTL was cheap and available from manufacturers everywhere.
    I would have used 74LS TTL and done a straight 32-bit bus with no
    options,
    multiplexed address and data to keep down the connector pin count and
    cost.
    That could have run at 20 MHz which leaves 50 ns for address and data
    decode and latch, and driven 8 bus loads with no bridges.
    That gives 10 MT/s = 40 MB/s. Plus cheap and widely available.

    It would take "too many" TTL parts to implement a small form factor initerface, so integration was needed.

    This is circa 1990, so the choices look limited.
    One could use LS TTL for the bus interface and
    NMOS LSI parts for the rest of a card.

    The problem is current draw and heat dissipation in bus drivers.
    But I figure, say, a 50-wire bus, LS TTL drivers, 10 card loads:
    worst case is about 50*10*1.6 mA = 0.8 A, 4.0 W, in the bus master.
    The average case should be 1/8 of that = 0.1 A, 0.5 W in a bus master
    (assuming the I/O bus is 50% busy, with 5 loads, and 1/2 the bits are zeros).
    That seems quite acceptable for the bus interface.
    And of course there is the power for the rest of the card logic.
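
    The arithmetic checks out; a few lines of C reproduce it (assuming a
    5 V TTL supply and the 1.6 mA LS TTL load current quoted above):

        #include <stdio.h>

        int main(void)
        {
            const double wires = 50, loads = 10;
            const double i_load = 1.6e-3;   /* A per LS TTL input load    */
            const double vcc = 5.0;         /* assumed TTL supply voltage */

            double i_worst = wires * loads * i_load;  /* every line pulled low */
            double i_avg   = i_worst / 8.0; /* 50% busy, 5 loads, half the bits zero */

            printf("worst case: %.2f A, %.2f W\n", i_worst, i_worst * vcc); /* 0.80 A, 4.00 W */
            printf("average:    %.2f A, %.2f W\n", i_avg,   i_avg   * vcc); /* 0.10 A, 0.50 W */
            return 0;
        }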

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to EricP on Thu Jul 4 22:01:51 2024
    On Thu, 04 Jul 2024 13:33:36 -0400
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    This was back just after the "bus wars" era where many big players
    were trying to grab control of the PC market with their proprietary
    (and patented) next generation bus. PCI was hoped to resolve this
    and provide that standard, open market, but failed because it didn't
    address what card suppliers and customers wanted.


    PCI didn't fail. It was a stunning success.
    With the emergence of PCI, everything else either died on the spot (IBM
    Microchannel, Compaq-backed EISA) or became a tiny high-cost niche (VME
    and its offspring).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Michael S on Thu Jul 4 16:15:14 2024
    Michael S wrote:
    On Thu, 04 Jul 2024 13:33:36 -0400
    EricP <ThatWouldBeTelling@thevillage.com> wrote:
    This was back just after the "bus wars" era where many big players
    were trying to grab control of the PC market with their proprietary
    (and patented) next generation bus. PCI was hoped to resolve this
    and provide that standard, open market, but failed because it didn't
    address what card suppliers and customers wanted.


    PCI didn't fail. It was a stunning success.
    With the emergence of PCI, everything else either died on the spot (IBM
    Microchannel, Compaq-backed EISA) or became a tiny high-cost niche (VME
    and its offspring).

    Also vying for attention in the wars were Multibus I & II (pushed by Intel), Futurebus (pushed by DEC), IIRC Apple had its own derivative of Futurebus
    but different (of course) so you had to buy Apple devices,
    NuBus, FASTBUS, Q-Bus.

    PCI failed from the view that it was supposed to replace the 16-bit ISA
    bus on PC's, but because of PCI card cost, customers demanded that
    motherboards continue to support ISA. Also, plug-n-play made ISA board
    support a lot easier. Motherboards had to support at least AGP,
    the PCI variant for graphics cards, so they couldn't eliminate PCI.
    So PC motherboards and device manufacturers wound up supporting both.
    Not what anyone planned or wanted.

    Where PCI succeeded, I suppose, is that, as you point out, outside the
    PC market it killed off all the others. But that left those systems tied
    to the higher-cost cards, increasing their cost relative to PC's.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Michael S on Thu Jul 4 20:20:34 2024
    Michael S wrote:

    On Thu, 04 Jul 2024 13:33:36 -0400
    EricP <ThatWouldBeTelling@thevillage.com> wrote:

    This was back just after the "bus wars" era where many big players
    were trying to grab control of the PC market with their proprietary
    (and patented) next generation bus. PCI was hoped to resolve this
    and provide that standard, open market, but failed because it didn't
    address what card suppliers and customers wanted.


    PCI didn't fail. It was a stunning success.

    I am going to recommend this sentence as the largest understatement
    of the year (so far).

    With the emergence of PCI, everything else either died on the spot (IBM
    Microchannel, Compaq-backed EISA) or became a tiny high-cost niche (VME
    and its offspring).

    And were supposed to.......

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to mitchalsup@aol.com on Thu Jul 4 21:15:46 2024
    MitchAlsup1 <mitchalsup@aol.com> schrieb:
    Thomas Koenig wrote:

    John Levine <johnl@taugh.com> schrieb:

    IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
    not granted until 1974. The patent is 488 pages long including 409
    pages of figures, 130 columns of narrative text, and 91 claims.

    https://patents.google.com/patent/US3812475A/en

    What a monster.

    I've written long patents myself, but this one surely takes the
    biscuit.

    The amalgamation of the figures and the placement of the figures
    via the figure placement "figure" enable one to directly implement
    the device in logic.

    That is, of course, very nice.

    But the sheer number of claims, 91, with around half of them
    independent (but quite a few formulated as "in combination", so there
    may have been some dependency on other claims hidden in there)...
    must have taken the competition quite some time to figure out
    what was actually covered, and if their own designs fell under
    that patent or not.

    And then it was granted after ~ 20 years, and continued to be
    valid for another ~ 20 - US patent law used to be weird.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Thu Jul 4 22:00:09 2024
    According to Thomas Koenig <tkoenig@netcologne.de>:
    IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
    not granted until 1974. The patent is 488 pages long including 409
    pages of figures, 130 columns of narrative text, and 91 claims.

    https://patents.google.com/patent/US3812475A/en

    But the sheer number of claims, 91, with around half of them
    independent (but quite a few formulated as "in combination", so there
    may have been some dependency on other claims hidden in there)...
    must have taken the competition quite some time to figure out
    what was actually covered, and if their own designs fell under
    that patent or not.

    And then it was granted after ~ 20 years, and continued to be
    valid for another ~ 20 - US patent law used to be weird.

    It is unusual for a patent to take that long without either the
    inventor deliberately delaying it with endless amendments or it being classified, neither of which seems relevant here.

    You can't challenge other people for violating a patent until it's
    issued, and by 1974 channels were rather old news. I never heard of
    IBM enforcing it. They probably put it in the patent pool they cross
    licensed to other computer makers.

    "IBM's Early Computers" says almost nothing about channels other than
    that they were invented for the 709 and added to the last version of
    the 705.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Thomas Koenig on Thu Jul 4 23:47:30 2024
    Thomas Koenig wrote:

    MitchAlsup1 <mitchalsup@aol.com> schrieb:
    Thomas Koenig wrote:

    John Levine <johnl@taugh.com> schrieb:

    IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
    not granted until 1974. The patent is 488 pages long including 409
    pages of figures, 130 columns of narrative text, and 91 claims.

    https://patents.google.com/patent/US3812475A/en

    What a monster.

    I've written long patents myself, but this one surely takes the
    biscuit.

    The amalgamation of the figures and the placement of the figures
    via the figure placement "figure" enable one to directly implement
    the device in logic.

    That is, of course, very nice.

    But the sheer number of claims, 91, with around half of them
    independent (but quite a few formulated as "in combination", so there
    may have been some dependency on other claims hidden in there)...
    must have taken the competition quite some time to figure out
    what was actually covered, and if their own designs fell under
    that patent or not.

    It was the first !

    And then it was granted after ~ 20 years, and continued to be
    valid for another ~ 20 - US patent law used to be weird.

    Much of the time, it is the USPTO that has to bring an examiner up
    to speed and completely digest the topic; this sets off a flurry
    of notifications for clarification, followed by changes to the
    text, claims, and figures. All the while that is going on, the examiner
    is looking across his library for similar already-patented "stuff".

    All of that takes time measured in months and years.

    During my tenure, I averaged 5 years from invention to grant or
    reject, with 18 months from "at USPTO" to first correspondence.
    Generally the examiner has found dozens to hundreds of discrepancies,
    claim formation violations, figure violations,...and you get to
    fix them all before [s]he begins anew--this goes on multiple
    times.

    After my tenure, I wrote my own patent and submitted it via my
    patent lawyer, and waited and waited. Finally after 26 months,
    I got my first correspondence: [S]he complained about one
    sub-clause on one claim, and did not like the wording of one
    paragraph. We fixed that and had the patent granted in 2 months.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to John Levine on Thu Jul 4 23:39:41 2024
    John Levine wrote:

    According to Thomas Koenig <tkoenig@netcologne.de>:
    IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
    not granted until 1974. The patent is 488 pages long including 409
    pages of figures, 130 columns of narrative text, and 91 claims.

    https://patents.google.com/patent/US3812475A/en

    But the sheer number of claims, 91, with around half of them
    independent (but quite a few formulated as "in combination", so there
    may have been some dependency on other claims hidden in there)...
    must have taken the competition quite some time to figure out
    what was actually covered, and if their own designs fell under
    that patent or not.

    And then it was granted after ~ 20 years, and continued to be
    valid for another ~ 20 - US patent law used to be weird.

    It is unusual for a patent to take that long without either the
    inventor deliberately delaying it with endless amendments or it being classified, neither of which seems relevant here.

    You can't challenge other people for violating a patent until it's
    issued, and by 1974 channels were rather old news. I never heard of
    IBM enforcing it. They probably put it in the patent pool they cross
    licensed to other computer makers.

    In general, IBM uses its patent portfolio in a defensive posture.

    Imagine you are the employee of xyz corporation and want to assert
    your newly granted patent onto IBM.

    IBM will simply show you that they have 400,000 current patents that
    they will assert back on you if you try. Most of the time, xyz corp
    cannot afford to even read all of IBM's patents and remain with
    positive cash flow. Often xyz corporation does not have enough
    employees to read all IBM's patents in the duration their new
    patent remains valid; and they certainly cannot afford to hire
    lawyers to do it.

    "IBM's Early Computers" says almost nothing about channels other than
    that they were invented for the 709 and added to the last version of
    the 705.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Joe Pfeiffer@21:1/5 to John Levine on Thu Jul 4 18:39:21 2024
    John Levine <johnl@taugh.com> writes:

    The 709 introduced data channels in 1958 which allowed the CPU to do
    other stuff while the channel did the I/O. Wikipedia says the first
    I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
    an I/O interrupt would be of much use before channels. Once you had a channel, I/O buffering made sense, have the channel read or write one
    area while you're working on the other.

    The day the CPU became faster than a teletype (or any other IO device
    you care to name) interrupts became useful. Get an interrupt saying the teletype is ready, send a character, go back to work, repeat.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Joe Pfeiffer@21:1/5 to EricP on Thu Jul 4 18:30:49 2024
    EricP <ThatWouldBeTelling@thevillage.com> writes:

    Michael S wrote:
    On Thu, 04 Jul 2024 13:33:36 -0400
    EricP <ThatWouldBeTelling@thevillage.com> wrote:
    This was back just after the "bus wars" era where many big players
    were trying to grab control of the PC market with their proprietary
    (and patented) next generation bus. PCI was hoped to resolve this
    and provide that standard, open market, but failed because it didn't
    address what card suppliers and customers wanted.

    PCI didn't fail. It was a stunniing success.
    With emergence of PCI anything else either died at spot (IBM
    Microchannel, Compaq-backed EISA) or became tiny high-cost niche (VME
    and off-springs).

    Also vying for attention in the wars were Multibus I & II (pushed by Intel), Futurebus (pushed by DEC), IIRC Apple had its own derivative of Futurebus
    but different (of course) so you had to buy Apple devices,
    NuBus, FASTBUS, Q-Bus.

    PCI failed from the view that it was supposed to replace the 16-bit ISA
    bus on PC's, but because of PCI card cost customers demanded that motherboards continue to support ISA. Also plug-n-play made ISA board
    support a lot easier. Motherboards had at least to support AGP,
    the PCI variant for graphics cards so they couldn't eliminate PCI.
    So PC motherboards and device manufacturers wound up supporting both.
    Not what anyone planned or wanted.

    Where PCI succeeded I suppose is, as you point out, outside the PC market
    it killed off all the others. But that left those systems tied to the
    higher cost cards, increasing their cost relative to PC's.

    My recollection is that ISA hung on for a while to support the ISA cards
    we already had. As the old cards went out of service demand for ISA
    support went with them.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Lawrence D'Oliveiro on Fri Jul 5 09:59:39 2024
    Lawrence D'Oliveiro wrote:
    On Tue, 02 Jul 2024 17:36:50 -1000, Lynn Wheeler wrote:

    When doing IBM's HA/CMP and working with major RDBMS vendors on cluster
    scaleup in late 80s/early 90s, there was lots of references to POSIX
    light-weight threads ...

    Threads were all the rage in the 1990s. People were using them for everything. One language (Java) absorbed threading right into its core DNA (where is the locking API? Oh, it’s attached to the base “Object” type itself!).

    People backed off a bit after that. Nowadays we see a revival of the “coroutine” idea, where preemption only happens at explicit “await” points. For non-CPU-intensive workloads, this is much easier to cope with.

    ... and asynchronous I/O for RDBMS (with no buffer copies) and the
    RDBMS managing large record cache.

    This is why POSIX has the disk-oriented “aio” API, for the diehard DBMS folks. Linux also added “io_uring”, for high performance but not disk-specific I/O.

    Really old PC printers (dot matrix or similar) still had a line buffer
    worth of on-device memory, enough so that the CPU could sit in a busy
    loop sending bytes over the Centronics interface until it got the "I'm
    full" status bit back, or N bytes had been sent. At that point my disk
    spooler code would back off and let the next timer interrupt check to
    see if there was both more print data to be sent and the printer
    signalled that it was ready to receive more data.

    The original serial ports had no buffer at all, so you had to use
    interrupts for both sending and receiving if you wanted to do anything
    else while communicating. Around 1984 the serial ports gained a 16-byte buffer, so at that time it made sense to use a hybrid approach, filling/emptying the buffer on each interrupt, while reducing the
    interrupt load by an order of magnitude.

    I.e. acting somewhat like a channel program, but using the main/only cpu
    to do all the work in the background.
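
    A minimal sketch of that hybrid approach, assuming a 16550-style UART
    with a 16-byte transmit FIFO (the register-access helpers and the ring
    buffer here are hypothetical):

        #include <stdint.h>
        #include <stddef.h>

        #define TX_FIFO_DEPTH 16          /* assumed 16550-style FIFO depth */

        /* Hypothetical software transmit ring buffer. */
        static volatile uint8_t txbuf[256];
        static volatile size_t  head, tail;

        extern void uart_write_thr(uint8_t b);  /* hypothetical: write tx holding reg */
        extern int  uart_thr_empty(void);       /* hypothetical: tx FIFO empty status */

        /* Transmit interrupt handler: instead of one byte per interrupt,
           refill the whole FIFO each time, cutting the interrupt rate by
           up to a factor of 16.                                          */
        void uart_tx_isr(void)
        {
            if (!uart_thr_empty())
                return;
            for (int i = 0; i < TX_FIFO_DEPTH && head != tail; i++) {
                uart_write_thr(txbuf[tail]);
                tail = (tail + 1) % sizeof txbuf;
            }
        }
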
    Terje

    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Joe Pfeiffer on Fri Jul 5 09:52:45 2024
    Joe Pfeiffer wrote:
    John Levine <johnl@taugh.com> writes:
    The 709 introduced data channels in 1958 which allowed the CPU to do
    other stuff while the channel did the I/O. Wikipedia says the first
    I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
    an I/O interrupt would be of much use before channels. Once you had a
    channel, I/O buffering made sense, have the channel read or write one
    area while you're working on the other.

    The day the CPU became faster than a teletype (or any other IO device
    you care to name) interrupts became useful. Get an interrupt saying the teletype is ready, send a character, go back to work, repeat.

    Not just serial connections, a parallel output port also could use
    an interrupt if it was connected to a slow device. The port write
    sets a "full" flag on the output, the slow device sees the full flag
    and eventually reads the parallel value, clearing the full flag,
    causing an "empty" interrupt back to the sender.

    This basic handshake shows up lots of places but an interrupt
    is only beneficial if the cost of servicing the interrupt is
    lower than the cost of spin-waiting for the empty signal.
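
    A sketch of the polled side of that handshake (the port-access helpers
    are hypothetical); the interrupt-driven version would instead enable an
    "empty" interrupt and return after each byte:

        #include <stdbool.h>

        extern bool port_full(void);             /* hypothetical: set until device takes the byte */
        extern void port_write(unsigned char b); /* hypothetical: latch a byte, raising "full"    */

        /* Spin-wait sender: cheapest when the device drains the port in
           less time than interrupt entry/exit would cost.               */
        void send_polled(const unsigned char *buf, int n)
        {
            for (int i = 0; i < n; i++) {
                while (port_full())
                    ;                            /* busy-wait for the empty signal */
                port_write(buf[i]);
            }
        }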

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Fri Jul 5 20:02:23 2024
    According to Joe Pfeiffer <pfeiffer@cs.nmsu.edu>:
    ... Once you had a
    channel, I/O buffering made sense, have the channel read or write one
    area while you're working on the other.

    The day the CPU became faster than a teletype (or any other IO device
    you care to name) interrupts became useful. Get an interrupt saying the
    teletype is ready, send a character, go back to work, repeat.

    That's certainly the model that DEC used in the PDP-1 and their other
    minis. Lightweight interrupts and simple device controllers worked
    for them. But the tradeoffs can be a lot more complicated.

    Let us turn back to the late, not very lamented IBM 1130 mini. It
    usually came with an 1132 printer which printed about 100
    lines/minute. A drum rotated behind the paper with 48 rows of
    characters, each row being all the same character. In front of the
    paper was the ribbon and a row of solenoid driven hammers.

    When the 1130 wanted to print a line, it started the printer, which
    would then tell it what the upcoming character was on the drum. The
    computer then had less than 10ms to scan the line of characters to be
    printed and put a bit map saying which solenoids to fire into fixed
    locations in low memory that the printer then fetched using DMA.
    Repeat until all of the characters were printed, and tell the printer
    to advance the paper.
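
    A sketch of that per-drum-row scan (the names and the 120-column line
    width are assumptions; the real 1130 code built 16-bit words in fixed
    low-core locations):

        #include <stdint.h>
        #include <string.h>

        #define LINE_WIDTH 120   /* assumed print positions per line */

        /* For the character currently under the hammers, set a bit for every
           column whose solenoid should fire on this drum rotation. The printer
           then fetches this bitmap from the fixed low-memory area by DMA.     */
        void build_hammer_bitmap(const char line[LINE_WIDTH], char drum_char,
                                 uint8_t bitmap[(LINE_WIDTH + 7) / 8])
        {
            memset(bitmap, 0, (LINE_WIDTH + 7) / 8);
            for (int col = 0; col < LINE_WIDTH; col++)
                if (line[col] == drum_char)
                    bitmap[col / 8] |= (uint8_t)(1u << (col % 8));
        }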

    Given the modest speed of the 1130, while it was printing a line it
    couldn't do anything else. But it was even worse than that. There were
    two models of 1130, fast and slow, with the difference being a delay
    circuit. The slow model couldn't produce the bitmaps fast enough, so
    there was a "print mode" that disabled the delay circuit while it was
    printing. As you might expect, students quickly figured out how to put
    their 1130s into print mode all the time.

    The printer interrupted after a paper move was complete, giving the
    computer some chance to compute the next line to print in the
    meantime. To skip to the top of the next page or do other paper motion,
    it told the printer to start moving the paper, and which row of the
    carriage control tape (look it up) to wait for a hole in. When the hole
    came around, the printer interrupted the CPU, which then told the
    printer to stop the paper.

    The other printer was a 1403 which had 300 and 600 LPM models. Its
    print mechanism was sort of similar, a horizontal chain of characters
    spinning behind the paper, but that made the hammer management harder
    since what character was at what position changed every character
    time. But that wasn't the CPU's problem. The 1403 used its own unique
    character code probably related to the layout of the print chain, so
    the CPU translated the line into printer code, stored the result in a
    buffer, and then sent a command to the printer telling it to print the
    buffer. The printer printed, then interrupted, at which point the CPU
    told it to either space one line or skip to row N in the carriage
    control tape, again interrupting when done.

    By putting most of the logic into the printer controller, the 1403 was
    not just faster, but only took a small fraction of the CPU so the
    whole system could do more work to keep the printer printing.

    The point of this long anecdote is that you don't just want an
    interrupt when the CPU is a little faster than the device. At least in
    that era you wanted to offload as much work as possible so the CPU
    could keep the device going and balance the speed of the CPU and
    the devices.

    As a final note, keep in mind when you look at the 400 page patent on
    the 709's channel that the logic was built entirely out of vacuum
    tubes, and was not a lot less complex than the computer to which it
    was attached. A basic 709 rented for $10K/mo (about $100K now) and
    each channel was $3600/mo ($37K now). But the speed improvement
    was worth it.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to John Levine on Fri Jul 5 15:35:50 2024
    John Levine <johnl@taugh.com> writes:
    By putting most of the logic into the printer controller, the 1403 was
    not just faster, but only took a small fraction of the CPU so the
    whole system could do more work to keep the printer printing.

    360 "CKD DASD" and multi-track search trade-off. 360s had relatively
    little real storage (for caching information) and slow processor, so for program libraries on disk ... they created "PDS" format and had (disk
    resident, cylinder aligned) directory that contained records for name of
    each program and its disk location in the library. To load a program, it
    first did a "multi-track" search of of the PDS directory started at
    track 0 of the 1st cylinder of the directory ... ran until it found name
    match (or reached end of cylinder). If name wasn't found at end of
    cylinder, it would restart if there were additional cylinders in the
    directory. Trivia: the searched-for program name was in processor memory
    and the multi-track search operation would refetch the name every time
    it did a compare for matching name (with records in the PDS directory), monopolizing channel, controller, & disk.

    Roll forward to 1979: a large national grocery chain had a large
    loosely-coupled complex of multiple 370/168 systems sharing a string of
    DASD containing the PDS dataset of store controller applications ... and
    was experiencing enormous throughput problems. All the usual corporate performance specialists had been dragged through the datacenter with
    hopes that they could address the problem ... until they eventually got
    around to calling me. I was brought into a large classroom with tables
    covered with large stacks of activity/performance reports for each
    system. After 30-40 mins examining the reports ... I began to realize that
    the aggregate activity (summed across all systems) for a specific shared
    disk was peaking at 6-7 (total) I/O operations ... corresponding
    with the severe performance problem. I asked what was on that disk and was
    told it was the (shared) store controller program library for all the
    stores in all regions and the 168 systems; I then strongly suspected
    it was the PDS multi-track search performance that I had grappled
    with as an undergraduate in the 60s.

    The store controller PDS dataset was quite large and had a three
    cylinder directory, resident on a 3330 disk drive ... implying that on
    average a search required 1.5 cylinders (and two I/Os): the first
    multi-track search I/O, covering all 19 tracks of a cylinder, would be 19/60=.317sec
    (during which time that processor's channel was busy, and the shared
    controller was also busy ... blocking access to all disks on that
    string, not just the specific drive, for all systems in the complex)
    and the 2nd would be 9.5/60=.158sec ... or .475sec for the two ... plus
    a seek to move the disk arm to the PDS directory and another seek to move the
    disk arm to the cylinder where the program was located
    ... approx. .5+secs total for each store controller program library load (involving 6-7 I/Os), or two program loads per second aggregate serving
    all stores in the country.
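
    The timing is easy to reproduce (assuming a 3330 at 3600 RPM, i.e. 60
    revolutions/second, 19 tracks per cylinder, and roughly 30 ms per seek;
    the seek figure is an assumption):

        #include <stdio.h>

        int main(void)
        {
            const double revs_per_sec   = 60.0;  /* 3330 spins at 3600 RPM        */
            const double tracks_per_cyl = 19.0;  /* one revolution per track read */
            const double seek           = 0.030; /* assumed average seek time, s  */

            double first  = tracks_per_cyl / revs_per_sec;          /* full cylinder    */
            double second = (tracks_per_cyl / 2.0) / revs_per_sec;  /* half, on average */
            double total  = first + second + 2.0 * seek;            /* plus two seeks   */

            printf("1st search %.3f s, 2nd %.3f s, per load ~%.2f s (~%.1f loads/sec)\n",
                   first, second, total, 1.0 / total);
            /* prints roughly: 0.317 s, 0.158 s, ~0.54 s, ~1.9 loads/sec */
            return 0;
        }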

    The store controller PDS program library was then split across set of
    three disks, one dedicated (non-shared) set for each system in the in
    the complex.

    I was also doing some work on System/R (original sql/relational RDBMS)
    and taking some flak from the IMS DBMS group down the road. The IMS
    group were complaining that RDBMS had twice the disk space (for RDBMS
    index) and increased the number of disk I/Os by 4-5 times (for
    processing RDBMS index). Counter was that the RDBMS index significantly
    reduced the manual maintenance (compared to IMS). By early 80s, disk
    price/bit was significantly plummeting and system real memory
    significantly increased useable for RDBMS caching, reducing physical
    I/Os (while manual maintenance skills costs were significantly
    increasing).

    other trivia: when I transferred to San Jose, I got to wander around
    datacenters in silicon valley, including disk engineering & product test (bldg14&15) across the street. They were doing prescheduled, 7x24,
    stand-alone mainframe testing. They mentioned they had recently tried
    MVS, but it had 15min mean-time-between-failure, requiring manual
    re-ipl/reboot in that environment. I offered to rewrite I/O supervisor
    to make it bullet-proof and never fail enabling any amount of on-demand, concurrent testing (greatly improving productivity). Downside was they
    would point their finger at me whenever they had problem and I was
    spending increasing amount of time diagnosing their hardware problems.

    1980 was a real tipping point as the hardware tradeoff switched from system bottleneck to I/O bottleneck (my claim was that relative system disk
    throughput had declined by an order of magnitude: systems got 40-50 times
    faster, disks got 3-5 times faster).

    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Lynn Wheeler on Sat Jul 6 02:21:46 2024
    Lynn Wheeler wrote:

    other trivia: when I transfer to San Jose, I got to wander around
    datacenters in silicon valley, including disk engineering & product test (bldg14&15) across the street. They were doing prescheduled, 7x24, stand-alone mainframe testing. They mentioned they had recently tried
    MVS, but it had 15min mean-time-between-failure, requiring manual re-ipl/reboot in that environment. I offered to rewrite I/O supervisor
    to make it bullet-proof and never fail enabling any amount of on-demand, concurrent testing (greatly improving productivity). Downside was they
    would point their finger at me whenever they had problem and I was
    spending increasing amount of time diagnosing their hardware problems.

    Punishment of the Good Samaritan.....

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to John Levine on Sat Jul 6 02:16:34 2024
    John Levine wrote:

    According to Joe Pfeiffer <pfeiffer@cs.nmsu.edu>:
    ... Once you had a
    channel, I/O buffering made sense, have the channel read or write one
    area while you're working on the other.

    The day the CPU became faster than a teletype (or any other IO device
    you care to name) interrupts became useful. Get an interrupt saying the
    teletype is ready, send a character, go back to work, repeat.

    That's certainly the model that DEC used in the PDP-1 and their other
    minis. Lightweight interrupts and simple device controllers worked
    for them. But the tradeoffs can be a lot more complicated.

    Let us turn back to the late, not very lamented IBM 1130 mini. It
    usually came with an 1132 printer which printed about 100
    lines/minute. A drum rotated behind the paper with 48 rows of
    characters, each row being all the same character. In front of the
    paper was the ribbon and a row of solenoid driven hammers.

    When the 1130 wanted to print a line, it started the printer, which
    would then tell it what the upcoming character was on the drum. The
    computer then had less than 10ms to scan the line of characters to be
    printed and put a bit map saying which solenoids to fire into fixed
    locations in low memory that the printer then fetched using DMA.
    Repeat until all of the characters were printed, and tell the printer
    to advance the paper.

    Given the modest speed of the 1130, while it was printing a line it
    couldn't do anything else. But it was even worse than that. There were
    two models of 1130, fast and slow, with the difference being a delay
    circuit. The slow model couldn't produce the bitmaps fast enough, so
    there was a "print mode" that disabled the delay circuit while it was printing. As you might expect, students quickly figured out how to put
    their 1130s into print mode all the time.

    We students also figured out that TSS had dumped OS process state in
    page 0 (normally inaccessible) of the user's address space. We then
    figured out that while we could not LD or ST that data, we could queue
    up I/O (out) to write the data to disk, read it back where we could diddle
    with it, write it back to disk, queue up I/O (in), and read it back into
    page 0.

    All we did was to set the privilege bit !!

    We were generous and loaned out all the CPU time we did not need......

    The printer interrupted after a paper move was complete, giving the
    computer some chance to compute the next line to print in the
    meantime. To skip to the top of the next page or other paper motion,
    it told the printer to start moving the paper, and a hole in which row
    in the carriage control tape (look it up) to wait for. When the hole
    came around, the printer interrupted the CPU which then told the
    printer to stop the paper.

    The other printer was a 1403 which had 300 and 600 LPM models. Its
    print mechanism was sort of similar, a horizontal chain of characters spinning behind the paper, but that made the hammer management harder
    since what character was at what position changed every character
    time. But that wasn't the CPU's problem. The 1403 used its own unique character code probably related to the layout of the print chain, so
    the CPU translated the line into printer code, stored the result in a
    buffer, and then sent a command to the printer telling it to print the buffer. The printer printed, then interrupted, at which point the CPU
    told it to either space one line or skip to row N in the carriage
    control tape, again interrupting when done.

    We (students) used to have comment cards that caused the hammers to all
    fly at the same time. So, instead of the natural z z z z z of the print,
    it would go BANG BANG BANG and we knew our stuff was being printed...

    By putting most of the logic into the printer controller, the 1403 was
    not just faster, but only took a small fraction of the CPU so the
    whole system could do more work to keep the printer printing.

    The point of this long anecdote is that you don't just want an
    interrupt when the CPU is a little faster than the device. At least in
    that era you wanted to offload as much work as possible so the CPU
    could keep the device going and balance the speed of the CPU and
    the devices.

    As a final note, keep in mind when you look at the 400 page patent on
    the 709's channel that the logic was built entirely out of vacuum
    tubes, and was not a lot less complex than the computer to which it
    was attached. A basic 709 rented for $10K/mo (about $100K now) and
    each channel was $3600/mo ($37K now). But the speed improvement
    was worth it.

    Ah memories.......

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stephen Fuld@21:1/5 to Lynn Wheeler on Sat Jul 6 06:15:56 2024
    Lynn Wheeler wrote:

    John Levine <johnl@taugh.com> writes:
    By putting most of the logic into the printer controller, the 1403
    was not just faster, but only took a small fraction of the CPU so
    the whole system could do more work to keep the printer printing.

    360 "CKD DASD" and multi-track search trade-off.

    As you posted below, the whole PDS search stuff could easily be a
    disaster. Even with more modest sized PDSs, it was inefficient as
    hell. Doing a linear search, and worse yet, doing it on a device that
    was slower than main memory, and tying up the disk controller and
    channel to do it. It wasn't even sort of addressed until the early
    1990s with the "fast PDS search" feature in the 3990 controller. The
    searches still took the same amount of elapsed time, but the key field comparison was done in the controller and it only returned status when
    it found a match (or the end of the extent), which freed up the channel.
    Things would have been much better if they had simply used some sort of
    "table of contents" or index at the start of the PDS, read it in, then
    did an in-memory search. Even on small-memory machines, if you had a
    small index block and used something like a B-tree of them, it
    would have been faster.
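
    A minimal sketch of that in-memory alternative: read the member index
    once, then binary-search it in core instead of rescanning the directory
    on disk (the entry layout here is hypothetical):

        #include <string.h>

        /* Hypothetical in-core member index: 8-byte names sorted ascending,
           each mapping to the member's starting disk address.              */
        struct index_entry {
            char     name[8];
            unsigned start;
        };

        /* log2(n) in-memory compares, versus one disk revolution per
           directory track with the channel search.                    */
        int index_lookup(const struct index_entry *idx, int n, const char name[8])
        {
            int lo = 0, hi = n - 1;
            while (lo <= hi) {
                int mid = lo + (hi - lo) / 2;
                int cmp = memcmp(name, idx[mid].name, 8);
                if (cmp == 0) return mid;   /* found */
                if (cmp < 0)  hi = mid - 1;
                else          lo = mid + 1;
            }
            return -1;                      /* member not found */
        }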




    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Sat Jul 6 14:58:17 2024
    According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
    Things would have been much better if they simply used some sort of
    "table of contents" or index at the start of the PDS, read it in, then
    did an in memory search. Even on small memory machines, if you had a
    small sized index block and used something like a B-tree of them, it
    would have been faster.

    I believe that's what they did with VSAM.

    The 360/20's disk controller formatted the disk with 270 byte fixed
    length records, no keys. Nonetheless the IOCS library provided ISAM
    with track and cylinder indexes with the keys of the last record on
    the track or cylinder. It would probably have run quite fast if they
    had buffered the indexes in core but they didn't since the whole
    thing including the application had to fit in 12K or 16K bytes.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to Stephen Fuld on Sat Jul 6 07:34:43 2024
    "Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> writes:
    As you posted below, the whole PDS search stuff could easily be a
    disaster. Even with more modest sized PDSs, it was inefficient as
    hell. Doing a linear search, and worse yet, doing it on a device that
    was slower than main memory, and tying up the disk controller and
    channel to do it. It wasn't even sort of addressed until the early
    1990s with the "fast PDS search" feature in the 3990 controller. The searches still took the same amount of elapsed time, but the key field comparison was done in the controller and it only returned status when
    it found a match (or end of the extent), which freed up the channel.
    Things would have been much better if they simply used some sort of
    "table of contents" or index at the start of the PDS, read it in, then
    did an in memory search. Even on small memory machines, if you had a
    small sized index block and used something like a B-tree of them, it
    would have been faster.

    trivia: I've also mentioned using HYPERChannel in 1980 to implement a
    channel extender ... which as a side-effect also reduced channel busy on the
    "real" channels ... another side-effect was that I would get calls from ibm
    branches that had customers also doing stuff with HYPERChannel, including
    NCAR, whose supercomputer "network access system" as a side-effect eliminated channel busy for CKD DASD "search" operations in the 1st half
    of the 80s (a decade before the 3990)

    For the A510 channel emulator, the channel program was downloaded into the
    A510 and executed from there. NCAR got an upgrade to the A515, which also
    allowed the search argument to be included in the download ... so
    mainframe real memory and channels weren't involved (although the dasd controller was still involved). It also supported 3rd-party transfer.

    The supercomputer would send a request over HYPERChannel to the mainframe
    server. The mainframe would download the channel program into an A515 and
    return the A515 and channel program "handle" to the supercomputer. The supercomputer would send a request to that A515 to execute the specified channel program (and data would transfer directly between the disk and
    the supercomputer w/o passing through the mainframe).

    Then I became involved with HIPPI (the open Cray channel standard pushed by
    LANL) and FCS (open fibre channel pushed by LLNL), both also able to do
    3rd-party transfers ... along with having LLNL's LINCS/Unitree ported to
    the IBM HA/CMP product we were doing.

    other trivia: as also mentioned System/R (original SQL/relational RDBMS implementation) used cacheable indexes ... not linear searches.


    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stephen Fuld@21:1/5 to John Levine on Sat Jul 6 19:16:34 2024
    John Levine wrote:

    According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
    Things would have been much better if they simply used some sort of
    "table of contents" or index at the start of the PDS, read it in,
    then did an in memory search. Even on small memory machines, if
    you had a small sized index block and used something like a B-tree
    of them, it would have been faster.

    I believe that's what they did with VSAM.

    Agreed in the sense that VSAM replaced ISAM, but, and I am getting
    beyond my depth here, I wasn't aware that PDSs used ISAM. I had
    thought they were a thing unto themselves. Please correct me if I am
    wrong. In any event, PDSs in their original form lasted beyond the introduction of VSAM, or the PDS search assist functionality wouldn't
    have been needed.





    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Sat Jul 6 21:28:25 2024
    According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
    I believe that's what they did with VSAM.

    Agreed in the sense that VSAM replaced ISAM, but, and I am getting
    beyond my depth here, I wasn't aware that PDSs used ISAM. I had
    thought they were a thing unto themselves. Please correct me if I am
    wrong. In any event, PDSs in their original form lasted beyond the
    introduction of VSAM, or the PDS search assist functionality wouldn't
    have been needed.

    A PDS had a directory at the front followed by the members. The
    directory had an entry per member with the name, the starting
    location, and optional other stuff. The entries were in order by
    member name, and packed into 256 byte records each of which had a
    hardware key with the name of the last entry in the block. It searched
    the PDS directory with the same kind of channel key search it did for
    ISAM, leading to the performance issues Lynn described.
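
    A rough in-memory model of that layout (simplified: real entries also
    carry an info byte and variable-length user data, and the key comparison
    was done by the controller against the spinning disk, not by code like
    this):

        #include <string.h>

        #define NAMELEN 8

        struct pds_dir_entry {            /* simplified member entry */
            char name[NAMELEN];           /* member name, blank padded */
            unsigned char ttr[3];         /* where the member starts   */
        };

        struct pds_dir_block {            /* one 256-byte directory record */
            char key[NAMELEN];            /* hardware key = last name in block */
            struct pds_dir_entry entry[16];
            int used;
        };

        /* What the channel's key-search loop accomplishes: stop at the
           first block whose key >= the wanted member name.  On CKD
           hardware this is done record by record against the rotating
           disk, which is the performance problem at issue. */
        const struct pds_dir_block *
        find_block(const struct pds_dir_block *blk, int nblk, const char *name)
        {
            for (int i = 0; i < nblk; i++)
                if (memcmp(blk[i].key, name, NAMELEN) >= 0)
                    return &blk[i];
            return NULL;
        }

        /* Once that block has been read in, an ordinary in-core scan
           finds the member entry (or reports that it is absent). */
        const struct pds_dir_entry *
        find_member(const struct pds_dir_block *b, const char *name)
        {
            for (int i = 0; b && i < b->used; i++)
                if (memcmp(b->entry[i].name, name, NAMELEN) == 0)
                    return &b->entry[i];
            return NULL;
        }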

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stephen Fuld@21:1/5 to John Levine on Sun Jul 7 06:23:23 2024
    John Levine wrote:

    According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
    I believe that's what they did with VSAM.

    Agreed in the sense that VSAM replaced ISAM, but, and I am getting
    beyond my depth here, I wasn't aware that PDSs used ISAM. I had
    thought they were a thing unto themselves. Please correct me if I
    am wrong. In any event, PDSs in their original form lasted beyond
    the introduction of VSAM, or the PDS search assist functionality
    wouldn't have been needed.

    A PDS had a directory at the front followed by the members. The
    directory had an entry per member with the name, the starting
    location, and optional other stuff. The entries were in order by
    member name, and packed into 256 byte records each of which had a
    hardware key with the name of the last entry in the block. It searched
    the PDS directory with the same kind of channel key search it did for
    ISAM, leading to the performance issues Lynn described.


    Yes, I don't disagree with any of that. But I got the impression from
    your previous posts that IBM had replaced the search key fields of a
    PDS with some kind of VSAM (i.e. b-tree or such) variant, as they did
    with ISAM. If that is true, I had never heard about it.



    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Stephen Fuld on Sun Jul 7 06:35:40 2024
    Stephen Fuld <SFuld@alumni.cmu.edu.invalid> schrieb:
    John Levine wrote:

    According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
    I believe that's what they did with VSAM.

    Agreed in the sense that VSAM replaced ISAM, but, and I am getting
    beyond my depth here, I wasn't aware that PDSs used ISAM. I had
    thought they were a thing unto themselves. Please correct me if I
    am wrong. In any event, PDSs in their original form lasted beyond
    the introduction of VSAM, or the PDS search assist functionality
    wouldn't have been needed.

    A PDS had a directory at the front followed by the members. The
    directory had an entry per member with the name, the starting
    location, and optional other stuff. The entries were in order by
    member name, and packed into 256 byte records each of which had a
    hardware key with the name of the last entry in the block. It searched
    the PDS directory with the same kind of channel key search it did for
    ISAM, leading to the performance issues Lynn described.


    Yes, I don't disagree with any of that. But I got the impression from
    your previous posts that IBM had replaced the search key fields of a
    PDS with some kind of VSAM (i.e. b-tree or such) variant, as they did
    with ISAM. If that is true, I had never heard about it.

    They introduced PDSE, see https://www.ibm.com/docs/en/zos-basic-skills?topic=sets-what-is-pdse
    It appears they fixed many of the problems with the original
    design, but not all (why is the number of extents still limited?)
    It also seems that RECFM=U for load modules is no longer supported,
    you have to do something different.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Thomas Koenig on Sun Jul 7 08:30:41 2024
    Thomas Koenig <tkoenig@netcologne.de> schrieb:
    Stephen Fuld <SFuld@alumni.cmu.edu.invalid> schrieb:
    John Levine wrote:

    According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
    I believe that's what they did with VSAM.

    Agreed in the sense that VSAM replaced ISAM, but, and I am getting
    beyond my depth here, I wasn't aware that PDSs used ISAM. I had
    thought they were a thing unto themselves. Please correct me if I
    am wrong. In any event, PDSs in their original form lasted beyond
    the introduction of VSAM, or the PDS search assist functionality
    wouldn't have been needed.

    A PDS had a directory at the front followed by the members. The
    directory had an entry per member with the name, the starting
    location, and optional other stuff. The entries were in order by
    member name, and packed into 256 byte records each of which had a
    hardware key with the name of the last entry in the block. It searched
    the PDS directory with the same kind of channel key search it did for
    ISAM, leading to the performance issues Lynn described.


    Yes, I don't disagree with any of that. But I got the impression from
    your previous posts that IBM had replaced the search key fields of a
    PDS with some kind of VSAM (i.e. b-tree or such) variant, as they did
    with ISAM. If that is true, I had never heard about it.

    They introduced PDSE, see https://www.ibm.com/docs/en/zos-basic-skills?topic=sets-what-is-pdse
    It appears they fixed many of the problems with the original
    design, but not all (why is the number of extents still limited?)
    It also seems that RECFM=U for load modules is no longer supported,
    you have to do something different.

    Reading https://share.confex.com/share/121/webprogram/Handout/Session14147/SHARE%20PDSE%20What%27s%20new%20in%202.1.pdf
    it seems IBM really messed up PDSE the first time around,
    introducing size limits on members which were not present in the
    original PDS. Seems somebody didn't take backwards compatibility
    too seriously, after all...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stephen Fuld@21:1/5 to Thomas Koenig on Sun Jul 7 14:11:49 2024
    Thomas Koenig wrote:

    Thomas Koenig <tkoenig@netcologne.de> schrieb:
    Stephen Fuld <SFuld@alumni.cmu.edu.invalid> schrieb:
    John Levine wrote:

    According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
    I believe that's what they did with VSAM.

    Agreed in the sense that VSAM replaced ISAM, but, and I am getting
    beyond my depth here, I wasn't aware that PDSs used ISAM. I had
    thought they were a thing unto themselves. Please correct me if I
    am wrong. In any event, PDSs in their original form lasted beyond
    the introduction of VSAM, or the PDS search assist functionality
    wouldn't have been needed.

    A PDS had a directory at the front followed by the members. The
    directory had an entry per member with the name, the starting
    location, and optional other stuff. The entries were in order by
    member name, and packed into 256 byte records each of which had a
    hardware key with the name of the last entry in the block. It searched
    the PDS directory with the same kind of channel key search it did for
    ISAM, leading to the performance issues Lynn described.


    Yes, I don't disagree with any of that. But I got the impression from
    your previous posts that IBM had replaced the search key fields of a
    PDS with some kind of VSAM (i.e. b-tree or such) variant, as they did
    with ISAM. If that is true, I had never heard about it.

    They introduced PDSE, see https://www.ibm.com/docs/en/zos-basic-skills?topic=sets-what-is-pdse
    It appears they fixed many of the problems with the original
    design, but not all (why is the number of extents still limited?)
    It also seems that RECFM=U for load modules is no longer supported,
    you have to do something different.

    Reading
    https://share.confex.com/share/121/webprogram/Handout/Session14147/SHARE%20PDSE%20What%27s%20new%20in%202.1.pdf
    it seems IBM really messed up PDSE the first time around,
    introducing size limits on members which were not present in the
    original PDS. Seems somebody didn't take backwards compatibility
    too seriously, after all...


    Ahhh! That (and your previous post) is the answer. This all happened
    long after I was no longer involved.

    Thank You Thomas.



    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to Paul A. Clayton on Sun Jul 7 07:30:58 2024
    "Paul A. Clayton" <paaronclayton@gmail.com> writes:
    In theory, non-practicing patent licensors seem to make sense, similar
    to ARM not making chips, but when the cost and risk to the single
    patent holder is disproportionately small, patent trolling can be
    profitable. (I suspect only part of the disparity comes from not
    practicing; the U.S. legal system has significant weaknesses and
    actual expertise is not easily communicated. My father, who worked for
    AT&T, mentioned a lawyer who repeatedly sued AT&T, which settled out of
    court because that was cheaper than defending even against a claim
    without basis.)

    in the 90s, there was semantic analysis of patents which found that
    something like 30% of "computer/technology" patents were filed in other
    categories using ambiguous wording ... "submarine" patents (unlikely to
    be found in a normal patent search) ... waiting for somebody that was
    making lots of money who could be sued for patent infringement.

    other trivia: around turn of century was doing some security chip work
    for financial institution and was asked to work with patent boutique
    legal firm, eventually had 50 draft (all assigned) patents and the legal
    firm predicted that there would be over a hundred before done ... some
    executive looked at the filing costs and directed all the claims be
    repackaged as nine patents. then the patent office came back and said
    they were getting tired of these humongous patents where the filing fee
    didn't even cover the cost of reading the patents ... and directed the
    claims be repackaged as at least a couple dozen patents.

    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Joe Pfeiffer on Mon Jul 8 06:45:38 2024
    On Thu, 04 Jul 2024 18:39:21 -0600, Joe Pfeiffer wrote:

    Get an interrupt saying the teletype is ready, send a character, go back
    to work, repeat.

    But this is all still copying characters back and forth. What happened to
    the idea of locate mode?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Lynn Wheeler on Mon Jul 8 07:43:28 2024
    On Fri, 05 Jul 2024 15:35:50 -1000, Lynn Wheeler wrote:

    The IMS group were complaining that RDBMS had twice the disk space (for
    RDBMS index) and increased the number of disk I/Os by 4-5 times (for processing RDBMS index). Counter was that the RDBMS index significantly reduced the manual maintenance (compared to IMS).

    Did IMS have a locate mode as well?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to Lawrence D'Oliveiro on Mon Jul 8 08:01:22 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    Did IMS have a locate mode as well?

    channel programs were built by the filesystem library running as part of
    the application, or directly by application code ... and then the
    application executes a system call, EXCP/SVC0, to invoke the channel
    program. With MVS and virtual memory it's in the application virtual
    address space.

    With QSAM, the library code does I/O to/from library buffers and then
    either copies the data to application buffers ("move" mode) or passes
    pointers into the QSAM buffers ("locate" mode).
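
    In C-ish terms the difference looks roughly like this (a sketch only;
    real QSAM is a macro interface, GET/PUT against a DCB, and the buffer
    handling here is invented):

        #include <string.h>

        #define LRECL 80                  /* fixed-length records for simplicity */

        struct qsam_like {
            char buffer[LRECL * 10];      /* access-method-owned buffer,
                                             filled by channel I/O */
            int  next;                    /* offset of next logical record */
        };

        /* move mode: the access method copies the record into the
           caller's own work area */
        void get_move(struct qsam_like *f, char *workarea)
        {
            memcpy(workarea, f->buffer + f->next, LRECL);
            f->next += LRECL;
        }

        /* locate mode: the access method just hands back a pointer into
           its own buffer; the caller processes the record in place */
        char *get_locate(struct qsam_like *f)
        {
            char *rec = f->buffer + f->next;
            f->next += LRECL;
            return rec;
        }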

    IMS has a directly managed data buffer cache (aware of whether a data
    record is already in cache or must be read ... and/or is changed in cache
    and must be written) ... also a transaction log.

    With the transition to virtual memory, the channel programs passed to
    EXCP/SVC0 now had virtual addresses while the channel architecture
    required real addresses ... so EXCP/SVC0 required making a copy of the
    passed channel programs, replacing virtual addresses with real addresses
    (as well as pinning the associated virtual pages until the I/O
    completes). The code to create channel program copies with real
    addresses and to manage virtual page pin/unpin was initially done by
    crafting a copy of the virtual machine CP67 "CCWTRANS" code into EXCP.
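
    Roughly what that translation step has to do, as a sketch (virt_to_real
    and pin_page are invented stand-ins for the pager services, and the
    IDAL/data-chaining handling for buffers that cross page boundaries is
    omitted):

        #include <stdint.h>

        struct ccw {                /* simplified S/370-style CCW */
            uint8_t  op;            /* command code                */
            uint32_t addr;          /* data address                */
            uint8_t  flags;
            uint16_t count;
        };

        /* hypothetical pager services; identity/no-op stand-ins here,
           just so the sketch is self-contained */
        uint32_t virt_to_real(uint32_t vaddr) { return vaddr; }
        void     pin_page(uint32_t vaddr)     { (void)vaddr; }

        /* Build the shadow copy of the user's channel program with real
           addresses; the copy is what actually gets started, and the
           pages stay pinned until the I/O interrupt arrives. */
        void ccw_translate(const struct ccw *user, struct ccw *shadow, int n)
        {
            for (int i = 0; i < n; i++) {
                shadow[i] = user[i];
                pin_page(user[i].addr);                 /* frame must not move */
                shadow[i].addr = virt_to_real(user[i].addr);
            }
        }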

    for privileged apps that had fixed/pinned virtual pages for I/O buffers,
    a new EXCPVR interface was built ... effectively the original EXCP w/o
    the (CCWTRANS) channel program copying (and virtual page
    pinning/unpinning).

    IMS "OSAM" and "VSAM" (OSAM may use QSAM https://www.ibm.com/docs/en/ims/15.3.0?topic=sets-using-osam-as-access-method IMS communicates with OSAM using OPEN, CLOSE, READ, and WRITE macros. In
    turn, OSAM communicates with the I/O supervisor by using the I/O driver interface.

    Data sets

    An OSAM data set can be read by using either the BSAM or QSAM access method.

    ... snip ...

    IMS Performance and Tuning guide, page 167
    https://www.redbooks.ibm.com/redbooks/pdfs/sg247324.pdf
    * EXCPVR=0
    Prevents page fixing of the OSAM buffer pool. This is the correct choice
    these days

    ... snip ...

    START_Input/Output
    https://en.wikipedia.org/wiki/Start_Input/Output
    EXCPVR
    https://en.wikipedia.org/wiki/Execute_Channel_Program_in_Real_Storage

    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stephen Fuld@21:1/5 to Lawrence D'Oliveiro on Wed Jul 31 22:20:44 2024
    On 7/31/2024 6:41 PM, Lawrence D'Oliveiro wrote:
    On Sat, 6 Jul 2024 06:15:56 -0000 (UTC), Stephen Fuld wrote:

    As you posted below, the whole PDS search stuff could easily be a
    disaster. Even with more modest sized PDSs, it was inefficient as
    hell.

    Would locate mode have helped with this?

    No. The problem was that the PDS search was a linear search of records
    on the disk drive (i.e. typically multiple disk revolutions), and
    furthermore, it required (until the fast PDS search came along) that the
    host channel take action on each disk record checked, even the ones that
    didn't match, including resending the search argument to the disk
    controller.
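
    For a rough sense of scale, assuming a 3330-class drive at 3600 RPM
    (about 16.7 ms per revolution): a lookup that has to key-search its way
    through, say, five tracks of directory blocks costs on the order of five
    revolutions, 80-odd ms, with the device, the controller and (before the
    search assist) the channel busy for the duration, and that is before the
    member itself is even read.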

    This has nothing to do with locate mode.



    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Stephen Fuld on Thu Aug 1 01:41:45 2024
    On Sat, 6 Jul 2024 06:15:56 -0000 (UTC), Stephen Fuld wrote:

    As you posted below, the whole PDS search stuff could easily be a
    disaster. Even with more modest sized PDSs, it was inefficient as
    hell.

    Would locate mode have helped with this?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Thu Aug 1 02:38:01 2024
    On Mon, 8 Jul 2024 6:45:38 +0000, Lawrence D'Oliveiro wrote:

    On Thu, 04 Jul 2024 18:39:21 -0600, Joe Pfeiffer wrote:

    Get an interrupt saying the teletype is ready, send a character, go back
    to work, repeat.

    But this is all still copying characters back and forth. What happened
    to the idea of locate mode?

    Even a touch typist typing at 120 words per minute would not benefit
    from LOCATE mode I/O--the input rate is just too slow.
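
    Rough numbers: 120 wpm at about six characters per word is 720
    characters a minute, i.e. one character roughly every 80 ms, which is
    tens of thousands of instruction times even on a 1960s machine, so the
    cost of moving each character is lost in the noise.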

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Thu Aug 1 21:01:12 2024
    According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:
    On 7/31/2024 6:41 PM, Lawrence D'Oliveiro wrote:
    On Sat, 6 Jul 2024 06:15:56 -0000 (UTC), Stephen Fuld wrote:

    As you posted below, the whole PDS search stuff could easily be a
    disaster. Even with more modest sized PDSs, it was inefficient as
    hell.

    Would locate mode have helped with this?

    No. The problem was that the PDS search was a linear search of records
    on the disk drive (i.e. typically multiple disk revolutions), and
    furthermore, it required (until the fast PDS search came along) that the
    host channel take action on each disk record checked, even the ones that didn't match, including resending the search argument to the disk
    controller.

    This has nothing to do with locate mode.

    He's made it painfully clear that he doesn't understand the relative
    speed of CPUs and disks, or the costs of multiple I/O buffers on
    systems with small memories, particularly back in the 1960s when this
    stuff was being designed. Nor how records in COBOL data divisions were
    designed so implementations could read and write file records directly
    from and to the buffers, and the IOCS of the era enabled that in COBOL
    and other languages.

    Perhaps this would be a good time to stop taking the bait. I thought
    this silly argument was over a month ago.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stephen Fuld@21:1/5 to John Levine on Thu Aug 1 22:58:16 2024
    On 8/1/2024 2:01 PM, John Levine wrote:
    According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:
    On 7/31/2024 6:41 PM, Lawrence D'Oliveiro wrote:
    On Sat, 6 Jul 2024 06:15:56 -0000 (UTC), Stephen Fuld wrote:

    As you posted below, the whole PDS search stuff could easily be a
    disaster. Even with more modest sized PDSs, it was inefficient as
    hell.

    Would locate mode have helped with this?

    No. The problem was that the PDS search was a linear search of records
    on the disk drive (i.e. typically multiple disk revolutions), and
    furthermore, it required (until the fast PDS search came along) that the
    host channel take action on each disk record checked, even the ones that
    didn't match, including resending the search argument to the disk
    controller.

    This has nothing to do with locate mode.

    He's made it painfully clear that he doesn't understand the relative
    speed of CPUs and disks, or the costs of multiple I/O buffers on
    systems with small memories, particularly back in the 1960s when this
    stuff was being designed. Nor how records in COBOL data divisions were designed so implementations could read and write file records directly
    from and to the buffers, and the IOCS of the era enabled that in COBOL
    and other languages.

    I believe you are right about that.



    Perhaps this would be a good time to stop taking the bait.

    Perhaps you are right again, but something about hope springs eternal,
    and I couldn't resist pointing out the problems with the whole PDS
    search mechanism.

    I thought
    this silly argument was over a month ago.

    Again, you may be right. We'll see.



    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Stephen Fuld on Sun Aug 11 03:09:07 2024
    On Wed, 31 Jul 2024 22:20:44 -0700, Stephen Fuld wrote:

    This has nothing to do with locate mode.

    Why is it in this thread, then?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)