John Levine <johnl@taugh.com> schrieb:
The 709 introduced data channels in 1958 which allowed the CPU to do
other stuff while the channel did the I/O. Wikipedia says the first
I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
an I/O interrupt would be of much use before channels. Once you had a
channel, I/O buffering made sense, have the channel read or write one
area while you're working on the other.
Not sure what you mean by "channel" in this context - hardware
channels like the /360 had, or any asynchronous I/O in general,
even without hardware support?
Sending the next character to a teletype after the user program
fills a buffer and waiting for the next interrupt to tell you it's
ready, without a busy loop, makes sense anyway.
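(A minimal C sketch of the interrupt-per-character scheme described above: the
program queues output into a ring buffer with no busy loop, and a transmit-ready
interrupt drains it. The handler and the device register access are hypothetical
placeholders, not any real teletype interface.)

#include <stddef.h>

#define TTY_BUFSIZE 256

static volatile char   tty_buf[TTY_BUFSIZE];  /* output ring buffer   */
static volatile size_t tty_head, tty_tail;    /* producer / consumer  */

/* Called by the user program: queue a character and return at once. */
void tty_putc(char c)
{
    tty_buf[tty_head % TTY_BUFSIZE] = c;
    tty_head++;
    /* a real driver would also kick the device here if it is idle,
       so that the first transmit-ready interrupt arrives */
}

/* Interrupt handler: the teletype says it is ready for the next character. */
void tty_tx_interrupt(void)
{
    if (tty_tail != tty_head) {
        /* write tty_buf[tty_tail % TTY_BUFSIZE] to the (hypothetical)
           device data register */
        tty_tail++;
    }
    /* otherwise nothing is pending; the device stays idle until the
       next tty_putc() restarts it */
}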
Thomas Koenig <tkoenig@netcologne.de> writes:
John Levine <johnl@taugh.com> schrieb:
The 709 introduced data channels in 1958 which allowed the CPU to do
other stuff while the channel did the I/O. Wikipedia says the first
I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
an I/O interrupt would be of much use before channels. Once you had a
channel, I/O buffering made sense, have the channel read or write one
area while you're working on the other.
Not sure what you mean by "channel" in this context - hardware
channels like the /360 had, or any asynchronous I/O in general,
even without hardware support?
Sending the next character to a teletype after the user program
fills a buffer and waiting for the next interrupt to tell you it's
ready, without a busy loop, makes sense anyway.
Although in the mainframe era, most terminals were block-mode
rather than character-by-character, which reduced the interrupt
frequency on the host (often via a front-end data communications
processor) at the expense of more logic in the terminal device.
After our recent silly arguments about locate vs move mode I/O, I got to thinking about what a computer needs for locate mode even to be
interesting.
John Levine <johnl@taugh.com> schrieb:
The 709 introduced data channels in 1958 which allowed the CPU to do
other stuff while the channel did the I/O. Wikipedia says the first
I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
an I/O interrupt would be of much use before channels. Once you had a
channel, I/O buffering made sense, have the channel read or write one
area while you're working on the other.
Not sure what you mean by "channel" in this context - hardware
channels like the /360 had, or any asynchronous I/O in general,
even without hardware support?
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
When doing IBM's HA/CMP and working with major RDBMS vendors on cluster scaleup in late 80s/early 90s, there was lots of references to POSIX light-weight threads ...
... and asynchronous I/O for RDBMS (with no buffer copies) and the
RDBMS managing large record cache.
On the smallest
360s, the channel was implemented in CPU microcode. When running fast
devices like disks the channel used so much of the CPU that the
program stalled, but it was worth it to be compatible with faster
machines. Even then, disk seeks or tape rewinds or reading cards or
printing on printers let the CPU do useful work while the channel and
device did its thing.
On 2 Jul 2024, MitchAlsup1 wrote
(in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
They all did DMA, interrupting only at the end of the transfer.
Slow devices accessed the core store for each character,
fast devices did so for each word.
This was mediated by one of the KDF9's many state machines,
I/O Control, which multiplexed core requests from devices
and interrupted the CPU at the end of a transfer
if the transfer had been initiated by a program
of higher CPU priority than the one currently running,
or if there was a possibility of priority inversion.
I/O Control also autonomously re-issued an I/O command
to a device that reported a parity error
if that device was capable of retrying the transfer
(e.g. MT controllers could backspace a block and re-read).
John Levine <johnl@taugh.com> writes:
On the smallest
360s, the channel was implemented in CPU microcode. When running fast
devices like disks the channel used so much of the CPU that the
program stalled, but it was worth it to be compatible with faster
machines. Even then, disk seeks or tape rewinds or reading cards or
printing on printers let the CPU do useful work while the channel and
device did its thing.
This sounds very much like hardware multi-threading to me: The CPU had separate state for the channel and used its hardware for doing the
channel stuff when there was I/O to do, while running the non-channel
stuff the rest of the time, all without OS-level context switching.
The barrel processors implemented in the CDC 6600's PPs are another
variant of the same principle from around the same time, but using the
same hardware for such different tasks is a new twist.
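(For illustration, a toy C model of the barrel/hardware-multithreading idea:
several program-counter/register contexts share one execution engine and are
stepped round-robin, so channel work and program work interleave without any
OS context switch. The context count and fields are arbitrary; no real
machine's encoding is implied.)

#include <stdint.h>

#define NCTX 10               /* e.g. the 6600 had 10 PP contexts */

struct context {
    uint16_t pc;              /* per-thread program counter        */
    uint16_t acc;             /* per-thread accumulator (stand-in) */
};

struct barrel {
    struct context ctx[NCTX];
    int current;              /* whose turn it is this cycle */
};

/* One machine cycle: do one step of work on behalf of the current
   context, then rotate to the next one. */
void barrel_cycle(struct barrel *b)
{
    struct context *c = &b->ctx[b->current];

    c->pc++;                  /* stand-in for fetching and executing one
                                 instruction for this context */

    b->current = (b->current + 1) % NCTX;
}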
Interestingly, this is one development that has not been repeated in microprocessors AFAIK. If they did not want to spend hardware on a
separate DMA device, they just let the software use polling of the I/O device. For the 8086 and 68000, I guess that patents may have
discouraged adopting this idea; when the patents ran out, they had established an ecosystem with separate DMA devices. And of course for
the early RISCs there was no way to do that in microcode.
IIRC some microcomputers (IBM PC I think) had dedicated central DMA processors (but not on the CPU chip at first IIRC), but these fell
into disuse soon when the I/O devices that do lots of I/O (like disk controllers) included their own DMA circuits. Having the DMA on the
I/O device eliminates the overhead of first requiring the bus for
getting the data from the I/O device, and then another bus cycle for
storing it into memory (or the other way round).
- anton
Interestingly, this is one development that has not been repeated in microprocessors AFAIK. If they did not want to spend hardware on a
separate DMA device, they just let the software use polling of the I/O device. For the 8086 and 68000, I guess that patents may have
discouraged adopting this idea; when the patents ran out, they had established an ecosystem with separate DMA devices. And of course for
the early RISCs there was no way to do that in microcode.
An alternative is each device has its own DMA controller
which is just a couple of counters.
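(A C sketch of that "couple of counters" view of a per-device DMA engine: the
driver loads an address and a byte count, the device advances them as it
masters the bus, and an interrupt fires when the count reaches zero. Field
names and widths are invented for illustration, not taken from any device.)

#include <stdbool.h>
#include <stdint.h>

struct dma_channel {
    uint32_t addr;        /* next memory address to read or write */
    uint32_t count;       /* bytes remaining in the transfer      */
    bool     to_memory;   /* direction: device -> memory?         */
    bool     done;        /* set at end of transfer (interrupt)   */
};

/* Driver side: program the counters and let the device run. */
void dma_start(struct dma_channel *ch, uint32_t addr, uint32_t count,
               bool to_memory)
{
    ch->addr = addr;
    ch->count = count;
    ch->to_memory = to_memory;
    ch->done = false;
}

/* Hardware side, conceptually, for each word it moves on the bus. */
void dma_step(struct dma_channel *ch, uint32_t bytes_moved)
{
    ch->addr  += bytes_moved;
    ch->count -= bytes_moved;
    if (ch->count == 0)
        ch->done = true;     /* raise the end-of-transfer interrupt */
}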
On Wed, 03 Jul 2024 13:08:31 +0100
Bill Findlay <findlaybill@blueyonder.co.uk> wrote:
On 2 Jul 2024, MitchAlsup1 wrote
(in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
They all did DMA, interrupting only at the end of the transfer.
Slow devices accessed the core store for each character,
fast devices did so for each word.
This was mediated by one of the KDF9's many state machines,
I/O Control, which multiplexed core requests from devices
and interrupted the CPU at the end of a transfer
if the transfer had been initiated by a program
of higher CPU priority than the one currently running,
or if there was a possibility of priority inversion.
I/O Control also autonomously re-issued an I/O command
to a device that reported a parity error
if that device was capable of retrying the transfer
(e.g. MT controllers could backspace a block and re-read).
That sounds quite advanced.
But when I try to compare it with contemporaries, like the S/360 Model 65,
it appears that despite these advances the KDF9 was not competitive with a
maximally configured 65 because of a shortage of main memory.
mitchalsup@aol.com (MitchAlsup1) writes:
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
Jan1979, I had lots of use of an early engineering 4341 and was con'ed
into doing a (cdc6600) benchmark for national lab that was looking for
70 4341s for computer farm (sort of leading edge of the coming cluster supercomputing tsunami). Benchmark was fortran compute doing no I/O and executed with nothing else running.
4341: 36.21secs, 3031: 37.03secs, 158: 45.64secs
now integrated channel microcode ... 158 even with no I/O running was
still 45.64secs compared to the same hardware in 3031 but w/o channel microcode: 37.03secs.
Lynn Wheeler wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
Jan1979, I had lots of use of an early engineering 4341 and was
con'ed into doing a (cdc6600) benchmark for national lab that was
looking for 70 4341s for computer farm (sort of leading edge of the
coming cluster supercomputing tsunami). Benchmark was fortran
compute doing no I/O and executed with nothing else running.
4341: 36.21secs, 3031: 37.03secs, 158: 45.64secs
Do you have data on how the CDC 6600 did?
now integrated channel microcode ... 158 even with no I/O running
was still 45.64secs compared to the same hardware in 3031 but w/o
channel microcode: 37.03secs.
On Wed, 03 Jul 2024 08:44:01 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
An alternative is each device has its own DMA controller
which is just a couple of counters.
The main cost is not counters, but bus mastering logic.
For PCI it was non-trivial cost even as late as year 2000. For example,
PLX PCI to local bus bridges with bus mastering capability, like PCI
9080 costed non-trivially more than slave-only 9060.
Scott Lurndal wrote:
Thomas Koenig <tkoenig@netcologne.de> writes:
John Levine <johnl@taugh.com> schrieb:
The 709 introduced data channels in 1958 which allowed the CPU
to do other stuff while the channel did the I/O. Wikipedia says
the first I/O interrupt was on the NBS DYSEAC in 1954 but it's
hard to see how an I/O interrupt would be of much use before
channels. Once you had a channel, I/O buffering made sense,
have the channel read or write one area while you're working on
the other.
Not sure what you mean by "channel" in this context - hardware
channels like the /360 had, or any asynchronous I/O in general,
even without hardware support?
Sending the next character to a teletype after the user program
fills a buffer and waiting for the next interrupt to tell you it's
ready, without a busy loop, makes sense anyway.
Although in the mainframe era, most terminals were block-mode
rather than character-by-character, which reduced the interrupt
frequency on the host (often via a front-end data communications
processor) at the expense of more logic in the terminal device.
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
Michael S wrote:
On Wed, 03 Jul 2024 08:44:01 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
An alternative is each device has its own DMA controller
which is just a couple of counters.
The main cost is not counters, but bus mastering logic.
For PCI it was non-trivial cost even as late as year 2000. For example,
PLX PCI to local bus bridges with bus mastering capability, like PCI
9080 costed non-trivially more than slave-only 9060.
PCI is a different matter. I think they shot themselves in the foot.
That is because the PCI design used FAST TTL and was ridiculously
complex and had all sorts of unnecessary optional features like bridges.
To my eye the choice of FAST TTL looks wrong headed. They needed FAST
because they wanted to run at 33 MHz which is beyond LS TTL limit.
With a bus at 33 MHz and 4 bytes it superficially sounds like 133 MB/s.
But 33 MHz was too fast to decode or latch the address and data,
plus it multiplexes address and data, and took 5 cycles to do a
transfer.
So the bus actual data transfer rate was more like 133/5 = 26 MB/s.
I looked into PCI bus interface chips when it first came out and there
were just *TWO* manufacturers for them on the planet, and they charged $50,000 just to answer the phone, and you had to pay them to design a
custom chip for you even though it was supposed to be a standard design.
This all pushed the price of PCI cards way up so, for example,
an ISA bus modem card cost $50 but the exact same modem on PCI was $250.
No wonder most people stuck with ISA bus cards.
Whereas 74LS TTL was cheap and available from manufacturers everywhere.
I would have used 74LS TTL and done a straight 32-bit bus with no
options,
multiplexed address and data to keep down the connector pin count and
cost.
That could have run at 20 MHz which leaves 50 ns for address and data
decode and latch, and driven 8 bus loads with no bridges.
That gives 10 MT/s = 40 MB/s. Plus cheap and widely available.
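(The arithmetic in the two paragraphs above, spelled out; the 5-cycle and
2-cycle per-transfer figures are the post's own assumptions, not numbers from
a spec.)

#include <stdio.h>

/* effective bandwidth in MB/s for a multiplexed bus:
   clock (MHz) * width (bytes) / cycles per transfer */
static double eff_mb_per_s(double mhz, int bytes, int cycles_per_xfer)
{
    return mhz * bytes / cycles_per_xfer;
}

int main(void)
{
    printf("PCI, 33 MHz x 4 B, 5 cycles/transfer:    %.1f MB/s\n",
           eff_mb_per_s(33.0, 4, 5));    /* ~26 MB/s */
    printf("LS TTL, 20 MHz x 4 B, 2 cycles/transfer: %.1f MB/s\n",
           eff_mb_per_s(20.0, 4, 2));    /* 40 MB/s  */
    return 0;
}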
Michael S <already5chosen@yahoo.com> writes:
On Wed, 03 Jul 2024 13:08:31 +0100
Bill Findlay <findlaybill@blueyonder.co.uk> wrote:
On 2 Jul 2024, MitchAlsup1 wrote
(in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
They all did DMA, interrupting only at the end of the transfer.
Slow devices accessed the core store for each character,
fast devices did so for each word.
This was mediated by one of the KDF9's many state machines,
I/O Control, which multiplexed core requests from devices
and interrupted the CPU at the end of a transfer
if the transfer had been initiated by a program
of higher CPU priority than the one currently running,
or if there was a possibility of priority inversion.
I/O Control also autonomously re-issued an I/O command
to a device that reported a parity error
if that device was capable of retrying the transfer
(e.g. MT controllers could backspace a block and re-read).
That sounds quite advanced.
But when I try to compare with contemporaries, like S/360 Model 65,
it appears that despite advances KDF9 was not competitive to
maximally configured 65 because of shortage of main memory.
The contemporaneous Burroughs B3500 I/O subsystem
fully supported asynchronous DMA transfers with no
CPU intervention.
Scott Lurndal wrote:
Michael S <already5chosen@yahoo.com> writes:
On Wed, 03 Jul 2024 13:08:31 +0100
Bill Findlay <findlaybill@blueyonder.co.uk> wrote:
On 2 Jul 2024, MitchAlsup1 wrote
(in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
They all did DMA, interrupting only at the end of the transfer.
Slow devices accessed the core store for each character,
fast devices did so for each word.
This was mediated by one of the KDF9's many state machines,
I/O Control, which multiplexed core requests from devices
and interrupted the CPU at the end of a transfer
if the transfer had been initiated by a program
of higher CPU priority than the one currently running,
or if there was a possibility of priority inversion.
I/O Control also autonomously re-issued an I/O command
to a device that reported a parity error
if that device was capable of retrying the transfer
(e.g. MT controllers could backspace a block and re-read).
That sounds quite advanced.
But when I try to compare with contemporaries, like S/360 Model 65,
it appears that despite advances KDF9 was not competitive to
maximally configured 65 because of shortage of main memory.
The contemporaneous Burroughs B3500 I/O subsystem
fully supported asynchronous DMA transfers with no
CPU intervention.
snipped description
Yes, that is an example of the kind of thing to which I was referring
in my response to Mitch's post. A question. Was all of this pure
hardware, or was it microcoded?
IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
not granted until 1974. The patent is 488 pages long including 409
pages of figures, 130 columns of narrative text, and 91 claims.
https://patents.google.com/patent/US3812475A/en
John Levine <johnl@taugh.com> schrieb:
IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
not granted until 1974. The patent is 488 pages long including 409
pages of figures, 130 columns of narrative text, and 91 claims.
https://patents.google.com/patent/US3812475A/en
What a monster.
I've written long patents myself, but this one surely takes the
biscuit.
On Wed, 03 Jul 2024 13:34:39 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
Michael S wrote:
On Wed, 03 Jul 2024 08:44:01 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
An alternative is each device has its own DMA controller
which is just a couple of counters.
The main cost is not counters, but bus mastering logic.
For PCI it was non-trivial cost even as late as year 2000. For
example, PLX PCI to local bus bridges with bus mastering
capability, like PCI 9080 costed non-trivially more than slave-only
9060.
PCI is a different matter. I think they shot themselves in the foot.
That is because the PCI design used FAST TTL and was ridiculously
complex and had all sorts of unnecessary optional features like
bridges.
Bridges were needed for the high-end. How else would you go over 4 or
5 slots with the crappy edge connector of standard PCI? How else would you
go over 8-10 slots even with much much better Compact PCI connectors?
Bridges do not work very well in read direction, but in write direction
they do not impact performance at all.
To my eye the choice of FAST TTL looks wrong headed. They needed FAST
because they wanted to run at 33 MHz which is beyond LS TTL limit.
With a bus at 33 MHz and 4 bytes it superficially sounds like 133
MB/s. But 33 MHz was too fast to decode or latch the address and data,
plus it multiplexes address and data, and took 5 cycles to do a
transfer. So the bus actual data transfer rate was more like 133/5 =
26 MB/s.
But bursts work as advertised.
We designed many PCI 32b x 33 MHz boards that sustained over 90 MB/s in
host memory to device direction and over 100 MB/s in device to host
memory.
A few are still in production, although we will eventually move away from
this architecture for reasons unrelated to the system bus.
I looked into PCI bus interface chips when it first came out and there
were just *TWO* manufacturers for them on the planet, and they charged
$50,000 just to answer the phone, and you had to pay them to design a
custom chip for you even though it was supposed to be a standard
design. This all pushed the price of PCI cards way up so, for example,
an ISA bus modem card cost $50 but the exact same modem on PCI was
$250. No wonder most people stuck with ISA bus cards.
Sounds like very early days.
Whereas 74LS TTL was cheap and available from manufacturers
everywhere. I would have used 74LS TTL and done a straight 32-bit bus
with no options, multiplexed address and data to keep down the
connector pin count and cost. That could have run at 20 MHz which
leaves 50 ns for address and data decode and latch, and driven 8 bus
loads with no bridges. That gives 10 MT/s = 40 MB/s. Plus cheap and
widely available.
Note that nothing of what you wrote above has anything to do
with the difference between bus-mastering PCI devices and slave-only PCI
devices.
An Historical Perspective::
EricP wrote:
Michael S wrote:
On Wed, 03 Jul 2024 08:44:01 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
An alternative is each device has its own DMA controller
which is just a couple of counters.
The main cost is not counters, but bus mastering logic.
For PCI it was non-trivial cost even as late as year 2000. For example,
PLX PCI to local bus bridges with bus mastering capability, like PCI
9080 costed non-trivially more than slave-only 9060.
PCI is a different matter. I think they shot themselves in the foot.
That is because the PCI design used FAST TTL and was ridiculously
complex and had all sorts of unnecessary optional features like bridges.
I don't think it was as much "shot themselves in the foot" as it was
not looking forward enough. CPUs had just dropped from 5.0V to 3.3V
and few peripherals were going 3.3--yet.
There were no real "popcorn" parts on 3.3V. CMOS was gradually taking
over, but was "essentially" compatible voltage wise with TTL.
To my eye the choice of FAST TTL looks wrong headed. They needed FAST
because they wanted to run at 33 MHz which is beyond LS TTL limit.
Was also faster than popcorn CMOS of the day.
With a bus at 33 MHz and 4 bytes it superficially sounds like 133 MB/s.
But 33 MHz was too fast to decode or latch the address and data,
plus it multiplexes address and data, and took 5 cycles to do a
transfer.
So the bus actual data transfer rate was more like 133/5 = 26 MB/s.
Welcome to "back when computers were hard".
I looked into PCI bus interface chips when it first came out and there
were just *TWO* manufacturers for them on the planet, and they charged
$50,000 just to answer the phone, and you had to pay them to design a
custom chip for you even though it was supposed to be a standard design.
When PCs shipped in the thousands per month this was the way things were.
When PCs started to ship hundreds of thousands per month things changed (early-mid 90s).
This all pushed the price of PCI cards way up so, for example,
an ISA bus modem card cost $50 but the exact same modem on PCI was $250.
No wonder most people stuck with ISA bus cards.
Exacerbating the above.
Whereas 74LS TTL was cheap and available from manufacturers everywhere.
I would have used 74LS TTL and done a straight 32-bit bus with no
options,
multiplexed address and data to keep down the connector pin count and
cost.
That could have run at 20 MHz which leaves 50 ns for address and data
decode and latch, and driven 8 bus loads with no bridges.
That gives 10 MT/s = 40 MB/s. Plus cheap and widely available.
It would take "too many" TTL parts to implement a small form factor interface, so integration was needed.
This was back just after the "bus wars" era where many big players
were trying to grab control of the PC market with their proprietary
(and patented) next generation bus. PCI was hoped to resolve this
and provide that standard, open market, but failed because it didn't
address what card suppliers and customers wanted.
On Thu, 04 Jul 2024 13:33:36 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
This was back just after the "bus wars" era where many big players
were trying to grab control of the PC market with their proprietary
(and patented) next generation bus. PCI was hoped to resolve this
and provide that standard, open market, but failed because it didn't
address what card suppliers and customers wanted.
PCI didn't fail. It was a stunning success.
With the emergence of PCI, anything else either died on the spot (IBM
Microchannel, Compaq-backed EISA) or became a tiny high-cost niche (VME
and its offspring).
Thomas Koenig wrote:
John Levine <johnl@taugh.com> schrieb:
IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
not granted until 1974. The patent is 488 pages long including 409
pages of figures, 130 columns of narrative text, and 91 claims.
https://patents.google.com/patent/US3812475A/en
What a monster.
I've written long patents myself, but this one surely takes the
biscuit.
The amalgamation of the figures and the placement of the figures
via the figure placement "figure" enable one to directly implement
the device in logic.
MitchAlsup1 <mitchalsup@aol.com> schrieb:
Thomas Koenig wrote:
John Levine <johnl@taugh.com> schrieb:
IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
not granted until 1974. The patent is 488 pages long including 409
pages of figures, 130 columns of narrative text, and 91 claims.
https://patents.google.com/patent/US3812475A/en
What a monster.
I've written long patents myself, but this one surely takes the
biscuit.
The amalgamation of the figures and the placement of the figures
via the figure placement "figure" enable one to directly implement
the device in logic.
That is, of course, very nice.
But the sheer number of claims, 91, with around half of them
independent (but quite a few formulated as "in combination", so there
may have been some dependency on other claims hidden in there...)
must have taken the competition quite some time to figure out
what was actually covered, and if their own designs fell under
that patent or not.
And then it was granted after ~ 20 years, and continued to be
valid for another ~ 20 - US patent law used to be weird.
According to Thomas Koenig <tkoenig@netcologne.de>:
IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but not granted until 1974. The patent is 488 pages long including 409
pages of figures, 130 columns of narrative text, and 91 claims.
https://patents.google.com/patent/US3812475A/en
But the sheer number of claims, 91, with around half of them independent (but quite a few formulated as "in combination", so there
may have been some dependency on other claims hidden in there...)
must have taken the competition quite some time to figure out
what was actually covered, and if their own designs fell under
that patent or not.
And then it was granted after ~ 20 years, and continued to be
valid for another ~ 20 - US patent law used to be weird.
It is unusual for a patent to take that long without either the
inventor deliberately delaying it with endless amendments or it being classified, neither of which seems relevant here.
You can't challenge other people for violating a patent until it's
issued, and by 1974 channels were rather old news. I never heard of
IBM enforcing it. They probably put it in the patent pool they cross
licensed to other computer makers.
"IBM's Early Computers" says almost nothing about channels other than
that they were invented for the 709 and added to the last version of
the 705.
Michael S wrote:
On Thu, 04 Jul 2024 13:33:36 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
This was back just after the "bus wars" era where many big players
were trying to grab control of the PC market with their proprietary
(and patented) next generation bus. PCI was hoped to resolve this
and provide that standard, open market, but failed because it didn't
address what card suppliers and customers wanted.
PCI didn't fail. It was a stunning success.
With the emergence of PCI, anything else either died on the spot (IBM
Microchannel, Compaq-backed EISA) or became a tiny high-cost niche (VME
and its offspring).
Also vying for attention in the wars were Multibus I & II (pushed by Intel), Futurebus (pushed by DEC), IIRC Apple had its own derivative of Futurebus
but different (of course) so you had to buy Apple devices,
NuBus, FASTBUS, Q-Bus.
PCI failed from the view that it was supposed to replace the 16-bit ISA
bus on PC's, but because of PCI card cost customers demanded that motherboards continue to support ISA. Also plug-n-play made ISA board
support a lot easier. Motherboards had at least to support AGP,
the PCI variant for graphics cards so they couldn't eliminate PCI.
So PC motherboards and device manufacturers wound up supporting both.
Not what anyone planned or wanted.
Where PCI succeeded I suppose is, as you point out, outside the PC market
it killed off all the others. But that left those systems tied to the
higher cost cards, increasing their cost relative to PC's.
On Tue, 02 Jul 2024 17:36:50 -1000, Lynn Wheeler wrote:
When doing IBM's HA/CMP and working with major RDBMS vendors on cluster
scaleup in late 80s/early 90s, there was lots of references to POSIX
light-weight threads ...
Threads were all the rage in the 1990s. People were using them for everything. One language (Java) absorbed threading right into its core DNA (where is the locking API? Oh, it’s attached to the base “Object” type itself!).
People backed off a bit after that. Nowadays we see a revival of the “coroutine” idea, where preemption only happens at explicit “await” points. For non-CPU-intensive workloads, this is much easier to cope with.
... and asynchronous I/O for RDBMS (with no buffer copies) and the
RDBMS managing large record cache.
This is why POSIX has the disk-oriented "aio" API, for the diehard DBMS folks. Linux also added "io_uring", for high performance but not disk-specific I/O.
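(A bare-bones C example of the POSIX aio interface mentioned here: submit a
read, go do other work, then collect the result. Error handling is trimmed and
the file name is arbitrary; on Linux this typically links with -lrt.)

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    static char buf[4096];
    int fd = open("/etc/hostname", O_RDONLY);   /* any file will do */
    if (fd < 0) return 1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) return 1;    /* request queued, not yet done */

    /* ... the application (e.g. an RDBMS) can go manage its own
       record cache here while the read is in flight ... */

    while (aio_error(&cb) == EINPROGRESS)
        ;                                /* or block in aio_suspend() */

    ssize_t n = aio_return(&cb);         /* bytes read, or -1 on error */
    printf("read %zd bytes asynchronously\n", n);
    close(fd);
    return 0;
}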
John Levine <johnl@taugh.com> writes:
The 709 introduced data channels in 1958 which allowed the CPU to do
other stuff while the channel did the I/O. Wikipedia says the first
I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
an I/O interrupt would be of much use before channels. Once you had a
channel, I/O buffering made sense, have the channel read or write one
area while you're working on the other.
The day the CPU became faster than a teletype (or any other IO device
you care to name) interrupts became useful. Get an interrupt saying the teletype is ready, send a character, go back to work, repeat.
... Once you had a
channel, I/O buffering made sense, have the channel read or write one
area while you're working on the other.
The day the CPU became faster than a teletype (or any other IO device
you care to name) interrupts became useful. Get an interrupt saying the teletype is ready, send a character, go back to work, repeat.
By putting most of the logic into the printer controller, the 1403 was
not just faster, but only took a small fraction of the CPU so the
whole system could do more work to keep the printer printing.
other trivia: when I transferred to San Jose, I got to wander around
datacenters in silicon valley, including disk engineering & product test (bldg14&15) across the street. They were doing prescheduled, 7x24, stand-alone mainframe testing. They mentioned they had recently tried
MVS, but it had 15min mean-time-between-failure, requiring manual re-ipl/reboot in that environment. I offered to rewrite the I/O supervisor
to make it bullet-proof and never fail, enabling any amount of on-demand, concurrent testing (greatly improving productivity). Downside was they
would point their finger at me whenever they had a problem and I was
spending an increasing amount of time diagnosing their hardware problems.
According to Joe Pfeiffer <pfeiffer@cs.nmsu.edu>:
... Once you had a
channel, I/O buffering made sense, have the channel read or write one
area while you're working on the other.
The day the CPU became faster than a teletype (or any other IO device
you care to name) interrupts became useful. Get an interrupt saying the teletype is ready, send a character, go back to work, repeat.
That's certainly the model that DEC used in the PDP-1 and their other
minis. Lightweight interrupts and simple device controllers worked
for them. But the tradeoffs can be a lot more complicated.
Let us turn back to the late, not very lamented IBM 1130 mini. It
usually came with an 1132 printer which printed about 100
lines/minute. A drum rotated behind the paper with 48 rows of
characters, each row being all the same character. In front of the
paper was the ribbon and a row of solenoid driven hammers.
When the 1130 wanted to print a line, it started the printer, which
would then tell it what the upcoming character was on the drum. The
computer then had less than 10ms to scan the line of characters to be
printed and put a bit map saying which solenoids to fire into fixed
locations in low memory that the printer then fetched using DMA.
Repeat until all of the characters were printed, and tell the printer
to advance the paper.
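(A C sketch of the per-drum-row work just described: for the character
currently rotating into position, set a bit for every print column whose
character matches, leaving the hammer bitmap where the printer's DMA expects
it. The column count and layout here are illustrative, not the real 1130
low-core format.)

#include <stdint.h>
#include <string.h>

#define COLS 120                        /* print positions per line */

static uint8_t hammer_bits[COLS / 8];   /* stand-in for the fixed
                                           low-memory scan area      */

void build_hammer_bitmap(const char line[COLS], char drum_char)
{
    memset(hammer_bits, 0, sizeof hammer_bits);
    for (int col = 0; col < COLS; col++) {
        if (line[col] == drum_char)
            hammer_bits[col / 8] |= (uint8_t)(1u << (col % 8));
    }
    /* the real machine had to finish this well inside the <10 ms
       before that row of the drum reached the hammers */
}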
Given the modest speed of the 1130, while it was printing a line it
couldn't do anything else. But it was even worse than that. There were
two models of 1130, fast and slow, with the difference being a delay
circuit. The slow model couldn't produce the bitmaps fast enough, so
there was a "print mode" that disabled the delay circuit while it was printing. As you might expect, students quickly figured out how to put
their 1130s into print mode all the time.
The printer interrupted after a paper move was complete, giving the
computer some chance to compute the next line to print in the
meantime. To skip to the top of the next page or other paper motion,
it told the printer to start moving the paper, and which row of the
carriage control tape (look it up) to wait for a hole in. When the hole
came around, the printer interrupted the CPU which then told the
printer to stop the paper.
The other printer was a 1403 which had 300 and 600 LPM models. Its
print mechanism was sort of similar, a horizontal chain of characters spinning behind the paper, but that made the hammer management harder
since what character was at what position changed every character
time. But that wasn't the CPU's problem. The 1403 used its own unique character code probably related to the layout of the print chain, so
the CPU translated the line into printer code, stored the result in a
buffer, and then sent a command to the printer telling it to print the buffer. The printer printed, then interrupted, at which point the CPU
told it to either space one line or skip to row N in the carriage
control tape, again interrupting when done.
By putting most of the logic into the printer controller, the 1403 was
not just faster, but only took a small fraction of the CPU so the
whole system could do more work to keep the printer printing.
The point of this long anecdote is that you don't just want an
interrupt when the CPU is a little faster than the device. At least in
that era you wanted to offload as much work as possible so the CPU
could keep the device going and balance the speed of the CPU and
the devices.
As a final note, keep in mind when you look at the 400 page patent on
the 709's channel that the logic was built entirely out of vacuum
tubes, and was not a lot less complex than the computer to which it
was attached. A basic 709 rented for $10K/mo (about $100K now) and
each channel was $3600/mo ($37K now). But the speed improvement
was worth it.
John Levine <johnl@taugh.com> writes:
By putting most of the logic into the printer controller, the 1403
was not just faster, but only took a small fraction of the CPU so
the whole system could do more work to keep the printer printing.
360 "CKD DASD" and multi-track search trade-off.
As you posted below, the whole PDS search stuff could easily be a
disaster. Even with more modest sized PDSs, it was inefficient as
hell. Doing a linear search, and worse yet, doing it on a device that
was slower than main memory, and tying up the disk controller and
channel to do it. It wasn't even sort of addressed until the early
1990s with the "fast PDS search" feature in the 3990 controller. The searches still took the same amount of elapsed time, but the key field comparison was done in the controller and it only returned status when
it found a match (or end of the extent), which freed up the channel.
Things would have been much better if they simply used some sort of
"table of contents" or index at the start of the PDS, read it in, then
did an in memory search. Even on small memory machines, if you had a
small sized index block and used something like a B-tree of them, it
would have been faster.
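(A C sketch of that alternative: read the member directory into memory once
and binary-search it there, instead of having the channel rescan disk records
on every lookup. The entry layout is simplified; real PDS entries carry more
fields than this.)

#include <stddef.h>
#include <string.h>

struct dir_entry {
    char     name[8];        /* member name, blank padded          */
    unsigned track;          /* where the member starts (TTR-ish)  */
    unsigned record;
};

/* entries[] is assumed sorted by name, as PDS directories are. */
const struct dir_entry *find_member(const struct dir_entry *entries,
                                    size_t count, const char name[8])
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = memcmp(name, entries[mid].name, 8);
        if (cmp == 0)
            return &entries[mid];
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;             /* not found */
}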
According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
Things would have been much better if they simply used some sort of
"table of contents" or index at the start of the PDS, read it in,
then did an in memory search. Even on small memory machines, if
you had a small sized index block and used something like a B-tree
of them, it would have been faster.
I believe that's what they did with VSAM.
I believe that's what they did with VSAM.
Agreed in the sense that VSAM replaced ISAM, but, and I am getting
beyond my depth here, I wasn't aware that PDSs used ISAM. I had
thought they were a thing unto themselves. Please correct me if I am
wrong. In any event, PDSs in their original form lasted beyond the introduction of VSAM, or the PDS search assist functionality wouldn't
have been needed.
According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
I believe that's what they did with VSAM.
Agreed in the sense that VSAM replaced ISAM, but, and I am getting
beyond my depth here, I wasn't aware that PDSs used ISAM. I had
thought they were a thing unto themselves. Please correct me if I
am wrong. In any event, PDSs in their original form lasted beyond
the introduction of VSAM, or the PDS search assist functionality
wouldn't have been needed.
A PDS had a directory at the front followed by the members. The
directory had an entry per member with the name, the starting
location, and optional other stuff. The entries were in order by
member name, and packed into 256 byte records each of which had a
hardware key with the name of the last entry in the block. It searched
the PDS directory with the same kind of channel key search it did for
ISAM, leading to the performance issues Lynn described.
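(A rough software model of that on-disk layout and lookup, for illustration
only: 256-byte directory blocks, each carrying a hardware key equal to the
last member name in the block; the channel's key search stops at the first
block whose key is greater than or equal to the wanted name, and the entries
inside are then scanned. The exact entry fields and entries-per-block count
below are simplified guesses, not the real format.)

#include <stddef.h>
#include <string.h>

struct pds_entry {
    char          name[8];   /* member name, blank padded                  */
    unsigned char ttr[3];    /* relative track/record where member starts  */
    unsigned char info;      /* flags and user-data length (simplified)    */
};

struct pds_dir_block {
    char             key[8];     /* hardware key: last member name in block */
    unsigned short   used;       /* bytes of the 256-byte block in use      */
    struct pds_entry entry[20];  /* simplified: fixed-size entries          */
};

/* What the channel key search plus the CPU scan accomplish together,
   expressed as ordinary software. */
const struct pds_entry *pds_lookup(const struct pds_dir_block *blocks,
                                   size_t nblocks, const char name[8])
{
    for (size_t b = 0; b < nblocks; b++) {
        if (memcmp(blocks[b].key, name, 8) < 0)
            continue;                    /* key < name: keep searching */
        /* first block whose key >= name: the member is here or nowhere */
        for (size_t e = 0; e < 20; e++) {
            if (memcmp(blocks[b].entry[e].name, name, 8) == 0)
                return &blocks[b].entry[e];
        }
        return NULL;
    }
    return NULL;
}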
John Levine wrote:
According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
I believe that's what they did with VSAM.
Agreed in the sense that VSAM replaced ISAM, but, and I am getting
beyond my depth here, I wasn't aware that PDSs used ISAM. I had
thought they were a thing unto themselves. Please correct me if I
am wrong. In any event, PDSs in their original form lasted beyond
the introduction of VSAM, or the PDS search assist functionality
wouldn't have been needed.
A PDS had a directory at the front followed by the members. The
directory had an entry per member with the name, the starting
location, and optional other stuff. The entries were in order by
member name, and packed into 256 byte records each of which had a
hardware key with the name of the last entry in the block. It searched
the PDS directory with the same kind of channel key search it did for
ISAM, leading to the performance issues Lynn described.
Yes, I don't disagree with any of that. But I got the impression from
your previous posts that IBM had replaced the search key fields of a
PDS with some kind of VSAM (i.e. b-tree or such) variant, as they did
with ISAM. If that is true I had never heard about it.
Stephen Fuld <SFuld@alumni.cmu.edu.invalid> schrieb:
John Levine wrote:
According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
I believe that's what they did with VSAM.
Agreed in the sense that VSAM replaced ISAM, but, and I am getting
beyond my depth here, I wasn't aware that PDSs used ISAM. I had
thought they were a thing unto themselves. Please correct me if I
am wrong. In any event, PDSs in their original form lasted beyond
the introduction of VSAM, or the PDS search assist functionality
wouldn't have been needed.
A PDS had a directory at the front followed by the members. The
directory had an entry per member with the name, the starting
location, and optional other stuff. The entries were in order by
member name, and packed into 256 byte records each of which had a
hardware key with the name of the last entry in the block. It searched
the PDS directory with the same kind of channel key search it did for
ISAM, leading to the performance issues Lynn described.
Yes, I don't disagree with any of that. But I got the impression from
your previous posts that IBM had replaced the search key fields of a
PDS with some kind of VSAM (i.e. b-tree or such) variant, as they did
with ISAM. If that is true I had never heard about it.
They introduced PDSE, see https://www.ibm.com/docs/en/zos-basic-skills?topic=sets-what-is-pdse
It appears they fixed many of the problems with the original
design, but not all (why is the number of extents still limited?)
It also seems that RECFM=U for load modules is no longer supported,
you have to do something different.
Thomas Koenig <tkoenig@netcologne.de> schrieb:
Stephen Fuld <SFuld@alumni.cmu.edu.invalid> schrieb:
John Levine wrote:
According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
I believe that's what they did with VSAM.
Agreed in the sense that VSAM replaced ISAM, but, and I am getting
beyond my depth here, I wasn't aware that PDSs used ISAM. I had
thought they were a thing unto themselves. Please correct me if I
am wrong. In any event, PDSs in their original form lasted beyond
the introduction of VSAM, or the PDS search assist functionality
wouldn't have been needed.
A PDS had a directory at the front followed by the members. The
directory had an entry per member with the name, the starting
location, and optional other stuff. The entries were in order by
member name, and packed into 256 byte records each of which had a
hardware key with the name of the last entry in the block. It searched
the PDS directory with the same kind of channel key search it did for
ISAM, leading to the performance issues Lynn described.
Yes, I don't disagree with any of that. But I got the impression
from your previous posts that IBM had replaced the search key
fields of a PDS with some kind of VSAM (i.e. b-tree or such)
variant, as they did with ISAM. If that is true I had never heard
about it.
They introduced PDSE, see https://www.ibm.com/docs/en/zos-basic-skills?topic=sets-what-is-pdse
It appears they fixed many of the problems with the original
design, but not all (why is the number of extents still limited?)
It also seems that RECFM=U for load modules is no longer supported,
you have to do something different.
Reading
it seems IBM really messed up PDSE the first time around,
introducing size limits on members which were not present in the
original PDS. Seems somebody didn't take backwards compatibility
too seriously, after all...
In theory, non-practicing patent licensors seem to make sense, similar
to ARM not making chips, but when the cost and risk to the single
patent holder is disproportionately small, patent trolling can be
profitable. (I suspect only part of the disparity comes from not
practicing; the U.S. legal system has significant weaknesses and
actual expertise is not easily communicated. My father, who worked for
AT&T, mentioned a lawyer who repeatedly sued AT&T, which settled out of
court because such was cheaper than defending even against a claim
without basis.)
The IMS group were complaining that RDBMS had twice the disk space (for
RDBMS index) and increased the number of disk I/Os by 4-5 times (for processing RDBMS index). Counter was that the RDBMS index significantly reduced the manual maintenance (compared to IMS).
Did IMS have a locate mode as well?
On Sat, 6 Jul 2024 06:15:56 -0000 (UTC), Stephen Fuld wrote:
As you posted below, the whole PDS search stuff could easily be a
disaster. Even with more modest sized PDSs, it was inefficient as
hell.
Would locate mode have helped with this?
On Thu, 04 Jul 2024 18:39:21 -0600, Joe Pfeiffer wrote:
Get an interrupt saying the teletype is ready, send a character, go back
to work, repeat.
But this is all still copying characters back and forth. What happened
to
the idea of locate mode?
On 7/31/2024 6:41 PM, Lawrence D'Oliveiro wrote:
On Sat, 6 Jul 2024 06:15:56 -0000 (UTC), Stephen Fuld wrote:
As you posted below, the whole PDS search stuff could easily be a
disaster. Even with more modest sized PDSs, it was inefficient as
hell.
Would locate mode have helped with this?
No. The problem was that the PDS search was a linear search of records
on the disk drive (i.e. typically multiple disk revolutions), and
furthermore, it required (until the fast PDS search came along) that the
host channel take action on each disk record checked, even the ones that didn't match, including resending the search argument to the disk
controller.
This has nothing to do with locate mode.
According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:
On 7/31/2024 6:41 PM, Lawrence D'Oliveiro wrote:
On Sat, 6 Jul 2024 06:15:56 -0000 (UTC), Stephen Fuld wrote:
As you posted below, the whole PDS search stuff could easily be a
disaster. Even with more modest sized PDSs, it was inefficient as
hell.
Would locate mode have helped with this?
No. The problem was that the PDS search was a linear search of records
on the disk drive (i.e. typically multiple disk revolutions), and
furthermore, it required (until the fast PDS search came along) that the
host channel take action on each disk record checked, even the ones that
didn't match, including resending the search argument to the disk
controller.
This has nothing to do with locate mode.
He's made it painfully clear that he doesn't understand the relative
speed of CPUs and disks, or the costs of multiple I/O buffers on
systems with small memories, particularly back in the 1960s when this
stuff was being designed. Nor how records in COBOL data divisions were designed so implementations could read and write file records directly
from and to the buffers, and the IOCS of the era enabled that in COBOL
and other languages.
Perhaps this would be a good time to stop taking the bait.
I thought
this silly argument was over a month ago.