John Levine <johnl@taugh.com> schrieb:
The 709 introduced data channels in 1958 which allowed the CPU to do
other stuff while the channel did the I/O. Wikipedia says the first
I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
an I/O interrupt would be of much use before channels. Once you had a
channel, I/O buffering made sense, have the channel read or write one
area while you're working on the other.
Not sure what you mean by "channel" in this context - hardware
channels like the /360 had, or any asynchronous I/O in general,
even without hardware support?
Sending the next character to a teletype after the user program
fills a buffer and waiting for the next interrupt to tell you it's
ready, without a busy loop, makes sense anyway.
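(A minimal C sketch of the interrupt-per-character scheme described above: the
program queues output into a ring buffer with no busy loop, and a transmit-ready
interrupt drains it. The handler and the device register access are hypothetical
placeholders, not any real teletype interface.)

#include <stddef.h>

#define TTY_BUFSIZE 256

static volatile char   tty_buf[TTY_BUFSIZE];  /* output ring buffer   */
static volatile size_t tty_head, tty_tail;    /* producer / consumer  */

/* Called by the user program: queue a character and return at once. */
void tty_putc(char c)
{
    tty_buf[tty_head % TTY_BUFSIZE] = c;
    tty_head++;
    /* a real driver would also kick the device here if it is idle,
       so that the first transmit-ready interrupt arrives */
}

/* Interrupt handler: the teletype says it is ready for the next character. */
void tty_tx_interrupt(void)
{
    if (tty_tail != tty_head) {
        /* write tty_buf[tty_tail % TTY_BUFSIZE] to the (hypothetical)
           device data register */
        tty_tail++;
    }
    /* otherwise nothing is pending; the device stays idle until the
       next tty_putc() restarts it */
}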
Thomas Koenig <tkoenig@netcologne.de> writes:
John Levine <johnl@taugh.com> schrieb:
The 709 introduced data channels in 1958 which allowed the CPU to do
other stuff while the channel did the I/O. Wikipedia says the first
I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
an I/O interrupt would be of much use before channels. Once you had a
channel, I/O buffering made sense, have the channel read or write one
area while you're working on the other.
Not sure what you mean by "channel" in this context - hardware
channels like the /360 had, or any asynchronous I/O in general,
even without hardware support?
Sending the next character to a teletype after the user program
fills a buffer and waiting for the next interrupt to tell you it's
ready, without a busy loop, makes sense anyway.
Although in the mainframe era, most terminals were block-mode
rather than character-by-character, which reduced the interrupt
frequency on the host (often via a front-end data communications
processor) at the expense of more logic in the terminal device.
After our recent silly arguments about locate vs move mode I/O, I got to thinking about what a computer needs for locate mode even to be
interesting.
John Levine <johnl@taugh.com> schrieb:
The 709 introduced data channels in 1958 which allowed the CPU to do
other stuff while the channel did the I/O. Wikipedia says the first
I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
an I/O interrupt would be of much use before channels. Once you had a
channel, I/O buffering made sense, have the channel read or write one
area while you're working on the other.
Not sure what you mean by "channel" in this context - hardware
channels like the /360 had, or any asynchronous I/O in general,
even without hardware support?
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
When doing IBM's HA/CMP and working with major RDBMS vendors on cluster scaleup in late 80s/early 90s, there was lots of references to POSIX light-weight threads ...
... and asynchronous I/O for RDBMS (with no buffer copies) and the
RDBMS managing large record cache.
On the smallest
360s, the channel was implemented in CPU microcode. When running fast
devices like disks the channel used so much of the CPU that the
program stalled, but it was worth it to be compatible with faster
machines. Even then, disk seeks or tape rewinds or reading cards or
printing on printers let the CPU do useful work while the channel and
device did its thing.
On 2 Jul 2024, MitchAlsup1 wrote
(in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
They all did DMA, interrupting only at the end of the transfer.
Slow devices accessed the core store for each character,
fast devices did so for each word.
This was mediated by one of the KDF9's many state machines,
I/O Control, which multiplexed core requests from devices
and interrupted the CPU at the end of a transfer
if the transfer had been initiated by a program
of higher CPU priority than the one currently running,
or if there was a possibility of priority inversion.
I/O Control also autonomously re-issued an I/O command
to a device that reported a parity error
if that device was capable of retrying the transfer
(e.g. MT controllers could backspace a block and re-read).
John Levine <johnl@taugh.com> writes:
On the smallest
360s, the channel was implemented in CPU microcode. When running fast
devices like disks the channel used so much of the CPU that the
program stalled, but it was worth it to be compatible with faster
machines. Even then, disk seeks or tape rewinds or reading cards or
printing on printers let the CPU do useful work while the channel and
device did its thing.
This sounds very much like hardware multi-threading to me: The CPU had separate state for the channel and used its hardware for doing the
channel stuff when there was I/O to do, while running the non-channel
stuff the rest of the time, all without OS-level context switching.
The barrel processors implemented in the CDC 6600's PPs are another
variant of the same principle from around the same time, but using the
same hardware for such different tasks is a new twist.
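(For illustration, a toy C model of the barrel/hardware-multithreading idea:
several program-counter/register contexts share one execution engine and are
stepped round-robin, so channel work and program work interleave without any
OS context switch. The context count and fields are arbitrary; no real
machine's encoding is implied.)

#include <stdint.h>

#define NCTX 10               /* e.g. the 6600 had 10 PP contexts */

struct context {
    uint16_t pc;              /* per-thread program counter        */
    uint16_t acc;             /* per-thread accumulator (stand-in) */
};

struct barrel {
    struct context ctx[NCTX];
    int current;              /* whose turn it is this cycle */
};

/* One machine cycle: do one step of work on behalf of the current
   context, then rotate to the next one. */
void barrel_cycle(struct barrel *b)
{
    struct context *c = &b->ctx[b->current];

    c->pc++;                  /* stand-in for fetching and executing one
                                 instruction for this context */

    b->current = (b->current + 1) % NCTX;
}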
Interestingly, this is one development that has not been repeated in microprocessors AFAIK. If they did not want to spend hardware on a
separate DMA device, they just let the software use polling of the I/O device. For the 8086 and 68000, I guess that patents may have
discouraged adopting this idea; when the patents ran out, they had established an ecosystem with separate DMA devices. And of course for
the early RISCs there was no way to do that in microcode.
IIRC some microcomputers (IBM PC I think) had dedicated central DMA processors (but not on the CPU chip at first IIRC), but these fell
into disuse soon when the I/O devices that do lots of I/O (like disk controllers) included their own DMA circuits. Having the DMA on the
I/O device eliminates the overhead of first requiring the bus for
getting the data from the I/O device, and then another bus cycle for
storing it into memory (or the other way round).
- anton
Interestingly, this is one development that has not been repeated in microprocessors AFAIK. If they did not want to spend hardware on a
separate DMA device, they just let the software use polling of the I/O device. For the 8086 and 68000, I guess that patents may have
discouraged adopting this idea; when the patents ran out, they had established an ecosystem with separate DMA devices. And of course for
the early RISCs there was no way to do that in microcode.
An alternative is each device has its own DMA controller
which is just a couple of counters.
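(A C sketch of that "couple of counters" view of a per-device DMA engine: the
driver loads an address and a byte count, the device advances them as it
masters the bus, and an interrupt fires when the count reaches zero. Field
names and widths are invented for illustration, not taken from any device.)

#include <stdbool.h>
#include <stdint.h>

struct dma_channel {
    uint32_t addr;        /* next memory address to read or write */
    uint32_t count;       /* bytes remaining in the transfer      */
    bool     to_memory;   /* direction: device -> memory?         */
    bool     done;        /* set at end of transfer (interrupt)   */
};

/* Driver side: program the counters and let the device run. */
void dma_start(struct dma_channel *ch, uint32_t addr, uint32_t count,
               bool to_memory)
{
    ch->addr = addr;
    ch->count = count;
    ch->to_memory = to_memory;
    ch->done = false;
}

/* Hardware side, conceptually, for each word it moves on the bus. */
void dma_step(struct dma_channel *ch, uint32_t bytes_moved)
{
    ch->addr  += bytes_moved;
    ch->count -= bytes_moved;
    if (ch->count == 0)
        ch->done = true;     /* raise the end-of-transfer interrupt */
}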
On Wed, 03 Jul 2024 13:08:31 +0100
Bill Findlay <findlaybill@blueyonder.co.uk> wrote:
On 2 Jul 2024, MitchAlsup1 wrote
(in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
They all did DMA, interrupting only at the end of the transfer.
Slow devices accessed the core store for each character,
fast devices did so for each word.
This was mediated by one of the KDF9's many state machines,
I/O Control, which multiplexed core requests from devices
and interrupted the CPU at the end of a transfer
if the transfer had been initiated by a program
of higher CPU priority than the one currently running,
or if there was a possibility of priority inversion.
I/O Control also autonomously re-issued an I/O command
to a device that reported a parity error
if that device was capable of retrying the transfer
(e.g. MT controllers could backspace a block and re-read).
That sounds quite advanced.
But when I try to compare it with contemporaries, like the S/360 Model 65,
it appears that despite these advances the KDF9 was not competitive with a
maximally configured 65 because of a shortage of main memory.
mitchalsup@aol.com (MitchAlsup1) writes:
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
Jan1979, I had lots of use of an early engineering 4341 and was con'ed
into doing a (cdc6600) benchmark for national lab that was looking for
70 4341s for computer farm (sort of leading edge of the coming cluster supercomputing tsunami). Benchmark was fortran compute doing no I/O and executed with nothing else running.
4341: 36.21secs, 3031: 37.03secs, 158: 45.64secs
now integrated channel microcode ... 158 even with no I/O running was
still 45.64secs compared to the same hardware in 3031 but w/o channel microcode: 37.03secs.
Lynn Wheeler wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
Jan1979, I had lots of use of an early engineering 4341 and was
con'ed into doing a (cdc6600) benchmark for national lab that was
looking for 70 4341s for computer farm (sort of leading edge of the
coming cluster supercomputing tsunami). Benchmark was fortran
compute doing no I/O and executed with nothing else running.
4341: 36.21secs, 3031: 37.03secs, 158: 45.64secs
Do you have data on how the CDC 6600 did?
now integrated channel microcode ... 158 even with no I/O running
was still 45.64secs compared to the same hardware in 3031 but w/o
channel microcode: 37.03secs.
On Wed, 03 Jul 2024 08:44:01 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
An alternative is each device has its own DMA controller
which is just a couple of counters.
The main cost is not counters, but bus mastering logic.
For PCI it was non-trivial cost even as late as year 2000. For example,
PLX PCI to local bus bridges with bus mastering capability, like PCI
9080 costed non-trivially more than slave-only 9060.
Scott Lurndal wrote:
Thomas Koenig <tkoenig@netcologne.de> writes:
John Levine <johnl@taugh.com> schrieb:
The 709 introduced data channels in 1958 which allowed the CPU
to do other stuff while the channel did the I/O. Wikipedia says
the first I/O interrupt was on the NBS DYSEAC in 1954 but it's
hard to see how an I/O interrupt would be of much use before
channels. Once you had a channel, I/O buffering made sense,
have the channel read or write one area while you're working on
the other.
Not sure what you mean by "channel" in this context - hardware
channels like the /360 had, or any asynchronous I/O in general,
even without hardware support?
Sending the next character to a teletype after the user program
fills a buffer and waiting for the next interrupt to tell you it's
ready, without a busy loop, makes sense anyway.
Although in the mainframe era, most terminals were block-mode
rather than character-by-character, which reduced the interrupt
frequency on the host (often via a front-end data communications
processor) at the expense of more logic in the terminal device.
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
Michael S wrote:
On Wed, 03 Jul 2024 08:44:01 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
An alternative is each device has its own DMA controller
which is just a couple of counters.
The main cost is not counters, but bus mastering logic.
For PCI it was non-trivial cost even as late as year 2000. For example,
PLX PCI to local bus bridges with bus mastering capability, like PCI
9080 costed non-trivially more than slave-only 9060.
PCI is a different matter. I think they shot themselves in the foot.
That is because the PCI design used FAST TTL and was ridiculously
complex and had all sorts of unnecessary optional features like bridges.
To my eye the choice of FAST TTL looks wrong headed. They needed FAST
because they wanted to run at 33 MHz which is beyond LS TTL limit.
With a bus at 33 MHz and 4 bytes it superficially sounds like 133 MB/s.
But 33 MHz was too fast to decode or latch the address and data,
plus it multiplexes address and data, and took 5 cycles to do a
transfer.
So the bus actual data transfer rate was more like 133/5 = 26 MB/s.
I looked into PCI bus interface chips when it first came out and there
were just *TWO* manufacturers for them on the planet, and they charged $50,000 just to answer the phone, and you had to pay them to design a
custom chip for you even though it was supposed to be a standard design.
This all pushed the price of PCI cards way up so, for example,
an ISA bus modem card cost $50 but the exact same modem on PCI was $250.
No wonder most people stuck with ISA bus cards.
Whereas 74LS TTL was cheap and available from manufacturers everywhere.
I would have used 74LS TTL and done a straight 32-bit bus with no
options,
multiplexed address and data to keep down the connector pin count and
cost.
That could have run at 20 MHz which leaves 50 ns for address and data
decode and latch, and driven 8 bus loads with no bridges.
That gives 10 MT/s = 40 MB/s. Plus cheap and widely available.
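(The arithmetic in the two paragraphs above, spelled out; the 5-cycle and
2-cycle per-transfer figures are the post's own assumptions, not numbers from
a spec.)

#include <stdio.h>

/* effective bandwidth in MB/s for a multiplexed bus:
   clock (MHz) * width (bytes) / cycles per transfer */
static double eff_mb_per_s(double mhz, int bytes, int cycles_per_xfer)
{
    return mhz * bytes / cycles_per_xfer;
}

int main(void)
{
    printf("PCI, 33 MHz x 4 B, 5 cycles/transfer:    %.1f MB/s\n",
           eff_mb_per_s(33.0, 4, 5));    /* ~26 MB/s */
    printf("LS TTL, 20 MHz x 4 B, 2 cycles/transfer: %.1f MB/s\n",
           eff_mb_per_s(20.0, 4, 2));    /* 40 MB/s  */
    return 0;
}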
Michael S <already5chosen@yahoo.com> writes:
On Wed, 03 Jul 2024 13:08:31 +0100
Bill Findlay <findlaybill@blueyonder.co.uk> wrote:
On 2 Jul 2024, MitchAlsup1 wrote
(in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
They all did DMA, interrupting only at the end of the transfer.
Slow devices accessed the core store for each character,
fast devices did so for each word.
This was mediated by one of the KDF9's many state machines,
I/O Control, which multiplexed core requests from devices
and interrupted the CPU at the end of a transfer
if the transfer had been initiated by a program
of higher CPU priority than the one currently running,
or if there was a possibility of priority inversion.
I/O Control also autonomously re-issued an I/O command
to a device that reported a parity error
if that device was capable of retrying the transfer
(e.g. MT controllers could backspace a block and re-read).
That sounds quite advanced.
But when I try to compare with contemporaries, like S/360 Model 65,
it appears that despite advances KDF9 was not competitive to
maximally configured 65 because of shortage of main memory.
The contemporaneous Burroughs B3500 I/O subsystem
fully supported asynchronous DMA transfers with no
CPU intervention.
Scott Lurndal wrote:
Michael S <already5chosen@yahoo.com> writes:
On Wed, 03 Jul 2024 13:08:31 +0100
Bill Findlay <findlaybill@blueyonder.co.uk> wrote:
On 2 Jul 2024, MitchAlsup1 wrote
(in article<8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>):
Once you recognize that I/O is eating up your precious CPU, and you
get to the point you are willing to expend another fixed programmed
device to make the I/O burden manageable, then you basically have
CDC 6600 Peripheral Processors, programmed in code or microcode.
The EE KDF9 (~1960) allowed up to 16 connected devices at a time.
They all did DMA, interrupting only at the end of the transfer.
Slow devices accessed the core store for each character,
fast devices did so for each word.
This was mediated by one of the KDF9's many state machines,
I/O Control, which multiplexed core requests from devices
and interrupted the CPU at the end of a transfer
if the transfer had been initiated by a program
of higher CPU priority than the one currently running,
or if there was a possibility of priority inversion.
I/O Control also autonomously re-issued an I/O command
to a device that reported a parity error
if that device was capable of retrying the transfer
(e.g. MT controllers could backspace a block and re-read).
That sounds quite advanced.
But when I try to compare with contemporaries, like S/360 Model 65,
it appears that despite advances KDF9 was not competitive to
maximally configured 65 because of shortage of main memory.
The contemporaneous Burroughs B3500 I/O subsystem
fully supported asynchronous DMA transfers with no
CPU intervention.
snipped description
Yes, that is an example of the kind of thing to which I was referring
in my response to Mitch's post. A question. Was all of this pure
hardware, or was it microcoded?
IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
not granted until 1974. The patent is 488 pages long including 409
pages of figures, 130 columns of narrative text, and 91 claims.
https://patents.google.com/patent/US3812475A/en
John Levine <johnl@taugh.com> schrieb:
IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
not granted until 1974. The patent is 488 pages long including 409
pages of figures, 130 columns of narrative text, and 91 claims.
https://patents.google.com/patent/US3812475A/en
What a monster.
I've written long patents myself, but this one surely takes the
biscuit.
On Wed, 03 Jul 2024 13:34:39 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
Michael S wrote:
On Wed, 03 Jul 2024 08:44:01 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
An alternative is each device has its own DMA controller
which is just a couple of counters.
The main cost is not counters, but bus mastering logic.
For PCI it was non-trivial cost even as late as year 2000. For
example, PLX PCI to local bus bridges with bus mastering
capability, like PCI 9080 costed non-trivially more than slave-only
9060.
PCI is a different matter. I think they shot themselves in the foot.
That is because the PCI design used FAST TTL and was ridiculously
complex and had all sorts of unnecessary optional features like
bridges.
Bridges were needed for the high-end. How else would you go over 4 or
5 slots with the crappy edge connector of standard PCI? How else would you
go over 8-10 slots even with much much better Compact PCI connectors?
Bridges do not work very well in read direction, but in write direction
they do not impact performance at all.
To my eye the choice of FAST TTL looks wrong headed. They needed FAST
because they wanted to run at 33 MHz which is beyond LS TTL limit.
With a bus at 33 MHz and 4 bytes it superficially sounds like 133
MB/s. But 33 MHz was too fast to decode or latch the address and data,
plus it multiplexes address and data, and took 5 cycles to do a
transfer. So the bus actual data transfer rate was more like 133/5 =
26 MB/s.
But bursts work as advertised.
We designed many PCI 32b x 33 MHz boards that sustained over 90 MB/s in
host memory to device direction and over 100 MB/s in device to host
memory.
A few are still in production, although we will eventually move away from
this architecture for reasons unrelated to the system bus.
I looked into PCI bus interface chips when it first came out and there
were just *TWO* manufacturers for them on the planet, and they charged
$50,000 just to answer the phone, and you had to pay them to design a
custom chip for you even though it was supposed to be a standard
design. This all pushed the price of PCI cards way up so, for example,
an ISA bus modem card cost $50 but the exact same modem on PCI was
$250. No wonder most people stuck with ISA bus cards.
Sounds like very early days.
Whereas 74LS TTL was cheap and available from manufacturers
everywhere. I would have used 74LS TTL and done a straight 32-bit bus
with no options, multiplexed address and data to keep down the
connector pin count and cost. That could have run at 20 MHz which
leaves 50 ns for address and data decode and latch, and driven 8 bus
loads with no bridges. That gives 10 MT/s = 40 MB/s. Plus cheap and
widely available.
Note that nothing of what you wrote above has anything to do
with the difference between bus-mastering PCI devices and slave-only PCI
devices.
An Historical Perspective::
EricP wrote:
Michael S wrote:
On Wed, 03 Jul 2024 08:44:01 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
An alternative is each device has its own DMA controller
which is just a couple of counters.
The main cost is not counters, but bus mastering logic.
For PCI it was non-trivial cost even as late as year 2000. For example,
PLX PCI to local bus bridges with bus mastering capability, like PCI
9080 costed non-trivially more than slave-only 9060.
PCI is a different matter. I think they shot themselves in the foot.
That is because the PCI design used FAST TTL and was ridiculously
complex and had all sorts of unnecessary optional features like bridges.
I don't think it was as much "shot themselves in the foot" as it was
not looking forward enough. CPUs had just dropped from 5.0V to 3.3V
and few peripherals were going 3.3--yet.
There were no real "popcorn" parts on 3.3V. CMOS was gradually taking
over, but was "essentially" compatible voltage wise with TTL.
To my eye the choice of FAST TTL looks wrong headed. They needed FAST
because they wanted to run at 33 MHz which is beyond LS TTL limit.
Was also faster than popcorn CMOS of the day.
With a bus at 33 MHz and 4 bytes it superficially sounds like 133 MB/s.
But 33 MHz was too fast to decode or latch the address and data,
plus it multiplexes address and data, and took 5 cycles to do a
transfer.
So the bus actual data transfer rate was more like 133/5 = 26 MB/s.
Welcome to "back when computers were hard".
I looked into PCI bus interface chips when it first came out and there
were just *TWO* manufacturers for them on the planet, and they charged
$50,000 just to answer the phone, and you had to pay them to design a
custom chip for you even though it was supposed to be a standard design.
When PCs shipped in the thousands per month this was the way things were.
When PCs started to ship hundreds of thousands per month things changed (early-mid 90s).
This all pushed the price of PCI cards way up so, for example,
an ISA bus modem card cost $50 but the exact same modem on PCI was $250.
No wonder most people stuck with ISA bus cards.
Exacerbating the above.
Whereas 74LS TTL was cheap and available from manufacturers everywhere.
I would have used 74LS TTL and done a straight 32-bit bus with no
options,
multiplexed address and data to keep down the connector pin count and
cost.
That could have run at 20 MHz which leaves 50 ns for address and data
decode and latch, and driven 8 bus loads with no bridges.
That gives 10 MT/s = 40 MB/s. Plus cheap and widely available.
It would take "too many" TTL parts to implement a small form factor interface, so integration was needed.
This was back just after the "bus wars" era where many big players
were trying to grab control of the PC market with their proprietary
(and patented) next generation bus. PCI was hoped to resolve this
and provide that standard, open market, but failed because it didn't
address what card suppliers and customers wanted.
On Thu, 04 Jul 2024 13:33:36 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
This was back just after the "bus wars" era where many big players
were trying to grab control of the PC market with their proprietary
(and patented) next generation bus. PCI was hoped to resolve this
and provide that standard, open market, but failed because it didn't
address what card suppliers and customers wanted.
PCI didn't fail. It was a stunning success.
With the emergence of PCI, anything else either died on the spot (IBM
Microchannel, Compaq-backed EISA) or became a tiny high-cost niche (VME
and its offspring).
Thomas Koenig wrote:
John Levine <johnl@taugh.com> schrieb:
IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
not granted until 1974. The patent is 488 pages long including 409
pages of figures, 130 columns of narrative text, and 91 claims.
https://patents.google.com/patent/US3812475A/en
What a monster.
I've written long patents myself, but this one surely takes the
biscuit.
The amalgamation of the figures and the placement of the figures
via the figure placement "figure" enable one to directly implement
the device in logic.
MitchAlsup1 <mitchalsup@aol.com> schrieb:
Thomas Koenig wrote:
John Levine <johnl@taugh.com> schrieb:
IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but
not granted until 1974. The patent is 488 pages long including 409
pages of figures, 130 columns of narrative text, and 91 claims.
https://patents.google.com/patent/US3812475A/en
What a monster.
I've written long patents myself, but this one surely takes the
biscuit.
The amalgamation of the figures and the placement of the figures
via the figure placement "figure" enable one to directly implement
the device in logic.
That is, of course, very nice.
But the sheer number of claims, 91, with around half of them
independent (but quite a few formulated as "in combination", so there
may have been some dependency on other claims hidden in there...)
must have taken the competition quite some time to figure out
what was actually covered, and if their own designs fell under
that patent or not.
And then it was granted after ~ 20 years, and continued to be
valid for another ~ 20 - US patent law used to be weird.
According to Thomas Koenig <tkoenig@netcologne.de>:
IBM patented the 709's channel: US Patent 3,812,475 filed in 1957 but not granted until 1974. The patent is 488 pages long including 409
pages of figures, 130 columns of narrative text, and 91 claims.
https://patents.google.com/patent/US3812475A/en
But the sheer number of claims, 91, with around half of them independent (but quite a few formulated as "in combination", so there
may have been some dependency on other claims hidden in there...)
must have taken the competition quite some time to figure out
what was actually covered, and if their own designs fell under
that patent or not.
And then it was granted after ~ 20 years, and continued to be
valid for another ~ 20 - US patent law used to be weird.
It is unusual for a patent to take that long without either the
inventor deliberately delaying it with endless amendments or it being classified, neither of which seems relevant here.
You can't challenge other people for violating a patent until it's
issued, and by 1974 channels were rather old news. I never heard of
IBM enforcing it. They probably put it in the patent pool they cross
licensed to other computer makers.
"IBM's Early Computers" says almost nothing about channels other than
that they were invented for the 709 and added to the last version of
the 705.
Michael S wrote:
On Thu, 04 Jul 2024 13:33:36 -0400
EricP <ThatWouldBeTelling@thevillage.com> wrote:
This was back just after the "bus wars" era where many big players
were trying to grab control of the PC market with their proprietary
(and patented) next generation bus. PCI was hoped to resolve this
and provide that standard, open market, but failed because it didn't
address what card suppliers and customers wanted.
PCI didn't fail. It was a stunning success.
With the emergence of PCI, anything else either died on the spot (IBM
Microchannel, Compaq-backed EISA) or became a tiny high-cost niche (VME
and its offspring).
Also vying for attention in the wars were Multibus I & II (pushed by Intel), Futurebus (pushed by DEC), IIRC Apple had its own derivative of Futurebus
but different (of course) so you had to buy Apple devices,
NuBus, FASTBUS, Q-Bus.
PCI failed from the view that it was supposed to replace the 16-bit ISA
bus on PC's, but because of PCI card cost customers demanded that motherboards continue to support ISA. Also plug-n-play made ISA board
support a lot easier. Motherboards had at least to support AGP,
the PCI variant for graphics cards so they couldn't eliminate PCI.
So PC motherboards and device manufacturers wound up supporting both.
Not what anyone planned or wanted.
Where PCI succeeded I suppose is, as you point out, outside the PC market
it killed off all the others. But that left those systems tied to the
higher cost cards, increasing their cost relative to PC's.
On Tue, 02 Jul 2024 17:36:50 -1000, Lynn Wheeler wrote:
When doing IBM's HA/CMP and working with major RDBMS vendors on cluster
scaleup in late 80s/early 90s, there was lots of references to POSIX
light-weight threads ...
Threads were all the rage in the 1990s. People were using them for everything. One language (Java) absorbed threading right into its core DNA (where is the locking API? Oh, it’s attached to the base “Object” type itself!).
People backed off a bit after that. Nowadays we see a revival of the “coroutine” idea, where preemption only happens at explicit “await” points. For non-CPU-intensive workloads, this is much easier to cope with.
... and asynchronous I/O for RDBMS (with no buffer copies) and the
RDBMS managing large record cache.
This is why POSIX has the disk-oriented "aio" API, for the diehard DBMS folks. Linux also added "io_uring", for high performance but not disk-specific I/O.
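(A bare-bones C example of the POSIX aio interface mentioned here: submit a
read, go do other work, then collect the result. Error handling is trimmed and
the file name is arbitrary; on Linux this typically links with -lrt.)

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    static char buf[4096];
    int fd = open("/etc/hostname", O_RDONLY);   /* any file will do */
    if (fd < 0) return 1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) return 1;    /* request queued, not yet done */

    /* ... the application (e.g. an RDBMS) can go manage its own
       record cache here while the read is in flight ... */

    while (aio_error(&cb) == EINPROGRESS)
        ;                                /* or block in aio_suspend() */

    ssize_t n = aio_return(&cb);         /* bytes read, or -1 on error */
    printf("read %zd bytes asynchronously\n", n);
    close(fd);
    return 0;
}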
John Levine <johnl@taugh.com> writes:
The 709 introduced data channels in 1958 which allowed the CPU to do
other stuff while the channel did the I/O. Wikipedia says the first
I/O interrupt was on the NBS DYSEAC in 1954 but it's hard to see how
an I/O interrupt would be of much use before channels. Once you had a
channel, I/O buffering made sense, have the channel read or write one
area while you're working on the other.
The day the CPU became faster than a teletype (or any other IO device
you care to name) interrupts became useful. Get an interrupt saying the teletype is ready, send a character, go back to work, repeat.
... Once you had a
channel, I/O buffering made sense, have the channel read or write one
area while you're working on the other.
The day the CPU became faster than a teletype (or any other IO device
you care to name) interrupts became useful. Get an interrupt saying the teletype is ready, send a character, go back to work, repeat.
By putting most of the logic into the printer controller, the 1403 was
not just faster, but only took a small fraction of the CPU so the
whole system could do more work to keep the printer printing.
other trivia: when I transferred to San Jose, I got to wander around
datacenters in silicon valley, including disk engineering & product test (bldg14&15) across the street. They were doing prescheduled, 7x24, stand-alone mainframe testing. They mentioned they had recently tried
MVS, but it had 15min mean-time-between-failure, requiring manual re-ipl/reboot in that environment. I offered to rewrite the I/O supervisor
to make it bullet-proof and never fail, enabling any amount of on-demand, concurrent testing (greatly improving productivity). Downside was they
would point their finger at me whenever they had a problem and I was
spending an increasing amount of time diagnosing their hardware problems.
According to Joe Pfeiffer <pfeiffer@cs.nmsu.edu>:
... Once you had a
channel, I/O buffering made sense, have the channel read or write one
area while you're working on the other.
The day the CPU became faster than a teletype (or any other IO device
you care to name) interrupts became useful. Get an interrupt saying the teletype is ready, send a character, go back to work, repeat.
That's certainly the model that DEC used in the PDP-1 and their other
minis. Lightweight interrupts and simple device controllers worked
for them. But the tradeoffs can be a lot more complicated.
Let us turn back to the late, not very lamented IBM 1130 mini. It
usually came with an 1132 printer which printed about 100
lines/minute. A drum rotated behind the paper with 48 rows of
characters, each row being all the same character. In front of the
paper was the ribbon and a row of solenoid driven hammers.
When the 1130 wanted to print a line, it started the printer, which
would then tell it what the upcoming character was on the drum. The
computer then had less than 10ms to scan the line of characters to be
printed and put a bit map saying which solenoids to fire into fixed
locations in low memory that the printer then fetched using DMA.
Repeat until all of the characters were printed, and tell the printer
to advance the paper.
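(A C sketch of the per-drum-row work just described: for the character
currently rotating into position, set a bit for every print column whose
character matches, leaving the hammer bitmap where the printer's DMA expects
it. The column count and layout here are illustrative, not the real 1130
low-core format.)

#include <stdint.h>
#include <string.h>

#define COLS 120                        /* print positions per line */

static uint8_t hammer_bits[COLS / 8];   /* stand-in for the fixed
                                           low-memory scan area      */

void build_hammer_bitmap(const char line[COLS], char drum_char)
{
    memset(hammer_bits, 0, sizeof hammer_bits);
    for (int col = 0; col < COLS; col++) {
        if (line[col] == drum_char)
            hammer_bits[col / 8] |= (uint8_t)(1u << (col % 8));
    }
    /* the real machine had to finish this well inside the <10 ms
       before that row of the drum reached the hammers */
}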
Given the modest speed of the 1130, while it was printing a line it
couldn't do anything else. But it was even worse than that. There were
two models of 1130, fast and slow, with the difference being a delay
circuit. The slow model couldn't produce the bitmaps fast enough, so
there was a "print mode" that disabled the delay circuit while it was printing. As you might expect, students quickly figured out how to put
their 1130s into print mode all the time.
The printer interrupted after a paper move was complete, giving the
computer some chance to compute the next line to print in the
meantime. To skip to the top of the next page or other paper motion,
it told the printer to start moving the paper, and which row of the
carriage control tape (look it up) to wait for a hole in. When the hole
came around, the printer interrupted the CPU which then told the
printer to stop the paper.
The other printer was a 1403 which had 300 and 600 LPM models. Its
print mechanism was sort of similar, a horizontal chain of characters spinning behind the paper, but that made the hammer management harder
since what character was at what position changed every character
time. But that wasn't the CPU's problem. The 1403 used its own unique character code probably related to the layout of the print chain, so
the CPU translated the line into printer code, stored the result in a
buffer, and then sent a command to the printer telling it to print the buffer. The printer printed, then interrupted, at which point the CPU
told it to either space one line or skip to row N in the carriage
control tape, again interrupting when done.
By putting most of the logic into the printer controller, the 1403 was
not just faster, but only took a small fraction of the CPU so the
whole system could do more work to keep the printer printing.
The point of this long anecdote is that you don't just want an
interrupt when the CPU is a little faster than the device. At least in
that era you wanted to offload as much work as possible so the CPU
could keep the device going and balance the speed of the CPU and
the devices.
As a final note, keep in mind when you look at the 400 page patent on
the 709's channel that the logic was built entirely out of vacuum
tubes, and was not a lot less complex than the computer to which it
was attached. A basic 709 rented for $10K/mo (about $100K now) and
each channel was $3600/mo ($37K now). But the speed improvement
was worth it.
John Levine <johnl@taugh.com> writes:
By putting most of the logic into the printer controller, the 1403
was not just faster, but only took a small fraction of the CPU so
the whole system could do more work to keep the printer printing.
360 "CKD DASD" and multi-track search trade-off.
As you posted below, the whole PDS search stuff could easily be a
disaster. Even with more modest sized PDSs, it was inefficient as
hell. Doing a linear search, and worse yet, doing it on a device that
was slower than main memory, and tying up the disk controller and
channel to do it. It wasn't even sort of addressed until the early
1990s with the "fast PDS search" feature in the 3990 controller. The searches still took the same amount of elapsed time, but the key field comparison was done in the controller and it only returned status when
it found a match (or end of the extent), which freed up the channel.
Things would have been much better if they simply used some sort of
"table of contents" or index at the start of the PDS, read it in, then
did an in memory search. Even on small memory machines, if you had a
small sized index block and used something like a B-tree of them, it
would have been faster.
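(A C sketch of that alternative: read the member directory into memory once
and binary-search it there, instead of having the channel rescan disk records
on every lookup. The entry layout is simplified; real PDS entries carry more
fields than this.)

#include <stddef.h>
#include <string.h>

struct dir_entry {
    char     name[8];        /* member name, blank padded          */
    unsigned track;          /* where the member starts (TTR-ish)  */
    unsigned record;
};

/* entries[] is assumed sorted by name, as PDS directories are. */
const struct dir_entry *find_member(const struct dir_entry *entries,
                                    size_t count, const char name[8])
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = memcmp(name, entries[mid].name, 8);
        if (cmp == 0)
            return &entries[mid];
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;             /* not found */
}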
According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
Things would have been much better if they simply used some sort of
"table of contents" or index at the start of the PDS, read it in,
then did an in memory search. Even on small memory machines, if
you had a small sized index block and used something like a B-tree
of them, it would have been faster.
I believe that's what they did with VSAM.
I believe that's what they did with VSAM.
Agreed in the sense that VSAM replaced ISAM, but, and I am getting
beyond my depth here, I wasn't aware that PDSs used ISAM. I had
thought they were a thing unto themselves. Please correct me if I am
wrong. In any event, PDSs in their original form lasted beyond the introduction of VSAM, or the PDS search assist functionality wouldn't
have been needed.
According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
I believe that's what they did with VSAM.
Agreed in the sense that VSAM replaced ISAM, but, and I am getting
beyond my depth here, I wasn't aware that PDSs used ISAM. I had
thought they were a thing unto themselves. Please correct me if I
am wrong. In any event, PDSs in their original form lasted beyond
the introduction of VSAM, or the PDS search assist functionality
wouldn't have been needed.
A PDS had a directory at the front followed by the members. The
directory had an entry per member with the name, the starting
location, and optional other stuff. The entries were in order by
member name, and packed into 256 byte records each of which had a
hardware key with the name of the last entry in the block. It searched
the PDS directory with the same kind of channel key search it did for
ISAM, leading to the performance issues Lynn described.
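(A rough software model of that on-disk layout and lookup, for illustration
only: 256-byte directory blocks, each carrying a hardware key equal to the
last member name in the block; the channel's key search stops at the first
block whose key is greater than or equal to the wanted name, and the entries
inside are then scanned. The exact entry fields and entries-per-block count
below are simplified guesses, not the real format.)

#include <stddef.h>
#include <string.h>

struct pds_entry {
    char          name[8];   /* member name, blank padded                  */
    unsigned char ttr[3];    /* relative track/record where member starts  */
    unsigned char info;      /* flags and user-data length (simplified)    */
};

struct pds_dir_block {
    char             key[8];     /* hardware key: last member name in block */
    unsigned short   used;       /* bytes of the 256-byte block in use      */
    struct pds_entry entry[20];  /* simplified: fixed-size entries          */
};

/* What the channel key search plus the CPU scan accomplish together,
   expressed as ordinary software. */
const struct pds_entry *pds_lookup(const struct pds_dir_block *blocks,
                                   size_t nblocks, const char name[8])
{
    for (size_t b = 0; b < nblocks; b++) {
        if (memcmp(blocks[b].key, name, 8) < 0)
            continue;                    /* key < name: keep searching */
        /* first block whose key >= name: the member is here or nowhere */
        for (size_t e = 0; e < 20; e++) {
            if (memcmp(blocks[b].entry[e].name, name, 8) == 0)
                return &blocks[b].entry[e];
        }
        return NULL;
    }
    return NULL;
}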
John Levine wrote:
According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
I believe that's what they did with VSAM.
Agreed in the sense that VSAM replaced ISAM, but, and I am getting
beyond my depth here, I wasn't aware that PDSs used ISAM. I had
thought they were a thing unto themselves. Please correct me if I
am wrong. In any event, PDSs in their original form lasted beyond
the introduction of VSAM, or the PDS search assist functionality
wouldn't have been needed.
A PDS had a directory at the front followed by the members. The
directory had an entry per member with the name, the starting
location, and optional other stuff. The entries were in order by
member name, and packed into 256 byte records each of which had a
hardware key with the name of the last entry in the block. It searched
the PDS directory with the same kind of channel key search it did for
ISAM, leading to the performance issues Lynn described.
Yes, I don't disagree with any of that. But I got the impression from
your previous posts that IBM had replaced the search key fields of a
PDS with some kind of VSAM (i.e. b-tree or such) variant, as they did
with ISAM. If that is true I had never heard about it.
Stephen Fuld <SFuld@alumni.cmu.edu.invalid> schrieb:
John Levine wrote:
According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
I believe that's what they did with VSAM.
Agreed in the sense that VSAM replaced ISAM, but, and I am getting
beyond my depth here, I wasn't aware that PDSs used ISAM. I had
thought they were a thing unto themselves. Please correct me if I
am wrong. In any event, PDSs in their original form lasted beyond
the introduction of VSAM, or the PDS search assist functionality
wouldn't have been needed.
A PDS had a directory at the front followed by the members. The
directory had an entry per member with the name, the starting
location, and optional other stuff. The entries were in order by
member name, and packed into 256 byte records each of which had a
hardware key with the name of the last entry in the block. It searched
the PDS directory with the same kind of channel key search it did for
ISAM, leading to the performance issues Lynn described.
Yes, I don't disagree with any of that. But I got the impression from
your previous posts that IBM had replaced the search key fields of a
PDS with some kind of VSAM (i.e. b-tree or such) variant, as they did
with ISAM. If that is true I had never heard about it.
They introduced PDSE, see https://www.ibm.com/docs/en/zos-basic-skills?topic=sets-what-is-pdse
It appears they fixed many of the problems with the original
design, but not all (why is the number of extents still limited?)
It also seems that RECFM=U for load modules is no longer supported,
you have to do something different.
Thomas Koenig <tkoenig@netcologne.de> schrieb:
Stephen Fuld <SFuld@alumni.cmu.edu.invalid> schrieb:
John Levine wrote:
According to Stephen Fuld <SFuld@alumni.cmu.edu.invalid>:
I believe that's what they did with VSAM.
Agreed in the sense that VSAM replaced ISAM, but, and I am getting
beyond my depth here, I wasn't aware that PDSs used ISAM. I had
thought they were a thing unto themselves. Please correct me if I
am wrong. In any event, PDSs in their original form lasted beyond
the introduction of VSAM, or the PDS search assist functionality
wouldn't have been needed.
A PDS had a directory at the front followed by the members. The
directory had an entry per member with the name, the starting
location, and optional other stuff. The entries were in order by
member name, and packed into 256 byte records each of which had a
hardware key with the name of the last entry in the block. It searched
the PDS directory with the same kind of channel key search it did for
ISAM, leading to the performance issues Lynn described.
Yes, I don't disagree with any of that. But I got the impression
from your previous posts that IBM had replaced the search key
fields of a PDS with some kind of VSAM (i.e. b-tree or such)
variant, as they did with ISAM. If that is true I had never heard
about it.
They introduced PDSE, see https://www.ibm.com/docs/en/zos-basic-skills?topic=sets-what-is-pdse
It appears they fixed many of the problems with the original
design, but not all (why is the number of extents still limited?)
It also seems that RECFM=U for load modules is no longer supported,
you have to do something different.
Reading
it seems IBM really messed up PDSE the first time around,
introducing size limits on members which were not present in the
original PDS. Seems somebody didn't take backwards compatibility
too seriously, after all...
In theory, non-practicing patent licensors seem to make sense, similar
to ARM not making chips, but when the cost and risk to the single
patent holder is disproportionately small, patent trolling can be
profitable. (I suspect only part of the disparity comes from not
practicing; the U.S. legal system has significant weaknesses and
actual expertise is not easily communicated. My father, who worked for
AT&T, mentioned a lawyer who repeatedly sued AT&T, which settled out of
court because such was cheaper than defending even against a claim
without basis.)
The IMS group were complaining that RDBMS had twice the disk space (for
RDBMS index) and increased the number of disk I/Os by 4-5 times (for processing RDBMS index). Counter was that the RDBMS index significantly reduced the manual maintenance (compared to IMS).
Did IMS have a locate mode as well?
On Sat, 6 Jul 2024 06:15:56 -0000 (UTC), Stephen Fuld wrote:
As you posted below, the whole PDS search stuff could easily be a
disaster. Even with more modest sized PDSs, it was inefficient as
hell.
Would locate mode have helped with this?
On Thu, 04 Jul 2024 18:39:21 -0600, Joe Pfeiffer wrote:
Get an interrupt saying the teletype is ready, send a character, go back
to work, repeat.
But this is all still copying characters back and forth. What happened
to
the idea of locate mode?
On 7/31/2024 6:41 PM, Lawrence D'Oliveiro wrote:
On Sat, 6 Jul 2024 06:15:56 -0000 (UTC), Stephen Fuld wrote:
As you posted below, the whole PDS search stuff could easily be a
disaster. Even with more modest sized PDSs, it was inefficient as
hell.
Would locate mode have helped with this?
No. The problem was that the PDS search was a linear search of records
on the disk drive (i.e. typically multiple disk revolutions), and
furthermore, it required (until the fast PDS search came along) that the
host channel take action on each disk record checked, even the ones that didn't match, including resending the search argument to the disk
controller.
This has nothing to do with locate mode.
According to Stephen Fuld <sfuld@alumni.cmu.edu.invalid>:
On 7/31/2024 6:41 PM, Lawrence D'Oliveiro wrote:
On Sat, 6 Jul 2024 06:15:56 -0000 (UTC), Stephen Fuld wrote:
As you posted below, the whole PDS search stuff could easily be a
disaster. Even with more modest sized PDSs, it was inefficient as
hell.
Would locate mode have helped with this?
No. The problem was that the PDS search was a linear search of records
on the disk drive (i.e. typically multiple disk revolutions), and
furthermore, it required (until the fast PDS search came along) that the
host channel take action on each disk record checked, even the ones that
didn't match, including resending the search argument to the disk
controller.
This has nothing to do with locate mode.
He's made it painfully clear that he doesn't understand the relative
speed of CPUs and disks, or the costs of multiple I/O buffers on
systems with small memories, particularly back in the 1960s when this
stuff was being designed. Nor how records in COBOL data divisions were designed so implementations could read and write file records directly
from and to the buffers, and the IOCS of the era enabled that in COBOL
and other languages.
Perhaps this would be a good time to stop taking the bait.
I thought
this silly argument was over a month ago.