Forum: >>> Magnum BBS <<<

Re: control co-processor

From Al Kossow@21:1/5 to Robert Finch on Mon May 5 03:01:12 2025

On 5/4/25 9:40 PM, Robert Finch wrote:

The CPU books I have do not cover a test/debug interface for the processor, so I am winging it a bit.

The ones I have seen are based on scan chains, going back to the "muffler" in the Xerox Dorado

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Scott Lurndal@21:1/5 to Al Kossow on Mon May 5 13:46:41 2025

Al Kossow <aek@bitsavers.org> writes:

On 5/4/25 9:40 PM, Robert Finch wrote:

The CPU books I have do not cover a test/debug interface for the processor, so I am winging it a bit.

The ones I have seen are based on scan chains, going back to the "muffler" in the Xerox Dorado

Even state-of-the-art CPUs today commonly use scan-chains (via JTAG) for debugging.

From Stefan Monnier@21:1/5 to All on Mon May 5 10:02:58 2025

Even state-of-the-art CPUs today commonly use scan-chains (via JTAG)
for debugging.

Is there some blog somewhere that explains how scan-chains work (not
how they're used, but how they're implemented inside the CPU)?
Intuitively they sound very costly to me, because of things like the
need to run extra wires all over the place. I'm obviously
missing something.

Stefan

From Scott Lurndal@21:1/5 to Stefan Monnier on Mon May 5 16:19:21 2025

Stefan Monnier <monnier@iro.umontreal.ca> writes:

Even state-of-the-art CPUs today commonly use scan-chains (via JTAG)
for debugging.

Is there some blog somewhere that explains how scan-chains work (not
how they're used, but how they're implemented inside the CPU)?
Intuitively they sound very costly to me, because of things like the
need to run extra wires all over the place. I'm obviously
missing something.

Actually, you're not far off. It's a serial shift chain which is shifted one bit at a time to capture flop states. Each chain is a single wire;
a chip may have a few dozen individual shift chains.

https://www.design-reuse.com/articles/48331/scan-chains-pnr-outlook.html
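A minimal behavioral sketch of the serial shift Scott describes, in Python. It assumes a single chain where element 0 is nearest scan-in and the last element drives scan-out; the function name is hypothetical, and the mux that selects between functional data and scan data inside each flop is not modelled. It shows why unloading an N-bit chain takes N scan clocks, and that a new pattern can be shifted in at the same time.

```python
from typing import List, Tuple

def scan_shift(chain: List[int], scan_in: List[int]) -> Tuple[List[int], List[int]]:
    """Shift len(scan_in) bits through a serial scan chain.

    chain[0] is the flop nearest scan-in; chain[-1] drives scan-out.
    On each scan clock the last flop's value appears on scan-out, every
    flop takes its predecessor's value, and a new bit enters chain[0].
    Returns (new chain state, bits observed on scan-out, in order).
    """
    state = list(chain)
    scan_out = []
    for bit in scan_in:
        scan_out.append(state[-1])   # observe the flop at the end of the chain
        state = [bit] + state[:-1]   # one-bit shift toward scan-out
    return state, scan_out

# Unload a 4-bit chain while shifting in zeros: 4 scan clocks, bits come out last-flop-first
state, out = scan_shift([1, 0, 1, 1], [0, 0, 0, 0])
print(out, state)  # [1, 1, 0, 1] [0, 0, 0, 0]
```

A pattern shifted in ends up reversed in the chain, so loading pattern P means feeding reversed(P).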

From MitchAlsup1@21:1/5 to Stefan Monnier on Tue May 6 22:17:40 2025

On Mon, 5 May 2025 14:02:58 +0000, Stefan Monnier wrote:

Even state-of-the-art CPUs today commonly use scan-chains (via JTAG)
for debugging.

Is there some blog somewhere that explains how scan-chains work (not
how they're used, but how they're implemented inside the CPU)?
Intuitively they sound very costly to me, because of things like the
need to run extra wires all over the place. I'm obviously
missing something.

To a good first order:: designers don't think of scan paths, we just
select scannable flip-flops, and the tools build the scan paths for
us. At (or near) tape-out, we have the scan path tools emit the scan path-of-the-moment so we can adjust the SW that will drive the scan
paths at testing and debug.
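The role of the emitted "scan path-of-the-moment" can be illustrated with a small Python sketch: given the ordered list of flops on a chain, software turns a raw scan-out bitstream back into named flop values. The list format, the flop names, and the function name are hypothetical; real DFT flows emit this information in their own file formats.

```python
from typing import Dict, List

def decode_scan_dump(scan_order: List[str], scan_out: List[int]) -> Dict[str, int]:
    """Map serially unloaded bits back to flop names.

    scan_order lists flops from scan-in to scan-out (hypothetical format,
    standing in for whatever the scan tools emit). The flop nearest scan-out
    is unloaded first, so the bitstream arrives in reverse scan order.
    """
    if len(scan_out) != len(scan_order):
        raise ValueError("bitstream length must equal chain length")
    return {name: bit for name, bit in zip(reversed(scan_order), scan_out)}

# Hypothetical 4-flop chain and one unloaded bitstream
print(decode_scan_dump(["pc0", "pc1", "valid", "stall"], [1, 0, 1, 1]))
# {'stall': 1, 'valid': 0, 'pc1': 1, 'pc0': 1}
```

If placement reorders the chain, only the scan_order table changes, which is why it has to be regenerated near tape-out.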

Large blocks (such as a whole core or L2) have their own scan path
that will be individually addressable at the scan path controller
to keep the paths down to several kilobits each.

For the most part this is entirely automated--about all the designers
ever do is break huge scan paths into several not-so-huge scan paths.
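A rough Python sketch of breaking one huge scan path into several bounded ones. It simply slices the flops in their existing order; the function name and the max_len parameter are hypothetical, and real tools also balance chain lengths and follow placement, which this ignores.

```python
from typing import List

def split_chain(flops: List[str], max_len: int) -> List[List[str]]:
    """Break one long scan path into consecutive sub-chains of at most max_len flops.

    Hypothetical helper: slices in the original order only; no length
    balancing or placement-aware reordering is attempted.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [flops[i:i + max_len] for i in range(0, len(flops), max_len)]

# Ten flops, sub-chains capped at 4 bits each
chains = split_chain([f"ff{i}" for i in range(10)], 4)
print([len(c) for c in chains])  # [4, 4, 2]
```

Under the serial shift model, unloading one addressed 4-flop sub-chain takes 4 scan clocks rather than the 10 needed for the unsplit path.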

Design for test engineers use the simulators to figure out the bit
captured by any random flip-flop via the Verilog model and use said
model to build many tests and then examine what happened to verify
that the design is operating as expected (or not). DFT engineers
use the simulators at least as much as the designers themselves.

-------------

In the distant past, I had the select lines (the heavy loads that
cross the data path and select things for it to do) captured at
the other end of the data path with a skewable scan clock so we
could (essentially) see the timing of the select lines after
crossing the data path. This made it possible to use the scan
path as an oscilloscope.
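The skewed-capture trick can be sketched in Python as a sweep: capture the select line at increasing clock skews, scan each sample out, and note the first skew at which the new value appears. This is an illustration under assumptions, not Mitch's actual setup: signal(t) is a hypothetical stand-in for the real select line, the transition is assumed to be a single clean 0-to-1 edge, and skews are in picoseconds. It works like equivalent-time sampling in a sampling oscilloscope, one point per capture.

```python
from typing import Callable, Iterable, Optional

def find_arrival(signal: Callable[[float], int], skews_ps: Iterable[float]) -> Optional[float]:
    """Return the smallest capture skew at which the captured bit reads 1.

    signal(t) is a hypothetical stand-in for the select line's level at the
    far side of the datapath, t ps after the launching edge. Each iteration
    models one run: capture with the given skew, then scan the bit out.
    Assumes a single clean 0 -> 1 transition.
    """
    for skew in sorted(skews_ps):
        captured = signal(skew)   # value latched by the skewed scan clock
        if captured == 1:
            return skew
    return None                   # edge not seen within the swept range

# Example: an edge that arrives 137 ps after launch, swept in 10 ps steps
print(find_arrival(lambda t: 1 if t >= 137 else 0, range(0, 300, 10)))  # 140
```

The resolution is the skew step, so a finer sweep narrows the edge down further.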

Stefan

From EricP@21:1/5 to All on Tue May 6 19:58:00 2025

MitchAlsup1 wrote:

In the distant past, I had the select lines (the heavy loads that
cross the data path and select things for it to do) captured at
the other end of the data path with a skewable scan clock so we
could (essentially) see the timing of the select lines after
crossing the data path. This made it possible to use the scan
path as an oscilloscope.

Was that the AMD with the 6 phase clock you mentioned before?
I was wondering about how boundary scan would work with that.
I suppose you'd have 6 separate scan paths to capture each
phase's logic states.

From Stefan Monnier@21:1/5 to All on Tue May 6 23:12:08 2025

Even state-of-the-art CPUs today commonly use scan-chains (via JTAG)
for debugging.

Is there some blog somewhere that explains how scan-chains work (not
how they're used, but how they're implemented inside the CPU)?
Intuitively they sound very costly to me, because of things like the
need to run extra wires all over the place. I'm obviously
missing something.

Actually, you're not far off. It's a serial shift chain which is shifted one bit at a time to capture flop states. Each chain is a single wire;
a chip may have a few dozen individual shift chains.
https://www.design-reuse.com/articles/48331/scan-chains-pnr-outlook.html

Thanks. Wow. So it is really that bad, huh?
I also liked the note about speed limits and power consumption, how
shifting a state (in or out) causes (almost) all the flip-flops to
change state at each cycle, thus leading to very high power consumption.

What's the approximate cost of those scan chains? I.e., if we were to
take an existing working design and replace all the "flip-flop with
scan-chain" with "plain flip-flops", how much smaller would the
resulting chip be, how much faster could it run, and how much less power
could it consume?

I assume the cost in terms of power consumption is small because in
normal use, the scan-chain part stays completely stable so that barring
leakage it should not consume any power, save for the indirect costs
like the need to move the other bits over greater distances when
the extra wires of the scan chains get in the way.

Stefan

From Al Kossow@21:1/5 to Stefan Monnier on Tue May 6 21:08:41 2025

On 5/6/25 8:12 PM, Stefan Monnier wrote:

What's the approximate cost of those scan chains? I.e., if we were to
take an existing working design and replace all the "flip-flop with scan-chain" with "plain flip-flops", how much smaller would the
resulting chip be, how much faster could it run, and how much less power could it consume?

It wouldn't matter because it wouldn't be testable in production.

From Stefan Monnier@21:1/5 to All on Wed May 7 10:58:52 2025

Al Kossow [2025-05-06 21:08:41] wrote:

On 5/6/25 8:12 PM, Stefan Monnier wrote:

What's the approximate cost of those scan chains? I.e., if we were to
take an existing working design and replace all the "flip-flop with
scan-chain" with "plain flip-flops", how much smaller would the
resulting chip be, how much faster could it run, and how much less power
could it consume?

It wouldn't matter because it wouldn't be testable in production.

Of course, but I'd still like to know. 🙂

Stefan

From MitchAlsup1@21:1/5 to Stefan Monnier on Wed May 7 16:57:32 2025

On Wed, 7 May 2025 3:12:08 +0000, Stefan Monnier wrote:

Even state-of-the-art CPUs today commonly use scan-chains (via JTAG)
for debugging.

Is there some blog somewhere that explains how scan-chains work (not
how they're used, but how they're implemented inside the CPU)?
Intuitively they sound very costly to me, because of things like the
need to run extra wires all over the place. I'm obviously
missing something.

Actually, you're not far off. It's a serial shift chain which is shifted
one bit at a time to capture flop states. Each chain is a single wire;
a chip may have a few dozen individual shift chains.
https://www.design-reuse.com/articles/48331/scan-chains-pnr-outlook.html

Thanks. Wow. So it is really that bad, huh?
I also liked the note about speed limits and power consumption, how
shifting a state (in or out) causes (almost) all the flip-flops to
change state at each cycle, thus leading to very high power consumption.

two (2) things::

a) the scan path is only used when the rest of the logic is quiescent.
In normal operation, it is merely fan-in load on the flip-flops.
b) the scan clock is typically around 200 MHz

Both eliminate the power consumption problem.

Remember it is going to take ~5,000 scan-clocks to scan out/in a core.
Scan paths en-the-large operate at human visualization frequencies.
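Mitch's round figures give the time for one full scan of a core. A small Python sketch, assuming one bit shifted per scan clock (the function name is hypothetical):

```python
def scan_time_us(scan_clocks: int, scan_clock_hz: float) -> float:
    """Microseconds needed to shift scan_clocks bits at scan_clock_hz, one bit per clock."""
    return scan_clocks * 1e6 / scan_clock_hz

# Round figures from the post: ~5,000 scan clocks per core, ~200 MHz scan clock
print(scan_time_us(5_000, 200e6))  # 25.0
```

That is roughly 25 µs per core scan, during which, per point (a), the functional logic is quiescent.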

What's the approximate cost of those scan chains? I.e., if we were to
take an existing working design and replace all the "flip-flop with scan-chain" with "plain flip-flops", how much smaller would the
resulting chip be, how much faster could it run, and how much less power could it consume?

Given a 16-gate delay design with 5 gates of "flop-jitter-skew"
the scan path adds about ½ a gate of delay (2%-ish).
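A quick check of the "2%-ish" figure, assuming (our reading; the post does not spell it out) that the cycle is 16 gate delays of logic plus 5 gate delays of flop/jitter/skew overhead, i.e. 21 gate delays in total. The function name is hypothetical.

```python
def scan_delay_overhead(logic_gates: float, flop_overhead_gates: float, scan_gates: float) -> float:
    """Fraction by which the scan mux lengthens the cycle, all measured in gate delays.

    Assumption: cycle time = logic depth + flop/jitter/skew overhead.
    """
    return scan_gates / (logic_gates + flop_overhead_gates)

print(f"{scan_delay_overhead(16, 5, 0.5):.1%}")  # 2.4%
```

If the 5 gates were instead counted inside the 16, the same ½ gate would be about 3.1%, so either way the penalty is a few percent.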

I assume the cost in terms of power consumption is small because in
normal use, the scan-chain part stays completely stable so that barring leakage it should not consume any power, save for the indirect costs
like the need to move the other bits over greater distances when
the extra wires of the scan chains get in the way.

As far as power consumption of the extra logic is concerned when not
being used, it is also down in the 1% range (maybe lower).

Stefan

From MitchAlsup1@21:1/5 to EricP on Wed May 7 16:44:25 2025

On Tue, 6 May 2025 23:58:00 +0000, EricP wrote:

MitchAlsup1 wrote:

In the distant past, I had the select lines (the heavy loads that
cross the data path and select things for it to do) captured at
the other end of the data path with a skewable scan clock so we
could (essentially) see the timing of the select lines after
crossing the data path. This made it possible to use the scan
path as an oscilloscope.

Was that the AMD with the 6 phase clock you mentioned before?
I was wondering about how boundary scan would work with that.
I suppose you'd have 6 separate scan paths to capture each
phase's logic states.

All you need is a skew generator--which could be external
(i.e., tester).

From Stefan Monnier@21:1/5 to All on Wed May 7 15:03:31 2025

What's the approximate cost of those scan chains? I.e., if we were to
take an existing working design and replace all the "flip-flop with
scan-chain" with "plain flip-flops", how much smaller would the
resulting chip be, how much faster could it run, and how much less power
could it consume?

Given a 16-gate delay design with 5 gates of "flop-jitter-skew"
the scan path adds about ½ a gate of delay (2%-ish).

Almost lost in the noise. Thanks.

Stefan

From MitchAlsup1@21:1/5 to Stefan Monnier on Thu May 8 01:04:42 2025

On Wed, 7 May 2025 19:03:31 +0000, Stefan Monnier wrote:

What's the approximate cost of those scan chains? I.e., if we were to
take an existing working design and replace all the "flip-flop with
scan-chain" with "plain flip-flops", how much smaller would the
resulting chip be, how much faster could it run, and how much less power
could it consume?

Given a 16-gate delay design with 5 gates of "flop-jitter-skew"
the scan path adds about ½ a gate of delay (2%-ish).

Almost lost in the noise. Thanks.

Such a big gain in debugging, such a small penalty for use ...

Stefan

From MitchAlsup1@21:1/5 to EricP on Tue Jul 15 17:09:28 2025

On Tue, 6 May 2025 23:58:00 +0000, EricP wrote:

MitchAlsup1 wrote:

In the distant past, I had the select lines (the heavy loads that
cross the data path and select things for it to do) captured at
the other end of the data path with a skewable scan clock so we
could (essentially) see the timing of the select lines after
crossing the data path. This made it possible to use the scan
path as an oscilloscope.

Was that the AMD with the 6 phase clock you mentioned before?
I was wondering about how boundary scan would work with that.
I suppose you'd have 6 separate scan paths to capture each
phase's logic states.

When I arrived (1999), Athlon had a single-phase clock, chip-wide.

From MitchAlsup1@21:1/5 to Stefan Monnier on Tue Jul 15 17:21:42 2025

On Wed, 7 May 2025 3:12:08 +0000, Stefan Monnier wrote:

Even state-of-the-art CPUs today commonly use scan-chains (via JTAG)
for debugging.

Is there some blog somewhere that explains how scan-chains work (not
how they're used, but how they're implemented inside the CPU)?
Intuitively they sound very costly to me, because of things like the
need to run extra wires all over the place. I'm obviously
missing something.

Actually, you're not far off. It's a serial shift chain which is shifted
one bit at a time to capture flop states. Each chain is a single wire;
a chip may have a few dozen individual shift chains.
https://www.design-reuse.com/articles/48331/scan-chains-pnr-outlook.html

Thanks. Wow. So it is really that bad, huh?
I also liked the note about speed limits and power consumption, how
shifting a state (in or out) causes (almost) all the flip-flops to
change state at each cycle, thus leading to very high power consumption.

At a very slow clock rate, and without logic burning power.

What's the approximate cost of those scan chains? I.e., if we were to
take an existing working design and replace all the "flip-flop with scan-chain" with "plain flip-flops", how much smaller would the
resulting chip be, how much faster could it run, and how much less power could it consume?

A D flip-flop can be implemented in 4 gates (but often 5).
A full-scan D flip-flop is implemented in 10 gates.

I assume the cost in terms of power consumption is small because in
normal use, the scan-chain part stays completely stable so that barring leakage it should not consume any power, save for the indirect costs
like the need to move the other bits over greater distances when
the extra wires of the scan chains get in the way.

In normal use, the scan attachments are just additional capacitance
internal to the flip-flop. In scan use, the normal outputs of the
flip-flops remain stable--preventing downstream logic from toggling.
Once the scan is done, and the normal clock starts again, the scan
data is gated to the flip-flop data.
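Mitch's description (scan attachments are just extra capacitance in normal use, functional outputs hold still during scan, and scanned data is transferred once normal clocking resumes) can be modelled behaviorally in Python. This is a sketch under assumptions: it gives each flop a separate scan storage element, which is one way to get held outputs and resembles the capture/shift/update structure of IEEE 1149.1 boundary-scan cells. Mitch does not say how his flops are built, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HeldOutputScanFlop:
    q: int = 0      # functional output seen by downstream logic
    scan: int = 0   # assumed separate scan storage element (hypothetical)

def capture(chain: List[HeldOutputScanFlop]) -> None:
    """Copy each functional value into its scan element before unloading."""
    for ff in chain:
        ff.scan = ff.q

def shift(chain: List[HeldOutputScanFlop], scan_in: int) -> int:
    """One scan clock: move scan bits one place toward scan-out. q never changes."""
    out = chain[-1].scan
    for i in range(len(chain) - 1, 0, -1):
        chain[i].scan = chain[i - 1].scan
    chain[0].scan = scan_in
    return out

def update(chain: List[HeldOutputScanFlop]) -> None:
    """When normal clocking resumes, transfer the scanned-in data to the outputs."""
    for ff in chain:
        ff.q = ff.scan

chain = [HeldOutputScanFlop(q=v) for v in [1, 1, 0]]
capture(chain)
outs = [shift(chain, b) for b in [1, 0, 0]]
print(outs, [ff.q for ff in chain])  # [0, 1, 1] [1, 1, 0]  (outputs held during scan)
update(chain)
print([ff.q for ff in chain])        # [0, 0, 1]
```

Because q does not change during shift, downstream combinational logic sees no toggling while the chain is scanned, which is the power point Mitch makes.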

Stefan
