• Re: control co-processor

    From Al Kossow@21:1/5 to Robert Finch on Mon May 5 03:01:12 2025
    On 5/4/25 9:40 PM, Robert Finch wrote:

    The CPU books I have do not cover a test/debug interface for the processor, so I am winging it a bit.

    The ones I have seen are based on scan chains, going back to the "muffler" in the Xerox Dorado

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Al Kossow on Mon May 5 13:46:41 2025
    Al Kossow <aek@bitsavers.org> writes:
    On 5/4/25 9:40 PM, Robert Finch wrote:

    The CPU books I have do not cover a test/debug interface for the processor, so I am winging it a bit.

    The ones I have seen are based on scan chains, going back to the "muffler" in the Xerox Dorado

    Even state-of-the-art CPUs today commonly use scan-chains (via JTAG) for debugging.

  • From Stefan Monnier@21:1/5 to All on Mon May 5 10:02:58 2025
    Even state-of-the-art CPUs today commonly use scan-chains (via JTAG)
    for debugging.

    Is there some blog somewhere that explains how scan-chains work (not
    how they're used, but how they're implemented inside the CPU)?
    Intuitively they sound very costly to me, because of things like the
    need to run extra wires all over the place. I'm obviously
    missing something.


    Stefan

  • From Scott Lurndal@21:1/5 to Stefan Monnier on Mon May 5 16:19:21 2025
    Stefan Monnier <monnier@iro.umontreal.ca> writes:
    Even state-of-the-art CPUs today commonly use scan-chains (via JTAG)
    for debugging.

    Is there some blog somewhere that explains how scan-chains work (not
    how they're used, but how they're implemented inside the CPU)?
    Intuitively they sound very costly to me, because of things like the
    need to run extra wires all over the place. I'm obviously
    missing something.

    Actually, you're not far off. It's a serial shift chain which is shifted
    one bit at a time to capture flop states. Each chain is a single wire;
    a chip may have a few dozen individual shift chains.

    https://www.design-reuse.com/articles/48331/scan-chains-pnr-outlook.html
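
    To make the serial shift chain concrete, here is a minimal Python sketch. It is a toy model only: the `ScanChain` class, its method names, and the 4-bit example state are hypothetical, and real scan cells have more going on than a list of bits.

```python
from typing import List


class ScanChain:
    """Toy model of one scan chain: flops linked into a serial shift register."""

    def __init__(self, flop_states: List[int]) -> None:
        # flop_states[0] sits next to scan-in; flop_states[-1] drives scan-out.
        self.flops = list(flop_states)

    def shift(self, scan_in: int) -> int:
        """One scan clock: every flop takes the value of its upstream neighbour."""
        scan_out = self.flops[-1]
        self.flops = [scan_in] + self.flops[:-1]
        return scan_out

    def dump(self, fill: int = 0) -> List[int]:
        """Shift the whole chain out, one bit per clock, loading `fill` behind it."""
        return [self.shift(fill) for _ in range(len(self.flops))]


chain = ScanChain([1, 0, 1, 1])
captured = chain.dump()
print(captured)  # the flop nearest scan-out arrives first
```

    The only extra routing per flop is the single wire to its neighbour, and reading N flops costs N scan clocks.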

  • From MitchAlsup1@21:1/5 to Stefan Monnier on Tue May 6 22:17:40 2025
    On Mon, 5 May 2025 14:02:58 +0000, Stefan Monnier wrote:

    Even state-of-the-art CPUs today commonly use scan-chains (via JTAG)
    for debugging.

    Is there some blog somewhere that explains how scan-chains work (not
    how they're used, but how they're implemented inside the CPU)?
    Intuitively they sound very costly to me, because of things like the
    need to run extra wires all over the place. I'm obviously
    missing something.

    To a good first order: designers don't think of scan paths, we just
    select scannable flip-flops, and the tools build the scan paths for
    us. At (or near) tape-out, we have the scan path tools emit the
    scan path-of-the-moment so we can adjust the SW that will drive the
    scan paths at testing and debug.

    Large blocks (such as a whole core or L2) have their own scan path
    that will be individually addressable at the scan path controller
    to keep the paths down to several kilobits each.

    For the most part this is entirely automated--about all the designers
    ever do is break huge scan paths into several not-so-huge scan paths.

    Design for test engineers use the simulators to figure out the bit
    captured by any-random flip-flop via the Verilog model and use said
    model to build many tests and then examine what happened to verify
    that the design is operating as expected (or not). DFT engineers
    use the simulators at least as much as the designers themselves.

    -------------

    In the distant past, I had the select lines (the heavy loads that
    cross the data path and select things for it to do) captured at
    the other end of the data path with a skewable scan clock so we
    could (essentially) see the timing of the select lines after
    crossing the data path. This made it possible to use the scan
    path as an oscilloscope.
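
    One way to picture the "scan path as an oscilloscope" trick is as a sweep: rerun the same cycle with the capture clock skewed a little later each time, and note the first skew at which the captured bit flips. The sketch below assumes an idealized capture flop (no setup/hold window, no metastability) and invented numbers; `capture` and `measure_arrival` are hypothetical names, not anything from the design described above.

```python
def capture(arrival_ps: int, skew_ps: int, old: int = 0, new: int = 1) -> int:
    """Value an ideal capture flop sees when it samples skew_ps after the launch edge."""
    return new if skew_ps >= arrival_ps else old


def measure_arrival(arrival_ps: int, step_ps: int = 10, max_ps: int = 1000) -> int:
    """Sweep the scan-clock skew; return the first skew that captures the new value."""
    for skew in range(0, max_ps + 1, step_ps):
        if capture(arrival_ps, skew) == 1:
            return skew
    raise ValueError("signal never arrived within the sweep window")


# A select line that really arrives 237 ps after launch (made-up figure).
measured = measure_arrival(arrival_ps=237)
print(measured)  # resolution is limited by the 10 ps skew step
```

    The measurement resolution is set by the skew step, which is why the skew generator matters more than the scan path itself.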




  • From EricP@21:1/5 to All on Tue May 6 19:58:00 2025
    MitchAlsup1 wrote:

    In the distant past, I had the select lines (the heavy loads that
    cross the data path and select things for it to do) captured at
    the other end of the data path with a skewable scan clock so we
    could (essentially) see the timing of the select lines after
    crossing the data path. This made it possible to use the scan
    path as an oscilloscope.

    Was that the AMD with the 6 phase clock you mentioned before?
    I was wondering about how boundary scan would work with that.
    I suppose you'd have 6 separate scan paths to capture each
    phase's logic states.

  • From Stefan Monnier@21:1/5 to All on Tue May 6 23:12:08 2025
    Even state-of-the-art CPUs today commonly use scan-chains (via JTAG)
    for debugging.
    Is there some blog somewhere that explains how scan-chains work (not
    how they're used, but how they're implemented inside the CPU)?
    Intuitively they sound very costly to me, because of things like the
    need to run extra wires all over the place. I'm obviously
    missing something.
    Actually, you're not far off. It's a serial shift chain which is shifted
    one bit at a time to capture flop states. Each chain is a single wire;
    a chip may have a few dozen individual shift chains.
    https://www.design-reuse.com/articles/48331/scan-chains-pnr-outlook.html

    Thanks. Wow. So it is really that bad, huh?
    I also liked the note about speed limits and power consumption, how
    shifting a state (in or out) causes (almost) all the flip-flops to
    change state at each cycle, thus leading to very high power consumption.

    What's the approximate cost of those scan chains. I.e. if we were to
    take an existing working design and replace all the "flip-flop with
    scan-chain" with "plain flip-flops", how much smaller would the
    resulting chip be, how much faster could it run, and how much less power
    could it consume?

    I assume the cost in terms of power consumption is small because in
    normal use, the scan-chain part stays completely stable so that barring
    leakage it should not consume any power, save for the indirect costs
    like the need to move the other bits over greater distances when
    the extra wires of the scan chains get in the way.


    Stefan

  • From Al Kossow@21:1/5 to Stefan Monnier on Tue May 6 21:08:41 2025
    On 5/6/25 8:12 PM, Stefan Monnier wrote:

    What's the approximate cost of those scan chains. I.e. if we were to
    take an existing working design and replace all the "flip-flop with
    scan-chain" with "plain flip-flops", how much smaller would the
    resulting chip be, how much faster could it run, and how much less power
    could it consume?


    It wouldn't matter because it wouldn't be testable in production.

  • From Stefan Monnier@21:1/5 to All on Wed May 7 10:58:52 2025
    Al Kossow [2025-05-06 21:08:41] wrote:
    On 5/6/25 8:12 PM, Stefan Monnier wrote:
    What's the approximate cost of those scan chains. I.e. if we were to
    take an existing working design and replace all the "flip-flop with
    scan-chain" with "plain flip-flops", how much smaller would the
    resulting chip be, how much faster could it run, and how much less power
    could it consume?
    It wouldn't matter because it wouldn't be testable in production.

    Of course, but I'd still like to know. 🙂


    Stefan

  • From MitchAlsup1@21:1/5 to Stefan Monnier on Wed May 7 16:57:32 2025
    On Wed, 7 May 2025 3:12:08 +0000, Stefan Monnier wrote:

    Even state-of-the-art CPUs today commonly use scan-chains (via JTAG)
    for debugging.
    Is there some blog somewhere that explains how scan-chains work (not
    how they're used, but how they're implemented inside the CPU)?
    Intuitively they sound very costly to me, because of things like the
    need to run extra wires all over the place. I'm obviously
    missing something.
    Actually, you're not far off. It's a serial shift chain which is shifted
    one bit at a time to capture flop states. Each chain is a single wire;
    a chip may have a few dozen individual shift chains.
    https://www.design-reuse.com/articles/48331/scan-chains-pnr-outlook.html

    Thanks. Wow. So it is really that bad, huh?
    I also liked the note about speed limits and power consumption, how
    shifting a state (in or out) causes (almost) all the flip-flops to
    change state at each cycle, thus leading to very high power consumption.

    Two things:

    a) The scan path is only used when the rest of the logic is quiescent.
       In normal operation it is nothing but extra fan-in load on the flip-flops.
    b) The scan clock is typically around 200 MHz.

    Both eliminate the power consumption problem.

    Remember it is going to take ~5,000 scan-clocks to scan out/in a core.
    Scan paths in the large operate at human visualization frequencies.
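
    Taking the two figures in this post at face value (about 5,000 scan clocks per core, about a 200 MHz scan clock), the time for one pass over a core's chain is simple arithmetic. A small sketch, with `scan_time_seconds` as a hypothetical helper name:

```python
def scan_time_seconds(chain_bits: int, scan_clock_hz: float) -> float:
    """Time to shift a chain of chain_bits flops in or out, one bit per scan clock."""
    return chain_bits / scan_clock_hz


t = scan_time_seconds(5_000, 200e6)
print(f"{t * 1e6:.1f} us per pass")
```

    This covers the shifting only; whatever software or tester sits at the other end of the scan controller adds its own time on top.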

    What's the approximate cost of those scan chains. I.e. if we were to
    take an existing working design and replace all the "flip-flop with
    scan-chain" with "plain flip-flops", how much smaller would the
    resulting chip be, how much faster could it run, and how much less power
    could it consume?

    Given a 16-gate delay design with 5 gates of "flop-jitter-skew"
    the scan path adds about ½ a gate of delay (2%-ish).
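
    The "2%-ish" figure can be checked with a few lines of arithmetic. The sketch below assumes the cycle is 16 gates of logic plus 5 gates of flop/jitter/skew overhead, i.e. 21 gate delays in total; that reading of the numbers is an assumption, and `scan_delay_overhead` is a hypothetical helper name.

```python
def scan_delay_overhead(logic_gates: float, flop_gates: float, scan_gates: float) -> float:
    """Extra scan delay as a fraction of the full cycle (logic plus flop overhead)."""
    cycle = logic_gates + flop_gates
    return scan_gates / cycle


overhead = scan_delay_overhead(logic_gates=16, flop_gates=5, scan_gates=0.5)
print(f"{overhead:.1%}")
```

    If the 16 gates were instead meant as the whole cycle, the same half gate would come to about 3%, which is still small.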

    I assume the cost in terms of power consumption is small because in
    normal use, the scan-chain part stays completely stable so that barring
    leakage it should not consume any power, save for the indirect costs
    like the need to move the other bits over greater distances when
    the extra wires of the scan chains get in the way.

    As far as power consumption of the extra logic is concerned when not
    being used; it is also down in the 1% range (maybe lower).



  • From MitchAlsup1@21:1/5 to EricP on Wed May 7 16:44:25 2025
    On Tue, 6 May 2025 23:58:00 +0000, EricP wrote:

    MitchAlsup1 wrote:

    In the distant past, I had the select lines (the heavy loads that
    cross the data path and select things for it to do) captured at
    the other end of the data path with a skewable scan clock so we
    could (essentially) see the timing of the select lines after
    crossing the data path. This made it possible to use the scan
    path as an oscilloscope.

    Was that the AMD with the 6 phase clock you mentioned before?
    I was wondering about how boundary scan would work with that.
    I suppose you'd have 6 separate scan paths to capture each
    phase's logic states.

    All you need is a skew generator--which could be external
    (i.e., tester).

  • From Stefan Monnier@21:1/5 to All on Wed May 7 15:03:31 2025
    What's the approximate cost of those scan chains. I.e. if we were to
    take an existing working design and replace all the "flip-flop with
    scan-chain" with "plain flip-flops", how much smaller would the
    resulting chip be, how much faster could it run, and how much less power
    could it consume?
    Given a 16-gate delay design with 5 gates of "flop-jitter-skew"
    the scan path adds about ½ a gate of delay (2%-ish).

    Almost lost in the noise. Thanks.


    Stefan

  • From MitchAlsup1@21:1/5 to Stefan Monnier on Thu May 8 01:04:42 2025
    On Wed, 7 May 2025 19:03:31 +0000, Stefan Monnier wrote:

    What's the approximate cost of those scan chains. I.e. if we were to
    take an existing working design and replace all the "flip-flop with
    scan-chain" with "plain flip-flops", how much smaller would the
    resulting chip be, how much faster could it run, and how much less power
    could it consume?
    Given a 16-gate delay design with 5 gates of "flop-jitter-skew"
    the scan path adds about ½ a gate of delay (2%-ish).

    Almost lost in the noise. Thanks.

    Such a big gain in debugging, such a small penalty for use ...



  • From MitchAlsup1@21:1/5 to EricP on Tue Jul 15 17:09:28 2025
    On Tue, 6 May 2025 23:58:00 +0000, EricP wrote:

    MitchAlsup1 wrote:

    In the distant past, I had the select lines (the heavy loads that
    cross the data path and select things for it to do) captured at
    the other end of the data path with a skewable scan clock so we
    could (essentially) see the timing of the select lines after
    crossing the data path. This made it possible to use the scan
    path as an oscilloscope.

    Was that the AMD with the 6 phase clock you mentioned before?
    I was wondering about how boundary scan would work with that.
    I suppose you'd have 6 separate scan paths to capture each
    phase's logic states.

    When I arrived (1999), Athlon had a single-phase clock chip-wide.

  • From MitchAlsup1@21:1/5 to Stefan Monnier on Tue Jul 15 17:21:42 2025
    On Wed, 7 May 2025 3:12:08 +0000, Stefan Monnier wrote:

    Even state-of-the-art CPUs today commonly use scan-chains (via JTAG)
    for debugging.
    Is there some blog somewhere that explains how scan-chains work (not
    how they're used, but how they're implemented inside the CPU)?
    Intuitively they sound very costly to me, because of things like the
    need to run extra wires all over the place. I'm obviously
    missing something.
    Actually, you're not far off. It's a serial shift chain which is shifted
    one bit at a time to capture flop states. Each chain is a single wire;
    a chip may have a few dozen individual shift chains.
    https://www.design-reuse.com/articles/48331/scan-chains-pnr-outlook.html

    Thanks. Wow. So it is really that bad, huh?
    I also liked the note about speed limits and power consumption, how
    shifting a state (in or out) causes (almost) all the flip-flops to
    change state at each cycle, thus leading to very high power consumption.

    At a very slow clock rate, and without logic burning power.

    What's the approximate cost of those scan chains. I.e. if we were to
    take an existing working design and replace all the "flip-flop with
    scan-chain" with "plain flip-flops", how much smaller would the
    resulting chip be, how much faster could it run, and how much less power
    could it consume?

    A D flip-flop can be implemented in 4 gates (but often 5).
    A full-scan D flip-flop is implemented in 10 gates.
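
    Those per-flop gate counts only turn into a chip-level number once you know what share of the design's gates sit in flip-flops, and that share is not given anywhere in this thread. The sketch below therefore treats it as an input; the 20% used in the example is purely illustrative, and `scan_gate_growth` is a hypothetical helper name. Gate count is also only a rough proxy for area.

```python
def scan_gate_growth(flop_fraction: float, plain_gates: int = 5, scan_gates: int = 10) -> float:
    """Relative growth in total gate count when every plain flop becomes a scan flop.

    flop_fraction is the share of the original design's gates that are in
    flip-flops; it is an assumed input, not a figure from the thread.
    """
    return flop_fraction * (scan_gates / plain_gates - 1)


growth_5 = scan_gate_growth(0.20)                 # 5-gate plain flops
growth_4 = scan_gate_growth(0.20, plain_gates=4)  # 4-gate plain flops
print(f"{growth_5:.0%} to {growth_4:.0%} more gates")
```

    Per flop, the cost is 2x to 2.5x; across the chip it is diluted by however much of the design is combinational logic, arrays, and wiring.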

    I assume the cost in terms of power consumption is small because in
    normal use, the scan-chain part stays completely stable so that barring
    leakage it should not consume any power, save for the indirect costs
    like the need to move the other bits over greater distances when
    the extra wires of the scan chains get in the way.

    In normal use, the scan attachments are just additional capacitance
    internal to the flip-flop. In scan use, the normal outputs of the
    Flip-Flops remain stable--preventing downstream logic from toggling.
    Once the scan is done, and the normal clock starts again, the scan
    data is gated to the flip-flop data.
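
    A rough behavioural sketch of that description, assuming each flop has a separate scan-side storage element so that the functional output Q holds still while bits shift past. This is one structure consistent with the paragraph above, not a claim about any particular cell library; `FullScanFlop`, `capture`, `update`, and `shift` are hypothetical names.

```python
from typing import List


class FullScanFlop:
    """Toy flop with a separate scan stage, so Q does not move during shifting."""

    def __init__(self, q: int = 0) -> None:
        self.q = q      # functional output seen by downstream logic
        self.scan = 0   # scan-side storage, invisible to downstream logic

    def capture(self) -> None:
        self.scan = self.q   # copy functional state into the scan stage

    def update(self) -> None:
        self.q = self.scan   # gate the scanned-in bit onto the functional flop


def shift(flops: List[FullScanFlop], scan_in: int) -> int:
    """One scan clock: scan-stage bits move one position; Q outputs are untouched."""
    scan_out = flops[-1].scan
    for i in range(len(flops) - 1, 0, -1):
        flops[i].scan = flops[i - 1].scan
    flops[0].scan = scan_in
    return scan_out


flops = [FullScanFlop(q) for q in (1, 0, 0)]
for f in flops:
    f.capture()
observed = [shift(flops, bit) for bit in (0, 1, 1)]  # read old state, load new
q_during_scan = [f.q for f in flops]                 # unchanged while shifting
for f in flops:
    f.update()
q_after_update = [f.q for f in flops]
print(observed, q_during_scan, q_after_update)
```

    Because Q never changes during the shift, the downstream combinational logic sees no transitions and burns no dynamic power while the chain is moving.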


