• 44GB SRAM on a single chip

    From Scott Lurndal@21:1/5 to All on Wed Mar 13 19:51:34 2024
    Viz. earlier discussions related to memory speeds,
    here's a single chip (wafer-sized) with just under
    a million cores and 44GB of SRAM. Four trillion
    transistors. 21PB/s memory bandwidth. 23kW.

    https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Scott Lurndal on Wed Mar 13 23:08:50 2024
    On Wed, 13 Mar 2024 19:51:34 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    Viz. earlier discussions related to memory speeds,
    here's a single chip (wafer-sized) with just under
    a million cores and 44GB of SRAM. Four trillion
    transistors. 21PB/s memory bandwidth. 23kW.

    https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/

You were talking about 1ns latency then.
The latency on a "chip" like that, corner to corner, would be measured in
microseconds at best. Quite possibly over 10 usec.
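That microsecond figure can be sanity-checked with a back-of-envelope mesh-hop estimate. The square-mesh layout, one-router-hop-per-core routing, and ~1 GHz clock below are illustrative assumptions, not Cerebras specifications:

```python
import math

cores = 900_000            # "just under a million cores"
side = math.isqrt(cores)   # ~948 cores per side of a square mesh
hops = 2 * (side - 1)      # Manhattan distance corner to corner
clock_ns = 1.0             # assume ~1 GHz mesh clock, ~1 ns per hop
latency_us = hops * clock_ns / 1000.0
print(f"{side} x {side} mesh, {hops} hops, ~{latency_us:.1f} us corner to corner")
```

Under those assumptions a one-way corner-to-corner trip is already a couple of microseconds, before any contention or software overhead.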

  • From MitchAlsup1@21:1/5 to Michael S on Wed Mar 13 21:37:06 2024
    Michael S wrote:

    On Wed, 13 Mar 2024 19:51:34 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    Viz. earlier discussions related to memory speeds,
    here's a single chip (wafer-sized) with just under
    a million cores and 44GB of SRAM. Four trillion
    transistors. 21PB/s memory bandwidth. 23kW.

    https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/

You were talking about 1ns latency then.
The latency on a "chip" like that, corner to corner, would be measured in
microseconds at best. Quite possibly over 10 usec.

    A MECL 3 chip could deliver 1 ns latency and 1 ns edge speed
    in a 16-pin DIP.

An ECL 10K gate array with 1,000 gates of 0.5 ns each inside still took
4-5 ns simply to go from an input pin on one side of the package to an
output pin on the other side with no logic being performed! This is not a
gate speed problem, but a wire delay problem.
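The wire-delay point can be made concrete with a rough flight-time comparison. The 3 cm pin-to-pin path and half-light-speed propagation below are assumed typical-order values, not measured ECL 10K figures:

```python
# Ideal flight time across an ECL 10K-era package vs. the observed
# pin-to-pin delay. Path length and propagation speed are assumptions.
c_cm_per_ns = 30.0                        # speed of light in vacuum
path_cm = 3.0                             # assumed pin-to-pin path length
ideal_ns = path_cm / (c_cm_per_ns / 2.0)  # ~half of c in package dielectric
observed_ns = 4.5                         # the "4-5 ns" from the post
print(f"ideal flight time ~{ideal_ns:.1f} ns, observed ~{observed_ns} ns,")
print(f"ratio ~{observed_ns / ideal_ns:.1f}x: loading, not distance, dominates")
```

Even with generous assumptions, pure propagation accounts for a small fraction of a nanosecond; the rest is driving loaded wires through the package.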

  • From MitchAlsup1@21:1/5 to Scott Lurndal on Wed Mar 13 21:33:09 2024
    Scott Lurndal wrote:

    Viz. earlier discussions related to memory speeds,
    here's a single chip (wafer-sized) with just under
    a million cores and 44GB of SRAM. Four trillion
    transistors. 21PB/s memory bandwidth. 23kW.

    https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/

    CPUs were limited to 100 W thermal dissipation,
    GPUs got up to 300 W thermal dissipation,
    Now you are looking at the thermal dissipation of 10% of an ECL CRAY-1

As to making racks of these things: the reason CRAYs were limited to 300
KVA is because that is the largest load a non-governmental electrical
consumer can turn on or off without calling the power company to coordinate
changing the grid (so the power company can prepare to ramp its generating
capacity up or down). It generally takes them 15-30 minutes to prepare
for such a change in load.

    That thermal limit (CRAYs) was a reason the memory sizes were so restricted early on.

  • From Scott Lurndal@21:1/5 to mitchalsup@aol.com on Wed Mar 13 22:30:40 2024
    mitchalsup@aol.com (MitchAlsup1) writes:
    Scott Lurndal wrote:

    Viz. earlier discussions related to memory speeds,
    here's a single chip (wafer-sized) with just under
    a million cores and 44GB of SRAM. Four trillion
    transistors. 21PB/s memory bandwidth. 23kW.

    https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/

    CPUs were limited to 100 W thermal dissipation,
    GPUs got up to 300 W thermal dissipation,
    Now you are looking at the thermal dissipation of 10% of an ECL CRAY-1

As to making racks of these things: the reason CRAYs were limited to 300
KVA is because that is the largest load a non-governmental electrical
consumer can turn on or off without calling the power company to coordinate
changing the grid (so the power company can prepare to ramp its generating
capacity up or down). It generally takes them 15-30 minutes to prepare
for such a change in load.

    Modern datacenters dissipate 16 to 20kW per rack, with hundreds
    or thousands of racks. Basically a megawatt-hour per square meter
    with cooling factored in.
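For a sense of scale against the 300 KVA coordination threshold mentioned upthread, here is the aggregate arithmetic. The rack count and PUE are assumed illustrative values, not figures from the post:

```python
racks = 1_000          # "hundreds or thousands of racks"
kw_per_rack = 18       # midpoint of the 16 to 20 kW per-rack figure
pue = 1.4              # assumed power usage effectiveness (cooling, etc.)
it_mw = racks * kw_per_rack / 1000
total_mw = it_mw * pue
cray_limit_mva = 0.3   # the 300 KVA coordination threshold upthread
print(f"IT load ~{it_mw:.0f} MW, facility ~{total_mw:.1f} MW,")
print(f"roughly {total_mw / cray_limit_mva:.0f}x the CRAY-era threshold")
```

A single modern facility at this assumed scale draws tens of megawatts, two orders of magnitude past the load a CRAY-era customer could switch without coordinating with the utility.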

  • From MitchAlsup1@21:1/5 to Scott Lurndal on Thu Mar 14 00:28:41 2024
    Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    Scott Lurndal wrote:

    Viz. earlier discussions related to memory speeds,
    here's a single chip (wafer-sized) with just under
    a million cores and 44GB of SRAM. Four trillion
    transistors. 21PB/s memory bandwidth. 23kW.

    https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/

    CPUs were limited to 100 W thermal dissipation,
    GPUs got up to 300 W thermal dissipation,
    Now you are looking at the thermal dissipation of 10% of an ECL CRAY-1

As to making racks of these things: the reason CRAYs were limited to 300
KVA is because that is the largest load a non-governmental electrical
consumer can turn on or off without calling the power company to coordinate
changing the grid (so the power company can prepare to ramp its generating
capacity up or down). It generally takes them 15-30 minutes to prepare
for such a change in load.

    Modern datacenters dissipate 16 to 20kW per rack, with hundreds
    or thousands of racks. Basically a megawatt-hour per square meter
    with cooling factored in.

Yes, but racks (or motherboards within the racks) are power-cycled individually.

  • From Scott Lurndal@21:1/5 to mitchalsup@aol.com on Thu Mar 14 15:20:49 2024
    mitchalsup@aol.com (MitchAlsup1) writes:
    Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    Scott Lurndal wrote:

    Viz. earlier discussions related to memory speeds,
    here's a single chip (wafer-sized) with just under
    a million cores and 44GB of SRAM. Four trillion
    transistors. 21PB/s memory bandwidth. 23kW.

https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
    CPUs were limited to 100 W thermal dissipation,
    GPUs got up to 300 W thermal dissipation,
    Now you are looking at the thermal dissipation of 10% of an ECL CRAY-1

As to making racks of these things: the reason CRAYs were limited to 300
KVA is because that is the largest load a non-governmental electrical
consumer can turn on or off without calling the power company to coordinate
changing the grid (so the power company can prepare to ramp its generating
capacity up or down). It generally takes them 15-30 minutes to prepare
for such a change in load.

    Modern datacenters dissipate 16 to 20kW per rack, with hundreds
    or thousands of racks. Basically a megawatt-hour per square meter
    with cooling factored in.

Yes, but racks (or motherboards within the racks) are power-cycled individually.

Sometimes, perhaps even usually. Unless the datacenter loses power
completely (which does happen, if rarely).

  • From MitchAlsup1@21:1/5 to Scott Lurndal on Thu Mar 14 16:24:25 2024
    Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    Scott Lurndal wrote:

    Viz. earlier discussions related to memory speeds,
    here's a single chip (wafer-sized) with just under
    a million cores and 44GB of SRAM. Four trillion
    transistors. 21PB/s memory bandwidth. 23kW.

https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
    CPUs were limited to 100 W thermal dissipation,
    GPUs got up to 300 W thermal dissipation,
    Now you are looking at the thermal dissipation of 10% of an ECL CRAY-1

As to making racks of these things: the reason CRAYs were limited to 300
KVA is because that is the largest load a non-governmental electrical
consumer can turn on or off without calling the power company to coordinate
changing the grid (so the power company can prepare to ramp its generating
capacity up or down). It generally takes them 15-30 minutes to prepare
for such a change in load.

    Modern datacenters dissipate 16 to 20kW per rack, with hundreds
    or thousands of racks. Basically a megawatt-hour per square meter
    with cooling factored in.

Yes, but racks (or motherboards within the racks) are power-cycled individually.

Sometimes, perhaps even usually. Unless the datacenter loses power
completely (which does happen, if rarely).

When the entire data center loses power, its 300 KVA+ is the least of the
power company's worries.

  • From Scott Lurndal@21:1/5 to mitchalsup@aol.com on Thu Mar 14 17:11:50 2024
    mitchalsup@aol.com (MitchAlsup1) writes:
    Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    Scott Lurndal wrote:

    Viz. earlier discussions related to memory speeds,
    here's a single chip (wafer-sized) with just under
    a million cores and 44GB of SRAM. Four trillion
    transistors. 21PB/s memory bandwidth. 23kW.

https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
    CPUs were limited to 100 W thermal dissipation,
    GPUs got up to 300 W thermal dissipation,
Now you are looking at the thermal dissipation of 10% of an ECL CRAY-1
As to making racks of these things: the reason CRAYs were limited to 300
KVA is because that is the largest load a non-governmental electrical
consumer can turn on or off without calling the power company to coordinate
changing the grid (so the power company can prepare to ramp its generating
capacity up or down). It generally takes them 15-30 minutes to prepare
for such a change in load.

    Modern datacenters dissipate 16 to 20kW per rack, with hundreds
    or thousands of racks. Basically a megawatt-hour per square meter
    with cooling factored in.

Yes, but racks (or motherboards within the racks) are power-cycled individually.

Sometimes, perhaps even usually. Unless the datacenter loses power
completely (which does happen, if rarely).

When the entire data center loses power, its 300 KVA+ is the least of the
power company's worries.

    Not necessarily - it could be one of the lines to the DC that failed.

IBM's Almaden Research Center has two 115 kV transmission lines feeding
the center, from two different circuits. A good friend was the
facilities manager there and has several stories about the effects of
outages and switchovers, mainly when on-site switching or circuit
breaker gear failed.

    I could see the pylons from a former residence and saw one of the
    transmission lines fail during a storm - the transmission line
made a 90-degree bend to head uphill to the labs; as usual
    for such a severe bend, they terminated the lines from both
    directions at the pylon and connected them with stubs; one
    of the stubs had failed and was swinging in the wind - every
    time it hit the pylon there was a visible (and audible!)
    explosion. Lasted for a couple of hours before they were
    able to kill the circuit.
