• Re: VLIW The Attack of the Killer Micros

    From John Levine@21:1/5 to All on Sat Feb 17 22:25:24 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    >On Sat, 17 Feb 2024 11:41 +0000 (GMT Standard Time), John Dallman wrote:
    >
    >> But most of all,
    >> the design is based on the compilers being able to solve a problem that
    >> can't be solved in practice: static scheduling of memory loads in a
    >> system with multiple levels of cache.
    >
    >That seems insane. Since when did architectural specs dictate the levels
    >of cache you could have? Normally that is an implementation detail, which
    >can vary between different implementations of the same architecture.

    The point of VLIW was to schedule this stuff statically at compile
    time to make the best use of the memory architecture. It more or less
    worked in the 1980s but as memory architectures got more complex, and
    dynamic hardware scheduling got better, VLIW performance could never
    keep up.
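
    A toy Python sketch of the problem (not from the thread). It assumes
    an in-order machine that stalls at the first use of a load until the
    data arrives. The latencies in LATENCY are illustrative round numbers,
    not measurements of any real CPU, and the names LATENCY, stall_cycles
    and total_stalls are made up for this example.

```python
# Assumed load-to-use latencies in cycles (illustrative, not measured).
LATENCY = {"L1": 2, "L2": 10, "L3": 30, "DRAM": 200}


def stall_cycles(scheduled_distance, served_by):
    """Cycles an in-order pipeline waits at the first use of a load.

    scheduled_distance is how many cycles of independent work the
    compiler placed between the load and its first use.
    """
    return max(0, LATENCY[served_by] - scheduled_distance)


def total_stalls(scheduled_distance, trace):
    """Sum the stalls over a trace of which level served each load."""
    return sum(stall_cycles(scheduled_distance, level) for level in trace)


# Hypothetical trace: mostly L1 hits, a few deeper misses.
trace = ["L1"] * 90 + ["L2"] * 7 + ["L3"] * 2 + ["DRAM"]

if __name__ == "__main__":
    for distance in (2, 10, 30):
        print(distance, total_stalls(distance, trace))
```

    The compiler has to pick one scheduled_distance per load without
    knowing which level will serve it. A larger distance hides more
    misses, but only if the compiler can find that much independent work
    (and spare registers) for every load, which it generally cannot.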



    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to BGB on Sun Feb 18 00:24:05 2024
    On Sat, 17 Feb 2024 17:17:09 -0600, BGB wrote:

    > A potential alternative would be something like a scaled-up 64-bit
    > variant of an ESP32 style design (or a 64-bit version of the Qualcomm Hexagon).

    Would you end up with something similar to RISC-V?

  • From Anton Ertl@21:1/5 to BGB on Sun Feb 18 09:11:26 2024
    BGB <cr88192@gmail.com> writes:
    > Say, how well IA-64 could perform if only given, say, 16K of L1I$ and
    > 128K of L2 cache, ...

    Itanium (Merced) has 16KB I-cache and 96KB L2 cache.

    Itanium 2 (McKinley) has 16KB I-cache and 256KB L2 cache.

    They both have L3 caches.

    But these CPUs actually do fine (for their time) on HPC-style stuff,
    so the cache sizes are not the main problem. They perform badly at
    code where the compiler cannot predict the branches well, even on
    code that tends to perform well with small caches.
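
    A toy Python sketch of the branch problem (not from the thread). It
    compares the best fixed, compile-time guess for one branch with a
    classic 2-bit saturating counter, a simple dynamic predictor. The
    branch trace is hypothetical, and real hardware predictors are far
    more elaborate than this.

```python
def static_mispredicts(outcomes):
    """Mispredictions for the best fixed guess (always the majority direction)."""
    taken = sum(outcomes)
    return min(taken, len(outcomes) - taken)


def two_bit_mispredicts(outcomes, state=2):
    """Mispredictions for a 2-bit saturating counter.

    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    """
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        # Move the counter toward the observed direction, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return misses


# Hypothetical branch whose direction flips between two long phases.
outcomes = [True] * 500 + [False] * 500

if __name__ == "__main__":
    print(static_mispredicts(outcomes), two_bit_mispredicts(outcomes))
```

    On a trace like this, no single compile-time guess can be right more
    than half the time, while the counter adapts shortly after the phase
    change. An in-order, statically scheduled machine pays for each wrong
    guess with work scheduled along the wrong path.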

    Of course, the Cortex-A53 and Bonnell also perform badly on such code,
    for the same reason. Intel learned the lesson and replaced the in-order
    Bonnell with an OoO line that began with Silvermont and runs up to the
    recent Gracemont (Alder Lake E-core). Apple also went for OoO
    E-cores. Only ARM is sticking to in-order cores.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
