• Performance benefits of primitive-centric code (was: Actually... )

    From Anton Ertl@21:1/5 to Anton Ertl on Thu Jun 12 21:01:46 2025
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    As for performance, here is what I measure on gforth-itc:

    sieve  bubble matrix fib    fft    compile,
    0.173  0.187  0.142  0.253  0.085  ,
    0.164  0.191  0.134  0.242  0.088  opt-compile,

    There is quite a bit of variation between the runs on the Zen4 machine
    where I measured this.

    That's not particularly impressive, but this primitive-centric code is
    a stepping stone for a number of further changes which overall produce
    a very good speedup. I demonstrate this with the following sequence
    of invocations:

    gforth-itc onebench.fs
    #let's add primitive-centric code
    gforth-itc -e "' opt-compile, is compile," onebench.fs
    #now switch to direct-threaded code:
    gforth --no-dynamic --ss-number=0 onebench.fs
    #now allow dynamic superinstructions with replication:
    gforth --ss-number=0 --opt-ip-updates=0 onebench.fs
    #switch to benchmarking engine (less precision in error reporting):
    gforth-fast --ss-number=0 --ss-states=1 --opt-ip-updates=0 onebench.fs
    #switch on static stack caching with three registers:
    gforth-fast --ss-number=0 --opt-ip-updates=0 onebench.fs
    #optimize away most IP updates:
    gforth-fast --ss-number=0 onebench.fs
    #enable static superinstructions:
    gforth-fast onebench.fs
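
Because the timings vary from run to run, one way to compare the configurations is to repeat each invocation a few times. The following is only a minimal sketch and not part of the original measurements: the function names `configs` and `run_all` and the `RUNS` variable are made up for illustration, while the command lines are exactly the ones listed above. It assumes that the Gforth binaries are on the PATH and that onebench.fs is in the current directory.

```shell
#!/bin/sh
# Sketch: repeat each configuration several times (all names are hypothetical).

# One command line per configuration, copied from the list above.
configs() {
cat <<'EOF'
gforth-itc onebench.fs
gforth-itc -e "' opt-compile, is compile," onebench.fs
gforth --no-dynamic --ss-number=0 onebench.fs
gforth --ss-number=0 --opt-ip-updates=0 onebench.fs
gforth-fast --ss-number=0 --ss-states=1 --opt-ip-updates=0 onebench.fs
gforth-fast --ss-number=0 --opt-ip-updates=0 onebench.fs
gforth-fast --ss-number=0 onebench.fs
gforth-fast onebench.fs
EOF
}

# Run every configuration RUNS times (default 3) and print its output.
run_all() {
    runs=${RUNS:-3}
    configs | while IFS= read -r cmd; do
        echo "# $cmd"
        i=0
        while [ "$i" -lt "$runs" ]; do
            # stdin comes from /dev/null so the benchmark cannot consume the list
            sh -c "$cmd" </dev/null
            i=$((i + 1))
        done
    done
}
```

Calling `RUNS=5 run_all` would print five sets of results per configuration, from which the best time per benchmark can be picked.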

    The results on a 5GHz Zen4 are (smaller is better):

    sieve  bubble matrix fib    fft
    0.173  0.184  0.142  0.247  0.085  gforth-itc
    0.163  0.190  0.134  0.238  0.089  let's add primitive-centric code
    0.164  0.187  0.130  0.246  0.085  now switch to direct-threaded code
    0.084  0.128  0.051  0.105  0.030  +dynamic superinstructions with replication
    0.053  0.061  0.032  0.049  0.018  switch to benchmarking engine
    0.053  0.059  0.031  0.042  0.015  +static stack caching with three registers
    0.020  0.021  0.011  0.027  0.013  +optimize away most IP updates
    0.020  0.021  0.011  0.027  0.012  +enable static superinstructions

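    As you can see, the overall effect of these changes is quite big:
    from the first line to the last, the run time drops by a factor of
    about 7 (fft) to about 13 (matrix).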
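
To put a number on the overall effect, the first row of the table (gforth-itc) can be divided by the last row (gforth-fast with all optimizations). The following sketch does this with awk; the function name `speedups` is made up for illustration, and the numbers are copied from the table above.

```shell
# Sketch: overall speedup = time(gforth-itc) / time(gforth-fast), per benchmark.
# The numbers are the first and last rows of the results table.
speedups() {
    awk 'BEGIN {
        n = split("sieve bubble matrix fib fft", name, " ")
        split("0.173 0.184 0.142 0.247 0.085", base, " ")   # gforth-itc
        split("0.020 0.021 0.011 0.027 0.012", best, " ")   # gforth-fast, everything enabled
        for (i = 1; i <= n; i++)
            printf "%-6s %.2f\n", name[i], base[i] / best[i]
    }'
}
speedups
# prints:
# sieve  8.65
# bubble 8.76
# matrix 12.91
# fib    9.15
# fft    7.08
```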

    You may wonder what these funny words all mean. Here's a list of
    papers about these topics:

    primitive-centric code:
    https://www.complang.tuwien.ac.at/papers/ertl02.ps.gz

    dynamic superinstructions with replication:
    https://www.complang.tuwien.ac.at/papers/ertl%26gregg03.ps.gz

    static stack caching:
    https://www.complang.tuwien.ac.at/papers/ertl%26gregg05.ps.gz

    IP update optimization:
    https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2024.14

    static superinstructions:
    https://www.complang.tuwien.ac.at/papers/ertl+02.ps.gz

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
    EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/
