anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
As for performance, here is what I measure on gforth-itc:
sieve bubble matrix   fib   fft  compile,
0.173  0.187  0.142 0.253 0.085  ,
0.164  0.191  0.134 0.242 0.088  opt-compile,
There is quite a bit of variation between the runs on the Zen4 machine
where I measured this.
That's not particularly impressive, but this primitive-centric code is
a stepping stone for a number of further changes which overall produce
a very good speedup. I demonstrate this with the following sequence
of invocations:
gforth-itc onebench.fs
#let's add primitive-centric code
gforth-itc -e "' opt-compile, is compile," onebench.fs
#now switch to direct-threaded code:
gforth --no-dynamic --ss-number=0 onebench.fs
#now allow dynamic superinstructions with replication:
gforth --ss-number=0 --opt-ip-updates=0 onebench.fs
#switch to benchmarking engine (less precision in error reporting):
gforth-fast --ss-number=0 --ss-states=1 --opt-ip-updates=0 onebench.fs
#switch on static stack caching with three registers:
gforth-fast --ss-number=0 --opt-ip-updates=0 onebench.fs
#optimize away most IP updates:
gforth-fast --ss-number=0 onebench.fs
#enable static superinstructions:
gforth-fast onebench.fs
The results on a 5GHz Zen4 are (smaller is better):
sieve bubble matrix   fib   fft
0.173  0.184  0.142 0.247 0.085  gforth-itc
0.163  0.190  0.134 0.238 0.089  let's add primitive-centric code
0.164  0.187  0.130 0.246 0.085  now switch to direct-threaded code
0.084  0.128  0.051 0.105 0.030  +dynamic superinstructions with replication
0.053  0.061  0.032 0.049 0.018  switch to benchmarking engine
0.053  0.059  0.031 0.042 0.015  +static stack caching with three registers
0.020  0.021  0.011 0.027 0.013  +optimize away most IP updates
0.020  0.021  0.011 0.027 0.012  +enable static superinstructions
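As you can see, the overall effect of these changes is quite big:
from the first line to the last, the run times shrink by factors of
about 7 (fft) to about 13 (matrix).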
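If you want to see how much each step contributes, the table can be
turned into speedup factors. Here is a small Python sketch for that; it
is not part of Gforth or onebench.fs, the run times are simply copied
from the table above, and the names (RESULTS, speedups, per_step,
overall) are made up for illustration.

```python
# Run times copied from the results table (smaller is better).
BENCHMARKS = ["sieve", "bubble", "matrix", "fib", "fft"]
RESULTS = [
    ("gforth-itc", [0.173, 0.184, 0.142, 0.247, 0.085]),
    ("let's add primitive-centric code", [0.163, 0.190, 0.134, 0.238, 0.089]),
    ("now switch to direct-threaded code", [0.164, 0.187, 0.130, 0.246, 0.085]),
    ("+dynamic superinstructions with replication",
     [0.084, 0.128, 0.051, 0.105, 0.030]),
    ("switch to benchmarking engine", [0.053, 0.061, 0.032, 0.049, 0.018]),
    ("+static stack caching with three registers",
     [0.053, 0.059, 0.031, 0.042, 0.015]),
    ("+optimize away most IP updates", [0.020, 0.021, 0.011, 0.027, 0.013]),
    ("+enable static superinstructions", [0.020, 0.021, 0.011, 0.027, 0.012]),
]


def speedups(before, after):
    """Return before/after per benchmark; values > 1 mean 'after' is faster."""
    return [b / a for b, a in zip(before, after)]


def per_step(results):
    """Speedup of each configuration over the one on the line before it."""
    return [(label, speedups(prev, cur))
            for (_, prev), (label, cur) in zip(results, results[1:])]


def overall(results):
    """Speedup of the last configuration over the first, keyed by benchmark."""
    return dict(zip(BENCHMARKS, speedups(results[0][1], results[-1][1])))


if __name__ == "__main__":
    print(" ".join(f"{name:>6}" for name in BENCHMARKS))
    for label, factors in per_step(RESULTS):
        print(" ".join(f"{f:6.2f}" for f in factors), label)
    print(" ".join(f"{f:6.2f}" for f in overall(RESULTS).values()), "overall")
```

Given the run-to-run variation mentioned above, per-step factors close
to 1 should not be read as real speedups or slowdowns.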
You may wonder what these funny words all mean. Here's a list of
papers about these topics:
primitive-centric code:
https://www.complang.tuwien.ac.at/papers/ertl02.ps.gz
dynamic superinstructions with replication:
https://www.complang.tuwien.ac.at/papers/ertl%26gregg03.ps.gz
static stack caching:
https://www.complang.tuwien.ac.at/papers/ertl%26gregg05.ps.gz
IP update optimization:
https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2024.14
static superinstructions:
https://www.complang.tuwien.ac.at/papers/ertl+02.ps.gz
- anton
--
M. Anton Ertl
http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs:
http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard:
https://forth-standard.org/
EuroForth 2023 proceedings:
http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings:
http://www.euroforth.org/ef24/papers/