Interesting, but the performance gain is not surprising
if you use matrix-adapted primitives.
Just for comparison:
I use different internal representations for vectors and
matrices. Vectors are simple fp-number fields. Matrices
in the heap are fat data and have a header for dimension,
numeric type and reference count. Matrices can only be
referenced via matrix-values (or matrix-stack elements).
The compiler automatically generates the correct and
therefore performant type-dependent code for
<indices> } -> matrix element
<indices> }^ -> address of the element
<indices> }! -> write matrix element
This is of course not FSL syntax, but more readable IMO.
There are also slices, but that is not the topic here.
On 12/14/23 06:44, minforth wrote:...
The compiler automatically generates the correct and
therefore performant type-dependent code for
In the FSL, this can be done for statically allotted arrays and matrices
by making the "}" and "}}" words IMMEDIATE, so that they can determine the element size (plus the number of columns for a matrix) at compile time.
I don't see how the element-size-specific reference can be done for
dynamic arrays and matrices at compile-time, when the element size and
number of columns are not known.
The details of the implementation would go too far here. Essentially,
my matrices are objects with methods for reading and writing cells
(and slices). There is a matrix stack with pointers to a pre-allocated
pool of empty matrix objects in the heap at the start of the programme.
Other frequently used global matrix values are also predefined.
This allows a lot of early-binding during compilation.
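A minimal sketch of a separate stack for matrix-object pointers along the lines described above (illustrative only, not minforth's actual implementation; no overflow checking):

16 CONSTANT #MSTACK
CREATE mstack  #MSTACK CELLS ALLOT      \ holds pointers to matrix objects
VARIABLE mdepth  0 mdepth !
: >m ( a_mat -- )   mdepth @ CELLS mstack + !  1 mdepth +! ;   \ push matrix pointer
: m> ( -- a_mat )  -1 mdepth +!  mdepth @ CELLS mstack + @ ;   \ pop matrix pointer

Keeping matrix pointers on their own stack, as described, leaves the integer and fp stacks free for ordinary operands.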
minforth wrote:
The details of the implementation would go too far here. Essentially,
my matrices are objects with methods for reading and writing cells
(and slices). There is a matrix stack with pointers to a pre-allocated
pool of empty matrix objects in the heap at the start of the programme.
Other frequently used global matrix values are also predefined.
This allows a lot of early-binding during compilation.
This is also the approach I have chosen for iForth. However, there I have
no need for a matrix stack (the parameter stack is perfectly well
suited to handle the matrix/array struct pointers).
In addition, the normal integer and fp data stacks
are full enough with other things, so the mstack is also useful for clearer code.
The operators "}" and "}}" resolve the adress of the indexed element of
the specified array, and this is where the inefficiency lies. I've been tempted to make "}" and "}}" machine code primitives for these words to increase their speed in kForth. It seems unlikely though that the words
"}" and "}}" would be generally adopted as common use words, so I have
not made them built-in words.
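For context, a generic FSL-style } has to resolve the element address at run time, roughly along these lines (a simplified sketch assuming the fsl-util convention of storing the element size in the cell just below the data area; details vary between FSL versions):

: } ( a_arr n -- a_el )
    OVER [ 1 CELLS ] LITERAL - @   \ fetch the element size stored below the data
    * + ;                          \ scale the index and add it to the base address

The extra fetch and multiply on every access are exactly the overhead a size-specific primitive avoids.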
On 12/13/23 18:38, Krishna Myneni wrote:
...
The operators "}" and "}}" resolve the adress of the indexed element
of the specified array, and this is where the inefficiency lies. I've
been tempted to make "}" and "}}" machine code primitives for these
words to increase their speed in kForth. It seems unlikely though that
the words "}" and "}}" would be generally adopted as common use words,
so I have not made them built-in words.
I think the best approach forward for me, in kForth, is to include
efficient primitives in the Forth system,
FSL_ARRAY_EL_REF ( a_arr nidx -- )
FSL_MATRIX_EL_REF ( a_mat nidx midx -- )
: random-matrix-access ( -- )
0 ?DO m{{ random-row-col }} drop LOOP ;
On 12/13/23 18:38, Krishna Myneni wrote:
...
The operators "}" and "}}" resolve the adress of the indexed element
of the specified array, and this is where the inefficiency lies. I've
been tempted to make "}" and "}}" machine code primitives for these
words to increase their speed in kForth. ...
I think the best approach forward for me, in kForth, is to include
efficient primitives in the Forth system,
Looks good for existing FSL code. It might even be faster if you
introduced type-specific and thus shorter address computations
like }}F }}SF }}I and }F }SF }I.
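A minimal sketch of what such type-specific 1-D accessors could look like (illustrative only, not minforth's or kForth's actual code; the matrix variants would additionally need the number of columns):

: }F  ( a_arr n -- a_el )  FLOATS + ;    \ 1-D array of FLOATs
: }SF ( a_arr n -- a_el )  SFLOATS + ;   \ 1-D array of SFLOATs
: }I  ( a_arr n -- a_el )  CELLS + ;     \ 1-D array of single-cell integers

Each of these skips the run-time element-size fetch that a generic } must perform.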
To do that properly, I want to go to a dual-xt system and
avoid explicit STATE-dependence (which has other benefits we've
discussed in the past). That's a bigger project for later.
Krishna Myneni wrote:
To do that properly, I want to go to a dual-xt system and avoid
explicit STATE-dependence (which has other benefits we've discussed in
the past). That's a bigger project for later.
Forge ahead. Having worked with both versions (state-smart and
dual-xt), I have to say that the investment in converting a
well-running, debugged Forth system to dual-xt only pays off
if you "POSTPONE like crazy" or are in the habit of writing DSLs.
Otherwise, state-smartness can be an itch, but is not bad per se.
On 12/18/23 07:05, Krishna Myneni wrote:
...
I have pushed these changes in kForth-64 into the main branch on GitHub:
1. Make the FSL matrix addressing word }} an intrinsic word for higher efficiency.
2. Update fsl-util.4th to load the Forth source version of }} only
if it is undefined.
The test code fsl-tester.4th (in forth-src/fsl/) may be used to verify
that FSL code runs correctly with the faster version of }}.
These changes will also appear in kForth-32 and in kForth-Win32, after a short period of testing.
minforth wrote:
[..]
In addition, the normal integer and fp data stacks
are full enough with other things, so the mstack is also useful for clearer code.
It probably comes as no surprise that I find locals to be a perfect solution for that.
mhx wrote:
OTOH I don't see much advantage in using matrix locals for dealing with global FSL matrices.
I don't use the FSL for serious work but I stuck with its basic interface words. Maybe I'll change that when it comes to a 2nd version of iForth
or iSPICE.
I'd rather invest in more readability, i.e. easy transformation from the usual Matlab notation to Forth. A simple example:
Matlab: (v'Av) where A is a matrix and v' is the transposed vector of v
Forth postfix:
.. v' A m* v m* 0> IF ...
minforth wrote:
I'd rather invest in more readability i.e. easy transformation from usual
Matlab notation to Forth. A simple example:
Matlab: (v'Av) where A is a matrix and v' is the transposed vector of v
Forth postfix:
.. v' A m* v m* 0> IF ...
I have to say:
FORTH> 2 2 FLOAT MATRIX a{{
FORTH> 1e 2e 3e 4e a{{ #=> ok
FORTH> a{{ }}print
1.000000e+0000 2.000000e+0000
3.000000e+0000 4.000000e+0000 ok
FORTH> a{{ 1 =rowget a{{ a{{ 1 =colget =mat* swap =mat* }}print
3.000000e+0001 4.000000e+0001
6.600000e+0001 8.800000e+0001 ok
FORTH> a{{ }}print
1.000000e+0000 2.000000e+0000
3.000000e+0000 4.000000e+0000 ok
But I also have XOPG:
: test1 LET a = b*c-3.17e-5/TANH(w)+ABS(x): CR LET. a: ;
LET w = 1.e-3: LET x = -2.5: CR CR test1 -- 2.4682999894333340377777106878373968247191492
0-VALUE GVAL a 0-VALUE GVAL b 0-VALUE GVAL c
0-VALUE GVAL disc ( Used for discriminant )
: quadraticroot ( F: a b c -- y1 y2 )
TO c TO b TO a                           \ Pick up coefficients.
LET disc=SQRT(b*b-4*a*c):                \ Set discriminant.
LET ((-b+disc)/(2*a),(-b-disc)/(2*a)):   \ Put values on f-stack.
;
( x*x-3*x+2=0 ) LET quadraticroot(1,-3, 2):
CR .PRINT CR .PRINT -- prints 1 2
( goldenratio ) CR LET. MAX(quadraticroot(1,-1,-1)): -- 1.6180339887498948482045868343656381177203
#[ 1 2 ; 3 4 ] :=> A ok    \ parse matrix elements, create A and fill it
A #.                       \ read A and print: [ 1. 2. ...
A' #.                      \ read A, transpose (but don't change A) and print: [ 1. 3. ...
But, I want }} to do the size-specific compilation,
eventually. To do that properly, I want to go to a dual-xt system and
avoid explicit STATE-dependence (which has other benefits we've
discussed in the past).
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
But, I want }} to do the size-specific compilation,
eventually. To do that properly, I want to go to a dual-xt system and
avoid explicit STATE-dependence (which has other benefits we've
discussed in the past).
Avoiding STATE is a good idea. However, what you have in mind seems
to be an optimization, not something like S". Dual-xt words are good
for stuff like S"; you can use them for optimization, but the
intelligent COMPILE, is better for that.
mhx wrote:
0-VALUE GVAL a 0-VALUE GVAL b 0-VALUE GVAL c
0-VALUE GVAL disc ( Used for discriminant )
: quadraticroot ( F: a b c -- y1 y2 )
TO c TO b TO a Pickup coefficients.
LET disc=SQRT(b*b-4*a*c): Set discriminant.
LET ((-b+disc)/(2*a),(-b-disc)/(2*a)): Put values on f-stack.
;
If I had your formula parser that would be here perhaps
: QUADRATICROOT { f: a b c | d == y1 y2 }
2.0 *to a
let d = sqrt(fma(b,b,-2*a*c))
let y1 = (-b+d)/a
let y2 = (-b-d)/a ;
Krishna Myneni wrote:
On 12/19/23 10:43, Anton Ertl wrote:
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
But, I want }} to do the size-specific compilation,
eventually. To do that properly, I want to go to a dual-xt system and
avoid explicit STATE-dependence (which has other benefits we've
discussed in the past).
Avoiding STATE is a good idea. However, what you have in mind seems
to be an optimization, not something like S". Dual-xt words are good
for stuff like S"; you can use them for optimization, but the
intelligent COMPILE, is better for that.
We had a discussion about this earlier, and I did not like the design
of SET-OPTIMIZER changing the behavior of COMPILE, . I don't see the
drawback of changing the xt for compilation state as a method of
optimization. Why add extra complexity with SET-OPTIMIZER when you
don't have to?
--
Krishna
LXF has from the beginning used the compiling XT also for optimization
(code generation). COMPILE, just compiles a call to the XT.
This works perfectly well and is easy to implement.
LXF has about 400 words that generate code while compiling.
This is my preference -- COMPILE, should simply compile the code needed
to execute the xt given to it and not do something clever by
substituting a different xt for it.
On 12/19/23 10:43, Anton Ertl wrote:
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
But, I want }} to do the size-specific compilation,
eventually. To do that properly, I want to go to a dual-xt system and
avoid explicit STATE-dependence (which has other benefits we've
discussed in the past).
Avoiding STATE is a good idea. However, what you have in mind seems
to be an optimization, not something like S". Dual-xt words are good
for stuff like S"; you can use them for optimization, but the
intelligent COMPILE, is better for that.
We had a discussion about this earlier, and I did not like the design of
SET-OPTIMIZER changing the behavior of COMPILE, . I don't see the
drawback of changing the xt for compilation state as a method of
optimization. Why add extra complexity with SET-OPTIMIZER when you don't
have to?
This is my preference -- COMPILE, should simply compile the code needed
to execute the xt given to it and not do something clever by
substituting a different xt for it.
Krishna Myneni wrote:
This is my preference -- COMPILE, should simply compile the code
needed to execute the xt given to it and not do something clever by
substituting a different xt for it.
Counterexample:
Code inlining is a simple optimisation technique. Why not use it?
On 12/22/23 03:00, minforth wrote:
Krishna Myneni wrote:
This is my preference -- COMPILE, should simply compile the code
needed to execute the xt given to it and not do something clever by
substituting a different xt for it.
Counterexample:
Code inlining is a simple optimisation technique. Why not use it?
I do use code inlining. I don't understand what you mean by this being a counter-example. For example, you may need to give interpretation
semantics for a word that performs a sequence of words, but the
compilation semantics does inline compiling of the sequence.
Or am I misunderstanding you?
--
Krishna
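A hedged sketch of the pattern described here: a word whose interpretation semantics run a short sequence, while its compilation semantics inline that same sequence. It uses the Gforth-style COMPSEM: convenience word shown later in this thread (other systems spell this differently), and 2DUP* is purely an illustrative name:

: 2dup* ( a b -- a b a*b )  over over * ;          \ interpretation semantics
compsem: postpone over postpone over postpone * ;  \ compilation semantics: inline the sequence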
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
This is my preference -- COMPILE, should simply compile the code needed
to execute the xt given to it and not do something clever by
substituting a different xt for it.
So IYO COMPILE, should be defined as
: compile, ( xt -- )
postpone literal postpone execute ;
?
And of course the POSTPONE EXECUTE must do something other than
"COMPILE," the xt or EXECUTE, or this results in an endless recursion.
Interestingly, development Gforth has this as "GENERAL-COMPILE," and
this is the implementation of "COMPILE," if no more specialized
version is installed. E.g., the more specialized implementation of
COMPILE, for a constant is:
: constant, ( xt -- )
>body @ postpone literal ;
This does not substitute a different xt for the given xt, it compiles
the value of the constant as a literal. Why do you think we should
compile with GENERAL-COMPILE, instead?
Krishna Myneni wrote:
I do use code inlining. I don't understand what you mean by this being
a counter-example. For example, you may need to give interpretation
semantics for a word that performs a sequence of words, but the
compilation semantics does inline compiling of the sequence.
I am a little surprised that this is a topic of discussion. See also: https://ethz.ch/content/dam/ethz/special-interest/infk/ast-dam/documents/Theodoridis-ASPLOS22-Inlining-Paper.pdf
Well, I consider compiling a single xt to be equivalent to compiling a function call. Inlining the function body instead (apart from reducing function
call overhead)
a) eliminates the xt, so it is no longer there for decompilation or introspection, and
b) creates a wider area for further optimisations.
For example, peephole optimisation can now extend across the boundaries
of the host code and inlined code, et cetera.
Krishna Myneni wrote:
This is my preference -- COMPILE, should simply compile the code needed
to execute the xt given to it and not do something clever by
substituting a different xt for it.
Quoting you:
Krishna Myneni wrote:
This is my preference -- COMPILE, should simply compile the code
needed to execute the xt given to it and not do something clever by
substituting a different xt for it.
Generally speaking, inlining substitutes the original xt with different xts (the inlined code) or machine code. But okay, just a misunderstanding.
Merry Xmas !! :-)
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
On 12/19/23 10:43, Anton Ertl wrote:
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
But, I want }} to do the size-specific compilation,
eventually. To do that properly, I want to go to a dual-xt system and
avoid explicit STATE-dependence (which has other benefits we've
discussed in the past).
Avoiding STATE is a good idea. However, what you have in mind seems
to be an optimization, not something like S". Dual-xt words are good
for stuff like S"; you can use them for optimization, but the
intelligent COMPILE, is better for that.
We had a discussion about this earlier, and I did not like the design of
SET-OPTIMIZER changing the behavior of COMPILE, .
If it changes the behaviour of COMPILE, (rather than the
implementation of that behaviour), that's a mistake in the use of SET-OPTIMIZER: Whatever you do, it must not change the behaviour.
I don't see the
drawback of changing the xt for compilation state as a method of
optimization. Why add extra complexity with SET-OPTIMIZER when you don't
have to?
* It's a separation of concerns: SET-OPTIMIZER is for optimization and
must not change the behaviour (i.e., if you replace the call to
SET-OPTIMIZER with DROP, the program should still work), whereas
SET->COMP (and related words such as INTERPRET/COMPILE:) changes the
compilation semantics, i.e., the behaviour.
* If you implement [COMPILE], you need to know if a word has
non-default compilation semantics. If you have the separation of
concerns above, that is easy: If it does not have the default
NAME>COMPILE method (DEFAULT-NAME>COMP), it has non-default
compilation semantics. If you are using this mechanism for a
purpose that does not change the compilation semantics, you have to
add a mechanism that tells the compiler about the difference, and
the user has to provide this information in some way, too (e.g., by
having INTERPRET/COMPILE: if the resulting word has non-default
compilation semantics and INTERPRET/OPTIMIZE: if it has).
* There is a difference in performance if an xt is COMPILE,ed; with
the intelligent COMPILE, the result is as good as going through the
text interpreter; with INTERPRET/COMPILE:, you get a generic call to
the xt, while the text interpreter produces better code.
* The COMPILE, methods get the xt that is COMPILE,d (they have the
same stack effect as COMPILE,), which helps in using the same
implementation for several xts. E.g., FOLD1-1 is the optimizer of
29 words (all with the stack effect ( x -- x )). INTERPRET/COMPILE:
lacks this flexibility. I guess you could have a way that provides
the xt or nt of the word for which you are performing the
compilation semantics; not sure how well that would work.
Finally, my vision for the (far-away) future is that words such as S"
and TO go away, and with them the need for words such as
INTERPRET/COMPILE: or (worse) STATE-smart words.
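As an aside, here is a hedged sketch of what a FOLD1-1-style constant-folding optimizer for a one-in-one-out word might look like, using the literal-stack words (LITS#, LITS>, >LITS) and OPTIMIZES that appear in the PICK example later in this thread. SQUARE is purely illustrative, and the real Gforth FOLD1-1 will differ in detail:

: square ( x -- x*x )  dup * ;
:noname ( xt -- )
    lits# 1 u>= if                        \ a literal argument is available at compile time:
        drop lits> dup * >lits exit then  \ fold it; compile nothing for SQUARE
    peephole-compile, ;                   \ otherwise compile SQUARE as usual
optimizes square

If the call to OPTIMIZES is replaced by DROP, SQUARE still behaves the same; only the generated code gets worse, which is the separation of concerns described above.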
My impression is that, in Gforth, you implement xt for a word as follows:
{ xt-interp, xt-compile [,xt-opt] }
and SET-OPTIMIZER allows you to specify xt-opt.
Perhaps some pseudo-code
for INTERPRET will be helpful in understanding how compilation occurs in
Gforth.
On 12/18/23 07:37, Krishna Myneni wrote:...
I also introduced the word "*+" in the last commit in kForth-64.
*+ ( a b c -- n ) \ n = a*b + c
Note that *+ is not the same as the sequence "* +", ...
I expect the floating point version, F*+, to be equally, if not more
useful for improving readability of fp code and increasing efficiency.
F*+ ( F: r1 r2 r3 -- r ) \ r = r1*r2 + r3
F*+ provides an intrinsic scalar linear transformation. I have not yet
added this word.
I need to correct my terminology in the prior statement. It is incorrect
to refer to the word F*+ (and its integer counterpart, *+) as a scalar "linear transformation." They allow efficient calculation of linear functions of the form f(x) = a*x + b.
A "linear transformation" is something else. A function effecting a
linear transformation must have the property f(x=0) = 0, which is most certainly not true in the general case for F*+ or *+.
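As an illustration, assuming F*+ with the stack effect given above ( F: r1 r2 r3 -- r ), r = r1*r2 + r3, a Horner-style polynomial evaluation could use it like this (a sketch, not code from kForth):

: poly ( F: x -- y )        \ evaluate 2x^2 + 3x + 5 as (2*x + 3)*x + 5
    FDUP                    \ F: x x
    2e 3e F*+               \ F: x 2x+3
    FSWAP 5e F*+ ;          \ F: (2x+3)*x + 5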
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
My impression is that, in Gforth, you implement xt for a word as follows:
{ xt-interp, xt-compile [,xt-opt] }
and SET-OPTIMIZER allows you to specify xt-opt.
Actually, Gforth has 8 header methods (method selectors) and 9 setter
words for specifying the method implementation for a method selector
in the current word:
execute ( ... xt -- ... ) set-execute ( code-address -- )
execute ( ... xt -- ... ) set-does> ( xt -- )
compile, ( xt -- ) set-optimizer ( xt -- )
name>interpret ( nt -- xt ) set->int ( xt -- )
name>compile ( nt -- xt1 xt2 ) set->comp ( xt -- )
(to) ( val xt -- ) set-to ( xt -- )
defer@ ( xt1 -- xt2 ) set-defer@ ( xt -- )
name>string ( nt -- c-addr u ) set->string ( xt -- )
name>link ( nt1 -- nt2|0 ) set->link ( xt -- )
In particular, you change the compilation semantics by specifying the NAME>COMPILE method implementation, which is somewhat different from providing xt-compsem (your xt-compile). Based on that, we also
provide the convenience words:
set-compsem ( xt -- )
compsem: ( -- ) \ dedicated to Stephen Pelc
intsem: ( -- ) \ special dedication to Stephen Pelc
interpret/compile: ( xt-int xt-comp "name" -- )
Usage examples, all equivalent:
:noname ." compiling" ;
: foo ." interpreting" ; set-compsem
: foo ." interpreting" ; compsem: ." compiling" ;
: foo ." compiling" ; intsem: ." interpreting" ;
: foo-int ." interpreting" ;
: foo-comp ." compiling" ;
' foo-int ' foo-comp interpret/compile: foo
Note that several people (including me) have recommended defining,
for every dual-semantics word like FOO, also FOO-INT and FOO-COMP.
Usage:
9 interpret/compile:
1 set-compsem (in the definition of COMPSEM:)
1 compsem:
0 intsem:
23 set-optimizer (and there are other, derived words)
Read all about it in:
http://www.euroforth.org/ef19/papers/paysan.pdf
Since then, we have moved nt and xt to point to the body.
You can see the header methods (except the code field) with
.hm ( nt -- )
E.g., for seeing the code generator of "!", you say
``! .hm
Let's compare the header methods of "!", ":", and "TO" and the
constant K-F1:
``! .hm ``: .hm ``to .hm ``k-f1 .hm
opt: peephole-compile, :, :, constant,
to: no-to no-to no-to no-to
extra: $0 $0 $0 $0
int: noop noop a>int noop
comp: default-name>comp default-name>comp i/c>comp default-name>comp
string: named>string named>string named>string named>string
link: named>link named>link named>link named>link
The "extra:" field is used for SET-DOES>.
Perhaps some pseudo-code
for INTERPRET will be helpful in understanding how compilation occurs in
Gforth.
For dictionary words, what happens is, in essence:
parse-name find-name dup 0= #-13 and throw ( nt )
state @ if
name>compile
else
name>interpret
then
execute
You won't find it in this form in the current text interpreter,
because the text interpreter is now written to cover all kinds of recognizers.
What may also be interesting to you is what happens then: For words
with default interpretation and compilation semantics (most words),
the NAME>COMPILE implementation is (simplified)
: default-name>comp ( nt -- xt1 xt2 )
name>interpret ['] compile, ;
For an immediate word, the NAME>COMPILE implementation is:
: imm>comp ( nt -- xt1 xt2 )
name>interpret ['] execute ;
Not just optimization, but all code generation happens in the
implementations of COMPILE,. E.g., ":,", "CONSTANT," and
"PEEPHOLE-COMPILE," are (simplified):
: :, ( xt -- ) >body ['] call peephole-compile, , ;
: constant, ( xt -- ) >body @ postpone literal ;
: peephole-compile, ( xt -- )
\ compile xt, appending its code to the current dynamic superinstruction
lits, here swap , compile-prim1 ;
LITS, ensures that any literals on the literal stack are compiled
before the primitive (this is part of the constant folding
implementation), and COMPILE-PRIM1 ( addr -- ) is in the C part of
Gforth; in gforth-fast, it performs stack caching, combines primitives
into superinstructions, and performs native-code generation if these optimizations are enabled (they are by default), but at the very
least, turns the code-field address (ITC) into a code address (DTC).
Note that in the gforth and gforth-fast engines, "," alone does not
work for compiling a word, because these engines use hybrid
direct/indirect threaded code, which requires primitive-centric code
for colon definitions: In a colon definition, every word is compiled
to a primitive, possibly followed by an immediate argument; e.g., a
colon definition is compiled to the primitive CALL followed by the
address of the called word. See:
@InProceedings{ertl02,
author = {M. Anton Ertl},
title = {Threaded Code Variations and Optimizations (Extended
Version)},
booktitle = {Forth-Tagung 2002},
year = {2002},
address = {Garmisch-Partenkirchen},
url = {http://www.complang.tuwien.ac.at/papers/ertl02.ps.gz},
abstract = {Forth has been traditionally implemented as indirect
threaded code, where the code for non-primitives is
the code-field address of the word. To get the
maximum benefit from combining sequences of
primitives into superinstructions, the code produced
for a non-primitive should be a primitive followed
by a parameter (e.g., \code{lit} \emph{addr} for
variables). This paper takes a look at the steps
from a traditional threaded-code implementation to
superinstructions, and at the size and speed effects
of the various steps.\comment{It also compares these
variants of Gforth to various other Forth
implementations on contemporary machines.} The use
of superinstructions gives speedups of up to a
factor of 2 on large benchmarks on processors with
branch target buffers, but requires more space for
the primitives and the optimization tables, and also
a little more space for the threaded code.}
}
- anton
Anton Ertl wrote:
Not just optimization, but all code generation happens in the
implementations of COMPILE,. E.g., ":,", "CONSTANT," and
"PEEPHOLE-COMPILE," are (simplified):
: :, ( xt -- ) >body ['] call peephole-compile, , ;
: constant, ( xt -- ) >body @ postpone literal ;
: peephole-compile, ( xt -- )
\ compile xt, appending its code to the current dynamic superinstruction
lits, here swap , compile-prim1 ;
LITS, ensures that any literals on the literal stack are compiled
before the primitive (this is part of the constant folding
implementation), and COMPILE-PRIM1 ( addr -- ) is in the C part of
Gforth; in gforth-fast, it performs stack caching, combines primitives
into superinstructions, and performs native-code generation if these
optimizations are enabled (they are by default), but at the very
least, turns the code-field address (ITC) into a code address (DTC).
For constant folding, superinstructions and peephole optimisation,
my compiler has a FIFO token queue for delayed compilation.
Optimisations are based on simple pattern recognition of the queue content.
When active, this is fully automatic and does not require complicated
contortions and a bag of special words.
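A minimal sketch of the idea (illustrative only, not minforth's actual implementation): queue xts instead of compiling them immediately, and let a pattern check rewrite the tail of the queue before it is flushed. Here the only pattern recognized is SWAP SWAP, which cancels out:

8 CONSTANT #Q
CREATE q  #Q CELLS ALLOT   VARIABLE q#  0 q# !
: q-flush ( -- )  q# @ 0 ?DO  q I CELLS + @ COMPILE,  LOOP  0 q# ! ;
: q-put ( xt -- )  q# @ #Q = IF q-flush THEN  q q# @ CELLS + !  1 q# +! ;
: q-peep ( -- )                               \ drop a trailing SWAP SWAP pair
    q# @ 2 < IF EXIT THEN
    q q# @ 2 - CELLS +  DUP @  SWAP CELL+ @   ( xt1 xt2 )
    ['] SWAP =  SWAP ['] SWAP =  AND IF  -2 q# +!  THEN ;
: qcompile, ( xt -- )  q-put q-peep ;         \ queued stand-in for COMPILE,

A real implementation would also flush the queue at control-flow boundaries and at the end of a definition; that is omitted here.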
On 4/01/2024 10:33 pm, Anton Ertl wrote:
:noname ( xt -- )
lits# 1 u>= if
lits> case
0 of postpone dup drop exit endof
1 of postpone over drop exit endof
2 of postpone third drop exit endof
3 of postpone fourth drop exit endof
dup >lits
endcase
then
peephole-compile, ;
optimizes pick
What this demonstrates is how badly ANS CASE needs optimization.
PICK not so much.
On 5/01/2024 10:27 pm, Anton Ertl wrote:
dxf <dxforth@gmail.com> writes:
On 4/01/2024 10:33 pm, Anton Ertl wrote:
:noname ( xt -- )
lits# 1 u>= if
lits> case
0 of postpone dup drop exit endof
1 of postpone over drop exit endof
2 of postpone third drop exit endof
3 of postpone fourth drop exit endof
dup >lits
endcase
then
peephole-compile, ;
optimizes pick
What this demonstrates is how badly ANS CASE needs optimization.
PICK not so much.
What makes you think so?
Is it not obvious to all that ENDOF jumps in the above are compiled but
never used? I'm curious. What is it about ANS-FORTH that users and
implementers alike have become so uncritical?
Sure, this could be rewritten as:
create npicks ' dup , ' over , ' third , ' fourth ,
:noname ( xt -- )
lits# 1 u>= if
lits> dup 4 u< if
cells npicks + @ compile, exit then
>lits then
peephole-compile, ;
optimizes pick
and the result would be slightly faster when it is used. But PICK is
compiled not that often, so why optimize it? Especially given that,
if the optimization was really useful, one could just write the latter
code.
Don't make a computer do what intelligence will do anyway.
Another question in this context is which version is easier to write
correctly and easier to read and understand.
When the cases are 0 1 2 3 etc execution tables are easy to write and
overall code is minimal. But such is not often the case.
The array and matrix definitions are given in the FSL utilities source,
fsl-util.x or fsl_util.x.
The arrays are quite nice for their flexibility in making arrays out of
any type of data, and are useful in many instances. However, the source definitions are slow on non-optimizing Forth systems.
I believe the design of the arrays and matrices traces back to Julian Noble's "Scientific Forth." A 1-D array is named with a trailing left
brace "{" while 2-D matrices have a trailing double left brace, "{{".
This allows a convenient notation:
a{ I } \ resolves to address of array element I
m{{ J I }} \ resolves to address of matrix element at row J, col I
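A short usage sketch of this notation, using FSL-style defining words as they appear elsewhere in this thread (exact spellings may vary between FSL versions):

10 FLOAT ARRAY a{          \ 1-D array of 10 fp numbers
3 3 FLOAT MATRIX m{{       \ 3x3 fp matrix
2.5e a{ 4 } F!             \ store 2.5 into element 4 of a{
a{ 4 } F@ F.               \ fetch it back and print it
1.0e m{{ 2 1 }} F!         \ store 1.0 at row 2, column 1 of m{{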