• VFX Forth does not optimise tail calls

    From =?UTF-8?Q?Micha=C5=82_Kasprzak?=@21:1/5 to All on Mon Jul 25 02:30:07 2022
    What are the reasons VFX Forth has given up on tail call optimization?

    The above question is addressed to Stephen Pelc from MPE as he designed the VFX code generator or to anyone who knows.

    Tail optimization turns a pair of call instruction followed by ret into single jmp instruction, which significantly speeds up the execution of the program and is a very simple optimization.
    VFX Forth uses very complicated techniques to speed up program execution and at the same time wastes the speed of execution so wastefully by giving up tail optimization.
    A natural question arises as to the reasons for doing so.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stephen Pelc@21:1/5 to All on Mon Jul 25 14:57:29 2022
    On 25 Jul 2022 at 11:30:07 CEST, "Michał Kasprzak" <siarczek83@gmail.com> wrote:

    What are the reasons VFX Forth has given up on tail call optimization?

    We have not given up on it, it's just that once you have a code generator of a certain
    quality, tail call elimination does not buy you much. Many people wo want to write a
    code generator initially get excited about binary inlining and tail call elimination. Later
    they realise that there's little overall gain.

    Even later, one has to make decisions about support for common Forth practice which is forbidden by the standard. A particular example is using
    R> DROP
    to exit from an outer word. It's a design choice for the Forth system. VFX is particularly
    conservative about return stack handling; at the time of its design, return stack
    manipulation to produce branches was quite common. It is/was also widely used for inline literals and strings.

    It is probably true that return stack action can be be fully and correctly detected, but
    I have not yet persuaded myself that it can be done 100%, and do I want to introduce
    a significant block of code that produces little benefit?

    A large chunk of our business is in embedded systems. In plenty of these CPUs, e.g. Cortex-M, the item on the call stack is not a pure return address. In
    some devices
    the call return item is more than one cell on the stack.

    Stephen

    --
    Stephen Pelc, stephen@vfxforth.com
    MicroProcessor Engineering, Ltd. - More Real, Less Time
    133 Hill Lane, Southampton SO15 5AF, England
    tel: +44 (0)23 8063 1441, +44 (0)78 0390 3612, +34 649 662 974 http://www.mpeforth.com - free VFX Forth downloads

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Brian Fox@21:1/5 to Stephen Pelc on Mon Jul 25 17:38:31 2022
    On Monday, July 25, 2022 at 10:57:31 AM UTC-4, Stephen Pelc wrote:
    On 25 Jul 2022 at 11:30:07 CEST, "Michał Kasprzak" <siarc...@gmail.com> wrote:
    What are the reasons VFX Forth has given up on tail call optimization?
    We have not given up on it, it's just that once you have a code generator of a
    certain
    quality, tail call elimination does not buy you much. Many people wo want to write a
    code generator initially get excited about binary inlining and tail call elimination. Later
    they realise that there's little overall gain.

    Even later, one has to make decisions about support for common Forth practice
    which is forbidden by the standard. A particular example is using
    DROP
    to exit from an outer word. It's a design choice for the Forth system. VFX is
    particularly
    conservative about return stack handling; at the time of its design, return stack
    manipulation to produce branches was quite common. It is/was also widely used
    for inline literals and strings.

    It is probably true that return stack action can be be fully and correctly detected, but
    I have not yet persuaded myself that it can be done 100%, and do I want to introduce
    a significant block of code that produces little benefit?

    A large chunk of our business is in embedded systems. In plenty of these CPUs,
    e.g. Cortex-M, the item on the call stack is not a pure return address. In some devices
    the call return item is more than one cell on the stack.

    Stephen

    --
    Stephen Pelc, ste...@vfxforth.com
    MicroProcessor Engineering, Ltd. - More Real, Less Time
    133 Hill Lane, Southampton SO15 5AF, England
    tel: +44 (0)23 8063 1441, +44 (0)78 0390 3612, +34 649 662 974 http://www.mpeforth.com - free VFX Forth downloads

    What if you did it in Forth style and made a word for it and let the programmer decide?
    Chuck had -; (minus-semi-colon) for tail-call optimization.
    In my hobby native code generator it makes a nice difference in size and execution speed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Micha=C5=82_Kasprzak?=@21:1/5 to All on Tue Jul 26 23:25:47 2022
    Thanks, Stephen, for the open explanations. I appreciate them and you for the work you have given to the community and for your carefulness and conservatism when working with delicate words. I smiled at the thought that beginners always start with
    inline and tail optimizations ;) because of course you are right.

    Hi Brian, are you the famous Brian Fox who made the GNU Bash shell?
    Your word -; might look like this considering that 64-bit VFX Forth uses the codes $C3 for ret, $E8 for call, and $E9 for jmp and that the latter two instructions have a 32-bit signed displacement to add to the address of the next instruction to get the
    target address.

    : -; [compile] ; here 1- c@ $C3 = here 6 - c@ $E8 = and if $E9 here 6 - c! else ." Can't -;" then ; immediate

    You use -; replacing ; wherever you try to force tail optimization at your own request.
    -; will complete the definition exactly as ; does and it will check if it can optimize tail call which might not be possible for example when using locals (even then, despite the warning message, the word is correctly defined).
    Alas, our tests in the content of -; can be successful in undesirable cases. For example there might be THEN before ; as in the definition of our -; word. Luckily with this side effect everything works fine in the end, but assume that we are using the
    word -; at our own request and that we know what we are doing.
    Disassembling the last example also suggests that trying to save one ret byte in the dictionary by using -1 ALLOT is not a good idea.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From none) (albert@21:1/5 to siarczek83@gmail.com on Wed Jul 27 12:50:16 2022
    In article <a34ecf8f-32a4-4bbc-b968-957b56d0728bn@googlegroups.com>,
    MichaÅ Kasprzak <siarczek83@gmail.com> wrote:
    Thanks, Stephen, for the open explanations. I appreciate them and you for the work you have given to the community and for your carefulness and conservatism
    when working with delicate words. I smiled at the thought that beginners always start with inline and tail optimizations ;) because of course you are right.

    Hi Brian, are you the famous Brian Fox who made the GNU Bash shell?
    Your word -; might look like this considering that 64-bit VFX Forth uses the codes $C3 for ret, $E8 for call, and $E9 for jmp and that the latter two
    instructions have a 32-bit signed displacement to add to the address of the next instruction to get the target address.

    : -; [compile] ; here 1- c@ $C3 = here 6 - c@ $E8 = and if $E9 here 6 - c! else ." Can't -;" then ; immediate

    You use -; replacing ; wherever you try to force tail optimization at your own request.
    -; will complete the definition exactly as ; does and it will check if it can optimize tail call which might not be possible for example when using locals
    (even then, despite the warning message, the word is correctly defined). >Alas, our tests in the content of -; can be successful in undesirable cases. For example there might be THEN before ; as in the definition of our -; word.
    Luckily with this side effect everything works fine in the end, but assume that we are using the word -; at our own request and that we know what we are
    doing.
    Disassembling the last example also suggests that trying to save one ret byte in the dictionary by using -1 ALLOT is not a good idea.

    I'm working on an optimizer, and tail call optimization doesn't even need
    to be considered. The tail call is inlined, and their instructions are obliterated, like in non-tail calls.

    All the talk about tail call optimisation comes from lisp people, that
    want cutesy recursive call's where it is not necessary.
    Then they are burnt without tco.

    Groetjes Albert
    --
    "in our communism country Viet Nam, people are forced to be
    alive and in the western country like US, people are free to
    die from Covid 19 lol" duc ha
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Brian Fox@21:1/5 to siarc...@gmail.com on Wed Jul 27 05:36:49 2022
    On Wednesday, July 27, 2022 at 2:25:49 AM UTC-4, siarc...@gmail.com wrote:
    Thanks, Stephen, for the open explanations. I appreciate them and you for the work you have given to the community and for your carefulness and conservatism when working with delicate words. I smiled at the thought that beginners always start with
    inline and tail optimizations ;) because of course you are right.

    Hi Brian, are you the famous Brian Fox who made the GNU Bash shell?
    Your word -; might look like this considering that 64-bit VFX Forth uses the codes $C3 for ret, $E8 for call, and $E9 for jmp and that the latter two instructions have a 32-bit signed displacement to add to the address of the next instruction to get
    the target address.

    : -; [compile] ; here 1- c@ $C3 = here 6 - c@ $E8 = and if $E9 here 6 - c! else ." Can't -;" then ; immediate

    You use -; replacing ; wherever you try to force tail optimization at your own request.
    -; will complete the definition exactly as ; does and it will check if it can optimize tail call which might not be possible for example when using locals (even then, despite the warning message, the word is correctly defined).
    Alas, our tests in the content of -; can be successful in undesirable cases. For example there might be THEN before ; as in the definition of our -; word. Luckily with this side effect everything works fine in the end, but assume that we are using the
    word -; at our own request and that we know what we are doing.
    Disassembling the last example also suggests that trying to save one ret byte in the dictionary by using -1 ALLOT is not a good idea.

    LOL. No I am the Brian Fox who is not famous and likes it that way. :-)

    Definition is in a hobby system for a retro machine that has fixed instruction sizes so I would
    never need to -1 ALLOT.

    I have not yet included protection for improper usage at this stage.
    Use it wrong, the program crashes. I will fix that soon.

    ( Excuse the use of 2- instead of CELL-. I have 24K of contiguous
    memory so word names are a luxury) :-))

    : LOOKBACK ( -- u) THERE 2- @ ; \ common factor for other words

    : -; ( -- )
    LOOKBACK ( addr ) \ get entry address of sub-routine
    2 CELLS + ( addr' ) \ move past the sub-routine ENTRY instructions
    -04 TALLOT \ erase BL @addr
    ( addr') @@ B, \ compile a branch to the NEW sub-routine
    ;

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)