I'm trying to understand the reasoning behind some of the calling
conventions used with 32-bit ARM. I work primarily with small embedded
systems, so the efficiency of code on 32-bit Cortex-M devices is very
important to me - good calling conventions make a big difference.
No doubt most people here know this already, but in summary these
devices are a 32-bit load/store RISC architecture with 16 registers.
R0-R3 and R12 are scratch/volatile registers, R4-R11 are preserved
registers, R13 is the stack pointer, R14 is the link register and R15 is
the program counter. For most Cortex-M cores, there is no superscalar
execution, out-of-order execution, speculative execution, etc., but
instructions are pipelined.
The big problem I see is the registers used for returning values from functions. R0-R3 can all be used for passing arguments to functions, as 32-bit (or smaller) values, pointers, in pairs as 64-bit values, and as
parts of structs.
But the ABI only allows returning a single 32-bit value in R0, or a
scalar 64-bit value in R0:R1. If a function returns a non-scalar that
is larger than 32-bit, the caller has to allocate space on the stack for
the return type and pass a pointer to that space in R0.
To my mind, this is massively inefficient, especially when using structs
that are made up of two 32-bit parts.
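As a concrete sketch (the type and function names here are invented for
illustration), this is the kind of two-part struct I mean:

#include <stdint.h>

typedef struct {
    int32_t  status;   /* e.g. 0 for success, negative error code */
    uint32_t value;    /* e.g. a sensor reading                   */
} result_t;

result_t read_channel(int channel)
{
    result_t r = { -1, 0 };
    if (channel >= 0) {
        r.status = 0;
        r.value  = (uint32_t)channel;   /* stand-in for real work */
    }
    return r;
}

Under the AAPCS the caller of read_channel() has to reserve eight bytes
of stack and pass their address in R0, and the callee stores the two
words through that pointer - even though they would fit perfectly well
in R0 and R1.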
Is there any good reason why the ABI is designed with such limited
register usage for returns?
Newer ABIs like RISC-V 32-bit and x86_64
can at least use two registers for return values.
Modern compilers are
quite happy breaking structs into parts in individual registers - it's a /long/ time since they insisted that structs occupied a contiguous block
of memory. Can anyone give me an explanation why return types can't
simply use all the same registers that are available for argument
passing?
I also think code would be a bit more efficient if there were more
registers available for parameter passing and as scratch registers - perhaps 6
would make more sense.
In more modern C++ programming, it's very practical to use types like std::optional<>, std::variant<>, std::expected<> and std::tuple<> as a
way of dealing safely with status and multiple return values rather than using C-style error codes or passing manual pointers to return value
slots. But the limited set of return registers adds significant overhead to
small functions.
Are there good technical reasons for the conventions on 32-bit ARM? Or
is this all just historical from the days when everything was an "int"
and that's all anyone ever returned from functions?
Thanks for any pointers or explanations here.
David Brown <david.brown@hesbynett.no> writes:
But the ABI only allows returning a single 32-bit value in R0, or a
scalar 64-bit value in R0:R1. If a function returns a non-scalar that
is larger than 32-bit, the caller has to allocate space on the stack for
the return type and pass a pointer to that space in R0.
To my mind, this is massively inefficient, especially when using structs
that are made up of two 32-bit parts.
Is there any good reason why the ABI is designed with such limited
register usage for returns?
Most calling conventions on RISCs are oriented towards C (if you want
calling conventions that try to be more cross-language (and slower),
look at VAX) and its properties and limitations at the time when the
calling convention was designed, in particular, the PCC
implementation, which was the de-facto standard Unix C compiler at the
time. C compilers in the 1980s did not allocate structs to registers,
so passing structs in registers was foreign to them, so the solution
is that the caller passes the target struct as an additional
parameter.
And passing the return value in registers might not have saved
anything on a compiler that does not deal with structs in registers.
E.g., if you have
mystruct = myfunc(arg1, arg2);
you would see stores to mystruct behind the call. With the PCC
calling convention, the same stores would happen in the caller
(possibly resulting in smaller code if there are several calls to
myfunc()).
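In C terms the convention in effect lowers the call above to something
like the following (sketched by hand with invented names; the real
rewrite is internal to the compiler):

#include <stdio.h>

struct pair { int x, y; };

/* What the source says: */
struct pair make_pair(int a, int b)
{
    struct pair p = { a, b };
    return p;
}

/* Roughly what a PCC-style struct return turns it into: the caller
   allocates the result object and passes a hidden pointer to it. */
static void make_pair_lowered(struct pair *ret, int a, int b)
{
    ret->x = a;
    ret->y = b;
}

int main(void)
{
    struct pair p = make_pair(1, 2);      /* as written              */

    struct pair q;                        /* caller-allocated slot   */
    make_pair_lowered(&q, 1, 2);          /* hidden pointer in front */

    printf("%d %d / %d %d\n", p.x, p.y, q.x, q.y);
    return 0;
}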
I wonder, though, how things look for
mystruct = foo(&mystruct);
Does PCC perform the return stores to mystruct only after performing
all other memory accesses in foo? Probably yes, anything else would complicate the compiler. In that case the caller could pass &mystruct
for the return value (a slight complication). But is that restriction reflected in the calling convention?
Struct returns were (and AFAIK still are, many decades after
they were added to C) a relatively rarely used feature, so Johnson
(PCC's author) probably did not want to waste a lot of effort on
making it more efficient.
I also think code would be a bit more efficient if there were more
registers available for parameter passing and as scratch registers - perhaps 6
would make more sense.
There is a tendency towards passing more parameters in registers in
more recent calling conventions. IA-32 (and IIRC VAX) passes none,
MIPS uses 4 integer registers (for either integer or FP parameters),
Alpha uses 6 integer and 6 FP registers, AMD64's System V ABI 6
integer and 8 FP registers, ARM A64 has 8 integer and 8 FP registers,
RISC-V has 8 integer and 8 FP registers. Not sure why they were so
reluctant to use more registers earlier.
- anton
I also think code would be a bit more efficient if there were more registers
available for parameter passing and as scratch registers - perhaps 6
would make more sense.
Basically, here, there is competing pressure between the compiler
needing a handful of preserved registers, and the compiler being
more efficient if there were more argument/result passing registers.
My 66000 ABI has 8 argument registers, 7 temporary registers, 14
preserved registers, a FP, and a SP. IP is not part of the register
file. My ABI has a note indicating that the aggregations can be
altered, just that I need a good reason to change.
I looked high and low for codes using more than 8 arguments and
returning aggregates larger than 8 double words, and about the
only things I found were a handful of []print[]() calls.
David Brown <david.brown@hesbynett.no> writes:
But the ABI only allows returning a single 32-bit value in R0, or a
scalar 64-bit value in R0:R1. If a function returns a non-scalar that
is larger than 32-bit, the caller has to allocate space on the stack for
the return type and pass a pointer to that space in R0.
To my mind, this is massively inefficient, especially when using structs
that are made up of two 32-bit parts.
Is there any good reason why the ABI is designed with such limited
register usage for returns?
Most calling conventions on RISCs are oriented towards C (if you want
calling conventions that try to be more cross-language (and slower),
look at VAX) and its properties and limitations at the time when the
calling convention was designed, in particular, the PCC
implementation, which was the de-facto standard Unix C compiler at the
time. C compilers in the 1980s did not allocate structs to registers,
so passing structs in registers was foreign to them, so the solution
is that the caller passes the target struct as an additional
parameter.
And passing the return value in registers might not have saved
anything on a compiler that does not deal with structs in registers.
Struct returns were (and AFAIK still are, many decades after
they were added to C) a relatively rarely used feature, so Johnson
(PCC's author) probably did not want to waste a lot of effort on
making it more efficient.
gcc has an option -freg-struct-return, which does what you want. Of
course, if you use this option on ARM A32/T32, you are not following
the calling convention, so you should only use it when all sides of a
struct return are compiled with that option.
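For example (file and struct names invented here; whether and for which
struct sizes the option actually takes effect depends on the target):

struct pair { int a, b; };

struct pair sum_diff(int x, int y)
{
    struct pair p = { x + y, x - y };
    return p;
}

/* Build every translation unit that defines or calls such functions
   the same way, e.g.
       arm-none-eabi-gcc -O2 -freg-struct-return -c pair.c
   Objects built with and without the option disagree about whether the
   result comes back in registers or through the hidden pointer, so
   mixing them breaks at the call boundary. */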
Newer ABIs like RISC-V 32-bit and x86_64
can at least use two registers for return values. Modern compilers are
quite happy breaking structs into parts in individual registers - it's a
/long/ time since they insisted that structs occupied a contiguous block
of memory.
ARM A32 is from 1985, and its calling convention is probably not much younger.
I also think code would be a bit more efficient if there were more registers
available for parameter passing and as scratch registers - perhaps 6
would make more sense.
There is a tendency towards passing more parameters in registers in
more recent calling conventions. IA-32 (and IIRC VAX) passes none,
MIPS uses 4 integer registers (for either integer or FP parameters),
Alpha uses 6 integer and 6 FP registers, AMD64's System V ABI 6
integer and 8 FP registers, ARM A64 has 8 integer and 8 FP registers,
RISC-V has 8 integer and 8 FP registers. Not sure why they were so
reluctant to use more registers earlier.
In more modern C++ programming, it's very practical to use types like
std::optional<>, std::variant<>, std::expected<> and std::tuple<> as a
way of dealing safely with status and multiple return values rather than
using C-style error codes or passing manual pointers to return value
slots.
The ARM calling convention is certainly much older than "modern C++ programming".
But the limited return registers adds significant overhead to
small functions.
C++ programmers think they know what C programming is about (and unfortunately they dominate not just C++ compiler writers, but they
also damage C compilers while they are at it), so my sympathy for your problem is very limited.
David Brown <david.brown@hesbynett.no> wrote:
The big problem I see is the registers used for returning values from
functions. R0-R3 can all be used for passing arguments to functions, as
32-bit (or smaller) values, pointers, in pairs as 64-bit values, and as
parts of structs.
But the ABI only allows returning a single 32-bit value in R0, or a
scalar 64-bit value in R0:R1. If a function returns a non-scalar that
is larger than 32-bit, the caller has to allocate space on the stack for
the return type and pass a pointer to that space in R0.
According to EABI, it's also possible to return a 128-bit vector in R0-R3:
https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#result-return
To my mind, this is massively inefficient, especially when using structs
that are made up of two 32-bit parts.
Is there any good reason why the ABI is designed with such limited
register usage for returns? Newer ABIs like RISC-V 32-bit and x86_64
can at least use two registers for return values. Modern compilers are
quite happy breaking structs into parts in individual registers - it's a
/long/ time since they insisted that structs occupied a contiguous block
of memory. Can anyone give me an explanation why return types can't
simply use all the same registers that are available for argument passing?
The 'composite type' return value, where a pointer is passed in as the first argument to the function and a struct at that pointer is filled in with the return values, has existed since the first ARM ABI - APCS-R: http://www.riscos.com/support/developers/dde/appf.html
That dates from the mid 1980s before 'modern compilers', and I'm guessing that has stuck around. A lot of early ARM code was in assembler. The original ARMCC was good but fairly basic - GCC didn't support ARM until
about 1993.
[*] technically APCS-R was the second ARM ABI, APCS-A was the first: https://heyrick.eu/assembler/apcsintro.html
but I don't think return value handling was any different.
Are there good technical reasons for the conventions on 32-bit ARM? Or
is this all just historical from the days when everything was an "int"
and that's all anyone ever returned from functions?
Probably the latter.
Also that AArch64 was an opportunity to throw all this
stuff away and start again, with a much richer calling convention: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#result-return
but obviously that's no help to the microcontroller folks. At this stage, a change of calling convention might be a fairly big ask.
On Mon, 6 Jan 2025 15:32:04 +0000, Anton Ertl wrote:
David Brown <david.brown@hesbynett.no> writes:
But the ABI only allows returning a single 32-bit value in R0, or a
scalar 64-bit value in R0:R1. If a function returns a non-scalar that
is larger than 32-bit, the caller has to allocate space on the stack for
the return type and pass a pointer to that space in R0.
To my mind, this is massively inefficient, especially when using structs
that are made up of two 32-bit parts.
I wonder, though, how things look for
mystruct = foo(&mystruct);
Does PCC perform the return stores to mystruct only after performing
all other memory accesses in foo? Probably yes, anything else would
complicate the compiler. In that case the caller could pass &mystruct
for the return value (a slight complication). But is that restriction
reflected in the calling convention?
For VERY MANY circumstances passing a struct by address is more
efficient than passing it by value, AND especially when the
compiler does not optimize heavily.
Struct returns were (and AFAIK still are, many decades after
they were added to C) a relatively rarely used feature, so Johnson
(PCC's author) probably did not want to waste a lot of effort on
making it more efficient.
In addition, the programmer has the choice of changing into pointer
form (&struct) from value form (struct) which is what we learned
was better style way back then.
--------------------------
I also think code would be a bit more efficient if there were more registers
available for parameter passing and as scratch registers - perhaps 6
would make more sense.
There is a tendency towards passing more parameters in registers in
more recent calling conventions. IA-32 (and IIRC VAX) passes none,
MIPS uses 4 integer registers (for either integer or FP parameters),
Alpha uses 6 integer and 6 FP registers, AMD64's System V ABI 6
integer and 8 FP registers, ARM A64 has 8 integer and 8 FP registers,
RISC-V has 8 integer and 8 FP registers. Not sure why they were so
reluctant to use more registers earlier.
Compiler people were telling us that more callee saved registers would
be higher performing than more argument registers. It did not turn out
to be that way.
Oh and BTW, lack of argument registers leads to an increased
desire for the linker to perform inline folding. ...
I looked high and low for codes using more than 8 arguments and
returning aggregates larger than 8 double words, and about the
only things I found were a handful of []print[]() calls.
On 06/01/2025 21:19, MitchAlsup1 wrote:
Both C and C++ provide perfectly good ways to pass data around by
address when that's what you want to do. My problem is that the calling convention won't let me pass around data in registers when I want to do
that.
I don't care what the compiler does when not optimising heavily - or for compilers that can't optimise heavily. When I am looking for efficient
code, I use optimisation - caring about inefficiencies in the calling convention without heavy optimisation is like caring about how fast your
car goes when you keep it in first gear.
Struct returns were (and AFAIK still are, many decades after
they were added to C) a relatively rarely used feature, so Johnson
(PCC's author) probably did not want to waste a lot of effort on
making it more efficient.
In addition, the programmer has the choice of changing into pointer
form (&struct) from value form (struct) which is what we learned
was better style way back then.
I already know when it is best to pass a struct via a pointer, and when
it is best to pass it as a struct value. (The 32-bit ARM calling
convention happily uses registers to pass structs by value, using up to
4 registers. It's the return via registers that is missing.) I also
know when it is best for a struct return to be via an address or in
registers - but C has no way to let me choose that.
On Tue, 7 Jan 2025 9:09:20 +0000, David Brown wrote:
On 06/01/2025 21:19, MitchAlsup1 wrote:
Both C and C++ provide perfectly good ways to pass data around by
address when that's what you want to do. My problem is that the calling
convention won't let me pass around data in registers when I want to do
that.
I don't care what the compiler does when not optimising heavily - or for
compilers that can't optimise heavily. When I am looking for efficient
code, I use optimisation - caring about inefficiencies in the calling
convention without heavy optimisation is like caring about how fast your
car goes when you keep it in first gear.
Struct returns were (and AFAIK still are, many decades after
they were added to C) a relatively rarely used feature, so Johnson
(PCC's author) probably did not want to waste a lot of effort on
making it more efficient.
In addition, the programmer has the choice of changing into pointer
form (&struct) from value form (struct) which is what we learned
was better style way back then.
I already know when it is best to pass a struct via a pointer, and when
it is best to pass it as a struct value. (The 32-bit ARM calling
convention happily uses registers to pass structs by value, using up to
4 registers. It's the return via registers that is missing.) I also
know when it is best for a struct return to be via an address or in
registers - but C has no way to let me choose that.
My 66000 ABI passes structs up to 8 doublewords in size as
arguments and as results.
mitchalsup@aol.com (MitchAlsup1) writes:
My 66000 ABI passes structs up to 8 doublewords in size as
arguments and as results.
What is a doubleword in your architecture? In intel vernacular
it's 32-bits, but that's not universal.
Both x86_64 and ARM64 support passing eight 64-bit quantities
as arguments and as results architecturally without using
the SIMD registers.
Now, ABI conventions may be otherwise, but they're important
for interoperability, not basic functionality.
I looked high and low for codes using more than 8 arguments and
returning aggregates larger than 8 double words, and about the
only things I found were a handful of []print[]() calls.
Large numbers of parameters may be generated either by closure
conversion or by lambda lifting.
I looked high and low for codes using more than 8 arguments and
returning aggregates larger than 8 double words, and about the
only things I found were a handful of []print[]() calls.
For languages where the type system ensures that the max number of
arguments is known (and the same) when compiling the function and when
compiling the calls to it, you could adjust the number of caller-saved
argument registers according to the actual number of arguments of the
function, thus making it "cheap" to allow, say, 13 argument registers
for those functions that take 13 arguments, since it doesn't impact the
other functions.
But in any case, I suspect there are also diminishing returns at some
point: how much faster is it in practice to pass/return 13 values in registers instead of 8 of them in registers and the remaining 5 on
the stack? I expect a 13-arg function to perform an amount
of work that will dwarf the extra work of going through the stack.
Stefan
For languages where the type system ensures that the max number of
arguments is known (and the same) when compiling the function and when
compiling the calls to it, you could adjust the number of caller-saved
argument registers according to the actual number of arguments of the
function, thus making it "cheap" to allow, say, 13 argument registers
for those functions that take 13 arguments, since it doesn't impact the
other functions.
But in any case, I suspect there are also diminishing returns at some
point: how much faster is it in practice to pass/return 13 values in
registers instead of 8 of them in registers and the remaining 5 on
the stack? I expect a 13-arg function to perform an amount
of work that will dwarf the extra work of going through the stack.
ABI calling conventions tend to be designed to support at least C,
including varargs and often also tolerant of differences between the
number of arguments in the caller and callee.
But in any case, I suspect there are also diminishing returns at some
point: how much faster is it in practice to pass/return 13 values in
registers instead of 8 of them in registers and the remaining 5 on
the stack? I expect a 13-arg function to perform an amount
of work that will dwarf the extra work of going through the stack.
ABI calling conventions tend to be designed to support at least C,
including varargs and often also tolerant of differences between the
number of arguments in the caller and callee.
I can agree that it's important to support those use-cases (varargs obviously, mismatched arg numbers less so), but I think the focus of optimization of the ABI should be calls to functions known to take the
exact same number of arguments (after all, even in C we normally know
the prototype of the called function; only sloppy ancient C calls
functions without proper declarations), even if it comes at the cost of
using different calling conventions for the two cases.
But in any case, I suspect there are also diminishing returns at some
point: how much faster is it in practice to pass/return 13 values in
registers instead of 8 of them in registers and the remaining 5 on
the stack?
AFAIK in these cases the same compiler generates the code for the
function and for the calls, so it should be pretty much free to use any
calling convention it likes.
ABI calling conventions tend to be designed to support at least C,
including varargs and often also tolerant of differences between the
number of arguments in the caller and callee.
I can agree that it's important to support those use-cases (varargs
obviously, mismatched arg numbers less so),
only sloppy ancient C calls
functions without proper declarations)
even if it comes at the cost of
using different calling conventions for the two cases.
I certainly have a use for as many arguments as the ABI provides,
Ah, yes, machine-generated code can always defy intuitions about what
is "typical".
That would mean that you find it ok that existing programs that use
vararg functions like printf but do not declare them before use don't
work on your newfangled architecture.
Stefan Monnier <monnier@iro.umontreal.ca> writes:
[Someone wrote:]
ABI calling conventions tend to be designed to support at least C,
including varargs and often also tolerant of differences between the
number of arguments in the caller and callee.
I can agree that it's important to support those use-cases (varargs
obviously, mismatched arg numbers less so),
You are head of a group of people who design a new architecture (say,
it's 2010 and you design ARM A64, or it's 2014 and you design RISC-V).
Your ABI designer comes to you and tells you that his life would be
easier if it was ok that programs with mismatched arguments don't need
to work. Would you tell him that they don't need to work?
If yes, a few years down the road your prospective customers have to
decide whether to go for your newfangled architecture or one of the established ones. They learn that a number of programs work
everywhere else, but not on your architecture. How many of them will
be placated by your reasoning that these programs are not strictly
conforming standard programs? How many will be alarmed by your
admission that you find it ok that such programs
don't work on your architecture? After all, hardly any program is a
strictly conforming standard program.
only sloppy ancient C calls
functions without proper declarations)
You find it ok to design a calling convention such that ancient C
programs do not work?
What benefit do you expect from such a calling convention? To allow
the use of registers as arguments (and not callee-saved) that would
otherwise preferably be used as callee-saved registers?
However, I wonder why, e.g., RISC-V does not allow the use of all caller-saved registers as arguments.
In addition to the 8 argument
registers (a0-a7=x10-x17), RISC-V has 7 additional caller-saved
registers: t0-t6(=x5-x7,x28-x31); for FP register's it's even more
extreme: 8 argument registers fa0-fa7=f10-f17, and 12 additional
caller-saved registers ft0-ft11=f0-f7,f28-f31.
even if it comes at the cost of
using different calling conventions for the two cases.
That would mean that you find it ok that existing programs that use
vararg functions like printf but do not declare them before use don't
work on your newfangled architecture. Looking at <https://pdos.csail.mit.edu/6.828/2023/readings/riscv-calling.pdf>,
the RISC-V people find that acceptable:
|If argument i < 8 is a floating-point type, it is passed in
|floating-point register fai; [...] Additionally, floating-point
|arguments to variadic functions (except those that are explicitly
|named in the parameter list) are passed in integer registers.
So if I 'printf("%f",1.0)' without first declaring printf, the program
won't work. I just tried out compiling the following program on
RISC-V with gcc 10.3.1:
int main()
{
printf("%f\n",1.0);
}
int xxx()
{
yyy("%f\n",1.0,2);
}
Note that there is no "#include <stdio.h>" or any declaration of
printf() or yyy(). Yet 1.0 is passed to printf() in a1, while it is
passed to yyy() in fa0, and 2 is passed to yyy() in a1.
And gcc works around the varargs decision by using the varargs calling convention for some well-known vararg functions like printf, while
other undeclared functions use the non-varargs calling convention.
Apparently the fallout of that decision by the RISC-V people hit a
"relevant" program.
[1] Apparently they stuck with the decision to deal differently with
varargs, and then decided to change the rest of the calling convention
to benefit from that decision by not leaving holes in the FP argument registers for integers and vice versa. I don't find this clearly
expressed in <https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc>.
The only thing that points in that direction is:
|Values are passed in floating-point registers whenever possible,
|whether or not the integer registers have been exhausted.
But this does not talk about how the integer argument register
numbering is changed by the "Hardware Floating-point Calling
Convention".
I certainly have a use for as many arguments as the ABI provides,
Ah, yes, machine-generated code can always defy intuitions about what
is "typical".
While I use a generator for my interpreter engines, many other people hand-code them. They would probably use macros for the function
declaration and the tail-call, though. Or maybe a macro that wraps
the whole payload so that one can easily switch between this technique
and one of the others.
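Roughly the shape of the hand-coded version (a made-up three-instruction
toy VM; real engines keep considerably more state - stack pointer, frame
pointer, cached top-of-stack, and so on - live in argument registers, and
rely on the compiler actually turning the dispatch into a tail call):

#include <stdint.h>
#include <stdio.h>

/* The live VM state travels in argument registers from handler to
   handler; each handler ends by dispatching to the next one. */
#define VM_ARGS  const uint8_t *ip, intptr_t acc, intptr_t *sp
#define NEXT(ip, acc, sp)  return dispatch[*(ip)]((ip) + 1, (acc), (sp))

typedef intptr_t (*handler)(VM_ARGS);

static intptr_t op_inc (VM_ARGS);
static intptr_t op_push(VM_ARGS);
static intptr_t op_add (VM_ARGS);
static intptr_t op_halt(VM_ARGS);

static const handler dispatch[] = { op_inc, op_push, op_add, op_halt };

static intptr_t op_inc (VM_ARGS) { NEXT(ip, acc + 1, sp); }
static intptr_t op_push(VM_ARGS) { *sp++ = acc; NEXT(ip, acc, sp); }
static intptr_t op_add (VM_ARGS) { --sp; NEXT(ip, acc + *sp, sp); }
static intptr_t op_halt(VM_ARGS) { (void)ip; (void)sp; return acc; }

int main(void)
{
    intptr_t stack[16];
    const uint8_t prog[] = { 0, 1, 0, 2, 3 };  /* inc; push; inc; add; halt */
    printf("%ld\n", (long)dispatch[prog[0]](prog + 1, 0, stack));
    return 0;
}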
- anton
On Thu, 9 Jan 2025 7:23:57 +0000, Anton Ertl wrote:
Stefan Monnier <monnier@iro.umontreal.ca> writes:
[Someone wrote:]
ABI calling conventions tend to be designed to support at least C,
including varargs and often also tolerant of differences between the
number of arguments in the caller and callee.
I can agree that it's important to support those use-cases (varargs
obviously, mismatched arg numbers less so),
You are head of a group of people who design a new architecture (say,
it's 2010 and you design ARM A64, or it's 2014 and you design RISC-V).
Your ABI designer comes to you and tells you that his life would be
easier if it was ok that programs with mismatched arguments don't need
to work. Would you tell him that they don't need to work?
No, I would stand my ground and mandate that they do work.
MitchAlsup1 <mitchalsup@aol.com> schrieb:
On Thu, 9 Jan 2025 7:23:57 +0000, Anton Ertl wrote:
Stefan Monnier <monnier@iro.umontreal.ca> writes:
[Someone wrote:]
ABI calling conventions tend to be designed to support at least C,
including varargs and often also tolerant of differences between the
number of arguments in the caller and callee.
I can agree that it's important to support those use-cases (varargs
obviously, mismatched arg numbers less so),
You are head of a group of people who design a new architecture (say,
it's 2010 and you design ARM A64, or it's 2014 and you design RISC-V).
Your ABI designer comes to you and tells you that his life would be
easier if it was ok that programs with mismatched arguments don't need
to work. Would you tell him that they don't need to work?
No, I would stand my ground and mandate that they do work.
That can be tricky. You can read
https://blog.r-project.org/2019/05/15/gfortran-issues-with-lapack/index.html
and its sequel
https://blog.r-project.org/2019/09/25/gfortran-issues-with-lapack-ii/
as a cautionary tale.
To cut this a little shorter: Assume eight arguments are passed in registers, like for My 66000.
Caller calls
foo (a1, a2, a3, a4, a5, a6, a7, a8);
Callee side:
foo (a1, a2, a3, a4, a5, a6, a7, a8, a9)
Foo ends with
bar (b1, b2, b3, b4, b5, b6, b7, b8, b9);
and wants to save stack space, so it stores the value of b9 into
the space where it was supposed to be, and then branches to bar.
Result: Stack corruption.
What would you tell your ABI designer in that case? Don't do tail
calls, it is better to use more stack space, with all effect on
stack sizes and locality that would have?
Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
That would mean that you find it ok that existing programs that use
vararg functions like printf but do not declare them before use don't
work on your newfangled architecture.
Interestingly, tail call optimization (which I believe you like)
can cause bugs with mismatched arguments when different functions
disagree about the stack size.
So, if you want to allow mismatched declarations, better
disable tail calls, to be on the safe side.
On Thu, 9 Jan 2025 21:23:30 +0000, Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
On Thu, 9 Jan 2025 7:23:57 +0000, Anton Ertl wrote:
Stefan Monnier <monnier@iro.umontreal.ca> writes:
[Someone wrote:]
ABI calling conventions tend to be designed to support at least C,
including varargs and often also tolerant of differences between the
number of arguments in the caller and callee.
I can agree that it's important to support those use-cases (varargs
obviously, mismatched arg numbers less so),
You are head of a group of people who design a new architecture (say,
it's 2010 and you design ARM A64, or it's 2014 and you design RISC-V).
Your ABI designer comes to you and tells you that his life would be
easier if it was ok that programs with mismatched arguments don't need
to work. Would you tell him that they don't need to work?
No, I would stand my ground and mandate that they do work.
That can be tricky. You can read
https://blog.r-project.org/2019/05/15/gfortran-issues-with-lapack/index.html
and its sequel
https://blog.r-project.org/2019/09/25/gfortran-issues-with-lapack-ii/
as a cautionary tale.
Yes, I had to make a nasty ABI work on the HEP (Denelcor)
To cut this a little shorter: Assume eight arguments are passed in
registers, like for My 66000.
Caller calls
foo (a1, a2, a3, a4, a5, a6, a7, a8);
Callee side:
foo (a1, a2, a3, a4, a5, a6, a7, a8, a9)
Foo ends with
bar (b1, b2, b3, b4, b5, b6, b7, b8, b9);
and wants to save stack space, so it stores the value of b9 into
the space where it was supposed to be, and then branches to bar.
Result: Stack corruption.
What would you tell your ABI designer in that case? Don't do tail
calls, it is better to use more stack space, with all effect on
stack sizes and locality that would have?
Same response I would give to::
printf( "%d %d %d %d %d\r", a[i] );
"They deserve what they get".
You will notice that no ISA has ever had a "go jump in the lake"
instruction. For had there been, computers would not have survived
to the present--they would all be in the lake...
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
If yes, a few years down the road your prospective customers have to
decide whether to go for your newfangled architecture or one of the
established ones. They learn that a number of programs work
everywhere else, but not on your architecture. How many of them will
be placated by your reasoning that these programs are not strictly
confoming standard programs? How many will be alarmed by your
admission that you find it ok that you find it ok that such programs
don't work on your architecture? After all, hardly any program is a
strictly conforming standard program.
Such things happened many times in the past. AFAIK standard
setup on a VAX was that accessing data at address 0 gave you 0.
A lot of VAX programs needed fixes to run on different machines.
I remember an issue with writing to strings: early C compilers
put literal strings in writable memory and programs assumed that
they could change strings.
C 'errno' was made more abstract due
to multithreading, which broke some programs.
Concerning varargs,
PowerPC and later AMD64 used calling conventions incompatible
with popular expectations.
Concerning customers, they will tolerate a lot of things, as long
as there are benefits (faster or cheaper machines, better security,
etc.) and fixes require a reasonable amount of work.
Such things happened many times in the past. AFAIK standard
setup on a VAX was that accessing data at address 0 gave you 0.
A lot of VAX programs needed fixes to run on different machines.
That case is interesting. It's certainly a benefit to programmers if
most uses of NULL produce a SIGSEGV, but for existing programs a mapping
that allows accessible memory in page 0 is an advantage. So how
did we get from there to where we are now?
First, my guess is that the VAX is only called out because it was so
popular, and it was one of the first Unix machines where doing it
differently was possible. I am sure that earlier Unix targets without
virtual memory used memory starting with address 1 because they would
otherwise have wasted precious memory.
On 09/01/2025 08:23, Anton Ertl wrote:
Stefan Monnier <monnier@iro.umontreal.ca> writes:
[Someone wrote:]
ABI calling conventions tend to be designed to support at least C,
including varargs and often also tolerant of differences between the
number of arguments in the caller and callee.
Why should an ABI be tolerant of such differences? In C, calling a
function with an unexpected number (or type) of arguments has always
been undefined behaviour, and always been something that programmers
have strived to avoid. For variadic functions (including old
pre-standard functions), the code does not declare the number or types
of arguments, but you still have to match up the caller and callee.
Call printf() with a mismatch between the format string and the
arguments, and you can expect nasal daemons.
How many people actually want to use code where some functions are
called with an incorrect number of parameters? Such code is /broken/.
If it ever gave results that the original users were happy with, it is
by luck - no matter what ABI you have for your new architecture and new tools, it's pure luck whether things work or not in any sense.
So the best you can do for your prospective customers is tell them that
you prioritise the results for correct code and help them with tools to
find mistakes in their ancient broken code.
David Brown <david.brown@hesbynett.no> writes:
On 09/01/2025 08:23, Anton Ertl wrote:
Stefan Monnier <monnier@iro.umontreal.ca> writes:
[Someone wrote:]
ABI calling conventions tend to be designed to support at least C,
including varargs and often also tolerant of differences between the
number of arguments in the caller and callee.
Why should an ABI be tolerant of such differences? In C, calling a
function with an unexpected number (or type) of arguments has always
been undefined behaviour, and always been something that programmers
have strived to avoid. For variadic functions (including old
pre-standard functions), the code does not declare the number or types
of arguments, but you still have to match up the caller and callee.
I'm not sure that's completely true. Consider, for example,
main(). It's sort of variadic, but most applications only declare
the standard C argc/argv arguments. POSIX systems supply
a third parameter (envp) and most unix/linux implementations
supply a fourth parameter (auxv).
I should think so long as the caller provides at least enough
parameters to match the callee, there shouldn't be any
issues.
Call printf() with a mismatch between the format string and the
arguments, and you can expect nasal daemons.
Not if you provide _more_ parameters than the format string
requires, which can happen with e.g. i18n error message strings.
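For example (the message strings are invented), this is well defined -
the C standard says excess arguments to the printf family are evaluated
but otherwise ignored:

#include <stdio.h>

int main(void)
{
    const char *msg_en = "error %d in file %s\n";
    const char *msg_fr = "erreur %d\n";   /* translation drops the file name */

    /* Same call site either way; with the shorter format the trailing
       "config.txt" argument is evaluated and then ignored. */
    printf(msg_fr, 42, "config.txt");
    (void)msg_en;
    return 0;
}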
David Brown <david.brown@hesbynett.no> schrieb:
How many people actually want to use code where some functions are
called with an incorrect number of parameters? Such code is /broken/.
Agreed (at least in principle).
If it ever gave results that the original users were happy with, it is
by luck - no matter what ABI you have for your new architecture and new
tools, it's pure luck whether things work or not in any sense.
It gets worse when the code in question has been around for decades,
and is widely used. Some ABIs, such as the x86-64 psABI, are very
forgiving of errors.
So the best you can do for your prospective customers is tell them that
you prioritise the results for correct code and help them with tools to
find mistakes in their ancient broken code.
Now, you can also tell them to use LTO for checks for any old
software.
Excuses are running out.
MitchAlsup1 <mitchalsup@aol.com> wrote:
I also think code would be a bit more efficient if there were more registers
available for parameter passing and as scratch registers - perhaps 6
would make more sense.
Basically, here, there is competing pressure between the compiler
needing a handful of preserved registers, and the compiler being
more efficient if there were more argument/result passing registers.
My 66000 ABI has 8 argument registers, 7 temporary registers, 14
preserved registers, a FP, and a SP. IP is not part of the register
file. My ABI has a note indicating that the aggregations can be
altered, just that I need a good reason to change.
I looked high and low for codes using more than 8 arguments and
returning aggregates larger than 8 double words, and about the
only things I found were a handful of []print[]() calls.
I meet such code with reasonable frequency. I peeked semi
randomly into Lapack. First routine that I looked at had
8 arguments, so within your limit. Second is:
SUBROUTINE ZUNMR3( SIDE, TRANS, M, N, K, L, A, LDA, TAU, C, LDC,
$ WORK, INFO )
which has 13 arguments.
Large number of arguments is typical in old style Fortran numeric
code.
On 1/6/2025 6:11 PM, Waldek Hebisch wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
I also think code would be a bit more efficient if there were more registers
available for parameter passing and as scratch registers - perhaps 6
would make more sense.
Basically, here, there is competing pressure between the compiler
needing a handful of preserved registers, and the compiler being
more efficient if there were more argument/result passing registers.
My 66000 ABI has 8 argument registers, 7 temporary registers, 14
preserved registers, a FP, and a SP. IP is not part of the register
file. My ABI has a note indicating that the aggregations can be
altered, just that I need a good reason to change.
I looked high and low for codes using more than 8 arguments and
returning aggregates larger than 8 double words, and about the
only things I found were a handful of []print[]() calls.
I meet such code with reasonable frequency. I peeked semi
randomly into Lapack. First routine that I looked at had
8 arguments, so within your limit. Second is:
SUBROUTINE ZUNMR3( SIDE, TRANS, M, N, K, L, A, LDA, TAU, C, LDC,
$ WORK, INFO )
which has 13 arguments.
Large number of arguments is typical in old style Fortran numeric
code.
While there has been much discussion down thread relating to Waldek's
other points, there hasn't been much about these.
So, some questions. Has Lapack (and the other old style Fortran numeric
code that Waldek mentioned) lost its/their importance as a major user of
CPU cycles? Or do these subroutines consume so many CPU cycles that the overhead of the large number of parameters is lost in the noise? Or is
there some other explanation for Mitch not considering their importance?
antispam@fricas.org (Waldek Hebisch) writes:
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
If yes, a few years down the road your prospective customers have to
decide whether to go for your newfangled architecture or one of the
established ones. They learn that a number of programs work
everywhere else, but not on your architecture. How many of them will
be placated by your reasoning that these programs are not strictly
conforming standard programs? How many will be alarmed by your
admission that you find it ok that such programs
don't work on your architecture? After all, hardly any program is a
strictly conforming standard program.
Such things happened many times in the past. AFAIK standard
setup on a VAX was that accessing data at address 0 gave you 0.
A lot of VAX programs needed fixes to run on different machines.
That case is interesting. It's certainly a benefit to programmers if
most uses of NULL produce a SIGSEGV, but for existing programs a mapping
that allows accessible memory in page 0 is an advantage. So how
did we get from there to where we are now?
C 'errno' was made more abstract due
to multithreading, it broke some programs.
That's pretty similar to an ABI issue (not sure if errno is in the
ABIs or not).
On Fri, 10 Jan 2025 10:25:23 +0000, Anton Ertl wrote:
antispam@fricas.org (Waldek Hebisch) writes:
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
If yes, a few years down the road your prospective customers have to
decide whether to go for your newfangled architecture or one of the
established ones. They learn that a number of programs work
everywhere else, but not on your architecture. How many of them will
be placated by your reasoning that these programs are not strictly
conforming standard programs? How many will be alarmed by your
admission that you find it ok that such programs
don't work on your architecture? After all, hardly any program is a
strictly conforming standard program.
Such things happened many times in the past. AFAIK standard
setup on a VAX was that accessing data at address 0 gave you 0.
A lot of VAX programs needed fixes to run on different machines.
That case is interesting. It's certainly a benefit to programmers if
most uses of NULL produce a SIGSEGV, but for existing programs mapping
allowing to have accessible memory in page 0 is an advantage. So how
did we get from there to where we are now?
The blame goes to defining NULL as a pointer that is not pointing at
anything. We have no integer that has the property of one value that
is not an integer--we COULD have had such a value (NEG_MAX on 2's
complement, -0 on 1's complement), but no..........
Stefan Monnier <monnier@iro.umontreal.ca> writes:
AFAIK in these cases the same compiler generates the code for the
function and for the calls, so it should be pretty much free to use any
calling convention it likes.
With separate compilation, the compiler does not know which other
compiler generates the code for the caller of a function or the callee
of a function.
errno is an atrocity all by itself; single handedly preventing
direct use of SIN(), COS(), TAN(), ATAN(), exp(), ln(), pow()
as instructions.
MitchAlsup1 <mitchalsup@aol.com> schrieb:
errno is an atrocity all by itself; single handedly preventing
direct use of SIN(), COS(), TAN(), ATAN(), exp(), ln(), pow()
as instructions.
Fortunately, the C standard does not require errno to be set
for these functions. Apple, for example, does not do so.
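Whether a given C library reports these errors through errno, through
the floating-point exception flags, or both is advertised by the
math_errhandling macro; a quick check along these lines shows what you
actually get (nothing here changes the library's behaviour):

#include <errno.h>
#include <fenv.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    volatile double x = -1.0;   /* keep the call from being folded away */

    errno = 0;
    feclearexcept(FE_ALL_EXCEPT);

    double r = log(x);          /* domain error */
    printf("log(-1.0) = %f\n", r);

    if (math_errhandling & MATH_ERRNO)
        printf("errno = %d (EDOM is %d)\n", errno, EDOM);
    if (math_errhandling & MATH_ERREXCEPT)
        printf("FE_INVALID raised: %s\n",
               fetestexcept(FE_INVALID) ? "yes" : "no");
    return 0;
}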
On Mon, 13 Jan 2025 18:02:10 +0000, Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
errno is an atrocity all by itself; single handedly preventing
direct use of SIN(), COS(), TAN(), ATAN(), exp(), ln(), pow()
as instructions.
Fortunately, the C standard does not require errno to be set
for these functions. Apple, for example, does not do so.
Nor will I.
mitchalsup@aol.com (MitchAlsup1) writes:
On Mon, 13 Jan 2025 18:02:10 +0000, Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
errno is an atrocity all by itself; single handedly preventing
direct use of SIN(), COS(), TAN(), ATAN(), exp(), ln(), pow()
as instructions.
Fortunately, the C standard does not require errno to be set
for these functions. Apple, for example, does not do so.
Nor will I.
POSIX does, however, require errno to be set conditionally
based on an application global variable 'math_errhandling'.
On Mon, 13 Jan 2025 21:53:55 +0000, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Mon, 13 Jan 2025 18:02:10 +0000, Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
errno is an atrocity all by itself; single handedly preventing
direct use of SIN(), COS(), TAN(), ATAN(), exp(), ln(), pow()
as instructions.
Fortunately, the C standard does not require errno to be set
for these functions. Apple, for example, does not do so.
Nor will I.
POSIX does, however, require errno to be set conditionally
based on an application global variable 'math_errhandling'.
The functions mentioned have the property of taking x as
any IEEE 754 number (including NaNs, infinities, denorms)
and producing an IEEE 754 number {NaNs, infinities, norms,
denorms}.
But if POSIX wants to spend as many cycles setting errno
as performing the calculation, that is for POSIX to decide.
mitchalsup@aol.com (MitchAlsup1) writes:
On Mon, 13 Jan 2025 21:53:55 +0000, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Mon, 13 Jan 2025 18:02:10 +0000, Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
errno is an atrocity all by itself; single handedly preventing
direct use of SIN(), COS(), TAN(), ATAN(), exp(), ln(), pow()
as instructions.
Fortunately, the C standard does not require errno to be set
for these functions. Apple, for example, does not do so.
Nor will I.
POSIX does, however, require errno to be set conditionally
based on an application global variable 'math_errhandling'.
The functions mentioned have the property of taking x as
any IEEE 754 number (including NaNs, infinities, denorms)
and produce a IEEE 754 number {NaNs, infinities, norms,
denorms}.
But if POSIX wants to spend as many cycles setting errno
as performing the calculation, that is for POSIX to decide.
POSIX leaves it up to the programmer to decide. If the
programmer desires EDOM or ERANGE, they set the
appropriate bit in math_errhandling before calling the
sin et alia functions.
Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:
Has Lapack (and the other old style Fortran numeric
code that Waldek mentioned) lost its/their importance as a major user of
CPU cycles?
It's less than it used to be in the days when supercomputers
roamed the computer centers, but for these applications where
it matters, it can be significant.
Or do these subroutines consume so many CPU cycles that the
overhead of the large number of parameters is lost in the noise?
If you have many small matrices to multiply, startup overhead
can be quite significant. Not on a 2000*2000 matrix, though.
Or is
there some other explanation for Mitch not considering their importance?
I think eight arguments, passed by reference in registers, is not
too bad.
On Mon, 13 Jan 2025 22:40:02 +0000, Scott Lurndal wrote:
POSIX leaves it up to the programmer to decide. If the
programmer desires EDOM or ERANGE, they set the
appropriate bit in math_errhandling before calling the
sin et alia functions.
So, now the subroutine, which computes all work in a single
instruction, has to check a global variable to decide if it
has to LD in TLS pointer just to set errno ?!!?
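To make that cost concrete: in a threaded C library, errno is thread-local
(glibc, for example, defines it as roughly *__errno_location()), so even the
"just set errno" path is a TLS access rather than a plain store. A rough C
sketch of the shape of such a wrapper; exp_core() is a placeholder for the
actual computation, which in the scenario above would be a single instruction:

#include <errno.h>
#include <math.h>

extern double exp_core(double);      /* placeholder for the real work */

double wrapped_exp(double x)
{
    double r = exp_core(x);
    if (isinf(r) && isfinite(x))     /* overflow occurred */
        errno = ERANGE;              /* thread-local access, not a plain store */
    return r;
}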
On 13/01/2025 23:40, Scott Lurndal wrote:
POSIX leaves it up to the programmer to decide. If the
programmer desires EDOM or ERANGE, they set the
appropriate bit in math_errhandling before calling the
sin et alia functions.
You know POSIX better than I do, but AFAIK "math_errhandling" is a fixed value set by the implementation, usually as a macro. Certainly with a
quick check with gcc on Linux, I could not set the bits in math_errhandling.
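For anyone who has not met the interface being argued about, a minimal
C11-style sketch (mine, not from any post) of how a caller consults
math_errhandling: the program only reads it to decide which reporting
channel - errno or the FP exception flags - to inspect. exp() is used here
because its overflow case is the easiest to trigger:

#include <errno.h>
#include <fenv.h>
#include <math.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON

int main(void)
{
    errno = 0;
    feclearexcept(FE_ALL_EXCEPT);

    double y = exp(1000.0);                      /* overflows a double */

    if ((math_errhandling & MATH_ERRNO) && errno == ERANGE)
        puts("range error reported via errno");
    if ((math_errhandling & MATH_ERREXCEPT) && fetestexcept(FE_OVERFLOW))
        puts("range error reported via the FP exception flags");

    printf("exp(1000.0) = %g\n", y);
    return 0;
}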
mitchalsup@aol.com (MitchAlsup1) writes:
So, now the subroutine, which computes all work in a single
instruction, has to check a global variable to decide if it
has to LD in TLS pointer just to set errno ?!!?
The subroutine clearly does more than "do all the work in a single instruction".
How does your instruction support all the functionality
required by the POSIX specification for the sin(3) library function?
https://pubs.opengroup.org/onlinepubs/9799919799/functions/sin.html
Clearly there are programmers who wish to be able to detect
certain exceptions, and POSIX allows programmers to
select that behavior.
David Brown <david.brown@hesbynett.no> writes:
You know POSIX better than I do, but AFAIK "math_errhandling" is a fixed
value set by the implementation, usually as a macro. Certainly with a
quick check with gcc on Linux, I could not set the bits in math_errhandling.
Yes, the programmer in this case would instruct the compiler what
the value of math_errhandling should be, e.g. with -ffast-math.
https://gcc.gnu.org/wiki/FloatingPointMath
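What that buys in practice is easiest to see with sqrt() rather than sin()
(my example, not from the thread), since there the errno requirement alone
is what stands between a library call and a single instruction. With gcc
on x86-64, roughly:

#include <math.h>

double root(double x)
{
    return sqrt(x);
}

/* gcc -O2:                 sqrtsd, plus a compare and a call into libm
                            on the x < 0 path so that errno can be set.
   gcc -O2 -fno-math-errno  (implied by -ffast-math): a single sqrtsd,
                            no call, no errno.                          */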
Stephen Fuld <sfuld@alumni.cmu.edu.invalid> wrote:
On 1/6/2025 6:11 PM, Waldek Hebisch wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
I also think code would be a bit more efficient if there were more registers available for parameter passing and as scratch registers - perhaps 6 would make more sense.
Basically, here, there is competing pressure between the compiler
needing a handful of preserved registers, and the compiler being
more efficient if there were more argument/result passing registers.
My 66000 ABI has 8 argument registers, 7 temporary registers, 14
preserved registers, a FP, and a SP. IP is not part of the register
file. My ABI has a note indicating that the aggregations can be
altered, just that I need a good reason to change.
I looked high and low for codes using more than 8 arguments and
returning aggregates larger than 8 double words, and about the
only things I found were a handful of []print[]() calls.
I meet such code with reasonable frequency. I peeked semi
randomly into Lapack. First routine that I looked at had
8 arguments, so within your limit. Second is:
SUBROUTINE ZUNMR3( SIDE, TRANS, M, N, K, L, A, LDA, TAU, C, LDC, WORK, INFO )
which has 13 arguments.
Large number of arguments is typical in old style Fortran numeric
code.
While there has been much discussion down thread relating to Waldek's
other points, there hasn't been much about these.
So, some questions. Has Lapack (and the other old style Fortran numeric
code that Waldek mentioned) lost its/their importance as a major user of
CPU cycles? Or do these subroutines consume so many CPU cycles that the
overhead of the large number of parameters is lost in the noise? Or is
there some other explanation for Mitch not considering their importance?
Some comments on this:
You are implicitly assuming that passing a large number of
arguments is expensive.
Of course, if you can do the job with a
smaller number of arguments, then there may be some saving.
However, the large number of arguments is partly there to increase
performance.
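To make the register-budget point concrete, a rough C rendering of that
interface (my transliteration; the hidden character-length arguments added
by Fortran/C interop are omitted). With eight argument registers, the first
eight pointers travel in registers and the remaining five are stored to and
reloaded from the stack on every call:

#include <complex.h>

void zunmr3_(const char *side, const char *trans,
             const int *m, const int *n, const int *k, const int *l,
             const double _Complex *a, const int *lda,
             const double _Complex *tau,
             double _Complex *c, const int *ldc,
             double _Complex *work, int *info);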
I wrote:
I think eight arguments, passed by reference in registers, is not
too bad.
.... when the rest can be passed on the stack.
On Tue, 14 Jan 2025 14:22:19 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:
Clearly there are programmers who wish to be able to detect
certain exceptions, and POSIX allows programmers to
select that behavior.
Raising of FP exceptions is orthogonal to the question of one instruction
vs. a library call. If anything, when exceptions are enabled, a
single-instruction implementation probably makes it easier for the exception
handler to find the reason and generate useful diagnostics.
As to what POSIX allows, on the manual page that you quoted I see no
indication that the implementation is required to let the programmer
select this or that behavior. I read it as the implementation being allowed
to make the choice entirely by itself.
On 1/12/2025 5:20 PM, Waldek Hebisch wrote:
You are implicitly assuming that passing a large number of
arguments is expensive.
I guess. I am actually assuming that passing arguments in memory is
more expensive than passing them in registers. I don't think that is controversial.
Scott Lurndal <scott@slp53.sl.home> schrieb:
https://pubs.opengroup.org/onlinepubs/9799919799/functions/sin.html
Clearly there are programmers who wish to be able to detect
certain exceptions, and POSIX allows programmers to
select that behavior.
Clearly, there is a committee which wanted people to be able
to detect certain error conditions on a fine-grained level.
One assumes that they did not consider the consequences.
Thomas Koenig wrote:
Scott Lurndal <scott@slp53.sl.home> schrieb:
https://pubs.opengroup.org/onlinepubs/9799919799/functions/sin.html
Clearly there are programmers who wish to be able to detect
certain exceptions, and POSIX allows programmers to
select that behavior.
Clearly, there is a committee which wanted people to be able
to detect certain error conditions on a fine-grained level.
One assumes that they did not consider the consequences.
Without exposing any internal discussions, it should be obvious to
anyone "versed in the field" that the ieee754 standard has some warts
and mistakes. It has been possible to correct very few of them since 1978.
OTOH, Kahan & co did an amazingly good job to start with, the fact that
they didn't really consider the needs of massively parallel
implementations 40-50 years later cannot be blamed on them.
It is possible that one or two of the grandfather clauses in 754 can be removed in the future, simply because the architectures that made those exceptional choices are going away permanently.
I do not see any way to support things like "trap and rescale" as a way
to handle exponent overruns, even though that was a neat idea back then.
It is much more likely that we will simply switch to quad/f128 (or even arbitrary precision) for those few computations that could need it.
Terje
Scott Lurndal <scott@slp53.sl.home> schrieb:
I spent several years on one of those committees[*] in the 90s.
There were math and IEEE FP experts who very carefully considered
all the consequences of changes to the math interfaces.
Putting in mandatory errno handling for transcendental intrinsics,
and making this dependent on a global flag, was a huge mistake.
Either the people on that particular committee didn't consider
the consequences, or they (second option to the one above) didn't
understand the consequences of what they were doing. Vector computers
had already been in service for a decade when POSIX was released,
and the question "Would it run well on a Cray" would have answered
itself.
OTOH, they can be excused if they thought that C should not
be used for serious numerical work, and would not be. People had
FORTRAN for that...
Stephen Fuld wrote:
On 1/12/2025 5:20 PM, Waldek Hebisch wrote:
You are implicitly assuming that passing a large number of
arguments is expensive.
I guess. I am actually assuming that passing arguments in memory
is more expensive than passing them in registers. I don't think
that is controversial.
Usually true, except for recursive functions where you have to store
most stuff on the stack anyway, so going directly there can sometimes generate more compact code.
Terje
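A small illustration of that point (my example): when the arguments are
live across the recursive call, the callee must park them in preserved
registers or on the stack anyway, so passing them in memory to begin with
is not necessarily a loss.

struct node { long value; struct node *left, *right; };

long tree_sum(const struct node *n, long limit)
{
    if (n == NULL || limit == 0)
        return 0;
    /* 'n' and 'limit' must survive the call below, so the compiler has
       to keep them in callee-saved registers or spill them, however
       they arrived. */
    long left = tree_sum(n->left, limit - 1);
    return left + n->value + tree_sum(n->right, limit - 1);
}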
Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
Without exposing any internal discussions, it should be obvious to
anyone "versed in the field" that the ieee754 standard has some warts
and mistakes. It has been possible to correct very few of them since 1978.
I'm not throwing shade on the IEEE committee; they did quite a good
job, considering what they did and did not know.
What I was criticising was the committee(s) which made errno handling
for functions like sin() and cos() mandatory, and put its activation
in a global flag.
On Tue, 14 Jan 2025 19:18:27 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
Usually true, except for recursive functions where you have to store
most stuff on the stack anyway, so going directly there can sometimes
generate more compact code.
Terje
I would think that for Fortran (==everything passed by reference)
memory would beat registers most of the time. Maybe, except for
functions with 0-4 parameters.
Do common Fortran compilers even bother with passing in registers?
It would require replacing the natural by-reference "pointer in
register points to value in memory" calling sequence with something
like copy-in/copy-out, right?
scott@slp53.sl.home (Scott Lurndal) writes:
Thomas Koenig <tkoenig@netcologne.de> writes:[...]
What I was criticising was the committee(s) which made errno handling
for functions like sin() and cos() mandatory, and put its activation
in a global flag.
It's not mandatory. It's listed as an optional extension, and
even when implemented, it's opt-in at compile time.
"The functionality described is optional. The functionality
described is mandated by the ISO C standard only for implementations
that define __STDC_IEC_559__."
I can't find that anywhere in ISO C or POSIX. What exactly are you
quoting? ISO C doesn't tie math_errhandling to __STDC_IEC_559__.
scott@slp53.sl.home (Scott Lurndal) writes:
Thomas Koenig <tkoenig@netcologne.de> writes:
There's no requirement in ISO C or POSIX for an implementation to let
users affect the value of math_errhandling, at compile time or
otherwise. (And POSIX isn't directly relevant; this is all defined by
ISO C. There might be something in POSIX that goes beyond the ISO C requirements.)
gcc has "-f[no-]fast-math" and "-f[no-]math-errno" options that can
affect the value of math_errhandling.
I would guess that today the majority of numerical work is done from Python
by calling libraries.
I would think that for Fortran (==everything passed by reference)
memory would beat registers most of the time.
Pass by COMMON block was even faster.
According to MitchAlsup1 <mitchalsup@aol.com>:
I would think that for Fortran (==everything passed by reference)
memory would beat registers most of the time.
Pass by COMMON block was even faster.
Sometimes. On machines that don't have direct addressing, such as
S/360,
the code needs to load a pointer to the data either way so it's a wash.
Even when you do have direct addressing, if code is compiled to be
position independent, the common block wouldn't be in the same module
as the code that references it so it still needs to load a pointer
from the GOT or whatever its equivalent is.
On Wed, 15 Jan 2025 3:31:47 +0000, John Levine wrote:
According to MitchAlsup1 <mitchalsup@aol.com>:
Pass by COMMON block was even faster.
Sometimes. On machines that don't have direct addressing, such as
S/360,
the code needs to load a pointer to the data either way so it's a wash.
Even when you do have direct addressing, if code is compiled to be
position indepedent, the common block wouldn't be in the same module
as the code that references it so it still needs to load a pointer
from the GOT or whatever its equivalent is.
Pass by COMMON block allows one to pass hundreds of data values in a
single call.
You are treating the common block as if it had but one data container.
On Tue, 14 Jan 2025 21:48:19 +0000, Michael S wrote:
I would think that for Fortran (==everything passed by reference)
memory would beat registers most of the time.
Pass by COMMON block was even faster.
It would require replacement of natural by-reference "pointer in
register points to value in memory" calling sequence to something like
copy-in/copy-out, right?
No, Fortran will pass dope vectors to called subroutines. The
called subroutine needs to understand the dope vector.
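For anyone who has not met the term: a dope vector is just an array
descriptor. A hypothetical layout for a rank-2 array (no particular
compiler's ABI):

struct dope2 {
    double *base;        /* address of the first element              */
    long    lbound[2];   /* lower bound of each dimension             */
    long    extent[2];   /* number of elements along each dimension   */
    long    stride[2];   /* element-to-element distance, in elements  */
};
/* The caller passes a single pointer to this; the callee can address
   a(i,j) as base[(i - lbound[0])*stride[0] + (j - lbound[1])*stride[1]]. */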
MitchAlsup1 <mitchalsup@aol.com> wrote:
Pass by COMMON block was even faster.
I do not think so. In LAPACK-like cases there are array arguments.
The normal calling convention needs to store and later read parameters
and pass addresses. COMMON would force copying of entire arrays,
much less efficient than handling parameters.
On Thu, 16 Jan 2025 3:02:44 +0000, Waldek Hebisch wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
Pass by COMMON block was even faster.
I do not think so. In LAPACK-like cases there are array arguments.
The normal calling convention needs to store and later read parameters
and pass addresses. COMMON would force copying of entire arrays,
much less efficient than handling parameters.
SUBROUTINE FOO
COMMON /ALPHA/ I, J, K, A(100), B(100), C(100,100)
See no arguments, passed directly by common-block, no copying of
data, no dope vectors needed.
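The C analogue of that, for readers who do not speak Fortran (element types
assumed to be double purely for illustration): the data lives in one shared
global aggregate and the routine takes no arguments at all.

struct alpha {
    int    i, j, k;
    double a[100], b[100], c[100][100];
} alpha;                   /* plays the role of COMMON /ALPHA/ */

void foo(void)
{
    alpha.a[0] = alpha.b[0] + alpha.c[0][0];   /* direct access, no parameters */
}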
On Mon, 6 Jan 2025 20:10:13 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:
I looked high and low for codes using more than 8 arguments and
returning aggregates larger than 8 double words, and about the
only things I found were a handful of []print[]() calls.
Large numbers of parameters may be generated either by closure
conversion or by lambda lifting. These are FP language
transformations that are analogous to, but potentially more complex
than, the rewriting of object methods and their call sites to pass the current object in an OO language.
[The difference between closure conversion and lambda lifting is the
scope of the transformation: conversion limits code transformations to
within the defining call chain, whereas lifting pulls the closure to
top level making it (at least potentially) globally available.]
In either case the original function is rewritten such that non-local variables can be passed as parameters. The function's code must be
altered to access the non-locals - either directly as explicit
individual parameters, or by indexing from a pointer to an environment
data structure.
While in a simple case this could look exactly like the OO method transformation, recall that a general closure may require access to
non-local variables spread through multiple environments. Even if
whole environments are passed via single pointers, there still may
need to be multiple parameters added.
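A sketch of what that rewriting looks like in C terms (my example): the
free variables are gathered into an environment record, and the function
gains a pointer to it as an extra first parameter.

struct env {
    int scale;             /* formerly non-local variables of the closure */
    int offset;
};

static int apply_scaled(const struct env *e, int x)   /* lifted function */
{
    return x * e->scale + e->offset;
}

int sum_scaled(const int *v, int n, int scale, int offset)
{
    struct env e = { scale, offset };      /* build the environment once  */
    int s = 0;
    for (int i = 0; i < n; i++)
        s += apply_scaled(&e, v[i]);       /* env pointer = one extra arg */
    return s;
}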
George Neuner <gneuner2@comcast.net> writes:
While in a simple case this could look exactly like the OO method
transformation, recall that a general closure may require access to
non-local variables spread through multiple environments. Even if
whole environments are passed via single pointers, there still may
need to be multiple parameters added.
Isn't it the case that access to all of the enclosing environments
can be provided by passing a single pointer? I'm pretty sure it
is.
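The usual form of that single pointer is a static chain: each environment
record keeps a link to the enclosing one, so one pointer reaches every
level, at the cost of an indirection per level. A minimal sketch (my
example):

struct frame {
    struct frame *up;      /* enclosing environment, NULL at the outermost */
    int           x;       /* one captured variable per level, for brevity */
};

static int fetch(const struct frame *e, int levels_up)
{
    while (levels_up-- > 0)
        e = e->up;         /* walk the static chain */
    return e->x;
}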