Forum: >>> Magnum BBS <<<

Re: technology discussion → does the world need

From David Brown@21:1/5 to BGB on Sat Jul 6 19:01:54 2024

On 06/07/2024 05:30, BGB wrote:

Ironically, I probably should have leaned harder into the "daisy chained types" interpretation, rather than treating it as an undesirable implementation tradeoff.

I once had to use a very limited C compiler that supported arrays, and
structs, but not arrays of structs or structs containing arrays. It was
truly a PITA. A language that has such inconvenient limitations would
be unusable for me - I'd rather go back to using BASIC on a ZX Spectrum.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to BGB on Tue Jul 9 16:37:31 2024

On 06/07/2024 21:33, BGB wrote:

In my compiler (BGBCC), such an internal pointer exists for arrays and structures in the local stack frame.

No separate pointer exists inside of things like structs, where, as can
be noted, the array exists at a fixed size and location.

So, eg:
void Foo()
{
     int a[100];
     ...
}

There is both the space for 100 integers reserved in the stack frame,
and a variable 'a' which exists as an implicit pointer to that location.

But, say:
void Foo()
{
     int a[8192];
     ...
}

There is no space reserved on the stack, and the array is instead
allocated dynamically (from the heap). In this case, the "a" variable
exists as a pointer to that location in memory.

Similar treatment also applies to structs.

The C standard does not require a stack or say how local data is
implemented, it just gives rules for the scope and lifetime of locals.
However, I would be surprised and shocked to find a compiler I was using allocate local data on the heap in some manner. If I have an array as
local data, it is with the expectation that it is allocated and freed
trivially (an add or subtract to the stack pointer, typically combined
with any other stack frame). If I want something on the heap, I will
use malloc and put it on the heap.

Such an implementation as yours is not, I think, against the C standards
- but IMHO it is very much against C philosophy.
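
To make the distinction concrete, here is a minimal sketch of the two choices as the programmer would normally write them. The function names and the size of 8192 are only illustrative, and the C standard itself does not promise that the first version uses a stack - that is just what typical implementations do.

```c
#include <stdlib.h>

enum { COUNT = 8192 };   /* illustrative size only */

/* Automatic storage: on typical implementations this costs a stack
   pointer adjustment on entry and exit, with no allocator involved. */
long sum_with_local_array(void)
{
    int a[COUNT];
    long sum = 0;
    for (int i = 0; i < COUNT; i++) a[i] = i;
    for (int i = 0; i < COUNT; i++) sum += a[i];
    return sum;
}

/* Explicit heap storage: the programmer asks for it, checks for
   failure, and releases it. */
long sum_with_heap_array(void)
{
    int *a = malloc(COUNT * sizeof *a);
    long sum = 0;
    if (a == NULL) return -1;          /* allocation can fail */
    for (int i = 0; i < COUNT; i++) a[i] = i;
    for (int i = 0; i < COUNT; i++) sum += a[i];
    free(a);
    return sum;
}
```

Both functions compute the same sum; the difference is only in who decides where the storage lives.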


From David Brown@21:1/5 to BGB on Tue Jul 9 16:31:30 2024

On 08/07/2024 19:39, BGB wrote:

On 7/7/2024 11:28 PM, James Kuyper wrote:

On 7/7/24 20:02, Kaz Kylheku wrote:
...

I see no point in having implicit pointers, but I don't believe that
they are prohibited.

They mostly exist in a "sort of simpler to implement the compiler this
way" sense.

In the implicit pointer case, the compiler just treats it as-if it were
an explicit pointer. In this case, both are basically treated as being roughly equivalent at the IR levels.

And, most of the code-generation stage doesn't need separate handling
for arrays and pointers, but can use combined "ArrayOrPointer" handling
or similar.

It had all seemed "obvious enough".

Similar reasoning for passing structs by-reference in the ABI:
Pass by reference is easy to implement;
In place copying and decomposing into registers, kinda bad.

Though, this one seems to be a common point of divergence between "SysV"
and "Microsoft" ABIs. Sometimes a target will have an ABI defined, and
the MS version was almost the same, just typically differing in that it passes structs by reference and provides a spill space for register arguments.

I don't think it is helpful that you keep mixing /logical/ terms with /implementation/ terms.

In C, there is no "pass by reference" or "return by reference". It is
all done by value. Even when you use pointer arguments or return types,
you are passing or returning pointer values. C programmers use pointers
to get the effect of passing by reference, but in C you use pointers to
be explicit about references.
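
A small sketch of what "passing a pointer by value" means in practice. The function names are made up for illustration.

```c
#include <stddef.h>

/* The pointer parameter is a copy of the caller's pointer.
   Only the pointed-to object is shared. */
void set_to_seven(int *p)
{
    *p = 7;      /* changes the caller's int through the pointer */
    p = NULL;    /* changes only the local copy of the pointer */
    (void)p;
}

int demo_pointer_by_value(void)
{
    int value = 0;
    int *ptr = &value;
    set_to_seven(ptr);
    /* ptr still points at value, and value is now 7 */
    return (ptr == &value) ? value : -1;
}
```

The caller's pointer is untouched by the assignment inside the function, which is exactly what by-value passing predicts.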

Structs in C are passed by value, and returned by value. Not by reference.

The C standards don't say how passing structs around by value is to be implemented - that is hidden from the programmer. Usually ABI's (which
are also hidden from the programmer) specify the implementation details,
but some ABI's are weak in that area. Generally, structs up to a
certain size or complexity are passed in registers while bigger or more advanced types are passed via addresses (pointing to stack areas) in
registers or the stack, just like any other bigger types. This is not
"pass by reference" as far as the C programming is concerned - but you
could well call it that at the assembly level. Where the line between
"passing in registers" and "passing via addresses to space on the stack"
is drawn, is entirely up to the compiler implementation and any ABI requirements. Some simpler compilers will pass all structs via
addresses, no matter how simple they are, while others will aim to use registers whenever possible.

So if you have these structs and declarations :

struct small { uint16_t a; uint16_t b; };
struct big { uint32_t xs[10]; };

struct small foos(struct small y);
struct big foob(struct big y);

Then compilers will typically implement "x = foos(y)" as though it were:

extern uint32_t foos(uint32_t ab);
uint32_t _1 = foos(((uint32_t) y.a << 16) | (y.b));
struct small x = { _1 >> 16, _1 & 0xffff };

And they will typically implement "x = foob(y)" as though it were:

extern void foob(struct big * ret, const struct big * xs);
struct big x;
foob(&x, &y);

This is not, as you wrote somewhere, something peculiar to MSVC - it is
the technique used by virtually every C compiler, except perhaps for
outdated brain-dead 8-bit microcontrollers that have difficulty handling
data on a stack.

And it is not really "pass by reference" or "implicit pointers", it is
just passing addresses around behind the scenes in the implementation.
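
The small-struct case can be written out as ordinary, compilable C. This is only a sketch: the helper names are hypothetical, and the choice of which field goes in the high half is an assumption for illustration - a real ABI makes that decision, not the programmer.

```c
#include <stdint.h>

struct small { uint16_t a; uint16_t b; };

/* Hypothetical helpers: pack the two fields into one 32-bit value,
   roughly as an ABI might place a small struct in one register. */
uint32_t pack_small(struct small s)
{
    return ((uint32_t)s.a << 16) | s.b;
}

struct small unpack_small(uint32_t ab)
{
    struct small s = { (uint16_t)(ab >> 16), (uint16_t)(ab & 0xffff) };
    return s;
}

/* What the programmer writes: a struct in by value, a struct out by value. */
struct small swap_fields(struct small y)
{
    struct small r = { y.b, y.a };
    return r;
}

/* The "as though" form: the same function seen as integer in, integer out. */
uint32_t swap_fields_lowered(uint32_t ab)
{
    return pack_small(swap_fields(unpack_small(ab)));
}
```

At the C level nothing but values is visible in either form; the packing is purely an implementation matter.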


From David Brown@21:1/5 to bart on Tue Jul 9 18:31:43 2024

On 09/07/2024 16:54, bart wrote:

On 09/07/2024 15:31, David Brown wrote:

On 08/07/2024 19:39, BGB wrote:

Though, this one seems to be a common point of divergence between
"SysV" and "Microsoft" ABIs. Sometimes a target will have an ABI
defined, and the MS version was almost the same, just typically
differing in that it passes structs by reference and provides a spill
space for register arguments.

I don't think it is helpful that you keep mixing /logical/ terms with
/implementation/ terms.

In C, there is no "pass by reference" or "return by reference". It is
all done by value.

Arrays are passed by reference:

void F(int a[20]) {}

int main(void) {
    int x[20];
    F(x);
}

Although the type of 'a' inside 'F' will be int* rather than int(*)[20].

Arrays are not passed by reference in C. When you use the array in most expression contexts, including as an argument to a function call, the
array expression is converted to a pointer to its first element, and
that pointer is passed by value.

That's why the array type information is lost in the call, and you get a pointer in the function - /not/ a reference to an array.
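
A short sketch that shows the adjustment without relying on sizeof. The names are illustrative only.

```c
#include <stddef.h>

/* Declared with array syntax, but the parameter type is adjusted to int *. */
int first_or_zero(int a[20])
{
    if (a == NULL) return 0;   /* a null pointer is an acceptable argument */
    int first = a[0];
    a = NULL;                  /* legal: 'a' is an ordinary local pointer */
    (void)a;
    return first;
}

/* This initialisation needs no cast, because the type of first_or_zero
   is "function taking int * and returning int". */
int (*as_pointer_version)(int *) = first_or_zero;
```

Calling as_pointer_version with the address of a single int is accepted just as readily as calling it with a 20-element array, since no array type survives in the parameter.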

So if you have these structs and declarations :

struct small { uint16_t a; uint16_t b; };
struct big { uint32_t xs[10]; };

struct small foos(struct small y);
struct big foob(struct big y);

Then compilers will typically implement "x = foos(y)" as though it were:

     extern uint32_t foos(uint32_t ab);
     uint32_t _1 = foos(((uint32_t) y.a << 16) | (y.b));
     struct small x = { _1 >> 16, _1 & 0xffff };

And they will typically implement "x = foob(y)" as though it were:

     extern void foob(struct big * ret, const struct big * xs);
     struct big x;
     foob(&x, &y);

From what I've seen, structs that are not small enough to be passed in registers, are copied to a temporary, and the address of that temporary
is passed.

That will depend on the details of the compiler, optimisation, and what
happens to the array after the call. But yes, that is certainly
something that is done.

This seems to be the case even when the struct param is marked 'const'.

"const" in C is not strong enough to guarantee that things will not be
changed in this context. If you have a pointer to non-const data,
convert it to a pointer to const, and then later convert it back to a
pointer to non-const, you can use that to change the data. Thus using a pointer to const does not let the compiler be sure that the data cannot
be changed by the function - so if the struct/array is used again later,
and its value must be preserved, the compiler needs to make a copy.
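
A minimal sketch of why the compiler cannot rely on the "const" in the signature. The function names are invented for the example; the behaviour is defined because the object being modified was not itself defined as const.

```c
struct big { unsigned xs[10]; };

/* The signature says "pointer to const", but that promises nothing
   about what the function body does with the object. */
void sneaky(const struct big *p)
{
    struct big *q = (struct big *)p;   /* cast the const away */
    q->xs[0] = 99;                     /* fine: the object is not const */
}

unsigned demo_const_is_not_a_guarantee(void)
{
    struct big b = { { 1 } };          /* an ordinary, modifiable object */
    sneaky(&b);
    return b.xs[0];                    /* now 99, not 1 */
}
```

If the caller needs b preserved across such a call and cannot see the function body, a copy is the only safe option.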

(I'd like it if there were a way to have such guarantees, but C is what
C is.)

(My compiler won't create a copy when the parameter is 'const'. I
assumed that was how gcc did it; I was wrong.)

Your compiler is wrong. But if you only give it code where the const
pointer is never converted to a non-const pointer, you'll be safe.

This is for Win64 ABI, however an ABI will only say they are passed by reference; it will not stipulate making a copy. That is up to the
language implementation.


From David Brown@21:1/5 to BGB on Wed Jul 10 09:41:04 2024

On 09/07/2024 23:43, BGB wrote:

On 7/9/2024 3:22 PM, James Kuyper wrote:

On 7/9/24 14:55, BGB wrote:
...

The pass by reference, in this context, was referring to the ABI, not
to C itself.

It looks from C's POV as-if it were by-value.

Which it is, depends on if one is looking at things at the language
level, ABI level, or IR level, ...

The C standard doesn't explicitly specify pass by value, or pass by
reference, or anything other passing mechanism. What it does say is what
a programmer needs to know to use the passing mechanism. It says that
the value of a function parameter that is seen by the code inside that
function is a copy of the value passed as an argument to the function.
The copy can be modified without changing the original. When a C
function's declaration looks as though it takes an array as an argument,
what that declaration actually means is that it takes a pointer value as
an argument, and it is a copy of that pointer's value which is seen
inside the function, and can be modified. The memory it points at is the
same as the memory pointed at by the corresponding argument.

We can probably agree that, in C:
typedef struct Foo_s Foo;
struct Foo_s {
    int x, y, z, a, b, c;
};

int FooFunc(Foo obj)
{
    obj.z = obj.x + obj.y;
    return(obj.z);
}

int main()
{
    Foo obj;
    int z1;
    obj.x=3;
    obj.y=4;
    obj.z=0;
    z1=FooFunc(obj);
    printf("%d %d\n", obj.z, z1);
}

Should print "0 7" regardless of how the structure is passed in the ABI.

ABI's are irrelevant to how the language is defined and how these
expressions are evaluated. ABI's go along with details of the target -
they can affect implementation-dependent behaviour but no more than
that. (A clear example would be that alignment of fundamental types
would normally be specified by an ABI.)

So code that does not depend on implementation-dependent behaviour, such
as your code here, will necessarily give the same results on all
conforming C implementations.

Though, one possibility being to relax the language such that both "0 7"
and "7 7" are valid possibilities (the latter potentially allowing more performance by not needing to make a temporary local copy). Though,
AFAIK, C doesn't really allow for this.

It continues to astound me that people who claim to have written C
compilers themselves - such as you and Bart - regularly show
misunderstandings or ignorance about things that are clear in the C
standards and which I would expect any experienced C programmer to know.

All arguments to function calls in C are passed by value. This is in
6.5.2.2p4 - it is a two sentence paragraph that you really ought to have
read before even considering writing a C compiler.

Some languages do have pass by reference (or other types of parameter
passing systems), and would give "7 7". C is not such a language. (And
I have never heard of a language that would allow either result.)

An implementation could be clever though and only make local copies in
cases where the structure is modified by the callee, as in the example
above.

A clever implementation would turn the whole of main() into a single
puts("0 7") call.

Compilers can generate whatever code they like, as long as the results
are correct in the end.


From David Brown@21:1/5 to Bart on Wed Jul 17 19:07:17 2024

On 17/07/2024 13:38, Bart wrote:

On 13/07/2024 10:39, BGB wrote:

But, as I see it, no real point in arguing this stuff (personally, I
have better stuff to be doing...).

We all do. But this group seems to be about arguing about pointless
stuff and you might come here when you want a respite from proper work.

I don't think that's the /only/ purpose of this group, but people like
chatting about things that interest them even if they are not important.
And something might seem pointless to one person and important to another.

However (here I assume you've gone back to Quake but that other
interested parties might be reading this), consider the program below.

That sets up an array and then sums its elements by calling 3 different functions to do the job:

(1) Using normal C pass-by-value

Just to be clear - it is passing a pointer by value. The array is not
passed in any way.

(2) Using C pass-by-value to emulate call-by-reference

This is identical to (1). Both are passing a pointer by value, and both
are emulating a limited type of pass by reference (limited in that the
length of the array is not part of the passed parameter, but must be
given independently).

(3) Using fantasy true call-by-reference as it might appear if C had the
    feature

The fantasy matches C++ pretty closely - except you are making a
reference to an unbound array of type T, not an actual array.

(I'd hoped C++ would run this, but it didn't even like the middle
function.)

It is fine in C++20 or C++23, but not in C++17 or before. (If you want
to know why, ask in comp.lang.c++, because I don't know the answer!)

But even in C++, you are not really referencing the array here - you are referencing an unbound (unsized) array, and there is no information
about the real size. It's somewhat like passing a pointer here (like
function 1) by using a reference to the first element. But you can't
(in C++) use "sizeof A" here, and unlike the other functions, you can't
pass a null pointer.

C++'s arrays are limited in functionality and features by compatibility
with C, so you can't pass C-style arrays around in C++ because array expressions convert to pointer expressions in the same way as in C. If
you want the C++ version of arrays, use std::array<T, n> for fixed-size
arrays or std::vector<T> for variable sized arrays. These can be passed
around into and out of functions, by value or reference, and have a
selection of high-level methods and functions. But that's all C++
rather than C.

I'm asking people to compare the first and third functions and their
calls, and to see if there's any appreciable difference between them.
There will obviously be a difference in how the A parameter is declared.

The generated code (for gcc -x c++ -std=c++20) is the same for both, so
there is no difference there. Using the reference means you can't pass
a null pointer, you can't pass a pointer to an individual int, and you
can't use sizeof, so they are not entirely the same. And as I wrote
above, "sum_bytrueref" is not actually passing the array at all.

Both 1 and 3 give a reasonable emulation to passing the array by
reference, but neither of them is actually doing so.

---------------------------------------------
#include <stdio.h>

typedef int T;

int sum_byvalue(T* A, int n) {
    int i, sum=0;
    for (i=0; i<n; ++i) sum += A[i];
    return sum;
}

int sum_bymanualref(T(*A)[], int n) {
    int i, sum=0;
    for (i=0; i<n; ++i) sum += (*A)[i];
    return sum;
}

int sum_bytrueref(T (&A)[], int n) {
    int i, sum=0;
    for (i=0; i<n; ++i) sum += A[i];
    return sum;
}

int main(void) {
    enum {N = 10};
    T A[N] = {10,20,30,40,50,60,70,80,90,100};
    int total=0;

    total += sum_byvalue     (A, N);
    total += sum_bymanualref (&A, N);
    total += sum_bytrueref   (A, N);

    printf("%d\n", total);             // would show 1650
}
---------------------------------------------

Find anything? I thought not.

Those findings might suggest that C doesn't need call-by-reference, not
for arrays anyway.

It is correct that C does not need pass by reference for arrays. It
does not need to be able to pass arrays at all. This is clearly
demonstrated by the fact that C does not have pass by reference, and
cannot pass arrays at all, and yet has been used happily (and sometimes unhappily) for half a century.

The question is not if C /needs/ any of this, but if the language could
be /improved/ by adding such features. And I think that we all know
that it would add very, very little of interest to the language unless
arrays were changed to act like other object types. And that would be a
major and massively incompatible change to the language, so it's not
going to happen.

That doesn't mean that supporting pass by reference and/or supporting
arrays as first-class objects is not something that people might want in language X. Nor does it mean that C programmers prefer C's way of doing things. It is simply how C is, and programmers can work fine with C the
way it is.

Except that at present you can do this:

    T x=42;
    sum_byvalue(&x, N);

which would not be possible with call-by-reference. Nor with
sum_bymanualref, but apparently nobody wants to be doing with all that
extra, fiddly syntax. Better to be unsafe!

No, better to be safe.

But it doesn't make sense to make massive incompatible changes to a
language to reduce the risk of one minor kind of error. I really don't
see that there is any more likelihood of passing the int "x" when you
meant to pass the int array "xs", than there is of passing "ys" when you
meant to pass "xs".

I'm all in favour of safety. My preference, for more serious code, is
C++ rather than C. It doesn't just have a couple of extra bits that
make some things look marginally safer, such as you are suggesting here,
but has enough to make many things /actually/ safer. So I'd use a
std::array<> type that is a /real/ array object type, references that
are /real/ pass by reference that includes the full type, and a syntax
that makes it hard to get the details wrong:

template <size_t N>
constexpr auto sum_by_array(const std::array<T, N> &A) {
auto sum = 0;
for (auto x : A) {
sum += x;
}
return sum;
}

I am not suggesting that C++ is for everyone - people make their choices
of language for all kinds of reasons. And std::array<> is not the
answer for all C++ code either. But for /me/, when I have fixed size
arrays (and that is the norm for my code), I prefer std::array<> because
it is safer and clearer. (And as a bonus, the generated code is usually
more efficient.)


From David Brown@21:1/5 to Bart on Wed Jul 17 19:31:15 2024

On 17/07/2024 15:42, Bart wrote:

On 13/07/2024 10:37, David Brown wrote:

If you say stupid things, repeatedly,

Start applying a bit of your intelligence (you say stupid things
sometimes, but I know you are far from stupid), and you'll find the
level of conversation going up.

I made the tweak to see how hard it would be to detect value-arrays
declared in parameter list (it was very easy), and what the
consequences would be on existing code (significant).

No, the consequences are non-existent because no one uses your tool,
and no one will ever copy that change in other tools (of significance).

You are spectacularly missing the point. IT DOESN'T MATTER WHOSE TOOL IT IS.

Of course it matters. /You/ can make changes to /your/ tool with a
total disregard for the consequences. You don't need to care about the
effects it might have on the rest of the tool, because it's a small and self-contained tool. You don't need to care about how it affects code
other people have written over the decades - you are only interested in
the code /you/ wrote (if you even care about that). You don't need to
care about how it will affect code other people will write in the
future. You don't need to care about how it might affect possible
language changes in the future.

Basically, you can tweak your tool because you don't need to think
further than your own nose. And that's great for experimenting and
playing around, but things get very different in the real world.

Somebody could have done the same exercise with gcc, and come to the
same conclusion: too many programs use array parameters.

Did it never occur to you that most people use array parameters
/correctly/ ? Some people don't like using them (I'm not a big fan
myself), but other people like them.

The example I posted showed a type (const char* x[]) where there was
no advantage to having that value array notation. Using 'const
char**' would be a more accurate description of the actual parameter
type.

You can write your code the way you want to write it - it will not
change the way anyone else writes their code. It really is that
simple. Why is this so difficult for you to understand?

Do you really suppose that if /you/ make "foo(char x[])" a syntax
error in /your/ compiler, it will have the slightest effect on how
other people write their C code?

What WOULD be the effect if a compiler did that? How would a particular codebase be affected?

Code that uses that syntax would no longer work. It's quite simple, really.

You can just modify a compiler and try it, which is what I did. What difference does it make which compiler it is? You just have a blind, irrational hatred for anything I do.

No, I just have a more realistic idea of the influence you have, and of
what is involved in changing real tools used by other people, or
changing real languages used by other people.

You seem to imagine that you can pick something that you personally
don't like in C, and solve the world's programming problems by making a
change to your compiler. I am pointing out that this is far from
reality - even in cases where I might agree that the change would be an improvement to the C language.

Another way to do it is for someone to painstakingly go through every
line of a codebase by hand, expanding macros and typedefs as needed, and checking whether any parameters were declared with top-level array types.

I think if you were given that job to do, then applying my toy compiler wouldn't be so bad after all!

In what universe would it be a good idea to change compilers to stop
them compiling existing correct working code?

Or on what other C compilers do? Or on how half a century of existing
C code is written?

Personally, I don't like that C allows something that /looks/ like
passing arrays to functions, but doesn't work that way.

gcc could conceivably have an option that detects and warns about that.

It has warnings on the risky parts - such as applying "sizeof" to
parameters declared to look like arrays. That's all you need, really.
(I'd prefer if that kind of warning were enabled by default.)
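
For anyone unsure what the risky part looks like, here is a sketch. The names are illustrative, and the first function is deliberately wrong; gcc and clang typically warn about it.

```c
#include <stddef.h>

/* Risky: 'a' is really an int *, so this computes
   sizeof(int *) / sizeof(int), not 20. */
size_t count_wrong(int a[20])
{
    return sizeof(a) / sizeof(a[0]);
}

/* Usual alternative: take the count where the array type is still
   visible, and pass it alongside the pointer. */
#define ARRAY_COUNT(arr) (sizeof(arr) / sizeof((arr)[0]))

long sum_ints(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) sum += a[i];
    return sum;
}
```

A call such as sum_ints(x, ARRAY_COUNT(x)) works because ARRAY_COUNT is expanded in the caller, where x is still an array.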

Whoever is thinking about doing that might well do a test exactly like
mine.

No, they would be better off allowing established C syntax to work
correctly while offering warnings where possible about likely code
errors. That's what real tools do.

I don't think I have ever written a function with an array-like
parameter - I use a pointer parameter if I mean a pointer, or have the
array wrapped in a struct if I want to pass an array by value.

So all /your/ code would still pass; great!

But that's just /my/ code. I am not in favour of changes that break
other people's code.
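
The struct-wrapping idiom mentioned above, as a minimal sketch. The type name int4 and the functions are hypothetical.

```c
struct int4 { int v[4]; };   /* hypothetical wrapper around a fixed-size array */

/* Receives a copy of all four ints; the caller's array is untouched. */
int sum_and_clear(struct int4 a)
{
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        sum += a.v[i];
        a.v[i] = 0;          /* modifies the local copy only */
    }
    return sum;
}

/* An array wrapped this way can also be returned by value. */
struct int4 doubled(struct int4 a)
{
    for (int i = 0; i < 4; i++) a.v[i] *= 2;
    return a;
}
```

Unlike an array-like parameter, the element count here is part of the type, so it cannot be lost in the call.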

But I don't think my opinions make a difference to C, and even if I
were elected dictator of C for a day, I don't think my opinions should
count more than any one else's - including those that like the way C
works here.

Half of the programming language you call "C" is defined by the way you invoke your compiler. So that already allows for myriad, slightly
different dialects.

That's just nonsense - and you know it.

Flags, such as warnings, can lead to subsetting the language, not
redefining it.

Somebody had to think up all those options, and they would have been influenced by people's opinions. This is just one more, which I would
happily have as a default.

If someone thought that it would be a good idea to warn about using
array syntax in function parameter declarations, then it could certainly
be added to tools as a /warning/. That's totally different from
disallowing it in the language.

File a "feature request" bug with gcc or clang asking for this warning,
and you'll soon get comments about how likely it is that people will see
it as a good idea.

And I don't confuse my opinions or preferences with how C actually
works and how it is actually defined, and I certainly don't spread
such confusions and misunderstandings to others.

C does that well enough by itself. There are any number of behaviours
where: (1) saying nothing and passing; (2) warning and passing; (3)
reporting an error and failing are all perfectly valid.

You just choose which one you want, at least if using an equivocal
compiler like gcc.

This is like taking an examination and being able to choose how strictly
it should be marked! Yeah, I think I'll go for a pass today...

You really don't understand how people use compilers, do you? You know
how /you/ use /your/ language and /your/ tools, but the idea of there
being other programmers in the world, with different needs and
preferences, seems beyond you sometimes.


From David Brown@21:1/5 to BGB on Thu Jul 18 09:46:01 2024

On 17/07/2024 19:53, BGB wrote:

On 7/17/2024 6:38 AM, Bart wrote:

On 13/07/2024 10:39, BGB wrote:

But, as I see it, no real point in arguing this stuff (personally, I
have better stuff to be doing...).

We all do. But this group seems to be about arguing about pointless
stuff and you might come here when you want a respite from proper work.

However (here I assume you've gone back to Quake but that other
interested parties might be reading this), consider the program below.

I got back to debugging...

To be clear - you are talking about debugging your compiler here, yes?

Ironically, one of the big bugs I ended up finding was related to
internal struct handling "leaking through" and negatively affecting stuff.

say:
typedef struct foo_s foo_t; // don't care what it contains for now.

foo_t arr[...];
foo_t temp;
int i, j;
...
temp=arr[i];
arr[i]=arr[j];
arr[j]=temp;

Internally, it would load a reference to arr[i] into temp, but then this location would get overwritten before the third assignment happened,
causing the incorrect contents to be copied to arr[j].

For now, have ended up changing stuff such that any struct-assignment
(for structs in the "by-ref" category) to a local variable will instead
copy the contents to the memory location associated with that struct.

How could it possibly mean anything else? Structs in C are objects - contiguous blocks of bytes interpreted by a type. Assigning them will
mean copying those bytes. Pretending the language sometimes means
structs and sometimes means magical auto-dereferencing pointers to
structs is simply wrong.

If "foo_t" is 2000 bytes long, then "foo_t temp" makes a 2000 byte space
in your local variables (the stack, on virtually every platform) and
"temp = arr[i];" does a 2000 byte memcpy(). The same thing applies if
"foo_t" is 2 bytes long, or 2 megabytes long. And if there is a stack
overflow making "temp", that's the programmer's problem.
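
As a sketch of the direct translation, with made-up contents for foo_t since the original leaves them unspecified:

```c
#include <string.h>

/* Illustrative contents only; the real foo_t could be anything. */
typedef struct foo_s { int id; char payload[64]; } foo_t;

/* The three assignments as written: each one copies a whole object,
   and temp is real storage for a full foo_t, not an alias. */
void swap_by_assignment(foo_t *arr, int i, int j)
{
    foo_t temp;
    temp   = arr[i];
    arr[i] = arr[j];
    arr[j] = temp;
}

/* The same observable behaviour spelled out with memcpy. */
void swap_by_memcpy(foo_t *arr, int i, int j)
{
    foo_t temp;
    if (i == j) return;      /* memcpy must not be given overlapping objects */
    memcpy(&temp,   &arr[i], sizeof temp);
    memcpy(&arr[i], &arr[j], sizeof temp);
    memcpy(&arr[j], &temp,   sizeof temp);
}
```

Any implementation strategy is acceptable as long as it behaves like one of these two functions.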

It is only once that is all working properly and reliably that you can
begin to think about optimisation - code that gives the same observable behaviour "as if" you had direct translation, but is more efficient.
Maybe your target architecture has a way to make a "memswap()" function
that is more efficient than two "memcpy()" calls, and you can use that.

And it is only when the direct translation is working properly that you
can start to think of improving user convenience. Perhaps you could
allocate large temporary objects in non-stack memory somewhere to avoid
stack overflows. I don't think that is a good idea, but it's a
possibility. Giving compiler warnings about large stack objects is a
much better solution IMHO.


From David Brown@21:1/5 to BGB on Thu Jul 18 14:41:40 2024

On 18/07/2024 12:05, BGB wrote:

On 7/18/2024 2:46 AM, David Brown wrote:

On 17/07/2024 19:53, BGB wrote:

On 7/17/2024 6:38 AM, Bart wrote:

On 13/07/2024 10:39, BGB wrote:

But, as I see it, no real point in arguing this stuff (personally,
I have better stuff to be doing...).

We all do. But this group seems to be about arguing about pointless
stuff and you might come here when you want a respite from proper work.
However (here I assume you've gone back to Quake but that other
interested parties might be reading this), consider the program below.

I got back to debugging...

To be clear - you are talking about debugging your compiler here, yes?

My compiler and my Quake 3 port, but most of the bugs in the Quake 3
port thus far were due to bugs either in my compiler or in the runtime libraries.

Ironically, one of the big bugs I ended up finding was related to
internal struct handling "leaking through" and negatively affecting
stuff.

say:
   typedef struct foo_s foo_t; // don't care what it contains for now.
   foo_t arr[...];
   foo_t temp;
   int i, j;
   ...
   temp=arr[i];
   arr[i]=arr[j];
   arr[j]=temp;

Internally, it would load a reference to arr[i] into temp, but then
this location would get overwritten before the third assignment
happened, causing the incorrect contents to be copied to arr[j].

For now, have ended up changing stuff such that any struct-assignment
(for structs in the "by-ref" category) to a local variable will
instead copy the contents to the memory location associated with that
struct.

How could it possibly mean anything else? Structs in C are objects -
contiguous blocks of bytes interpreted by a type. Assigning them will
mean copying those bytes. Pretending the language sometimes means
structs and sometimes means magical auto-dereferencing pointers to
structs is simply wrong.

The "magical auto dereferencing pointers" interpretation gives better performance, when it works. In this case, it didn't work...

It's very easy to make high performance code if correctness doesn't
matter! Obviously, correctness is more important.

Sadly, there is no good way at the moment to know whether or not it will work, for now forcing the slower and more conservative option.

I would think the simple test is that for data that is never changed (or
not changed within the function), you can use a constant reference -
otherwise you cannot. It is not by coincidence that in C++, it is
common to use pass by /const/ reference as an efficient alternative to
pass by value for big objects.
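
The nearest C equivalent is a pointer to const, chosen by the programmer. A sketch with hypothetical names:

```c
struct big { unsigned xs[10]; };

/* By value: the callee owns a copy and is free to scribble on it. */
unsigned sum_by_value(struct big b)
{
    unsigned sum = 0;
    for (int i = 0; i < 10; i++) {
        sum += b.xs[i];
        b.xs[i] = 0;         /* affects the local copy only */
    }
    return sum;
}

/* By pointer to const: no copy is made, and the function only reads.
   The programmer, not the compiler, has decided this is safe. */
unsigned sum_by_const_pointer(const struct big *b)
{
    unsigned sum = 0;
    for (int i = 0; i < 10; i++) sum += b->xs[i];
    return sum;
}
```

Both give the same result here; the second simply makes the "never changed" decision explicit in the source.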

If "foo_t" is 2000 bytes long, then "foo_t temp" makes a 2000 byte
space in your local variables (the stack, on virtually every platform)
and "temp = arr[i];" does a 2000 byte memcpy(). The same thing
applies if "foo_t" is 2 bytes long, or 2 megabytes long. And if there
is a stack overflow making "temp", that's the programmer's problem.

For now:
1 - 16 bytes, goes in registers, except when accessing a member where it needs to be in-memory; unless it is a SIMD type which is special and
allows accessing members with the value still in registers.

17 bytes to 15.999K: Accessed by an implicit reference, uses hidden
copying to mimic by-value semantics (not quite foolproof as of yet it
seems).

16K and beyond, quietly turned into a heap allocation (with a compiler warning). Should otherwise look the same as the prior case.

The normal system is that local objects are data on the stack -
regardless of the size or type, scalar or aggregate. Parameter passing
is done by register for some types (for the first few parameters), or
the stack otherwise. Returns are in a register or two for some types,
or by a stack slot assigned by the caller. For struct parameters or
return values, it can be efficient for the caller to pass hidden
pointers, but that's not strictly necessary if you have a fixed stack
frame layout. (Struct parameters still get copied to the stack to
ensure value semantics - the hidden pointer points to the stack copy.)

Trying to have special cases for different sizes, or to eliminate extra
copies of structs by having pointers and hoping nothing gets changed, is
just extra complication with a high risk and low gains. If the
programmer knows the structs are big and it is better to put them on the
heap, or to pass around references, then let the /programmer/ do that.
We are talking about /C/ here - programmers are expected to take
responsibility for doing this stuff manually. It is not a high-level hand-holding automated language. If a programmer passes a struct by
value, it is because they know it is small enough to do so efficiently,
or because they know the caller might find it convenient to modify a
local copy and don't want the caller's copy affected. If they know it
is safe to pass a pointer, they will use a (possibly const) pointer to
the struct as the parameter.

You are trying to be too smart here, IMHO - the compiler's job is to let
the programmer be smart. It's always nice to have optimisations, but
not at the expense of correctness.

And it is only when the direct translation is working properly that
you can start to think of improving user convenience. Perhaps you
could allocate large temporary objects in non-stack memory somewhere
to avoid stack overflows. I don't think that is a good idea, but it's
a possibility. Giving compiler warnings about large stack objects is
a much better solution IMHO.

It both warns and also turns it into a heap allocation.

Because warning and the code still working, is better than warning and
the program most likely crashing due to a stack overflow (and in cases
with no memory protection, probably overwriting a bunch of other stuff
in the process).

A rarely used, unreliable feature with unexpected effects and costs is
not likely to be a good idea. People are happy to use compilers that
don't have this kind of dynamic memory allocation for big local
variables - but no one is going to be happy to use a compiler that
doesn't accurately support the C semantics for copying structs.


From David Brown@21:1/5 to Bart on Thu Jul 18 18:01:21 2024

On 18/07/2024 15:00, Bart wrote:

On 18/07/2024 13:41, David Brown wrote:

On 18/07/2024 12:05, BGB wrote:

The "magical auto dereferencing pointers" interpretation gives better
performance, when it works. In this case, it didn't work...

It's very easy to make high performance code if correctness doesn't
matter! Obviously, correctness is more important.

It is useful to explore ways of making some scenarios faster. Here there
is simply a bug in one of those ways. But you don't just give up
completely; you can try fixing the bug.

Sure. But correctness is more important than speed, especially in a
tool that other people rely on for correctness like a compiler. You
don't have optimisations unless you are completely sure that they are
always correct, even for the most inconvenient source code.

Of course you can have experimental optimisations when playing around or testing - perhaps enabled with flags that make it clear that they might
be flawed, or that they only work for a restricted subset of C code. I
don't mean you should not try out new stuff and then find and fix the
problems - of course you should! But if a tool is going to be useful,
priority one must be to make it give correct results, rather than give
fast results.

Sadly, there is no good way at the moment to know whether or not it
will work, for now forcing the slower and more conservative option.

I would think the simple test is that for data that is never changed
(or not changed within the function), you can use a constant reference
- otherwise you cannot. It is not by coincidence that in C++, it is
common to use pass by /const/ reference as an efficient alternative to
pass by value for big objects.

You said the other day that my C compiler was wrong to do that: to use efficient pass-by-pointer for structs marked as 'const' in the function signature; they always have to be copied no matter what.

No, it is fine to omit copies if you (the compiler) /know/ something
cannot change. And it is also good practice for programmers to use
"const" to mark things that should not be changed. But unfortunately
the compiler can't be sure that a const pointer (or const reference in
C++) cannot be cast to a non-const pointer. Using a const pointer or
reference makes it harder for a programmer to change the object
accidentally, but does not make it impossible for them to change it intentionally. And the compiler has to believe the worst case here, and
can't optimise on the assumption that the thing pointed to with a const
pointer remains unchanged by an external function. However, if it can
see the definition of the function and can see that it cannot change,
then it can use that information for optimisation - regardless of the
const or lack of const in the pointer.

(I'd prefer if "const" were a stronger promise in C and C++. But I
don't make the rules.)

The normal system is that local objects are data on the stack -
regardless of the size or type, scalar or aggregate. Parameter
passing is done by register for some types (for the first few
parameters), or the stack otherwise. Returns are in a register or two
for some types, or by a stack slot assigned by the caller. For struct
parameters or return values, it can be efficient for the caller to
pass hidden pointers, but that's not strictly necessary if you have a
fixed stack frame layout. (Struct parameters still get copied to the
stack to ensure value semantics - the hidden pointer points to the
stack copy.)

Trying to have special cases for different sizes,

That's exactly what common 64-bit ABIs do. In fact the SYS V ABI is so complicated that I can't understand its struct passing rules at all.

It is what every ABI I know of does - from 8-bit to 64-bit. They vary significantly in how much data is passed in registers, and when structs
are passed in registers or on the stack. (Older ABI's passed more on
the stack - newer ones use registers more. Passing small structs in
registers became a lot more important as C++ gained popularity.)

I agree that the SYS V x86_64 ABI is very complicated. So is the Win64
x86_64 ABI, with different kinds of complications.

(If I ever have to use that, I'd need to write test code for each
possible size of struct, up to 100 bytes or so (past the largest machine register), and see how an existing compliant compiler handles each case.)

Here the context appears to be a custom ISA, so anything is possible.

Sure.

You are trying to be too smart here, IMHO - the compiler's job is to
let the programmer be smart. It's always nice to have optimisations,
but not at the expense of correctness.

That's an odd remark from a devotee of gcc. Your usual attitude is to
let the programmer write code in the most natural manner, and let a
smart optimising compiler sort it out.

Touché :-)

Compilers can get smarter once you have them simple and correct. /Too/
smart - generating efficient but sometimes incorrect code - is not helpful.

And some decisions are always up to the programmer in C, regardless of
how good the compiler optimisation is. If the programmer makes a local variable, they expect it to be in a register, optimised away, or on the
stack. They don't expect the function to call malloc() and free() in
some hidden manner. And if they want to pass a struct by reference,
they use a pointer (that's all you've got in C) - if they pass the
struct by value, they expect value semantics.

The smaller stuff can be optimised freely by the compiler, but the big decisions are made by the programmer.

And note that these two features of BGB's compiler under discussion -
putting big locals on the heap and using hidden pointers for structs -
are /not/ optimisations. Putting locals on the heap is a pessimisation,
making code bigger and slower. And using references for passing structs
which must logically be passed by value is not an optimisation either,
because it changes the semantics of the language.


From David Brown@21:1/5 to Bart on Fri Aug 16 10:04:31 2024

On 15/08/2024 18:08, Bart wrote:

These were my original comments on the subject made to DB:

DB:

In C, there is no "pass by reference" or "return by reference". It is all done by value.

BC:

Arrays are passed by reference:

void F(int a[20]) {}

int main(void) {
    int x[20];
    F(x);
}

Although the type of 'a' inside 'F' will be int* rather than int(*)[20].

It was in reply to DB which appear to imply that arrays were passed by
value. Obviously they're not passed by value, so what then? (Please,
don't rerun the thread! This is where everyone jumped in.)

I am not sure if you want an answer here or not - you asked "so what
then", but also asked to avoid a re-run of the thread.

I can give a summary - and I also hope this doesn't lead to a re-run of
the discussion. However, since you are asking the same question as you
did at the start, and the language C has not changed in the meantime,
the factual and correct answers will inevitably be the same:

1. C has no "pass by reference" - it is all "pass by value".

2. In C, you cannot pass an array as a function parameter.

3. The automatic conversion of many array expressions to pointer
expressions, along with the similar conversions of function parameter
types, gives C users a syntax that is similar - but not identical to -
what you would have if the language supported passing arrays by reference.

4. Adding "pass by reference" and "arrays as first class objects" would
both be very significant changes to C - the language, the tools, and the
way C code is written. Doing so in a way that significantly improves on
the current situation (such as including element count information when
passing arrays) would require even more changes. Either array objects
would need to hold run-time count information (changing the data held, massively complicating "slices", and introducing run-time efficiency overheads), or the information would need to be part of the array's
type, requiring templates, overloaded functions, and other mechanisms.
Both are viable techniques for other languages, but not appropriate for C.

The C solution to handling arrays has its limitations, and has potential
to cause some people significant confusion. It is important to be
careful in your coding, and to make use of tools that help spot errors,
but that applies to all programming in all languages. The C solution
works well in practice for many situations, with minimal complications
and maximal run-time efficiency.


From David Brown@21:1/5 to Ben Bacarisse on Fri Aug 16 11:42:43 2024

On 16/08/2024 02:08, Ben Bacarisse wrote:

Bart <bc@freeuk.com> writes:

In general there is no reason, in a language with true call-by-reference,
why any parameter type T (which has the form U*, a pointer to anything),
cannot be passed by reference. It doesn't matter whether U is an array type
or not.

I can't unravel this. Take, as a concrete example, C++. You can't pass
a pointer to function that takes an array passed by reference. You can,
of course, pass a pointer by reference, but that is neither here nor
there.

In C++, you can't pass arrays as parameters at all - the language
inherited C's handling of arrays. You can, of course, pass objects of std::array<> type by value or by reference, just like any other class types.


From David Brown@21:1/5 to Ben Bacarisse on Fri Aug 16 16:31:33 2024

On 16/08/2024 12:00, Ben Bacarisse wrote:

David Brown <david.brown@hesbynett.no> writes:

On 16/08/2024 02:08, Ben Bacarisse wrote:

Bart <bc@freeuk.com> writes:

In general there is no reason, in a language with true call-by-reference,
why any parameter type T (which has the form U*, a pointer to anything),
cannot be passed by reference. It doesn't matter whether U is an array type
or not.

I can't unravel this. Take, as a concrete example, C++. You can't pass
a pointer to function that takes an array passed by reference. You can,
of course, pass a pointer by reference, but that is neither here nor
there.

In C++, you can't pass arrays as parameters at all - the language inherited
C's handling of arrays. You can, of course, pass objects of std::array<>
type by value or by reference, just like any other class types.

The best way to think about C++ (in my very non-expert opinion) is to consider references as values that are passed by, err..., value. But
you seem prepared to accept that some things can be "passed by reference"
in C++.

That seems a subtle distinction - I'll have to think about it a little.
I like your description of arguments being like local variable
initialisation - it makes sense equally well regardless of whether the parameter is "int", "int*", or "int&". (It's probably best not to
mention the other one in this group...)

So if this:

#include <iostream>

void g(int &i) { std::cout << i << "\n"; }

int main(void)
{
    int I{0};
    g(I);
}

shows an int object, I, being passed to g, why does this

#include <iostream>

void f(int (&ar)[10]) { std::cout << sizeof ar << "\n"; }

int main(void)
{
    int A[10];
    f(A);
}

not show an array, A, being passed to f?

That's backwards compatibility with C array handling at play. I
personally would use std::array<int, 10> here, and pass that by
reference (or pass a reference to it, if that is a more accurate
description).

C++ does suffer somewhat from being "C with classes added as an
afterthought", rather than being designed as a new language from the start.

As I said, I don't think it's wise to look at it this way, but I am just borrowing your use of terms to try to tease out what you are getting at.


From David Brown@21:1/5 to Bart on Fri Aug 16 16:51:02 2024

On 16/08/2024 13:45, Bart wrote:

On 16/08/2024 09:04, David Brown wrote:

On 15/08/2024 18:08, Bart wrote:

These were my original comments on the subject made to DB:

DB:

In C, there is no "pass by reference" or "return by reference". It is all done by value.

BC:

Arrays are passed by reference:

void F(int a[20]) {}

int main(void) {
    int x[20];
    F(x);
}

Although the type of 'a' inside 'F' will be int* rather than int(*)[20].

It was in reply to DB which appear to imply that arrays were passed
by value. Obviously they're not passed by value, so what then?
(Please, don't rerun the thread! This is where everyone jumped in.)

I am not sure if you want an answer here or not - you asked "so what
then", but also asked to avoid a re-run of the thread.

I can give a summary - and I also hope this doesn't lead to a re-run
of the discussion. However, since you are asking the same question as
you did at the start, and the language C has not changed in the
meantime, the factual and correct answers will inevitably be the same:

1. C has no "pass by reference" - it is all "pass by value".

2. In C, you cannot pass an array as a function parameter.

3. The automatic conversion of many array expressions to pointer
expressions, along with the similar conversions of function parameter
types, gives C users a syntax that is similar - but not identical to -
what you would have if the language supported passing arrays by
reference.

So, you agree that it is similar to.

Yes. That has never been in doubt - I've agreed to it all along, as has everyone else. But "similar to" does not mean "the same as".

And not just the resulting syntax,
but the semantics and even the generated code can be the same (as I demonstrated but somebody complained).

They /can/ be the same - and they can be different. If C could pass
arrays as parameters (it can't), and if it had pass by reference (it
doesn't), then "sizeof" would give the size of the array passed, not the
size of a pointer. The type of the parameter would be an array type,
not a pointer. So while the semantics and expected generated code will
be the same for some functions, it will be different for other
functions. Hence, "similar to", and not "the same as".

Would you agree that they are effectively passed by-reference for all practical purposes?

No.

I would agree that they let you write code in the same (or very similar)
manner and with the same effect, for /some/ practical code. But they
are most certainly not the same for /all/ practical purposes.

int sum(int A[]) {
    int s = 0;
    for (int i = 0; i < sizeof(A) / sizeof(int); i++) {
        s += A[i];
    }
    return s;
}

That would seem an obvious way to write an array sum function, if C
could pass arrays by reference. And if such code were supported by C
and did what a naïve programmer thought, it would certainly be of
practical use. But C does not support passing arrays, and it does not
support passing by reference - the conversion of the parameter to a
pointer to the first element means such code has a very significant
difference.

But of course for /some/ practical code, the results are the same.

All the other differences in detail are mostly due to the weird way that
C handles arrays anyway.

Well, yes. (Though the "details" are important.) But if C did not
handle arrays in that way, there'd be other issues. You might be able
to pass arrays by value, but you'd still not be able to pass them by
reference. And now a function that takes an array of 10 int's is
completely different from a function that takes an array of 20 int's.

4. Adding "pass by reference" and "arrays as first class objects"
would both be very significant changes to C

Adding pass-by-reference would not be a huge change. I added that using
a cheap and cheerful approach that seems to work well enough (a parameter
marked as by-ref, would have '&' automatically applied on arguments, and
'*' automatically applied to parameter accesses in the callee**).

Cheap and cheerful approaches are fine for cheap and cheerful languages.
You don't have to think about the implications of such changes, or
corner cases, or what other people think, or how it could affect
existing code, or how to document or teach it.

But what would complicate it in C is how it interacts with how arrays currently work. For example, passing array A already passes '&A[0]'; it
can't really pass '&&A[0]' if it's marked as being by-reference!

(** There were some side-effects: while you can pass a char or short to
an int parameter for example and it will promote it, if the int is by-reference, you can only pass an exact int type. And also, I wasn't
able to apply default values to optional by-reference parameters.)


From David Brown@21:1/5 to Tim Rentsch on Mon Aug 19 09:26:46 2024

On 19/08/2024 03:03, Tim Rentsch wrote:

Ben Bacarisse <ben@bsb.me.uk> writes:

David Brown <david.brown@hesbynett.no> writes:

On 16/08/2024 12:00, Ben Bacarisse wrote:

David Brown <david.brown@hesbynett.no> writes:

On 16/08/2024 02:08, Ben Bacarisse wrote:

Bart <bc@freeuk.com> writes:

In general there is no reason, in a language with true
call-by-reference, why any parameter type T (which has the form
U*, a pointer to anything), cannot be passed by reference. It
doesn't matter whether U is an array type or not.

I can't unravel this. Take, as a concrete example, C++. You
can't pass a pointer to function that takes an array passed by
reference. You can, of course, pass a pointer by reference, but
that is neither here nor there.

In C++, you can't pass arrays as parameters at all - the language
inherited C's handling of arrays. You can, of course, pass
objects of std::array<> type by value or by reference, just like
any other class types.

The best way to think about C++ (in my very non-expert opinion) is
to consider references as values that are passed by, err...,
value. But you seem prepared to accept that some things can be
"passed by reference" in C++.

That seems a subtle distinction - I'll have to think about it a
little. I like your description of arguments being like local
variable initialisation - it makes sense equally well regardless of
whether the parameter is "int", "int*", or "int&". (It's probably
best not to mention the other one in this group...)

So if this:

#include <iostream>

void g(int &i) { std::cout << i << "\n"; }

int main(void)
{
    int I{0};
    g(I);
}

shows an int object, I, being passed to g, why does this

#include <iostream>

void f(int (&ar)[10]) { std::cout << sizeof ar << "\n"; }

int main(void)
{
    int A[10];
    f(A);
}

not show an array, A, being passed to f?

That's backwards compatibility with C array handling at play.

I'm not sure how this answers my question. Maybe you weren't
answering it and were just making a remark...

My guess is he didn't understand the question. The code shown
has nothing to do with backwards compatibility with C array
handling.

I had intended to make a brief remark and thought that was all that was
needed to answer the question. But having thought about it a bit more (prompted by these last two posts), and tested the code (on the
assumption that the gcc writers know the details better than I do), you
are correct - I did misunderstand the question. I was wrong in how I
thought array reference parameters worked in C++, and the way Ben worded
the question reinforced that misunderstanding.

I interpreted his question as saying that the code "f" does not show an
array type being passed by reference, with the implication that the
"sizeof" showed the size of a pointer, not the size of an array of 10
ints, and asking why C++ was defined that way. The answer, as I saw it,
was that C++ made reference parameters to arrays work much like pointer parameters to arrays, and those work like in C for backwards compatibility.

Of course, it turns out I was completely wrong about how array type
reference parameters work in C++. It's not something I have had use for
in my own C++ programming or something I have come across in other code
that I can remember, and I had made incorrect assumptions about it. Now
that I have corrected that, it all makes a lot more sense.

And so I presume Ben was actually asking why I /thought/ this was not
passing an array type (thus with its full type information, including
its size). The answer there is now obvious - I thought that because I
had jumped to incorrect conclusions about array reference parameters in C++.

So thank you (Ben and Tim) for pushing me to correct my C++
misunderstanding here, and apologies to anyone confused by my mistake.

