• Re: technology discussion → does the world need

    From David Brown@21:1/5 to BGB on Sat Jul 6 19:01:54 2024
    On 06/07/2024 05:30, BGB wrote:

    Ironically, I probably should have leaned harder into the "daisy chained types" interpretation, rather than treating it as an undesirable implementation tradeoff.


    I once had to use a very limited C compiler that supported arrays, and
    structs, but not arrays of structs or structs containing arrays. It was
    truly a PITA. A language that has such inconvenient limitations would
    be unusable for me - I'd rather go back to using BASIC on a ZX Spectrum.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to BGB on Tue Jul 9 16:37:31 2024
    On 06/07/2024 21:33, BGB wrote:

    In my compiler (BGBCC), such an internal pointer exists for arrays and structures in the local stack frame.

    No separate pointer exists inside of things like structs, where, as can
    be noted, the array exists at a fixed size and location.


    So, eg:
      void Foo()
      {
         int a[100];
         ...
      }

    There is both the space for 100 integers reserved in the stack frame,
    and a variable 'a' which exists as an implicit pointer to that location.


    But, say:
      void Foo()
      {
         int a[8192];
         ...
      }

    There is no space reserved on the stack, and the array is instead
    allocated dynamically (from the heap). In this case, the "a" variable
    exists as a pointer to that location in memory.

    Similar treatment also applies to structs.



    The C standard does not require a stack or say how local data is
    implemented, it just gives rules for the scope and lifetime of locals.
    However, I would be surprised and shocked to find a compiler I was using allocate local data on the heap in some manner. If I have an array as
    local data, it is with the expectation that it is allocated and freed
    trivially (an add or subtract to the stack pointer, typically combined
    with any other stack frame). If I want something on the heap, I will
    use malloc and put it on the heap.

    Such an implementation as yours is not, I think, against the C standards
    - but IMHO it is very much against C philosophy.

  • From David Brown@21:1/5 to BGB on Tue Jul 9 16:31:30 2024
    On 08/07/2024 19:39, BGB wrote:
    On 7/7/2024 11:28 PM, James Kuyper wrote:
    On 7/7/24 20:02, Kaz Kylheku wrote:
    ...


    I see no point in having implicit pointers, but I don't believe that
    they are prohibited.


    They mostly exist in a "sort of simpler to implement the compiler this
    way" sense.

    In the implicit pointer case, the compiler just treats it as-if it were
    an explicit pointer. In this case, both are basically treated as being roughly equivalent at the IR levels.

    And, most of the code-generation stage doesn't need separate handling
    for arrays and pointers, but can use combined "ArrayOrPointer" handling
    or similar.

    It had all seemed "obvious enough".




    Similar reasoning for passing structs by-reference in the ABI:
      Pass by reference is easy to implement;
      In place copying and decomposing into registers, kinda bad.

    Though, this one seems to be a common point of divergence between "SysV"
    and "Microsoft" ABIs. Sometimes a target will have an ABI defined, and
    the MS version was almost the same, just typically differing in that it passes structs by reference and provides a spill space for register arguments.


    I don't think it is helpful that you keep mixing /logical/ terms with /implementation/ terms.

    In C, there is no "pass by reference" or "return by reference". It is
    all done by value. Even when you use pointer arguments or return types,
    you are passing or returning pointer values. C programmers use pointers
    to get the effect of passing by reference, but in C you use pointers to
    be explicit about references.

    Structs in C are passed by value, and returned by value. Not by reference.

    The C standards don't say how passing structs around by value is to be implemented - that is hidden from the programmer. Usually ABI's (which
    are also hidden from the programmer) specify the implementation details,
    but some ABI's are weak in that area. Generally, structs up to a
    certain size or complexity are passed in registers while bigger or more advanced types are passed via addresses (pointing to stack areas) in
    registers or the stack, just like any other bigger types. This is not
    "pass by reference" as far as the C programming is concerned - but you
    could well call it that at the assembly level. Where the line between
    "passing in registers" and "passing via addresses to space on the stack"
    is drawn, is entirely up to the compiler implementation and any ABI requirements. Some simpler compilers will pass all structs via
    addresses, no matter how simple they are, while others will aim to use registers whenever possible.


    So if you have these structs and declarations :

    struct small { uint16_t a; uint16_t b; };
    struct big { uint32_t xs[10]; };

    struct small foos(struct small y);
    struct big foob(struct big y);

    Then compilers will typically implement "x = foos(y)" as though it were:

    extern uint32_t foos(uint32_t ab);
    uint32_t _1 = foos(((uint32_t) y.a << 16) | y.b);
    struct small x = { _1 >> 16, _1 & 0xffff };

    And they will typically implement "x = foob(y)" as though it were:

    extern void foob(struct big * ret, const struct big * xs);
    struct big x;
    foob(&x, &y);


    This is not, as you wrote somewhere, something peculiar to MSVC - it is
    the technique used by virtually every C compiler, except perhaps for
    outdated brain-dead 8-bit microcontrollers that have difficulty handling
    data on a stack.

    And it is not really "pass by reference" or "implicit pointers", it is
    just passing addresses around behind the scenes in the implementation.

  • From David Brown@21:1/5 to bart on Tue Jul 9 18:31:43 2024
    On 09/07/2024 16:54, bart wrote:
    On 09/07/2024 15:31, David Brown wrote:
    On 08/07/2024 19:39, BGB wrote:

    Though, this one seems to be a common point of divergence between
    "SysV" and "Microsoft" ABIs. Sometimes a target will have an ABI
    defined, and the MS version was almost the same, just typically
    differing in that it passes structs by reference and provides a spill
    space for register arguments.


    I don't think it is helpful that you keep mixing /logical/ terms with
    /implementation/ terms.

    In C, there is no "pass by reference" or "return by reference".  It is
    all done by value.

    Arrays are passed by reference:

      void F(int a[20]) {}

      int main(void) {
        int x[20];
        F(x);
      }

    Although the type of 'a' inside 'F' will be int* rather than int(*)[20].

    Arrays are not passed by reference in C. When you use the array in most expression contexts, including as an argument to a function call, the
    array expression is converted to a pointer to its first element, and
    that pointer is passed by value.

    That's why the array type information is lost in the call, and you get a pointer in the function - /not/ a reference to an array.


    So if you have these structs and declarations :

    struct small { uint16_t a; uint16_t b; };
    struct big { uint32_t xs[10]; };

    struct small foos(struct small y);
    struct big foob(struct big y);

    Then compilers will typically implement "x = foos(y)" as though it were:

         extern uint32_t foos(uint32_t ab);
         uint32_t _1 = foos(((uint32_t) y.a << 16) | y.b);
         struct small x = { _1 >> 16, _1 & 0xffff };

    And they will typically implement "x = foob(y)" as though it were:

         extern void foob(struct big * ret, const struct big * xs);
         struct big x;
         foob(&x, &y);

    From what I've seen, structs that are not small enough to be passed in registers, are copied to a temporary, and the address of that temporary
    is passed.

    That will depend on the details of the compiler, optimisation, and what
    happens to the struct after the call. But yes, that is certainly
    something that is done.


    This seems to be the case even when the struct param is marked 'const'.

    "const" in C is not strong enough to guarantee that things will not be
    changed in this context. If you have a pointer to non-const data,
    convert it to a pointer to const, and then later convert it back to a
    pointer to non-const, you can use that to change the data. Thus using a pointer to const does not let the compiler be sure that the data cannot
    be changed by the function - so if the struct/array is used again later,
    and its value must be preserved, the compiler needs to make a copy.

    (I'd like it if there were a way to have such guarantees, but C is what
    C is.)


    (My compiler won't create a copy when the parameter is 'const'. I
    assumed that was how gcc did it; I was wrong.)

    Your compiler is wrong. But if you only give it code where the const
    pointer is never converted to a non-const pointer, you'll be safe.


    This is for Win64 ABI, however an ABI will only say they are passed by reference; it will not stipulate making a copy. That is up to the
    language implementation.


  • From David Brown@21:1/5 to BGB on Wed Jul 10 09:41:04 2024
    On 09/07/2024 23:43, BGB wrote:
    On 7/9/2024 3:22 PM, James Kuyper wrote:
    On 7/9/24 14:55, BGB wrote:
    ...
    The pass by reference, in this context, was referring to the ABI, not
    to C itself.

    It looks from C's POV as-if it were by-value.


    Which it is, depends on if one is looking at things at the language
    level, ABI level, or IR level, ...

    The C standard doesn't explicitly specify pass by value, or pass by
    reference, or any other passing mechanism. What it does say is what
    a programmer needs to know to use the passing mechanism. It says that
    the value of a function parameter that is seen by the code inside that
    function is a copy of the value passed as an argument to the function.
    The copy can be modified without changing the original. When a C
    function's declaration looks as though it takes an array as an argument,
    what that declaration actually means is that it takes a pointer value as
    an argument, and it is a copy of that pointer's value which is seen
    inside the function, and can be modified. The memory it points at is the
    same as the memory pointed at by the corresponding argument.




    We can probably agree that, in C:
      #include <stdio.h>

      typedef struct Foo_s Foo;
      struct Foo_s {
        int x, y, z, a, b, c;
      };

      int FooFunc(Foo obj)
      {
        obj.z = obj.x + obj.y;
        return(obj.z);
      }

      int main()
      {
        Foo obj;
        int z1;
        obj.x=3;
        obj.y=4;
        obj.z=0;
        z1=FooFunc(obj);
        printf("%d %d\n", obj.z, z1);
      }

    Should print "0 7" regardless of how the structure is passed in the ABI.


    ABI's are irrelevant to how the language is defined and how these
    expressions are evaluated. ABI's go along with details of the target -
    they can affect implementation-dependent behaviour but no more than
    that. (A clear example would be that alignment of fundamental types
    would normally be specified by an ABI.)

    So code that does not depend on implementation-dependent behaviour, such
    as your code here, will necessarily give the same results on all
    conforming C implementations.


    Though, one possibility being to relax the language such that both "0 7"
    and "7 7" are valid possibilities (the latter potentially allowing more performance by not needing to make a temporary local copy). Though,
    AFAIK, C doesn't really allow for this.

    It continues to astound me that people who claim to have written C
    compilers themselves - such as you and Bart - regularly show
    misunderstandings or ignorance about things that are clear in the C
    standards and which I would expect any experienced C programmer to know.

    All arguments to function calls in C are passed by value. This is in
    6.5.2.2p4 - it is a two sentence paragraph that you really ought to have
    read before even considering writing a C compiler.

    Some languages do have pass by reference (or other types of parameter
    passing systems), and would give "7 7". C is not such a language. (And
    I have never heard of a language that would allow either result.)


    An implementation could be clever though and only make local copies in
    cases where the structure is modified by the callee, as in the example
    above.


    A clever implementation would turn the whole of main() into a single
    puts("0 7") call.

    Compilers can generate whatever code they like, as long as the results
    are correct in the end.

  • From David Brown@21:1/5 to Bart on Wed Jul 17 19:07:17 2024
    On 17/07/2024 13:38, Bart wrote:
    On 13/07/2024 10:39, BGB wrote:

    But, as I see it, no real point in arguing this stuff (personally, I
    have better stuff to be doing...).

    We all do. But this group seems to be about arguing about pointless
    stuff and you might come here when you want a respite from proper work.

    I don't think that's the /only/ purpose of this group, but people like
    chatting about things that interest them even if they are not important.
    And something might seem pointless to one person and important to another.


    However (here I assume you've gone back to Quake but that other
    interested parties might be reading this), consider the program below.

    That sets up an array and then sums its elements by calling 3 different functions to do the job:

    (1) Using normal C pass-by-value

    Just to be clear - it is passing a pointer by value. The array is not
    passed in any way.


    (2) Using C pass-by-value to emulate call-by-reference

    This is identical to (1). Both are passing a pointer by value, and both
    are emulating a limited type of pass by reference (limited in that the
    length of the array is not part of the passed parameter, but must be
    given independently).


    (3) Using fantasy true call-by-reference as it might appear if C had the
        feature

    The fantasy matches C++ pretty closely - except you are making a
    reference to an unbound array of type T, not an actual array.


    (I'd hoped C++ would run this, but it didn't even like the middle
    function.)

    It is fine in C++20 or C++23, but not in C++17 or before. (If you want
    to know why, ask in comp.lang.c++, because I don't know the answer!)

    But even in C++, you are not really referencing the array here - you are referencing an unbound (unsized) array, and there is no information
    about the real size. It's somewhat like passing a pointer here (like
    function 1) by using a reference to the first element. But you can't
    (in C++) use "sizeof A" here, and unlike the other functions, you can't
    pass a null pointer.

    C++'s arrays are limited in functionality and features by compatibility
    with C, so you can't pass C-style arrays around in C++ because array expressions convert to pointer expressions in the same way as in C. If
    you want the C++ version of arrays, use std::array<T, n> for fixed-size
    arrays or std::vector<T> for variable sized arrays. These can be passed
    around into and out of functions, by value or reference, and have a
    selection of high-level methods and functions. But that's all C++
    rather than C.


    I'm asking people to compare the first and third functions and their
    calls, and to see if there's any appreciable difference between them.
    There will obviously be a difference in how the A parameter is declared.

    The generated code (for gcc -x c++ -std=c++20) is the same for both, so
    there is no difference there. Using the reference means you can't pass
    a null pointer, you can't pass a pointer to an individual int, and you
    can't use sizeof, so they are not entirely the same. And as I wrote
    above, "sum_bytrueref" is not actually passing the array at all.

    Both 1 and 3 give a reasonable emulation to passing the array by
    reference, but neither of them is actually doing so.


    ---------------------------------------------
    #include <stdio.h>

    typedef int T;

    int sum_byvalue(T* A, int n) {
        int i, sum=0;
        for (i=0; i<n; ++i) sum += A[i];
        return sum;
    }

    int sum_bymanualref(T(*A)[], int n) {
        int i, sum=0;
        for (i=0; i<n; ++i) sum += (*A)[i];
        return sum;
    }

    int sum_bytrueref(T (&A)[], int n) {
        int i, sum=0;
        for (i=0; i<n; ++i) sum += A[i];
        return sum;
    }

    int main(void) {
        enum {N = 10};
        T A[N] = {10,20,30,40,50,60,70,80,90,100};
        int total=0;

        total += sum_byvalue     (A, N);
        total += sum_bymanualref (&A, N);
        total += sum_bytrueref   (A, N);

        printf("%d\n", total);             // would show 1650
    }
    ---------------------------------------------

    Find anything? I thought not.

    Those findings might suggest that C doesn't need call-by-reference, not
    for arrays anyway.

    It is correct that C does not need pass by reference for arrays. It
    does not need to be able to pass arrays at all. This is clearly
    demonstrated by the fact that C does not have pass by reference, and
    cannot pass arrays at all, and yet has been used happily (and sometimes unhappily) for half a century.

    The question is not if C /needs/ any of this, but if the language could
    be /improved/ by adding such features. And I think that we all know
    that it would add very, very little of interest to the language unless
    arrays were changed to act like other object types. And that would be a
    major and massively incompatible change to the language, so it's not
    going to happen.

    That doesn't mean that supporting pass by reference and/or supporting
    arrays as first-class objects is not something that people might want in language X. Nor does it mean that C programmers prefer C's way of doing things. It is simply how C is, and programmers can work fine with C the
    way it is.


    Except that at present you can do this:

        T x=42;
        sum_byvalue(&x, N);

    which would not be possible with call-by-reference. Nor with
    sum_bymanualref, but apparently nobody wants to be dealing with all that
    extra, fiddly syntax. Better to be unsafe!

    No, better to be safe.

    But it doesn't make sense to make massive incompatible changes to a
    language to reduce the risk of one minor kind of error. I really don't
    see that there is any more likelihood of passing the int "x" when you
    meant to pass the int array "xs", than there is of passing "ys" when you
    meant to pass "xs".

    I'm all in favour of safety. My preference, for more serious code, is
    C++ rather than C. It doesn't just have a couple of extra bits that
    make some things look marginally safer, such as you are suggesting here,
    but has enough to make many things /actually/ safer. So I'd use a
    std::array<> type that is a /real/ array object type, references that
    are /real/ pass by reference that includes the full type, and a syntax
    that makes it hard to get the details wrong:

    #include <array>
    #include <cstddef>

    template <std::size_t N>
    constexpr auto sum_by_array(const std::array<T, N> &A) {
        auto sum = 0;
        for (auto x : A) {
            sum += x;
        }
        return sum;
    }

    I am not suggesting that C++ is for everyone - people make their choices
    of language for all kinds of reasons. And std::array<> is not the
    answer for all C++ code either. But for /me/, when I have fixed size
    arrays (and that is the norm for my code), I prefer std::array<> because
    it is safer and clearer. (And as a bonus, the generated code is usually
    more efficient.)

  • From David Brown@21:1/5 to Bart on Wed Jul 17 19:31:15 2024
    On 17/07/2024 15:42, Bart wrote:
    On 13/07/2024 10:37, David Brown wrote:

    If you say stupid things, repeatedly,

    Start applying a bit of your intelligence (you say stupid things
    sometimes, but I know you are far from stupid), and you'll find the
    level of conversation going up.

    I made the tweak to see how hard it would be to detect value-arrays
    declared in parameter list (it was very easy), and what the
    consequences would be on existing code (significant).

    No, the consequences are non-existent because no one uses your tool,
    and no one will ever copy that change in other tools (of significance).

    You are spectacularly missing the point. IT DOESN'T MATTER WHOSE TOOL IT IS.

    Of course it matters. /You/ can make changes to /your/ tool with a
    total disregard for the consequences. You don't need to care about the
    effects it might have on the rest of the tool, because it's a small and self-contained tool. You don't need to care about how it affects code
    other people have written over the decades - you are only interested in
    the code /you/ wrote (if you even care about that). You don't need to
    care about how it will affect code other people will write in the
    future. You don't need to care about how it might affect possible
    language changes in the future.

    Basically, you can tweak your tool because you don't need to think
    further than your own nose. And that's great for experimenting and
    playing around, but things get very different in the real world.

    Somebody could have done the same exercise with gcc, and come to the
    same conclusion: too many programs use array parameters.

    Did it never occur to you that most people use array parameters
    /correctly/ ? Some people don't like using them (I'm not a big fan
    myself), but other people like them.





    The example I posted showed a type (const char* x[]) where there was
    no advantage to having that value array notation. Using 'const
    char**' would be a more accurate description of the actual parameter
    type.


    You can write your code the way you want to write it - it will not
    change the way anyone else writes their code.  It really is that
    simple.   Why is this so difficult for you to understand?

    Do you really suppose that if /you/ make "foo(char x[])" a syntax
    error in /your/ compiler, it will have the slightest effect on how
    other people write their C code?

    What WOULD be the effect if a compiler did that? How would a particular codebase be affected?

    Code that uses that syntax would no longer work. It's quite simple, really.


    You can just modify a compiler and try it, which is what I did. What difference does it make which compiler it is? You just have a blind, irrational hatred for anything I do.


    No, I just have a more realistic idea of the influence you have, and of
    what is involved in changing real tools used by other people, or
    changing real languages used by other people.

    You seem to imagine that you can pick something that you personally
    don't like in C, and solve the world's programming problems by making a
    change to your compiler. I am pointing out that this is far from
    reality - even in cases where I might agree that the change would be an improvement to the C language.

    Another way to do it is for someone to painstakingly go through every
    line of a codebase by hand, expanding macros and typedefs as needed, and checking whether any parameters are declared with top-level array types.

    I think if you were given that job to do,  then applying my toy compiler wouldn't be so bad after all!

    In what universe would it be a good idea to change compilers to stop
    them compiling existing correct working code?


    Or on what other C compilers do?  Or on how half a century of existing
    C code is written?


    Personally, I don't like that C allows syntax that /looks/ as though
    arrays can be passed to functions, but doesn't work that way.

    gcc could conceivably have an option that detects and warns about that.

    It has warnings on the risky parts - such as applying "sizeof" to
    parameters declared to look like arrays. That's all you need, really.
    (I'd prefer if that kind of warning were enabled by default.)

    Whoever is thinking about doing that might well do a test exactly like
    mine.

    No, they would be better off allowing established C syntax to work
    correctly while offering warnings where possible about likely code
    errors. That's what real tools do.


      I don't think I have ever written a function with an array-like
    parameter - I use a pointer parameter if I mean a pointer, or have the
    array wrapped in a struct if I want to pass an array by value.

    So all /your/ code would still pass; great!

    But that's just /my/ code. I am not in favour of changes that break
    other people's code.


      But I don't think my opinions make a difference to C, and even if I
    were elected dictator of C for a day, I don't think my opinions should
    count more than any one else's - including those that like the way C
    works here

    Half of the programming language you call "C" is defined by the way you invoke your compiler. So that already allows for myriad, slightly
    different dialects.


    That's just nonsense - and you know it.

    Flags, such as warnings, can lead to subsetting the language, not
    redefining it.

    Somebody had to think up all those options, and they would have been influenced by people's opinions. This is just one more, which I would
    happily have as a default.


    If someone thought that it would be a good idea to warn about using
    array syntax in function parameter declarations, then it could certainly
    be added to tools as a /warning/. That's totally different from
    disallowing it in the language.

    File a "feature request" bug with gcc or clang asking for this warning,
    and you'll soon get comments about how likely it is that people will see
    it as a good idea.


    And I don't confuse my opinions or preferences with how C actually
    works and how it is actually defined, and I certainly don't spread
    such confusions and misunderstandings to others.

    C does that well enough by itself. There are any number of behaviours
    where: (1) saying nothing and passing; (2) warning and passing; (3)
    reporting an error and failing are all perfectly valid.

    You just choose which one you want, at least if using an equivocal
    compiler like gcc.

    This is like taking an examination and being able to choose how strictly
    it should be marked! Yeah, I think I'll get a pass today...


    You really don't understand how people use compilers, do you? You know
    how /you/ use /your/ language and /your/ tools, but the idea of there
    being other programmers in the world, with different needs and
    preferences, seems beyond you sometimes.

  • From David Brown@21:1/5 to BGB on Thu Jul 18 09:46:01 2024
    On 17/07/2024 19:53, BGB wrote:
    On 7/17/2024 6:38 AM, Bart wrote:
    On 13/07/2024 10:39, BGB wrote:

    But, as I see it, no real point in arguing this stuff (personally, I
    have better stuff to be doing...).

    We all do. But this group seems to be about arguing about pointless
    stuff and you might come here when you want a respite from proper work.

    However (here I assume you've gone back to Quake but that other
    interested parties might be reading this), consider the program below.


    I got back to debugging...

    To be clear - you are talking about debugging your compiler here, yes?


    Ironically, one of the big bugs I ended up finding was related to
    internal struct handling "leaking through" and negatively affecting stuff.

    say:
      typedef struct foo_s foo_t;  // don't care what it contains for now.

      foo_t arr[...];
      foo_t temp;
      int i, j;
      ...
      temp=arr[i];
      arr[i]=arr[j];
      arr[j]=temp;

    Internally, it would load a reference to arr[i] into temp, but then this location would get overwritten before the third assignment happened,
    causing the incorrect contents to be copied to arr[j].

    For now, have ended up changing stuff such that any struct-assignment
    (for structs in the "by-ref" category) to a local variable will instead
    copy the contents to the memory location associated with that struct.

    How could it possibly mean anything else? Structs in C are objects - contiguous blocks of bytes interpreted by a type. Assigning them will
    mean copying those bytes. Pretending the language sometimes means
    structs and sometimes means magical auto-dereferencing pointers to
    structs is simply wrong.

    If "foo_t" is 2000 bytes long, then "foo_t temp" makes a 2000 byte space
    in your local variables (the stack, on virtually every platform) and
    "temp = arr[i];" does a 2000 byte memcpy(). The same thing applies if
    "foo_t" is 2 bytes long, or 2 megabytes long. And if there is a stack
    overflow making "temp", that's the programmer's problem.

    It is only once that is all working properly and reliably that you can
    begin to think about optimisation - code that gives the same observable behaviour "as if" you had direct translation, but is more efficient.
    Maybe your target architecture has a way to make a "memswap()" function
    that is more efficient than two "memcpy()" calls, and you can use that.

    And it is only when the direct translation is working properly that you
    can start to think of improving user convenience. Perhaps you could
    allocate large temporary objects in non-stack memory somewhere to avoid
    stack overflows. I don't think that is a good idea, but it's a
    possibility. Giving compiler warnings about large stack objects is a
    much better solution IMHO.

  • From David Brown@21:1/5 to BGB on Thu Jul 18 14:41:40 2024
    On 18/07/2024 12:05, BGB wrote:
    On 7/18/2024 2:46 AM, David Brown wrote:
    On 17/07/2024 19:53, BGB wrote:
    On 7/17/2024 6:38 AM, Bart wrote:
    On 13/07/2024 10:39, BGB wrote:

    But, as I see it, no real point in arguing this stuff (personally,
    I have better stuff to be doing...).

    We all do. But this group seems to be about arguing about pointless
    stuff and you might come here when you want a respite from proper work.
    However (here I assume you've gone back to Quake but that other
    interested parties might be reading this), consider the program below.

    I got back to debugging...

    To be clear - you are talking about debugging your compiler here, yes?


    My compiler and my Quake 3 port, but most of the bugs in the Quake 3
    port thus far were due to bugs either in my compiler or in the runtime libraries.



    Ironically, one of the big bugs I ended up finding was related to
    internal struct handling "leaking through" and negatively affecting
    stuff.

    say:
       typedef struct foo_s foo_t;  // don't care what it contains for now.
       foo_t arr[...];
       foo_t temp;
       int i, j;
       ...
       temp=arr[i];
       arr[i]=arr[j];
       arr[j]=temp;

    Internally, it would load a reference to arr[i] into temp, but then
    this location would get overwritten before the third assignment
    happened, causing the incorrect contents to be copied to arr[j].

    For now, have ended up changing stuff such that any struct-assignment
    (for structs in the "by-ref" category) to a local variable will
    instead copy the contents to the memory location associated with that
    struct.

    How could it possibly mean anything else?  Structs in C are objects -
    contiguous blocks of bytes interpreted by a type.  Assigning them will
    mean copying those bytes.  Pretending the language sometimes means
    structs and sometimes means magical auto-dereferencing pointers to
    structs is simply wrong.
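
    A minimal sketch of the swap discussed above, written out with by-value
    semantics. The layout of foo_t is a made-up assumption (the thread leaves
    it open), and swap_elems / swap_elems_explicit are hypothetical names used
    only for illustration. The second function shows what a direct
    translation amounts to: three block copies.

```c
#include <string.h>

/* Hypothetical layout; the thread does not say what foo_t contains. */
typedef struct foo_s {
    int id;
    char payload[64];
} foo_t;

/* Swap through a by-value temporary. temp must own a copy of arr[i]:
   if it only aliased arr[i], the second assignment would clobber it
   before the third assignment reads it. */
void swap_elems(foo_t *arr, int i, int j)
{
    foo_t temp;
    temp = arr[i];      /* copies sizeof(foo_t) bytes into temp */
    arr[i] = arr[j];    /* overwrites arr[i]; temp is unaffected */
    arr[j] = temp;      /* stores the original arr[i] into arr[j] */
}

/* The same operation as explicit block copies. memmove is used for the
   element-to-element copy because i may equal j, and memcpy does not
   allow overlapping source and destination. */
void swap_elems_explicit(foo_t *arr, int i, int j)
{
    foo_t temp;
    memcpy(&temp, &arr[i], sizeof temp);
    memmove(&arr[i], &arr[j], sizeof temp);
    memcpy(&arr[j], &temp, sizeof temp);
}
```

    Both functions should leave the array in the same state for any valid
    i and j, including i == j.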


    The "magical auto dereferencing pointers" interpretation gives better performance, when it works. In this case, it didn't work...

    It's very easy to make high performance code if correctness doesn't
    matter! Obviously, correctness is more important.


    Sadly, there is no good way at the moment to know whether or not it will work, for now forcing the slower and more conservative option.

    I would think the simple test is that for data that is never changed (or
    not changed within the function), you can use a constant reference -
    otherwise you cannot. It is not by coincidence that in C++, it is
    common to use pass by /const/ reference as an efficient alternative to
    pass by value for big objects.



    If "foo_t" is 2000 bytes long, then "foo_t temp" makes a 2000 byte
    space in your local variables (the stack, on virtually every platform)
    and "temp = arr[i];" does a 2000 byte memcpy().  The same thing
    applies if "foo_t" is 2 bytes long, or 2 megabytes long.  And if there
    is a stack overflow making "temp", that's the programmer's problem.


    For now:
    1 - 16 bytes, goes in registers, except when accessing a member where it needs to be in-memory; unless it is a SIMD type which is special and
    allows accessing members with the value still in registers.

    17 bytes to 15.999K: Accessed by an implicit reference, uses hidden
    copying to mimic by-value semantics (not quite foolproof as of yet it
    seems).

    16K and beyond, quietly turned into a heap allocation (with a compiler warning). Should otherwise look the same as the prior case.


    The normal system is that local objects are data on the stack -
    regardless of the size or type, scalar or aggregate. Parameter passing
    is done by register for some types (for the first few parameters), or
    the stack otherwise. Returns are in a register or two for some types,
    or by a stack slot assigned by the caller. For struct parameters or
    return values, it can be efficient for the caller to pass hidden
    pointers, but that's not strictly necessary if you have a fixed stack
    frame layout. (Struct parameters still get copied to the stack to
    ensure value semantics - the hidden pointer points to the stack copy.)

    Trying to have special cases for different sizes, or to eliminate extra
    copies of structs by having pointers and hoping nothing gets changed, is
    just extra complication with a high risk and low gains. If the
    programmer knows the structs are big and it is better to put them on the
    heap, or to pass around references, then let the /programmer/ do that.
    We are talking about /C/ here - programmers are expected to take
    responsibility for doing this stuff manually. It is not a high-level hand-holding automated language. If a programmer passes a struct by
    value, it is because they know it is small enough to do so efficiently,
    or because they know the callee might find it convenient to modify a
    local copy and don't want the caller's copy affected. If they know it
    is safe to pass a pointer, they will use a (possibly const) pointer to
    the struct as the parameter.

    You are trying to be too smart here, IMHO - the compiler's job is to let
    the programmer be smart. It's always nice to have optimisations, but
    not at the expense of correctness.



    And it is only when the direct translation is working properly that
    you can start to think of improving user convenience.  Perhaps you
    could allocate large temporary objects in non-stack memory somewhere
    to avoid stack overflows.  I don't think that is a good idea, but it's
    a possibility.  Giving compiler warnings about large stack objects is
    a much better solution IMHO.


    It both warns and also turns it into a heap allocation.

    Because warning and the code still working, is better than warning and
    the program most likely crashing due to a stack overflow (and in cases
    with no memory protection, probably overwriting a bunch of other stuff
    in the process).

    A rarely used, unreliable feature with unexpected effects and costs is
    not likely to be a good idea. People are happy to use compilers that
    don't have this kind of dynamic memory allocation for big local
    variables - but no one is going to be happy to use a compiler that
    doesn't accurately support the C semantics for copying structs.

  • From David Brown@21:1/5 to Bart on Thu Jul 18 18:01:21 2024
    On 18/07/2024 15:00, Bart wrote:
    On 18/07/2024 13:41, David Brown wrote:
    On 18/07/2024 12:05, BGB wrote:

    The "magical auto dereferencing pointers" interpretation gives better
    performance, when it works. In this case, it didn't work...

    It's very easy to make high performance code if correctness doesn't
    matter!  Obviously, correctness is more important.

    It is useful to explore ways of making some scenarios faster. Here there
    is simply a bug in one of those ways. But you don't just give up
    completely; you can try fixing the bug.

    Sure. But correctness is more important than speed, especially in a
    tool that other people rely on for correctness like a compiler. You
    don't have optimisations unless you are completely sure that they are
    always correct, even for the most inconvenient source code.

    Of course you can have experimental optimisations when playing around or testing - perhaps enabled with flags that make it clear that they might
    be flawed, or that they only work for a restricted subset of C code. I
    don't mean you should not try out new stuff and then find and fix the
    problems - of course you should! But if a tool is going to be useful,
    priority one must be to make it give correct results, rather than give
    fast results.



    Sadly, there is no good way at the moment to know whether or not it
    will work, for now forcing the slower and more conservative option.

    I would think the simple test is that for data that is never changed
    (or not changed within the function), you can use a constant reference
    - otherwise you cannot.  It is not by coincidence that in C++, it is
    common to use pass by /const/ reference as an efficient alternative to
    pass by value for big objects.

    You said the other day that my C compiler was wrong to do that: to use efficient pass-by-pointer for structs marked as 'const' in the function signature; they always have to be copied no matter what.


    No, it is fine to omit copies if you (the compiler) /know/ something
    cannot change. And it is also good practice for programmers to use
    "const" to mark things that should not be changed. But unfortunately
    the compiler can't be sure that a const pointer (or const reference in
    C++) cannot be cast to a non-const pointer. Using a const pointer or
    reference makes it harder for a programmer to change the object
    accidentally, but does not make it impossible for them to change it intentionally. And the compiler has to believe the worst case here, and
    can't optimise on the assumption that the thing pointed to with a const
    pointer remains unchanged by an external function. However, if it can
    see the definition of the function and can see that it cannot change,
    then it can use that information for optimisation - regardless of the
    const or lack of const in the pointer.

    (I'd prefer if "const" were a stronger promise in C and C++. But I
    don't make the rules.)
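
    A small sketch of why a const pointer is not a guarantee the compiler
    can rely on. The names item_t, sneaky and read_around_call are
    hypothetical. Because the object itself is not declared const, casting
    away const and writing through the pointer is well-defined C, so a
    compiler that cannot see the body of sneaky() has to assume the object
    may have changed across the call.

```c
/* Hypothetical type and functions, for illustration only. */
typedef struct {
    int value;
} item_t;

/* Takes a pointer to const, but modifies the object anyway. This is
   legal because the object passed in below is not itself const. */
void sneaky(const item_t *p)
{
    ((item_t *)p)->value = 99;
}

/* The second read of it->value cannot be replaced by the first one
   unless the compiler can see what sneaky() does. */
int read_around_call(item_t *it)
{
    int before = it->value;
    sneaky(it);
    return before + it->value;
}
```

    With a starting value of 1, the result is 1 + 99; an implementation
    that cached the first read across the call would produce 2.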


    The normal system is that local objects are data on the stack -
    regardless of the size or type, scalar or aggregate.  Parameter
    passing is done by register for some types (for the first few
    parameters), or the stack otherwise.  Returns are in a register or two
    for some types, or by a stack slot assigned by the caller.  For struct
    parameters or return values, it can be efficient for the caller to
    pass hidden pointers, but that's not strictly necessary if you have a
    fixed stack frame layout.  (Struct parameters still get copied to the
    stack to ensure value semantics - the hidden pointer points to the
    stack copy.)

    Trying to have special cases for different sizes,

    That's exactly what common 64-bit ABIs do. In fact the SYS V ABI is so complicated that I can't understand its struct passing rules at all.


    It is what every ABI I know of does - from 8-bit to 64-bit. They vary significantly in how much data is passed in registers, and when structs
    are passed in registers or on the stack. (Older ABIs passed more on
    the stack - newer ones use registers more. Passing small structs in
    registers became a lot more important as C++ gained popularity.)

    I agree that the SYS V x86_64 ABI is very complicated. So is the Win64
    x86_64 ABI, with different kinds of complications.

    (If I ever have to use that, I'd need to write test code for each
    possible size of struct, up to 100 bytes or so (past the largest machine register), and see how an existing compliant compiler handles each case.)

    Here the context appears to be a custom ISA, so anything is possible.

    Sure.


    You are trying to be too smart here, IMHO - the compiler's job is to
    let the programmer be smart.  It's always nice to have optimisations,
    but not at the expense of correctness.

    That's an odd remark from a devotee of gcc. Your usual attitude is to
    let the programmer write code in the most natural manner, and let a
    smart optimising compiler sort it out.

    Touché :-)

    Compilers can get smarter once you have them simple and correct. /Too/
    smart - generating efficient but sometimes incorrect code - is not helpful.

    And some decisions are always up to the programmer in C, regardless of
    how good the compiler optimisation is. If the programmer makes a local variable, they expect it to be in a register, optimised away, or on the
    stack. They don't expect the function to call malloc() and free() in
    some hidden manner. And if they want to pass a struct by reference,
    they use a pointer (that's all you've got in C) - if they pass the
    struct by value, they expect value semantics.

    The smaller stuff can be optimised freely by the compiler, but the big decisions are made by the programmer.

    And note that these two features of BGB's compiler under discussion -
    putting big locals on the heap and using hidden pointers for structs -
    are /not/ optimisations. Putting locals on the heap is a pessimisation,
    making code bigger and slower. And using references for passing structs
    which must logically be passed by value is not an optimisation either,
    because it changes the semantics of the language.
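
    To make the cost concrete, here is a sketch of what a hidden heap
    allocation for a big local amounts to when written out by hand. This is
    an assumed, simplified transformation and not BGBCC's actual output;
    the function names and BIG_COUNT are hypothetical, with the element
    count borrowed from the int a[8192] example earlier in the thread.

```c
#include <stdlib.h>

#define BIG_COUNT 8192   /* element count borrowed from the thread's example */

/* As the programmer wrote it: a large automatic array. */
long long sum_squares_stack(int n)
{
    int a[BIG_COUNT];
    long long total = 0;
    if (n > BIG_COUNT) n = BIG_COUNT;
    for (int i = 0; i < n; i++) a[i] = i * i;
    for (int i = 0; i < n; i++) total += a[i];
    return total;
}

/* Hand-written equivalent of moving the array to the heap. It adds two
   library calls and a failure path that the original did not have. */
long long sum_squares_heap(int n)
{
    int *a = malloc(BIG_COUNT * sizeof *a);
    long long total = 0;
    if (a == NULL) return -1;            /* new way for the function to fail */
    if (n > BIG_COUNT) n = BIG_COUNT;
    for (int i = 0; i < n; i++) a[i] = i * i;
    for (int i = 0; i < n; i++) total += a[i];
    free(a);                             /* must run on every exit path */
    return total;
}
```

    The two functions compute the same result when the allocation
    succeeds, but the second is larger, slower, and needs cleanup on every
    return.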

  • From David Brown@21:1/5 to Bart on Fri Aug 16 10:04:31 2024
    On 15/08/2024 18:08, Bart wrote:

    These were my original comments on the subject made to DB:


    DB:
    In C, there is no "pass by reference" or "return by reference".  It
    is all done by value.

    BC:

    Arrays are passed by reference:

      void F(int a[20]) {}

      int main(void) {
        int x[20];
        F(x);
      }

    Although the type of 'a' inside 'F' will be int* rather than int(*)[20].

    It was in reply to DB, who appeared to imply that arrays were passed by
    value. Obviously they're not passed by value, so what then? (Please,
    don't rerun the thread! This is where everyone jumped in.)


    I am not sure if you want an answer here or not - you asked "so what
    then", but also asked to avoid a re-run of the thread.

    I can give a summary - and I also hope this doesn't lead to a re-run of
    the discussion. However, since you are asking the same question as you
    did at the start, and the language C has not changed in the meantime,
    the factual and correct answers will inevitably be the same:

    1. C has no "pass by reference" - it is all "pass by value".

    2. In C, you cannot pass an array as a function parameter.

    3. The automatic conversion of many array expressions to pointer
    expressions, along with the similar conversions of function parameter
    types, gives C users a syntax that is similar - but not identical to -
    what you would have if the language supported passing arrays by reference.

    4. Adding "pass by reference" and "arrays as first class objects" would
    both be very significant changes to C - the language, the tools, and the
    way C code is written. Doing so in a way that significantly improves on
    the current situation (such as including element count information when
    passing arrays) would require even more changes. Either array objects
    would need to hold run-time count information (changing the data held, massively complicating "slices", and introducing run-time efficiency overheads), or the information would need to be part of the array's
    type, requiring templates, overloaded functions, and other mechanisms.
    Both are viable techniques for other languages, but not appropriate for C.

    The C solution to handling arrays has its limitations, and has potential
    to cause some people significant confusion. It is important to be
    careful in your coding, and to make use of tools that help spot errors,
    but that applies to all programming in all languages. The C solution
    works well in practice for many situations, with minimal complications
    and maximal run-time efficiency.
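
    Points 2 and 3 can be illustrated with a short sketch. The function
    names fill and bytes_pointed_to are hypothetical. The first function
    is declared with array syntax, yet its parameter behaves as an
    ordinary pointer variable; the second takes a pointer to a whole
    array, which is the closest C gets to keeping the element count in
    the parameter's type.

```c
#include <stddef.h>

/* Declared with array syntax, but the parameter type is adjusted to int *. */
void fill(int a[20], int v)
{
    for (int i = 0; i < 20; i++)
        a[i] = v;        /* writes through the pointer into the caller's array */
    a = NULL;            /* legal: 'a' is a pointer variable, itself passed by value */
    (void)a;
}

/* A pointer to an array of 20 int keeps the size as part of its type. */
size_t bytes_pointed_to(int (*pa)[20])
{
    return sizeof *pa;   /* 20 * sizeof(int) */
}
```

    A caller would write fill(x, 7) and bytes_pointed_to(&x) for an
    int x[20]. Assigning to 'a' inside fill() has no effect on the
    caller's x, which is what pass by value of a pointer means.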

  • From David Brown@21:1/5 to Ben Bacarisse on Fri Aug 16 11:42:43 2024
    On 16/08/2024 02:08, Ben Bacarisse wrote:
    Bart <bc@freeuk.com> writes:



    In general there is no reason, in a language with true call-by-reference,
    why any parameter type T (which has the form U*, a pointer to anything),
    cannot be passed by reference. It doesn't matter whether U is an array type
    or not.

    I can't unravel this. Take, as a concrete example, C++. You can't pass
    a pointer to a function that takes an array passed by reference. You can,
    of course, pass a pointer by reference, but that is neither here nor
    there.


    In C++, you can't pass arrays as parameters at all - the language
    inherited C's handling of arrays. You can, of course, pass objects of std::array<> type by value or by reference, just like any other class types.

  • From David Brown@21:1/5 to Ben Bacarisse on Fri Aug 16 16:31:33 2024
    On 16/08/2024 12:00, Ben Bacarisse wrote:
    David Brown <david.brown@hesbynett.no> writes:

    On 16/08/2024 02:08, Ben Bacarisse wrote:
    Bart <bc@freeuk.com> writes:

    In general there is no reason, in a language with true call-by-reference,
    why any parameter type T (which has the form U*, a pointer to anything),
    cannot be passed by reference. It doesn't matter whether U is an array type
    or not.

    I can't unravel this. Take, as a concrete example, C++. You can't pass
    a pointer to a function that takes an array passed by reference. You can,
    of course, pass a pointer by reference, but that is neither here nor
    there.

    In C++, you can't pass arrays as parameters at all - the language inherited
    C's handling of arrays. You can, of course, pass objects of std::array<>
    type by value or by reference, just like any other class types.

    The best way to think about C++ (in my very non-expert opinion) is to consider references as values that are passed by, err..., value. But
    you seem prepared to accept that some things can be "passed by reference"
    in C++.

    That seems a subtle distinction - I'll have to think about it a little.
    I like your description of arguments being like local variable
    initialisation - it makes sense equally well regardless of whether the parameter is "int", "int*", or "int&". (It's probably best not to
    mention the other one in this group...)

    So if this:

    #include <iostream>

    void g(int &i) { std::cout << i << "\n"; }

    int main(void)
    {
        int I{0};
        g(I);
    }

    shows an int object, I, being passed to g, why does this

    #include <iostream>

    void f(int (&ar)[10]) { std::cout << sizeof ar << "\n"; }

    int main(void)
    {
        int A[10];
        f(A);
    }

    not show an array, A, being passed to f?


    That's backwards compatibility with C array handling at play. I
    personally would use std::array<int, 10> here, and pass that by
    reference (or pass a reference to it, if that is a more accurate
    description).

    C++ does suffer somewhat from being "C with classes added as an
    afterthought", rather than being designed as a new language from the start.


    As I said, I don't think it's wise to look at it this way, but I am just borrowing your use of terms to try to tease out what you are getting at.


  • From David Brown@21:1/5 to Bart on Fri Aug 16 16:51:02 2024
    On 16/08/2024 13:45, Bart wrote:
    On 16/08/2024 09:04, David Brown wrote:
    On 15/08/2024 18:08, Bart wrote:

    These were my original comments on the subject made to DB:


    DB:
    In C, there is no "pass by reference" or "return by reference".
    It is all done by value.

    BC:

    Arrays are passed by reference:

      void F(int a[20]) {}

      int main(void) {
        int x[20];
        F(x);
      }

    Although the type of 'a' inside 'F' will be int* rather than
    int(*)[20].

    It was in reply to DB, who appeared to imply that arrays were passed
    by value. Obviously they're not passed by value, so what then?
    (Please, don't rerun the thread! This is where everyone jumped in.)


    I am not sure if you want an answer here or not - you asked "so what
    then", but also asked to avoid a re-run of the thread.

    I can give a summary - and I also hope this doesn't lead to a re-run
    of the discussion.  However, since you are asking the same question as
    you did at the start, and the language C has not changed in the
    meantime, the factual and correct answers will inevitably be the same:

    1. C has no "pass by reference" - it is all "pass by value".

    2. In C, you cannot pass an array as a function parameter.

    3. The automatic conversion of many array expressions to pointer
    expressions, along with the similar conversions of function parameter
    types, gives C users a syntax that is similar - but not identical to -
    what you would have if the language supported passing arrays by
    reference.

    So, you agree that it is similar to.

    Yes. That has never been in doubt - I've agreed to it all along, as has everyone else. But "similar to" does not mean "the same as".

    And not just the resulting syntax,
    but the semantics and even the generated code can be the same (as I demonstrated but somebody complained).

    They /can/ be the same - and they can be different. If C could pass
    arrays as parameters (it can't), and if it had pass by reference (it
    doesn't), then "sizeof" would give the size of the array passed, not the
    size of a pointer. The type of the parameter would be an array type,
    not a pointer. So while the semantics and expected generated code will
    be the same for some functions, it will be different for other
    functions. Hence, "similar to", and not "the same as".


    Would you agree that they are effectively passed by-reference for all practical purposes?

    No.

    I would agree that they let you write code in the same (or very similar)
    manner and with the same effect, for /some/ practical code. But they
    are most certainly not the same for /all/ practical purposes.

    int sum(int A[]) {
        int s = 0;
        for (int i = 0; i < sizeof(A) / sizeof(int); i++) {
            s += A[i];
        }
        return s;
    }

    That would seem an obvious way to write an array sum function, if C
    could pass arrays by reference. And if such code were supported by C
    and did what a naïve programmer thought, it would certainly be of
    practical use. But C does not support passing arrays, and it does not
    support passing by reference - the conversion of the parameter to a
    pointer to the first element means such code has a very significant
    difference.

    But of course for /some/ practical code, the results are the same.
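
    For comparison, a sketch of how such a sum is usually written in C as
    it actually is, with the element count passed explicitly. The
    ARRAY_LEN macro is a hypothetical helper; it only gives the right
    answer where the array itself is in scope, not when applied to a
    parameter.

```c
#include <stddef.h>

/* Element count of a real array object. Do not apply this to a function
   parameter: there the operand is a pointer, not an array. */
#define ARRAY_LEN(arr) (sizeof (arr) / sizeof (arr)[0])

/* The callee receives a pointer to the first element plus a count. */
int sum(const int *a, size_t count)
{
    int s = 0;
    for (size_t i = 0; i < count; i++)
        s += a[i];
    return s;
}
```

    A caller holding int v[5] would write sum(v, ARRAY_LEN(v)); the size
    information travels as an ordinary argument rather than as part of
    the parameter's type.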


    All the other differences in detail are mostly due to the weird way that
    C handles arrays anyway.

    Well, yes. (Though the "details" are important.) But if C did not
    handle arrays in that way, there'd be other issues. You might be able
    to pass arrays by value, but you'd still not be able to pass them by
    reference. And now a function that takes an array of 10 ints is
    completely different from a function that takes an array of 20 ints.


    4. Adding "pass by reference" and "arrays as first class objects"
    would both be very significant changes to C


    Adding pass-by-reference would not be a huge change. I added that using
    a cheap and cheerful approach that seems to work well enough (a parameter
    marked as by-ref, would have '&' automatically applied on arguments, and
    '*' automatically applied to parameter accesses in the callee**).

    Cheap and cheerful approaches are fine for cheap and cheerful languages.
    You don't have to think about the implications of such changes, or
    corner cases, or what other people think, or how it could affect
    existing code, or how to document or teach it.


    But what would complicate it in C is how it interacts with how arrays currently work. For example, passing array A already passes '&A[0]'; it
    can't really pass '&&A[0]' if it's marked as being by-reference!


    (** There were some side-effects: while you can pass a char or short to
    an int parameter for example and it will promote it, if the int is by-reference, you can only pass an exact int type. And also, I wasn't
    able to apply default values to optional by-reference parameters.)
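
    The lowering described above can be sketched in plain C. This is only
    an illustration of the idea, not the output of any particular
    compiler, and the by-ref source syntax shown in the comment is
    hypothetical. It also shows where the promotion restriction in the
    footnote comes from.

```c
/* Hypothetical source with a by-ref parameter (not valid C):
       void increment(int &x) { x = x + 1; }
       increment(n);
   What the described lowering would turn it into: */
void increment(int *x)      /* the by-ref parameter becomes a pointer */
{
    *x = *x + 1;            /* '*' applied to each access in the callee */
}

int demo(void)
{
    int n = 41;
    short s = 1;
    increment(&n);          /* '&' applied to the argument at the call site */
    /* increment(&s) is not an option: &s has type short *, and there is
       no promotion from short * to int * as there is from short to int
       for a by-value parameter. */
    (void)s;
    return n;
}
```

    Here demo() returns 42, since the callee modified the caller's n
    through the pointer.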



  • From David Brown@21:1/5 to Tim Rentsch on Mon Aug 19 09:26:46 2024
    On 19/08/2024 03:03, Tim Rentsch wrote:
    Ben Bacarisse <ben@bsb.me.uk> writes:

    David Brown <david.brown@hesbynett.no> writes:

    On 16/08/2024 12:00, Ben Bacarisse wrote:

    David Brown <david.brown@hesbynett.no> writes:

    On 16/08/2024 02:08, Ben Bacarisse wrote:

    Bart <bc@freeuk.com> writes:

    In general there is no reason, in a language with true
    call-by-reference, why any parameter type T (which has the form
    U*, a pointer to anything), cannot be passed by reference. It
    doesn't matter whether U is an array type or not.

    I can't unravel this. Take, as a concrete example, C++. You
    can't pass a pointer to a function that takes an array passed by
    reference. You can, of course, pass a pointer by reference, but
    that is neither here nor there.

    In C++, you can't pass arrays as parameters at all - the language
    inherited C's handling of arrays. You can, of course, pass
    objects of std::array<> type by value or by reference, just like
    any other class types.

    The best way to think about C++ (in my very non-expert opinion) is
    to consider references as values that are passed by, err...,
    value. But you seem prepared to accept that some things can be
    "passed by reference" in C++.

    That seems a subtle distinction - I'll have to think about it a
    little. I like your description of arguments being like local
    variable initialisation - it makes sense equally well regardless of
    whether the parameter is "int", "int*", or "int&". (It's probably
    best not to mention the other one in this group...)

    So if this:
    #include <iostream>
    void g(int &i) { std::cout << i << "\n"; }
    int main(void)
    {
    int I{0};
    g(I);
    }
    shows an int object, I, being passed to g, why does this
    #include <iostream>
    void f(int (&ar)[10]) { std::cout << sizeof ar << "\n"; }
    int main(void)
    {
    int A[10];
    f(A);
    }
    not show an array, A, being passed to f?

    That's backwards compatibility with C array handling at play.

    I'm not sure how this answers my question. Maybe you weren't
    answering it and were just making a remark...

    My guess is he didn't understand the question. The code shown
    has nothing to do with backwards compatibility with C array
    handling.

    I had intended to make a brief remark and thought that was all that was
    needed to answer the question. But having thought about it a bit more (prompted by these last two posts), and tested the code (on the
    assumption that the gcc writers know the details better than I do), you
    are correct - I did misunderstand the question. I was wrong in how I
    thought array reference parameters worked in C++, and the way Ben worded
    the question reinforced that misunderstanding.

    I interpreted his question as saying that the code "f" does not show an
    array type being passed by reference, with the implication that the
    "sizeof" showed the size of a pointer, not the size of an array of 10
    ints, and asking why C++ was defined that way. The answer, as I saw it,
    was that C++ made reference parameters to arrays work much like pointer parameters to arrays, and those work like in C for backwards compatibility.

    Of course, it turns out I was completely wrong about how array type
    reference parameters work in C++. It's not something I have had use for
    in my own C++ programming or something I have come across in other code
    that I can remember, and I had made incorrect assumptions about it. Now
    that I corrected that, it all makes a lot more sense.

    And so I presume Ben was actually asking why I /thought/ this was not
    passing an array type (thus with its full type information, including
    its size). The answer there is now obvious - I thought that because I
    had jumped to incorrect conclusions about array reference parameters in C++.

    So thank you (Ben and Tim) for pushing me to correct my C++
    misunderstanding here, and apologies to anyone confused by my mistake.
