• Why volatile may make sense for parallel code today.

    From Bonita Montero@21:1/5 to All on Wed Nov 22 17:35:08 2023
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
    constexpr size_t ROUNDS = 1'000'000;
    size_t volatile r = 1'000'000;
    jthread thr( [&]()
    {
    while( r )
    SleepEx( INFINITE, TRUE );
    } );
    for( size_t r = ROUNDS; r--; )
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Wed Nov 22 13:22:06 2023
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
        constexpr size_t ROUNDS = 1'000'000;
        size_t volatile r = 1'000'000;
        jthread thr( [&]()
            {
                while( r )
                    SleepEx( INFINITE, TRUE );
            } );
        for( size_t r = ROUNDS; r--; )
            QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r

  • From red floyd@21:1/5 to Chris M. Thomasson on Wed Nov 22 16:17:40 2023
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
             QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
    thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused. Does std::atomic imply "do not optimize access to this
    variable"? Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.

  • From Chris M. Thomasson@21:1/5 to red floyd on Wed Nov 22 21:05:13 2023
    On 11/22/2023 4:17 PM, red floyd wrote:
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused.  Does std:atomic imply "do not optimize access to this variable"?  Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.



    std::atomic should honor a read when you read it, even with std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?

  • From Chris M. Thomasson@21:1/5 to Chris M. Thomasson on Wed Nov 22 21:06:25 2023
    On 11/22/2023 9:05 PM, Chris M. Thomasson wrote:
    On 11/22/2023 4:17 PM, red floyd wrote:
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused.  Does std:atomic imply "do not optimize access to this
    variable"?  Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.



    std::atomic should honor a read when you read it, even with std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?

    Afaict, std::atomic should imply volatile? Right? If not, please correct me!

  • From Bonita Montero@21:1/5 to All on Thu Nov 23 08:08:13 2023
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code is that the APC function is executed in the same
    thread context as the function repeatedly probing r as an end indicator,
    so I don't need atomic here. You should have known better.

  • From Bonita Montero@21:1/5 to All on Thu Nov 23 09:31:21 2023
    Am 23.11.2023 um 09:26 schrieb Kaz Kylheku:
    On 2023-11-23, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code is that the APC function is executed in the same
    thread context as the function repeatedly probing r as an end indicator,

    So what is "parallel" doing in your subject line?

    The parallel part is injecting the APC with QueueUserAPC().
    Interestingly, there's some framework code calling the function object
    I use for the thread's code; it brings the thread into an alertable
    state, so the loop never loops because r is already zero.

  • From Kaz Kylheku@21:1/5 to Bonita Montero on Thu Nov 23 08:26:39 2023
    On 2023-11-23, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code is that the APC function is executed in the same thread context as the function repeatedly probing r as an end indicator,

    So what is "parallel" doing in your subject line?

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.

  • From David Brown@21:1/5 to Chris M. Thomasson on Thu Nov 23 10:02:58 2023
    On 23/11/2023 06:05, Chris M. Thomasson wrote:
    On 11/22/2023 4:17 PM, red floyd wrote:
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused.  Does std:atomic imply "do not optimize access to this
    variable"?  Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.



    std::atomic should honor a read, when you read it even from std::memory_order_relaxed. If not, imvvvvhhooo, its broken?


    You will probably find that compilers in practice will re-read "r" each
    round of the loop, regardless of the memory order. I am not convinced
    this would be required for "relaxed", but compilers generally do not
    optimise atomics as much as they are allowed to. They are, as far as I
    have seen in my far from comprehensive testing, treated as though
    "atomic" implied "volatile".

    But as far as I can tell from the C and C++ standards, "atomic" does not
    imply "volatile". There are situations where atomics can be "optimised"
    - re-ordered with respect to other code, or simplified - while volatile
    atomics cannot. I can see no reason why adjacent non-volatile relaxed
    atomic reads of the same object cannot be combined, even if separated by
    other code (with no volatile or atomic accesses). The same goes for
    writes. If you have :

    std::atomic<int> ax = 100;

    ...

    ax = 1;
    ax += 2;
    ax = ax * ax;

    then you are guaranteed that any other thread reading "ax" will see
    either the old value (100, if it was not changed), or the final value of
    9. It /might/ also see values of 1 or 3 along the way, but there is no
    requirement for the code to produce these intermediate values or for
    them to be visible to other threads.

    At least, that is how I interpret things. And I believe the fact that
    the C and C++ standards make a distinction between atomics and volatile
    atomics indicates that the standard authors do not see "atomic" as
    implying the semantics of "volatile" - even if compiler writers choose
    to act that way.


    I personally think it was a terrible mistake to mix sequencing and
    ordering with atomics when multi-threading was introduced to the C and
    C++ standards. Atomics would have been simpler, more efficient, and
    consistent with their naming if their semantics had not included any
    kind of synchronisation. Synchronisation and ordering is a very
    different concept from atomic access, and should be covered differently
    (by fences of various sorts).

  • From Bonita Montero@21:1/5 to All on Thu Nov 23 10:35:36 2023
    Am 23.11.2023 um 10:02 schrieb David Brown:

    I am not convinced ....
    ... but compilers generally do not optimise atomics as much as they are allowed to.

    Is there some kind of contradiction?

  • From Bonita Montero@21:1/5 to All on Thu Nov 23 15:39:07 2023
    Am 23.11.2023 um 09:31 schrieb Bonita Montero:

    Interestingly there's some framework-code calling the function object
    which I use for the thread code that brings the thread into an alertable state so that the loop never loops because r is already zero.

    It's definitely not framework code: I've checked the code with a Win32
    thread created through CreateThread(), and my APCs are consumed before
    the thread's main function runs. Really strange.

  • From Richard Damon@21:1/5 to Chris M. Thomasson on Thu Nov 23 11:07:31 2023
    On 11/23/23 12:05 AM, Chris M. Thomasson wrote:
    On 11/22/2023 4:17 PM, red floyd wrote:
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused.  Does std:atomic imply "do not optimize access to this
    variable"?  Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.



    std::atomic should honor a read, when you read it even from std::memory_order_relaxed. If not, imvvvvhhooo, its broken?

    My understanding is that std::atomic needs to honor a read in the sense
    that it will get the most recent value that has happened "before" the
    read (as determined by memory order).

    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. SleepEx could (and should) establish time order, so the
    compiler can't, in this case, optimize the read away.

  • From Bonita Montero@21:1/5 to All on Thu Nov 23 17:20:33 2023
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std::atomic needs to honor a read in the sense
    that it will get the most recent value that has happened "before" the
    read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeated reads. The memory-order parameter is just for the ordering of other reads and writes.

  • From Richard Damon@21:1/5 to Bonita Montero on Thu Nov 23 12:50:50 2023
    On 11/23/23 11:20 AM, Bonita Montero wrote:
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std::atomic needs to honor a read in the sense
    that it will get the most recent value that has happened "before" the
    read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    Yes, which has nothing to do with the question.


    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeatable reads. The order memory-consistency parameter is just for the ordering of other reads and writes.


    Yes, the atomic itself doesn't "cache" the data, but as far as I read,
    there is no requirement to refetch the data if the code still has the
    old value around, and it hasn't been invalidated by possible memory
    ordering.

    If there can't be a write "before" the second read that wasn't also
    "after" the first read, then there is no requirement to refetch the
    data. In relaxed memory orders, just being physically before isn't
    enough to be "before"; you need some explicit "barrier" to establish it.

    I will admit this isn't an area I consider myself an expert in, but I
    find no words that prohibit the optimization. The implementation does
    need to consider possible action by other "threads", but only as far as
    constrained by memory order, so two reads in the same ordering "slot"
    are not forced.

  • From Scott Lurndal@21:1/5 to David Brown on Thu Nov 23 19:55:27 2023
    David Brown <david.brown@hesbynett.no> writes:
    On 23/11/2023 06:05, Chris M. Thomasson wrote:
    On 11/22/2023 4:17 PM, red floyd wrote:
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused.  Does std:atomic imply "do not optimize access to this
    variable"?  Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.



    std::atomic should honor a read, when you read it even from
    std::memory_order_relaxed. If not, imvvvvhhooo, its broken?


    You will probably find that compilers in practice will re-read "r" each
    round of the loop, regardless of the memory order. I am not convinced
    this would be required for "relaxed", but compilers generally do not
    optimise atomics as much as they are allowed to. They are, as far as I
    have seen in my far from comprehensive testing, treated as though
    "atomic" implied "volatile".

    Linux tends to apply the volatile qualifier on the access, rather
    than the definition.

    #define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

    while (ACCESS_ONCE(r)) {
    }

    Makes it rather obvious when reading the code what the intent
    is, and won't be affected if someone accidentally removes the
    volatile qualifier from the declaration of r.

    Works just fine in c++, too.

  • From Scott Lurndal@21:1/5 to Bonita Montero on Thu Nov 23 19:56:27 2023
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code

    That's enough to fail a job interview....

  • From Kaz Kylheku@21:1/5 to Bonita Montero on Thu Nov 23 20:32:33 2023
    On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
    constexpr size_t ROUNDS = 1'000'000;
    size_t volatile r = 1'000'000;
    jthread thr( [&]()
    {
    while( r )
    SleepEx( INFINITE, TRUE );
    } );
    for( size_t r = ROUNDS; r--; )

    This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--;)"?

    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );

    Thus, this takes the address of the for loop's r variable, not the volatile one that the thread is accessing. Is that what you wanted?

    BTW, is the C++ lambda too broken to access the r via lexical scoping?
    Why can't the APC just do "--r".

    I believe local functions in Pascal from 1971 can do this.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.

  • From Bonita Montero@21:1/5 to All on Fri Nov 24 05:27:24 2023
    Am 23.11.2023 um 21:32 schrieb Kaz Kylheku:
    On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
    constexpr size_t ROUNDS = 1'000'000;
    size_t volatile r = 1'000'000;
    jthread thr( [&]()
    {
    while( r )
    SleepEx( INFINITE, TRUE );
    } );
    for( size_t r = ROUNDS; r--; )

    This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--)"?

    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
    thr.native_handle(), (ULONG_PTR)&r );

    Thus, this takes the address of the for loop's r variable, not the volatile one
    that the thread is accessing. Is that what you wanted?

    BTW, is the C++ lambda too broken to access the r via lexical scoping?
    Why can't the APC just do "--r".

    I believe local functions in Pascal from 1971 can do this.


    I already corrected that in my code, and I guessed no one here would
    notice; usually I'm right about that.

  • From Bonita Montero@21:1/5 to All on Fri Nov 24 05:26:11 2023
    Am 23.11.2023 um 20:56 schrieb Scott Lurndal:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code

    That's enough to fail a job interview....

    ... with a nerd like you.

  • From Chris M. Thomasson@21:1/5 to Scott Lurndal on Thu Nov 23 21:47:03 2023
    On 11/23/2023 11:56 AM, Scott Lurndal wrote:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code

    That's enough to fail a job interview....

    Wow, no shit Scott. Yikes!

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Thu Nov 23 21:53:27 2023
    On 11/23/2023 8:26 PM, Bonita Montero wrote:
    Am 23.11.2023 um 20:56 schrieb Scott Lurndal:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code

    That's enough to fail a job interview....

    ... with a nerd like you.


    Huh? What does that even mean? Really, humm... ;^o

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Thu Nov 23 21:59:17 2023
    On 11/23/2023 8:26 PM, Bonita Montero wrote:
    Am 23.11.2023 um 20:56 schrieb Scott Lurndal:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code

    That's enough to fail a job interview....

    ... with a nerd like you.


    Do you secretly like nerds? https://youtu.be/7dP1Vp1E-bo

    lol!

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Thu Nov 23 21:55:12 2023
    On 11/23/2023 8:27 PM, Bonita Montero wrote:
    Am 23.11.2023 um 21:32 schrieb Kaz Kylheku:
    On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
        constexpr size_t ROUNDS = 1'000'000;
        size_t volatile r = 1'000'000;
        jthread thr( [&]()
            {
                while( r )
                    SleepEx( INFINITE, TRUE );
            } );
        for( size_t r = ROUNDS; r--; )

    This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--)"?

            QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
    thr.native_handle(), (ULONG_PTR)&r );

    Thus, this takes the address of the for loop's r variable, not the
    volatile one
    that the thread is accessing. Is that what you wanted?

    BTW, is the C++ lambda too broken to access the r via lexical scoping?
    Why can't the APC just do "--r".

    I believe local functions in Pascal from 1971 can do this.


    I already corrected that with my code and I guessed no one will notice
    that here; usually I'm right with that.


    Usually, wrong, or always right? humm...

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Thu Nov 23 22:05:39 2023
    On 11/23/2023 8:20 AM, Bonita Montero wrote:
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std::atomic needs to honor a read in the sense
    that it will get the most recent value that has happened "before" the
    read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    humm.. Say, the read is from a word in memory. Define your trivial
    object: POD, l2 cache line sized, and aligned on an l2 cache line
    boundary? Are you referring to how a certain arch works?


    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeatable reads. The order memory-consistency parameter is just for the ordering of other reads and writes.




  • From Chris M. Thomasson@21:1/5 to Chris M. Thomasson on Thu Nov 23 22:53:24 2023
    On 11/23/2023 10:05 PM, Chris M. Thomasson wrote:
    On 11/23/2023 8:20 AM, Bonita Montero wrote:
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std::atomic needs to honor a read in the
    sense that it will get the most recent value that has happened
    "before" the read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    humm.. Say, the read is from a word in memory. Define your trivial
    object, POD, l2 cache line sized, and aligned on a l2 cache line
    boundary? Are you refering to how certain arch works?

    How many words in your cache lines, say l2?



    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeatable reads. The order memory-consistency
    parameter is just for the ordering of other reads and writes.





  • From Bonita Montero@21:1/5 to All on Fri Nov 24 08:53:49 2023
    Am 24.11.2023 um 07:05 schrieb Chris M. Thomasson:

    humm.. Say, the read is from a word in memory. Define your trivial
    object, POD, l2 cache line sized, and aligned on a l2 cache line
    boundary? Are you refering to how certain arch works?

    Read that: https://stackoverflow.com/questions/61329240/what-is-the-difference-between-trivial-and-non-trivial-objects

  • From Bonita Montero@21:1/5 to All on Fri Nov 24 08:57:38 2023
    Am 23.11.2023 um 18:50 schrieb Richard Damon:

    Yes, the atomic itself doesn't "cache" the data, but as far as I read,
    there is no requirement to refetch the data if the code still has the
    old value around, and it hasn't been invalidated by possible memory
    ordering.

    I don't believe that; think about an atomic flag that is periodically
    polled. The compiler shouldn't cache that value.

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Fri Nov 24 00:16:07 2023
    On 11/23/2023 11:57 PM, Bonita Montero wrote:
    Am 23.11.2023 um 18:50 schrieb Richard Damon:

    Yes, the atomic itself doesn't "cache" the data, but as far as I read,
    there is no requirement to refetch the data if the code still has the
    old value around, and it hasn't been invalidated by possible memory
    ordering.

    I don't believe that, think about an atomic flag that is periodically
    polled. The compiler shouldn't cache that value.



    std::atomic is going to work for such a flag. Depending on your setup,
    it should be using std::memory_order_relaxed for the polling.

  • From Bonita Montero@21:1/5 to All on Fri Nov 24 09:30:15 2023
    Am 24.11.2023 um 09:16 schrieb Chris M. Thomasson:

    std::atomic is going to work for such a flag. Depending on your
    setup, it should be using std::memory_order_relaxed for the polling.

    There's also atomic_flag, but it has enough limitations compared to
    atomic_bool that I've never used it. You can set it only in conjunction
    with an atomic read, and I never had a use for that. And this relies on
    an atomic exchange, which costs a lot more than just a byte write.

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Fri Nov 24 01:04:50 2023
    On 11/24/2023 12:30 AM, Bonita Montero wrote:
    Am 24.11.2023 um 09:16 schrieb Chris M. Thomasson:

    std::atomic is going to work for such a flag. Depending on your
    setup, it should be using std::memory_order_relaxed for the polling.

    There's also atomic_flag, but it has some limitations over atomic_bool
    that I've never used it. You can set it only in conjunction with an
    atomic read and I never had a use for that. And this relies on a atomic exchange, which costs a lot more than just a byte write.

    Fwiw, this flag should be aligned on an l2 cache line boundary and
    padded up to the l2 cache line size.

  • From Chris M. Thomasson@21:1/5 to Chris M. Thomasson on Fri Nov 24 01:05:44 2023
    On 11/24/2023 1:04 AM, Chris M. Thomasson wrote:
    On 11/24/2023 12:30 AM, Bonita Montero wrote:
    Am 24.11.2023 um 09:16 schrieb Chris M. Thomasson:

    std::atomic is going to work for such a flag. Depending on your
    setup, it should be using std::memory_order_relaxed for the polling.

    There's also atomic_flag, but it has some limitations over atomic_bool
    that I've never used it. You can set it only in conjunction with an
    atomic read and I never had a use for that. And this relies on a atomic
    exchange, which costs a lot more than just a byte write.

    Fwiw, this flag should be aligned on a l2 cache line boundary, and
    padded up to a l2 cache line size.

    You can stuff a cache line with words, as long as you do not straddle a
    cache line boundary... YIKES!

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Fri Nov 24 01:06:47 2023
    On 11/23/2023 11:53 PM, Bonita Montero wrote:
    Am 24.11.2023 um 07:05 schrieb Chris M. Thomasson:

    humm.. Say, the read is from a word in memory. Define your trivial
    object, POD, l2 cache line sized, and aligned on a l2 cache line
    boundary? Are you refering to how certain arch works?

    Read that: https://stackoverflow.com/questions/61329240/what-is-the-difference-between-trivial-and-non-trivial-objects

    I know. Btw, what the hell happened to std::is_pod? ;^)

  • From Bonita Montero@21:1/5 to All on Fri Nov 24 10:10:13 2023
    Am 24.11.2023 um 10:06 schrieb Chris M. Thomasson:
    On 11/23/2023 11:53 PM, Bonita Montero wrote:
    Am 24.11.2023 um 07:05 schrieb Chris M. Thomasson:

    humm.. Say, the read is from a word in memory. Define your trivial
    object, POD, l2 cache line sized, and aligned on a l2 cache line
    boundary? Are you refering to how certain arch works?

    Read that:
    https://stackoverflow.com/questions/61329240/what-is-the-difference-between-trivial-and-non-trivial-objects

    I know. Btw, what the hell happened to std::is_pod? ;^)

    PODs are also trivial but go beyond that, since you can copy them
    with a memcpy().

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Fri Nov 24 01:14:04 2023
    On 11/23/2023 8:20 AM, Bonita Montero wrote:
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std:atomic needs to honor a read in the sense
    that it will get the most recent value that has happened "before" the
    read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    How is that read atomic with 1kb of data? On what arch?


    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeatable reads. The order memory-consistency parameter is just for the ordering of other reads and writes.




  • From David Brown@21:1/5 to Scott Lurndal on Fri Nov 24 10:08:35 2023
    On 23/11/2023 20:55, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 23/11/2023 06:05, Chris M. Thomasson wrote:
    On 11/22/2023 4:17 PM, red floyd wrote:
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
             QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
    thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused. Does std::atomic imply "do not optimize access to this
    variable"? Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.



    std::atomic should honor a read when you read it, even with
    std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?


    You will probably find that compilers in practice will re-read "r" each
    round of the loop, regardless of the memory order. I am not convinced
    this would be required for "relaxed", but compilers generally do not
    optimise atomics as much as they are allowed to. They are, as far as I
    have seen in my far from comprehensive testing, treated as though
    "atomic" implied "volatile".

    Linux tends to apply the volatile qualifier on the access, rather
    than the definition.

    #define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

    while (ACCESS_ONCE(r)) {
    }

    Makes it rather obvious when reading the code what the intent
    is, and won't be affected of someone accidentially removes the
    volatile qualifier from the declaration of r.

    Works just fine in c++, too.

    That is often my preference too, since it is the access that is
    "volatile" - a "volatile object" is simply one for which all accesses
    are "volatile".

    For the pedants, it might be worth noting that the "cast to pointer to volatile" technique of ACCESS_ONCE is not actually guaranteed to be
    treated as a volatile access in C until C17/C18 when the wording was
    changed to talk about accesses via "volatile lvalues" rather than
    accesses to objects declared as volatile. (When the topic was discussed
    by the committee, everyone agreed that all known compiler vendors
    treated "cast to pointer to volatile" accesses as volatile, so the
    change was a formality rather than any practical difference.) I don't
    know if and when this change was added to C++.
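
    A C++ rendering of the same access-site idiom might look like the
    following sketch (the helper name `access_once_demo` and the surrounding
    scaffolding are hypothetical; only the cast-to-volatile technique itself
    is from the thread):

    ```cpp
    #include <cassert>
    #include <cstddef>

    // Hypothetical C++ equivalent of Linux's ACCESS_ONCE: route each access
    // through a volatile lvalue instead of declaring the object volatile.
    template <typename T>
    volatile T& access_once(T& x)
    {
        return *const_cast<volatile T*>(&x);
    }

    int access_once_demo()
    {
        std::size_t r = 3;
        std::size_t sum = 0;
        while (access_once(r) != 0) { // a genuine volatile load per pass
            sum += r;
            --r;
        }
        return static_cast<int>(sum); // 3 + 2 + 1
    }

    int main()
    {
        assert(access_once_demo() == 6);
    }
    ```

    As noted above, this relies on the post-C17 wording (accesses via
    volatile lvalues) for a formal guarantee, though compilers have long
    treated it that way in practice.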

  • From David Brown@21:1/5 to Richard Damon on Fri Nov 24 10:23:00 2023
    On 23/11/2023 18:50, Richard Damon wrote:
    On 11/23/23 11:20 AM, Bonita Montero wrote:
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std::atomic needs to honor a read in the
    sense that it will get the most recent value that has happened
    "before" the read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    Yes, which has nothing to do with the question.


    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeated reads. The memory-order parameter
    is just for the ordering of other reads and writes.


    Yes, the atomic itself doesn't "cache" the data, but as far as I read,
    there is no requirement to refetch the data if the code still has the
    old value around, and it hasn't been invalidated by possible memory
    ordering.

    If there can't be a write "before" the second read that wasn't also
    "after" the first read, then there is no requirement to refetch the data. In relaxed memory orders, just being physically before isn't
    enough to be "before", but you need some explicit "barrier" to establish
    it.

    I will admit this isn't an area I consider myself an expert in, but I
    find no words that prohibit the optimization. The implementation does
    need to consider possible action by other "threads", but only as far as constrained by memory order, so two reads in the same ordering "slot"
    are not forced.

    That is exactly how I see it (I also do not consider myself an expert in
    this area). I cannot see any requirement in the description of the
    execution, covering sequencing, ordering, "happens before", and all the
    rest, that suggests that the number of atomic accesses, or their order
    amongst each other, or their order with respect to volatile accesses or non-volatile accesses, is forced to follow the source code except where
    the atomics have specific sequencing. Atomic accesses are not
    "volatile" - they are not, in themselves, "observable behaviour".

    Because the sequencing requirements for atomics depend partly on
    things happening in other threads, compilers are much more limited in
    how they can re-order or otherwise optimise atomic accesses than they
    are for normal accesses (unless the compiler knows all about the other
    threads too!). Compilers must be pessimistic about optimisation. But
    for certain simple cases, such as multiple neighbouring atomic reads of
    the same address or multiple neighbouring writes to the same address, I
    can't see any reason why they cannot be combined.

    (Again, I am not an expert here - and I will be happy to be corrected.
    They say the best way to learn something on the internet is not by
    asking questions, but by writing something that is wrong!)
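
    A minimal illustration of the combinable case (the function name is
    hypothetical; it is single-threaded, so the fused and unfused forms are
    indistinguishable here, which is exactly why fusing would be legal):

    ```cpp
    #include <atomic>
    #include <cassert>

    // Two back-to-back relaxed loads of the same atomic carry no ordering
    // obligation between each other, so a compiler could legally merge
    // them into a single load without changing any permitted observation.
    int fused_loads_demo()
    {
        std::atomic<int> x{42};
        int a = x.load(std::memory_order_relaxed);
        int b = x.load(std::memory_order_relaxed); // may be merged with 'a'
        return a + b; // single-threaded, so 84 either way
    }

    int main()
    {
        assert(fused_loads_demo() == 84);
    }
    ```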

  • From David Brown@21:1/5 to Bonita Montero on Fri Nov 24 10:35:49 2023
    On 24/11/2023 05:27, Bonita Montero wrote:

    I already corrected that with my code and I guessed no one will notice
    that here; usually I'm right with that.


    You really believe that?

    I think one of the (many) reasons people don't take you seriously is
    that you never check your work. You invariably post code that is badly
    wrong, followed by multiple replies to yourself making corrections and improvements. Every time you claim your code is bug-free, we know you
    will follow up shortly with a bug fix. Every time you claim it is
    "perfect", we know that you will follow it with an "improved" version ("perfect" and "improved" being in your opinion only).

    Yes, people have noticed. Yes, people will continue to notice.


    It's nice that you post code, however, as it can start some interesting discussions - before descending into a pantomime farce. But it might
    make things a little better if you bothered to re-read your code before posting, or even try testing it.

  • From Kaz Kylheku@21:1/5 to Bonita Montero on Fri Nov 24 18:27:37 2023
    On 2023-11-24, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 23.11.2023 um 21:32 schrieb Kaz Kylheku:
    On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
    constexpr size_t ROUNDS = 1'000'000;
    size_t volatile r = 1'000'000;
    jthread thr( [&]()
    {
    while( r )
    SleepEx( INFINITE, TRUE );
    } );
    for( size_t r = ROUNDS; r--; )

    This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--;)"?

    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
    thr.native_handle(), (ULONG_PTR)&r );

    Thus, this takes the address of the for loop's r variable, not the volatile one
    that the thread is accessing. Is that what you wanted?

    BTW, is the C++ lambda too broken to access the r via lexical scoping?
    Why can't the APC just do "--r"?

    I believe local functions in Pascal from 1971 can do this.


    I already corrected that with my code and I guessed no one will notice
    that here; usually I'm right with that.

    Anyway, this APC mechanism is quite similar to signal handling.
    Particularly asynchronous signal handling. Just like POSIX signals, it
    makes the execution abruptly call an unrelated function and then resume
    at the interrupted point.

    The main difference is that the signal has a number, which selects a
    registered handler, rather than specifying a function directly.

    Why I bring this up is that ISO C (since 1990, I think), has specified a
    use of a "volatile sig_atomic_t" type in regard to asynchronous signal handlers. (Look it up.)

    The use of volatile with interrupt-like mechanisms is nothing new.
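
    The pattern is the classic one below (handler and flag names are
    hypothetical, and raise() is used so the signal is delivered
    synchronously just for demonstration):

    ```cpp
    #include <cassert>
    #include <csignal>

    // ISO C pattern: an async signal handler may portably do little more
    // than write a flag of type "volatile sig_atomic_t"; the interrupted
    // flow polls it, and volatile keeps the read from being cached.
    volatile std::sig_atomic_t got_signal = 0;

    extern "C" void on_signal(int)
    {
        got_signal = 1;
    }

    int signal_demo()
    {
        got_signal = 0;
        std::signal(SIGINT, on_signal);
        std::raise(SIGINT); // synchronous delivery, for the demo only
        return got_signal;  // volatile read: sees the handler's write
    }

    int main()
    {
        assert(signal_demo() == 1);
    }
    ```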

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.

  • From Chris M. Thomasson@21:1/5 to Chris M. Thomasson on Fri Nov 24 15:32:54 2023
    On 11/24/2023 1:14 AM, Chris M. Thomasson wrote:
    On 11/23/2023 8:20 AM, Bonita Montero wrote:
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std::atomic needs to honor a read in the
    sense that it will get the most recent value that has happened
    "before" the read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    How is that read atomic with 1kB of data? On what arch?

    Unless you atomically read a pointer that points to 1kB of memory.
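
    A sketch of that pointer-publication idea (all names hypothetical): the
    1kB payload itself is never read atomically; only the pointer to it is,
    and the release/acquire pair makes the initialized bytes visible.

    ```cpp
    #include <array>
    #include <atomic>
    #include <cassert>

    struct Block1k { std::array<unsigned char, 1024> bytes; };

    std::atomic<Block1k*> published{nullptr};

    int publish_demo()
    {
        static Block1k blk;
        blk.bytes.fill(0xAB);                             // build payload
        published.store(&blk, std::memory_order_release); // then publish

        // consumer side: one atomic pointer load, not a 1kB atomic read
        Block1k* p = published.load(std::memory_order_acquire);
        return p ? p->bytes[0] : -1; // acquire: fill() is visible
    }

    int main()
    {
        assert(publish_demo() == 0xAB);
    }
    ```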





    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeated reads. The memory-order parameter
    is just for the ordering of other reads and writes.





  • From Chris M. Thomasson@21:1/5 to Kaz Kylheku on Fri Nov 24 15:37:26 2023
    On 11/24/2023 10:27 AM, Kaz Kylheku wrote:
    On 2023-11-24, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 23.11.2023 um 21:32 schrieb Kaz Kylheku:
    On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
    constexpr size_t ROUNDS = 1'000'000;
    size_t volatile r = 1'000'000;
    jthread thr( [&]()
    {
    while( r )
    SleepEx( INFINITE, TRUE );
    } );
    for( size_t r = ROUNDS; r--; )

    This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--;)"?
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
    thr.native_handle(), (ULONG_PTR)&r );

    Thus, this takes the address of the for loop's r variable, not the volatile one
    that the thread is accessing. Is that what you wanted?

    BTW, is the C++ lambda too broken to access the r via lexical scoping?
    Why can't the APC just do "--r"?

    I believe local functions in Pascal from 1971 can do this.


    I already corrected that with my code and I guessed no one will notice
    that here; usually I'm right with that.

    Anyway, this APC mechanism is quite similar to signal handling.
    Particularly asynchronous signal handling. Just like POSIX signals, it
    makes the execution abruptly call an unrelated function and then resume
    at the interrupted point.

    The main difference is that the signal has a number, which selects a registered handler, rather than specifying a function directly.

    Why I bring this up is that ISO C (since 1990, I think), has specified a
    use of a "volatile sig_atomic_t" type in regard to asynchronous signal handlers. (Look it up.)

    The use of volatile with interrupt-like mechanisms is nothing new.


    After reading this, for some reason I am now thinking about signal-safe
    sync primitives in POSIX. Fwiw, certain pure lock/wait-free algorithms
    are okay to use in signal handlers.
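
    For instance, a minimal sketch (hypothetical names) of why a lock-free
    atomic is safe from an async handler where a mutex would not be: the
    handler can never deadlock on it the way it could on a mutex held by the
    interrupted code.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <csignal>

    std::atomic<int> events{0};
    static_assert(std::atomic<int>::is_always_lock_free,
                  "fall back to volatile sig_atomic_t otherwise");

    extern "C" void on_event(int)
    {
        events.fetch_add(1, std::memory_order_relaxed); // lock-free, safe
    }

    int handler_demo()
    {
        events.store(0);
        std::signal(SIGTERM, on_event);
        std::raise(SIGTERM);
        std::signal(SIGTERM, on_event); // re-arm: ISO C may reset it
        std::raise(SIGTERM);
        return events.load();
    }

    int main()
    {
        assert(handler_demo() == 2);
    }
    ```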

  • From Bonita Montero@21:1/5 to All on Sat Nov 25 13:11:22 2023
    Am 24.11.2023 um 19:27 schrieb Kaz Kylheku:

    Anyway, this APC mechanism is quite similar to signal handling.
    Particularly asynchronous signal handling. Just like POSIX signals,
    it makes the execution abruptly call an unrelated function and then
    resume at the interrupted point.

    I don't think so because APCs only can interrupt threads in an alertable
    mode. Signals can interrupt nearly any code and they have implications
    on the compiler's ABI through defining the size of the red zone. So
    compared to signals APCs are rather clean; nevertheless you can do a lot
    of interesting things with signals, as reported lately when I was informed
    that mutexes in glibc rely on signals; I guess it's the same when
    a thread waits for a condition variable and a mutex at once.

    The main difference is that the signal has a number, which selects
    a registered handler, rather than specifying a function directly.

    The ugly thing with synchronous signals is that the signal handler is
    global for all threads. You can concatenate them but the next signal
    handler in the chain may be in a shared object already unloaded. I
    think this should be corrected by making synchronous signals' handlers thread-specific.

    The use of volatile with interrupt-like mechanisms is nothing new.

    I think this pattern doesn't happen very often since it's rare that
    a signal shares state with the interrupted code.

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Sat Nov 25 14:46:18 2023
    On 11/25/2023 4:11 AM, Bonita Montero wrote:
    Am 24.11.2023 um 19:27 schrieb Kaz Kylheku:

    Anyway, this APC mechanism is quite similar to signal handling.
    Particularly asynchronous signal handling. Just like POSIX signals,
    it makes the execution abruptly call an unrelated function and then
    resume at the interrupted point.

    I don't think so because APCs only can interrupt threads in an alertable mode. Signals can interrupt nearly any code and they have implications
    on the compiler's ABI through defining the size of the red zone. So compared to signals APCs are rather clean; nevertheless you can do a lot of interesting things with signals, as reported lately when I was informed
    that mutexes in glibc rely on signals; I guess it's the same when
    a thread waits for a condition variable and a mutex at once.

    The main difference is that the signal has a number, which selects
    a registered handler, rather than specifying a function directly.

    The ugly thing with synchronous signals is that the signal handler is
    global for all threads. You can concatenate them but the next signal
    handler in the chain may be in a shared object already unloaded. I
    think this should be corrected by making synchronous signals' handlers thread-specific.

    The use of volatile with interrupt-like mechanisms is nothing new.

    I think this pattern doesn't happen very often since it's rare that
    a signal shares state with the interrupted code.


    You might be interested in how pthreads-win32 handles async thread cancellation. Iirc, it uses a kernel module. It's hackish, but interesting.

    https://sourceware.org/pthreads-win32/

  • From Marcel Mueller@21:1/5 to All on Fri Dec 1 08:50:04 2023
    Am 23.11.23 um 06:06 schrieb Chris M. Thomasson:
    Afaict, std::atomic should imply volatile? Right? If not, please correct
    me!

    In practice yes. But is this required by the standard? I could not find
    any hint. Strictly speaking it is not required.

    In fact memory ordering does not guarantee any particular time at which
    the change becomes visible to another thread. So there is always some
    delay. But could it be infinite? Could the compiler cache the value
    indefinitely if the code generates no other memory access constrained by
    the memory barrier? This applies to reads and writes.
    But I think it is almost impossible to write any reasonable code that
    causes no other memory access forcing the atomic value to be read or written.


    Marcel
