• Why volatile may make sense for parallel code today.

    From Bonita Montero@21:1/5 to All on Wed Nov 22 17:35:08 2023
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
    constexpr size_t ROUNDS = 1'000'000;
    size_t volatile r = 1'000'000;
    jthread thr( [&]()
    {
    while( r )
    SleepEx( INFINITE, TRUE );
    } );
    for( size_t r = ROUNDS; r--; )
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Wed Nov 22 13:22:06 2023
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
        constexpr size_t ROUNDS = 1'000'000;
        size_t volatile r = 1'000'000;
        jthread thr( [&]()
            {
                while( r )
                    SleepEx( INFINITE, TRUE );
            } );
        for( size_t r = ROUNDS; r--; )
            QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r

  • From red floyd@21:1/5 to Chris M. Thomasson on Wed Nov 22 16:17:40 2023
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
             QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
    thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused. Does std::atomic imply "do not optimize access to this
    variable"? Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.

  • From Chris M. Thomasson@21:1/5 to red floyd on Wed Nov 22 21:05:13 2023
    On 11/22/2023 4:17 PM, red floyd wrote:
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused.  Does std:atomic imply "do not optimize access to this variable"?  Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.



    std::atomic should honor a read when you read it, even with std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?

  • From Chris M. Thomasson@21:1/5 to Chris M. Thomasson on Wed Nov 22 21:06:25 2023
    On 11/22/2023 9:05 PM, Chris M. Thomasson wrote:
    On 11/22/2023 4:17 PM, red floyd wrote:
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused.  Does std:atomic imply "do not optimize access to this
    variable"?  Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.



    std::atomic should honor a read when you read it, even with std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?

    Afaict, std::atomic should imply volatile? Right? If not, please correct me!

  • From Bonita Montero@21:1/5 to All on Thu Nov 23 08:08:13 2023
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code is that the APC function is executed in the same
    thread context as the function repeatedly probing r as an end indicator,
    so I don't need atomic here. You should have known better.

  • From Bonita Montero@21:1/5 to All on Thu Nov 23 09:31:21 2023
    Am 23.11.2023 um 09:26 schrieb Kaz Kylheku:
    On 2023-11-23, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code is that the APC function is executed in the same
    thread context as the function repeatedly probing r as an end indicator,

    So what is "parallel" doing in your subject line?

    The parallel part is injecting the APC with QueueUserAPC().
    Interestingly, there's some framework code calling the function object
    I use for the thread's code; it brings the thread into an alertable
    state, so the loop never loops because r is already zero.

  • From Kaz Kylheku@21:1/5 to Bonita Montero on Thu Nov 23 08:26:39 2023
    On 2023-11-23, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code is that the APC function is executed in the same thread context as the function repeatedly probing r as an end indicator,

    So what is "parallel" doing in your subject line?

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.

  • From David Brown@21:1/5 to Chris M. Thomasson on Thu Nov 23 10:02:58 2023
    On 23/11/2023 06:05, Chris M. Thomasson wrote:
    On 11/22/2023 4:17 PM, red floyd wrote:
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused.  Does std:atomic imply "do not optimize access to this
    variable"?  Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.



    std::atomic should honor a read, when you read it even from std::memory_order_relaxed. If not, imvvvvhhooo, its broken?


    You will probably find that compilers in practice will re-read "r" each
    round of the loop, regardless of the memory order. I am not convinced
    this would be required for "relaxed", but compilers generally do not
    optimise atomics as much as they are allowed to. They are, as far as I
    have seen in my far from comprehensive testing, treated as though
    "atomic" implied "volatile".

    But as far as I can tell from the C and C++ standards, "atomic" does not
    imply "volatile". There are situations where atomics can be "optimised"
    - re-ordered with respect to other code, or simplified - while volatile
    atomics cannot. I can see no reason why adjacent non-volatile relaxed
    atomic reads of the same object cannot be combined, even if separated by
    other code (with no volatile or atomic accesses). The same goes for
    writes. If you have :

    std::atomic<int> ax = 100;

    ...

    ax = 1;
    ax += 2;
    ax = ax * ax;

    then you are guaranteed that any other thread reading "ax" will see
    either the old value (100, if it was not changed), or the final value of
    9. It /might/ also see values of 1 or 3 along the way, but there is no
    requirement for the code to produce these intermediate values or for
    them to be visible to other threads.

    At least, that is how I interpret things. And I believe the fact that
    the C and C++ standards make a distinction between atomics and volatile
    atomics indicates that the standard authors do not see "atomic" as
    implying the semantics of "volatile" - even if compiler writers choose
    to act that way.


    I personally think it was a terrible mistake to mix sequencing and
    ordering with atomics when multi-threading was introduced to the C and
    C++ standards. Atomics would have been simpler, more efficient, and
    consistent with their naming if their semantics had not included any
    kind of synchronisation. Synchronisation and ordering is a very
    different concept from atomic access, and should be covered differently
    (by fences of various sorts).

  • From Bonita Montero@21:1/5 to All on Thu Nov 23 10:35:36 2023
    Am 23.11.2023 um 10:02 schrieb David Brown:

    I am not convinced ....
    ... but compilers generally do not optimise atomics as much as they are allowed to.

    Is there some kind of contradiction?

  • From Bonita Montero@21:1/5 to All on Thu Nov 23 15:39:07 2023
    Am 23.11.2023 um 09:31 schrieb Bonita Montero:

    Interestingly there's some framework-code calling the function object
    which I use for the thread code that brings the thread into an alertable state so that the loop never loops because r is already zero.

    It's definitely not framework code: I've checked the code with a Win32
    thread created through CreateThread(), and my APCs are consumed before
    the thread's main function runs. Really strange.

  • From Richard Damon@21:1/5 to Chris M. Thomasson on Thu Nov 23 11:07:31 2023
    On 11/23/23 12:05 AM, Chris M. Thomasson wrote:
    On 11/22/2023 4:17 PM, red floyd wrote:
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused.  Does std:atomic imply "do not optimize access to this
    variable"?  Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.



    std::atomic should honor a read, when you read it even from std::memory_order_relaxed. If not, imvvvvhhooo, its broken?

    My understanding is that std::atomic needs to honor a read in the sense
    that it will get the most recent value that has happened "before" the
    read (as determined by memory order).

    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. SleepEx could (and should) establish time order, so the
    compiler can't, in this case, optimize the read away.

  • From Bonita Montero@21:1/5 to All on Thu Nov 23 17:20:33 2023
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std::atomic needs to honor a read in the sense
    that it will get the most recent value that has happened "before" the
    read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeated reads. The memory-order parameter is just for the ordering of other reads and writes.

  • From Richard Damon@21:1/5 to Bonita Montero on Thu Nov 23 12:50:50 2023
    On 11/23/23 11:20 AM, Bonita Montero wrote:
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std::atomic needs to honor a read in the sense
    that it will get the most recent value that has happened "before" the
    read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    Yes, which has nothing to do with the question.


    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeatable reads. The order memory-consistency parameter is just for the ordering of other reads and writes.


    Yes, the atomic itself doesn't "cache" the data, but as far as I read,
    there is no requirement to refetch the data if the code still has the
    old value around, and it hasn't been invalidated by possible memory
    ordering.

    If there can't be a write "before" the second read that wasn't also
    "after" the first read, then there is no requirement to refetch the
    data. In relaxed memory orders, just being physically before isn't
    enough to be "before"; you need some explicit "barrier" to establish it.

    I will admit this isn't an area I consider myself an expert in, but I
    find no words that prohibit the optimization. The implementation does
    need to consider possible action by other "threads", but only as far as
    constrained by memory order, so two reads in the same ordering "slot"
    are not forced.

  • From Scott Lurndal@21:1/5 to David Brown on Thu Nov 23 19:55:27 2023
    David Brown <david.brown@hesbynett.no> writes:
    On 23/11/2023 06:05, Chris M. Thomasson wrote:
    On 11/22/2023 4:17 PM, red floyd wrote:
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused.  Does std:atomic imply "do not optimize access to this
    variable"?  Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.



    std::atomic should honor a read, when you read it even from
    std::memory_order_relaxed. If not, imvvvvhhooo, its broken?


    You will probably find that compilers in practice will re-read "r" each
    round of the loop, regardless of the memory order. I am not convinced
    this would be required for "relaxed", but compilers generally do not
    optimise atomics as much as they are allowed to. They are, as far as I
    have seen in my far from comprehensive testing, treated as though
    "atomic" implied "volatile".

    Linux tends to apply the volatile qualifier on the access, rather
    than the definition.

    #define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

    while (ACCESS_ONCE(r)) {
    }

    Makes it rather obvious when reading the code what the intent
    is, and won't be affected if someone accidentally removes the
    volatile qualifier from the declaration of r.

    Works just fine in c++, too.

  • From Scott Lurndal@21:1/5 to Bonita Montero on Thu Nov 23 19:56:27 2023
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code

    That's enough to fail a job interview....

  • From Kaz Kylheku@21:1/5 to Bonita Montero on Thu Nov 23 20:32:33 2023
    On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
    constexpr size_t ROUNDS = 1'000'000;
    size_t volatile r = 1'000'000;
    jthread thr( [&]()
    {
    while( r )
    SleepEx( INFINITE, TRUE );
    } );
    for( size_t r = ROUNDS; r--; )

    This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--;)"?

    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );

    Thus, this takes the address of the for loop's r variable, not the volatile one that the thread is accessing. Is that what you wanted?

    BTW, is the C++ lambda too broken to access the r via lexical scoping?
    Why can't the APC just do "--r".

    I believe local functions in Pascal from 1971 can do this.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.

  • From Bonita Montero@21:1/5 to All on Fri Nov 24 05:27:24 2023
    Am 23.11.2023 um 21:32 schrieb Kaz Kylheku:
    On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
    constexpr size_t ROUNDS = 1'000'000;
    size_t volatile r = 1'000'000;
    jthread thr( [&]()
    {
    while( r )
    SleepEx( INFINITE, TRUE );
    } );
    for( size_t r = ROUNDS; r--; )

    This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--)"?

    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
    thr.native_handle(), (ULONG_PTR)&r );

    Thus, this takes the address of the for loop's r variable, not the volatile one
    that the thread is accessing. Is that what you wanted?

    BTW, is the C++ lambda too broken to access the r via lexical scoping?
    Why can't the APC just do "--r".

    I believe local functions in Pascal from 1971 can do this.


    I already corrected that in my code, and I guessed no one here would
    notice; usually I'm right about that.

  • From Bonita Montero@21:1/5 to All on Fri Nov 24 05:26:11 2023
    Am 23.11.2023 um 20:56 schrieb Scott Lurndal:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code

    That's enough to fail a job interview....

    ... with a nerd like you.

  • From Chris M. Thomasson@21:1/5 to Scott Lurndal on Thu Nov 23 21:47:03 2023
    On 11/23/2023 11:56 AM, Scott Lurndal wrote:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code

    That's enough to fail a job interview....

    Wow, no shit Scott. Yikes!

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Thu Nov 23 21:53:27 2023
    On 11/23/2023 8:26 PM, Bonita Montero wrote:
    Am 23.11.2023 um 20:56 schrieb Scott Lurndal:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code

    That's enough to fail a job interview....

    ... with a nerd like you.


    Huh? What does that even mean? Really, humm... ;^o

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Thu Nov 23 21:59:17 2023
    On 11/23/2023 8:26 PM, Bonita Montero wrote:
    Am 23.11.2023 um 20:56 schrieb Scott Lurndal:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:

    std::atomic<size_t> r

    The trick with my code

    That's enough to fail a job interview....

    ... with a nerd like you.


    Do you secretly like nerds? https://youtu.be/7dP1Vp1E-bo

    lol!

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Thu Nov 23 21:55:12 2023
    On 11/23/2023 8:27 PM, Bonita Montero wrote:
    Am 23.11.2023 um 21:32 schrieb Kaz Kylheku:
    On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
        constexpr size_t ROUNDS = 1'000'000;
        size_t volatile r = 1'000'000;
        jthread thr( [&]()
            {
                while( r )
                    SleepEx( INFINITE, TRUE );
            } );
        for( size_t r = ROUNDS; r--; )

    This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--)"?

            QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
    thr.native_handle(), (ULONG_PTR)&r );

    Thus, this takes the address of the for loop's r variable, not the
    volatile one
    that the thread is accessing. Is that what you wanted?

    BTW, is the C++ lambda too broken to access the r via lexical scoping?
    Why can't the APC just do "--r".

    I believe local functions in Pascal from 1971 can do this.


    I already corrected that with my code and I guessed no one will notice
    that here; usually I'm right with that.


    Usually, wrong, or always right? humm...

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Thu Nov 23 22:05:39 2023
    On 11/23/2023 8:20 AM, Bonita Montero wrote:
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std::atomic needs to honor a read in the sense
    that it will get the most recent value that has happened "before" the
    read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    humm.. Say, the read is from a word in memory. Define your trivial
    object: POD, l2 cache line sized, and aligned on an l2 cache line
    boundary? Are you referring to how a certain arch works?


    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeatable reads. The order memory-consistency parameter is just for the ordering of other reads and writes.




  • From Chris M. Thomasson@21:1/5 to Chris M. Thomasson on Thu Nov 23 22:53:24 2023
    On 11/23/2023 10:05 PM, Chris M. Thomasson wrote:
    On 11/23/2023 8:20 AM, Bonita Montero wrote:
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std::atomic needs to honor a read in the
    sense that it will get the most recent value that has happened
    "before" the read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    humm.. Say, the read is from a word in memory. Define your trivial
    object, POD, l2 cache line sized, and aligned on a l2 cache line
    boundary? Are you refering to how certain arch works?

    How many words in your cache lines, say l2?



    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeatable reads. The order memory-consistency
    parameter is just for the ordering of other reads and writes.





  • From Bonita Montero@21:1/5 to All on Fri Nov 24 08:53:49 2023
    Am 24.11.2023 um 07:05 schrieb Chris M. Thomasson:

    humm.. Say, the read is from a word in memory. Define your trivial
    object, POD, l2 cache line sized, and aligned on a l2 cache line
    boundary? Are you refering to how certain arch works?

    Read that: https://stackoverflow.com/questions/61329240/what-is-the-difference-between-trivial-and-non-trivial-objects

  • From Bonita Montero@21:1/5 to All on Fri Nov 24 08:57:38 2023
    Am 23.11.2023 um 18:50 schrieb Richard Damon:

    Yes, the atomic itself doesn't "cache" the data, but as far as I read,
    there is no requirement to refetch the data if the code still has the
    old value around, and it hasn't been invalidated by possible memory
    ordering.

    I don't believe that; think about an atomic flag that is periodically
    polled. The compiler shouldn't cache that value.

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Fri Nov 24 00:16:07 2023
    On 11/23/2023 11:57 PM, Bonita Montero wrote:
    Am 23.11.2023 um 18:50 schrieb Richard Damon:

    Yes, the atomic itself doesn't "cache" the data, but as far as I read,
    there is no requirement to refetch the data if the code still has the
    old value around, and it hasn't been invalidated by possible memory
    ordering.

    I don't believe that, think about an atomic flag that is periodically
    polled. The compiler shouldn't cache that value.



    std::atomic is going to work for such a flag. Depending on your setup,
    it should be using std::memory_order_relaxed for the polling.

  • From Bonita Montero@21:1/5 to All on Fri Nov 24 09:30:15 2023
    Am 24.11.2023 um 09:16 schrieb Chris M. Thomasson:

    std::atomic is going to work for such a flag. Depending on your
    setup, it should be using std::memory_order_relaxed for the polling.

    There's also atomic_flag, but it has enough limitations compared to
    atomic_bool that I've never used it. You can set it only in conjunction
    with an atomic read, and I never had a use for that. And this relies on
    an atomic exchange, which costs a lot more than just a byte write.

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Fri Nov 24 01:04:50 2023
    On 11/24/2023 12:30 AM, Bonita Montero wrote:
    Am 24.11.2023 um 09:16 schrieb Chris M. Thomasson:

    std::atomic is going to work for such a flag. Depending on your
    setup, it should be using std::memory_order_relaxed for the polling.

    There's also atomic_flag, but it has some limitations over atomic_bool
    that I've never used it. You can set it only in conjunction with an
    atomic read and I never had a use for that. And this relies on a atomic exchange, which costs a lot more than just a byte write.

    Fwiw, this flag should be aligned on an l2 cache line boundary and
    padded up to the l2 cache line size.

  • From Chris M. Thomasson@21:1/5 to Chris M. Thomasson on Fri Nov 24 01:05:44 2023
    On 11/24/2023 1:04 AM, Chris M. Thomasson wrote:
    On 11/24/2023 12:30 AM, Bonita Montero wrote:
    Am 24.11.2023 um 09:16 schrieb Chris M. Thomasson:

    std::atomic is going to work for such a flag. Depending on your
    setup, it should be using std::memory_order_relaxed for the polling.

    There's also atomic_flag, but it has some limitations over atomic_bool
    that I've never used it. You can set it only in conjunction with an
    atomic read and I never had a use for that. And this relies on a atomic
    exchange, which costs a lot more than just a byte write.

    Fwiw, this flag should be aligned on a l2 cache line boundary, and
    padded up to a l2 cache line size.

    You can stuff a cache line with words, as long as you do not straddle a
    cache line boundary... YIKES!

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Fri Nov 24 01:06:47 2023
    On 11/23/2023 11:53 PM, Bonita Montero wrote:
    Am 24.11.2023 um 07:05 schrieb Chris M. Thomasson:

    humm.. Say, the read is from a word in memory. Define your trivial
    object, POD, l2 cache line sized, and aligned on a l2 cache line
    boundary? Are you refering to how certain arch works?

    Read that: https://stackoverflow.com/questions/61329240/what-is-the-difference-between-trivial-and-non-trivial-objects

    I know. Btw, what the hell happened to std::is_pod? ;^)

  • From Bonita Montero@21:1/5 to All on Fri Nov 24 10:10:13 2023
    Am 24.11.2023 um 10:06 schrieb Chris M. Thomasson:
    On 11/23/2023 11:53 PM, Bonita Montero wrote:
    Am 24.11.2023 um 07:05 schrieb Chris M. Thomasson:

    humm.. Say, the read is from a word in memory. Define your trivial
    object, POD, l2 cache line sized, and aligned on a l2 cache line
    boundary? Are you refering to how certain arch works?

    Read that:
    https://stackoverflow.com/questions/61329240/what-is-the-difference-between-trivial-and-non-trivial-objects

    I know. Btw, what the hell happened to std::is_pod? ;^)

    PODs are also trivial but go beyond that, since you can copy them
    with a memcpy().

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Fri Nov 24 01:14:04 2023
    On 11/23/2023 8:20 AM, Bonita Montero wrote:
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std:atomic needs to honor a read in the sense
    that it will get the most recent value that has happened "before" the
    read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    How is that read atomic with 1kb of data? On what arch?


    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeatable reads. The order memory-consistency parameter is just for the ordering of other reads and writes.




  • From David Brown@21:1/5 to Scott Lurndal on Fri Nov 24 10:08:35 2023
    On 23/11/2023 20:55, Scott Lurndal wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 23/11/2023 06:05, Chris M. Thomasson wrote:
    On 11/22/2023 4:17 PM, red floyd wrote:
    On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
    On 11/22/2023 8:35 AM, Bonita Montero wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
         constexpr size_t ROUNDS = 1'000'000;
         size_t volatile r = 1'000'000;
         jthread thr( [&]()
             {
                 while( r )
                     SleepEx( INFINITE, TRUE );
             } );
         for( size_t r = ROUNDS; r--; )
             QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
    thr.native_handle(), (ULONG_PTR)&r );
    }

    std::atomic<size_t> r


    I'm confused. Does std::atomic imply "do not optimize access to this
    variable"? Because if it doesn't, then I can see how the "while (r)"
    loop can just spin.



    std::atomic should honor a read when you read it, even with
    std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?


    You will probably find that compilers in practice will re-read "r" each
    round of the loop, regardless of the memory order. I am not convinced
    this would be required for "relaxed", but compilers generally do not
    optimise atomics as much as they are allowed to. They are, as far as I
    have seen in my far from comprehensive testing, treated as though
    "atomic" implied "volatile".

    Linux tends to apply the volatile qualifier on the access, rather
    than the definition.

    #define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

    while (ACCESS_ONCE(r)) {
    }

    Makes it rather obvious when reading the code what the intent
    is, and won't be affected of someone accidentially removes the
    volatile qualifier from the declaration of r.

    Works just fine in c++, too.

    That is often my preference too, since it is the access that is
    "volatile" - a "volatile object" is simply one for which all accesses
    are "volatile".

    For the pedants, it might be worth noting that the "cast to pointer to volatile" technique of ACCESS_ONCE is not actually guaranteed to be
    treated as a volatile access in C until C17/C18 when the wording was
    changed to talk about accesses via "volatile lvalues" rather than
    accesses to objects declared as volatile. (When the topic was discussed
    by the committee, everyone agreed that all known compiler vendors
    treated "cast to pointer to volatile" accesses as volatile, so the
    change was a formality rather than any practical difference.) I don't
    know if and when this change was added to C++.
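
    A C++ rendering of the same access-site idiom might look like the
    following sketch (the helper name `access_once_demo` and the surrounding
    scaffolding are hypothetical; only the cast-to-volatile technique itself
    is from the thread):

    ```cpp
    #include <cassert>
    #include <cstddef>

    // Hypothetical C++ equivalent of Linux's ACCESS_ONCE: route each access
    // through a volatile lvalue instead of declaring the object volatile.
    template <typename T>
    volatile T& access_once(T& x)
    {
        return *const_cast<volatile T*>(&x);
    }

    int access_once_demo()
    {
        std::size_t r = 3;
        std::size_t sum = 0;
        while (access_once(r) != 0) { // a genuine volatile load per pass
            sum += r;
            --r;
        }
        return static_cast<int>(sum); // 3 + 2 + 1
    }

    int main()
    {
        assert(access_once_demo() == 6);
    }
    ```

    As noted above, this relies on the post-C17 wording (accesses via
    volatile lvalues) for a formal guarantee, though compilers have long
    treated it that way in practice.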

  • From David Brown@21:1/5 to Richard Damon on Fri Nov 24 10:23:00 2023
    On 23/11/2023 18:50, Richard Damon wrote:
    On 11/23/23 11:20 AM, Bonita Montero wrote:
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std::atomic needs to honor a read in the
    sense that it will get the most recent value that has happened
    "before" the read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    Yes, which has nothing to do with the question.


    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeated reads. The memory-order parameter
    is just for the ordering of other reads and writes.


    Yes, the atomic itself doesn't "cache" the data, but as far as I read,
    there is no requirement to refetch the data if the code still has the
    old value around, and it hasn't been invalidated by possible memory
    ordering.

    If there can't be a write "before" the second read that wasn't also
    "after" the first read, then there is no requirement to refetch the data. In relaxed memory orders, just being physically before isn't
    enough to be "before", but you need some explicit "barrier" to establish
    it.

    I will admit this isn't an area I consider myself an expert in, but I
    find no words that prohibit the optimization. The implementation does
    need to consider possible action by other "threads", but only as far as constrained by memory order, so two reads in the same ordering "slot"
    are not forced.

    That is exactly how I see it (I also do not consider myself an expert in
    this area). I cannot see any requirement in the description of the
    execution, covering sequencing, ordering, "happens before", and all the
    rest, that suggests that the number of atomic accesses, or their order
    amongst each other, or their order with respect to volatile accesses or non-volatile accesses, is forced to follow the source code except where
    the atomics have specific sequencing. Atomic accesses are not
    "volatile" - they are not, in themselves, "observable behaviour".

    Because the sequencing requirements for atomics depend partly on
    things happening in other threads, compilers are much more limited in
    how they can re-order or otherwise optimise atomic accesses than they
    are for normal accesses (unless the compiler knows all about the other
    threads too!). Compilers must be pessimistic about optimisation. But
    for certain simple cases, such as multiple neighbouring atomic reads of
    the same address or multiple neighbouring writes to the same address, I
    can't see any reason why they cannot be combined.

    (Again, I am not an expert here - and I will be happy to be corrected.
    They say the best way to learn something on the internet is not by
    asking questions, but by writing something that is wrong!)
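
    A minimal illustration of the combinable case (the function name is
    hypothetical; it is single-threaded, so the fused and unfused forms are
    indistinguishable here, which is exactly why fusing would be legal):

    ```cpp
    #include <atomic>
    #include <cassert>

    // Two back-to-back relaxed loads of the same atomic carry no ordering
    // obligation between each other, so a compiler could legally merge
    // them into a single load without changing any permitted observation.
    int fused_loads_demo()
    {
        std::atomic<int> x{42};
        int a = x.load(std::memory_order_relaxed);
        int b = x.load(std::memory_order_relaxed); // may be merged with 'a'
        return a + b; // single-threaded, so 84 either way
    }

    int main()
    {
        assert(fused_loads_demo() == 84);
    }
    ```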

  • From David Brown@21:1/5 to Bonita Montero on Fri Nov 24 10:35:49 2023
    On 24/11/2023 05:27, Bonita Montero wrote:

    I already corrected that with my code and I guessed no one will notice
    that here; usually I'm right with that.


    You really believe that?

    I think one of the (many) reasons people don't take you seriously is
    that you never check your work. You invariably post code that is badly
    wrong, followed by multiple replies to yourself making corrections and improvements. Every time you claim your code is bug-free, we know you
    will follow up shortly with a bug fix. Every time you claim it is
    "perfect", we know that you will follow it with an "improved" version ("perfect" and "improved" being in your opinion only).

    Yes, people have noticed. Yes, people will continue to notice.


    It's nice that you post code, however, as it can start some interesting discussions - before descending into a pantomime farce. But it might
    make things a little better if you bothered to re-read your code before posting, or even try testing it.

  • From Kaz Kylheku@21:1/5 to Bonita Montero on Fri Nov 24 18:27:37 2023
    On 2023-11-24, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 23.11.2023 um 21:32 schrieb Kaz Kylheku:
    On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
    constexpr size_t ROUNDS = 1'000'000;
    size_t volatile r = 1'000'000;
    jthread thr( [&]()
    {
    while( r )
    SleepEx( INFINITE, TRUE );
    } );
    for( size_t r = ROUNDS; r--; )

    This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--;)"?

    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
    thr.native_handle(), (ULONG_PTR)&r );

    Thus, this takes the address of the for loop's r variable, not the volatile one
    that the thread is accessing. Is that what you wanted?

    BTW, is the C++ lambda too broken to access the r via lexical scoping?
    Why can't the APC just do "--r"?

    I believe local functions in Pascal from 1971 can do this.


    I already corrected that with my code and I guessed no one will notice
    that here; usually I'm right with that.

    Anyway, this APC mechanism is quite similar to signal handling.
    Particularly asynchronous signal handling. Just like POSIX signals, it
    makes the execution abruptly call an unrelated function and then resume
    at the interrupted point.

    The main difference is that the signal has a number, which selects a
    registered handler, rather than specifying a function directly.

    Why I bring this up is that ISO C (since 1990, I think), has specified a
    use of a "volatile sig_atomic_t" type in regard to asynchronous signal handlers. (Look it up.)

    The use of volatile with interrupt-like mechanisms is nothing new.
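
    The pattern is the classic one below (handler and flag names are
    hypothetical, and raise() is used so the signal is delivered
    synchronously just for demonstration):

    ```cpp
    #include <cassert>
    #include <csignal>

    // ISO C pattern: an async signal handler may portably do little more
    // than write a flag of type "volatile sig_atomic_t"; the interrupted
    // flow polls it, and volatile keeps the read from being cached.
    volatile std::sig_atomic_t got_signal = 0;

    extern "C" void on_signal(int)
    {
        got_signal = 1;
    }

    int signal_demo()
    {
        got_signal = 0;
        std::signal(SIGINT, on_signal);
        std::raise(SIGINT); // synchronous delivery, for the demo only
        return got_signal;  // volatile read: sees the handler's write
    }

    int main()
    {
        assert(signal_demo() == 1);
    }
    ```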

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.

  • From Chris M. Thomasson@21:1/5 to Chris M. Thomasson on Fri Nov 24 15:32:54 2023
    On 11/24/2023 1:14 AM, Chris M. Thomasson wrote:
    On 11/23/2023 8:20 AM, Bonita Montero wrote:
    Am 23.11.2023 um 17:07 schrieb Richard Damon:

    My understanding is that std::atomic needs to honor a read in the
    sense that it will get the most recent value that has happened
    "before" the read (as determined by memory order).

    ... and the read is atomic - even if the trivial object is 1kB in size.

    How is that read atomic with 1kB of data? On what arch?

    Unless you atomically read a pointer that points to 1kB of memory.
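
    A sketch of that pointer-publication idea (all names hypothetical): the
    1kB payload itself is never read atomically; only the pointer to it is,
    and the release/acquire pair makes the initialized bytes visible.

    ```cpp
    #include <array>
    #include <atomic>
    #include <cassert>

    struct Block1k { std::array<unsigned char, 1024> bytes; };

    std::atomic<Block1k*> published{nullptr};

    int publish_demo()
    {
        static Block1k blk;
        blk.bytes.fill(0xAB);                             // build payload
        published.store(&blk, std::memory_order_release); // then publish

        // consumer side: one atomic pointer load, not a 1kB atomic read
        Block1k* p = published.load(std::memory_order_acquire);
        return p ? p->bytes[0] : -1; // acquire: fill() is visible
    }

    int main()
    {
        assert(publish_demo() == 0xAB);
    }
    ```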





    So, if nothing in the loop can establish a time order with respect to
    other threads, then it should be allowed for the compiler to optimize
    out the read. ...

    An atomic doesn't cache repeated reads. The memory-order parameter
    is just for the ordering of other reads and writes.





  • From Chris M. Thomasson@21:1/5 to Kaz Kylheku on Fri Nov 24 15:37:26 2023
    On 11/24/2023 10:27 AM, Kaz Kylheku wrote:
    On 2023-11-24, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 23.11.2023 um 21:32 schrieb Kaz Kylheku:
    On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
    #include <Windows.h>
    #include <thread>

    using namespace std;

    int main()
    {
    constexpr size_t ROUNDS = 1'000'000;
    size_t volatile r = 1'000'000;
    jthread thr( [&]()
    {
    while( r )
    SleepEx( INFINITE, TRUE );
    } );
    for( size_t r = ROUNDS; r--; )

    This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--;)"?
    QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
    thr.native_handle(), (ULONG_PTR)&r );

    Thus, this takes the address of the for loop's r variable, not the volatile one
    that the thread is accessing. Is that what you wanted?

    BTW, is the C++ lambda too broken to access the r via lexical scoping?
    Why can't the APC just do "--r"?

    I believe local functions in Pascal from 1971 can do this.


    I already corrected that with my code and I guessed no one will notice
    that here; usually I'm right with that.

    Anyway, this APC mechanism is quite similar to signal handling.
    Particularly asynchronous signal handling. Just like POSIX signals, it
    makes the execution abruptly call an unrelated function and then resume
    at the interrupted point.

    The main difference is that the signal has a number, which selects a registered handler, rather than specifying a function directly.

    Why I bring this up is that ISO C (since 1990, I think), has specified a
    use of a "volatile sig_atomic_t" type in regard to asynchronous signal handlers. (Look it up.)

    The use of volatile with interrupt-like mechanisms is nothing new.


    After reading this, for some reason I am now thinking about signal-safe
    sync primitives in POSIX. Fwiw, certain pure lock/wait-free algorithms
    are okay to use in signal handlers.
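
    For instance, a minimal sketch (hypothetical names) of why a lock-free
    atomic is safe from an async handler where a mutex would not be: the
    handler can never deadlock on it the way it could on a mutex held by the
    interrupted code.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <csignal>

    std::atomic<int> events{0};
    static_assert(std::atomic<int>::is_always_lock_free,
                  "fall back to volatile sig_atomic_t otherwise");

    extern "C" void on_event(int)
    {
        events.fetch_add(1, std::memory_order_relaxed); // lock-free, safe
    }

    int handler_demo()
    {
        events.store(0);
        std::signal(SIGTERM, on_event);
        std::raise(SIGTERM);
        std::signal(SIGTERM, on_event); // re-arm: ISO C may reset it
        std::raise(SIGTERM);
        return events.load();
    }

    int main()
    {
        assert(handler_demo() == 2);
    }
    ```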

  • From Bonita Montero@21:1/5 to All on Sat Nov 25 13:11:22 2023
    Am 24.11.2023 um 19:27 schrieb Kaz Kylheku:

    Anyway, this APC mechanism is quite similar to signal handling.
    Particularly asynchronous signal handling. Just like POSIX signals,
    it makes the execution abruptly call an unrelated function and then
    resume at the interrupted point.

    I don't think so because APCs only can interrupt threads in an alertable
    mode. Signals can interrupt nearly any code and they have implications
    on the compiler's ABI through defining the size of the red zone. So
    compared to signals APCs are rather clean; nevertheless you can do a lot
    of interesting things with signals, as reported lately when I was informed
    that mutexes in glibc rely on signals; I guess it's the same when
    a thread waits for a condition variable and a mutex at once.

    The main difference is that the signal has a number, which selects
    a registered handler, rather than specifying a function directly.

    The ugly thing with synchronous signals is that the signal handler is
    global for all threads. You can concatenate them but the next signal
    handler in the chain may be in a shared object already unloaded. I
    think this should be corrected by making synchronous signals' handlers thread-specific.

    The use of volatile with interrupt-like mechanisms is nothing new.

    I think this pattern doesn't happen very often since it's rare that
    a signal shares state with the interrupted code.

  • From Chris M. Thomasson@21:1/5 to Bonita Montero on Sat Nov 25 14:46:18 2023
    On 11/25/2023 4:11 AM, Bonita Montero wrote:
    Am 24.11.2023 um 19:27 schrieb Kaz Kylheku:

    Anyway, this APC mechanism is quite similar to signal handling.
    Particularly asynchronous signal handling. Just like POSIX signals,
    it makes the execution abruptly call an unrelated function and then
    resume at the interrupted point.

    I don't think so because APCs only can interrupt threads in an alertable mode. Signals can interrupt nearly any code and they have implications
    on the compiler's ABI through defining the size of the red zone. So compared to signals APCs are rather clean; nevertheless you can do a lot of interesting things with signals, as reported lately when I was informed
    that mutexes in glibc rely on signals; I guess it's the same when
    a thread waits for a condition variable and a mutex at once.

    The main difference is that the signal has a number, which selects
    a registered handler, rather than specifying a function directly.

    The ugly thing with synchronous signals is that the signal handler is
    global for all threads. You can concatenate them but the next signal
    handler in the chain may be in a shared object already unloaded. I
    think this should be corrected by making synchronous signals' handlers thread-specific.

    The use of volatile with interrupt-like mechanisms is nothing new.

    I think this pattern doesn't happen very often since it's rare that
    a signal shares state with the interrupted code.


    You might be interested in how pthreads-win32 handles async thread cancellation. Iirc, it uses a kernel module. It's hackish, but interesting.

    https://sourceware.org/pthreads-win32/

  • From Marcel Mueller@21:1/5 to All on Fri Dec 1 08:50:04 2023
    Am 23.11.23 um 06:06 schrieb Chris M. Thomasson:
    Afaict, std::atomic should imply volatile? Right? If not, please correct
    me!

    In practice yes. But is this required by the standard? I could not find
    any hint. Strictly speaking it is not required.

    In fact memory ordering does not guarantee any particular time at which
    the change becomes visible to another thread. So there is always some
    delay. But could it be infinite? Could the compiler cache the value
    indefinitely if the code generates no other memory access constrained by
    the memory barrier? This applies to reads and writes.
    But I think it is almost impossible to write any reasonable code that
    causes no other memory access forcing the atomic value to be read or written.


    Marcel
