#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
constexpr size_t ROUNDS = 1'000'000;
size_t volatile r = 1'000'000;
jthread thr( [&]()
{
while( r )
SleepEx( INFINITE, TRUE );
} );
for( size_t r = ROUNDS; r--; )
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
}
On 11/22/2023 8:35 AM, Bonita Montero wrote:
#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
constexpr size_t ROUNDS = 1'000'000;
size_t volatile r = 1'000'000;
jthread thr( [&]()
{
while( r )
SleepEx( INFINITE, TRUE );
} );
for( size_t r = ROUNDS; r--; )
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );
}
std::atomic<size_t> r
On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
On 11/22/2023 8:35 AM, Bonita Montero wrote:
#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
constexpr size_t ROUNDS = 1'000'000;
size_t volatile r = 1'000'000;
jthread thr( [&]()
{
while( r )
SleepEx( INFINITE, TRUE );
} );
for( size_t r = ROUNDS; r--; )
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );
}
std::atomic<size_t> r
I'm confused. Does std::atomic imply "do not optimize access to this variable"? Because if it doesn't, then I can see how the "while (r)"
loop can just spin.
On 11/22/2023 4:17 PM, red floyd wrote:
On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
On 11/22/2023 8:35 AM, Bonita Montero wrote:
#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
constexpr size_t ROUNDS = 1'000'000;
size_t volatile r = 1'000'000;
jthread thr( [&]()
{
while( r )
SleepEx( INFINITE, TRUE );
} );
for( size_t r = ROUNDS; r--; )
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );
}
std::atomic<size_t> r
I'm confused. Does std::atomic imply "do not optimize access to this
variable"? Because if it doesn't, then I can see how the "while (r)"
loop can just spin.
std::atomic should honor a read when you read it, even from std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?
std::atomic<size_t> r
On 2023-11-23, Bonita Montero <Bonita.Montero@gmail.com> wrote:
Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:
std::atomic<size_t> r
The trick with my code is that the APC function is executed in the same
thread context as the function repeatedly probing r as an end-indicator.
So what is "parallel" doing in your subject line?
Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:
std::atomic<size_t> r
The trick with my code is that the APC function is executed in the same thread context as the function repeatedly probing r as an end-indicator.
On 11/22/2023 4:17 PM, red floyd wrote:
On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
On 11/22/2023 8:35 AM, Bonita Montero wrote:
#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
constexpr size_t ROUNDS = 1'000'000;
size_t volatile r = 1'000'000;
jthread thr( [&]()
{
while( r )
SleepEx( INFINITE, TRUE );
} );
for( size_t r = ROUNDS; r--; )
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );
}
std::atomic<size_t> r
I'm confused. Does std::atomic imply "do not optimize access to this
variable"? Because if it doesn't, then I can see how the "while (r)"
loop can just spin.
std::atomic should honor a read when you read it, even from std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?
I am not convinced ....
... but compilers generally do not optimise atomics as much as they are allowed to.
Interestingly, there's some framework code calling the function object
which I use for the thread's code; it brings the thread into an alertable state, so the loop never loops because r is already zero.
On 11/22/2023 4:17 PM, red floyd wrote:
On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
On 11/22/2023 8:35 AM, Bonita Montero wrote:
#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
constexpr size_t ROUNDS = 1'000'000;
size_t volatile r = 1'000'000;
jthread thr( [&]()
{
while( r )
SleepEx( INFINITE, TRUE );
} );
for( size_t r = ROUNDS; r--; )
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );
}
std::atomic<size_t> r
I'm confused. Does std::atomic imply "do not optimize access to this
variable"? Because if it doesn't, then I can see how the "while (r)"
loop can just spin.
std::atomic should honor a read when you read it, even from std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?
My understanding is that std::atomic needs to honor a read in the sense
that it will get the most recent value that has happened "before" the
read (as determined by memory order).
So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. ...
Am 23.11.2023 um 17:07 schrieb Richard Damon:
My understanding is that std::atomic needs to honor a read in the sense
that it will get the most recent value that has happened "before" the
read (as determined by memory order).
... and the read is atomic - even if the trivial object is 1kB in size.
So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. ...
An atomic doesn't cache repeatable reads. The memory-order parameter is just for the ordering of other reads and writes.
On 23/11/2023 06:05, Chris M. Thomasson wrote:
On 11/22/2023 4:17 PM, red floyd wrote:
On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
On 11/22/2023 8:35 AM, Bonita Montero wrote:
#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
constexpr size_t ROUNDS = 1'000'000;
size_t volatile r = 1'000'000;
jthread thr( [&]()
{
while( r )
SleepEx( INFINITE, TRUE );
} );
for( size_t r = ROUNDS; r--; )
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );
}
std::atomic<size_t> r
I'm confused. Does std::atomic imply "do not optimize access to this
variable"? Because if it doesn't, then I can see how the "while (r)"
loop can just spin.
std::atomic should honor a read when you read it, even from
std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?
You will probably find that compilers in practice will re-read "r" each
round of the loop, regardless of the memory order. I am not convinced
this would be required for "relaxed", but compilers generally do not
optimise atomics as much as they are allowed to. They are, as far as I
have seen in my far from comprehensive testing, treated as though
"atomic" implied "volatile".
Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:
std::atomic<size_t> r
The trick with my code
#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
constexpr size_t ROUNDS = 1'000'000;
size_t volatile r = 1'000'000;
jthread thr( [&]()
{
while( r )
SleepEx( INFINITE, TRUE );
} );
for( size_t r = ROUNDS; r--; )
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; }, thr.native_handle(), (ULONG_PTR)&r );
On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
constexpr size_t ROUNDS = 1'000'000;
size_t volatile r = 1'000'000;
jthread thr( [&]()
{
while( r )
SleepEx( INFINITE, TRUE );
} );
for( size_t r = ROUNDS; r--; )
This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--; )"?
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );
Thus, this takes the address of the for loop's r variable, not the volatile one
that the thread is accessing. Is that what you wanted?
BTW, is the C++ lambda too broken to access the r via lexical scoping?
Why can't the APC just do "--r".
I believe local functions in Pascal from 1971 can do this.
Bonita Montero <Bonita.Montero@gmail.com> writes:
Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:
std::atomic<size_t> r
The trick with my code
That's enough to fail a job interview....
Am 23.11.2023 um 20:56 schrieb Scott Lurndal:
Bonita Montero <Bonita.Montero@gmail.com> writes:
Am 22.11.2023 um 22:22 schrieb Chris M. Thomasson:
std::atomic<size_t> r
The trick with my code
That's enough to fail a job interview....
... with a nerd like you.
Am 23.11.2023 um 21:32 schrieb Kaz Kylheku:
On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
constexpr size_t ROUNDS = 1'000'000;
size_t volatile r = 1'000'000;
jthread thr( [&]()
{
while( r )
SleepEx( INFINITE, TRUE );
} );
for( size_t r = ROUNDS; r--; )
This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--; )"?
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );
Thus, this takes the address of the for loop's r variable, not the
volatile one
that the thread is accessing. Is that what you wanted?
BTW, is the C++ lambda too broken to access the r via lexical scoping?
Why can't the APC just do "--r".
I believe local functions in Pascal from 1971 can do this.
I already corrected that with my code, and I guessed no one would notice
it here; usually I'm right about that.
On 11/23/2023 8:20 AM, Bonita Montero wrote:
Am 23.11.2023 um 17:07 schrieb Richard Damon:
My understanding is that std::atomic needs to honor a read in the
sense that it will get the most recent value that has happened
"before" the read (as determined by memory order).
... and the read is atomic - even if the trivial object is 1kB in size.
humm.. Say, the read is from a word in memory. Define your trivial
object: POD, L2 cache line sized, and aligned on an L2 cache line
boundary? Are you referring to how a certain arch works?
So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. ...
An atomic doesn't cache repeatable reads. The memory-order
parameter is just for the ordering of other reads and writes.
humm.. Say, the read is from a word in memory. Define your trivial
object: POD, L2 cache line sized, and aligned on an L2 cache line
boundary? Are you referring to how a certain arch works?
Yes, the atomic itself doesn't "cache" the data, but as far as I read,
there is no requirement to refetch the data if the code still has the
old value around, and it hasn't been invalidated by possible memory
ordering.
Am 23.11.2023 um 18:50 schrieb Richard Damon:
Yes, the atomic itself doesn't "cache" the data, but as far as I read,
there is no requirement to refetch the data if the code still has the
old value around, and it hasn't been invalidated by possible memory
ordering.
I don't believe that; think about an atomic flag that is periodically
polled. The compiler shouldn't cache that value.
std::atomic is going to work for such a flag. Depending on your
setup, it should be using std::memory_order_relaxed for the polling.
Am 24.11.2023 um 09:16 schrieb Chris M. Thomasson:
std::atomic is going to work for such a flag. Depending on your
setup, it should be using std::memory_order_relaxed for the polling.
There's also atomic_flag, but it has some limitations compared to
atomic_bool, so I've never used it. You can set it only in conjunction
with an atomic read, and I never had a use for that. And this relies on an atomic exchange, which costs a lot more than just a byte write.
On 11/24/2023 12:30 AM, Bonita Montero wrote:
Am 24.11.2023 um 09:16 schrieb Chris M. Thomasson:
std::atomic is going to work for such a flag. Depending on your
setup, it should be using std::memory_order_relaxed for the polling.
There's also atomic_flag, but it has some limitations compared to
atomic_bool, so I've never used it. You can set it only in conjunction
with an atomic read, and I never had a use for that. And this relies on an
atomic exchange, which costs a lot more than just a byte write.
Fwiw, this flag should be aligned on an L2 cache line boundary and
padded up to an L2 cache line size.
Am 24.11.2023 um 07:05 schrieb Chris M. Thomasson:
humm.. Say, the read is from a word in memory. Define your trivial
object, POD, l2 cache line sized, and aligned on a l2 cache line
boundary? Are you refering to how certain arch works?
Read that: https://stackoverflow.com/questions/61329240/what-is-the-difference-between-trivial-and-non-trivial-objects
On 11/23/2023 11:53 PM, Bonita Montero wrote:
Am 24.11.2023 um 07:05 schrieb Chris M. Thomasson:
humm.. Say, the read is from a word in memory. Define your trivial
object, POD, l2 cache line sized, and aligned on a l2 cache line
boundary? Are you refering to how certain arch works?
Read that:
https://stackoverflow.com/questions/61329240/what-is-the-difference-between-trivial-and-non-trivial-objects
I know. Btw, what the hell happened to std::is_pod? ;^)
David Brown <david.brown@hesbynett.no> writes:
On 23/11/2023 06:05, Chris M. Thomasson wrote:
On 11/22/2023 4:17 PM, red floyd wrote:
On 11/22/2023 1:22 PM, Chris M. Thomasson wrote:
On 11/22/2023 8:35 AM, Bonita Montero wrote:
#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
constexpr size_t ROUNDS = 1'000'000;
size_t volatile r = 1'000'000;
jthread thr( [&]()
{
while( r )
SleepEx( INFINITE, TRUE );
} );
for( size_t r = ROUNDS; r--; )
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );
}
std::atomic<size_t> r
I'm confused. Does std::atomic imply "do not optimize access to this
variable"? Because if it doesn't, then I can see how the "while (r)" loop can just spin.
std::atomic should honor a read when you read it, even from
std::memory_order_relaxed. If not, imvvvvhhooo, it's broken?
You will probably find that compilers in practice will re-read "r" each
round of the loop, regardless of the memory order. I am not convinced
this would be required for "relaxed", but compilers generally do not
optimise atomics as much as they are allowed to. They are, as far as I
have seen in my far from comprehensive testing, treated as though
"atomic" implied "volatile".
Linux tends to apply the volatile qualifier on the access, rather
than the definition.
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))
while (ACCESS_ONCE(r)) {
}
Makes it rather obvious when reading the code what the intent
is, and won't be affected if someone accidentally removes the
volatile qualifier from the declaration of r.
Works just fine in C++, too.
On 11/23/23 11:20 AM, Bonita Montero wrote:
Am 23.11.2023 um 17:07 schrieb Richard Damon:
My understanding is that std::atomic needs to honor a read in the
sense that it will get the most recent value that has happened
"before" the read (as determined by memory order).
... and the read is atomic - even if the trivial object is 1kB in size.
Yes, which has nothing to do with the question.
So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. ...
An atomic doesn't cache repeatable reads. The memory-order
parameter is just for the ordering of other reads and writes.
Yes, the atomic itself doesn't "cache" the data, but as far as I read,
there is no requirement to refetch the data if the code still has the
old value around, and it hasn't been invalidated by possible memory
ordering.
If there can't be a write "before" the second read that wasn't also
"after" the first read, then there is no requirement to refetch the data. With relaxed memory orders, just being physically before isn't
enough to be "before"; you need some explicit "barrier" to establish
it.
I will admit this isn't an area I consider myself an expert in, but I
find no words that prohibit the optimization. The implementation does
need to consider possible action by other "threads", but only as far as constrained by memory order, so two reads in the same ordering "slot"
are not forced.
On 11/23/2023 8:20 AM, Bonita Montero wrote:
Am 23.11.2023 um 17:07 schrieb Richard Damon:
My understanding is that std::atomic needs to honor a read in the
sense that it will get the most recent value that has happened
"before" the read (as determined by memory order).
... and the read is atomic - even if the trivial object is 1kB in size.
How is that read atomic with 1 kB of data? On what arch?
So, if nothing in the loop can establish a time order with respect to
other threads, then it should be allowed for the compiler to optimize
out the read. ...
An atomic doesn't cache repeatable reads. The memory-order
parameter is just for the ordering of other reads and writes.
On 2023-11-24, Bonita Montero <Bonita.Montero@gmail.com> wrote:
Am 23.11.2023 um 21:32 schrieb Kaz Kylheku:
On 2023-11-22, Bonita Montero <Bonita.Montero@gmail.com> wrote:
#include <Windows.h>
#include <thread>
using namespace std;
int main()
{
constexpr size_t ROUNDS = 1'000'000;
size_t volatile r = 1'000'000;
jthread thr( [&]()
{
while( r )
SleepEx( INFINITE, TRUE );
} );
for( size_t r = ROUNDS; r--; )
This shadows the r variable. Did you mean "for (size_t i = ROUNDS; i--; )"?
QueueUserAPC( (PAPCFUNC)[]( auto p ) { --*(size_t*)p; },
thr.native_handle(), (ULONG_PTR)&r );
Thus, this takes the address of the for loop's r variable, not the volatile one
that the thread is accessing. Is that what you wanted?
BTW, is the C++ lambda too broken to access the r via lexical scoping?
Why can't the APC just do "--r".
I believe local functions in Pascal from 1971 can do this.
I already corrected that with my code, and I guessed no one would notice
it here; usually I'm right about that.
Anyway, this APC mechanism is quite similar to signal handling.
Particularly asynchronous signal handling. Just like POSIX signals, it
makes the execution abruptly call an unrelated function and then resume
at the interrupted point.
The main difference is that the signal has a number, which selects a registered handler, rather than specifying a function directly.
Why I bring this up is that ISO C (since 1990, I think) has specified a
use of a "volatile sig_atomic_t" type in regard to asynchronous signal handlers. (Look it up.)
The use of volatile with interrupt-like mechanisms is nothing new.
Am 24.11.2023 um 19:27 schrieb Kaz Kylheku:
Anyway, this APC mechanism is quite similar to signal handling.
Particularly asynchronous signal handling. Just like POSIX signals,
it makes the execution abruptly call an unrelated function and then
resume at the interrupted point.
I don't think so, because APCs can only interrupt threads in an alertable
state. Signals can interrupt nearly any code, and they have implications
for the compiler's ABI through defining the size of the red zone. So compared to signals, APCs are rather clean. Nevertheless, you can do a lot of interesting things with signals, as reported lately when I was informed
that mutexes in glibc rely on signals; I guess it's the same when
a thread waits for a condition variable and a mutex at once.
The main difference is that the signal has a number, which selects
a registered handler, rather than specifying a function directly.
The ugly thing with synchronous signals is that the signal handler is
global for all threads. You can chain them, but the next signal
handler in the chain may be in a shared object that has already been
unloaded. I think this should be corrected by making synchronous signals' handlers thread-specific.
The use of volatile with interrupt-like mechanisms is nothing new.
I think this pattern doesn't happen very often since it's rare that
a signal shares state with the interrupted code.
Afaict, std::atomic should imply volatile? Right? If not, please correct
me!