Forum: >>> Magnum BBS <<<

Re: C++20 futex with heavy contention slower than mutex

From Michael S@21:1/5 to Bonita Montero on Mon Apr 1 12:18:00 2024

On Fri, 29 Mar 2024 14:14:11 +0100
Bonita Montero <Bonita.Montero@gmail.com> wrote:

The following program simulates constant locking and unlocking by one
to jthread::hardware_concurrency() threads with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond that the CPU time of the
futex explodes and the conventional mutex is faster, on Windows as well
as on Linux.

In case of heavy contention what to consider 'faster' is not at all
obvious.

On a lightly loaded system with more cores than work to do (a typical
client), 'faster' means faster forward progress of the group of
contending threads. This is achieved by very long polling before
switching to wait, probably up to several tens of usec, and by a
hyperactive tickless OS scheduler.

On a heavily loaded system with much more work to do than available
cores, 'faster' means more work done by unrelated threads and
processes. This is achieved by very short polling before switching to
wait, probably less than 500 nsec, and by a 'passive' OS scheduler that
rarely intervenes outside of the clock tick.

And of course there are cases in the middle.

And then there is traditional HPC with MPI, which is a completely
different kettle of fish.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Richard Damon@21:1/5 to Bonita Montero on Thu Apr 18 22:03:12 2024

On 4/18/24 9:12 AM, Bonita Montero wrote:

On 16.04.2024 at 23:14, Chris M. Thomasson wrote:

Well, futex notify might have fast paths in and of itself. To be
prudent I would need to see how they implement it to allow a futex
notify by, every time. Fair enough?

I'm asking myself if it would be possible to have context switching
done as much as possible in userspace. If there were a context switch
from one thread of a process to another thread because a timeslice
expired, the kernel would send a signal to the thread and the thread
would do the userspace context switch by itself. Only if there's a
context switch to another process' thread or in kernel mode would the
kernel's scheduler act itself.
This would give the opportunity to make voluntary context switches
when doing locking much faster than through the kernel, and voluntary
context switches usually happen with such a high frequency that
there would be a real gain.
With Linux this would be possible through signals, and on Windows the
kernel could induce SEH exceptions for a thread switch.

How do you "signal" a user-thread without doing a kernel operation and a
thread switch?

Admittedly, if the kernel knows it is switching from one thread to
another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations.


From Scott Lurndal@21:1/5 to Bonita Montero on Fri Apr 19 13:41:36 2024

Bonita Montero <Bonita.Montero@gmail.com> writes:

On 19.04.2024 at 04:03, Richard Damon wrote:

Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations.

A context switch through the kernel is always expensive. A
user-level thread switch when blocking for a lock would be much faster.

SVR4.2MP implemented a M-N thread model (M user threads mapped to
N kernel threads). Turned out not to work well.


From Scott Lurndal@21:1/5 to Bonita Montero on Fri Apr 19 14:48:44 2024

Bonita Montero <Bonita.Montero@gmail.com> writes:

On 19.04.2024 at 15:41, Scott Lurndal wrote:

Bonita Montero <Bonita.Montero@gmail.com> writes:

On 19.04.2024 at 04:03, Richard Damon wrote:

Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations.

A context switch through the kernel is always expensive. A
user-level thread switch when blocking for a lock would be much faster.

SVR4.2MP implemented a M-N thread model (M user threads mapped to
N kernel threads). Turned out not to work well.

The thing that I'm imagining is still 1:1, but if threads are in
userspace, thread-switching would be done by the userspace.

Feel free to prototype it using setcontext(2), getcontext(2) and makecontext(2).
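
As a starting point for such a prototype, here is a minimal sketch of a voluntary user-level switch between two contexts, assuming a POSIX system that still provides <ucontext.h>. The names worker, trace and run_user_switch_demo are hypothetical. It shows only the cooperative switch itself, not the kernel-injected involuntary switch proposed in the thread.

```cpp
#include <ucontext.h>

#include <string>
#include <vector>

namespace {
ucontext_t main_ctx, worker_ctx;
std::string trace;  // records the order in which the pieces of code ran

void worker() {
    trace += "w1 ";
    swapcontext(&worker_ctx, &main_ctx);  // voluntary yield back to the caller
    trace += "w2 ";
    // Returning from here resumes uc_link, which is main_ctx.
}
}  // namespace

// Hypothetical demo: switch to the worker twice and return the execution trace.
std::string run_user_switch_demo() {
    trace.clear();
    std::vector<char> stack(64 * 1024);  // private stack for the worker context

    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = stack.data();
    worker_ctx.uc_stack.ss_size = stack.size();
    worker_ctx.uc_link = &main_ctx;  // where to continue when worker() returns
    makecontext(&worker_ctx, worker, 0);

    trace += "m1 ";
    swapcontext(&main_ctx, &worker_ctx);  // run the worker until it yields
    trace += "m2 ";
    swapcontext(&main_ctx, &worker_ctx);  // resume the worker until it finishes
    trace += "m3";
    return trace;
}
```

run_user_switch_demo() returns "m1 w1 m2 w2 m3". Note that glibc's swapcontext also saves and restores the signal mask, which as far as I know costs a system call on Linux, so this demonstrates the mechanism rather than the achievable switch cost.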


From Scott Lurndal@21:1/5 to Bonita Montero on Fri Apr 19 18:27:14 2024

Bonita Montero <Bonita.Montero@gmail.com> writes:

On 19.04.2024 at 16:48, Scott Lurndal wrote:

Feel free to prototype it using setcontext(2), getcontext(2) and
makecontext(2).

I'd need the support of the kernel, which should not make context
switches to another thread inside the same process if the thread
is within userspace. And the kernel would have to periodically
inject signals from the timer interrupt to userspace to make it
possible for the userspace code to do the involuntary context
switch on its own. And I'd need synchronization primitives like
mutexes and semaphores that would do the otherwise costly context
switch in userspace; but that's rather easy compared to the kernel
support.

https://www.kernel.org/

Feel free to modify the kernel to your heart's content.


From Richard Damon@21:1/5 to Bonita Montero on Fri Apr 19 23:00:18 2024

On 4/19/24 12:18 AM, Bonita Montero wrote:

On 19.04.2024 at 04:03, Richard Damon wrote:

How do you "signal" a user-thread without doing a kernel operation
and a thread switch?

The signals for the involuntary userspace thread-switch would be sent
by a dedicated kernel-thread or by the timer interrupt. This would be
more expensive than a thread-switch through the timer interrupt but
as voluntary thread-switches have a much higher frequency this would
be outweighed.

TO WHAT?

Are you going to reserve a core with a dedicated thread to do this?

To "interrupt" a user thread to notify it, you would either need to
perform a context switch to save the thread's previous context or make
the interrupt non-returnable. If you are going to context switch to the
notification thread, you might as well switch to the new user-thread
that you want to go to.

Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations.

A context switch through the kernel is always expensive. A
user-level thread switch when blocking for a lock would be much faster.

But you still had the kernel doing a context switch.


From Scott Lurndal@21:1/5 to Bonita Montero on Sat Apr 20 15:16:49 2024

Bonita Montero <Bonita.Montero@gmail.com> writes:

On 19.04.2024 at 20:27, Scott Lurndal wrote:

Bonita Montero <Bonita.Montero@gmail.com> writes:

On 19.04.2024 at 16:48, Scott Lurndal wrote:

Feel free to prototype it using setcontext(2), getcontext(2) and
makecontext(2).

I'd need the support of the kernel, which should not make context
switches to another thread inside the same process if the thread
is within userspace. And the kernel would have to periodically
inject signals from the timer interrupt to userspace to make it
possible for the userspace code to do the involuntary context
switch on its own. And I'd need synchronization primitives like
mutexes and semaphores that would do the otherwise costly context
switch in userspace; but that's rather easy compared to the kernel
support.

https://www.kernel.org/

Feel free to modify the kernel to your heart's content.

Seems you don't understand the idea and you think this isn't
possible.

It seems the lack of understanding is on you. You don't understand it.

"... should not make context switches to another thread inside
the same process if the thread is within userspace".

It seems clear that the thread within userspace is completely
invisible to the kernel, thus it cannot by definition switch to it.

It's not that I don't think it is possible, I just don't think
it provides any measurable benefit for the added complexity.

