• Re: C++20 futex with heavy contention slower than mutex

    From Michael S@21:1/5 to Bonita Montero on Mon Apr 1 12:18:00 2024
    On Fri, 29 Mar 2024 14:14:11 +0100
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    The following program simulates constant locking and unlocking by one
    to jthread::hardware_concurrency() threads with a std::mutex and a
    futex. On my 16 core / 32 thread Zen4 system the futex is faster with
    up to 5 threads constantly contending, but beyond that the CPU time of
    the futex explodes and the conventional mutex is faster on Windows as
    well as on Linux.


    In case of heavy contention what to consider 'faster' is not at all
    obvious.

    On a lightly loaded system with more cores than work to do (a typical
    client), 'faster' means faster forward progress of the group of
    contending threads. That is achieved by very long polling before
    switching to wait, probably up to several tens of usec, and by a
    hyperactive tickless OS scheduler.

    On a heavily loaded system with much more work to do than available
    cores, 'faster' means more work done by unrelated threads and
    processes. That is achieved by very short polling before switching to
    wait, probably less than 500 nsec, and by a 'passive' OS scheduler that
    rarely intervenes outside of the clock tick.

    And of course there are cases in the middle.

    And then there is traditional HPC with MPI, which is a completely
    different kettle of fish.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to Bonita Montero on Thu Apr 18 22:03:12 2024
    On 4/18/24 9:12 AM, Bonita Montero wrote:
    On 16.04.2024 at 23:14, Chris M. Thomasson wrote:

    Well, futex notify might have fast paths in and of itself. To be
    prudent I would need to see how they implement it to allow a futex
    notify by, every time. Fair enough?

    I'm asking myself if it would be possible to have context-switching
    happen as much as possible in userspace. If there would be a context
    switch from one thread of a process to another thread because a
    timeslice expired, the kernel should send a signal to the thread and
    the thread does the userspace context-switch by itself. Only if
    there's a context switch to another process' thread or in kernel mode
    does the kernel's scheduler act itself.
    This would give the opportunity to have voluntary context switches
    when doing locking much faster than through the kernel, and voluntary
    context switches usually happen with so much higher a frequency that
    there would be a real gain.
    With Linux this would be possible through signals and on Windows the
    kernel could induce SEH-exceptions for a thread-switch.


    How do you "signal" a user-thread without doing a kernel operation and a
    thread switch?

    Admittedly, if the kernel knows it is switching from one thread to
    another in the same process it can do a lighter weight sort of
    context-switch, but it still needs to deal with kernel space operations.

  • From Scott Lurndal@21:1/5 to Bonita Montero on Fri Apr 19 13:41:36 2024
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    On 19.04.2024 at 04:03, Richard Damon wrote:

    Admittedly, if the kernel knows it is switching from one thread
    to another in the same process it can do a lighter weight sort of
    context-switch, but it still needs to deal with kernel space operations.

    A context switch through the kernel is always expensive. A user-level
    thread switch when blocking for a lock would be much faster.

    SVR4.2MP implemented a M-N thread model (M user threads mapped to
    N kernel threads). Turned out not to work well.

  • From Scott Lurndal@21:1/5 to Bonita Montero on Fri Apr 19 14:48:44 2024
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    On 19.04.2024 at 15:41, Scott Lurndal wrote:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    On 19.04.2024 at 04:03, Richard Damon wrote:

    Admittedly, if the kernel knows it is switching from one thread
    to another in the same process it can do a lighter weight sort of
    context-switch, but it still needs to deal with kernel space operations.

    A context switch through the kernel is always expensive. A user-level
    thread switch when blocking for a lock would be much faster.

    SVR4.2MP implemented a M-N thread model (M user threads mapped to
    N kernel threads). Turned out not to work well.

    The thing that I'm imagining is still 1:1, but if threads are in
    userspace, thread-switching would be done by the userspace.

    Feel free to prototype it using setcontext(2), getcontext(2) and makecontext(2).
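
    A starting point for such a prototype might look like the sketch
    below. It assumes a POSIX system that still ships <ucontext.h> and
    only shows a voluntary switch between two contexts inside one kernel
    thread; run_user_level_switch_demo, worker and the trace strings are
    hypothetical names for this illustration. The kernel-injected signal
    for involuntary switches discussed in this thread is not covered.

```cpp
#include <ucontext.h>

#include <cassert>
#include <string>
#include <vector>

namespace {
ucontext_t g_main_ctx;    // saved state of the "scheduler" side
ucontext_t g_worker_ctx;  // saved state of the user-level thread
std::vector<std::string> g_trace;  // records the order of execution

// Body of the user-level thread.
void worker() {
    g_trace.push_back("worker: step 1");
    // Voluntary switch: save our registers, resume the main context.
    swapcontext(&g_worker_ctx, &g_main_ctx);
    g_trace.push_back("worker: step 2");
    // Returning from here resumes uc_link, i.e. g_main_ctx.
}
}  // namespace

std::vector<std::string> run_user_level_switch_demo() {
    g_trace.clear();
    static char stack[64 * 1024];  // stack for the user-level thread

    int rc = getcontext(&g_worker_ctx);  // initialise before makecontext
    assert(rc == 0);
    (void)rc;
    g_worker_ctx.uc_stack.ss_sp = stack;
    g_worker_ctx.uc_stack.ss_size = sizeof stack;
    g_worker_ctx.uc_link = &g_main_ctx;  // where to go when worker returns
    makecontext(&g_worker_ctx, worker, 0);

    g_trace.push_back("main: start");
    swapcontext(&g_main_ctx, &g_worker_ctx);  // run worker until it yields
    g_trace.push_back("main: resumed");
    swapcontext(&g_main_ctx, &g_worker_ctx);  // let worker finish
    g_trace.push_back("main: done");
    return g_trace;
}
```

    The returned trace alternates between the two contexts: main: start,
    worker: step 1, main: resumed, worker: step 2, main: done. One caveat
    for measurements: on glibc, as far as I know, swapcontext also saves
    and restores the signal mask, which costs a system call per switch, so
    this sketch is not a pure userspace switch.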

  • From Scott Lurndal@21:1/5 to Bonita Montero on Fri Apr 19 18:27:14 2024
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    On 19.04.2024 at 16:48, Scott Lurndal wrote:

    Feel free to prototype it using setcontext(2), getcontext(2) and
    makecontext(2).

    I'd need the support of the kernel which should not make context
    switches to another thread inside the same process if the thread
    is within userspace. And the kernel should have to periodically
    inject signals from the timer interrupt to userspace to make it
    possible that the userspace-code does the involuntary
    context-switch on its own. And I'd need synchronization-primitives
    like mutexes and semaphores that would do the otherwise costly
    context-switch in userspace; but that's rather easy compared to the
    kernel support.

    https://www.kernel.org/

    Feel free to modify the kernel to your heart's content.

  • From Richard Damon@21:1/5 to Bonita Montero on Fri Apr 19 23:00:18 2024
    On 4/19/24 12:18 AM, Bonita Montero wrote:
    On 19.04.2024 at 04:03, Richard Damon wrote:

    How do you "signal" a user-thread without doing a kernel operation
    and a thread switch?

    The signals for the involuntary userspace thread-switch would be sent
    by a dedicated kernel-thread or by the timer interrupt. This would be
    more expensive than a thread-switch through the timer interrupt but
    as voluntary thread-switches have a much higher frequency this would
    be outweighed.

    TO WHAT?

    Are you going to reserve a core with a dedicated thread to do this?

    To "interrupt" a user thread to notify it, you would either need to
    perform a context switch to save the thread's previous context or make
    the interrupt non-returnable. If you are going to context switch to
    the notification thread, you might as well switch to the new
    user-thread that you want to go to.


    Admittedly, if the kernel knows it is switching from one thread
    to another in the same process it can do a lighter weight sort of
    context-switch, but it still needs to deal with kernel space operations.

    A context switch through the kernel is always expensive. A user-level
    thread switch when blocking for a lock would be much faster.

    But you still had the kernel doing a context switch.

  • From Scott Lurndal@21:1/5 to Bonita Montero on Sat Apr 20 15:16:49 2024
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    On 19.04.2024 at 20:27, Scott Lurndal wrote:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    On 19.04.2024 at 16:48, Scott Lurndal wrote:

    Feel free to prototype it using setcontext(2), getcontext(2) and
    makecontext(2).

    I'd need the support of the kernel which should not make context
    switches to another thread inside the same process if the thread
    is within userspace. And the kernel should have to periodically
    inject signals from the timer interrupt to userspace to make it
    possible that the userspace-code does the involuntary
    context-switch on its own. And I'd need synchronization-primitives
    like mutexes and semaphores that would do the otherwise costly
    context-switch in userspace; but that's rather easy compared to the
    kernel support.

    https://www.kernel.org/

    Feel free to modify the kernel to your heart's content.

    Seems you don't understand the idea and you think this isn't
    possible.


    It seems the lack of understanding is on you. You don't understand it.

    "... should not make context switches to another thread inside
    the same process if the thread is within userspace".

    It seems clear that the thread within userspace is completely
    invisible to the kernel, thus it cannot by definition switch to it.

    It's not that I don't think it is possible, I just don't think
    it provides any measurable benefit for the added complexity.
