• thread_local initialization

    From jseigh@21:1/5 to All on Fri Nov 22 07:42:54 2024
    Apparently class-type thread_local variables are initialized
    dynamically, not at load time. This means that every time the
    thread_local variable is accessed, the code checks to see whether
    the variable needs initialization. This doesn't appear to be
    the case for native types.

    This caused the C++ version of smrproxy to go from about
    0.6 nanoseconds for a lock()/unlock() operation in C to
    about 3.2 nanoseconds.
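
    A minimal sketch of the two kinds of variable being compared, assuming
    a made-up type Tracked and a made-up counter g_constructions (neither
    is from smrproxy). It shows only the observable part: a thread that
    touches the class-type variable many times still constructs it once,
    so the per-access cost is the check, not the construction.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical instrumentation: counts constructor runs across all threads.
std::atomic<int> g_constructions{0};

struct Tracked {
    long hits = 0;
    Tracked() { g_constructions.fetch_add(1); }  // non-constexpr ctor: dynamic init
    ~Tracked() {}                                // user-provided dtor: must be run at thread exit
};

thread_local long    tl_plain = 0;  // constant-initialized native type
thread_local Tracked tl_tracked;    // dynamically initialized class type

// Touch both variables n times on a fresh thread and return how many
// Tracked objects were constructed while that thread ran.
int constructions_for_accesses(long n, long* plain_out, long* tracked_out) {
    int before = g_constructions.load();
    std::thread t([&] {
        for (long i = 0; i < n; ++i) {
            ++tl_plain;
            ++tl_tracked.hits;
        }
        *plain_out = tl_plain;
        *tracked_out = tl_tracked.hits;
    });
    t.join();
    return g_constructions.load() - before;
}
```

    Whether the compiler emits a guard on each access to tl_tracked depends
    on the implementation and optimization level; inspecting the generated
    code is the reliable way to tell.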

    Joe Seigh

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to jseigh on Fri Nov 22 12:10:50 2024
    And if the thread_local isn't accessed, no object is created and
    thus no dtor is run, in case anyone wonders why dtors seem to not
    always run on thread locals. Confirmed via testcase.
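
    A testcase along these lines might look like the sketch below. It is
    not the original testcase: Noisy, per_thread_noisy and run_thread are
    made-up names, and it uses a block-scope thread_local because the
    standard guarantees that one is constructed only when control first
    passes through its declaration. For namespace-scope thread_locals,
    whether initialization is deferred is implementation-defined.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> g_ctor{0};  // hypothetical counters for ctor/dtor runs
std::atomic<int> g_dtor{0};

struct Noisy {
    Noisy()  { g_ctor.fetch_add(1); }
    ~Noisy() { g_dtor.fetch_add(1); }
    int value = 0;
};

// Block-scope thread_local: constructed the first time a given thread's
// control flow passes through this declaration.
Noisy& per_thread_noisy() {
    thread_local Noisy instance;
    return instance;
}

// Run one thread; touch the thread_local only if `touch` is true.
void run_thread(bool touch) {
    std::thread t([touch] {
        if (touch) per_thread_noisy().value = 1;
    });
    t.join();  // the thread's thread_local dtors have finished once join() returns
}
```

    A thread started with run_thread(false) never constructs a Noisy, so
    there is nothing for a dtor to run on when that thread exits.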

    I verified the checks with a timing loop of 100,000,000 iterations.
    The object may be created only once, but the check runs on every
    single access. The generated code confirms this as well.
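
    A timing loop of that kind could be sketched as follows. The names
    Holder, ns_per_call, time_native and time_object are assumptions for
    illustration, not the code that was actually measured, and no
    particular timings are implied: an optimizer may hoist or fold the
    accesses, so results need checking against the generated code.

```cpp
#include <cassert>
#include <chrono>

struct Holder {
    Holder() : count(0) {}  // user-provided, non-constexpr ctor
    ~Holder() {}            // user-provided dtor
    unsigned long count;
};

thread_local unsigned long tl_native = 0;  // native type
thread_local Holder        tl_object;      // class type

// Hypothetical helper: run `body` n times (n > 0) and return the mean
// number of nanoseconds per call.
template <typename F>
double ns_per_call(unsigned long n, F body) {
    auto start = std::chrono::steady_clock::now();
    for (unsigned long i = 0; i < n; ++i) body();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / n;
}

double time_native(unsigned long n) { return ns_per_call(n, [] { ++tl_native; }); }
double time_object(unsigned long n) { return ns_per_call(n, [] { ++tl_object.count; }); }
```

    Calling time_native(100000000) and time_object(100000000) on the same
    thread and comparing the two results gives a rough per-access figure.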

    Joe Seigh

  • From jseigh@21:1/5 to Chris M. Thomasson on Fri Nov 22 19:21:59 2024
    On 11/22/24 16:05, Chris M. Thomasson wrote:
    Shit. Humm...


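    // Plain pointer: constant-initialized, so reading it needs no init check.
    thread_local struct ct_per_thread* ct_g_per_thread = nullptr;


    ct_per_thread*
    proxy_register_thread()
    {
        if (! ct_g_per_thread)
        {
            // Constructed the first time this thread registers;
            // _whatever_ stands for the constructor arguments.
            thread_local ct_per_thread l_ct_per_thread(_whatever_);
            ct_g_per_thread = &l_ct_per_thread;
        }

        return ct_g_per_thread;
    }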


    void proxy_lock()
    {
        ct_per_thread* per_thread = ct_g_per_thread;
        assert(per_thread);
    }


    void proxy_unlock()
    {
        ct_per_thread* per_thread = ct_g_per_thread;
        assert(per_thread);
    }


    If those asserts trip then it means that proxy_register_thread was not
    called before them. Humm... Change the API to accept a pointer to a ct_per_thread... ;^)

    void proxy_lock(ct_per_thread* per_thread)
    {

    }


    void proxy_unlock(ct_per_thread* per_thread)
    {

    }


    void test()
    {
        ct_per_thread* per_thread = proxy_register_thread();

        for (unsigned long i = 0; i < 1000000; ++i)
        {
            proxy_lock(per_thread);
                //... do your thing!
            proxy_unlock(per_thread);
        }
    }
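
    The pieces above can be assembled into something compilable roughly
    as follows. This is a sketch under assumptions: the lock_depth and
    lock_count members, the default constructor (in place of _whatever_)
    and the run_test wrapper are invented for illustration, and the dtor
    resets the pointer as one possible answer to the dtor question asked
    in this post.

```cpp
#include <cassert>

// Hypothetical stand-in for the real per-thread proxy state.
struct ct_per_thread {
    unsigned long lock_depth = 0;
    unsigned long lock_count = 0;
    ~ct_per_thread();
};

// Plain pointer: constant-initialized, so reads need no init check.
thread_local ct_per_thread* ct_g_per_thread = nullptr;

// Clear the fast-path pointer when the owning object dies at thread exit.
ct_per_thread::~ct_per_thread() { ct_g_per_thread = nullptr; }

ct_per_thread* proxy_register_thread() {
    if (!ct_g_per_thread) {
        thread_local ct_per_thread l_ct_per_thread;  // built on first registration
        ct_g_per_thread = &l_ct_per_thread;
    }
    return ct_g_per_thread;
}

void proxy_lock(ct_per_thread* per_thread) {
    assert(per_thread);
    ++per_thread->lock_depth;
    ++per_thread->lock_count;
}

void proxy_unlock(ct_per_thread* per_thread) {
    assert(per_thread && per_thread->lock_depth > 0);
    --per_thread->lock_depth;
}

// Register once, then pass the pointer explicitly on the hot path.
unsigned long run_test(unsigned long iterations) {
    ct_per_thread* per_thread = proxy_register_thread();
    for (unsigned long i = 0; i < iterations; ++i) {
        proxy_lock(per_thread);
        // ... do your thing ...
        proxy_unlock(per_thread);
    }
    return per_thread->lock_count;
}
```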

    The dtor of ct_per_thread would set ct_g_per_thread to 0?

    For any registered thread, ct_g_per_thread is valid and can be accessed
    in any function it calls.

    Is that crap, or kind of crap?

    It should work okay.

    An object would have a dtor. Pointers to objects don't have dtors.
    I was using std::unique_ptr so that a dtor would run on the contained
    object pointer. Unfortunately, the runtime check on every access
    is too much. So now I have 2 thread locals:
    a unique_ptr<ref> to run the delete on the ref pointer, and
    a ref* to access the ref pointer w/o the runtime check.
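
    The two-thread-local arrangement might look like the sketch below.
    The ref type here is an empty stand-in, and tl_owner, tl_fast,
    get_ref, g_live_refs and use_on_new_thread are assumed names, not
    the ones used in smrproxy.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

std::atomic<int> g_live_refs{0};  // hypothetical instrumentation

struct ref {                      // stand-in for the real per-thread object
    ref()  { g_live_refs.fetch_add(1); }
    ~ref() { g_live_refs.fetch_sub(1); }
    unsigned long uses = 0;
};

thread_local std::unique_ptr<ref> tl_owner;   // owns the ref; deletes it at thread exit
thread_local ref* tl_fast = nullptr;          // raw pointer for the hot path

// Hot path reads only the raw pointer; the unique_ptr is touched once
// per thread, on the first call.
inline ref* get_ref() {
    ref* p = tl_fast;
    if (!p) {
        tl_owner = std::make_unique<ref>();
        p = tl_fast = tl_owner.get();
    }
    return p;
}

// Use the ref n times on a fresh thread and return its use count.
unsigned long use_on_new_thread(unsigned long n) {
    unsigned long result = 0;
    std::thread t([&] {
        for (unsigned long i = 0; i < n; ++i) ++get_ref()->uses;
        result = get_ref()->uses;
    });
    t.join();  // tl_owner's dtor has deleted the ref by now
    return result;
}
```

    The raw pointer dangles once tl_owner is destroyed, so nothing should
    call get_ref() from another thread_local's dtor during thread exit.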

    Joe Seigh
