• Atomic caching of smart pointers

    From Paavo Helde@21:1/5 to All on Sun Sep 15 21:54:50 2024
    I am thinking of developing some lock-free data structures for better
    scaling on multi-core hardware and avoiding potential deadlocks. In
    particular, I have got a lot of classes which are mostly immutable after construction, except for some cached data members which are calculated
    on demand only, then stored in the object for later use.

    Caching single numeric values is easy. However, some cached data is
    large and accessed via std::shared_ptr-style refcounted smart pointers.
    Updating such a smart pointer in a thread-shared object is a bit
    trickier. There is a std::atomic<std::shared_ptr> in C++20, but I
    wonder if I can do a bit better by providing my own implementation
    which uses CAS on a single pointer (instead of DCAS with additional
    data fields or other trickery).

    This is assuming that

    a) the cached value will not change any more after it is assigned, and
    will stay intact until the containing object is destroyed;

    b) it's ok if multiple threads calculate the value at the same time; the
    first one stored will be the one which gets used.

    My current prototype code is as follows (Ptr<T> is similar to
    std::shared_ptr<T>, but uses an internal atomic refcounter; the
    internal counter allows me to generate additional smart pointers from a
    raw pointer).

    template<typename T>
    class CachedAtomicPtr {
    public:
        CachedAtomicPtr(): ptr_(nullptr) {}

        /// Store p in *this if *this is not yet assigned.
        /// Return pointer stored in *this, which can be \a p or not.
        Ptr<T> AssignIfNull(Ptr<T> p) {
            const T* other = nullptr;
            if (ptr_.compare_exchange_weak(other, p.get(),
                    std::memory_order_release, std::memory_order_acquire)) {
                p->IncrementRefcount();
                return p;
            } else {
                // wrap in an extra smartptr (increments refcount)
                return Ptr<T>(other);
            }
        }

        /// Return pointer stored in *this.
        Ptr<T> Load() const {
            return Ptr<T>(ptr_);
        }

        ~CachedAtomicPtr() {
            if (const T* ptr = ptr_) {
                ptr->DecrementRefcount();
            }
        }
    private:
        std::atomic<const T*> ptr_;
    };

    Example usage:

    /// Objects of this class are in shared use by multiple threads.
    class A {
        // Returns B corresponding to the value of *this.
        // If not yet in cache, B is calculated and cached in *this.
        // Calculating can happen in multiple threads in parallel,
        // the first cached result will be used in all threads.
        Ptr<B> GetOrCalcB() const {
            Ptr<B> b = cached_.Load();
            if (!b) {
                b = cached_.AssignIfNull(CalcB());
            }
            return b;
        }
        // ...
    private:
        // Calculates cached B object according to value of *this.
        Ptr<B> CalcB() const;
    private:
        mutable CachedAtomicPtr<B> cached_;
        // ... own data ...
    };

    So, what do you think? Should I just use std::atomic<std::shared_ptr>
    instead? Any other suggestions? Did I get the memory order parameters
    right in compare_exchange_weak()?

    TIA
    Paavo

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Mon Sep 16 09:01:47 2024
    On 15.09.2024 23:15, Chris M. Thomasson wrote:
    On 9/15/2024 11:54 AM, Paavo Helde wrote:
    [...]
    So, what do you think? Should I just use std::atomic<std::shared_ptr>
    instead? Any other suggestions? Did I get the memory order parameters
    right in compare_exchange_weak()?

    Keep in mind that you need to make sure that
    std::atomic<std::shared_ptr> is actually lock-free...

    Make sure to investigate is_always_lock_free on your various archs:

    https://en.cppreference.com/w/cpp/atomic/atomic/is_always_lock_free

    I already checked this, it is returning false at least on one of my
    target platforms (Visual Studio 2022, Windows x86_64). IIRC Bonita
    claimed this might be a false negative though.
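
    For reference, a minimal sketch of that check (C++20; the constant is
    provided by the std::atomic<std::shared_ptr> specialization):

    #include <atomic>
    #include <iostream>
    #include <memory>

    int main() {
        std::cout << std::boolalpha
                  << std::atomic<std::shared_ptr<int>>::is_always_lock_free
                  << "\n";  // reports false on the VS2022/x86_64 setup above
    }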

    Meanwhile I think I found a bug in my posted code: I should probably
    use compare_exchange_strong() instead of compare_exchange_weak(). I
    somehow thought the latter would err on the safe side, but it does not.
    In my test harness there seems to be no difference.
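
    For reference, the failure mode: compare_exchange_weak() may fail
    spuriously even while ptr_ still holds nullptr, and on failure it
    copies the observed value (still nullptr) into 'other', so the else
    branch returns an empty Ptr<T> and GetOrCalcB() can hand a null pointer
    to its caller. A minimal sketch of a retry loop that would keep the
    _weak version safe (same Ptr<T> and memory orders as in the original
    post):

    Ptr<T> AssignIfNull(Ptr<T> p) {
        const T* other = nullptr;
        while (!ptr_.compare_exchange_weak(other, p.get(),
                std::memory_order_release, std::memory_order_acquire)) {
            if (other != nullptr) {
                // genuine failure: another thread installed a pointer first
                return Ptr<T>(other);
            }
            // spurious failure: 'other' is still nullptr, just retry
        }
        p->IncrementRefcount();
        return p;
    }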

  • From Muttley@dastardlyhq.com@21:1/5 to All on Mon Sep 16 07:26:44 2024
    On Sun, 15 Sep 2024 21:54:50 +0300
    Paavo Helde <eesnimi@osa.pri.ee> boringly babbled:
    So, what do you think? Should I just use std::atomic<std::shared_ptr>
    instead? Any other suggestions? Did I get the memory order parameters
    right in compare_exchange_weak()?

    Looks to me like you're simply protecting the pointers rather than the
    data they're actually pointing to, but I only skim-read it.

  • From Paavo Helde@21:1/5 to Muttley@dastardlyhq.com on Mon Sep 16 11:59:25 2024
    On 16.09.2024 10:26, Muttley@dastardlyhq.com wrote:
    On Sun, 15 Sep 2024 21:54:50 +0300
    Paavo Helde <eesnimi@osa.pri.ee> boringly babbled:
    So, what do you think? Should I just use std::atomic<std::shared_ptr>
    instead? Any other suggestions? Did I get the memory order parameters
    right in compare_exchange_weak()?

    Looks to me like you're simply protecting the pointers rather than the
    data they're actually pointing to, but I only skim-read it.

    You are right, at the moment I am interested in protecting the pointers.

    The pointed-to data will be immutable after initial creation (except
    for potential refcounter updates inside it, but the refcounters are
    atomic anyway, so no problems there).

  • From Marcel Mueller@21:1/5 to All on Mon Sep 16 18:23:26 2024
    On 16.09.24 08:01, Paavo Helde wrote:
    Meanwhile I think I found a bug in my posted code: I should probably
    use compare_exchange_strong() instead of compare_exchange_weak().

    Indeed. With ..._weak you most likely need a loop. But if you already
    have a loop for other purposes, using the _weak function can avoid an
    additional loop inside the implementation of the function.

    I somehow
    thought the latter would err on the safe side, but it does not. In my
    test harness there seems to be no difference.

    On many platforms the underlying CAS primitives are already strong. In
    that case the two functions are identical.


    Marcel

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Tue Sep 17 00:31:02 2024
    On 15.09.2024 23:13, Chris M. Thomasson wrote:
    On 9/15/2024 11:54 AM, Paavo Helde wrote:

    [...]


    I need to look at this when I get some time. Been very busy lately.
    Humm... Perhaps, when you get some free time to burn... Try to model it
    in Relacy and see what happens:

    https://www.1024cores.net/home/relacy-race-detector/rrd-introduction

    https://groups.google.com/g/relacy



    Took a look at that. It appears that to use Relacy I have to somehow
    translate my algorithm into Relacy's language, and this seems
    non-trivial. There is about zero documentation and zero code comments,
    and the examples do not compile with VS2022. Looks like to use it I
    would need to grow some 20 extra IQ points and spend a significant
    amount of time.

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Tue Sep 17 08:54:28 2024
    On 17.09.2024 00:46, Chris M. Thomasson wrote:
    [...]
    So as long as CachedAtomicPtr is alive, the cached pointer, the one that
    gets installed in your AssignIfNull function, will be alive?

    Correct. The `p->IncrementRefcount();` line will keep the assigned T
    object alive, regardless of how many other smartpointers there are
    pointing to it in any threads. The refcount is decremented in ~CachedAtomicPtr().

    Sorry if my question sounds stupid or something. Just trying to get a
    handle on your usage pattern. Also, the first pointer installed in
    CachedAtomicPtr will remain that way for the entire lifetime of said
    CachedAtomicPtr instance?

    Correct. Otherwise I would need a mutex, DCAS or some other trickery to
    avoid races.
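
    For comparison, a minimal mutex-based sketch of the same cache (a
    hypothetical LockedCachedPtr, assuming <mutex> and the same Ptr<T>);
    this is the serialization the lock-free version is meant to avoid:

    template<typename T>
    class LockedCachedPtr {
    public:
        Ptr<T> AssignIfNull(Ptr<T> p) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!ptr_) { ptr_ = p; }  // first caller wins
            return ptr_;
        }
        Ptr<T> Load() const {
            std::lock_guard<std::mutex> lock(mutex_);
            return ptr_;
        }
    private:
        mutable std::mutex mutex_;
        Ptr<T> ptr_;
    };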

    Thanks for the answer!
    Paavo

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Tue Sep 17 09:09:26 2024
    On 17.09.2024 00:50, Chris M. Thomasson wrote:
    [...]

    So as long as CachedAtomicPtr stays alive, the refcount on its
    successfully _installed_ pointer will always be +1. In other words, the
    count can drop to zero wrt its installed smart pointer only _after_
    CachedAtomicPtr has been destroyed and its dtor called? Am I getting
    close or WAY off in the damn weeds somewhere out there.... ;^) ? So
    once a smart pointer is installed in a CachedAtomicPtr, it will never
    change? Am I right, or totally wrong?

    Yes, all correct!

    Now I see I also need to forbid assignment in the CachedAtomicPtr class
    to avoid modifying the installed smartpointer, and add a proper copy
    ctor. But that's a bit beside the main functionality of installing the
    cached smartpointer in a lock-free manner.
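
    A sketch of those declarations, matching what the full code later in
    the thread ends up doing:

    // A copy is made for modifying the enclosing object, which would make
    // the cached value stale, so the copy starts out unassigned.
    CachedAtomicPtr(const CachedAtomicPtr&): ptr_(nullptr) {}

    // Modifying ptr_ during the object lifetime would create data races.
    CachedAtomicPtr(CachedAtomicPtr&&) = delete;
    CachedAtomicPtr& operator=(const CachedAtomicPtr&) = delete;
    CachedAtomicPtr& operator=(CachedAtomicPtr&&) = delete;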

    Thanks for the answer!
    Paavo

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Tue Sep 17 12:22:34 2024
    On 17.09.2024 09:04, Chris M. Thomasson wrote:
    On 9/16/2024 10:59 PM, Chris M. Thomasson wrote:
    On 9/16/2024 10:54 PM, Paavo Helde wrote:
    [...]
    Is Ptr<T> an intrusive reference count? I assume it is.

    Yes. Otherwise I could not generate new smartpointers from bare T*.
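
    A two-line illustration of the difference (hypothetical snippet): the
    intrusive count lives inside the pointed-to object, so any raw pointer
    can produce a new owner, which is exactly what Load() relies on:

    const T* raw = p.get();
    Ptr<T> again(raw);  // bumps the count inside *raw; with std::shared_ptr
                        // this would create a second control block and
                        // eventually a double delete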


    FYI, here is my current full compilable code together with a test
    harness (no Relacy, could not get it working, so this just creates a
    number of threads which make use of the CachedAtomicPtr objects in
    parallel).

    #include <atomic>
    #include <condition_variable>
    #include <cstddef>
    #include <cstdlib>
    #include <deque>
    #include <iostream>
    #include <mutex>
    #include <stdexcept>
    #include <thread>
    #include <typeinfo>
    #include <vector>

    /// debug instrumentation
    std::atomic<int> gAcount = 0, gBcount = 0, gCASFailureCount = 0;
    /// program exit code
    std::atomic<int> exitCode = EXIT_SUCCESS;

    void Assert(bool x) {
        if (!x) {
            throw std::logic_error("Assert failed");
        }
    }

    class RefCountedBase {
    public:
        RefCountedBase(): refcount_(0) {}
        RefCountedBase(const RefCountedBase&): refcount_(0) {}
        RefCountedBase(RefCountedBase&&) = delete;
        RefCountedBase& operator=(const RefCountedBase&) = delete;
        RefCountedBase& operator=(RefCountedBase&&) = delete;

        void Capture() const noexcept {
            ++refcount_;
        }
        void Release() const noexcept {
            if (--refcount_ == 0) {
                delete const_cast<RefCountedBase*>(this);
            }
        }
        virtual ~RefCountedBase() {}

    private:
        mutable std::atomic<std::size_t> refcount_;
    };

    template<class T>
    class Ptr {
    public:
        Ptr(): ptr_(nullptr) {}
        explicit Ptr(const T* ptr): ptr_(ptr) { if (ptr_) { ptr_->Capture(); } }
        Ptr(const Ptr& b): ptr_(b.ptr_) { if (ptr_) { ptr_->Capture(); } }
        Ptr(Ptr&& b) noexcept: ptr_(b.ptr_) { b.ptr_ = nullptr; }
        ~Ptr() { if (ptr_) { ptr_->Release(); } }
        Ptr& operator=(const Ptr& b) {
            if (b.ptr_) { b.ptr_->Capture(); }
            if (ptr_) { ptr_->Release(); }
            ptr_ = b.ptr_;
            return *this;
        }
        Ptr& operator=(Ptr&& b) noexcept {
            if (ptr_) { ptr_->Release(); }
            ptr_ = b.ptr_;
            b.ptr_ = nullptr;
            return *this;
        }
        const T* operator->() const { return ptr_; }
        const T& operator*() const { return *ptr_; }
        explicit operator bool() const { return ptr_ != nullptr; }
        const T* get() const { return ptr_; }
    private:
        mutable const T* ptr_;
    };

    template<typename T>
    class CachedAtomicPtr {
    public:
        CachedAtomicPtr(): ptr_(nullptr) {}

        /// Store p in *this if *this is not yet assigned.
        /// Return pointer stored in *this, which can be \a p or not.
        Ptr<T> AssignIfNull(Ptr<T> p) {
            const T* other = nullptr;
            if (ptr_.compare_exchange_strong(other, p.get(),
                    std::memory_order_release, std::memory_order_acquire)) {
                p->Capture();
                return p;
            } else {
                ++gCASFailureCount;
                return Ptr<T>(other);
            }
        }

        Ptr<T> Load() const {
            return Ptr<T>(ptr_);
        }

        ~CachedAtomicPtr() {
            if (const T* ptr = ptr_) {
                ptr->Release();
            }
        }

        /// A copy is made for modifying the enclosing object, which would
        /// make the cached value wrong, so we reset it to null on copying.
        CachedAtomicPtr(const CachedAtomicPtr&): ptr_(nullptr) {}

        /// Any standard operations modifying ptr_ during the object
        /// lifetime would create data races and need to be forbidden.
        CachedAtomicPtr(CachedAtomicPtr&&) = delete;
        CachedAtomicPtr& operator=(const CachedAtomicPtr&) = delete;
        CachedAtomicPtr& operator=(CachedAtomicPtr&&) = delete;

    private:
        std::atomic<const T*> ptr_;
    };

    class B: public RefCountedBase {
    public:
        static Ptr<B> Make(int data) {
            return Ptr<B>(new B(data));
        }
        int Data() const {
            return data_;
        }
        Ptr<B> Clone() const {
            return Ptr<B>(new B(*this));
        }
    private:
        B(int data): data_(data) { ++gBcount; }
        B(const B& other): data_(other.data_) { ++gBcount; }
        ~B() { data_ = -1; Assert(typeid(*this) == typeid(B)); --gBcount; }
    private:
        int data_;
    };

    class A: public RefCountedBase {
    public:
        static Ptr<A> Make(int data) {
            return Ptr<A>(new A(data));
        }
        Ptr<B> GetOrCalcB() const {
            Ptr<B> b = b_.Load();
            if (!b) {
                b = b_.AssignIfNull(CalcB());
            }
            if (!b) {
                Assert(false);
            }
            return b;
        }
        int Data() const {
            return data_;
        }
        Ptr<A> Clone() const {
            return Ptr<A>(new A(*this));
        }
    private:
        A(int data): data_(data) { ++gAcount; }
        A(const A& other): data_(other.data_) { ++gAcount; }
        ~A() { --gAcount; }
        Ptr<B> CalcB() const {
            return B::Make(2 * data_);
        }
    private:
        int data_;
        mutable CachedAtomicPtr<B> b_;
    };

    // Test harness

    class Queue {
    public:
        void Push(Ptr<A> a) {
            {
                std::unique_lock<std::mutex> lock{ mutex_ };
                deque_.push_back(a);
            }
            cond_.notify_all();
        }
        Ptr<A> Pop() {
            std::unique_lock<std::mutex> lock{ mutex_ };
            cond_.wait(lock, [this]() { return !deque_.empty(); });
            Ptr<A> a = deque_.front();
            deque_.pop_front();
            return a;
        }
    private:
        std::mutex mutex_;
        std::condition_variable cond_;
        std::deque<Ptr<A>> deque_;
    };

    void Test1(Ptr<A>&& a) {
        Ptr<B> b = a->GetOrCalcB();
        int m = a->Data();
        // Destroy a; b must remain alive.
        a = Ptr<A>();
        int k = b->Data();
        Assert(k == 2 * m);
    }

    void Test2(Ptr<A>&& a) {
        int m = a->Data();
        Ptr<A> a2 = a->Clone();
        int m2 = a2->Data();
        Assert(m == m2);
        Ptr<B> b = a->GetOrCalcB();
        Ptr<B> b2 = a2->GetOrCalcB();
        Ptr<B> b3 = a->Clone()->GetOrCalcB();
        a = Ptr<A>();

        Ptr<B> b4 = b;
        Ptr<B> b5 = b->Clone();
        int k = b->Data();
        Assert(k == 2 * m);
        int k2 = b2->Data();
        Assert(k2 == 2 * m);
        int k3 = b3->Data();
        Assert(k3 == 2 * m);
        int k4 = b5->Data();
        Assert(k4 == 2 * m);
        int k5 = b4->Data();
        Assert(k5 == 2 * m);
    }

    int main() {
        try {
            /// Number of parallel threads
            int N = 16;
            /// Number of iterations
            int n = 1000000;

            std::vector<Queue> queues(N);
            std::vector<std::thread> threads;
            for (int i = 0; i < N; ++i) {
                Queue& queue = queues[i];
                threads.emplace_back([&queue]() {
                    try {
                        while (true) {
                            Ptr<A> a = queue.Pop();
                            if (!a) {
                                break;
                            }
                            Test1(std::move(a));

                            a = queue.Pop();
                            if (!a) {
                                break;
                            }
                            Test2(std::move(a));
                        }
                    } catch (const std::exception& e) {
                        std::cerr << "Exception: " << e.what() << "\n";
                        exitCode = EXIT_FAILURE;
                    }
                });
            }

            for (int i = 0; i < n; ++i) {
                if (i % (n / 100) == 0) {
                    std::cout << i / (n / 100) << "%\n";
                }
                Ptr<A> a = A::Make(i);
                for (int j = 0; j < N; ++j) {
                    queues[j].Push(a);
                }
            }

            for (int j = 0; j < N; ++j) {
                queues[j].Push(Ptr<A>());
            }

            for (int j = 0; j < N; ++j) {
                threads[j].join();
            }

            std::cout << "n = " << n << "\n";
            std::cout << "Leaked gAcount = " << gAcount << "\n";
            std::cout << "Leaked gBcount = " << gBcount << "\n";
            std::cout << "gCASFailureCount = " << gCASFailureCount << " ("
                      << (gCASFailureCount * 100.0 / (n * N)) << "%)\n";

            if (gAcount || gBcount) {
                std::cerr << "Error: there were leaks or double destroys.\n";
                exitCode = EXIT_FAILURE;
            }
            if (gCASFailureCount == 0) {
                std::cerr << "Warning: no CAS failures, so could not test "
                             "the code fully (single-core machine?)\n";
            }
        } catch (const std::exception& e) {
            std::cerr << "Exception: " << e.what() << "\n";
            exitCode = EXIT_FAILURE;
        }
        return exitCode;
    }

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Thu Sep 26 13:14:02 2024
    On 26.09.2024 08:49, Chris M. Thomasson wrote:
    On 9/17/2024 2:22 AM, Paavo Helde wrote:
    On 17.09.2024 09:04, Chris M. Thomasson wrote:
    On 9/16/2024 10:59 PM, Chris M. Thomasson wrote:
    On 9/16/2024 10:54 PM, Paavo Helde wrote:
    [...]
    template<typename T>
    class CachedAtomicPtr {
    public:
        CachedAtomicPtr(): ptr_(nullptr) {}
        /// Store p in *this if *this is not yet assigned.
        /// Return pointer stored in *this, which can be \a p or not.
        Ptr<T> AssignIfNull(Ptr<T> p) {
            const T* other = nullptr;
            if (ptr_.compare_exchange_strong(other, p.get(),
                    std::memory_order_release, std::memory_order_acquire)) {
                p->Capture();

    Only one thread should ever get here, right? It just installed the
    pointer p.get() into ptr_, right?

    Yes, that's the idea. The first thread which manages to install a
    non-null pointer will increase the refcount; the others will fail and
    their objects will be released when their refcounts drop to zero.



                return p;
            } else {
                ++gCASFailureCount;
                return Ptr<T>(other);
            }
        }

        Ptr<T> Load() const {
            return Ptr<T>(ptr_);
        }

    Now this is the crux of a potential issue. Strong thread safety allows
    a thread to take a reference even if it does not already own one. This
    is not allowed under basic thread safety.

    So, for example this scenario needs strong thread safety:

    static atomic_ptr<foo> g_foo(nullptr);

    thread_a()
    {
        g_foo = new foo();
    }

    thread_b()
    {
        local_ptr<foo> l_foo = g_foo;
        if (l_foo) l_foo->bar();
    }


    thread_c()
    {
        g_foo = nullptr;
    }


    In my use case I do not have thread_c(), because nobody changes the
    pointer any more after it is set. It is kept alive while the
    CachedAtomicPtr object is alive. Load() will increment the refcounter,
    so the pointed-to B object will stay alive even if CachedAtomicPtr is
    destroyed. My test harness checks this scenario as well.


    This example does not work with shared_ptr, but should work with atomic<shared_ptr>, it should even be lock-free on archs that support
    it. thread_b is taking a reference to g_foo when it does not already own
    a reference.

    So, basically you would need your CachedAtomicPtr to stay alive. Its
    dtor should only be called after all threads that could potentially use
    it are joined, and the program is about to end.

    This is achieved automatically because all threads access the A object
    via their own smartpointers, so A (and its contained CachedAtomicPtr)
    stay alive while there is any thread which can access them. In
    particular, they stay alive during the Load() calls as the calling
    thread holds a smartpointer to A.


    Or else, I think you are
    going to need strong thread safety for the CachedAtomicPtr::Load
    function to work in a general sense.

    Just skimmed over it.
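
    For reference, the thread_b example written against C++20
    std::atomic<std::shared_ptr> (a sketch reusing the foo/g_foo names from
    the pseudo code above):

    #include <atomic>
    #include <memory>

    struct foo { void bar() {} };

    std::atomic<std::shared_ptr<foo>> g_foo;

    void thread_b() {
        // strong thread safety: taking a reference without already owning one
        std::shared_ptr<foo> l_foo = g_foo.load();
        if (l_foo) l_foo->bar();
    }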

    Thanks for the reply!
    Paavo

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Sat Sep 28 10:10:22 2024
    On 27.09.2024 03:24, Chris M. Thomasson wrote:
    On 9/26/2024 3:14 AM, Paavo Helde wrote:
    On 26.09.2024 08:49, Chris M. Thomasson wrote:
    On 9/17/2024 2:22 AM, Paavo Helde wrote:
    On 17.09.2024 09:04, Chris M. Thomasson wrote:
    On 9/16/2024 10:59 PM, Chris M. Thomasson wrote:
    On 9/16/2024 10:54 PM, Paavo Helde wrote:
    [...]
        Ptr<T> AssignIfNull(Ptr<T> p) {
            const T* other = nullptr;
            if (ptr_.compare_exchange_strong(other, p.get(),
                    std::memory_order_release, std::memory_order_acquire)) {
                p->Capture();
                ^^^^^^^^^^^^^


    Only one thread should ever get here, right? It just installed the
    pointer p.get() into ptr_, right?

    Yes, that's the idea. The first thread which manages to install a
    non-null pointer will increase the refcount; the others will fail and
    their objects will be released when their refcounts drop to zero.
    [...]

    Why do a Capture _after_ the pointer is atomically installed? Think of
    adding a reference in preparation for installation. If failed, it can decrement it. If it succeeded it leaves it alone.

    Yes, it could be done this way; it would just be two code lines instead
    of one, with the same end result.
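
    Sketched out with the Capture/Release names from the posted code, that
    variant would be:

    Ptr<T> AssignIfNull(Ptr<T> p) {
        p->Capture();  // take the cache's reference up front
        const T* other = nullptr;
        if (ptr_.compare_exchange_strong(other, p.get(),
                std::memory_order_release, std::memory_order_acquire)) {
            return p;  // installed; the extra reference now belongs to *this
        }
        p->Release();  // lost the race; give the reference back
        return Ptr<T>(other);
    }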



    <pseudo code>
    ___________________
    shared refcount<foo>* g_foo = nullptr;


    void thread_a()
    {
        // initialized with two references
        refcount<foo> local = new refcount<foo>(2);

        refcount<foo>* shared = CAS_STRONG(&g_foo, nullptr, local);

        if (shared)
        {
           // another thread beat us to it.

           local.dec(); // we dec because we failed to install...

           // well, we have a reference to shared and local... :^)
        }

        else
        {
            // well, shared is nullptr so we were the first thread!
        }
    }
    ___________________

    Sorry for all the questions. ;^o

    No-no! I'm happy to think over this code.

    I'm still a bit worried about the memory order parameters. I have not
    used them before, and I suspect they might not be distinguishable on
    the platforms which I need to support (x86_64 and ARM64, MSVC++ and
    g++), so I might not notice if they are wrong. Maybe I should just use
    the default memory_order_seq_cst if it does not make any difference on
    these platforms anyway?
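
    For the record, the seq_cst form would simply drop the explicit order
    arguments (a sketch; whether that costs anything on x86_64/ARM64 is
    exactly the open question above):

    Ptr<T> AssignIfNull(Ptr<T> p) {
        const T* other = nullptr;
        if (ptr_.compare_exchange_strong(other, p.get())) {  // seq_cst default
            p->Capture();
            return p;
        }
        return Ptr<T>(other);
    }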

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Sat Sep 28 10:28:01 2024
    On 27.09.2024 03:29, Chris M. Thomasson wrote:
    On 9/26/2024 5:24 PM, Chris M. Thomasson wrote:
    On 9/26/2024 3:14 AM, Paavo Helde wrote:
    On 26.09.2024 08:49, Chris M. Thomasson wrote:
    On 9/17/2024 2:22 AM, Paavo Helde wrote:
    On 17.09.2024 09:04, Chris M. Thomasson wrote:
    On 9/16/2024 10:59 PM, Chris M. Thomasson wrote:
    On 9/16/2024 10:54 PM, Paavo Helde wrote:
    [...]

    Why do a Capture _after_ the pointer is atomically installed? Think of
    adding a reference in preparation for installation. If failed, it can
    decrement it. If it succeeded it leaves it alone.

    [...]

    Now, actually, we have a reference to shared, but we should not
    decrement it here. We can rely on g_foo never being set to null
    prematurely wrt our program structure. When our program is shutting
    down, g_foo can be decremented if it's not nullptr. This would mean
    that successfully installing a pointer with a pre-count of 2 would
    work. Taking a reference would not need to increment anything, and
    releasing it would not need to decrement anything. The lifetime is tied
    to g_foo wrt installing a pointer into it. Is that close to what you
    are doing? Or way off in the damn weeds?

    This is close. However, in my program there would be no such "naked"
    global g_foo pointers. They would not be very useful, as they could
    never be changed during the program lifetime.

    Instead, each such pointer would be part of another dynamically
    allocated and refcounted object, which controls its lifetime. If I need
    to change anything, I would make a clone of the enclosing object
    (without cached attributes) and make the changes in single-threaded
    regime, before making the new object accessible to other threads. The
    cached attributes would then be recalculated (on the new data) on
    demand later.
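
    A short sketch of that clone-and-publish workflow, using A from the
    test harness above (CachedAtomicPtr's copy constructor deliberately
    resets the cache to null; PublishToOtherThreads is a hypothetical
    stand-in for whatever makes the object visible to other threads):

    Ptr<A> original = A::Make(42);
    Ptr<A> modified = original->Clone();  // the clone starts with an empty cache
    // ... adjust *modified here, still single-threaded ...
    PublishToOtherThreads(modified);      // readers recalc B via GetOrCalcB()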

    Cheers
    Paavo
