• Atomic caching of smart pointers

    From Paavo Helde@21:1/5 to All on Sun Sep 15 21:54:50 2024
    I am thinking of developing some lock-free data structures for better
    scaling on multi-core hardware and avoiding potential deadlocks. In
    particular, I have got a lot of classes which are mostly immutable after construction, except for some cached data members which are calculated
    on demand only, then stored in the object for later use.

    Caching single numeric values is easy. However, some cached data is
    large and accessed via std::shared_ptr-style refcounted smart pointers.
    Updating such a smart pointer in a thread-shared object is a bit
    trickier. There is a std::atomic<std::shared_ptr> in C++20, but I
    wonder if I can do a bit better by providing my own implementation
    which uses CAS on a single pointer (instead of DCAS with additional
    data fields or other trickery).

    This is assuming that

    a) the cached value will not change any more after it is assigned, and
    will stay intact until the containing object is destroyed;

    b) it's ok if multiple threads calculate the value at the same time; the
    first one stored will be the one which gets used.

    My current prototype code is as follows (Ptr<T> is similar to
    std::shared_ptr<T>, but uses an internal atomic refcounter; the
    internal counter allows me to generate additional smart pointers from a
    raw pointer).

    template<typename T>
    class CachedAtomicPtr {
    public:
        CachedAtomicPtr(): ptr_(nullptr) {}

        /// Store p in *this if *this is not yet assigned.
        /// Return pointer stored in *this, which can be \a p or not.
        Ptr<T> AssignIfNull(Ptr<T> p) {
            const T* other = nullptr;
            if (ptr_.compare_exchange_weak(other, p.get(),
                    std::memory_order_release, std::memory_order_acquire)) {
                p->IncrementRefcount();
                return p;
            } else {
                // wrap in an extra smartptr (increments refcount)
                return Ptr<T>(other);
            }
        }

        /// Return pointer stored in *this.
        Ptr<T> Load() const {
            return Ptr<T>(ptr_);
        }

        ~CachedAtomicPtr() {
            if (const T* ptr = ptr_) {
                ptr->DecrementRefcount();
            }
        }
    private:
        std::atomic<const T*> ptr_;
    };

    Example usage:

    /// Objects of this class are in shared use by multiple threads.
    class A {
        // Returns B corresponding to the value of *this.
        // If not yet in cache, B is calculated and cached in *this.
        // Calculating can happen in multiple threads in parallel,
        // the first cached result will be used in all threads.
        Ptr<B> GetOrCalcB() const {
            Ptr<B> b = cached_.Load();
            if (!b) {
                b = cached_.AssignIfNull(CalcB());
            }
            return b;
        }
        // ...
    private:
        // Calculates cached B object according to value of *this.
        Ptr<B> CalcB() const;
    private:
        mutable CachedAtomicPtr<B> cached_;
        // ... own data ...
    };

    So, what do you think? Should I just use std::atomic<std::shared_ptr>
    instead? Any other suggestions? Did I get the memory order parameters
    right in compare_exchange_weak()?

    TIA
    Paavo

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Mon Sep 16 09:01:47 2024
    On 15.09.2024 23:15, Chris M. Thomasson wrote:
    On 9/15/2024 11:54 AM, Paavo Helde wrote:
    [...]
    So, what do you think? Should I just use std::atomic<std::shared_ptr>
    instead? Any other suggestions? Did I get the memory order parameters
    right in compare_exchange_weak()?

    Keep in mind that you need to make sure that
    std::atomic<std::shared_ptr> is actually lock-free...

    Make sure to investigate is_always_lock_free on your various archs:

    https://en.cppreference.com/w/cpp/atomic/atomic/is_always_lock_free

    I already checked this, it is returning false at least on one of my
    target platforms (Visual Studio 2022, Windows x86_64). IIRC Bonita
    claimed this might be a false negative though.
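
    For reference, a minimal sketch of that check (C++20; the constant is
    provided by the std::atomic<std::shared_ptr> specialization):

    #include <atomic>
    #include <iostream>
    #include <memory>

    int main() {
        std::cout << std::boolalpha
                  << std::atomic<std::shared_ptr<int>>::is_always_lock_free
                  << "\n";  // reports false on the VS2022/x86_64 setup above
    }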

    Meanwhile I think I found a bug in my posted code: I should probably
    use compare_exchange_strong() instead of compare_exchange_weak(). I
    somehow thought the latter would err on the safe side, but it does not.
    In my test harness there seems to be no difference.
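
    For reference, the failure mode: compare_exchange_weak() may fail
    spuriously even while ptr_ still holds nullptr, and on failure it
    copies the observed value (still nullptr) into 'other', so the else
    branch returns an empty Ptr<T> and GetOrCalcB() can hand a null pointer
    to its caller. A minimal sketch of a retry loop that would keep the
    _weak version safe (same Ptr<T> and memory orders as in the original
    post):

    Ptr<T> AssignIfNull(Ptr<T> p) {
        const T* other = nullptr;
        while (!ptr_.compare_exchange_weak(other, p.get(),
                std::memory_order_release, std::memory_order_acquire)) {
            if (other != nullptr) {
                // genuine failure: another thread installed a pointer first
                return Ptr<T>(other);
            }
            // spurious failure: 'other' is still nullptr, just retry
        }
        p->IncrementRefcount();
        return p;
    }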

  • From Muttley@dastardlyhq.com@21:1/5 to All on Mon Sep 16 07:26:44 2024
    On Sun, 15 Sep 2024 21:54:50 +0300
    Paavo Helde <eesnimi@osa.pri.ee> boringly babbled:
    So, what do you think? Should I just use std::atomic<std::shared_ptr>
    instead? Any other suggestions? Did I get the memory order parameters
    right in compare_exchange_weak()?

    Looks to me like you're simply protecting the pointers rather than the
    data they're actually pointing to, but I only skim-read it.

  • From Paavo Helde@21:1/5 to Muttley@dastardlyhq.com on Mon Sep 16 11:59:25 2024
    On 16.09.2024 10:26, Muttley@dastardlyhq.com wrote:
    On Sun, 15 Sep 2024 21:54:50 +0300
    Paavo Helde <eesnimi@osa.pri.ee> boringly babbled:
    So, what do you think? Should I just use std::atomic<std::shared_ptr>
    instead? Any other suggestions? Did I get the memory order parameters
    right in compare_exchange_weak()?

    Looks to me like you're simply protecting the pointers rather than the
    data they're actually pointing to, but I only skim-read it.

    You are right, at the moment I am interested in protecting the pointers.

    The pointed-to data will be immutable after initial creation (except
    for potential refcounter updates inside it, but the refcounters are
    atomic anyway, so no problems there).

  • From Marcel Mueller@21:1/5 to All on Mon Sep 16 18:23:26 2024
    On 16.09.24 08:01, Paavo Helde wrote:
    Meanwhile I think I found a bug in my posted code: I should probably
    use compare_exchange_strong() instead of compare_exchange_weak().

    Indeed. With ..._weak you most likely need a loop. But if you already
    have a loop for other purposes, using the _weak function can avoid an
    additional loop inside the implementation of the function.

    I somehow
    thought the latter would err on the safe side, but it does not. In my
    test harness there seems to be no difference.

    On many platforms the underlying CAS primitives are already strong. In
    that case the two functions are identical.


    Marcel

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Tue Sep 17 00:31:02 2024
    On 15.09.2024 23:13, Chris M. Thomasson wrote:
    On 9/15/2024 11:54 AM, Paavo Helde wrote:

    [...]


    I need to look at this when I get some time. Been very busy lately.
    Humm... Perhaps, when you get some free time to burn... Try to model it
    in Relacy and see what happens:

    https://www.1024cores.net/home/relacy-race-detector/rrd-introduction

    https://groups.google.com/g/relacy



    Took a look at that. It appears that to use Relacy I have to somehow
    translate my algorithm into Relacy's language, and this seems
    non-trivial. There is about zero documentation and zero code comments,
    and the examples do not compile with VS2022. Looks like to use it I
    would need to grow some 20 extra IQ points and spend a significant
    amount of time.

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Tue Sep 17 08:54:28 2024
    On 17.09.2024 00:46, Chris M. Thomasson wrote:
    [...]
    So as long as CachedAtomicPtr is alive, the cached pointer, the one that
    gets installed in your AssignIfNull function, will be alive?

    Correct. The `p->IncrementRefcount();` line will keep the assigned T
    object alive, regardless of how many other smartpointers there are
    pointing to it in any threads. The refcount is decremented in ~CachedAtomicPtr().

    Sorry if my question sounds stupid or something. Just trying to get a
    handle on your usage pattern. Also, the first pointer installed in
    CachedAtomicPtr will remain that way for the entire lifetime of said
    CachedAtomicPtr instance?

    Correct. Otherwise I would need a mutex, DCAS or some other trickery to
    avoid races.
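
    For comparison, a minimal mutex-based sketch of the same cache (a
    hypothetical LockedCachedPtr, assuming <mutex> and the same Ptr<T>);
    this is the serialization the lock-free version is meant to avoid:

    template<typename T>
    class LockedCachedPtr {
    public:
        Ptr<T> AssignIfNull(Ptr<T> p) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!ptr_) { ptr_ = p; }  // first caller wins
            return ptr_;
        }
        Ptr<T> Load() const {
            std::lock_guard<std::mutex> lock(mutex_);
            return ptr_;
        }
    private:
        mutable std::mutex mutex_;
        Ptr<T> ptr_;
    };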

    Thanks for the answer!
    Paavo

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Tue Sep 17 09:09:26 2024
    On 17.09.2024 00:50, Chris M. Thomasson wrote:
    [...]

    So as long as CachedAtomicPtr stays alive, the refcount on its
    successfully _installed_ pointer will always be +1. In other words, the
    count can drop to zero wrt its installed smart pointer only _after_
    CachedAtomicPtr has been destroyed and its dtor called? Am I getting
    close or WAY off in the damn weeds somewhere out there.... ;^) ? So
    once a smart pointer is installed in a CachedAtomicPtr, it will never
    change? Am I right, or totally wrong?

    Yes, all correct!

    Now I see I also need to forbid assignment in the CachedAtomicPtr class
    to avoid modifying the installed smartpointer, and add a proper copy
    ctor. But that's a bit beside the main functionality of installing the
    cached smartpointer in a lock-free manner.
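
    A sketch of those declarations, matching what the full code later in
    the thread ends up doing:

    // A copy is made for modifying the enclosing object, which would make
    // the cached value stale, so the copy starts out unassigned.
    CachedAtomicPtr(const CachedAtomicPtr&): ptr_(nullptr) {}

    // Modifying ptr_ during the object lifetime would create data races.
    CachedAtomicPtr(CachedAtomicPtr&&) = delete;
    CachedAtomicPtr& operator=(const CachedAtomicPtr&) = delete;
    CachedAtomicPtr& operator=(CachedAtomicPtr&&) = delete;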

    Thanks for the answer!
    Paavo

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Tue Sep 17 12:22:34 2024
    On 17.09.2024 09:04, Chris M. Thomasson wrote:
    On 9/16/2024 10:59 PM, Chris M. Thomasson wrote:
    On 9/16/2024 10:54 PM, Paavo Helde wrote:
    [...]
    Is Ptr<T> an intrusive reference count? I assume it is.

    Yes. Otherwise I could not generate new smartpointers from bare T*.
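
    A two-line illustration of the difference (hypothetical snippet): the
    intrusive count lives inside the pointed-to object, so any raw pointer
    can produce a new owner, which is exactly what Load() relies on:

    const T* raw = p.get();
    Ptr<T> again(raw);  // bumps the count inside *raw; with std::shared_ptr
                        // this would create a second control block and
                        // eventually a double delete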


    FYI, here is my current full compilable code together with a test
    harness (no Relacy, could not get it working, so this just creates a
    number of threads which make use of the CachedAtomicPtr objects in
    parallel).

    #include <atomic>
    #include <condition_variable>
    #include <cstddef>
    #include <cstdlib>
    #include <deque>
    #include <iostream>
    #include <mutex>
    #include <stdexcept>
    #include <thread>
    #include <typeinfo>
    #include <vector>

    /// debug instrumentation
    std::atomic<int> gAcount = 0, gBcount = 0, gCASFailureCount = 0;
    /// program exit code
    std::atomic<int> exitCode = EXIT_SUCCESS;

    void Assert(bool x) {
        if (!x) {
            throw std::logic_error("Assert failed");
        }
    }

    class RefCountedBase {
    public:
        RefCountedBase(): refcount_(0) {}
        RefCountedBase(const RefCountedBase&): refcount_(0) {}
        RefCountedBase(RefCountedBase&&) = delete;
        RefCountedBase& operator=(const RefCountedBase&) = delete;
        RefCountedBase& operator=(RefCountedBase&&) = delete;

        void Capture() const noexcept {
            ++refcount_;
        }
        void Release() const noexcept {
            if (--refcount_ == 0) {
                delete const_cast<RefCountedBase*>(this);
            }
        }
        virtual ~RefCountedBase() {}

    private:
        mutable std::atomic<std::size_t> refcount_;
    };

    template<class T>
    class Ptr {
    public:
        Ptr(): ptr_(nullptr) {}
        explicit Ptr(const T* ptr): ptr_(ptr) { if (ptr_) { ptr_->Capture(); } }
        Ptr(const Ptr& b): ptr_(b.ptr_) { if (ptr_) { ptr_->Capture(); } }
        Ptr(Ptr&& b) noexcept: ptr_(b.ptr_) { b.ptr_ = nullptr; }
        ~Ptr() { if (ptr_) { ptr_->Release(); } }
        Ptr& operator=(const Ptr& b) {
            if (b.ptr_) { b.ptr_->Capture(); }
            if (ptr_) { ptr_->Release(); }
            ptr_ = b.ptr_;
            return *this;
        }
        Ptr& operator=(Ptr&& b) noexcept {
            if (ptr_) { ptr_->Release(); }
            ptr_ = b.ptr_;
            b.ptr_ = nullptr;
            return *this;
        }
        const T* operator->() const { return ptr_; }
        const T& operator*() const { return *ptr_; }
        explicit operator bool() const { return ptr_ != nullptr; }
        const T* get() const { return ptr_; }
    private:
        mutable const T* ptr_;
    };

    template<typename T>
    class CachedAtomicPtr {
    public:
        CachedAtomicPtr(): ptr_(nullptr) {}

        /// Store p in *this if *this is not yet assigned.
        /// Return pointer stored in *this, which can be \a p or not.
        Ptr<T> AssignIfNull(Ptr<T> p) {
            const T* other = nullptr;
            if (ptr_.compare_exchange_strong(other, p.get(),
                    std::memory_order_release, std::memory_order_acquire)) {
                p->Capture();
                return p;
            } else {
                ++gCASFailureCount;
                return Ptr<T>(other);
            }
        }

        Ptr<T> Load() const {
            return Ptr<T>(ptr_);
        }

        ~CachedAtomicPtr() {
            if (const T* ptr = ptr_) {
                ptr->Release();
            }
        }

        /// A copy is made for modifying the enclosing object, which would
        /// make the cached value wrong, so we reset it to null on copying.
        CachedAtomicPtr(const CachedAtomicPtr&): ptr_(nullptr) {}

        /// Any standard operations modifying ptr_ during the object
        /// lifetime would create data races and need to be forbidden.
        CachedAtomicPtr(CachedAtomicPtr&&) = delete;
        CachedAtomicPtr& operator=(const CachedAtomicPtr&) = delete;
        CachedAtomicPtr& operator=(CachedAtomicPtr&&) = delete;

    private:
        std::atomic<const T*> ptr_;
    };

    class B: public RefCountedBase {
    public:
        static Ptr<B> Make(int data) {
            return Ptr<B>(new B(data));
        }
        int Data() const {
            return data_;
        }
        Ptr<B> Clone() const {
            return Ptr<B>(new B(*this));
        }
    private:
        B(int data): data_(data) { ++gBcount; }
        B(const B& other): data_(other.data_) { ++gBcount; }
        ~B() { data_ = -1; Assert(typeid(*this) == typeid(B)); --gBcount; }
    private:
        int data_;
    };

    class A: public RefCountedBase {
    public:
        static Ptr<A> Make(int data) {
            return Ptr<A>(new A(data));
        }
        Ptr<B> GetOrCalcB() const {
            Ptr<B> b = b_.Load();
            if (!b) {
                b = b_.AssignIfNull(CalcB());
            }
            if (!b) {
                Assert(false);
            }
            return b;
        }
        int Data() const {
            return data_;
        }
        Ptr<A> Clone() const {
            return Ptr<A>(new A(*this));
        }
    private:
        A(int data): data_(data) { ++gAcount; }
        A(const A& other): data_(other.data_) { ++gAcount; }
        ~A() { --gAcount; }
        Ptr<B> CalcB() const {
            return B::Make(2 * data_);
        }
    private:
        int data_;
        mutable CachedAtomicPtr<B> b_;
    };

    // Test harness

    class Queue {
    public:
        void Push(Ptr<A> a) {
            {
                std::unique_lock<std::mutex> lock{ mutex_ };
                deque_.push_back(a);
            }
            cond_.notify_all();
        }
        Ptr<A> Pop() {
            std::unique_lock<std::mutex> lock{ mutex_ };
            cond_.wait(lock, [this]() { return !deque_.empty(); });
            Ptr<A> a = deque_.front();
            deque_.pop_front();
            return a;
        }
    private:
        std::mutex mutex_;
        std::condition_variable cond_;
        std::deque<Ptr<A>> deque_;
    };

    void Test1(Ptr<A>&& a) {
        Ptr<B> b = a->GetOrCalcB();
        int m = a->Data();
        // Destroy a; b must remain alive.
        a = Ptr<A>();
        int k = b->Data();
        Assert(k == 2 * m);
    }

    void Test2(Ptr<A>&& a) {
        int m = a->Data();
        Ptr<A> a2 = a->Clone();
        int m2 = a2->Data();
        Assert(m == m2);
        Ptr<B> b = a->GetOrCalcB();
        Ptr<B> b2 = a2->GetOrCalcB();
        Ptr<B> b3 = a->Clone()->GetOrCalcB();
        a = Ptr<A>();

        Ptr<B> b4 = b;
        Ptr<B> b5 = b->Clone();
        int k = b->Data();
        Assert(k == 2 * m);
        int k2 = b2->Data();
        Assert(k2 == 2 * m);
        int k3 = b3->Data();
        Assert(k3 == 2 * m);
        int k4 = b5->Data();
        Assert(k4 == 2 * m);
        int k5 = b4->Data();
        Assert(k5 == 2 * m);
    }

    int main() {
        try {
            /// Number of parallel threads
            int N = 16;
            /// Number of iterations
            int n = 1000000;

            std::vector<Queue> queues(N);
            std::vector<std::thread> threads;
            for (int i = 0; i < N; ++i) {
                Queue& queue = queues[i];
                threads.emplace_back([&queue]() {
                    try {
                        while (true) {
                            Ptr<A> a = queue.Pop();
                            if (!a) {
                                break;
                            }
                            Test1(std::move(a));

                            a = queue.Pop();
                            if (!a) {
                                break;
                            }
                            Test2(std::move(a));
                        }
                    } catch (const std::exception& e) {
                        std::cerr << "Exception: " << e.what() << "\n";
                        exitCode = EXIT_FAILURE;
                    }
                });
            }

            for (int i = 0; i < n; ++i) {
                if (i % (n / 100) == 0) {
                    std::cout << i / (n / 100) << "%\n";
                }
                Ptr<A> a = A::Make(i);
                for (int j = 0; j < N; ++j) {
                    queues[j].Push(a);
                }
            }

            for (int j = 0; j < N; ++j) {
                queues[j].Push(Ptr<A>());
            }

            for (int j = 0; j < N; ++j) {
                threads[j].join();
            }

            std::cout << "n = " << n << "\n";
            std::cout << "Leaked gAcount = " << gAcount << "\n";
            std::cout << "Leaked gBcount = " << gBcount << "\n";
            std::cout << "gCASFailureCount = " << gCASFailureCount << " ("
                      << (gCASFailureCount * 100.0 / (n * N)) << "%)\n";

            if (gAcount || gBcount) {
                std::cerr << "Error: there were leaks or double destroys.\n";
                exitCode = EXIT_FAILURE;
            }
            if (gCASFailureCount == 0) {
                std::cerr << "Warning: no CAS failures, so could not test "
                             "the code fully (single-core machine?)\n";
            }
        } catch (const std::exception& e) {
            std::cerr << "Exception: " << e.what() << "\n";
            exitCode = EXIT_FAILURE;
        }
        return exitCode;
    }

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Thu Sep 26 13:14:02 2024
    On 26.09.2024 08:49, Chris M. Thomasson wrote:
    On 9/17/2024 2:22 AM, Paavo Helde wrote:
    On 17.09.2024 09:04, Chris M. Thomasson wrote:
    On 9/16/2024 10:59 PM, Chris M. Thomasson wrote:
    On 9/16/2024 10:54 PM, Paavo Helde wrote:
    [...]
    template<typename T>
    class CachedAtomicPtr {
    public:
        CachedAtomicPtr(): ptr_(nullptr) {}
        /// Store p in *this if *this is not yet assigned.
        /// Return pointer stored in *this, which can be \a p or not.
        Ptr<T> AssignIfNull(Ptr<T> p) {
            const T* other = nullptr;
            if (ptr_.compare_exchange_strong(other, p.get(),
                    std::memory_order_release, std::memory_order_acquire)) {
                p->Capture();

    Only one thread should ever get here, right? It just installed the
    pointer p.get() into ptr_, right?

    Yes, that's the idea. The first thread which manages to install a
    non-null pointer will increase the refcount; the others will fail and
    their objects will be released when their refcounts drop to zero.



                return p;
            } else {
                ++gCASFailureCount;
                return Ptr<T>(other);
            }
        }

        Ptr<T> Load() const {
            return Ptr<T>(ptr_);
        }

    Now this is the crux of a potential issue. Strong thread safety allows
    a thread to take a reference even if it does not already own one. This
    is not allowed under basic thread safety.

    So, for example this scenario needs strong thread safety:

    static atomic_ptr<foo> g_foo(nullptr);

    thread_a()
    {
        g_foo = new foo();
    }

    thread_b()
    {
        local_ptr<foo> l_foo = g_foo;
        if (l_foo) l_foo->bar();
    }


    thread_c()
    {
        g_foo = nullptr;
    }


    In my use case I do not have thread_c(), because nobody changes the
    pointer any more after it is set. It is kept alive while the
    CachedAtomicPtr object is alive. Load() will increment the refcounter,
    so the pointed-to B object will stay alive even if CachedAtomicPtr is
    destroyed. My test harness checks this scenario as well.


    This example does not work with shared_ptr, but should work with atomic<shared_ptr>, it should even be lock-free on archs that support
    it. thread_b is taking a reference to g_foo when it does not already own
    a reference.

    So, basically you would need your CachedAtomicPtr to stay alive. Its
    dtor should only be called after all threads that could potentially use
    it are joined, and the program is about to end.

    This is achieved automatically because all threads access the A object
    via their own smartpointers, so A (and its contained CachedAtomicPtr)
    stay alive while there is any thread which can access them. In
    particular, they stay alive during the Load() calls as the calling
    thread holds a smartpointer to A.


    Or else, I think you are
    going to need strong thread safety for the CachedAtomicPtr::Load
    function to work in a general sense.

    Just skimmed over it.
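
    For reference, the thread_b example written against C++20
    std::atomic<std::shared_ptr> (a sketch reusing the foo/g_foo names from
    the pseudo code above):

    #include <atomic>
    #include <memory>

    struct foo { void bar() {} };

    std::atomic<std::shared_ptr<foo>> g_foo;

    void thread_b() {
        // strong thread safety: taking a reference without already owning one
        std::shared_ptr<foo> l_foo = g_foo.load();
        if (l_foo) l_foo->bar();
    }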

    Thanks for the reply!
    Paavo

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Sat Sep 28 10:10:22 2024
    On 27.09.2024 03:24, Chris M. Thomasson wrote:
    On 9/26/2024 3:14 AM, Paavo Helde wrote:
    On 26.09.2024 08:49, Chris M. Thomasson wrote:
    On 9/17/2024 2:22 AM, Paavo Helde wrote:
    On 17.09.2024 09:04, Chris M. Thomasson wrote:
    On 9/16/2024 10:59 PM, Chris M. Thomasson wrote:
    On 9/16/2024 10:54 PM, Paavo Helde wrote:
    [...]
        Ptr<T> AssignIfNull(Ptr<T> p) {
            const T* other = nullptr;
            if (ptr_.compare_exchange_strong(other, p.get(),
                    std::memory_order_release, std::memory_order_acquire)) {
                p->Capture();
                ^^^^^^^^^^^^^


    Only one thread should ever get here, right? It just installed the
    pointer p.get() into ptr_, right?

    Yes, that's the idea. The first thread which manages to install a
    non-null pointer will increase the refcount; the others will fail and
    their objects will be released when their refcounts drop to zero.
    [...]

    Why do a Capture _after_ the pointer is atomically installed? Think of
    adding a reference in preparation for installation. If failed, it can decrement it. If it succeeded it leaves it alone.

    Yes, it could be done this way; it would just be two code lines instead
    of one, with the same end result.
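
    Sketched out with the Capture/Release names from the posted code, that
    variant would be:

    Ptr<T> AssignIfNull(Ptr<T> p) {
        p->Capture();  // take the cache's reference up front
        const T* other = nullptr;
        if (ptr_.compare_exchange_strong(other, p.get(),
                std::memory_order_release, std::memory_order_acquire)) {
            return p;  // installed; the extra reference now belongs to *this
        }
        p->Release();  // lost the race; give the reference back
        return Ptr<T>(other);
    }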



    <pseudo code>
    ___________________
    shared refcount<foo>* g_foo = nullptr;


    void thread_a()
    {
        // initialized with two references
        refcount<foo> local = new refcount<foo>(2);

        refcount<foo>* shared = CAS_STRONG(&g_foo, nullptr, local);

        if (shared)
        {
           // another thread beat us to it.

           local.dec(); // we dec because we failed to install...

           // well, we have a reference to shared and local... :^)
        }

        else
        {
            // well, shared is nullptr so we were the first thread!
        }
    }
    ___________________

    Sorry for all the questions. ;^o

    No-no! I'm happy to think over this code.

    I'm still a bit worried about the memory order parameters. I have not
    used them before, and I suspect they might not be distinguishable on
    the platforms which I need to support (x86_64 and ARM64, MSVC++ and
    g++), so I might not notice if they are wrong. Maybe I should just use
    the default memory_order_seq_cst if it does not make any difference on
    these platforms anyway?
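
    For the record, the seq_cst form would simply drop the explicit order
    arguments (a sketch; whether that costs anything on x86_64/ARM64 is
    exactly the open question above):

    Ptr<T> AssignIfNull(Ptr<T> p) {
        const T* other = nullptr;
        if (ptr_.compare_exchange_strong(other, p.get())) {  // seq_cst default
            p->Capture();
            return p;
        }
        return Ptr<T>(other);
    }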

  • From Paavo Helde@21:1/5 to Chris M. Thomasson on Sat Sep 28 10:28:01 2024
    On 27.09.2024 03:29, Chris M. Thomasson wrote:
    On 9/26/2024 5:24 PM, Chris M. Thomasson wrote:
    On 9/26/2024 3:14 AM, Paavo Helde wrote:
    On 26.09.2024 08:49, Chris M. Thomasson wrote:
    On 9/17/2024 2:22 AM, Paavo Helde wrote:
    On 17.09.2024 09:04, Chris M. Thomasson wrote:
    On 9/16/2024 10:59 PM, Chris M. Thomasson wrote:
    On 9/16/2024 10:54 PM, Paavo Helde wrote:
    [...]

    Why do a Capture _after_ the pointer is atomically installed? Think of
    adding a reference in preparation for installation. If failed, it can
    decrement it. If it succeeded it leaves it alone.

    [...]

    Now, actually, we have a reference to shared, but we should not
    decrement it here. We can rely on g_foo never being set to null
    prematurely wrt our program structure. When our program is shutting
    down, g_foo can be decremented if it's not nullptr. This would mean
    that successfully installing a pointer with a pre-count of 2 would
    work. Taking a reference would not need to increment anything, and
    releasing it would not need to decrement anything. The lifetime is tied
    to g_foo wrt installing a pointer into it. Is that close to what you
    are doing? Or way off in the damn weeds?

    This is close. However, in my program there would be no such "naked"
    global g_foo pointers. They would not be very useful, as they could
    never be changed during the program lifetime.

    Instead, each such pointer would be part of another dynamically
    allocated and refcounted object, which controls its lifetime. If I need
    to change anything, I would make a clone of the enclosing object
    (without cached attributes) and make the changes in single-threaded
    regime, before making the new object accessible to other threads. The
    cached attributes would then be recalculated (on the new data) on
    demand later.
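
    A short sketch of that clone-and-publish workflow, using A from the
    test harness above (CachedAtomicPtr's copy constructor deliberately
    resets the cache to null; PublishToOtherThreads is a hypothetical
    stand-in for whatever makes the object visible to other threads):

    Ptr<A> original = A::Make(42);
    Ptr<A> modified = original->Clone();  // the clone starts with an empty cache
    // ... adjust *modified here, still single-threaded ...
    PublishToOtherThreads(modified);      // readers recalc B via GetOrCalcB()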

    Cheers
    Paavo
