A while ago I chose to use a deque that is shared between two threads. I did so because the docs say:
"Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.”
(https://docs.python.org/3.11/library/collections.html?highlight=deque#collections.deque)
Earlier today, looking through some server logs I noticed that from time to I’m getting a
RuntimeError: deque mutated during iteration
I guess this surprised me. When I see “thread safe”, I don’t expect to get errors.
... if item % 2: q.append(item * 2)from collections import deque
q = deque([1, 2, 3, 4])
for item in q:
"Deques support thread-safe, memory efficient appends and pops from
either side of the deque with approximately the same O(1) performance
in either direction.”
(https://docs.python.org/3.11/library/collections.html?highlight=deque#collections.deque)
I guess this surprised me. When I see “thread safe”, I don’t expect to get errors.
A while ago I chose to use a deque that is shared between two threads. I did so because the docs say:
"Deques support thread-safe, memory efficient appends and pops from
either side of the deque with approximately the same O(1)
performance in either direction.”
(https://docs.python.org/3.11/library/collections.html?highlight=deque#collections.deque)
Earlier today, looking through some server logs I noticed that from
time to I’m getting a
RuntimeError: deque mutated during iteration
I guess this surprised me. When I see “thread safe”, I don’t expect
to get errors.
On 28/03/23 2:25 pm, Travis Griggs wrote:than a median?
Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone
It may be a matter of whether the GIL is held or not. I had a look
at the source for deque, and it doesn't seem to explicitly do
anything about locking, it just relies on the GIL.
So maybe statistics.median() is implemented in C and statistics.mean()
in Python, or something like that?
Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone thana median?
On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list <python-list@python.org> wrote:than a median?
On 28/03/23 2:25 pm, Travis Griggs wrote:
Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone
Sorry for any injected confusion here, but that line "data =It may be a matter of whether the GIL is held or not. I had a lookBoth functions are implemented in Python, but median() starts out with
at the source for deque, and it doesn't seem to explicitly do
anything about locking, it just relies on the GIL.
So maybe statistics.median() is implemented in C and statistics.mean()
in Python, or something like that?
this notable line:
data = sorted(data)
which gives back a copy, iterated over rapidly in C. All subsequent
work is done on that copy.
The same effect could be had with mean() by taking a snapshot using
list(q) and, I believe, would have the same effect (the source code
for the sorted() function begins by calling PySequence_List).
In any case, it makes *conceptual* sense to do your analysis on a copy
of the queue, thus ensuring that your stats are stable. The other
threads can keep going while you do your calculations, even if that
means changing the queue.
ChrisA
data = sorted(data)
Sorry for any injected confusion here, but that line "data =
sorted(data)" appears as though it takes the value of the variable named _data_, sorts it and returns it to the same variable store, so no copy
would be created. Am I missing something there?
On 3/29/23 02:08, Chris Angelico wrote:than a median?
On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list <python-list@python.org> wrote:
On 28/03/23 2:25 pm, Travis Griggs wrote:
Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone
It may be a matter of whether the GIL is held or not. I had a lookBoth functions are implemented in Python, but median() starts out with
at the source for deque, and it doesn't seem to explicitly do
anything about locking, it just relies on the GIL.
So maybe statistics.median() is implemented in C and statistics.mean()
in Python, or something like that?
this notable line:
data = sorted(data)
which gives back a copy, iterated over rapidly in C. All subsequent
work is done on that copy.
The same effect could be had with mean() by taking a snapshot using
list(q) and, I believe, would have the same effect (the source code
for the sorted() function begins by calling PySequence_List).
In any case, it makes *conceptual* sense to do your analysis on a copy
of the queue, thus ensuring that your stats are stable. The other
threads can keep going while you do your calculations, even if that
means changing the queue.
ChrisASorry for any injected confusion here, but that line "data =
sorted(data)" appears as though it takes the value of the variable named _data_, sorts it and returns it to the same variable store, so no copy
would be created. Am I missing something there?
Sorry for any injected confusion here, but that line "data =
sorted(data)" appears as though it takes the value of the variable named >_data_, sorts it and returns it to the same variable store, so no copy
would be created. Am I missing something there?
On Thu, 30 Mar 2023 at 01:52, Jack Dangler <tdldev@gmail.com> wrote:than a median?
On 3/29/23 02:08, Chris Angelico wrote:
On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list
<python-list@python.org> wrote:
On 28/03/23 2:25 pm, Travis Griggs wrote:
Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone
Aah - thanks, Chris! That makes much more sense.The variable name "data" is the parameter to median(), so it'sSorry for any injected confusion here, but that line "data =It may be a matter of whether the GIL is held or not. I had a lookBoth functions are implemented in Python, but median() starts out with
at the source for deque, and it doesn't seem to explicitly do
anything about locking, it just relies on the GIL.
So maybe statistics.median() is implemented in C and statistics.mean() >>>> in Python, or something like that?
this notable line:
data = sorted(data)
which gives back a copy, iterated over rapidly in C. All subsequent
work is done on that copy.
The same effect could be had with mean() by taking a snapshot using
list(q) and, I believe, would have the same effect (the source code
for the sorted() function begins by calling PySequence_List).
In any case, it makes *conceptual* sense to do your analysis on a copy
of the queue, thus ensuring that your stats are stable. The other
threads can keep going while you do your calculations, even if that
means changing the queue.
ChrisA
sorted(data)" appears as though it takes the value of the variable named
_data_, sorts it and returns it to the same variable store, so no copy
would be created. Am I missing something there?
whatever you ask for the median of. (I didn't make that obvious in my previous post - an excess of brevity on my part.)
The sorted() function, UNlike list.sort(), returns a sorted copy of
what it's given. I delved into the CPython source code for that, and
it begins with the PySequence_List call to (effectively) call
list(data) to get a copy of it. It ought to be a thread-safe copy due
to holding the GIL the entire time. I'm not sure what would happen in
a GIL-free world but most likely the lock on the input object would
still ensure thread safety.
ChrisA
I'm not sure what would happen in
a GIL-free world but most likely the lock on the input object would
still ensure thread safety.
On 30/03/23 6:13 am, Chris Angelico wrote:
I'm not sure what would happen in
a GIL-free world but most likely the lock on the input object would
still ensure thread safety.
In a GIL-free world, I would not expect deque to hold a lock
the entire time that something was iterating over it. That
would require holding the lock as long as an iterator object
existed referencing it, which could be a long time, even
longer than the caller expects (no reference counting,
remember!)
So for future-proofing I would recommend using deque's
copy() method to copy it before doing anything that iterates
over it. Hopefully that would be implemented in a thread-safe
way (although the docs don't currently promise that).
Sysop: | Keyop |
---|---|
Location: | Huddersfield, West Yorkshire, UK |
Users: | 546 |
Nodes: | 16 (2 / 14) |
Uptime: | 150:57:21 |
Calls: | 10,383 |
Files: | 14,054 |
Messages: | 6,417,797 |