• Re: What kind of "thread safe" are deque's actually?

    From Chris Angelico@21:1/5 to Travis Griggs on Tue Mar 28 12:35:34 2023
    On Tue, 28 Mar 2023 at 12:26, Travis Griggs <travisgriggs@gmail.com> wrote:

    A while ago I chose to use a deque that is shared between two threads. I did so because the docs say:

    "Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.”

    (https://docs.python.org/3.11/library/collections.html?highlight=deque#collections.deque)

    Earlier today, looking through some server logs I noticed that from time to I’m getting a

    RuntimeError: deque mutated during iteration

    I guess this surprised me. When I see “thread safe”, I don’t expect to get errors.


    I'd like to see your code, but here's an example with no threads
    whatsoever that has the same error:

    from collections import deque
    q = deque([1, 2, 3, 4])
    for item in q:
    ... if item % 2: q.append(item * 2)
    ... print(item)
    ...
    1
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    RuntimeError: deque mutated during iteration

    This error comes from having an active iterator, then mutating the
    deque, then continuing to iterate. That MAY be a result of threading,
    but it isn't necessarily.

    For threaded usage, I would recommend restricting yourself to append/appendleft/pop/popleft (the standard mutators), with any
    operations on the entire queue being done on a copy instead (either
    q.copy() or list(q) depending on what you need). The act of taking a
    copy should itself be thread-safe, and obviously anything done on a
    separate copy will be independent of changes to the original.

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From 2QdxY4RzWzUUiLuE@potatochowder.com@21:1/5 to Travis Griggs on Mon Mar 27 21:41:13 2023
    On 2023-03-27 at 18:25:01 -0700,
    Travis Griggs <travisgriggs@gmail.com> wrote:

    "Deques support thread-safe, memory efficient appends and pops from
    either side of the deque with approximately the same O(1) performance
    in either direction.”

    (https://docs.python.org/3.11/library/collections.html?highlight=deque#collections.deque)

    [...]

    I guess this surprised me. When I see “thread safe”, I don’t expect to get errors.

    Even without threads, mutating a collection while iterating over it
    usually results in bad things happening.

    $ python
    Python 3.10.10 (main, Mar 5 2023, 22:26:53) [GCC 12.2.1 20230201] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import collections
    >>> x = collections.deque()
    >>> x.append(44)
    >>> x.append(55)
    >>> x.append(66)
    >>> x.append(77)
    >>> x
    deque([44, 55, 66, 77])
    >>> for y in x:
    x.pop()

    77
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    RuntimeError: deque mutated during iteration

    Concurrency just increases the likeliness of mutating while iterating.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Edwards@21:1/5 to Travis Griggs on Mon Mar 27 18:54:30 2023
    On 2023-03-28, Travis Griggs <travisgriggs@gmail.com> wrote:
    A while ago I chose to use a deque that is shared between two threads. I did so because the docs say:

    "Deques support thread-safe, memory efficient appends and pops from
    either side of the deque with approximately the same O(1)
    performance in either direction.”

    (https://docs.python.org/3.11/library/collections.html?highlight=deque#collections.deque)

    Earlier today, looking through some server logs I noticed that from
    time to I’m getting a

    RuntimeError: deque mutated during iteration

    I guess this surprised me. When I see “thread safe”, I don’t expect
    to get errors.

    Well, I guess it doesn't say that iteration of a deque is thread
    safe. It only claims that appends and pops from either end are thread
    safe. It doesn't even claim that inserts, removes, clear, copy, or any
    other operations are thread-safe.

    --
    Grant

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Travis Griggs@21:1/5 to All on Mon Mar 27 18:25:01 2023
    A while ago I chose to use a deque that is shared between two threads. I did so because the docs say:

    "Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.”

    (https://docs.python.org/3.11/library/collections.html?highlight=deque#collections.deque)

    Earlier today, looking through some server logs I noticed that from time to I’m getting a

    RuntimeError: deque mutated during iteration

    I guess this surprised me. When I see “thread safe”, I don’t expect to get errors.

    Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone than a
    median?

    I’ve got a couple ways I can work around this. But I was surprised.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris Angelico@21:1/5 to python-list@python.org on Wed Mar 29 17:08:21 2023
    On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list <python-list@python.org> wrote:

    On 28/03/23 2:25 pm, Travis Griggs wrote:
    Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone
    than a median?

    It may be a matter of whether the GIL is held or not. I had a look
    at the source for deque, and it doesn't seem to explicitly do
    anything about locking, it just relies on the GIL.

    So maybe statistics.median() is implemented in C and statistics.mean()
    in Python, or something like that?


    Both functions are implemented in Python, but median() starts out with
    this notable line:

    data = sorted(data)

    which gives back a copy, iterated over rapidly in C. All subsequent
    work is done on that copy.

    The same effect could be had with mean() by taking a snapshot using
    list(q) and, I believe, would have the same effect (the source code
    for the sorted() function begins by calling PySequence_List).

    In any case, it makes *conceptual* sense to do your analysis on a copy
    of the queue, thus ensuring that your stats are stable. The other
    threads can keep going while you do your calculations, even if that
    means changing the queue.

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Greg Ewing@21:1/5 to Travis Griggs on Wed Mar 29 18:54:06 2023
    On 28/03/23 2:25 pm, Travis Griggs wrote:
    Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone than
    a median?

    It may be a matter of whether the GIL is held or not. I had a look
    at the source for deque, and it doesn't seem to explicitly do
    anything about locking, it just relies on the GIL.

    So maybe statistics.median() is implemented in C and statistics.mean()
    in Python, or something like that?

    --
    Greg

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jack Dangler@21:1/5 to Chris Angelico on Wed Mar 29 10:50:49 2023
    On 3/29/23 02:08, Chris Angelico wrote:
    On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list <python-list@python.org> wrote:
    On 28/03/23 2:25 pm, Travis Griggs wrote:
    Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone
    than a median?
    It may be a matter of whether the GIL is held or not. I had a look
    at the source for deque, and it doesn't seem to explicitly do
    anything about locking, it just relies on the GIL.

    So maybe statistics.median() is implemented in C and statistics.mean()
    in Python, or something like that?

    Both functions are implemented in Python, but median() starts out with
    this notable line:

    data = sorted(data)

    which gives back a copy, iterated over rapidly in C. All subsequent
    work is done on that copy.

    The same effect could be had with mean() by taking a snapshot using
    list(q) and, I believe, would have the same effect (the source code
    for the sorted() function begins by calling PySequence_List).

    In any case, it makes *conceptual* sense to do your analysis on a copy
    of the queue, thus ensuring that your stats are stable. The other
    threads can keep going while you do your calculations, even if that
    means changing the queue.

    ChrisA
    Sorry for any injected confusion here, but that line "data =
    sorted(data)" appears as though it takes the value of the variable named _data_, sorts it and returns it to the same variable store, so no copy
    would be created. Am I missing something there?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Edwards@21:1/5 to Jack Dangler on Wed Mar 29 08:06:34 2023
    On 2023-03-29, Jack Dangler <tdldev@gmail.com> wrote:

    data = sorted(data)

    Sorry for any injected confusion here, but that line "data =
    sorted(data)" appears as though it takes the value of the variable named _data_, sorts it and returns it to the same variable store, so no copy
    would be created. Am I missing something there?

    Yes, you're missing the basics of what an assignment does in Python
    and how objects work. Python doesn't have such a thing as "a variable
    st store".

    The assignment operator binds a name to an object.

    The 'sorted(data)' expression creates a new object containing a sorted copy of 'data'.

    The assignment then binds the name "data" to that new object.

    The old, unsorted object then becomes inaccessable (unless there are
    other names bound to it, or it's being otherwise used). At some point
    that old, unsorted object will "go away" completely and cease to exist.

    --
    Grant

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris Angelico@21:1/5 to Jack Dangler on Thu Mar 30 04:13:12 2023
    On Thu, 30 Mar 2023 at 01:52, Jack Dangler <tdldev@gmail.com> wrote:


    On 3/29/23 02:08, Chris Angelico wrote:
    On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list <python-list@python.org> wrote:
    On 28/03/23 2:25 pm, Travis Griggs wrote:
    Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone
    than a median?
    It may be a matter of whether the GIL is held or not. I had a look
    at the source for deque, and it doesn't seem to explicitly do
    anything about locking, it just relies on the GIL.

    So maybe statistics.median() is implemented in C and statistics.mean()
    in Python, or something like that?

    Both functions are implemented in Python, but median() starts out with
    this notable line:

    data = sorted(data)

    which gives back a copy, iterated over rapidly in C. All subsequent
    work is done on that copy.

    The same effect could be had with mean() by taking a snapshot using
    list(q) and, I believe, would have the same effect (the source code
    for the sorted() function begins by calling PySequence_List).

    In any case, it makes *conceptual* sense to do your analysis on a copy
    of the queue, thus ensuring that your stats are stable. The other
    threads can keep going while you do your calculations, even if that
    means changing the queue.

    ChrisA
    Sorry for any injected confusion here, but that line "data =
    sorted(data)" appears as though it takes the value of the variable named _data_, sorts it and returns it to the same variable store, so no copy
    would be created. Am I missing something there?

    The variable name "data" is the parameter to median(), so it's
    whatever you ask for the median of. (I didn't make that obvious in my
    previous post - an excess of brevity on my part.)

    The sorted() function, UNlike list.sort(), returns a sorted copy of
    what it's given. I delved into the CPython source code for that, and
    it begins with the PySequence_List call to (effectively) call
    list(data) to get a copy of it. It ought to be a thread-safe copy due
    to holding the GIL the entire time. I'm not sure what would happen in
    a GIL-free world but most likely the lock on the input object would
    still ensure thread safety.

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dennis Lee Bieber@21:1/5 to All on Wed Mar 29 13:43:00 2023
    On Wed, 29 Mar 2023 10:50:49 -0400, Jack Dangler <tdldev@gmail.com>
    declaimed the following:

    Sorry for any injected confusion here, but that line "data =
    sorted(data)" appears as though it takes the value of the variable named >_data_, sorts it and returns it to the same variable store, so no copy
    would be created. Am I missing something there?

    The entire Python object/data model.

    Data is not "stored in" variables -- rather names are "attached to" data.

    sorted() creates a new data object, allocating memory for it. THEN the name "data" is attached to this new data object (and disconnected from the previous object). If there are no names left connected to the original
    object, the garbage collector reaps its memory.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jack Dangler@21:1/5 to Chris Angelico on Wed Mar 29 14:14:48 2023
    On 3/29/23 13:13, Chris Angelico wrote:
    On Thu, 30 Mar 2023 at 01:52, Jack Dangler <tdldev@gmail.com> wrote:

    On 3/29/23 02:08, Chris Angelico wrote:
    On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list
    <python-list@python.org> wrote:
    On 28/03/23 2:25 pm, Travis Griggs wrote:
    Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone
    than a median?
    It may be a matter of whether the GIL is held or not. I had a look
    at the source for deque, and it doesn't seem to explicitly do
    anything about locking, it just relies on the GIL.

    So maybe statistics.median() is implemented in C and statistics.mean() >>>> in Python, or something like that?

    Both functions are implemented in Python, but median() starts out with
    this notable line:

    data = sorted(data)

    which gives back a copy, iterated over rapidly in C. All subsequent
    work is done on that copy.

    The same effect could be had with mean() by taking a snapshot using
    list(q) and, I believe, would have the same effect (the source code
    for the sorted() function begins by calling PySequence_List).

    In any case, it makes *conceptual* sense to do your analysis on a copy
    of the queue, thus ensuring that your stats are stable. The other
    threads can keep going while you do your calculations, even if that
    means changing the queue.

    ChrisA
    Sorry for any injected confusion here, but that line "data =
    sorted(data)" appears as though it takes the value of the variable named
    _data_, sorts it and returns it to the same variable store, so no copy
    would be created. Am I missing something there?
    The variable name "data" is the parameter to median(), so it's
    whatever you ask for the median of. (I didn't make that obvious in my previous post - an excess of brevity on my part.)

    The sorted() function, UNlike list.sort(), returns a sorted copy of
    what it's given. I delved into the CPython source code for that, and
    it begins with the PySequence_List call to (effectively) call
    list(data) to get a copy of it. It ought to be a thread-safe copy due
    to holding the GIL the entire time. I'm not sure what would happen in
    a GIL-free world but most likely the lock on the input object would
    still ensure thread safety.

    ChrisA
    Aah - thanks, Chris! That makes much more sense.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Greg Ewing@21:1/5 to Chris Angelico on Thu Mar 30 09:30:08 2023
    On 30/03/23 6:13 am, Chris Angelico wrote:
    I'm not sure what would happen in
    a GIL-free world but most likely the lock on the input object would
    still ensure thread safety.

    In a GIL-free world, I would not expect deque to hold a lock
    the entire time that something was iterating over it. That
    would require holding the lock as long as an iterator object
    existed referencing it, which could be a long time, even
    longer than the caller expects (no reference counting,
    remember!)

    So for future-proofing I would recommend using deque's
    copy() method to copy it before doing anything that iterates
    over it. Hopefully that would be implemented in a thread-safe
    way (although the docs don't currently promise that).

    --
    Greg

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris Angelico@21:1/5 to python-list@python.org on Thu Mar 30 12:39:54 2023
    On Thu, 30 Mar 2023 at 07:36, Greg Ewing via Python-list <python-list@python.org> wrote:

    On 30/03/23 6:13 am, Chris Angelico wrote:
    I'm not sure what would happen in
    a GIL-free world but most likely the lock on the input object would
    still ensure thread safety.

    In a GIL-free world, I would not expect deque to hold a lock
    the entire time that something was iterating over it. That
    would require holding the lock as long as an iterator object
    existed referencing it, which could be a long time, even
    longer than the caller expects (no reference counting,
    remember!)

    Certainly not, but I *would* expect the sorted() call to retain a lock
    on the input object while it copies it (or, more precisely, for the PySequence_List() call to do that).

    So for future-proofing I would recommend using deque's
    copy() method to copy it before doing anything that iterates
    over it. Hopefully that would be implemented in a thread-safe
    way (although the docs don't currently promise that).


    Probably? It's actually less clear there, since a deque's copy method
    is built on top of basic iteration and broadly looks like this (though
    in C, not in Python):

    def copy(self):
    ret = deque()
    ret.extend(self)
    return ret

    Simplified, but mostly accurate. And extending is done by getting an
    iterator, then repeatedly appending. So.... probably safe? Question
    mark?

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)