Is list(dict.items()) thread-safe?

Is the usage of list(d.items()) in the example below safe?

import threading

n = 2000

d = {}

def dict_to_list():
    while True:
        list(d.items())  # is this safe to do?

def modify():
    for i in range(n):
        d[i] = i

if __name__ == "__main__":
    t1 = threading.Thread(target=dict_to_list, daemon=True)
    t1.start()

    t2 = threading.Thread(target=modify, daemon=True)
    t2.start()
    t2.join()

The background behind this question is that an iterator over a dictionary item view checks on every step whether the dictionary size changed, as the following example illustrates.

d = {}
view = d.items()  # this is an iterable
it = iter(view)  # this is an iterator
d[1] = 1
print(list(view))  # this is ok, it prints [(1, 1)]
print(list(it))  # this raises a RuntimeError because the size of the dictionary changed

So if the call to list(...) in the first example can be interrupted (i.e., if thread t1 can release the GIL partway through it), the first example might raise a RuntimeError in thread t1. Some sources claim the operation is not atomic. However, I haven’t been able to get the first example to crash.

I understand that the safe thing to do here would be to use some locks instead of trying to rely on the atomicity of certain operations. However, I’m debugging an issue in a third party library that uses similar code and that I cannot necessarily change directly.

Answer

Short answer: it might be fine, but use a lock anyway.

Using dis, you can see that list(d.items()) boils down to two call instructions (at offsets 6 and 8):

>>> import dis
>>> dis.dis("list(d.items())")
  1           0 LOAD_NAME                0 (list)
              2 LOAD_NAME                1 (d)
              4 LOAD_METHOD              2 (items)
              6 CALL_METHOD              0
              8 CALL_FUNCTION            1
             10 RETURN_VALUE
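Opcode names and offsets differ between CPython releases, so the listing above may not match what you see locally. As a rough, version-independent check, here is a minimal sketch that uses only the standard dis module to count the call instructions in the expression. The name call_opnames is made up for this illustration.

```python
import dis

# Compile the expression in "eval" mode. Names are only looked up at run
# time, so `d` does not need to exist for this to work.
code = compile("list(d.items())", "<example>", "eval")

# Collect the opcode name of every call instruction in the compiled code.
call_opnames = [
    ins.opname
    for ins in dis.get_instructions(code)
    if ins.opname.startswith("CALL")
]

# Two calls are expected: one for d.items() and one for list(...).
print(call_opnames)
```

A thread switch can happen between those two calls, but that only separates creating the view from copying it, and the view itself tolerates changes to the dict.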

The Python FAQ says that, in general, operations implemented in C are atomic from the point of view of a running Python program:

What kinds of global value mutation are thread-safe?

In general, Python offers to switch among threads only between bytecode instructions; […]. Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program.

[…]

For example, the following operations are all atomic […]

D.keys()

Both list() and d.items() are implemented in C, so each should be atomic. The exceptions are when they end up calling out to Python code (which can happen if they invoke a dunder method that you overrode with a Python implementation), when d is a subclass of dict rather than a real dict, or when their C implementation releases the GIL. It’s not a good idea to rely on them being atomic.
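If you do control both the reading and the writing code, a lock removes the need to reason about atomicity at all. The following is only a sketch under that assumption: d_lock, snapshot and the n argument of modify are names invented here, not part of the question's code or of any library.

```python
import threading

d = {}
d_lock = threading.Lock()  # hypothetical lock guarding every access to d

def snapshot():
    # Copy the items while holding the lock so no writer can interleave.
    with d_lock:
        return list(d.items())

def modify(n):
    for i in range(n):
        # Writers take the same lock for each update.
        with d_lock:
            d[i] = i

writer = threading.Thread(target=modify, args=(2000,))
writer.start()
partial = snapshot()  # a consistent copy taken at some point during the writes
writer.join()
final = snapshot()    # taken after the writer finished, so it is complete
print(len(partial), len(final))
```

This only helps if every piece of code that touches d uses the same lock, which may not be possible when the dict lives inside a third-party library.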

You mention that an iterator raises an error if the underlying dict changes size, but that applies to an iterator that is already in progress, not to the view. .keys(), .values() and .items() return a view object, which has no problem with the underlying dict changing, and each list(view) call starts a fresh iteration:

d = {"a": 1, "b": 2}
view = d.items()
print(list(view))  # [('a', 1), ('b', 2)]
d["c"] = 3         # this could happen in a different thread
print(list(view))  # [('a', 1), ('b', 2), ('c', 3)]

If a modification of the dict spans more than one bytecode instruction, a reader will sometimes see d in an inconsistent state where some of the changes have been made and others haven’t yet. But you shouldn’t get a RuntimeError like you do with an explicit iterator, unless you modify the dict in a way that’s non-atomic.
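For contrast, the RuntimeError does show up when the iteration itself is spread over several bytecode instructions, as in a for loop written in Python. The sketch below reproduces that in a single thread, with the write inside the loop standing in for a write from another thread, and shows that looping over a list(...) snapshot avoids it. The key suffixes are arbitrary.

```python
d = {"a": 1, "b": 2}

# Iterating the view directly: the dict grows between two steps of the
# iterator, so the next step raises RuntimeError.
try:
    for key, value in d.items():
        d[key + "_copy"] = value  # stands in for a write from another thread
except RuntimeError as exc:
    error = str(exc)

# Iterating a snapshot: the list is built before the loop body runs, so
# later writes to d do not affect the loop.
for key, value in list(d.items()):
    d[key + "!"] = value

print(error)
print(sorted(d))
```

The first loop stops after a single insertion, while the second one completes and adds one new key per item in the snapshot.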
