Fast way to copy dictionary in Python
Question:
I have a Python program that works with dictionaries a lot. I have to make copies of dictionaries thousands of times. I need a copy of both the keys and the associated contents. The copy will be edited and must not be linked to the original (i.e. changes in the copy must not affect the original).
Keys are strings, values are integers (0 or 1).
I currently use a simple way:
newDict = oldDict.copy()
Profiling my code shows that the copy operation takes most of the time.
Are there faster alternatives to the dict.copy() method? What would be fastest?
Answers:
Can you provide a code sample so I can see how you are using copy() and in what context?
You could use
new = dict(old)
But I don't think it will be faster.
Depending on details you haven't given, you may want to wrap the original dictionary and do a sort of copy-on-write.
The “copy” is then a dictionary that looks keys up in the “parent” dictionary when it doesn’t already contain them, but stores modifications in itself.
This assumes that you won’t be modifying the original and that the extra lookups don’t end up costing more than the copies they replace.
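One way to sketch that idea without writing a wrapper class is collections.ChainMap from the standard library. This is my own illustration, not something from the question: the dictionary contents and variable names are made up, and whether it actually beats copy() depends on how many lookups you do per "copy".

```python
from collections import ChainMap

# Hypothetical data shaped like the question: string keys, 0/1 values.
original = {"a": 0, "b": 1, "c": 0}

# The "copy" is an empty dict layered in front of the original.
# Creating it does not touch the original's items at all.
view = ChainMap({}, original)

# Writes land in the front dict only; the original stays untouched.
view["a"] = 1
view["d"] = 1

# Reads fall through to the original when the front dict lacks the key.
print(view["a"], view["b"], original["a"])

# If a real, independent dict is needed later, flatten the layers once.
flat = dict(view)
```

Note that deleting a key that only exists in the parent raises KeyError, because ChainMap only mutates its first mapping, so this fits "edit some values" better than "remove entries".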
Apparently dict.copy() is faster, as you say.
[utdmr@utdmr-arch ~]$ python -m timeit -s "d={1:1, 2:2, 3:3}" "new = d.copy()"
1000000 loops, best of 3: 0.238 usec per loop
[utdmr@utdmr-arch ~]$ python -m timeit -s "d={1:1, 2:2, 3:3}" "new = dict(d)"
1000000 loops, best of 3: 0.621 usec per loop
[utdmr@utdmr-arch ~]$ python -m timeit -s "from copy import copy; d={1:1, 2:2, 3:3}" "new = copy(d)"
1000000 loops, best of 3: 1.58 usec per loop
Looking at the C source for the Python dict operations, you can see that they do a pretty naive (but efficient) copy. It essentially boils down to a call to PyDict_Merge:
PyDict_Merge(PyObject *a, PyObject *b, int override)
This does quick checks, such as whether the two are the same object and whether the source contains any items. After that it does a generous one-time resize/alloc of the target dict and then copies the elements one by one. I don’t see you getting much faster than the built-in copy().
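For readers who don't want to dig through the C, here is a rough Python analogue of the steps just described. It is only a sketch of the logic, not the real implementation: merge_sketch is a name I made up, and the one-time pre-sizing of the target has no equivalent in pure Python, so it is left out.

```python
def merge_sketch(target, source, override=True):
    """Rough Python analogue of the merge logic (illustrative only)."""
    # Quick exits: merging a dict into itself, or merging an empty source.
    if target is source or not source:
        return target
    # The C code resizes the target once here; Python cannot express that.
    for key, value in source.items():
        # With override false, keys already in the target are kept as they are.
        if override or key not in target:
            target[key] = value
    return target


old = {"x": 0, "y": 1}
new = merge_sketch({}, old)   # behaves like a shallow copy of old
new["x"] = 1                  # editing the copy leaves old alone
print(old, new)
```

The copy is shallow: keys and values are shared references, which is fine for the string keys and integer values in the question because both are immutable.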
The measurements depend on the dictionary size, though. For 10000 entries, copy(d) and d.copy() are almost the same.
from copy import copy, deepcopy
a = {b: b for b in range(10000)}
In [5]: %timeit copy(a)
10000 loops, best of 3: 186 µs per loop
In [6]: %timeit deepcopy(a)
100 loops, best of 3: 14.1 ms per loop
In [7]: %timeit a.copy()
1000 loops, best of 3: 180 µs per loop
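If you want to repeat these comparisons on your own machine and interpreter, something like the following should do. It is a sketch using the standard timeit module; the bench helper, the two sizes, and the repeat counts are my own choices, and the numbers it prints will differ between Python versions and hardware.

```python
import sys
import timeit
from copy import copy, deepcopy


def bench(size, number=20, repeat=3):
    """Return the best per-call time in seconds for each copy method."""
    # String keys and 0/1 values, as in the question.
    d = {str(i): i % 2 for i in range(size)}
    candidates = {
        "d.copy()": d.copy,
        "dict(d)": lambda: dict(d),
        "copy(d)": lambda: copy(d),
        "deepcopy(d)": lambda: deepcopy(d),
    }
    results = {}
    for label, func in candidates.items():
        # Take the fastest of several runs to reduce noise.
        best = min(timeit.repeat(func, number=number, repeat=repeat))
        results[label] = best / number
    return results


print(sys.version)
results_by_size = {size: bench(size) for size in (3, 10000)}
for size, results in results_by_size.items():
    for label, seconds in results.items():
        print(f"{size:>6} entries  {label:<12} {seconds * 1e6:10.3f} usec")
```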
I realise this is an old thread, but this is a high result in search engines for “dict copy python”, and the top result for “dict copy performance”, and I believe this is relevant.
From Python 3.7, newDict = oldDict.copy() is up to 5.5x faster than it was previously. Notably, right now, newDict = dict(oldDict) does not seem to have this performance increase.
There is a little more information here.