Deep copy of a dict in Python
Question:
I would like to make a deep copy of a dict in Python. Unfortunately, dict has no .deepcopy() method. How do I do that?
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = my_dict.deepcopy()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'deepcopy'
>>> my_copy = my_dict.copy()
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
7
The last line should be 3.
I would like modifications to my_dict not to affect the snapshot my_copy.
How do I do that? The solution should be compatible with Python 3.x.
Answers:
How about:
import copy
d = { ... }
d2 = copy.deepcopy(d)
Python 2 or 3:
Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import copy
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = copy.deepcopy(my_dict)
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
3
>>>
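To make the difference visible, here is a minimal sketch that puts dict.copy() and copy.deepcopy() side by side and uses identity checks to show which nested lists are shared. The variable names are only illustrative.

```python
import copy

original = {'a': [1, 2, 3], 'b': [4, 5, 6]}

shallow = original.copy()        # new outer dict, same inner lists
deep = copy.deepcopy(original)   # new outer dict and new inner lists

# Identity checks show which objects are shared with the original.
shallow_shares_list = shallow['a'] is original['a']
deep_shares_list = deep['a'] is original['a']

original['a'][2] = 7             # mutate a nested list in place

print(shallow_shares_list, shallow['a'])  # the shallow copy sees the change
print(deep_shares_list, deep['a'])        # the deep copy does not
```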
Python 3.x
from copy import deepcopy
# define the original dictionary
original_dict = {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5, 'e': 6}}
# make a deep copy of the original dictionary
new_dict = deepcopy(original_dict)
# modify the dictionary in a loop
for key in new_dict:
    if isinstance(new_dict[key], dict) and 'e' in new_dict[key]:
        del new_dict[key]['e']
# print the original and modified dictionaries
print('Original dictionary:', original_dict)
print('Modified dictionary:', new_dict)
Which would yield:
Original dictionary: {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5, 'e': 6}}
Modified dictionary: {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5}}
Without new_dict = deepcopy(original_dict), deleting 'e' would also remove it from original_dict, because a plain assignment or a shallow copy shares the nested dict.
A related pitfall: if a loop such as for key in original_dict adds or removes keys of original_dict itself, Python raises a RuntimeError:
"RuntimeError: dictionary changed size during iteration"
So to add or remove keys while looping, iterate over a copy of the dictionary (or of its keys) and modify the original.
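The error can be reproduced in a few lines. This sketch assumes a small flat dictionary called settings (a made-up name) and also shows that, for a flat dictionary, iterating over a snapshot of the keys with list() is enough to avoid it.

```python
settings = {'a': 1, 'b': 2, 'c': 3}

# Removing a key from the dict that is being iterated raises RuntimeError.
error_message = None
try:
    for key in settings:
        if key == 'b':
            del settings[key]
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)

# Iterating over a snapshot of the keys lets the original be modified safely.
settings = {'a': 1, 'b': 2, 'c': 3}
for key in list(settings):
    if key == 'b':
        del settings[key]

print(settings)
```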
Here is an example function that removes an element from a dictionary:
from copy import deepcopy

def remove_hostname(domain, hostname):
    # Iterate over a copy so that deleting from the original is safe
    domain_copy = deepcopy(domain)
    for domains, hosts in domain_copy.items():
        for host, port in hosts.items():
            if host == hostname:
                del domain[domains][host]
    return domain
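If the caller's dictionary should stay untouched, one possible variant deletes from the copy and returns that instead. This is a sketch, not the function from the answer: the name remove_hostname_copy and the sample data are hypothetical.

```python
from copy import deepcopy

def remove_hostname_copy(domain, hostname):
    """Return a deep copy of `domain` with `hostname` removed from every entry."""
    result = deepcopy(domain)
    for hosts in result.values():
        # pop() with a default does nothing when the host is absent,
        # so the inner dict never has to be iterated.
        hosts.pop(hostname, None)
    return result

# Hypothetical sample data: domain name -> {host: port}
domains = {'example.com': {'www': 80, 'mail': 25}, 'example.org': {'www': 443}}
cleaned = remove_hostname_copy(domains, 'www')

print(cleaned)
print(domains)  # the input is left as it was
```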
dict.copy() makes a shallow copy of a dictionary.
id() is a built-in function that returns the identity of an object (in CPython, its memory address).
First, you need to understand why this particular problem is happening.
In [1]: my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
In [2]: my_copy = my_dict.copy()
In [3]: id(my_dict)
Out[3]: 140190444167808
In [4]: id(my_copy)
Out[4]: 140190444170328
In [5]: id(my_copy['a'])
Out[5]: 140190444024104
In [6]: id(my_dict['a'])
Out[6]: 140190444024104
The list stored under key 'a' has the same address in both dicts, so both dicts refer to the same list object.
Therefore, when you change the list in my_dict, the list in my_copy changes as well.
Solution for data structure mentioned in the question:
In [7]: my_copy = {key: value[:] for key, value in my_dict.items()}
In [8]: id(my_copy['a'])
Out[8]: 140190444024176
Or you can use deepcopy as mentioned above.
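Note that the slice-based comprehension copies only one level. A short sketch, assuming a dictionary whose lists contain further lists (which the question's data does not), shows where it stops and where deepcopy keeps going:

```python
import copy

nested = {'a': [[1, 2], [3, 4]]}

# Slicing copies each value list, but not the lists inside it.
one_level = {key: value[:] for key, value in nested.items()}
full = copy.deepcopy(nested)

nested['a'][0][0] = 99   # mutate an inner list
nested['a'].append([5])  # mutate the outer list

print(one_level['a'])  # outer list is independent, inner lists are shared
print(full['a'])       # fully independent
```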
@Rob suggested a thread-safe alternative to copy.deepcopy():
>>> import json
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> json.loads(json.dumps(my_dict))
{'a': [1, 2, 3], 'b': [4, 5, 6]}
This only works if your data contains only JSON-serializable objects such as str, dict, list, int, float, bool, and None.
Other objects (e.g. datetime and custom objects) need to be coerced into a string or another standard type before serializing with json.dumps().
You'll also need to run a custom deserializer after json.loads():
>>> from datetime import datetime as dt
>>> my_dict = {'a': (1,), 'b': dt(2023, 4, 9).isoformat()}
>>> d = json.loads(json.dumps(my_dict))
>>> d
{'a': [1], 'b': '2023-04-09T00:00:00'}
>>> for k in d:
...     try:
...         d[k] = dt.fromisoformat(d[k])
...     except (TypeError, ValueError):
...         pass
...
>>> d
{'a': [1], 'b': datetime.datetime(2023, 4, 9, 0, 0)}
Of course you need to do the serialization and deserialization on special objects recursively.
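One way to get that recursion without writing it by hand is to let the json module call your converters: json.dumps() accepts a default function for objects it cannot serialize, and json.loads() accepts an object_hook that is applied to every decoded JSON object. The sketch below assumes a made-up marker key, '__datetime__', to tag datetime values; the helper names are hypothetical as well.

```python
import json
from datetime import datetime

def encode_special(obj):
    # Called by json.dumps() for objects it cannot serialize itself.
    if isinstance(obj, datetime):
        # '__datetime__' is an arbitrary marker chosen for this sketch.
        return {'__datetime__': obj.isoformat()}
    raise TypeError(f'Cannot serialize {type(obj).__name__}')

def decode_special(dct):
    # Called by json.loads() for every decoded JSON object, innermost first.
    if '__datetime__' in dct:
        return datetime.fromisoformat(dct['__datetime__'])
    return dct

data = {'a': (1,), 'events': [{'when': datetime(2023, 4, 9)}]}
clone = json.loads(json.dumps(data, default=encode_special),
                   object_hook=decode_special)

print(clone)
```

The datetime nested inside a list of dicts survives the round trip, while the tuple still comes back as a list.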
Sometimes the JSON round trip is a good thing.
This process normalizes all your objects to types that are directly serializable (for example, tuples become lists), and you can be sure they'll match a reproducible data schema (for relational database storage).
And it's thread safe. The built-in copy.deepcopy() is NOT thread safe! If you use deepcopy in concurrent code while another thread modifies the same object, it can crash your program or corrupt your data unexpectedly, long after you've forgotten about your code.
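If you want to keep deepcopy in threaded code, a common approach is to guard both the writes and the copy with the same lock, so that nothing changes while the copy is being made. This is only a sketch under that assumption, not something the answer proposes; the SharedState class and its method names are hypothetical.

```python
import copy
import threading

class SharedState:
    """Hypothetical wrapper that serializes access to a shared dict."""

    def __init__(self, data):
        self._data = data
        self._lock = threading.Lock()

    def update(self, key, value):
        with self._lock:
            self._data.setdefault(key, []).append(value)

    def snapshot(self):
        # The lock keeps writers out while deepcopy traverses the dict.
        with self._lock:
            return copy.deepcopy(self._data)

state = SharedState({'a': [1, 2, 3]})
workers = [threading.Thread(target=state.update, args=('a', n)) for n in range(4, 8)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

snap = state.snapshot()
state.update('a', 8)   # later changes do not reach the snapshot

print(sorted(snap['a']))
```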