Deep copy of a dict in Python

Question:

I would like to make a deep copy of a dict in Python. Unfortunately, dict has no .deepcopy() method. How do I do that?

>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = my_dict.deepcopy()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'deepcopy'
>>> my_copy = my_dict.copy()
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
7

The last line should be 3.

I would like modifications to my_dict not to affect the snapshot my_copy.

How do I do that? The solution should be compatible with Python 3.x.

Asked By: Olivier Grégoire


Answers:

How about:

import copy
d = { ... }
d2 = copy.deepcopy(d)

Python 2 or 3:

Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import copy
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = copy.deepcopy(my_dict)
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
3
>>>
Answered By: Lasse V. Karlsen

Python 3.x

from copy import deepcopy

# define the original dictionary
original_dict = {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5, 'e': 6}}

# make a deep copy of the original dictionary
new_dict = deepcopy(original_dict)

# modify the dictionary in a loop
for key in new_dict:
    if isinstance(new_dict[key], dict) and 'e' in new_dict[key]:
        del new_dict[key]['e']

# print the original and modified dictionaries
print('Original dictionary:', original_dict)
print('Modified dictionary:', new_dict)

Which would yield:

Original dictionary: {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5, 'e': 6}}
Modified dictionary: {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5}}

Without new_dict = deepcopy(original_dict), the ‘e’ element could not be removed without also changing original_dict, because a plain assignment or a shallow copy still shares the nested dictionary.

A related pitfall: if the loop were for key in original_dict and its body added or removed keys of original_dict itself, a RuntimeError would be raised:

"RuntimeError: dictionary changed size during iteration"

So in order to add or remove keys of a dictionary during iteration, iterate over a copy of the dictionary.
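
The error quoted above can be reproduced in a few lines. This is a minimal sketch with a hypothetical stock dictionary (not part of the answer's data); it removes top-level keys, which is the case that changes the size of the dictionary being iterated.

```python
# Hypothetical inventory used only for this illustration.
stock = {'apples': 0, 'pears': 3, 'plums': 0}

# Deleting keys from the dict being iterated changes its size mid-loop.
try:
    for name in stock:
        if stock[name] == 0:
            del stock[name]
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)

# Iterating over a snapshot of the keys avoids the error. A shallow
# snapshot is enough here because only top-level keys are removed.
stock = {'apples': 0, 'pears': 3, 'plums': 0}
for name in list(stock):
    if stock[name] == 0:
        del stock[name]

print(stock)
```

A deep copy is only needed when the original's nested values must also stay isolated from the changes.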

Here is an example function that removes an element from a dictionary:

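def remove_hostname(domain, hostname):
    # Iterate over a deep copy so that deleting from `domain` is safe.
    domain_copy = deepcopy(domain)
    for domains, hosts in domain_copy.items():
        for host, port in hosts.items():
            if host == hostname:
                # Note: this mutates the caller's `domain` in place.
                del domain[domains][host]
    return domain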
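
If the caller's dictionary should stay untouched, a variant can work on the copy instead and return it. This is only a sketch: without_hostname and the sample domains data are hypothetical, and it assumes the same layout as above (domain name mapped to a dict of hostname to port).

```python
from copy import deepcopy

def without_hostname(domain, hostname):
    """Return a deep copy of `domain` with `hostname` removed from every host map."""
    result = deepcopy(domain)
    for hosts in result.values():
        # pop() with a default removes the key if present and never raises.
        hosts.pop(hostname, None)
    return result

# Hypothetical sample data: domain name -> {hostname: port}.
domains = {'example.com': {'www': 80, 'mail': 25}, 'example.org': {'www': 8080}}
trimmed = without_hostname(domains, 'www')

print(trimmed)   # the copy, without 'www'
print(domains)   # the original, unchanged
```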
Answered By: xpros

dict.copy() makes a shallow copy of a dictionary.

id() is a built-in function that returns the identity of an object (in CPython, its memory address).

First you need to understand “why is this particular problem happening?”

In [1]: my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}

In [2]: my_copy = my_dict.copy()

In [3]: id(my_dict)
Out[3]: 140190444167808

In [4]: id(my_copy)
Out[4]: 140190444170328

In [5]: id(my_copy['a'])
Out[5]: 140190444024104

In [6]: id(my_dict['a'])
Out[6]: 140190444024104

The list stored under key ‘a’ has the same address in both dicts, so both dicts refer to the same list object.
Therefore, when you change a value in the list through my_dict, the list in my_copy changes as well.
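
The same check can be written with the is operator, which compares object identity directly instead of comparing id() values by eye. A small sketch using the data from the question (the names shallow and deep are just for illustration):

```python
import copy

my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}

shallow = my_dict.copy()
deep = copy.deepcopy(my_dict)

# The outer dicts are distinct objects in both cases.
print(shallow is my_dict)            # False
print(deep is my_dict)               # False

# Only the shallow copy still shares the inner lists with the original.
print(shallow['a'] is my_dict['a'])  # True
print(deep['a'] is my_dict['a'])     # False
```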


Solution for data structure mentioned in the question:

In [7]: my_copy = {key: value[:] for key, value in my_dict.items()}

In [8]: id(my_copy['a'])
Out[8]: 140190444024176

Or you can use deepcopy as mentioned above.
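
Note that the value[:] comprehension copies exactly one extra level, which is enough for the dict of flat lists in the question. A sketch with hypothetical data that is nested one level deeper shows where it stops helping and deepcopy is needed:

```python
import copy

# Hypothetical data with one more level of nesting than in the question.
nested = {'a': [[1, 2], [3, 4]]}

one_level = {key: value[:] for key, value in nested.items()}
full = copy.deepcopy(nested)

nested['a'][0][0] = 99

print(one_level['a'][0][0])  # 99: the inner lists are still shared
print(full['a'][0][0])       # 1: deepcopy duplicated every level
```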

Answered By: theBuzzyCoder

@Rob suggested a thread safe alternative to copy.deepcopy():

>>> import json
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> json.loads(json.dumps(my_dict))
{'a': [1, 2, 3], 'b': [4, 5, 6]}

This only works if your data contains only JSON-serializable objects like str, dict, list, int, float, and None.
Other objects (e.g. datetime and custom objects) need to be coerced into a string or another standard type before serializing with json.dumps().
You’ll also need to run a custom deserializer after json.loads():

>>> from datetime import datetime as dt
>>> my_dict = {'a': (1,), 'b': dt(2023, 4, 9).isoformat()}
>>> d = json.loads(json.dumps(my_dict))
>>> d
{'a': [1], 'b': '2023-04-09T00:00:00'}
>>> for k in d:
...     try:
...         d[k] = dt.fromisoformat(d[k])
...     except (TypeError, ValueError):
...         pass
...
>>> d
{'a': [1], 'b': datetime.datetime(2023, 4, 9, 0, 0)}

Of course you need to do the serialization and deserialization on special objects recursively.
Sometimes that’s a good thing.
This process normalizes all your objects to types that are directly serializable (for example, tuples become lists), and you can be sure they’ll match a reproducible data schema (for relational database storage).
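
The recursive handling mentioned above could look roughly like this. It is only a sketch: encode_special, restore_special and json_copy are hypothetical helper names, and treating every ISO-formatted string as a datetime is an assumption that will misfire if the data contains date-like strings that should stay strings.

```python
import json
from datetime import datetime

def encode_special(obj):
    """Hypothetical `default` hook for json.dumps: datetimes become ISO strings."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f'Cannot serialize {type(obj).__name__}')

def restore_special(value):
    """Hypothetical walker: recursively turn ISO strings back into datetimes."""
    if isinstance(value, dict):
        return {key: restore_special(item) for key, item in value.items()}
    if isinstance(value, list):
        return [restore_special(item) for item in value]
    if isinstance(value, str):
        try:
            # datetime.fromisoformat requires Python 3.7 or later.
            return datetime.fromisoformat(value)
        except ValueError:
            return value
    return value

def json_copy(data):
    """Copy `data` through a JSON round trip, restoring datetimes afterwards."""
    return restore_special(json.loads(json.dumps(data, default=encode_special)))

nested = {'when': [datetime(2023, 4, 9)], 'point': (1, 2), 'name': 'x'}
copied = json_copy(nested)

print(copied['when'][0] == datetime(2023, 4, 9))  # True: restored at depth
print(copied['point'])                            # [1, 2]: the tuple became a list
```

json.dumps calls the default hook for any object it cannot serialize, at any depth, so only the restoring side needs an explicit recursive walk.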

And it’s thread safe. The built-in copy.deepcopy() is NOT thread safe! If you use deepcopy in async or multithreaded code, it can crash your program or corrupt your data unexpectedly, long after you’ve forgotten about your code.
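
If deepcopy is still preferred, one common way to use it from several threads is to serialize access with a lock. This is a sketch, not something proposed in this answer: shared, shared_lock, append_value and snapshot are hypothetical names, and it only helps if every writer takes the same lock.

```python
import copy
import threading

# Hypothetical shared state; every reader and writer must use the same lock.
shared = {'a': [1, 2, 3]}
shared_lock = threading.Lock()

def append_value(value):
    """Writer: mutate the shared dict only while holding the lock."""
    with shared_lock:
        shared['a'].append(value)

def snapshot():
    """Reader: deep copy while holding the lock so no writer interleaves."""
    with shared_lock:
        return copy.deepcopy(shared)

writers = [threading.Thread(target=append_value, args=(n,)) for n in range(4, 8)]
for thread in writers:
    thread.start()
for thread in writers:
    thread.join()

frozen = snapshot()
append_value(8)

print(sorted(frozen['a']))  # [1, 2, 3, 4, 5, 6, 7]
print(len(shared['a']))     # 8: later writes do not reach the snapshot
```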

Answered By: hobs