Dictionary merge by updating but not overwriting if value exists

Question:

If I have 2 dicts as follows:

d1 = {'a': 2, 'b': 4}
d2 = {'a': 2, 'b': ''}

In order to ‘merge’ them:

dict(d1.items() + d2.items())

results in

{'a': 2, 'b': ''}

But what should I do if I would like to compare each value of the two dictionaries and only update d2 into d1 if values in d1 are empty/None/''?

When the same key exists, I would like to only maintain the numerical value (either from d1 or d2) instead of the empty value. If both values are empty, then no problems maintaining the empty value. If both have values, then d1-value should stay.

i.e.

d1 = {'a': 2, 'b': 8, 'c': ''}
d2 = {'a': 2, 'b': '', 'c': ''}

should result in

{'a': 2, 'b': 8, 'c': ''}

where 8 is not overwritten by ''.

Asked By: siva


Answers:

Just switch the order:

z = dict(d2.items() + d1.items())

By the way, you may also be interested in the potentially faster update method.

In Python 3, you have to cast the view objects to lists first:

z = dict(list(d2.items()) + list(d1.items())) 
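The update method mentioned above could be used roughly as follows. This is a minimal sketch on the question's data; copying d2 first is an assumption made here so that neither input is modified.

```python
d1 = {'a': 2, 'b': 8, 'c': ''}
d2 = {'a': 2, 'b': '', 'c': ''}

# Start from a copy of d2, then let d1's entries overwrite it,
# so d1 wins on every shared key and neither input is changed.
z = d2.copy()
z.update(d1)

print(z)  # {'a': 2, 'b': 8, 'c': ''}
```

Note that a plain update lets an empty value in d1 overwrite a non-empty value in d2.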

If you want to special-case empty strings, you can do the following:

def mergeDictsOverwriteEmpty(d1, d2):
    res = d1.copy()
    for k,v in d2.items():
        if k not in d1 or d1[k] == '':
            res[k] = v
    return res
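As a sketch of how the special case could be extended to None as well as '', assuming the result should start from a copy of d1 and that keys found only in d2 are added. The name merge_prefer_non_empty and the empty parameter are illustrative and not part of the answer.

```python
def merge_prefer_non_empty(d1, d2, empty=(None, '')):
    """Return a new dict where d1's values win unless they are empty."""
    res = d1.copy()
    for k, v in d2.items():
        # Take d2's value only when d1 lacks the key or holds an empty value.
        if k not in d1 or d1[k] in empty:
            res[k] = v
    return res


d1 = {'a': 2, 'b': 8, 'c': '', 'd': None}
d2 = {'a': 3, 'b': '', 'c': '', 'd': 7}
print(merge_prefer_non_empty(d1, d2))  # {'a': 2, 'b': 8, 'c': '', 'd': 7}
```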
Answered By: phihag

If both dictionaries have the same keys, you can use the following code (Python 2; in Python 3, replace iteritems() with items()):

dict((k,v if k in d2 and d2[k] in [None, ''] else d2[k]) for k,v in d1.iteritems())
Answered By: Artsiom Rudzenka

Use d2.update(d1) instead of dict(d2.items() + d1.items()). Note that this modifies d2 in place.

Answered By: warvariuc

Updates d2 with d1's key/value pairs, but only where the d1 value is truthy (so None and '' are skipped, and so are 0 and other falsy values):

>>> d1 = dict(a=1, b=None, c=2)
>>> d2 = dict(a=None, b=2, c=1)
>>> d2.update({k: v for k, v in d1.items() if v})
>>> d2
{'a': 1, 'c': 2, 'b': 2}

(Use iteritems() instead of items() in Python 2.)
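Because the filter tests truthiness, a legitimate 0 in d1 would be skipped too. If only None and '' should count as empty, an explicit membership test is one option. A minimal sketch; the EMPTY tuple is an assumption of this example.

```python
EMPTY = (None, '')  # values treated as "missing" in this sketch

d1 = dict(a=0, b=None, c=2)
d2 = dict(a=None, b=2, c=1)

# Copy only d1's non-empty values into d2; 0 is kept because it is not in EMPTY.
d2.update({k: v for k, v in d1.items() if v not in EMPTY})
print(d2)  # {'a': 0, 'b': 2, 'c': 2}
```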

Answered By: Mark Tolonen

Here’s an in-place solution (it modifies d2):

# assumptions: d2 is a temporary dict that can be discarded
# d1 is a dict that must be modified in place
# the modification is adding keys from d2 into d1 that do not exist in d1.

def update_non_existing_inplace(original_dict, to_add):
    to_add.update(original_dict) # to_add now holds the "final result" (O(n))
    original_dict.clear() # erase original_dict in-place (O(n))
    original_dict.update(to_add) # original_dict now holds the "final result" (O(n))
    return

Here’s another in-place solution, which is less elegant but potentially more efficient, as well as leaving d2 unmodified:

# assumptions: d2 cannot be modified
# d1 is a dict that must be modified in place
# the modification is adding keys from d2 into d1 that do not exist in d1.

def update_non_existing_inplace(original_dict, to_add):
    for key in to_add:  # iterating a dict yields its keys (Python 2 and 3)
        if key not in original_dict:
            original_dict[key] = to_add[key]
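The same add-only-missing-keys behaviour can also be written with dict.setdefault, which runs unchanged on Python 3. A minimal sketch that reuses the function name from above:

```python
def update_non_existing_inplace(original_dict, to_add):
    """Add keys from to_add that original_dict lacks; to_add is left untouched."""
    for key, value in to_add.items():
        # setdefault only stores the value when the key is absent.
        original_dict.setdefault(key, value)


d1 = {'a': 1, 'b': ''}
d2 = {'a': 2, 'c': 3}
update_non_existing_inplace(d1, d2)
print(d1)  # {'a': 1, 'b': '', 'c': 3}
```

An existing empty value such as '' is kept, because the check is on the key and not on the value.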
Answered By: aong152

To add to d2 keys/values from d1 which do not exist in d2 without overwriting any existing keys/values in d2:

temp = d2.copy()
d2.update(d1)
d2.update(temp)
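A worked example of the three steps, with made-up values, showing the state of d2 after each call:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 30}

temp = d2.copy()   # remember d2's original values
d2.update(d1)      # d2 is now {'b': 2, 'c': 30, 'a': 1}: d1 has overwritten 'b'
d2.update(temp)    # d2 is now {'b': 20, 'c': 30, 'a': 1}: original 'b' restored
print(d2)
```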
Answered By: Ron Kalian

Python 3.5+ Literal Dict

Unless you are using an obsolete version of Python, you are better off using this.

A Pythonic and faster way to merge, using dict unpacking:

d1 = {'a':1, 'b':1}
d2 = {'a':2, 'c':2}
merged = {**d1, **d2}  # priority from right to left
print(merged)

{'a': 2, 'b': 1, 'c': 2}

It's simpler and also faster than the dict(list(d2.items()) + list(d1.items())) alternative:

d1 = {i: 1 for i in range(1000000)}
d2 = {i: 2 for i in range(2000000)}

%timeit dict(list(d1.items()) + list(d2.items())) 
402 ms ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit {**d1, **d2}
144 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

More on this from PEP 448:

The keys in a dictionary remain in a right-to-left priority order, so {**{'a': 1}, 'a': 2, **{'a': 3}} evaluates to {'a': 3}. There is no restriction on the number or position of unpackings.
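For the question's case, where d1 should win, the unpacking order is simply reversed. On Python 3.9 and later, the dict union operator | (PEP 584) follows the same right-hand-wins rule. A short sketch:

```python
d1 = {'a': 1, 'b': 1}
d2 = {'a': 2, 'c': 2}

# The right-most mapping wins, so put d1 last to keep its values.
merged = {**d2, **d1}
print(merged)  # {'a': 1, 'c': 2, 'b': 1}

# Python 3.9+ only: the union operator gives the same result.
assert (d2 | d1) == merged
```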

Merging Only Non-Empty Values

To do this, we can build the merged dict with a comprehension that takes the value from d1 when it is truthy and falls back to d2 otherwise:

d1 = {'a':1, 'b':1, 'c': '', 'd': ''}
d2 = {'a':2, 'c':2, 'd': ''}
merged_non_zero = {
    k: (d1.get(k) or d2.get(k))
    for k in set(d1) | set(d2)
}
print(merged_non_zero)

outputs:

{'a': 1, 'b': 1, 'c': 2, 'd': ''}
  • a -> prefer first value from d1 as ‘a’ exists on both d1 and d2
  • b -> only exists on d1
  • c -> non-zero on d2
  • d -> empty string on both

Explanation

The code above builds a dictionary using a dict comprehension.

If d1 has the key and its value is truthy (i.e. bool(val) is True), d1[k] is used; otherwise d2.get(k) is taken.

Note that we also take the union of the keys of both dicts, set(d1) | set(d2), since they may not have exactly the same keys.
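Two edge cases of the `or` approach are worth noting: a legitimate 0 is treated as empty, and a key that is empty in one dict and absent from the other ends up as None (the default of dict.get). A sketch with an explicit emptiness test avoids both; the helper name first_non_empty and the EMPTY tuple are assumptions of this example.

```python
EMPTY = (None, '')  # values treated as "missing" in this sketch


def first_non_empty(*values):
    """Return the first value not in EMPTY, else the first value given."""
    for value in values:
        if value not in EMPTY:
            return value
    return values[0]


d1 = {'a': 0, 'b': 1, 'c': '', 'd': ''}
d2 = {'a': 2, 'c': 2}

merged = {
    # Only look at dicts that actually contain the key, so no None sneaks in.
    k: first_non_empty(*(d[k] for d in (d1, d2) if k in d))
    for k in set(d1) | set(d2)
}
print(merged == {'a': 0, 'b': 1, 'c': 2, 'd': ''})  # True
```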

Answered By: ShmulikA

If you want to ignore empty strings, so that for example merging:

a = {"a": 1, "b": 2, "c": ""}
b = {"a": "", "b": 4, "c": 5}
c = {"a": "aaa", "b": ""}
d = {"a": "", "w": ""}

results in: {'a': 'aaa', 'b': 4, 'c': 5, 'w': ''}

You can use these 2 functions:

def merge_two_dicts(a, b, path=None):
    "merges b into a"
    if path is None:
        path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge_two_dicts(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass  # same leaf value
            elif b[key] or not a[key]:
                # take b's value unless b's is empty and a's is not
                a[key] = b[key]
        else:
            a[key] = b[key]
    return a


def merge_multiple_dicts(*a):
    output = a[0]
    if len(a) >= 2:
        for n in range(len(a) - 1):
            output = merge_two_dicts(output, a[n + 1])

    return output

So you can just call merge_multiple_dicts(a, b, c, d). Note that this modifies the first dictionary (a) in place and returns it.
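Since merge_two_dicts writes into its first argument, a caller who needs the inputs unchanged has to copy first. One possible non-mutating variant, sketched with deepcopy and functools.reduce. The names merge_two and merge_all are hypothetical, and the rule is meant to mirror the functions above: a later non-empty value wins, and an empty value never replaces a non-empty one.

```python
from copy import deepcopy
from functools import reduce


def merge_two(a, b):
    """Return a new dict with b merged into a copy of a."""
    out = deepcopy(a)
    for key, b_val in b.items():
        a_val = out.get(key)
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            # Both sides are nested dicts: merge them recursively.
            out[key] = merge_two(a_val, b_val)
        elif key not in out or b_val or not a_val:
            # Take b's value unless it is empty while a's is not.
            out[key] = deepcopy(b_val)
    return out


def merge_all(*dicts):
    """Fold any number of dicts left to right, starting from an empty dict."""
    return reduce(merge_two, dicts, {})


a = {"a": 1, "b": 2, "c": ""}
b = {"a": "", "b": 4, "c": 5}
c = {"a": "aaa", "b": ""}
d = {"a": "", "w": ""}
print(merge_all(a, b, c, d) == {'a': 'aaa', 'b': 4, 'c': 5, 'w': ''})  # True
```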

Answered By: est.tenorio

I have a solution if you want to have more freedom to choose when a value should be overwritten in the merged dictionary. Maybe it’s a verbose script, but it’s not hard to understand its logic.

Thanks fabiocaccamo and senderle for sharing the benedict package, and the nested iteration logic in lists, respectively. This knowledge was fundamental to the script development.

Python Requirements

pip install python-benedict==0.24.3

Python Script

Definition of the Dict class.

from __future__ import annotations

from collections.abc import Mapping
from benedict import benedict
from typing import Iterator
from copy import deepcopy


class Dict:
    def __init__(self, data: dict = None):
        """
        Instantiates a dictionary object with nested keys-based indexing.

        Parameters
        ----------
        data: dict
            Dictionary.

        References
        ----------
        [1] 'Dict' class: https://stackoverflow.com/a/70908985/16109419
        [2] 'Benedict' package: https://github.com/fabiocaccamo/python-benedict
        [3] Dictionary nested iteration: https://stackoverflow.com/a/10756615/16109419
        """
        self.data = deepcopy(data) if data is not None else {}

    def get(self, keys: [object], **kwargs) -> (object, bool):
        """
        Get dictionary item value based on nested keys.

        Parameters
        ----------
        keys: [object]
            Nested keys to get item value based on.

        Returns
        -------
        value, found: (object, bool)
            Item value, and whether the target item was found.
        """
        data = kwargs.get('data', self.data)
        path = kwargs.get('path', [])
        value, found = None, False

        # Looking for item location on dictionary:
        for outer_key, outer_value in data.items():
            trace = path + [outer_key]

            # Getting item value from dictionary:
            if trace == keys:
                value, found = outer_value, True
                break

            if trace == keys[:len(trace)] and isinstance(outer_value, Mapping):  # Recursion cutoff.
                value, found = self.get(
                    data=outer_value,
                    keys=keys,
                    path=trace
                )

        return value, found

    def set(self, keys: [object], value: object, **kwargs) -> bool:
        """
        Set dictionary item value based on nested keys.

        Parameters
        ----------
        keys: [object]
            Nested keys to set item value based on.
        value: object
            Item value.

        Returns
        -------
        updated: bool
            Whether the target item was updated.
        """
        data = kwargs.get('data', self.data)
        path = kwargs.get('path', [])
        updated = False

        # Looking for item location on dictionary:
        for outer_key, outer_value in data.items():
            trace = path + [outer_key]

            # Setting item value on dictionary:
            if trace == keys:
                data[outer_key] = value
                updated = True
                break

            if trace == keys[:len(trace)] and isinstance(outer_value, Mapping):  # Recursion cutoff.
                updated = self.set(
                    data=outer_value,
                    keys=keys,
                    value=value,
                    path=trace
                )

        return updated

    def add(self, keys: [object], value: object, **kwargs) -> bool:
        """
        Add dictionary item value based on nested keys.

        Parameters
        ----------
        keys: [object]
            Nested keys to add item based on.
        value: object
            Item value.

        Returns
        -------
        added: bool
            Whether the target item was added.
        """
        data = kwargs.get('data', self.data)
        added = False

        # Adding item on dictionary:
        if keys[0] not in data:
            if len(keys) == 1:
                data[keys[0]] = value
                added = True
            else:
                data[keys[0]] = {}

        # Looking for item location on dictionary:
        for outer_key, outer_value in data.items():
            if outer_key == keys[0]:  # Recursion cutoff.
                if len(keys) > 1 and isinstance(outer_value, Mapping):
                    added = self.add(
                        data=outer_value,
                        keys=keys[1:],
                        value=value
                    )

        return added

    def remove(self, keys: [object], **kwargs) -> bool:
        """
        Remove dictionary item based on nested keys.

        Parameters
        ----------
        keys: [object]
            Nested keys to remove item based on.

        Returns
        -------
        removed: bool
            Whether the target item was removed.
        """
        data = kwargs.get('data', self.data)
        path = kwargs.get('path', [])
        removed = False

        # Looking for item location on dictionary:
        for outer_key, outer_value in data.items():
            trace = path + [outer_key]

            # Removing item from dictionary:
            if trace == keys:
                del data[outer_key]
                removed = True
                break

            if trace == keys[:len(trace)] and isinstance(outer_value, Mapping):  # Recursion cutoff.
                removed = self.remove(
                    data=outer_value,
                    keys=keys,
                    path=trace
                )

        return removed

    def items(self, **kwargs) -> Iterator[object, object]:
        """
        Get dictionary items based on nested keys.

        Returns
        -------
        keys, value: Iterator[object, object]
            List of nested keys and list of values.
        """
        data = kwargs.get('data', self.data)
        path = kwargs.get('path', [])

        for outer_key, outer_value in data.items():
            if isinstance(outer_value, Mapping):
                for inner_key, inner_value in self.items(data=outer_value, path=path + [outer_key]):
                    yield inner_key, inner_value
            else:
                yield path + [outer_key], outer_value

    @staticmethod
    def merge(dict_list: [dict], overwrite: bool = False, concat: bool = False, default_value: object = None) -> dict:
        """
        Merges dictionaries, with value assignment based on order of occurrence. Overwrites values if and only if:
            - The key does not yet exist on merged dictionary;
            - The current value of the key on merged dictionary is the default value.

        Parameters
        ----------
        dict_list: [dict]
            List of dictionaries.
        overwrite: bool
            Overwrites occurrences of values. If false, keep the first occurrence of each value found.
        concat: bool
            Concatenates occurrences of values for the same key.
        default_value: object
            Default value used as a reference to override dictionary attributes.

        Returns
        -------
        md: dict
            Merged dictionary.
        """
        dict_list = [d for d in dict_list if d is not None and isinstance(d, dict)] if dict_list is not None else []
        assert len(dict_list), "no dictionaries given."

        # Keeping the first occurrence of each value:
        if not overwrite:
            dict_list = [Dict(d) for d in dict_list]

            for i, d in enumerate(dict_list[:-1]):
                for keys, value in d.items():
                    if value != default_value:
                        for j, next_d in enumerate(dict_list[i+1:], start=i+1):
                            next_d.remove(keys=keys)

            dict_list = [d.data for d in dict_list]

        md = benedict()
        md.merge(*dict_list, overwrite=True, concat=concat)

        return md
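For readers who cannot install python-benedict, the first-occurrence-wins rule described in the merge docstring can be approximated with the standard library alone. This is a minimal sketch under that assumption, not a drop-in replacement: the function names merge_first_wins and merge_all_first_wins are hypothetical, the concat option is not covered, and edge cases may differ from the class above.

```python
from copy import deepcopy
from functools import reduce


def merge_first_wins(base, other, default=None):
    """Return a new dict; values already set in base are kept unless equal to default."""
    result = deepcopy(base)
    for key, value in other.items():
        if key not in result or result[key] == default:
            # Missing or still holding the default: take the later value.
            result[key] = deepcopy(value)
        elif isinstance(result[key], dict) and isinstance(value, dict):
            # Both sides are nested dicts: apply the same rule one level down.
            result[key] = merge_first_wins(result[key], value, default)
    return result


def merge_all_first_wins(dict_list, default=None):
    """Fold a list of dicts left to right with merge_first_wins."""
    return reduce(lambda acc, d: merge_first_wins(acc, d, default), dict_list, {})


dict_list = [
    {1: 'a', 2: None, 3: {4: None, 5: {6: None}}},
    {1: None, 2: 'b', 3: {4: 'c', 5: {6: {7: 'd'}}}},
]
print(merge_all_first_wins(dict_list))
# {1: 'a', 2: 'b', 3: {4: 'c', 5: {6: {7: 'd'}}}}
```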

Definition of the main method to show examples.

import json


def main() -> None:
    dict_list = [
        {1: 'a', 2: None, 3: {4: None, 5: {6: None}}},
        {1: None, 2: None, 3: {4: 'c', 5: {6: {7: None}}}},
        {1: None, 2: 'b', 3: {4: None, 5: {6: {7: 'd'}}}},
        {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['e', 'f']}}}}}},
        {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['g', 'h']}}}}}},
    ]

    d = Dict(data=dict_list[-1])

    print("Dictionary operations test:\n")
    print(f"data = {json.dumps(d.data, indent=4)}\n")
    print(f"d = Dict(data=data)")

    keys = [11]
    value = {12: {13: 14}}
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    print(f"d.set(keys={keys}, value={value}) --> {d.set(keys=keys, value=value)}")
    print(f"d.add(keys={keys}, value={value}) --> {d.add(keys=keys, value=value)}")
    keys = [11, 12, 13]
    value = 14
    print(f"d.add(keys={keys}, value={value}) --> {d.add(keys=keys, value=value)}")
    value = 15
    print(f"d.set(keys={keys}, value={value}) --> {d.set(keys=keys, value=value)}")
    keys = [11]
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    keys = [11, 12]
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    keys = [11, 12, 13]
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    keys = [11, 12, 13, 15]
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    keys = [2]
    print(f"d.remove(keys={keys}) --> {d.remove(keys=keys)}")
    print(f"d.remove(keys={keys}) --> {d.remove(keys=keys)}")
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")

    print("\n-----------------------------\n")
    print("Dictionary values match test:\n")
    print(f"data = {json.dumps(d.data, indent=4)}\n")
    print(f"d = Dict(data=data)")

    for keys, value in d.items():
        real_value, found = d.get(keys=keys)
        status = "found" if found else "not found"
        print(f"d{keys} = {value} == {real_value} ({status}) --> {value == real_value}")

    print("\n-----------------------------\n")
    print("Dictionaries merge test:\n")

    for i, d in enumerate(dict_list, start=1):
        print(f"d{i} = {d}")

    dict_list_ = [f"d{i}" for i, d in enumerate(dict_list, start=1)]
    print(f"dict_list = [{', '.join(dict_list_)}]")

    md = Dict.merge(dict_list=dict_list)
    print("\nmd = Dict.merge(dict_list=dict_list)")
    print("print(md)")
    print(f"{json.dumps(md, indent=4)}")


if __name__ == '__main__':
    main()

Output

Dictionary operations test:

data = {
    "1": null,
    "2": "b",
    "3": {
        "4": null,
        "5": {
            "6": {
                "8": {
                    "9": {
                        "10": [
                            "g",
                            "h"
                        ]
                    }
                }
            }
        }
    }
}

d = Dict(data=data)
d.get(keys=[11]) --> (None, False)
d.set(keys=[11], value={12: {13: 14}}) --> False
d.add(keys=[11], value={12: {13: 14}}) --> True
d.add(keys=[11, 12, 13], value=14) --> False
d.set(keys=[11, 12, 13], value=15) --> True
d.get(keys=[11]) --> ({12: {13: 15}}, True)
d.get(keys=[11, 12]) --> ({13: 15}, True)
d.get(keys=[11, 12, 13]) --> (15, True)
d.get(keys=[11, 12, 13, 15]) --> (None, False)
d.remove(keys=[2]) --> True
d.remove(keys=[2]) --> False
d.get(keys=[2]) --> (None, False)

-----------------------------

Dictionary values match test:

data = {
    "1": null,
    "3": {
        "4": null,
        "5": {
            "6": {
                "8": {
                    "9": {
                        "10": [
                            "g",
                            "h"
                        ]
                    }
                }
            }
        }
    },
    "11": {
        "12": {
            "13": 15
        }
    }
}

d = Dict(data=data)
d[1] = None == None (found) --> True
d[3, 4] = None == None (found) --> True
d[3, 5, 6, 8, 9, 10] = ['g', 'h'] == ['g', 'h'] (found) --> True
d[11, 12, 13] = 15 == 15 (found) --> True

-----------------------------

Dictionaries merge test:

d1 = {1: 'a', 2: None, 3: {4: None, 5: {6: None}}}
d2 = {1: None, 2: None, 3: {4: 'c', 5: {6: {7: None}}}}
d3 = {1: None, 2: 'b', 3: {4: None, 5: {6: {7: 'd'}}}}
d4 = {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['e', 'f']}}}}}}
d5 = {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['g', 'h']}}}}}}
dict_list = [d1, d2, d3, d4, d5]

md = Dict.merge(dict_list=dict_list)
print(md)
{
    "1": "a",
    "2": "b",
    "3": {
        "4": "c",
        "5": {
            "6": {
                "7": "d",
                "8": {
                    "9": {
                        "10": [
                            "e",
                            "f"
                        ]
                    }
                }
            }
        }
    }
}
Answered By: joao8tunes