How to properly "merge" complex Python dictionaries?

Question:

I have n very complex Python dictionaries with deep nesting (about 5 levels), and I don’t know how to merge them correctly and quickly without iterating over them a million times.

It is worth mentioning that the dicts have a strict structure, as you will see below.

I have tried solutions based on:

  • defaultdict
  • merge operator

Version of Python – 3.9
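
For context, here is a minimal sketch of why the plain merge operator is probably not enough for this structure. The two small dicts are hypothetical stand-ins in the same shape as the real data. `|` only merges top-level keys, so the right-hand "places" list replaces the left-hand one instead of being combined with it.

```python
# Two tiny dicts in the same shape as the real data (hypothetical values).
a = {"name": "Louis", "places": [{"code": "A", "subplaces": []}]}
b = {"name": "Louis", "places": [{"code": "B", "subplaces": []}]}

# dict | dict (Python 3.9+) merges top-level keys only; on a key clash
# the right-hand value replaces the left-hand one wholesale.
shallow = a | b

print(shallow["places"])  # only the "B" entry survives
```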

d1 = {
  "name": "Louis",
  "places": [
    {
      "code": "A",
      "subplaces": [
        {
          "name": "Subplace name",
          "subsubplaces": [
            {
              "name": "subsub1"
            }
          ]
        },
        {
          "name": "Subplace name2",
          "subsubplaces": [
            {
              "name": "subsub1"
            }
          ]
        }
      ]
    }
  ]
}

d2 = {
  "name": "Louis",
  "places": [
    {
      "code": "B",
      "subplaces": [
        {
          "name": "Subplace name",
          "subsubplaces": [
            {
              "name": "subsub1"
            }
          ]
        },
        {
          "name": "Subplace name2",
          "subsubplaces": [
            {
              "name": "subsub1"
            }
          ]
        }
      ]
    }
  ]
}

d3 = {
  "name": "Louis",
  "places": [
    {
      "code": "A",
      "subplaces": [
        {
          "name": "Subplace name X",
          "subsubplaces": [
            {
              "name": "subsub1"
            }
          ]
        }
      ]
    }
  ]
}

In that case, the output should be:

d_merged = {
  "name": "Louis",
  "places": [
    {
      "code": "A",
      "subplaces": [
        {
          "name": "Subplace name",
          "subsubplaces": [
            {
              "name": "subsub1"
            }
          ]
        },
        {
          "name": "Subplace name2",
          "subsubplaces": [
            {
              "name": "subsub1"
            }
          ]
        },
        {
          "name": "Subplace name X",
          "subsubplaces": [
            {
              "name": "subsub1"
            }
          ]
        }
      ]
    },
    {
      "code": "B",
      "subplaces": [
        {
          "name": "Subplace name",
          "subsubplaces": [
            {
              "name": "subsub1"
            }
          ]
        },
        {
          "name": "Subplace name2",
          "subsubplaces": [
            {
              "name": "subsub1"
            }
          ]
        }
      ]
    }
  ]
}
Asked By: ImageBinarizer


Answers:

I think your data representation carries a lot of unnecessary detail; it can be reduced with this solution:

from typing import Dict, List


dicts = [
    {
        "name": "Louis",
        "places": [
            {
                "code": "A",
                "subplaces": [
                    {
                        "name": "Subplace name",
                        "subsubplaces": [
                            {
                                "name": "subsub1"
                            }
                        ]
                    },
                    {
                        "name": "Subplace name2",
                        "subsubplaces": [
                            {
                                "name": "subsub1"
                            }
                        ]
                    }
                ]
            }
        ]
    },
    {
        "name": "Louis",
        "places": [
            {
                "code": "B",
                "subplaces": [
                    {
                        "name": "Subplace name",
                        "subsubplaces": [
                            {
                                "name": "subsub1"
                            }
                        ]
                    },
                    {
                        "name": "Subplace name2",
                        "subsubplaces": [
                            {
                                "name": "subsub1"
                            }
                        ]
                    }
                ]
            }
        ]
    },
    {
        "name": "Louis",
        "places": [
            {
                "code": "A",
                "subplaces": [
                    {
                        "name": "Subplace name X",
                        "subsubplaces": [
                            {
                                "name": "subsub1"
                            }
                        ]
                    }
                ]
            }
        ]
    }]


def merger(dicts: List[Dict]) -> Dict:
    # Build a name -> code -> list of subplaces mapping.
    result = {}
    for d in dicts:
        name = d["name"]
        if name not in result:
            result[name] = {}
        places = d["places"]
        for p in places:
            code = p["code"]
            if code not in result[name]:
                result[name][code] = []
            # Subplaces with the same code are concatenated, not deduplicated.
            result[name][code].extend(p["subplaces"])
    return result


print(merger(dicts=dicts))

The output will be:

{
    'Louis':{
        'A':[
            {'name': 'Subplace name', 'subsubplaces': [{'name': 'subsub1'}]},
            {'name': 'Subplace name2', 'subsubplaces': [{'name': 'subsub1'}]},
            {'name': 'Subplace name X', 'subsubplaces': [{'name': 'subsub1'}]}
        ],
        'B':[
            {'name': 'Subplace name', 'subsubplaces': [{'name': 'subsub1'}]},
            {'name': 'Subplace name2', 'subsubplaces': [{'name': 'subsub1'}]}]
    }
}

If you need the exact output from your question, it is easy to convert this structure back, but this one is more readable and maintainable.
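
As a rough sketch of that conversion, assuming the name -> code -> subplaces mapping shown above: the helper name `to_nested` and the sample values are made up for illustration. It returns a list with one dict per name, because the mapping can hold several names.

```python
from typing import Dict, List


def to_nested(merged: Dict[str, Dict[str, List[dict]]]) -> List[dict]:
    """Rebuild the question's layout from the name -> code -> subplaces mapping."""
    return [
        {
            "name": name,
            "places": [
                {"code": code, "subplaces": subplaces}
                for code, subplaces in by_code.items()
            ],
        }
        for name, by_code in merged.items()
    ]


# Small stand-in for the output of merger(); values are illustrative.
merged = {
    "Louis": {
        "A": [{"name": "Subplace name", "subsubplaces": [{"name": "subsub1"}]}],
        "B": [{"name": "Subplace name2", "subsubplaces": [{"name": "subsub1"}]}],
    }
}

nested = to_nested(merged)
print(nested[0]["places"][0]["code"])  # A
```

With a single name in the input, `to_nested(merger(dicts))[0]` would be the dict in the layout from the question.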

Answered By: S4eed3sm

Your task is quite specific, so a universal solution is not possible. I’d suggest merging all "places", "subplaces" and "subsubplaces" into a nested dictionary to remove any duplicates, and then reshaping the data to match the desired format.

from itertools import groupby
from operator import itemgetter
from collections import defaultdict

def merge_places(*dicts):
    if not dicts:
        return
    
    # check all dicts have same names
    # https://docs.python.org/3/library/itertools.html#itertools-recipes
    g = groupby(dicts, itemgetter("name"))
    if next(g, True) and next(g, False):
        raise ValueError("Dictionaries names are not equal")

    places = defaultdict(lambda: defaultdict(set))  # sets drop duplicates; their iteration order is not guaranteed
    for d in dicts:
        for place in d["places"]:
            for subplace in place["subplaces"]:
                for subsubplace in subplace["subsubplaces"]:
                    places[place["code"]][subplace["name"]].add(subsubplace["name"])

    return {
        "name": dicts[0]["name"],  # safe: dicts is non-empty and all names are equal
        "places": [
            {
                "code": code,
                "subplaces": [
                    {
                        "name": name,
                        "subsubplaces": [
                            {"name": subsubplace}
                            for subsubplace in subsubplaces
                        ]
                    }
                    for name, subsubplaces in subplaces.items()
                ]
            }
            for code, subplaces in places.items()
        ]
    }

Usage:

result = merge_places(d1, d2, d3)

Output:

{
    "name": "Louis",
    "places": [
        {
            "code": "A",
            "subplaces": [
                {
                    "name": "Subplace name",
                    "subsubplaces": [
                        {
                            "name": "subsub1"
                        }
                    ]
                },
                {
                    "name": "Subplace name2",
                    "subsubplaces": [
                        {
                            "name": "subsub1"
                        }
                    ]
                },
                {
                    "name": "Subplace name X",
                    "subsubplaces": [
                        {
                            "name": "subsub1"
                        }
                    ]
                }
            ]
        },
        {
            "code": "B",
            "subplaces": [
                {
                    "name": "Subplace name",
                    "subsubplaces": [
                        {
                            "name": "subsub1"
                        }
                    ]
                },
                {
                    "name": "Subplace name2",
                    "subsubplaces": [
                        {
                            "name": "subsub1"
                        }
                    ]
                }
            ]
        }
    ]
}
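
One caveat: a set does not guarantee iteration order, so with several "subsubplaces" per subplace their order in the result can vary. If first-seen order matters, a possible variant (a sketch only; `merge_places_ordered` is a made-up name, and the name-equality check is left out for brevity) uses dict keys in place of sets, since dicts keep insertion order:

```python
from collections import defaultdict


def merge_places_ordered(*dicts):
    """Variant of merge_places that keeps first-seen order at every level."""
    if not dicts:
        return None

    # code -> subplace name -> {subsubplace name: None}; dict keys are unique
    # and keep insertion order (guaranteed since Python 3.7).
    places = defaultdict(lambda: defaultdict(dict))
    for d in dicts:
        for place in d["places"]:
            for subplace in place["subplaces"]:
                # Touch the entry first, so a subplace with an empty
                # "subsubplaces" list is still kept in the result.
                names = places[place["code"]][subplace["name"]]
                for subsubplace in subplace["subsubplaces"]:
                    names[subsubplace["name"]] = None

    return {
        "name": dicts[0]["name"],
        "places": [
            {
                "code": code,
                "subplaces": [
                    {"name": name, "subsubplaces": [{"name": s} for s in subs]}
                    for name, subs in subplaces.items()
                ],
            }
            for code, subplaces in places.items()
        ],
    }


# Illustrative input with overlapping subsubplaces.
x = {"name": "Louis", "places": [{"code": "A", "subplaces": [
    {"name": "S1", "subsubplaces": [{"name": "b"}, {"name": "a"}]}]}]}
y = {"name": "Louis", "places": [{"code": "A", "subplaces": [
    {"name": "S1", "subsubplaces": [{"name": "a"}, {"name": "c"}]}]}]}

result = merge_places_ordered(x, y)
print(result["places"][0]["subplaces"][0]["subsubplaces"])
# [{'name': 'b'}, {'name': 'a'}, {'name': 'c'}]
```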
Answered By: Olvin Roght