How to convert defaultdict of defaultdicts [of defaultdicts] to dict of dicts [of dicts]?

Question:

Using this answer, I created a defaultdict of defaultdicts. Now I’d like to turn that deeply nested object back into an ordinary Python dict.

from collections import defaultdict

factory = lambda: defaultdict(factory)
defdict = factory()
defdict['one']['two']['three']['four'] = 5

# defaultdict(<function <lambda> at 0x10886f0c8>, {
#             'one': defaultdict(<function <lambda> at 0x10886f0c8>, {
#                 'two': defaultdict(<function <lambda> at 0x10886f0c8>, {
#                     'three': defaultdict(<function <lambda> at 0x10886f0c8>, {
#                         'four': 5})})})})

A round trip through JSON produces plain dicts, but I assume this is not the right solution:

import json

regdict = json.loads(json.dumps(defdict))

# {u'one': {u'two': {u'three': {u'four': 5}}}}
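
To illustrate why the JSON round trip is a doubtful general solution, here is a small sketch under Python 3. The integer key and tuple value are made up for the illustration; they are not part of the original data.

```python
import json
from collections import defaultdict

factory = lambda: defaultdict(factory)
defdict = factory()
defdict['one'][2] = (3, 4)  # hypothetical data: an integer key and a tuple value

# JSON only has string keys and arrays, so both are silently changed.
regdict = json.loads(json.dumps(defdict))
print(regdict)  # {'one': {'2': [3, 4]}}
```

The nesting survives, but the key 2 comes back as the string '2' and the tuple comes back as a list, so the result is not a faithful copy of the original structure.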

Also, this answer is inadequate since it does not recurse on the nested dict(s).

Asked By: samstav

Answers:

You can recurse over the tree, replacing each defaultdict instance with a dict produced by a dict comprehension:

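from collections import defaultdict

def default_to_regular(d):
    # Rebuild each defaultdict as a plain dict; any other value is returned unchanged
    if isinstance(d, defaultdict):
        d = {k: default_to_regular(v) for k, v in d.items()}
    return d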

Demo:

>>> from collections import defaultdict
>>> factory = lambda: defaultdict(factory)
>>> defdict = factory()
>>> defdict['one']['two']['three']['four'] = 5
>>> defdict
defaultdict(<function <lambda> at 0x103098ed8>, {'one': defaultdict(<function <lambda> at 0x103098ed8>, {'two': defaultdict(<function <lambda> at 0x103098ed8>, {'three': defaultdict(<function <lambda> at 0x103098ed8>, {'four': 5})})})})
>>> default_to_regular(defdict)
{'one': {'two': {'three': {'four': 5}}}}
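
Note that the function above only follows defaultdict values, so a defaultdict stored inside a list would be left alone. A possible variation, assuming the tree may also contain lists and tuples, is sketched below. The name to_plain is hypothetical, and this version converts every dict subclass it meets to a plain dict, which may or may not be what you want.

```python
from collections import defaultdict

def to_plain(obj):
    """Recursively copy obj, turning dicts (including defaultdicts) into plain dicts."""
    if isinstance(obj, dict):
        return {k: to_plain(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_plain(v) for v in obj]
    if isinstance(obj, tuple):
        return tuple(to_plain(v) for v in obj)
    return obj  # leaf value: returned unchanged

factory = lambda: defaultdict(factory)
nested = factory()
nested['a']['b'] = [factory()]  # a defaultdict hidden inside a list
nested['a']['b'][0]['c'] = 1

plain = to_plain(nested)
print(plain)  # {'a': {'b': [{'c': 1}]}}
```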
Answered By: Martijn Pieters

What you’re actually trying to do is pickle your recursive defaultdict. And you don’t care whether you get back a dict or a defaultdict when unpickling.

While there are a number of ways to solve this (e.g., create a defaultdict subclass with its own pickling, or explicitly override the default one with copyreg), there’s one that’s dead trivial.

Notice the error you get when you try it:

>>> import pickle
>>> pickle.dumps(defdict)
PicklingError: Can't pickle <function <lambda> at 0x10d7f4c80>: attribute lookup <lambda> on __main__ failed

You can’t pickle lambda-defined functions. Pickle stores a function as a reference to its module and name, and a lambda’s name is just <lambda>, so there’s nothing that could be looked up again when unpickling.
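
A quick way to see the difference is to compare the names the two kinds of function carry. The variable names lambda_factory and def_factory below are chosen only for this illustration.

```python
from collections import defaultdict

lambda_factory = lambda: defaultdict(lambda_factory)

def def_factory():
    return defaultdict(def_factory)

# Assigning a lambda to a variable does not change the function's own name.
print(lambda_factory.__name__)  # <lambda>
print(def_factory.__name__)     # def_factory
```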

But there is literally no reason this function needs to be defined by lambda. In particular, you don’t even want it to be anonymous, because you’re explicitly giving it a name. So:

def factory(): return defaultdict(factory)

And you’re done.

Here it is in action:

>>> from collections import defaultdict
>>> def factory(): return defaultdict(factory)
>>> defdict = factory()
>>> defdict['one']['two']['three']['four'] = 5
>>> import pickle
>>> pickle.dumps(defdict)
b'\x80\x03ccollections\ndefaultdict\nq\x00c__main__\nfactory\nq\x01\x85q\x02Rq\x03X\x03\x00\x00\x00oneq\x04h\x00h\x01\x85q\x05Rq\x06X\x03\x00\x00\x00twoq\x07h\x00h\x01\x85q\x08Rq\tX\x05\x00\x00\x00threeq\nh\x00h\x01\x85q\x0bRq\x0cX\x04\x00\x00\x00fourq\rK\x05ssss.'

There are other cases where using lambda instead of def for no good reason will cause problems: you can’t introspect your functions as well at runtime, you get worse tracebacks in the debugger, and so on. Use lambda when you want an inherently anonymous function, or a function you can define in the middle of an expression, but don’t use it to save three characters of typing.
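
For comparison, the copyreg route mentioned earlier might look roughly like the sketch below. It assumes you are happy to get plain dicts back when unpickling. The helper name reduce_defaultdict is hypothetical, and the registration is process-wide, so it affects every defaultdict pickled afterwards.

```python
import copyreg
import pickle
from collections import defaultdict

def reduce_defaultdict(d):
    # Ask pickle to rebuild the object as dict(<plain copy of the items>).
    # default_factory is left out, so the factory function is never pickled.
    return dict, (dict(d),)

# Process-wide registration: applies to all defaultdict instances from here on.
copyreg.pickle(defaultdict, reduce_defaultdict)

factory = lambda: defaultdict(factory)  # even a lambda factory works now
defdict = factory()
defdict['one']['two']['three']['four'] = 5

restored = pickle.loads(pickle.dumps(defdict))
print(restored)  # {'one': {'two': {'three': {'four': 5}}}}
```

Because nested defaultdicts go through the same reduction, every level comes back as a plain dict. The trade-off is that the auto-creating behaviour is lost after a round trip.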

Answered By: abarnert