How to flatten a list of dictionaries

Question:

Learning how to traverse lists and dictionaries in Python.

process = [
    {
        'process1': [
            {"subprocess1": ["subprocess1_1", "subprocess1_2"]},
            "subprocess2",
            {"subprocess3": ["subprocess3_1", "subprocess3_2"]},
            "subprocess4",
            {"subprocess5": [{"subprocess5_1": ["subprocess5_1_1", "subprocess5_1_2"]}]},
        ],
    },
    {
        'process2': [
            "subprocess2_1"
        ]
    }
]

How do I flatten the above list of dictionaries into the following?

process1 = [subprocess1, subprocess1_1, subprocess1_2, subprocess2, subprocess3, subprocess3_1, subprocess3_2, subprocess4, subprocess5, subprocess5_1, subprocess5_1_1, subprocess5_1_2]
process2 = [subprocess2_1]

Noob here so please ignore my lack of knowledge.
Thanks in advance.

Asked By: user1552698

Answers:

In such cases, a recursive function or generator is usually a good fit:

def flatten(x):
    if isinstance(x, list):
        for item in x:
            yield from flatten(item)
    elif isinstance(x, dict):
        for k, v in x.items():
            yield k
            yield from flatten(v)
    else:
        yield x
        
out = {k: list(flatten(v)) for d in process for k,v in d.items()}

Output:

out['process1']
# ['subprocess1', 'subprocess1_1', 'subprocess1_2', 'subprocess2',
#  'subprocess3', 'subprocess3_1', 'subprocess3_2', 'subprocess4',
#  'subprocess5', 'subprocess5_1', 'subprocess5_1_1', 'subprocess5_1_2']

out['process2']
# ['subprocess2_1']
Answered By: mozway

Here is another approach: a flatten function that takes a nested list as input and returns a flat list of its elements. It uses recursion to handle lists and dictionaries nested to arbitrary depth.

To use the function with your process data structure, extract the nested lists from the dictionary values by indexing and pass them to flatten:


# define the flatten function
def flatten(lst):
    # create an empty list to store the flattened list
    result = []
    # iterate through each element in the input list
    for item in lst:
        # if the element is a dictionary, recursively flatten its values
        if isinstance(item, dict):
            for val in item.values():
                result.extend(flatten(val))
        # if the element is a list, recursively flatten its elements
        elif isinstance(item, list):
            result.extend(flatten(item))
        # otherwise, append the element to the result list
        else:
            result.append(item)
    # return the flattened list
    return result

# extract the nested lists and flatten them
process1 = flatten(process[0]['process1'])
process2 = flatten(process[1]['process2'])

print(process1)
print(process2)

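This will output the following. Note that the dictionary keys (subprocess1, subprocess3, subprocess5, subprocess5_1) are not included, because only the dictionary values are traversed: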

['subprocess1_1', 'subprocess1_2', 'subprocess2', 'subprocess3_1', 'subprocess3_2', 'subprocess4', 'subprocess5_1_1', 'subprocess5_1_2']
['subprocess2_1']
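
If the keys should appear in the result as well, as in the question's expected output, a minimal sketch might look like the following. It assumes each key should come immediately before its flattened values; the name flatten_with_keys and the sample data are hypothetical and only serve as illustration.

```python
def flatten_with_keys(lst):
    """Flatten a nested list, emitting each dictionary key before its values."""
    result = []
    for item in lst:
        if isinstance(item, dict):
            for key, val in item.items():
                # keep the key, then flatten whatever it maps to
                result.append(key)
                # wrap non-list values so the recursive call can iterate them
                nested = val if isinstance(val, list) else [val]
                result.extend(flatten_with_keys(nested))
        elif isinstance(item, list):
            result.extend(flatten_with_keys(item))
        else:
            result.append(item)
    return result


sample = [
    {"subprocess1": ["subprocess1_1", "subprocess1_2"]},
    "subprocess2",
    {"subprocess5": [{"subprocess5_1": ["subprocess5_1_1"]}]},
]
print(flatten_with_keys(sample))
# ['subprocess1', 'subprocess1_1', 'subprocess1_2', 'subprocess2',
#  'subprocess5', 'subprocess5_1', 'subprocess5_1_1']
```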
Answered By: GSquirrel

Nested for-loops or list comprehensions are common ways to flatten objects of fixed depth, while recursion is generally useful for flattening objects of arbitrary depth. With recursion, you can

  • flatten by one more level with each recursive call, and
  • use isinstance to detect dictionaries before calling .values(), and
  • use hasattr to check whether an input has __iter__ (and is therefore iterable)
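
As a small illustration of these points, the sketch below flattens a list of fixed depth with a list comprehension and shows why strings need special treatment in the hasattr check. The sample data and the helper name is_flattenable are made up for this example.

```python
# Fixed depth: one list comprehension removes exactly one level of nesting.
two_levels = [["a", "b"], ["c"], ["d", "e"]]
flat = [item for sub in two_levels for item in sub]
print(flat)  # ['a', 'b', 'c', 'd', 'e']


# Strings also define __iter__, and iterating a one-character string yields
# that same string again, so recursing into strings would continue until
# Python raises RecursionError. They have to be excluded explicitly.
def is_flattenable(obj):
    """Return True for iterables that should be descended into (hypothetical helper)."""
    return hasattr(obj, "__iter__") and not isinstance(obj, str)


print(hasattr("abc", "__iter__"))   # True
print(is_flattenable("abc"))        # False
print(is_flattenable(["abc"]))      # True
print(is_flattenable({"k": "v"}))   # True
print(is_flattenable(42))           # False
```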

You can use the nested loop in a generator function:

def flatten_obj(obj):
    # strings are iterable too, so they are excluded explicitly
    if hasattr(obj, '__iter__') and not isinstance(obj, str):
        for i in (obj.values() if isinstance(obj, dict) else obj):
            for v in flatten_obj(i):
                yield v
    else:
        yield obj

But if you want the function to return a list, list comprehension might be preferable to initiating an empty list and appending to it in a nested loop.

def get_flat_list(obj, listify_single=False):
    if isinstance(obj, str) or not hasattr(obj, '__iter__'):
        return [obj] if listify_single else obj

    if isinstance(obj, dict):
        obj = obj.values()
    return [v for x in obj for v in get_flat_list(x, listify_single=True)]
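
To see what listify_single is for, here is a short check that repeats get_flat_list so it runs on its own; the sample values are made up. The recursive calls pass listify_single=True so that scalars come back wrapped in a list and can be iterated by the comprehension, while a top-level scalar is returned unchanged by default.

```python
def get_flat_list(obj, listify_single=False):
    if isinstance(obj, str) or not hasattr(obj, '__iter__'):
        return [obj] if listify_single else obj
    if isinstance(obj, dict):
        obj = obj.values()
    return [v for x in obj for v in get_flat_list(x, listify_single=True)]


# A scalar at the top level is returned as-is unless listify_single is set.
print(get_flat_list("subprocess2"))                       # subprocess2
print(get_flat_list("subprocess2", listify_single=True))  # ['subprocess2']

# Nested input: dictionary keys are dropped, values are flattened.
print(get_flat_list([{"a": ["b", "c"]}, "d", [["e"]]]))   # ['b', 'c', 'd', 'e']
```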

Using

  • either {k: list(flatten_obj(v)) for i in process for k,v in i.items()}
  • or {k: get_flat_list(v) for i in process for k,v in i.items()}

should return

{
  'process1': ['subprocess1_1', 'subprocess1_2', 'subprocess2', 'subprocess3_1', 'subprocess3_2', 'subprocess4', 'subprocess5_1_1', 'subprocess5_1_2'],
  'process2': ['subprocess2_1']
}

Of course, you can also define process1 and process2 as separate variables:

process1 = get_flat_list(process[0]['process1']) 
# list(flatten_obj(process[0]['process1']))

process2 = get_flat_list(process[1]['process2']) 
# list(flatten_obj(process[1]['process2']))

or

process1, process2, *_ = [list(flatten_obj(v)) for i in process for v in i.values()]
# process1, process2, *_ = [get_flat_list(v) for i in process for v in i.values()]
Answered By: Driftr95