How to flatten a list of dictionaries
Question:
Learning how to traverse lists and dictionaries in Python.
process = [
    {
        'process1':
        [
            {"subprocess1": ["subprocess1_1", "subprocess1_2"]},
            "subprocess2",
            {"subprocess3": ["subprocess3_1", "subprocess3_2"]},
            "subprocess4",
            {"subprocess5": [{"subprocess5_1": ["subprocess5_1_1", "subprocess5_1_2"]}]},
        ],
    },
    {
        'process2':
        [
            "subprocess2_1"
        ]
    }
]
How do I flatten the above list of dictionaries into the following:
process1 = ['subprocess1', 'subprocess1_1', 'subprocess1_2', 'subprocess2', 'subprocess3', 'subprocess3_1', 'subprocess3_2', 'subprocess4', 'subprocess5', 'subprocess5_1', 'subprocess5_1_1', 'subprocess5_1_2']
process2 = ['subprocess2_1']
Noob here, so please excuse my lack of knowledge.
Thanks in advance.
Answers:
In such cases, a recursive function/generator is a good fit, because it handles any depth of nesting:
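def flatten(x):
    # lists: flatten each element in turn
    if isinstance(x, list):
        for item in x:
            yield from flatten(item)
    # dicts: emit each key, then its flattened value
    elif isinstance(x, dict):
        for k, v in x.items():
            yield k
            yield from flatten(v)
    # anything else is a leaf
    else:
        yield x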
out = {k: list(flatten(v)) for d in process for k,v in d.items()}
Output:
out['process1']
# ['subprocess1', 'subprocess1_1', 'subprocess1_2', 'subprocess2',
# 'subprocess3', 'subprocess3_1', 'subprocess3_2', 'subprocess4',
# 'subprocess5', 'subprocess5_1', 'subprocess5_1_1', 'subprocess5_1_2']
out['process2']
# ['subprocess2_1']
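Recursive functions are bounded by Python's recursion limit (sys.getrecursionlimit(), 1000 by default in CPython), which only matters for very deeply nested data. As a hedged sketch, the same key-then-value traversal can be written with an explicit stack of iterators instead of recursive calls; flatten_iter is a hypothetical name, not part of any library:

```python
from itertools import chain

_DONE = object()  # sentinel marking an exhausted iterator


def flatten_iter(x):
    """Yield dict keys and leaf values depth-first, in the same order as the
    recursive flatten() above, but without recursive calls."""
    stack = [iter([x])]  # explicit stack of iterators replaces the call stack
    while stack:
        item = next(stack[-1], _DONE)
        if item is _DONE:
            stack.pop()  # finished this level; resume the parent
        elif isinstance(item, list):
            stack.append(iter(item))  # descend into the list
        elif isinstance(item, dict):
            # Interleave keys and values: k1, v1, k2, v2, ...
            # Keys are hashable, so never lists or dicts, and come out as leaves.
            stack.append(chain.from_iterable(item.items()))
        else:
            yield item


# A list nested 5000 levels deep; no recursive calls are made here.
deep = ["leaf"]
for _ in range(5000):
    deep = [deep]
print(list(flatten_iter(deep)))  # ['leaf']
```

The trade-off is readability: the recursive generator is shorter and easier to follow, so the stack-based form is only worth it if the nesting depth is not under your control.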
Here is another approach, which uses a flatten function that takes a nested list (whose elements may be lists, dictionaries, or plain values) as input and returns a flat list of its elements. It uses recursion to handle nested lists and dictionaries of arbitrary depth.
To use the function with your process data structure, you can extract the nested lists from the dictionary values using indexing and pass them to the flatten function. Here’s how you can do it:
# define the flatten function
def flatten(lst):
    # create an empty list to store the flattened list
    result = []
    # iterate through each element in the input list
    for item in lst:
        # if the element is a dictionary, recursively flatten its values
        if isinstance(item, dict):
            for val in item.values():
                # wrap val in a list so a plain string value is appended whole
                # instead of being iterated character by character
                result.extend(flatten([val]))
        # if the element is a list, recursively flatten its elements
        elif isinstance(item, list):
            result.extend(flatten(item))
        # otherwise, append the element to the result list
        else:
            result.append(item)
    # return the flattened list
    return result

# extract the nested lists and flatten them
process1 = flatten(process[0]['process1'])
process2 = flatten(process[1]['process2'])
print(process1)
print(process2)
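This will output the following; note that only the dictionary values are kept, so keys such as subprocess1 and subprocess3 do not appear: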
['subprocess1_1', 'subprocess1_2', 'subprocess2', 'subprocess3_1', 'subprocess3_2', 'subprocess4', 'subprocess5_1_1', 'subprocess5_1_2']
['subprocess2_1']
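If the keys should be kept as well, as in the expected output in the question, one possible adjustment (a sketch, not the answerer's code; flatten_with_keys is a hypothetical name) is to append each key before its flattened value. It also wraps each value in a list before recursing, so a plain string value stays whole rather than being split into characters:

```python
def flatten_with_keys(lst):
    """Variant of the list-based flatten() that also keeps dictionary keys."""
    result = []
    for item in lst:
        if isinstance(item, dict):
            for key, val in item.items():
                result.append(key)  # keep the key itself, e.g. 'subprocess1'
                # Wrap the value in a list so a plain string value stays whole.
                result.extend(flatten_with_keys([val]))
        elif isinstance(item, list):
            result.extend(flatten_with_keys(item))
        else:
            result.append(item)
    return result


print(flatten_with_keys([{"subprocess3": ["subprocess3_1", "subprocess3_2"]}, "subprocess4"]))
# ['subprocess3', 'subprocess3_1', 'subprocess3_2', 'subprocess4']
```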
Nested for-loops or list comprehensions are common ways to flatten objects of fixed depth, while recursion is generally useful for flattening objects of arbitrary depth. So you can
- flatten by one more level with each recursive call,
- use isinstance to detect dictionaries before getting .values, and
- use hasattr to check whether an input has __iter__ (and is, therefore, iterable).
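The checks listed above can be tried in isolation with a short sketch (is_container is a hypothetical helper name). It also shows why strings need to be excluded: a string has __iter__, and iterating over a one-character string yields that same string, so without the str check the recursion would never reach a base case.

```python
def is_container(obj):
    """Hypothetical helper: True if obj should be descended into."""
    return hasattr(obj, "__iter__") and not isinstance(obj, str)


d = {"subprocess3": ["subprocess3_1", "subprocess3_2"]}

print(hasattr("subprocess2", "__iter__"))  # True: strings are iterable
print(next(iter("s")) == "s")              # True: a 1-char string yields itself
print(is_container("subprocess2"), is_container(42), is_container(d))  # False False True
print(list(d))           # ['subprocess3']  (iterating a dict gives its keys)
print(list(d.values()))  # [['subprocess3_1', 'subprocess3_2']]
```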
You can use the nested loop in a generator function:
def flatten_obj(obj):
    # descend into any iterable except strings
    if hasattr(obj, '__iter__') and not isinstance(obj, str):
        # for dictionaries, iterate over the values rather than the keys
        for i in (obj.values() if isinstance(obj, dict) else obj):
            for v in flatten_obj(i):
                yield v
    else:
        yield obj
But if you want the function to return a list, a list comprehension may be preferable to initializing an empty list and appending to it in a nested loop:
def get_flat_list(obj, listify_single=False):
    # leaves (strings and non-iterables) end the recursion
    if isinstance(obj, str) or not hasattr(obj, '__iter__'):
        return [obj] if listify_single else obj
    # for dictionaries, descend into the values only
    if isinstance(obj, dict): obj = obj.values()
    return [v for x in obj for v in get_flat_list(x, listify_single=True)]
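To illustrate the role of listify_single (the function is restated so the snippet runs on its own): at the top level a leaf input is returned unchanged, while inside the recursion it is wrapped in a one-item list so the comprehension can iterate over it. Because the check is based on __iter__ rather than isinstance(obj, list), tuples are flattened too:

```python
def get_flat_list(obj, listify_single=False):
    # Leaves (strings and non-iterables) end the recursion.
    if isinstance(obj, str) or not hasattr(obj, '__iter__'):
        return [obj] if listify_single else obj
    # For dictionaries, descend into the values only.
    if isinstance(obj, dict):
        obj = obj.values()
    return [v for x in obj for v in get_flat_list(x, listify_single=True)]


print(get_flat_list("subprocess2"))                       # subprocess2
print(get_flat_list("subprocess2", listify_single=True))  # ['subprocess2']
print(get_flat_list(("a", ["b", {"k": ("c", "d")}])))     # ['a', 'b', 'c', 'd']
```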
Using either
{k: list(flatten_obj(v)) for i in process for k,v in i.items()}
or
{k: get_flat_list(v) for i in process for k,v in i.items()}
should return
{
'process1': ['subprocess1_1', 'subprocess1_2', 'subprocess2', 'subprocess3_1', 'subprocess3_2', 'subprocess4', 'subprocess5_1_1', 'subprocess5_1_2'],
'process2': ['subprocess2_1']
}
Of course, you can also define process1 and process2 as separate variables:
process1 = get_flat_list(process[0]['process1'])
# list(flatten_obj(process[0]['process1']))
process2 = get_flat_list(process[1]['process2'])
# list(flatten_obj(process[1]['process2']))
or
process1, process2, *_ = [list(flatten_obj(v)) for i in process for v in i.values()]
# process1, process2, *_ = [get_flat_list(v) for i in process for v in i.values()]
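The starred target in the last form collects any extra lists, so the unpacking still works if process gains a third entry, although it does need at least two. It also relies on the order of the entries in process (and, within each dictionary, on insertion order, which dictionaries preserve since Python 3.7). A small self-contained illustration with hypothetical placeholder data:

```python
# Hypothetical stand-in for the flattened lists built from `process`.
flat_lists = [["a1", "a2"], ["b1"], ["c1"]]

process1, process2, *rest = flat_lists  # the answer uses *_ ; the name is arbitrary
print(process1)  # ['a1', 'a2']
print(process2)  # ['b1']
print(rest)      # [['c1']]  extra items are collected, not an error

# With fewer than two items there is nothing to bind to the second name.
try:
    first, second, *others = [["a1"]]
except ValueError as exc:
    print("ValueError:", exc)
```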