Pandas: Explode a list of nested dictionaries column and append as new rows

Question:

Consider the following list of dicts as an example:

d2 = [{'event_id': 't1',
  'display_name': 't1',
  'form_count': 0,
  'repetition_id': None,
  'children': [{'event_id': 't_01',
    'display_name': 't(1)',
    'form_count': 1,
    'repetition_id': 't1',
    'children': [],
    'forms': [{'form_id': 'f1',
      'form_repetition_id': '1',
      'form_name': 'fff1',
      'is_active': True,
      'is_submitted': False}]}],
  'forms': []},
 {'event_id': 't2',
  'display_name': 't2',
  'form_count': 0,
  'repetition_id': None,
  'children': [{'event_id': 't_02',
    'display_name': 't(2)',
    'form_count': 1,
    'repetition_id': 't2',
    'children': [{'event_id': 't_03',
      'display_name': 't(3)',
      'form_count': 1,
      'repetition_id': 't3',
      'children': [],
      'forms': [{'form_id': 'f3',
        'form_repetition_id': '1',
        'form_name': 'fff3',
        'is_active': True,
        'is_submitted': False}]}],
    'forms': [{'form_id': 'f2',
      'form_repetition_id': '1',
      'form_name': 'fff2',
      'is_active': True,
      'is_submitted': False}]}],
  'forms': []}]

Above, d2 is a list of dicts, where children is a list of nested dicts with the same keys as the parent.

Also, children can be nested to an arbitrary number of levels, which cannot be known upfront. In short, I don't know how many times I need to keep exploding it.

Current df:

In [54]: df11 = pd.DataFrame(d2)

In [55]: df11
Out[55]: 
  event_id display_name  form_count repetition_id                                           children forms
0       t1           t1           0          None  [{'event_id': 't_01', 'display_name': 't(1)', ...    []
1       t2           t2           0          None  [{'event_id': 't_02', 'display_name': 't(2)', ...    []

I want to flatten it in the below way.

Expected output:

 event_id display_name  form_count repetition_id                                           children                                              forms
0       t1           t1           0          None  {'event_id': 't_01', 'display_name': 't(1)', '...                                                 []
1       t2           t2           0          None  {'event_id': 't_02', 'display_name': 't(2)', '...                                                 []
0     t_01         t(1)           1            t1                                                 []  [{'form_id': 'f1', 'form_repetition_id': '1', ...
1     t_02         t(2)           1            t2  {'event_id': 't_03', 'display_name': 't(3)', ...  [{'form_id': 'f2', 'form_repetition_id': '1', ...
0     t_03         t(3)           0            t3                                                 []     [{'form_id': 'f2', 'form_repetition_id': '1'}]

How can I find out how many levels of nested children there are?

My attempt:

In [58]: df12 = df11.explode('children')
In [64]: final = pd.concat([df12, pd.json_normalize(df12.children)])
In [72]: final
Out[72]: 
  event_id display_name  form_count repetition_id                                           children                                              forms
0       t1           t1           0          None  {'event_id': 't_01', 'display_name': 't(1)', '...                                                 []
1       t2           t2           0          None  {'event_id': 't_02', 'display_name': 't(2)', '...                                                 []
0     t_01         t(1)           1            t1                                                 []  [{'form_id': 'f1', 'form_repetition_id': '1', ...
1     t_02         t(2)           1            t2  [{'event_id': 't_03', 'display_name': 't(3)', ...  [{'form_id': 'f2', 'form_repetition_id': '1', ...
Asked By: Mayank Porwal


Answers:

This can be solved without knowing the nesting depth upfront by walking the tree breadth-first with a queue:

from collections import deque

import pandas as pd

queue = deque(d2)
d3 = []

while queue:
    item = queue.popleft()
    d3.append(item)

    # Optionally add a parent_event_id. Remove if you don't need it.
    queue += [
        {**child, "parent_event_id": item["event_id"]}
        for child in item.get("children", [])
    ]

df = pd.DataFrame(d3)
Answered By: Code Different
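
The question also asks how many levels of nesting there are. The following is a minimal sketch, not part of the answer above, that uses the same queue-based traversal but carries a depth counter alongside each item. The function name flatten_events, the depth column, and the sample data are assumptions made for illustration; the sketch also drops the children column from each row, which differs from the expected output in the question.

```python
from collections import deque

import pandas as pd


def flatten_events(events):
    """Breadth-first flatten of nested 'children' lists, recording depth.

    Hypothetical helper: the name and the extra columns are illustrative.
    """
    # Each queue entry is (event dict, depth, parent event_id).
    queue = deque((event, 0, None) for event in events)
    rows = []
    while queue:
        event, depth, parent_id = queue.popleft()
        # Copy every field except the nested children themselves.
        row = {k: v for k, v in event.items() if k != "children"}
        row["depth"] = depth
        row["parent_event_id"] = parent_id
        rows.append(row)
        # 'or []' also covers a missing key or an explicit None.
        for child in event.get("children") or []:
            queue.append((child, depth + 1, event["event_id"]))
    return rows


# Small made-up sample with the same shape as d2.
sample = [
    {"event_id": "a", "children": [
        {"event_id": "a1", "children": [
            {"event_id": "a2", "children": []}]}]},
    {"event_id": "b", "children": []},
]

flat_df = pd.DataFrame(flatten_events(sample))
# Deepest nesting level found; top-level rows are depth 0.
max_depth = int(flat_df["depth"].max())
```

Calling flatten_events(d2) in place of the sample should work the same way, since d2 uses the same event_id and children keys. Keep the children column by removing the filter in the dict comprehension if the nested data is still needed.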