pandas DataFrame: normalize one JSON column and merge with other columns

Question:

I have a pandas DataFrame containing one column with multiple JSON data items as list of dicts. I want to normalize the JSON column and duplicate the non-JSON columns:

# creating dataframe
df_actions = pd.DataFrame(columns=['id', 'actions'])
rows = [[12,json.loads('[{"type": "a","value": "17"},{"type": "b","value": "19"}]')],
   [15, json.loads('[{"type": "a","value": "1"},{"type": "b","value": "3"},{"type": "c","value": "5"}]')]]
df_actions.loc[0] = rows[0]
df_actions.loc[1] = rows[1]

>>>df_actions
   id                                            actions
0  12  [{'type': 'a', 'value': '17'}, {'type': 'b', '...
1  15  [{'type': 'a', 'value': '1'}, {'type': 'b', 'v...

I want

>>>df_actions_parsed
   id      type    value
   12      a        17
   12      b        19
   15      a        1
   15      b        3
   15      c        5

I can normalize JSON data using:

pd.concat([pd.DataFrame(json_normalize(x)) for x in df_actions['actions']],ignore_index=True)

but I don’t know how to join that back to the id column of the original DataFrame.

Asked By: stack_lech

||

Answers:

You can use concat with dict comprehension with pop for extract column, remove second level and join to original:

df1 = (pd.concat({i: pd.DataFrame(x) for i, x in df_actions.pop('actions').items()})
         .reset_index(level=1, drop=True)
         .join(df_actions)
         .reset_index(drop=True))

What is same as:

df1 = (pd.concat({i: json_normalize(x) for i, x in df_actions.pop('actions').items()})
         .reset_index(level=1, drop=True)
         .join(df_actions)
         .reset_index(drop=True))

print (df1)
  type value  id
0    a    17  12
1    b    19  12
2    a     1  15
3    b     3  15
4    c     5  15

Another solution if performance is important:

L = [{**{'i':k, **y}} for k, v in df_actions.pop('actions').items() for y in v]

df_actions = df_actions.join(pd.DataFrame(L).set_index('i')).reset_index(drop=True)
print (df_actions)
   id type value
0  12    a    17
1  12    b    19
2  15    a     1
3  15    b     3
4  15    c     5
Answered By: jezrael

Here’s another solution that uses explode and json_normalize:

exploded = df_actions.explode("actions")
pd.concat([exploded["id"].reset_index(drop=True), pd.json_normalize(exploded["actions"])], axis=1)

Here’s the result:

   id type value
0  12    a    17
1  12    b    19
2  15    a     1
3  15    b     3
4  15    c     5
Answered By: Powers
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.