How to convert a list of nested dictionaries to a pandas DataFrame?

Question:

I have some data containing nested dictionaries like below:

mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]

If I convert it to a pandas DataFrame,

import pandas as pd 

result_dataframe = pd.DataFrame(mylist)
print(result_dataframe)

It will output:

   a                 b
0  1  {'c': 2, 'd': 3}
1  3  {'c': 4, 'd': 3}

I want to convert the list of dictionaries so that the keys of the nested dictionary become columns and the outer key b is dropped. My code is below:

new_dataframe = result_dataframe.drop(columns=["b"])
b_dict_list = [document["b"] for document in mylist]
b_df = pd.DataFrame(b_dict_list)
frames = [new_dataframe, b_df]
total_frame = pd.concat(frames, axis=1)

total_frame is the result I want:

    a   c   d
0   1   2   3
1   3   4   3

But I think my code is a little complicated. Is there a simpler way to do this? Thank you.

Asked By: lazy_frog

Answers:

Use a list comprehension with pop to extract the value of b and merge it into each dictionary:

a = [{**x, **x.pop('b')} for x in mylist]
print (a)
[{'a': 1, 'c': 2, 'd': 3}, {'a': 3, 'c': 4, 'd': 3}]

result_dataframe = pd.DataFrame(a)
print(result_dataframe)
   a  c  d
0  1  2  3
1  3  4  3

Another solution, thanks @Sandeep Kadapa :

a = [{'a': x['a'], **x['b']} for x in mylist]
# alternative that also works when 'b' is missing
a = [{'a': x['a'], **x.get('b', {})} for x in mylist]
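
One thing to watch with the pop version: it removes b from each dictionary, so mylist itself is changed. A minimal sketch of a variant that leaves the input alone and takes the nested key as a parameter (the helper name flatten_key is made up for illustration):

```python
import pandas as pd

def flatten_key(records, key):
    """Merge the dict stored under `key` into each record, leaving the input untouched."""
    flat = []
    for rec in records:
        # Copy every field except the nested one.
        row = {k: v for k, v in rec.items() if k != key}
        # Merge the nested dict; a record without `key` contributes nothing here.
        row.update(rec.get(key, {}))
        flat.append(row)
    return flat

mylist = [{"a": 1, "b": {"c": 2, "d": 3}}, {"a": 3, "b": {"c": 4, "d": 3}}]
df = pd.DataFrame(flatten_key(mylist, "b"))
print(df)
```

After this runs, mylist still contains the b entries, so it can be reused.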
Answered By: jezrael

Or by applying pd.Series() to your method:

mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
result_dataframe = pd.DataFrame(mylist)
result_dataframe.drop(columns='b').join(result_dataframe['b'].apply(pd.Series))

   a  c  d
0  1  2  3
1  3  4  3
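
apply(pd.Series) builds one Series per row, which can get slow on large frames. A sketch of an alternative that expands the column in a single constructor call, assuming every value in b is a dict:

```python
import pandas as pd

mylist = [{"a": 1, "b": {"c": 2, "d": 3}}, {"a": 3, "b": {"c": 4, "d": 3}}]
result_dataframe = pd.DataFrame(mylist)

# Build the nested columns from the list of dicts in one call,
# reusing the original index so the join lines up row by row.
nested = pd.DataFrame(result_dataframe["b"].tolist(), index=result_dataframe.index)
flat = result_dataframe.drop(columns="b").join(nested)
print(flat)
```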
Answered By: anky

I prefer to write a function that takes mylist, flattens it one nested level down, and returns a dictionary. This has the added advantage that you don't need to know in advance which keys (like b) hold nested dictionaries, so the function works for every nested key one level down.

mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
import pandas as pd

def dropnested(alist):
    """Collect values column by column, lifting nested dict keys up one level."""
    outputdict = {}
    for dic in alist:
        for key, value in dic.items():
            if isinstance(value, dict):
                # Nested dict: store its items under their own keys, dropping `key`.
                for k2, v2 in value.items():
                    outputdict.setdefault(k2, []).append(v2)
            else:
                outputdict.setdefault(key, []).append(value)
    return outputdict

df = pd.DataFrame.from_dict(dropnested(mylist))
print (df)
#   a  c  d
#0  1  2  3
#1  3  4  3

If you try:

mylist = [{"a": 1, "b": {"c": 2, "d":3}, "g": {"e": 2, "f":3}}, 
          {"a": 3, "z": {"c": 4, "d":3}, "e": {"e": 2, "f":3}}]
df = pd.DataFrame.from_dict(dropnested(mylist))
print (df)
#   a  c  d  e  f
#0  1  2  3  2  3
#1  3  4  3  2  3

We can see that it flattens the keys b, g, z and e without issue, without having to name each nested key explicitly.
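
Note that dropnested collects values column by column, so it relies on every record producing the same set of keys; if one record lacks a key, the lists end up with different lengths and building the DataFrame fails. A row-wise sketch that lets pandas fill the gaps with NaN instead (flatten_one_level is a hypothetical name, and the sample records are made up to show a missing key):

```python
import pandas as pd

def flatten_one_level(records):
    """Return one flat dict per record, lifting nested dict values up one level."""
    rows = []
    for rec in records:
        row = {}
        for key, value in rec.items():
            if isinstance(value, dict):
                row.update(value)  # lift the nested keys, drop the parent key
            else:
                row[key] = value
        rows.append(row)
    return rows

# The second record has no "d" anywhere.
records = [{"a": 1, "b": {"c": 2, "d": 3}}, {"a": 3, "z": {"c": 4}}]
df = pd.DataFrame(flatten_one_level(records))
print(df)
```

Because each record becomes its own row dict, the DataFrame constructor aligns the columns and leaves d empty for the second row.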

Answered By: ycx

I had a similar problem. I used pd.json_normalize(x) and it worked. The only difference is that the nested columns are named with the parent key as a prefix (b.c and b.d instead of c and d).

import pandas as pd

mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
df = pd.json_normalize(mylist)
print(df)

Output:

   a  b.c  b.d
0  1    2    3
1  3    4    3
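
If the b. prefix is unwanted, the columns can be renamed after normalizing. A small sketch, assuming the innermost key names are unique across the nested dictionaries (otherwise the renamed columns would collide):

```python
import pandas as pd

mylist = [{"a": 1, "b": {"c": 2, "d": 3}}, {"a": 3, "b": {"c": 4, "d": 3}}]
df = pd.json_normalize(mylist)

# Keep only the last part of each dotted column name: "b.c" -> "c".
df.columns = [col.split(".")[-1] for col in df.columns]
print(df)
```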
Answered By: S Monteiro