How to explode nested json in pandas as rows?

Question:

My json looks like below,

{
   "customer_name":"john",
   "phone":{
      "mobile":000, 
      "office":111
   },
   "mail":{
      "office":"[email protected]", 
      "personal":"[email protected]"
   }
}

phone or mail can have 1 or many items. Also, the key (office, mobile, personnel) are not same always. For example, it can be Business or alternative. So I wanted to convert them to rows and not columns. Most of the questions are to convert to columns so not able to find a solution for rows.

The final data frame should look like,

customer_name   |   phone.key   |   phone.val   |  mail.key   |   mail.value 
john            |   mobile      |   000         |             | 
john            |   office      |   111         |             |         
john            |               |               |  office     |   [email protected]
john            |               |               |  personal   |   [email protected]

Can someone help me?

Asked By: Selva

||

Answers:

Yiu can change format of nested dictionaries added customer_name with new keys:

d = {
   "customer_name":"john",
   "phone":{
      "mobile":000, 
      "office":111
   },
   "mail":{
      "office":"[email protected]", 
      "personal":"[email protected]"
   }
}

out = []
for k, v in d.items():
    if isinstance(v, dict):
        for k1, v1 in v.items():
            out.append({'customer_name':d['customer_name'],f'{k}.key': k1,f'{k}.value':v1})

Least comprehension alternative:

out = [{'customer_name':d['customer_name'],f'{k}.key': k1,f'{k}.value':v1} 
        for k, v in d.items() if isinstance(v, dict) for k1, v1 in v.items()]

df = pd.DataFrame(out)
print (df)
  customer_name phone.key  phone.value  mail.key       mail.value
0          john    mobile          0.0       NaN              NaN
1          john    office        111.0       NaN              NaN
2          john       NaN          NaN    office  [email protected]
3          john       NaN          NaN  personal    [email protected]
Answered By: jezrael

It would be way simpler if you use json_normalize in the following way to flatten your data:

 from pandas.io.json import json_normalize
 data = {
  "customer_name":"john",
  "phone":{
     "mobile":000, 
     "office":111
   },
  "mail":{
    "office":"[email protected]", 
    "personal":"[email protected]"
   }
 }

 df = json_normalize(data)
Answered By: Umar Aftab
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.