How Do I unnest following structure of json into interpretable table using python?

Question:

I have dataframe df which has column called test_col which contains json structures as shown below. As you can see lineItemPromotions object has nested jsons in it which can have 0-10 numbers of items in it. By unnesting, it should create new rows for each ID under lineItemPromotions.
How do I unnest this structures correctly?

{'provider': 'ABC',
 'discountCodes_out': [],
 'discounts_out': [],
 'lineItemPromotions': [{'id': '1',
   'discountCodes': [],
   'discounts': [{'rule': 'Bundle Discount',
     'name': 'Bundle Discount',
     'ruleId': '',
     'campaignId': '419f9a2f-0342-41c0-ac79-419d1023aaa9',
     'centAmount': 1733550}],
   'perUnitPromotionsShares': [1733550]},
  {'id': '2',
   'discountCodes': [],
   'discounts': [{'rule': 'Bundle Discount',
     'name': 'Bundle Discount',
     'ruleId': '',
     'campaignId': '419f9a2f-0342-41c0-ac79-419d1023aaa9',
     'centAmount': 119438}],
   'perUnitPromotionsShares': [119438, 119438]}]}

I tried following code but it is not working correctly. It is giving me nested item which I have to unnest again. Sorry that I have to paste the picture to show you the process how it is giving results.

enter image description here

Asked By: dan

||

Answers:

You could first explode your columns then concat with a new df made from the discounts column:

df = pd.json_normalize(new_dict, meta='provider', record_path='lineItemPromotions')
df = df.apply(pd.Series.explode)
pd.concat([df.drop(columns='discounts').reset_index(drop=True), 
           pd.DataFrame(df['discounts'].values.tolist())], axis=1)

Output:

  id discountCodes perUnitPromotionsShares provider             rule             name ruleId                            campaignId  centAmount
0  1           NaN                 1733550      ABC  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9     1733550
1  2           NaN                  119438      ABC  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9      119438
2  2           NaN                  119438      ABC  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9      119438
Answered By: Tranbi

Although verbose, you could explicitly normalize each level:

pd.concat(
   [
      pd.json_normalize(data).explode("lineItemPromotions")
        .drop(columns="lineItemPromotions").reset_index(drop=True),
      pd.json_normalize(data, record_path=["lineItemPromotions"])
         .drop(columns="discounts"),
      pd.json_normalize(data, record_path=["lineItemPromotions", "discounts"])
   ], 
   axis=1
)
  provider discountCodes_out discounts_out id discountCodes perUnitPromotionsShares             rule             name ruleId                            campaignId  centAmount
0      ABC                []            []  1            []               [1733550]  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9     1733550
1      ABC                []            []  2            []        [119438, 119438]  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9      119438

You could then .explode("perUnitPromotionsShares") if desired.

Answered By: jqurious
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.