How Do I unnest following structure of json into interpretable table using python?
Question:
I have dataframe df
which has column called test_col
which contains json structures as shown below. As you can see lineItemPromotions object has nested jsons in it which can have 0-10 numbers of items in it. By unnesting, it should create new rows for each ID under lineItemPromotions.
How do I unnest this structures correctly?
{'provider': 'ABC',
'discountCodes_out': [],
'discounts_out': [],
'lineItemPromotions': [{'id': '1',
'discountCodes': [],
'discounts': [{'rule': 'Bundle Discount',
'name': 'Bundle Discount',
'ruleId': '',
'campaignId': '419f9a2f-0342-41c0-ac79-419d1023aaa9',
'centAmount': 1733550}],
'perUnitPromotionsShares': [1733550]},
{'id': '2',
'discountCodes': [],
'discounts': [{'rule': 'Bundle Discount',
'name': 'Bundle Discount',
'ruleId': '',
'campaignId': '419f9a2f-0342-41c0-ac79-419d1023aaa9',
'centAmount': 119438}],
'perUnitPromotionsShares': [119438, 119438]}]}
I tried following code but it is not working correctly. It is giving me nested item which I have to unnest again. Sorry that I have to paste the picture to show you the process how it is giving results.
Answers:
You could first explode your columns then concat with a new df made from the discounts
column:
df = pd.json_normalize(new_dict, meta='provider', record_path='lineItemPromotions')
df = df.apply(pd.Series.explode)
pd.concat([df.drop(columns='discounts').reset_index(drop=True),
pd.DataFrame(df['discounts'].values.tolist())], axis=1)
Output:
id discountCodes perUnitPromotionsShares provider rule name ruleId campaignId centAmount
0 1 NaN 1733550 ABC Bundle Discount Bundle Discount 419f9a2f-0342-41c0-ac79-419d1023aaa9 1733550
1 2 NaN 119438 ABC Bundle Discount Bundle Discount 419f9a2f-0342-41c0-ac79-419d1023aaa9 119438
2 2 NaN 119438 ABC Bundle Discount Bundle Discount 419f9a2f-0342-41c0-ac79-419d1023aaa9 119438
Although verbose, you could explicitly normalize each level:
pd.concat(
[
pd.json_normalize(data).explode("lineItemPromotions")
.drop(columns="lineItemPromotions").reset_index(drop=True),
pd.json_normalize(data, record_path=["lineItemPromotions"])
.drop(columns="discounts"),
pd.json_normalize(data, record_path=["lineItemPromotions", "discounts"])
],
axis=1
)
provider discountCodes_out discounts_out id discountCodes perUnitPromotionsShares rule name ruleId campaignId centAmount
0 ABC [] [] 1 [] [1733550] Bundle Discount Bundle Discount 419f9a2f-0342-41c0-ac79-419d1023aaa9 1733550
1 ABC [] [] 2 [] [119438, 119438] Bundle Discount Bundle Discount 419f9a2f-0342-41c0-ac79-419d1023aaa9 119438
You could then .explode("perUnitPromotionsShares")
if desired.
I have dataframe df
which has column called test_col
which contains json structures as shown below. As you can see lineItemPromotions object has nested jsons in it which can have 0-10 numbers of items in it. By unnesting, it should create new rows for each ID under lineItemPromotions.
How do I unnest this structures correctly?
{'provider': 'ABC',
'discountCodes_out': [],
'discounts_out': [],
'lineItemPromotions': [{'id': '1',
'discountCodes': [],
'discounts': [{'rule': 'Bundle Discount',
'name': 'Bundle Discount',
'ruleId': '',
'campaignId': '419f9a2f-0342-41c0-ac79-419d1023aaa9',
'centAmount': 1733550}],
'perUnitPromotionsShares': [1733550]},
{'id': '2',
'discountCodes': [],
'discounts': [{'rule': 'Bundle Discount',
'name': 'Bundle Discount',
'ruleId': '',
'campaignId': '419f9a2f-0342-41c0-ac79-419d1023aaa9',
'centAmount': 119438}],
'perUnitPromotionsShares': [119438, 119438]}]}
I tried following code but it is not working correctly. It is giving me nested item which I have to unnest again. Sorry that I have to paste the picture to show you the process how it is giving results.
You could first explode your columns then concat with a new df made from the discounts
column:
df = pd.json_normalize(new_dict, meta='provider', record_path='lineItemPromotions')
df = df.apply(pd.Series.explode)
pd.concat([df.drop(columns='discounts').reset_index(drop=True),
pd.DataFrame(df['discounts'].values.tolist())], axis=1)
Output:
id discountCodes perUnitPromotionsShares provider rule name ruleId campaignId centAmount
0 1 NaN 1733550 ABC Bundle Discount Bundle Discount 419f9a2f-0342-41c0-ac79-419d1023aaa9 1733550
1 2 NaN 119438 ABC Bundle Discount Bundle Discount 419f9a2f-0342-41c0-ac79-419d1023aaa9 119438
2 2 NaN 119438 ABC Bundle Discount Bundle Discount 419f9a2f-0342-41c0-ac79-419d1023aaa9 119438
Although verbose, you could explicitly normalize each level:
pd.concat(
[
pd.json_normalize(data).explode("lineItemPromotions")
.drop(columns="lineItemPromotions").reset_index(drop=True),
pd.json_normalize(data, record_path=["lineItemPromotions"])
.drop(columns="discounts"),
pd.json_normalize(data, record_path=["lineItemPromotions", "discounts"])
],
axis=1
)
provider discountCodes_out discounts_out id discountCodes perUnitPromotionsShares rule name ruleId campaignId centAmount
0 ABC [] [] 1 [] [1733550] Bundle Discount Bundle Discount 419f9a2f-0342-41c0-ac79-419d1023aaa9 1733550
1 ABC [] [] 2 [] [119438, 119438] Bundle Discount Bundle Discount 419f9a2f-0342-41c0-ac79-419d1023aaa9 119438
You could then .explode("perUnitPromotionsShares")
if desired.