How to use pd.json_normalize to retrieve the data I need

Question:

I have this JSON list in Python:

[{'id': 'TC2-FFA',
  'shortCode': 'TC2-FFA',
  'dataSet': {'datumPrecision': 2,
   'id': 'TC2_37',
   'shortCode': 'TC2_37',
   'shortDescription': 'Clean Continent to US Atlantic coast',
   'displayGroup': 'BCTI',
   'datumUnit': 'Worldscale',
   'data': [{'value': 156.11, 'date': '2023-03-06'}],
   'apiIdentifier': 'RDSX9KRCHQI9TVGID5O7XQGBP1KKBZ0F'},
  'datumUnit': 'WS',
  'datumPrecision': 3,
  'projectionStartOn': '2005-01-04T00:00:00',
  'projectionEndOn': '2023-03-06T00:00:00',
  'apiIdentifier': 'RPSBTGHKN64SV91SV9R3492RCH33D2OH'},
 {'id': 'TC2$-FFA',
  'shortCode': 'TC2$-FFA',
  'dataSet': {'datumPrecision': 2,
   'id': 'TC2_37',
   'shortCode': 'TC2_37',
   'shortDescription': 'Clean Continent to US Atlantic coast',
   'displayGroup': 'BCTI',
   'datumUnit': 'Worldscale',
   'data': [{'value': 156.11, 'date': '2023-03-06'}],
   'apiIdentifier': 'RDSX9KRCHQI9TVGID5O7XQGBP1KKBZ0F'},
  'datumUnit': '$/mt',
  'datumPrecision': 3,
  'projectionStartOn': '2010-05-10T00:00:00',
  'projectionEndOn': '2023-03-06T00:00:00',
  'apiIdentifier': 'RPSH1H9454DYUE7G8CLHVLFPJZ3BVM77'}]

How could I use pandas.json_normalize to only retrieve the data shortCode (or id) and data under dataSet (dataSetdatavalue and date)?

This is the desired dataframe:

  shortCode  data.value   data.date
0   TC2-FFA      156.11  2023-03-06
1   TC2$-FFA      156.11  2023-03-06

I tried

pd.json_normalize(lst_object, record_path=['dataSet', ['shortCode', ['data', 'value'], ['data', 'date']])

but it failed

Asked By: neutralname

||

Answers:

It can be confusing.

You want all values from dataSet.data so they are your records.

You can then use meta to bring in shortCode

pd.json_normalize(
   data, 
   record_path=['dataSet', 'data'], 
   meta=['shortCode']
)
    value        date shortCode
0  156.11  2023-03-06   TC2-FFA
1  156.11  2023-03-06  TC2$-FFA
Answered By: jqurious

Fix your record_path and pass parameter values for meta_path and record_prefix

pd.json_normalize(lst_object, 
                  meta=['shortCode'], 
                  record_path=['dataSet', 'data'], 
                  record_prefix='data.')

Result

   data.value   data.date shortCode
0      156.11  2023-03-06   TC2-FFA
1      156.11  2023-03-06  TC2$-FFA
Answered By: Shubham Sharma
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.