How can I import a nested json object into a pandas dataframe?

Question:

I have a json object like this:

[{'currency_pair': 'UOS_USDT',
  'orders': [{'account': 'spot',
              'amount': '1282.84',
              'create_time': '1655394430',
              'create_time_ms': 1655394430129,
              'currency_pair': 'UOS_USDT',
              'fee': '0',
              'fee_currency': 'UOS',
              'fill_price': '0',
              'filled_total': '0',
              'gt_discount': False,
              'gt_fee': '0',
              'iceberg': '0',
              'id': '169208865523',
              'left': '1282.84',
              'point_fee': '0',
              'price': '0.1949',
              'rebated_fee': '0',
              'rebated_fee_currency': 'USDT',
              'side': 'buy',
              'status': 'open',
              'text': 'apiv4',
              'time_in_force': 'gtc',
              'type': 'limit',
              'update_time': '1655394430',
              'update_time_ms': 1655394430129}],
  'total': 1},
 {'currency_pair': 'RMRK_USDT',
  'orders': [{'account': 'spot',
              'amount': '79.365',
              'create_time': '1655394431',
              'create_time_ms': 1655394431249,
              'currency_pair': 'RMRK_USDT',
              'fee': '0',
              'fee_currency': 'RMRK',
              'fill_price': '0',
              'filled_total': '0',
              'gt_discount': False,
              'gt_fee': '0',
              'iceberg': '0',
              'id': '169208877018',
              'left': '79.365',
              'point_fee': '0',
              'price': '2.52',
              'rebated_fee': '0',
              'rebated_fee_currency': 'USDT',
              'side': 'buy',
              'status': 'open',
              'text': 'apiv4',
              'time_in_force': 'gtc',
              'type': 'limit',
              'update_time': '1655394431',
              'update_time_ms': 1655394431249}],
  'total': 1}]

I want to convert it to a dataframe.

The data comes from an api call to a crypto exchange. I converted this to json, using the .json() method. So it’s proper json. I have tried:

df = pd.DataFrame(data)
df = pd.DataFrame(data["orders")
df = pd.DataFrame(data["currency_pair"]["orders"])

and every other imaginable path.

I want a df which has as columns ["currency_pair", "amount", "create_time", "price", "side"]

I some times get an error TypeError: list indices must be integers or slices, not str or the df works but the orders object is not unpacked. All help gratefully received. Thank you.

Asked By: user1903663

||

Answers:

import pandas as pd

data = [{'currency_pair': 'UOS_USDT',
  'orders': [{'account': 'spot',
              'amount': '1282.84',
              'create_time': '1655394430',
              'create_time_ms': 1655394430129,
              'currency_pair': 'UOS_USDT',
              'fee': '0',
              'fee_currency': 'UOS',
              'fill_price': '0',
              'filled_total': '0',
              'gt_discount': False,
              'gt_fee': '0',
              'iceberg': '0',
              'id': '169208865523',
              'left': '1282.84',
              'point_fee': '0',
              'price': '0.1949',
              'rebated_fee': '0',
              'rebated_fee_currency': 'USDT',
              'side': 'buy',
              'status': 'open',
              'text': 'apiv4',
              'time_in_force': 'gtc',
              'type': 'limit',
              'update_time': '1655394430',
              'update_time_ms': 1655394430129}],
  'total': 1},
 {'currency_pair': 'RMRK_USDT',
  'orders': [{'account': 'spot',
              'amount': '79.365',
              'create_time': '1655394431',
              'create_time_ms': 1655394431249,
              'currency_pair': 'RMRK_USDT',
              'fee': '0',
              'fee_currency': 'RMRK',
              'fill_price': '0',
              'filled_total': '0',
              'gt_discount': False,
              'gt_fee': '0',
              'iceberg': '0',
              'id': '169208877018',
              'left': '79.365',
              'point_fee': '0',
              'price': '2.52',
              'rebated_fee': '0',
              'rebated_fee_currency': 'USDT',
              'side': 'buy',
              'status': 'open',
              'text': 'apiv4',
              'time_in_force': 'gtc',
              'type': 'limit',
              'update_time': '1655394431',
              'update_time_ms': 1655394431249}],
  'total': 1}]

df = pd.DataFrame(data)
df['amount'] = df.apply( lambda row: row.orders[0]['amount'] , axis=1)
df['create_time'] = df.apply( lambda row: row.orders[0]['create_time'] , axis=1)
df['price'] = df.apply( lambda row: row.orders[0]['price'] , axis=1)
df['side'] = df.apply( lambda row: row.orders[0]['side'] , axis=1)
required_df = df[['currency_pair', 'amount', 'create_time', 'price', 'side']]
required_df

Result:

currency_pair   amount  create_time     price   side
0   UOS_USDT    1282.84     1655394430  0.1949  buy
1   RMRK_USDT   79.365  1655394431  2.52    buy
import pandas as pd

data = [{'currency_pair': 'UOS_USDT',
  'orders': [{'account': 'spot',
              'amount': '1282.84',
              'create_time': '1655394430',
              'create_time_ms': 1655394430129,
              'currency_pair': 'UOS_USDT',
              'fee': '0',
              'fee_currency': 'UOS',
              'fill_price': '0',
              'filled_total': '0',
              'gt_discount': False,
              'gt_fee': '0',
              'iceberg': '0',
              'id': '169208865523',
              'left': '1282.84',
              'point_fee': '0',
              'price': '0.1949',
              'rebated_fee': '0',
              'rebated_fee_currency': 'USDT',
              'side': 'buy',
              'status': 'open',
              'text': 'apiv4',
              'time_in_force': 'gtc',
              'type': 'limit',
              'update_time': '1655394430',
              'update_time_ms': 1655394430129}],
  'total': 1},
 {'currency_pair': 'RMRK_USDT',
  'orders': [{'account': 'spot',
              'amount': '79.365',
              'create_time': '1655394431',
              'create_time_ms': 1655394431249,
              'currency_pair': 'RMRK_USDT',
              'fee': '0',
              'fee_currency': 'RMRK',
              'fill_price': '0',
              'filled_total': '0',
              'gt_discount': False,
              'gt_fee': '0',
              'iceberg': '0',
              'id': '169208877018',
              'left': '79.365',
              'point_fee': '0',
              'price': '2.52',
              'rebated_fee': '0',
              'rebated_fee_currency': 'USDT',
              'side': 'buy',
              'status': 'open',
              'text': 'apiv4',
              'time_in_force': 'gtc',
              'type': 'limit',
              'update_time': '1655394431',
              'update_time_ms': 1655394431249}],
  'total': 1}]

Use:

df = pd.json_normalize(data, record_path=['orders'])

And keep the columns you need.

It’s only one line and it should cover your case since ‘currency_pair’ that you want is already in the ‘orders’ dictionary and from what I understand from your data it will always be the same as the ‘currency_pair’ value outside ‘orders. As you said you don’t need ‘total’ too.

Use:

df = pd.json_normalize(data, record_path=['orders'], meta=['currency_pair', 'total'], record_prefix='orders_')

If you want them all

Answered By: diml

HI, hope this process can help you

#Import pandas library
import pandas as pd

#Your data 
data = [{'currency_pair': 'UOS_USDT',
         'orders': [{'account': 'spot',
                     'amount': '1282.84',
                     'create_time': '1655394430',
                     'create_time_ms': 1655394430129,
                     'currency_pair': 'UOS_USDT',
                     'fee': '0',
                     'fee_currency': 'UOS',
                     'fill_price': '0',
                     'filled_total': '0',
                     'gt_discount': False,
                     'gt_fee': '0',
                     'iceberg': '0',
                     'id': '169208865523',
                     'left': '1282.84',
                     'point_fee': '0',
                     'price': '0.1949',
                     'rebated_fee': '0',
                     'rebated_fee_currency': 'USDT',
                     'side': 'buy',
                     'status': 'open',
                     'text': 'apiv4',
                     'time_in_force': 'gtc',
                     'type': 'limit',
                     'update_time': '1655394430',
                     'update_time_ms': 1655394430129}],
         'total': 1},
        {'currency_pair': 'RMRK_USDT',
         'orders': [{'account': 'spot',
                     'amount': '79.365',
                     'create_time': '1655394431',
                     'create_time_ms': 1655394431249,
                     'currency_pair': 'RMRK_USDT',
                     'fee': '0',
                     'fee_currency': 'RMRK',
                     'fill_price': '0',
                     'filled_total': '0',
                     'gt_discount': False,
                     'gt_fee': '0',
                     'iceberg': '0',
                     'id': '169208877018',
                     'left': '79.365',
                     'point_fee': '0',
                     'price': '2.52',
                     'rebated_fee': '0',
                     'rebated_fee_currency': 'USDT',
                     'side': 'buy',
                     'status': 'open',
                     'text': 'apiv4',
                     'time_in_force': 'gtc',
                     'type': 'limit',
                     'update_time': '1655394431',
                     'update_time_ms': 1655394431249}],
         'total': 1}]


#Accessing nested values
#you cloud transform the specific column 
#into a DataFrame and access it values with indices
#then parse the value to the type you need 
#i.e
float(pd.DataFrame(data[0]['orders'])['amount'].values[0])
int(pd.DataFrame(data[0]['orders'])['create_time'].values[0])
float(pd.DataFrame(data[0]['orders'])['price'].values[0])
pd.DataFrame(data[0]['orders'])['side'].values[0]

#Create a dictionary with your chosen structure
#["currency_pair", "amount", "create_time", "price", "side"]
# then insert the corresponding columns

custom_dictionary = {
    'currency_pair': [data[0]['currency_pair'], data[1]['currency_pair']],

    'amount': [float(pd.DataFrame(data[0]['orders'])['amount'].values[0]),
               float(pd.DataFrame(data[1]['orders'])['amount'].values[0])],

    'create_time': [int(pd.DataFrame(data[0]['orders'])['create_time'].values[0]),
                    int(pd.DataFrame(data[1]['orders'])['create_time'].values[0])],

    'price': [float(pd.DataFrame(data[0]['orders'])['price'].values[0]),
              float(pd.DataFrame(data[1]['orders'])['price'].values[0])],

    'side': [pd.DataFrame(data[0]['orders'])['side'].values[0],
             pd.DataFrame(data[1]['orders'])['side'].values[0]]}

#Create a DataFrame with your custom dictionary and voila
df = pd.DataFrame(custom_dictionary)
df

the dataframe (df) could look like:

custom DataFrame

Answered By: Sigma Moon
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.