How to convert columns of numpy arrays to lists when using .to_dict

Question:

I would like to take my pandas DataFrame and convert it to a list of dictionaries. I can do this using the pandas to_dict('records') function. However, for any column whose values are numpy arrays, the returned dictionaries still contain numpy arrays. I would like the contents of the returned list of dictionaries to be base Python objects rather than numpy arrays.

I understand I could iterate over the output dictionaries and convert them myself, but I was wondering if there is a more clever way to do this.

Here is some sample code that shows my problem:

import pandas as pd
import numpy as np


data = pd.concat([
    pd.Series(['a--b', 'c--d', 'e--f'], name='key'),
    pd.Series(['123', '456', '789'], name='code'),
    pd.Series([np.array(['123', '098']), np.array(['000', '999']), np.array(['789', '432'])], name='codes')
    ], axis=1)

output = data.to_dict('records')

# this prints <class 'numpy.ndarray'>
print(type(output[0]['codes']))

output, in this case, looks like this:

[{'key': 'a--b', 'code': '123', 'codes': array(['123', '098'], dtype='<U3')},
 {'key': 'c--d', 'code': '456', 'codes': array(['000', '999'], dtype='<U3')},
 {'key': 'e--f', 'code': '789', 'codes': array(['789', '432'], dtype='<U3')}]

I would like that print statement to report a list instead. I understand I could simply do the following:

for row in output:
    row['codes'] = row['codes'].tolist()

# this now prints <class 'list'>, which is what I want
print(type(output[0]['codes']))

However, my real DataFrame is much more complicated than this, and it has multiple columns that hold numpy arrays. I know I could expand the snippet above to check which columns are array-typed and convert them using tolist(), but I’m wondering if there is something snappier or more clever. Perhaps something provided by pandas that is optimized?

To be clear, here is the output I need to have:

print(output)
[{'key': 'a--b', 'code': '123', 'codes': ['123', '098']},
 {'key': 'c--d', 'code': '456', 'codes': ['000', '999']},
 {'key': 'e--f', 'code': '789', 'codes': ['789', '432']}]
Asked By: Katya Willard

Answers:

First use applymap to convert the numpy arrays to Python lists, then call to_dict:

cols = ['codes']
data.assign(**data[cols].applymap(list)).to_dict('records')

[{'key': 'a--b', 'code': '123', 'codes': ['123', '098']},
 {'key': 'c--d', 'code': '456', 'codes': ['000', '999']},
 {'key': 'e--f', 'code': '789', 'codes': ['789', '432']}]
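
One detail worth checking with this approach: calling list on a numpy array produces a Python list, but its elements remain numpy scalars, whereas ndarray.tolist also converts the elements to built-in Python types. Separately, DataFrame.applymap is deprecated as of pandas 2.1 in favor of DataFrame.map, so the sketch below uses Series.map on a single column, which should behave the same across versions. The variable names via_list and via_tolist are illustrative only.

```python
import numpy as np
import pandas as pd

# A single column of numpy arrays, mirroring the 'codes' column in the question.
codes = pd.Series(
    [np.array(['123', '098']), np.array(['000', '999'])],
    name='codes',
)

# list(arr) builds a Python list, but each element is still a numpy scalar.
via_list = codes.map(list)

# arr.tolist() builds a Python list whose elements are built-in Python objects.
via_tolist = codes.map(np.ndarray.tolist)

print(type(via_list.iloc[0][0]))    # <class 'numpy.str_'>
print(type(via_tolist.iloc[0][0]))  # <class 'str'>
```

If the goal is fully "base Python" output (for example, before serializing to JSON), tolist is probably the safer choice of the two.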
Answered By: Shubham Sharma

I ended up creating a list of the numpy-typed column names:

np_fields = ['codes']

and then I replaced each field in place in my dataframe:

for col in np_fields:
    data[col] = data[col].map(np.ndarray.tolist)

I then called data.to_dict('records') once that was complete.
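
If maintaining the list of column names by hand becomes tedious, the detection step can be automated. The following is only a sketch under the assumption that an array-valued column has object dtype and contains nothing but numpy arrays; the helper names ndarray_columns and to_plain_records are hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def ndarray_columns(df):
    """Return the names of object columns whose values are all numpy arrays."""
    return [
        col for col in df.columns
        if df[col].dtype == object
        and df[col].map(lambda v: isinstance(v, np.ndarray)).all()
    ]


def to_plain_records(df):
    """Convert array-valued columns to lists, then return a list of dicts."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in ndarray_columns(out):
        out[col] = out[col].map(np.ndarray.tolist)
    return out.to_dict('records')


# Same sample data as in the question.
data = pd.concat([
    pd.Series(['a--b', 'c--d', 'e--f'], name='key'),
    pd.Series(['123', '456', '789'], name='code'),
    pd.Series([np.array(['123', '098']), np.array(['000', '999']), np.array(['789', '432'])], name='codes'),
], axis=1)

records = to_plain_records(data)
print(type(records[0]['codes']))
```

The per-column isinstance check scans every value once, so on a very large DataFrame it may be cheaper to keep an explicit list of column names as above.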

Answered By: Katya Willard