Loading np.array from csv dataframe

Question:

I have a dataframe whose column values are np.arrays. For example:

df = pd.DataFrame([{"id":1, "sample": np.array([1,2,3])}, {"id":2, "sample": np.array([2,3,4])}])
df.to_csv("./tmp.csv", index=False)

If I save df to CSV and load it again, I get the "sample" column as strings.

df_from_csv = pd.read_csv("./tmp.csv")
df_from_csv.equals(pd.DataFrame([{"id":1, "sample": '[1 2 3]'}, {"id":2, "sample": '[2 3 4]'}]))
True

Is there a better way to save/load my data that does not require manually parsing '[1 2 3]' into its corresponding array?

Asked By: esantix


Answers:

You can use a converter in read_csv:

import re
from ast import literal_eval

import numpy as np
import pandas as pd

def to_array(x):
    # Turn '[1 2 3]' into '[1,2,3]' so literal_eval can parse it as a list
    return np.array(literal_eval(re.sub(r'\s+', ',', x)))

df_from_csv = pd.read_csv("./tmp.csv", converters={'sample': to_array})

#    id     sample
# 0   1  [1, 2, 3]
# 1   2  [2, 3, 4]

df_from_csv.loc[0, 'sample']

# array([1, 2, 3])
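
One caveat with the whitespace-to-comma approach: NumPy pads elements when their widths differ, so a cell can look like '[ 1 10 100]', and the substitution then produces '[,1,10,100]', which literal_eval rejects. A minimal sketch of a converter that tolerates this is shown below. The name parse_array, the inline CSV text, and the choice of float as the element type are assumptions made for illustration, and it only handles 1-D arrays.

```python
import io

import numpy as np
import pandas as pd

def parse_array(text):
    # Hypothetical helper: drop the surrounding brackets, split on any run
    # of whitespace (including newlines), and let NumPy convert the tokens.
    # dtype=float is an assumption; use int if the data is known to be integral.
    return np.array(text.strip("[]").split(), dtype=float)

# Inline CSV standing in for ./tmp.csv, with padded spacing in the first row
csv_text = "id,sample\n1,[ 1 10 100]\n2,[2 3 4]\n"

df_from_csv = pd.read_csv(io.StringIO(csv_text), converters={"sample": parse_array})
first = df_from_csv.loc[0, "sample"]
```

Note that this still depends on how NumPy formats arrays as text. With default print options, large arrays are abbreviated with '...' when converted to a string, so they cannot be recovered from the CSV at all.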
Answered By: mozway
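
If CSV is not a hard requirement, another option is a format that stores Python objects directly, so no parsing is needed on load. The sketch below uses pandas' to_pickle and read_pickle; the temporary directory and the tmp.pkl file name are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame([
    {"id": 1, "sample": np.array([1, 2, 3])},
    {"id": 2, "sample": np.array([2, 3, 4])},
])

# Write to a throwaway directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tmp.pkl")  # hypothetical file name
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# The cell comes back as a NumPy array, not a string
first = restored.loc[0, "sample"]
```

The trade-off is that pickle files are not human-readable and should only be loaded from trusted sources.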