How to convert numpy array saved as string in pandas csv file back to a numpy array?

Question:

I used pandas to save a numpy array to a csv file:

df['Feature5'][i] = str(ri_RGBA.tolist())
df.to_csv(r'H:test.csv')

The csv file has the following structure:

Feature1,Feature2,Feature3,Feature4,Labels,Feature5
13.37,33.09,-0.08,992.2,nass,"[[1, 160, 246, 255], … ,[1, 160, 246, 255]]"
26.37,33.03,-0.08,992.2,trocken,"[[110, 160, 246, 255], … ,[20, 160, 246, 255]]"

Now I’m trying to convert the string "[[1, 160, 246, 255], …" back to a numpy array:

data = df['Feature5'].apply(lambda x: 
                           np.fromstring(
                               x.replace('\n','')
                                .replace('"','')
                                .replace('[','')
                                .replace(']','')
                                .replace('  ',' ')
                                .replace(' ',''), sep=','))

But print(data.dtypes) still reports the type as 'object'. What am I missing? Any ideas on how I could make this work?

Any help would be much appreciated.

Asked By: Moe

||

Answers:

Something like this should get you there.

import ast
my_list = ast.literal_eval(df['Feature5'][i])
data = np.array(my_list)

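Note that ast.literal_eval only parses Python literals, so unlike eval it cannot run arbitrary code. Very large or deeply nested input can still exhaust memory or crash the interpreter, so it should not be called on unchecked user input.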
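To apply the same idea to the whole column, a minimal sketch follows. The two-row CSV text is a shortened stand-in for the file in the question (the elided middle of each array is simply left out), and dtype=np.uint8 is an assumption based on the values looking like RGBA bytes. It also shows why dtypes keeps reporting 'object': a pandas column whose cells are arrays is always of object dtype, while each cell is a proper numpy array.

```python
import ast
import io

import numpy as np
import pandas as pd

# Shortened stand-in for the CSV in the question (hypothetical sample data).
csv_text = '''Feature1,Labels,Feature5
13.37,nass,"[[1, 160, 246, 255], [1, 160, 246, 255]]"
26.37,trocken,"[[110, 160, 246, 255], [20, 160, 246, 255]]"
'''
df = pd.read_csv(io.StringIO(csv_text))

# Parse each cell's string into nested lists, then into an array.
# uint8 is an assumption: the values look like 0-255 RGBA channels.
df['Feature5'] = df['Feature5'].apply(
    lambda s: np.array(ast.literal_eval(s), dtype=np.uint8)
)

# The column itself stays 'object' because each cell holds a whole array.
print(df['Feature5'].dtype)

# The individual cells are real numpy arrays.
first = df['Feature5'][0]
print(type(first), first.dtype, first.shape)

# If all arrays share one shape, they can be stacked into a single array.
stacked = np.stack(df['Feature5'].tolist())
print(stacked.shape)
```

The stacking step only works when every row has the same array shape; otherwise keep the arrays as separate cells.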

On that note: why do you save your data like that? Numpy arrays are best stored via np.save or, if you insist on human-readable csv, as a column in your dataframe like this:

df['Feature5'] = pd.Series(data)

If your data is actual RGBA image data, I would suggest saving the images as PNGs, or as numpy arrays via np.save, and storing just a filename in the csv.
Your current approach is slow, fragile, hard to understand and hard to maintain.
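A rough sketch of the np.save variant, under some assumptions: the file names (feature5_0.npy and so on), the Feature5_file column name and the temporary output directory are all made up for illustration, and the arrays are small placeholders rather than real image data.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical output directory; in practice this would be a real data folder.
out_dir = Path(tempfile.mkdtemp())

# Placeholder arrays standing in for the RGBA data from the question.
arrays = [
    np.array([[1, 160, 246, 255], [1, 160, 246, 255]], dtype=np.uint8),
    np.array([[110, 160, 246, 255], [20, 160, 246, 255]], dtype=np.uint8),
]
df = pd.DataFrame({'Feature1': [13.37, 26.37], 'Labels': ['nass', 'trocken']})

# Save each array to its own .npy file and keep only the file name in the table.
filenames = []
for i, arr in enumerate(arrays):
    name = f'feature5_{i}.npy'  # hypothetical naming scheme
    np.save(out_dir / name, arr)
    filenames.append(name)
df['Feature5_file'] = filenames

csv_path = out_dir / 'test.csv'
df.to_csv(csv_path, index=False)

# Reading back: the csv stays small and readable, np.load restores dtype and shape.
loaded = pd.read_csv(csv_path)
restored = [np.load(out_dir / name) for name in loaded['Feature5_file']]
print(restored[0].dtype, restored[0].shape)
```

Because .npy files store dtype and shape, no string parsing is needed when loading the data again.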

Answered By: Robert Bock