How to convert numpy array saved as string in pandas csv file back to a numpy array?
Question:
I used pandas to save a NumPy array to a CSV file:
df['Feature5'][i] = str(ri_RGBA.tolist())
df.to_csv(r'H:test.csv')
The csv file has the following structure:
Feature1,Feature2,Feature3,Feature4,Labels,Feature5
13.37,33.09,-0.08,992.2,nass,"[[1, 160, 246, 255], … ,[1, 160, 246, 255]]"
26.37,33.03,-0.08,992.2,trocken,"[[110, 160, 246, 255], … ,[20, 160, 246, 255]]"
Now I’m trying to convert the string "[[1, 160, 246, 255], …" back to a numpy array:
data = df['Feature5'].apply(lambda x:
    np.fromstring(
        x.replace('\n','')
         .replace('"','')
         .replace('[','')
         .replace(']','')
         .replace(' ',' ')
         .replace(' ',''), sep=','))
But print(data.dtypes) still reports the type as ‘object’. What am I missing? Any ideas on how I could make this work?
Any help would be much appreciated.
Answers:
Something like this should get you there.
import ast
import numpy as np

# Parse the stored string back into a nested Python list, then build the array
my_list = ast.literal_eval(df['Feature5'][i])
data = np.array(my_list)
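Note that, unlike eval, literal_eval only parses Python literals and does not execute arbitrary code. Very large or deeply nested input can still exhaust memory or crash the interpreter, so it should not be called with unchecked user input.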
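To convert the whole column rather than a single cell, the same idea can be applied row by row. The following is a minimal sketch, assuming every cell in Feature5 holds a well-formed nested list; the small inline CSV and the names csv_text, arrays and stacked are made up for illustration. The resulting Series still reports dtype object, because each cell holds a whole array; that is expected, and the arrays inside are numeric.

```python
import ast
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV file from the question
csv_text = (
    'Feature1,Labels,Feature5\n'
    '13.37,nass,"[[1, 160, 246, 255], [1, 160, 246, 255]]"\n'
    '26.37,trocken,"[[110, 160, 246, 255], [20, 160, 246, 255]]"\n'
)
df = pd.read_csv(io.StringIO(csv_text))

# Parse each string into a nested list, then into a NumPy array
arrays = df['Feature5'].apply(lambda s: np.array(ast.literal_eval(s)))

print(arrays.dtype)       # object: the Series holds array objects
print(arrays[0].dtype)    # integer dtype of the array in the first row

# If all rows have the same shape, they can be stacked into one array
stacked = np.stack(arrays.to_list())
print(stacked.shape)      # (2, 2, 4)
```

If the rows have different shapes, np.stack raises an error, and keeping the arrays in a list or an object Series is the simpler option.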
On that note: why do you save your data like that? NumPy arrays are best stored via np.save or, if you insist on human-readable CSV, as a regular column in your dataframe (this works when data is one-dimensional, with one value per row):
df['Feature5'] = pd.Series(data)
If your data is actual RGBA image data, I would suggest saving the images as PNGs, or as NumPy arrays via np.save, and storing just a filename in the CSV.
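The np.save variant of this suggestion could look roughly like the sketch below. It is only an illustration under assumptions: the temporary directory, the file name sample_0.npy and the one-row dataframe are hypothetical, and ri_RGBA is a small stand-in for the array from the question.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for one RGBA array from the question
ri_RGBA = np.array([[1, 160, 246, 255], [1, 160, 246, 255]], dtype=np.uint8)

out_dir = tempfile.mkdtemp()  # illustrative output directory
npy_path = os.path.join(out_dir, 'sample_0.npy')
csv_path = os.path.join(out_dir, 'test.csv')

# Save the array in NumPy's binary format and keep only its path in the table
np.save(npy_path, ri_RGBA)
df = pd.DataFrame({'Feature1': [13.37], 'Labels': ['nass'], 'Feature5': [npy_path]})
df.to_csv(csv_path, index=False)

# Later: read the CSV and load the array back from the stored path
df2 = pd.read_csv(csv_path)
restored = np.load(df2['Feature5'][0])
print(restored.dtype, restored.shape)
```

The array comes back with its original dtype and shape, and no string parsing is needed.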
Your approach is slow, fragile, hard to understand and hard to maintain.