Pandas: convert nan values to None in a string list before literal_eval, then convert back to np.nan
Question:
I have a dataframe with a few series that contain string representations of lists of floats, including nan values. E.g.:
s[0] = '[1.21, 1.21, nan, nan, 100]'
I want to convert these strings to lists using literal_eval. When I try, I get the error ValueError: malformed node or string on line 1: because, as per the docs, nan values cannot be converted as they are not recognised.
What is the best way of converting the nan values within the string to None, and then converting them back to np.nan values after applying literal_eval?
Answers:
The solution is as described in the question, but you get None values instead of NaN:
s.str.replace('nan', 'None', regex=True).apply(ast.literal_eval)
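As a rough illustration of the same replace-then-parse step on a single string, here is a standard-library sketch. The helper name parse_with_none and the `\bnan\b` word-boundary pattern are assumptions added here, not part of the answer; the boundary only matters if a string could contain other text with "nan" inside it.

```python
import ast
import re

def parse_with_none(text):
    """Replace standalone nan tokens with None, then parse the list literal."""
    # \b restricts the match to whole tokens, so "nan" inside a longer
    # word (for example 'banana') is left alone.
    cleaned = re.sub(r'\bnan\b', 'None', text)
    return ast.literal_eval(cleaned)

parsed = parse_with_none('[1.21, 1.21, nan, nan, 100]')
print(parsed)  # [1.21, 1.21, None, None, 100]
```

On a Series, the same pattern could presumably be passed as s.str.replace(r'\bnan\b', 'None', regex=True) before applying ast.literal_eval.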
If you need np.nan values, use a custom function:
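import ast
import numpy as np

def convert(x):
    out = []
    # Split the string into its elements and parse each one separately
    for y in x.strip('[]').split(', '):
        try:
            out.append(ast.literal_eval(y))
        except (ValueError, SyntaxError):
            # literal_eval rejects the bare name nan, so substitute np.nan
            out.append(np.nan)
    return out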
s.apply(convert)
Another idea would be to convert all values to floats:
f = lambda x: [float(y) for y in x.strip('[]').split(', ')]
s.apply(f)
Or with a list comprehension:
pd.Series([[float(y) for y in x.strip('[]').split(', ')] for x in s],
          index=s.index)
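The float approach works because float('nan') is accepted directly, so no None round trip is needed. A small sketch of the per-string step follows; the helper name to_floats and the empty-list guard are assumptions added for illustration.

```python
import math

def to_floats(text):
    """Convert a string like '[1.21, nan, 100]' to a list of floats."""
    inner = text.strip('[]').strip()
    if not inner:
        # float('') raises ValueError, so handle an empty list explicitly.
        return []
    # float() tolerates surrounding whitespace and parses 'nan' as NaN.
    return [float(item) for item in inner.split(',')]

values = to_floats('[1.21, 1.21, nan, nan, 100]')
print(math.isnan(values[2]))  # True
```

Note that every element becomes a float, so 100 comes back as 100.0. The helper can be used with s.apply(to_floats).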
Adapting jezrael’s answer, here is a one-liner that replaces nan with None in each string, parses the string into a list with literal_eval, and then converts the None values back to np.nan:
df['col'] = df['col'].str.replace('nan', 'None', regex=True).apply(ast.literal_eval).apply(lambda row: [np.nan if x is None else x for x in row])
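For readability, the same three steps can also be wrapped in a named function and checked on a small frame. This is only a sketch: the sample data and the helper name parse_list are made up for illustration, and 'col' mirrors the column name used above.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'col': ['[1.21, 1.21, nan, nan, 100]', '[nan, 2.5]']})

def parse_list(text):
    """Parse a list string, mapping nan -> None -> np.nan."""
    parsed = ast.literal_eval(text.replace('nan', 'None'))
    return [np.nan if x is None else x for x in parsed]

df['col'] = df['col'].apply(parse_list)
```

Unlike the float conversion, this keeps integers such as 100 as int, because literal_eval preserves the literal's type.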