Select and/or replace specific array inside pandas dataframe
Question:
Here is my reproducible example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'x' : [np.zeros(2), np.array([1,2])], 'y' : [np.array([3,2]),0], 'z' : [np.array([4,5]),np.zeros(2)], 't' : [np.array([3,4]),np.array([4,5])]})
My goal is to change np.zeros(2) to np.nan so that I can compute the mean two-dimensional array for each row, excluding zeros.
I have tried:
df.replace(np.zeros(2),np.NaN)
df[df.eq(np.zeros(2)).any(axis=1)]
df.where(df == [np.zeros(2)])
df[df == np.zeros(2)]
all of which I would expect to work had the item I am looking for not been an array.
Obviously, being new at Python, there must be a concept that I am not grasping.
Answers:
You can’t vectorize when the cell values are objects (here, NumPy arrays). Use applymap and numpy.array_equal to test each cell individually:
df[df.applymap(lambda x: np.array_equal(x, np.zeros(2)))] = np.nan
Updated df:
        x       y       z       t
0     NaN  [3, 2]  [4, 5]  [3, 4]
1  [1, 2]       0     NaN  [4, 5]
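A note for newer pandas: DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map. The sketch below is one way to write the same replacement so that it runs on either version. The helper name elementwise and the variable is_zero_pair are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [np.zeros(2), np.array([1, 2])],
    'y': [np.array([3, 2]), 0],
    'z': [np.array([4, 5]), np.zeros(2)],
    't': [np.array([3, 4]), np.array([4, 5])],
})

def elementwise(frame, func):
    # Hypothetical helper: use DataFrame.map when it exists (pandas >= 2.1),
    # otherwise fall back to the older DataFrame.applymap.
    mapper = frame.map if hasattr(frame, "map") else frame.applymap
    return mapper(func)

# Boolean frame that is True only where a cell equals np.zeros(2) exactly.
is_zero_pair = elementwise(df, lambda x: np.array_equal(x, np.zeros(2)))

# Overwrite the flagged cells with NaN; the scalar 0 in column y is left alone.
df[is_zero_pair] = np.nan
```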
Alternative with numpy.allclose, which also replaces the scalar 0 in column y:
df[df.applymap(lambda x: np.allclose(x, 0))] = np.nan
Output:
        x       y       z       t
0     NaN  [3, 2]  [4, 5]  [3, 4]
1  [1, 2]     NaN     NaN  [4, 5]
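To get from here to the stated goal (the mean array per row, ignoring the replaced cells), one possible follow-up is sketched below. It assumes every remaining array in a row has the same shape. The names row_mean and row_means are made up for illustration, and the result is kept in a plain dict rather than a pandas object to keep the sketch simple.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [np.zeros(2), np.array([1, 2])],
    'y': [np.array([3, 2]), 0],
    'z': [np.array([4, 5]), np.zeros(2)],
    't': [np.array([3, 4]), np.array([4, 5])],
})

# Same allclose replacement as above; falls back to applymap before pandas 2.1.
mapper = df.map if hasattr(df, "map") else df.applymap
df[mapper(lambda x: np.allclose(x, 0))] = np.nan

def row_mean(row):
    # Keep only the cells that still hold an array; NaN placeholders are skipped.
    arrays = [v for v in row if isinstance(v, np.ndarray)]
    # Average element-wise across the remaining arrays (NaN if none are left).
    return np.mean(arrays, axis=0) if arrays else np.nan

# One mean array per row label.
row_means = {
    label: row_mean(row)
    for label, row in zip(df.index, df.itertuples(index=False))
}
```

For the example frame this gives roughly [3.33, 3.67] for row 0 and [2.5, 3.5] for row 1.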