Python pandas apply function if a column value is not NULL
Question:
I have a dataframe (in Python 2.7, pandas 0.15.0):
df=
A B C
0 NaN 11 NaN
1 two NaN ['foo', 'bar']
2 three 33 NaN
I want to apply a simple function for rows that does not contain NULL values in a specific column. My function is as simple as possible:
def my_func(row):
print row
And my apply code is the following:
df[['A','B']].apply(lambda x: my_func(x) if(pd.notnull(x[0])) else x, axis = 1)
It works perfectly. If I want to check column ‘B’ for NULL values the pd.notnull()
works perfectly as well. But if I select column ‘C’ that contains list objects:
df[['A','C']].apply(lambda x: my_func(x) if(pd.notnull(x[1])) else x, axis = 1)
then I get the following error message: ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', u'occurred at index 1')
Does anybody know why pd.notnull()
works only for integer and string columns but not for ‘list columns’?
And is there a nicer way to check for NULL values in column ‘C’ instead of this:
df[['A','C']].apply(lambda x: my_func(x) if(str(x[1]) != 'nan') else x, axis = 1)
Thank you!
Answers:
The problem is that pd.notnull(['foo', 'bar'])
operates elementwise and returns array([ True, True], dtype=bool)
. Your if condition trys to convert that to a boolean, and that’s when you get the exception.
To fix it, you could simply wrap the isnull statement with np.all
:
df[['A','C']].apply(lambda x: my_func(x) if(np.all(pd.notnull(x[1]))) else x, axis = 1)
Now you’ll see that np.all(pd.notnull(['foo', 'bar']))
is indeed True
.
Also another way is to just use row.notnull().all()
(without numpy
), here is an example:
df.apply(lambda row: func1(row) if row.notnull().all() else func2(row), axis=1)
Here is a complete example on your df:
>>> d = {'A': [None, 2, 3, 4], 'B': [11, None, 33, 4], 'C': [None, ['a','b'], None, 4]}
>>> df = pd.DataFrame(d)
>>> df
A B C
0 NaN 11.0 None
1 2.0 NaN [a, b]
2 3.0 33.0 None
3 4.0 4.0 4
>>> def func1(r):
... return 'No'
...
>>> def func2(r):
... return 'Yes'
...
>>> df.apply(lambda row: func1(row) if row.notnull().all() else func2(row), axis=1)
0 Yes
1 Yes
2 Yes
3 No
And a friendlier screenshot 🙂
I had a column contained lists and NaN
s. So, the next one worked for me.
df.C.map(lambda x: my_func(x) if type(x) == list else x)
Try…
df['a'] = df['a'].apply(lambda x: x.replace(',',',') if x != None else x)
this example just adds an escape character to a comma if the value is not None
If you have a string and want to apply function like this example:
'September 25, 2021'
df['Year'] = df['date_added'].apply(lambda x : re.split(' |,', x)[-1] if isinstance(x, str) else np.nan)
df['Month'] = df['date_added'].apply(lambda x : re.split(' |,', x)[0] if isinstance(x, str) else np.nan )
You can apply this way and use isinstance(x, str)
to avoid NaN
or any other type, you can also use type()
like this.
df['Year'] = df['date_added'].apply(lambda x : re.split(' |,', x)[-1] if type(x)==str else np.nan )
Add the Following IF condition which returns NONE when it’s TRUE
def funtion_name(input):
if (pd.isnull(input)==False)
return np.NAN
//Rest funtion code//
The below will work for different data types.
df=
col_1 col_2
0 1 NaN
1 three seven
2 NaN NaN
3 [4,5] 2
It can be done with map, for eg to replace NON-NULL values in col_1
:
def my_func(n):
return 'func'
df.loc[df['col_1'].notnull(), 'col_1'] = df['col_1'].map(my_func)
df =
col_1 col_2
0 func NaN
1 func seven
2 NaN NaN
3 func func
I have a dataframe (in Python 2.7, pandas 0.15.0):
df=
A B C
0 NaN 11 NaN
1 two NaN ['foo', 'bar']
2 three 33 NaN
I want to apply a simple function for rows that does not contain NULL values in a specific column. My function is as simple as possible:
def my_func(row):
print row
And my apply code is the following:
df[['A','B']].apply(lambda x: my_func(x) if(pd.notnull(x[0])) else x, axis = 1)
It works perfectly. If I want to check column ‘B’ for NULL values the pd.notnull()
works perfectly as well. But if I select column ‘C’ that contains list objects:
df[['A','C']].apply(lambda x: my_func(x) if(pd.notnull(x[1])) else x, axis = 1)
then I get the following error message: ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', u'occurred at index 1')
Does anybody know why pd.notnull()
works only for integer and string columns but not for ‘list columns’?
And is there a nicer way to check for NULL values in column ‘C’ instead of this:
df[['A','C']].apply(lambda x: my_func(x) if(str(x[1]) != 'nan') else x, axis = 1)
Thank you!
The problem is that pd.notnull(['foo', 'bar'])
operates elementwise and returns array([ True, True], dtype=bool)
. Your if condition trys to convert that to a boolean, and that’s when you get the exception.
To fix it, you could simply wrap the isnull statement with np.all
:
df[['A','C']].apply(lambda x: my_func(x) if(np.all(pd.notnull(x[1]))) else x, axis = 1)
Now you’ll see that np.all(pd.notnull(['foo', 'bar']))
is indeed True
.
Also another way is to just use row.notnull().all()
(without numpy
), here is an example:
df.apply(lambda row: func1(row) if row.notnull().all() else func2(row), axis=1)
Here is a complete example on your df:
>>> d = {'A': [None, 2, 3, 4], 'B': [11, None, 33, 4], 'C': [None, ['a','b'], None, 4]}
>>> df = pd.DataFrame(d)
>>> df
A B C
0 NaN 11.0 None
1 2.0 NaN [a, b]
2 3.0 33.0 None
3 4.0 4.0 4
>>> def func1(r):
... return 'No'
...
>>> def func2(r):
... return 'Yes'
...
>>> df.apply(lambda row: func1(row) if row.notnull().all() else func2(row), axis=1)
0 Yes
1 Yes
2 Yes
3 No
And a friendlier screenshot 🙂
I had a column contained lists and NaN
s. So, the next one worked for me.
df.C.map(lambda x: my_func(x) if type(x) == list else x)
Try…
df['a'] = df['a'].apply(lambda x: x.replace(',',',') if x != None else x)
this example just adds an escape character to a comma if the value is not None
If you have a string and want to apply function like this example:
'September 25, 2021'
df['Year'] = df['date_added'].apply(lambda x : re.split(' |,', x)[-1] if isinstance(x, str) else np.nan)
df['Month'] = df['date_added'].apply(lambda x : re.split(' |,', x)[0] if isinstance(x, str) else np.nan )
You can apply this way and use isinstance(x, str)
to avoid NaN
or any other type, you can also use type()
like this.
df['Year'] = df['date_added'].apply(lambda x : re.split(' |,', x)[-1] if type(x)==str else np.nan )
Add the Following IF condition which returns NONE when it’s TRUE
def funtion_name(input):
if (pd.isnull(input)==False)
return np.NAN
//Rest funtion code//
The below will work for different data types.
df=
col_1 col_2
0 1 NaN
1 three seven
2 NaN NaN
3 [4,5] 2
It can be done with map, for eg to replace NON-NULL values in col_1
:
def my_func(n):
return 'func'
df.loc[df['col_1'].notnull(), 'col_1'] = df['col_1'].map(my_func)
df =
col_1 col_2
0 func NaN
1 func seven
2 NaN NaN
3 func func