How to check for non-NA and non-empty-list values in a DataFrame column?
Question:
import numpy as np
import pandas as pd
d = {'status': {0: 'No', 1: 'No', 2: 'Yes', 3: 'No'}, 'time': {0: "['Morning', 'Midday', 'Afternoon']", 1: np.nan, 2: "[]", 3: np.nan}, 'id': {0: 1, 1: 5, 2: 2, 3: 3}}
df = pd.DataFrame(d)
df is the dataframe. All are object types.
For each row, I need to check that no column holds NA or an empty list.
This is what I tried:
df['no_nans'] = ~pd.isna(df).any(axis = 1)
print(df['no_nans'])
True
False
True
False
The expected result is:
True
False
False
False
Because the time column holds an empty list [] in the third row, isna() does not flag that row.
Is there a simple way to do this check properly?
Thanks in advance for any help.
Answers:
As you have strings, you need to compare to '[]':
~(df.eq('[]')|df.isna()).any(axis=1)
Output:
0 True
1 False
2 False
3 False
dtype: bool
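To make the string comparison above easy to reproduce, here is a minimal self-contained sketch using the question's data. The names is_missing, is_empty_repr and row_ok are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question: the 'time' column holds strings, not lists.
d = {
    "status": {0: "No", 1: "No", 2: "Yes", 3: "No"},
    "time": {0: "['Morning', 'Midday', 'Afternoon']", 1: np.nan, 2: "[]", 3: np.nan},
    "id": {0: 1, 1: 5, 2: 2, 3: 3},
}
df = pd.DataFrame(d)

is_missing = df.isna()       # True only for real NaN values
is_empty_repr = df.eq("[]")  # True where the cell is the literal string '[]'

# A row passes only if none of its cells is NaN or the string '[]'.
row_ok = ~(is_missing | is_empty_repr).any(axis=1)
print(row_ok.tolist())
```

Keeping the two masks separate makes it easy to see that isna() alone misses row 2, which is what the question ran into.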
If you really had lists:
m1 = (df.select_dtypes(object)
        .apply(lambda s: s.str.len().eq(0))
        .reindex_like(df)
        .fillna(False)
     )
m2 = df.isna()
~(m1|m2).any(axis=1)
Alternative input for lists:
d = {'status': {0: 'No', 1: 'No', 2: 'Yes', 3: 'No'}, 'time': {0: ['Morning', 'Midday', 'Afternoon'], 1: np.nan, 2: [], 3: np.nan}, 'id': {0: 1, 1: 5, 2: 2, 3: 3}}
df = pd.DataFrame(d)
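For the real-list input, a different way to build the "empty" mask is an explicit per-cell check, sketched below. This is not the approach used in the answer above. The helper is_empty_container is hypothetical, and treating empty tuples, sets and dicts as empty too is an assumption.

```python
import numpy as np
import pandas as pd

# Alternative input: the 'time' column holds real Python lists.
d = {
    "status": {0: "No", 1: "No", 2: "Yes", 3: "No"},
    "time": {0: ["Morning", "Midday", "Afternoon"], 1: np.nan, 2: [], 3: np.nan},
    "id": {0: 1, 1: 5, 2: 2, 3: 3},
}
df = pd.DataFrame(d)


def is_empty_container(value):
    """Hypothetical helper: True only for an empty list, tuple, set or dict."""
    return isinstance(value, (list, tuple, set, dict)) and len(value) == 0


# Apply the check to every cell, one column at a time.
empty_mask = df.apply(lambda col: col.map(is_empty_container))

# A row passes only if none of its cells is NaN or an empty container.
row_ok = ~(empty_mask | df.isna()).any(axis=1)
print(row_ok.tolist())
```

The explicit isinstance test leaves strings such as 'No' and numbers out of the length check, at the cost of a Python-level call per cell.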
Since you sometimes have the string '[]' instead of NaN, you can replace '[]' with NaN to get the expected result:
df = df.replace('[]', np.nan)
df['no_nans'] = ~pd.isna(df).any(axis = 1)
Output:
0 True
1 False
2 False
3 False
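If the original '[]' strings need to stay in df, the same idea can be applied to a temporary copy. This is a sketch under that assumption, and the names cleaned and no_nans are illustrative. Note that replace('[]', np.nan) matches the string '[]' only, not a real empty list.

```python
import numpy as np
import pandas as pd

d = {
    "status": {0: "No", 1: "No", 2: "Yes", 3: "No"},
    "time": {0: "['Morning', 'Midday', 'Afternoon']", 1: np.nan, 2: "[]", 3: np.nan},
    "id": {0: 1, 1: 5, 2: 2, 3: 3},
}
df = pd.DataFrame(d)

# replace() returns a new DataFrame, so df itself is left unchanged.
cleaned = df.replace("[]", np.nan)

# A row passes only if every cell of the cleaned copy is non-missing.
no_nans = cleaned.notna().all(axis=1)
print(no_nans.tolist())
```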
If the cells can hold empty lists, tuples, sets or strings, select the object columns with DataFrame.select_dtypes, convert them to booleans so that empty values become False, add back the missing non-object columns with DataFrame.reindex, combine the result with a second mask using & (bitwise AND), and check that every value in a row is True with DataFrame.all:
m = (df.select_dtypes(object).astype(bool).reindex(df.columns, axis=1, fill_value=True) &
df.notna()).all(axis=1)
print (m)
0 True
1 False
2 False
3 False
dtype: bool
Details:
print (df.select_dtypes(object))
status time
0 No ['Morning', 'Midday', 'Afternoon']
1 No NaN
2 Yes []
3 No NaN
print (df.select_dtypes(object).astype(bool))
status time
0 True True
1 True True
2 True False
3 True True
print (df.select_dtypes(object).astype(bool).reindex(df.columns, axis=1, fill_value=True))
status time id
0 True True True
1 True True True
2 True False True
3 True True True
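The boolean conversion above relies on Python truthiness, which has two consequences. NaN converts to True, so the extra notna() mask is needed, and the string '[]' also converts to True, so this approach assumes real lists. A small sketch on a made-up Series (the values are illustrative only):

```python
import numpy as np
import pandas as pd

# Illustrative values: a non-empty list, an empty list, NaN,
# the string '[]' and an empty string.
values = pd.Series([["a"], [], np.nan, "[]", ""], dtype=object)

# astype(bool) on an object column calls bool() on each element.
as_bool = values.astype(bool)
print(as_bool.tolist())

# NaN is truthy, so it has to be excluded separately.
mask = as_bool & values.notna()
print(mask.tolist())
```

If the column holds strings as in the question, comparing against '[]' is the safer check.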