How to check for non-NA values and non-empty lists in DataFrame columns?

Question:

import numpy as np
import pandas as pd

d = {'status': {0: 'No', 1: 'No', 2: 'Yes', 3: 'No'}, 'time': {0: "['Morning', 'Midday', 'Afternoon']", 1: np.nan, 2: "[]", 3: np.nan}, 'id': {0: 1, 1: 5, 2: 2, 3: 3}}
df = pd.DataFrame(d)

df is the DataFrame. The status and time columns are object dtype (id is int64).

For every row, I need to check that no column is NA and no column holds an empty list.
Here is what I tried:

df['no_nans'] = ~pd.isna(df).any(axis = 1)
print(df['no_nans'])

True
False
True
False

The expected result is:

True
False
False
False

Because the time column holds "[]" (an empty list written as a string) in the third row, isna() does not flag it.
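A quick way to see why isna() misses that cell: the value is the two-character string "[]", which pandas treats as ordinary data, not as a missing value. A minimal check (the variable name cell is just for illustration):

```python
import numpy as np
import pandas as pd

# The cell holds the string "[]", not an empty list object.
cell = "[]"

print(pd.isna(cell))     # False: a non-empty string is not "missing"
print(len(cell))         # 2: the characters '[' and ']'
print(pd.isna(np.nan))   # True: only real NaN/None counts as missing
```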

Is there a simple way to add this check?
Thanks in advance for any help.

Asked By: deepu2711


Answers:

As you have strings, you need to compare to '[]':

~(df.eq('[]')|df.isna()).any(axis=1)

Output:

0     True
1    False
2    False
3    False
dtype: bool
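A self-contained version of the string comparison above, using the question's data, might look like this. Comparing the int64 id column to '[]' simply yields False, so no column selection is needed:

```python
import numpy as np
import pandas as pd

# Question's data: "time" stores list-like *strings*, e.g. "[]".
d = {'status': {0: 'No', 1: 'No', 2: 'Yes', 3: 'No'},
     'time': {0: "['Morning', 'Midday', 'Afternoon']", 1: np.nan, 2: "[]", 3: np.nan},
     'id': {0: 1, 1: 5, 2: 2, 3: 3}}
df = pd.DataFrame(d)

# True where a cell is the literal string "[]" or is missing.
bad = df.eq('[]') | df.isna()

# A row is valid only if none of its cells is bad.
valid = ~bad.any(axis=1)
print(valid)
```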

If you really had lists, check the length of each value in the object columns instead:

m1 = (df.select_dtypes(object)
        .apply(lambda s: s.str.len().eq(0))
        .reindex_like(df)
        .fillna(False)
      )

m2 = df.isna()

~(m1|m2).any(axis=1)

Alternative input for lists:

d = {'status': {0: 'No', 1: 'No', 2: 'Yes', 3: 'No'}, 'time': {0: ['Morning', 'Midday', 'Afternoon'], 1: np.nan, 2: [], 3: np.nan}, 'id': {0: 1, 1: 5, 2: 2, 3: 3}}
df = pd.DataFrame(d)
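For the list input, a variant of the mask above that avoids the .str accessor and the reindex/fillna step is to test every cell with a small helper. This is a sketch, not the answer's exact code; the helper name is_empty_container is hypothetical, and it only treats real lists, tuples and sets as "empty" (strings and numbers never are):

```python
import numpy as np
import pandas as pd

# Alternative input: "time" holds real Python lists.
d = {'status': {0: 'No', 1: 'No', 2: 'Yes', 3: 'No'},
     'time': {0: ['Morning', 'Midday', 'Afternoon'], 1: np.nan, 2: [], 3: np.nan},
     'id': {0: 1, 1: 5, 2: 2, 3: 3}}
df = pd.DataFrame(d)

def is_empty_container(value):
    # Hypothetical helper: True only for empty list/tuple/set objects.
    return isinstance(value, (list, tuple, set)) and len(value) == 0

# Apply the helper cell by cell, column by column; Series.map works on any dtype.
empty = df.apply(lambda col: col.map(is_empty_container))

# Missing-value mask; isna() does not look inside lists.
missing = df.isna()

valid = ~(empty | missing).any(axis=1)
print(valid)
```

Because the helper runs on every column, the non-object id column needs no special handling.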
Answered By: mozway

Since some cells contain '[]' instead of NaN, you can replace '[]' with NaN and then run your isna() check to get the expected result:

df = df.replace('[]', np.nan)
df['no_nans'] = ~pd.isna(df).any(axis = 1)

Output:

0     True
1    False
2    False
3    False
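A self-contained sketch of the replace approach, using the question's string data. It writes the result to a new variable (cleaned, an illustrative name) so the original df keeps its "[]" values; replace() with a plain string matches whole cells exactly, since regex is off by default:

```python
import numpy as np
import pandas as pd

d = {'status': {0: 'No', 1: 'No', 2: 'Yes', 3: 'No'},
     'time': {0: "['Morning', 'Midday', 'Afternoon']", 1: np.nan, 2: "[]", 3: np.nan},
     'id': {0: 1, 1: 5, 2: 2, 3: 3}}
df = pd.DataFrame(d)

# Turn the literal string "[]" into a real missing value (exact-match replacement).
cleaned = df.replace('[]', np.nan)

# A single isna() check now covers both cases.
no_nans = ~cleaned.isna().any(axis=1)
print(no_nans)
```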
Answered By: grymlin

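If the object columns can contain empty lists, tuples, sets or strings, select those columns with DataFrame.select_dtypes and convert them to booleans, so empty values become False. This relies on real containers, as in the alternative list input: the string '[]' is non-empty and therefore converts to True. NaN also converts to True, so chain the df.notna() mask with & (bitwise AND). Add back the missing non-object columns with DataFrame.reindex (filled with True), then check that all values per row are True with DataFrame.all: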

m = (df.select_dtypes(object).astype(bool).reindex(df.columns, axis=1, fill_value=True) & 
     df.notna()).all(axis=1)
print (m)
0     True
1    False
2    False
3    False
dtype: bool

Details:

print (df.select_dtypes(object))
  status                                time
0     No  ['Morning', 'Midday', 'Afternoon']
1     No                                 NaN
2    Yes                                  []
3     No                                 NaN

print (df.select_dtypes(object).astype(bool))
   status   time
0    True   True
1    True   True
2    True  False
3    True   True

print (df.select_dtypes(object).astype(bool).reindex(df.columns, axis=1, fill_value=True))
   status   time    id
0    True   True  True
1    True   True  True
2    True  False  True
3    True   True  True
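The astype(bool) approach depends on the cells being real containers. The sketch below wraps it in a function (valid_rows is a hypothetical name) and runs it on both inputs from this thread: with real lists it gives the expected result, while with the question's strings the "[]" row passes because a non-empty string is truthy:

```python
import numpy as np
import pandas as pd

def valid_rows(df):
    # Empty containers and empty strings become False; NaN becomes True.
    truthy = (df.select_dtypes(object)
                .astype(bool)
                .reindex(df.columns, axis=1, fill_value=True))
    # Exclude NaN separately, then require every cell in the row to pass.
    return (truthy & df.notna()).all(axis=1)

base = {'status': {0: 'No', 1: 'No', 2: 'Yes', 3: 'No'},
        'id': {0: 1, 1: 5, 2: 2, 3: 3}}

lists_df = pd.DataFrame({**base, 'time': {0: ['Morning', 'Midday', 'Afternoon'],
                                          1: np.nan, 2: [], 3: np.nan}})
strings_df = pd.DataFrame({**base, 'time': {0: "['Morning', 'Midday', 'Afternoon']",
                                            1: np.nan, 2: "[]", 3: np.nan}})

print(valid_rows(lists_df).tolist())    # [True, False, False, False]
print(valid_rows(strings_df).tolist())  # [True, False, True, False]
```

For string data, combine it with the '[]' comparison or the replace() step from the other answers.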
Answered By: jezrael