pandas Finding IDs that only have NAs

Question

I am looking to get a series/DataFrame of IDs that have only NAs within the data.

The data looks something like this:

ID	Score	Date
1	95	1-1-2022
1	nan	1-1-2022
1	nan	2-1-2022
2	nan	1-1-2022
2	100	2-1-2022
3	nan	1-1-2022
3	nan	1-1-2022
3	nan	2-1-2022

So in this case, I would only want to grab ID 3 and have one instance of it in the newly created DF.

So far, I’ve tried:

df = pd.DataFrame({'ID':[1,1,1,2,2,3,3,3],
                'Score':[95,nan,nan,nan,100,nan,nan,nan],
                 'Date':['1-1-2022','1-1-2022','2-1-2022','1-1-2022','2-1-2022',
                         '1-1-2022','1-1-2022', '2-1-2022']})
df2 = pd.DataFrame(df.groupby('ID',dropna=False)['Score'].count())
df3 = df2[df2['Score'] == 0]

But this does not seem to work. Thanks for your help.

Asked By: Robert Calimente

||

Source

Answer 1

I dont know if this is what you mean exactly, if not please provide what you expect as ouput. But to select the rows with Nans for the score you can use:

df2 = df.loc[df.Score.isna()]

output:

the you can select the IDs simply with:

df3 = df2.groupby('ID').first()
df3

output:

Answered By: n4321d

Answer 2

You could create a boolean vector checking if ‘Score’ column is nan and then you can groupby your ‘ID’ column and use transform('all').

This will return True / False to all rows of the ID that are nan in all rows.

>>> df['Score'].isna().groupby(df['ID']).transform('all')

0    False
1    False
2    False
3    False
4    False
5     True
6     True
7     True
Name: Score, dtype: bool

Using this boolean you can filter your dataframe and get ID 3 in a new df:

ids_with_all_none = df[df['Score'].isna().groupby(df['ID']).transform('all')]

   ID  Score      Date
5   3    NaN  1-1-2022
6   3    NaN  1-1-2022
7   3    NaN  2-1-2022

Answered By: sophocles

Answer 3

It is going to work fine if you replace nan with None. Try this out:

df = pd.DataFrame({'ID':[1,1,1,2,2,3,3,3],
                'Score':[95,None,None,None,100,None,None,None],
                 'Date':['1-1-2022','1-1-2022','2-1-2022','1-1-2022','2-1-2022',
                         '1-1-2022','1-1-2022', '2-1-2022']})
df2 = pd.DataFrame(df.groupby('ID',dropna=False, as_index=False)['Score'].count())
df3 = df2[df2['Score'] == 0]
print(df3['ID'])

Answered By: milkwithfish

pandas Finding IDs that only have NAs

Question:

Answers: