pandas Finding IDs that only have NAs

Question:

I am looking to get a series/DataFrame of IDs that have only NAs within the data.

The data looks something like this:

ID Score Date
1 95 1-1-2022
1 nan 1-1-2022
1 nan 2-1-2022
2 nan 1-1-2022
2 100 2-1-2022
3 nan 1-1-2022
3 nan 1-1-2022
3 nan 2-1-2022

So in this case, I would only want to grab ID 3 and have one instance of it in the newly created DF.

So far, I’ve tried:

import pandas as pd
from numpy import nan

df = pd.DataFrame({'ID':[1,1,1,2,2,3,3,3],
                'Score':[95,nan,nan,nan,100,nan,nan,nan],
                 'Date':['1-1-2022','1-1-2022','2-1-2022','1-1-2022','2-1-2022',
                         '1-1-2022','1-1-2022', '2-1-2022']})
df2 = pd.DataFrame(df.groupby('ID',dropna=False)['Score'].count())
df3 = df2[df2['Score'] == 0]

But this does not seem to work. Thanks for your help.

Asked By: Robert Calimente

||

Answers:

I don't know if this is exactly what you mean; if not, please provide your expected output. To select the rows where Score is NaN, you can use:

df2 = df.loc[df.Score.isna()]

Then you can select the IDs simply with:

df3 = df2.groupby('ID').first()
df3
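
As a hedged illustration of what these two steps return on the question's sample data (rebuilt here with `np.nan` so the snippet runs on its own): every ID has at least one missing score, so all three IDs appear in the result, not only ID 3.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3, 3, 3],
    'Score': [95, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan],
    'Date': ['1-1-2022', '1-1-2022', '2-1-2022', '1-1-2022', '2-1-2022',
             '1-1-2022', '1-1-2022', '2-1-2022'],
})

# Rows whose Score is missing
df2 = df.loc[df['Score'].isna()]

# One row per ID that has at least one missing Score
df3 = df2.groupby('ID').first()
print(df3.index.tolist())  # [1, 2, 3]
```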

Answered By: n4321d

You could create a boolean vector that checks whether the 'Score' column is NaN, then group it by the 'ID' column and use transform('all').

This returns True for every row whose ID has NaN in all of its rows, and False otherwise.

>>> df['Score'].isna().groupby(df['ID']).transform('all')

0    False
1    False
2    False
3    False
4    False
5     True
6     True
7     True
Name: Score, dtype: bool

Using this boolean you can filter your dataframe and get ID 3 in a new df:

ids_with_all_none = df[df['Score'].isna().groupby(df['ID']).transform('all')]

   ID  Score      Date
5   3    NaN  1-1-2022
6   3    NaN  1-1-2022
7   3    NaN  2-1-2022
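
Since the question asks for a single instance of each ID, the filtered rows can be reduced to unique IDs. This is a minimal sketch, assuming the same sample data as in the question (rebuilt here with `np.nan`); the names `all_na` and `ids_only_na` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3, 3, 3],
    'Score': [95, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan],
    'Date': ['1-1-2022', '1-1-2022', '2-1-2022', '1-1-2022', '2-1-2022',
             '1-1-2022', '1-1-2022', '2-1-2022'],
})

# True for every row whose ID has only NaN scores
all_na = df['Score'].isna().groupby(df['ID']).transform('all')

# Keep just the ID column of those rows and drop the repeats
ids_only_na = df.loc[all_na, ['ID']].drop_duplicates().reset_index(drop=True)
print(ids_only_na['ID'].tolist())  # [3]
```
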

Answered By: sophocles

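Your approach works once the missing values are written in a way Python understands: the bare name nan is undefined unless you import it (for example from numpy), whereas None needs no import and pandas stores it as NaN in a numeric column. Try this out: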

df = pd.DataFrame({'ID':[1,1,1,2,2,3,3,3],
                'Score':[95,None,None,None,100,None,None,None],
                 'Date':['1-1-2022','1-1-2022','2-1-2022','1-1-2022','2-1-2022',
                         '1-1-2022','1-1-2022', '2-1-2022']})
df2 = pd.DataFrame(df.groupby('ID',dropna=False, as_index=False)['Score'].count())
df3 = df2[df2['Score'] == 0]
print(df3['ID'])
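
For comparison, the same count-based idea also runs with `np.nan` in place of `None`, because `count()` skips missing values either way. This is a hedged sketch rather than a definitive version; `counts` and `only_na` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3, 3, 3],
    'Score': [95, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan],
    'Date': ['1-1-2022', '1-1-2022', '2-1-2022', '1-1-2022', '2-1-2022',
             '1-1-2022', '1-1-2022', '2-1-2022'],
})

# count() ignores NaN, so an ID with no valid scores gets a count of 0
counts = df.groupby('ID')['Score'].count().reset_index()

# IDs whose scores are all missing, one entry per ID
only_na = counts.loc[counts['Score'] == 0, 'ID']
print(only_na.tolist())  # [3]
```
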

Answered By: milkwithfish