Finding NaN values in a data frame using column names located in a dictionary

Question:

I am trying to find all the NaN values in certain columns and then print a statement saying that NaN entries were found in those columns.

import pandas as pd
import numpy as np

data=[[np.nan, 'Indiana','[email protected]']]
df=pd.DataFrame(data,columns=['Name','State','Email'])

req_dict={"Name","Email"}

Here is the sample data frame. Notice that Name and Email are required but State is not, because not all columns are required to have values in them.

I have tried to write a function to do this, but it is not working as intended:

def req_cols(df,req_dict):
   for d in req_dict:
       for i in df.index():
           if df.loc[i,d]== pd.notnull():
               print('a blank was found in' + d)
   return

I understand a function is overkill for this, but it makes sense in the actual project.

I expect to get a print statement saying "a blank was found in Name".

How do I create a function that finds blanks in a df by using the column names in a separate dictionary?

Asked By: surf tastic


Answers:

Try using .isna() + .any():

for c in req_dict:
    if df[c].isna().any():
        print("a blank was found in", c)

Prints:

a blank was found in Name
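
For context on why the attempt in the question fails: df.index is an attribute rather than a method, so df.index() raises a TypeError, and pd.notnull() needs the value to test as its argument. Comparing with == cannot detect NaN in any case, because NaN never compares equal to anything. A minimal sketch of a corrected cell-by-cell loop follows; the function name report_blank_cells and the placeholder email address are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the question; the email address is a placeholder.
df = pd.DataFrame(
    [[np.nan, "Indiana", "user@example.com"]],
    columns=["Name", "State", "Email"],
)
required = {"Name", "Email"}


def report_blank_cells(df, cols):
    """Return (row label, column name) pairs for every missing cell in cols."""
    found = []
    for col in cols:
        for idx in df.index:  # df.index is an attribute, so no parentheses
            if pd.isna(df.loc[idx, col]):  # pd.isna takes the value to test
                found.append((idx, col))
    return found


for idx, col in report_blank_cells(df, required):
    print(f"a blank was found in {col} (row {idx})")
```

This keeps the row-level detail the original loop was reaching for, but it is slower than the column-wise .isna().any() check on large frames.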

Complete example:

import numpy as np
import pandas as pd

data = [[np.nan, "Indiana", "[email protected]"]]
df = pd.DataFrame(data, columns=["Name", "State", "Email"])

req_dict = {"Name", "Email"}


def check_nan_columns(df, cols):
    out = []
    for c in cols:
        if df[c].isna().any():
            out.append(c)
    return out


for c in check_nan_columns(df, req_dict):
    print("a blank was found in", c)
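
The same check can also be written without an explicit Python loop, which may be convenient when many columns are involved. The following is a sketch only; the function name columns_with_nan and the placeholder email address are assumptions for illustration. Note that req_dict in the question is actually a set, and pandas does not accept a set as a column indexer, so it is converted to a list first.

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the question; the email address is a placeholder.
df = pd.DataFrame(
    [[np.nan, "Indiana", "user@example.com"]],
    columns=["Name", "State", "Email"],
)
req_cols = {"Name", "Email"}


def columns_with_nan(df, cols):
    """Return the names of the columns in cols that contain at least one NaN."""
    cols = list(cols)  # a set cannot be used directly to select columns
    mask = df[cols].isna().any()  # boolean Series indexed by column name
    return mask[mask].index.tolist()


for c in columns_with_nan(df, req_cols):
    print("a blank was found in", c)
```

Here df[cols].isna().any() reduces each selected column to a single True/False value, and indexing the result with itself keeps only the columns where a NaN was found.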

Answered By: Andrej Kesely