Unwanted KeyError with loc in for loop

Question:

I am trying to loop through a subset of my dataframe to find all the NaN values and record the column name and row location of each one in a dictionary.

The output should look something like this:

{'row': 2, 'column': 'First Name*', 'message': 'This is a required field'}

Here is the code I have so far to achieve this:

errors=[]
req_cols = ['First Name*','Last Name*','Country*','Company*','Email Address*']
bad_nan = df.loc[df[req_cols].isna().any(axis=1)]

for col in bad_nan.columns:
    bad_nan[col] = bad_nan[col].astype('str')
    for i in range(bad_nan.shape[0]):
        if bad_nan.loc[i, col] == 'nan':    
            errors.append({ "row": i,
                            "column": col,
                            "message": "This is a required field" })

I have tried replacing == 'nan' with == 'np.nan' and I still get a KeyError. The traceback points to the line below:

if bad_nan.loc[i, col] == 'nan':

I am really stuck on why I am getting KeyError: 0 here. Any help would be appreciated.

Asked By: flipping flop


Answers:

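You were getting the error because bad_nan has no row with the index label 0. bad_nan keeps the index labels of the rows that were selected from df, and .loc looks rows up by label, not by position, so range(bad_nan.shape[0]) can produce labels that do not exist. Loop over the index values themselves instead. Also, test for missing values with pd.isna rather than comparing against 'nan' or np.NaN: NaN never compares equal to anything, itself included, so an == np.NaN check is always False. With pd.isna, converting each column to str is no longer needed.

import pandas as pd

for col in bad_nan.columns:
    for i in bad_nan.index:
        # i is an index label, so .loc can always find it
        if pd.isna(bad_nan.loc[i, col]):
            errors.append({ "row": i,
                            "column": col,
                            "message": "This is a required field" })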
Answered By: Himanshuman
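
To illustrate the label-versus-position point from the answer, here is a minimal sketch on made-up data. The sample values are hypothetical and only two of the question's required columns are used; the column names and the error-dictionary shape come from the question. It is meant as an illustration under those assumptions, not as the only way to do this.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "First Name*": ["Ann", np.nan, "Cy", "Di"],
    "Last Name*": ["Lee", "Moe", "Poe", np.nan],
})
req_cols = ["First Name*", "Last Name*"]

# Boolean-mask filtering keeps the original index labels of the kept rows.
bad_nan = df.loc[df[req_cols].isna().any(axis=1)]
print(list(bad_nan.index))

# .loc looks up labels, so label 0 is missing even though bad_nan has rows.
try:
    bad_nan.loc[0, "First Name*"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)

# Iterating over the index labels avoids the KeyError.
errors = []
for col in req_cols:
    for i in bad_nan.index:
        if pd.isna(bad_nan.loc[i, col]):
            errors.append({"row": i,
                           "column": col,
                           "message": "This is a required field"})
print(errors)
```

If positional row numbers are preferred over the original labels, two options are to call bad_nan.reset_index(drop=True) before the loop or to index with .iloc, which takes integer positions for both rows and columns.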