If statements in a for loop to write to a nested dictionary pandas

Question:

I have an if statement that checks a column named 'Existing Customer'. If the value is True, it should pass. If the value is False, it should look at the column called 'Email Opt-In', and if Email Opt-In is blank, it should add an error to the nested dictionary called 'errors'. Here is a sample data set:

import pandas as pd

data = {'Existing Customers': ['True', 'False', 'True', 'False', 'False'],
        'Email Opt-In': ['True', 'True', '', '', 'False']}
df = pd.DataFrame(data)

# filename must be defined before it is used as a key
filename = 'test'
errors = {}
errors[filename] = {}

Here is the for loop and if statements I have:

email_optin=df[["Existing Customer","Email Opt-In"]]
for col in email_optin.columns:
   for i in email_optin.index:
      if email_optin['Existing Customer'] == True:
          pass
      elif email_optin['Existing Customer']== False:
          if email_optin['Email Opt-In'].isna().any(1):
              errors[filename][err_i]={ "row": i,                                   
                  "column": col,                                                
                  "message": "Email Opt-in is a required field for prospect clients" }                       
          err_i += 1

I get the error message 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().' I have looked at previous Stack Overflow questions about this, but they do not include a solution to my problem.

My desired output should look like:

    Key     Type      Size                           Value
   test     dict       1        {'row': 4, 'column': Email Opt-in, 'message': Email
                                  Opt-in is a required field for prospect clients}

I have tried if email_optin['Existing Customer'].loc[i,col] == True: to get around the ValueError, but this does not seem like the right solution and it makes the if statements slower. Any ideas on how to fix this efficiently would be great. I have not found a similar problem online.

Asked By: Test Code

Answers:

Always look for ways to avoid iterating through rows. In this case, you can extract all the rows that satisfy your request in one shot.

bad = df[(df["Existing Customers"] == "False") & (df["Email Opt-In"] == "")]

Now bad is a dataframe that only contains the rows you are interested in.
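
To get from there to the nested errors dictionary described in the question, one option is to loop over the (usually few) matching rows only. This is a minimal sketch, assuming the sample data from the question, where the values are strings and the column is named 'Existing Customers'; the mask variable and the use of enumerate for the error counter are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (values are strings, not booleans)
data = {'Existing Customers': ['True', 'False', 'True', 'False', 'False'],
        'Email Opt-In': ['True', 'True', '', '', 'False']}
df = pd.DataFrame(data)

filename = 'test'

# Rows where the customer is not existing and the opt-in is blank
mask = (df['Existing Customers'] == 'False') & (df['Email Opt-In'] == '')
bad = df[mask]

# Build the nested dictionary from the index labels of the bad rows
errors = {filename: {}}
for err_i, row in enumerate(bad.index):
    errors[filename][err_i] = {
        'row': int(row),
        'column': 'Email Opt-In',
        'message': 'Email Opt-in is a required field for prospect clients',
    }

print(errors)
```

With the sample data only index 3 matches, so errors['test'] ends up with a single entry. The Python loop now runs once per failing row rather than once per cell, while the filtering itself stays vectorized.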

Answered By: Tim Roberts

There are several problems with your code. First, your Email Opt-In column has empty string values; you should fill those with something, such as the string 'None'. Second, your if-else logic is incorrect: your columns are of type 'object' (they hold strings), not bool, so a string such as 'True' never compares equal to the boolean True. It is better to shorten your code to something like this:

# Replace empty string with 'None' string.
df['Email Opt-In'] = df['Email Opt-In'].apply(lambda s: "None" if not s else s)

# Locate the matching rows and save the result
result = df[(df['Existing Customers'] == 'False') & (df['Email Opt-In'] == 'None')]

# Saving result to a dict
result.to_dict()

Output:

{'Existing Customers': {3: 'False'}, 'Email Opt-In': {3: 'None'}}

Customize the output of to_dict if need be; see the DataFrame.to_dict() documentation for the available options.
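
As one example of such customization, the orient parameter of to_dict changes the shape of the result. The sketch below assumes the sample data from the question and, to stay self-contained, filters on the empty string directly rather than on the 'None' replacement used above; the variable names by_row and records are only illustrative.

```python
import pandas as pd

# Sample data from the question
data = {'Existing Customers': ['True', 'False', 'True', 'False', 'False'],
        'Email Opt-In': ['True', 'True', '', '', 'False']}
df = pd.DataFrame(data)

result = df[(df['Existing Customers'] == 'False') & (df['Email Opt-In'] == '')]

# One dict per row, keyed by the row's index label
by_row = result.to_dict(orient='index')

# A list of row dicts; reset_index() keeps the original row label as an 'index' field
records = result.reset_index().to_dict(orient='records')

print(by_row)
print(records)
```

The orient='index' form is closer to the per-row error structure asked for in the question, since each key is the row label of a failing row.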

Answered By: Firelord