Set value based on if Column A value exists in Column B

Question:

I have a dataframe with several columns, and I’m setting up a lambda function to set another column based on a series of logic. The one that I’m stuck on is if Column A’s value exists in Column B.

Dataframe looks like this:

Supplier Department Option 1 Option 2
Acme Acme Monday Tuesday
Acme Acme Wednesday
Acme HR Monday Wednesday
Acme HR Tuesday Wednesday
Acme HR Tuesday Thursday

Etc.

I’m wanting to add a Column with text to keep a certain row or not.

full_quote_df['Keep Row'] = full_quote_df.apply(
                lambda item:
                   'Keep Row' if item.Supplier == item.Department and pd.notna(item['Option 1']) else  #If supplier and department match, and the have an option, keep
                   'Do Not Keep Row' if item.Supplier == item.Department and pd.isna(item['Option 1'])  else  #If they leave it blank, exclude it
                   'Keep Row' if item.Supplier != item.Department and item['Option 2'].isin(list(item['Option 1'].values()))  else  #If the supplier does not match the department, but the second option exists in the first option, keep row
                   'Do Not Keep Row' if item.Supplier != item.Department and item['Option 2'].notin(item['Option 1'])  else #If the supplier does not match the department, but the second option does not exist in the first option, remove row
                   None,
                axis=1
            )

So my final dataframe should look like this:

Supplier Department Option 1 Option 2 Keep Row
Acme Acme Monday Tuesday Keep
Acme Acme Wednesday Keep
Acme HR Monday Wednesday Keep
Acme HR Tuesday Wednesday Keep
Acme HR Tuesday Thursday Do Not Keep

With my above function, it gives me an error ‘str’ object has no attribute ‘isin’. I don’t want to turn the Option 2 column into a list, since I need to evaluate that specific row relative to the entire column, but I’m stumped with what to do next.

Asked By: Stephen Juza

||

Answers:

You can use numpy.where with pandas.Series.isin :

m = (df["Option 2"].isna()) | (df["Option 2"].isin(df["Option 1"]))
    
df["Keep Row"] = np.where(m, "Keep", "Do Not Keep")

# Output :

print(df)
  Supplier Department   Option 1   Option 2     Keep Row
0     Acme       Acme     Monday    Tuesday         Keep
1     Acme       Acme  Wednesday        NaN         Keep
2     Acme         HR     Monday  Wednesday         Keep
3     Acme         HR    Tuesday  Wednesday         Keep
4     Acme         HR    Tuesday   Thursday  Do Not Keep
Answered By: abokey

You’re going to have a much better time with pandas if you switch to thinking about operating on columns (series) as a whole instead of on individual elements.

You can build up Series that hold intermediate results so they aren’t recomputed each time, and it can make the logic easier to read also.

supplier_dept_match = full_quote_df['Supplier'] == full_quote_df['Department']
opt1_is_na = full_quote_df['Option 1'].isna()

# Note: the following logic is a bit unclear from your question.
# This approach will look for the option 2 values in the entire option 1 column.
# If you want to look for a certain option 1 subset, then you should clarify how this should behave.
opt2_in_any_opt1 = full_quote_df['Option 2'].isin(full_quote_df['Option 1'])

# Then, you can build up your logic all at once with boolean expressions:
keep = (supplier_dept_match & ~opt1_is_na) | (~supplier_dept_match & opt2_in_any_opt1)
full_quote_df['Keep Row'] = keep.map({True: "Keep", False: "Do Not Keep"})

Hopefully this helps!

Answered By: user5002062
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.