Python/Pandas – Check if multiple columns has any of three items in a list

Question

I am creating a binary target variable that is 1 if any of these select columns (Current, Month1, Month2, Month3, Month4, Month5, Month6) has any number of these three strings (‘Item1’, ‘Item2’, ‘Item3’). If none of these columns have these items then the target variable for this row will be populated with 0. For example:

Current	Month1	Month2	Month3	Month4	Month5	Month6	Target
Item8	Item7	Item7	Item8	Item8	Item8	Item8	0
Item3	Item4	Item4	Item4	Item4	Item4	Item4	1
Item3	Item4	Item1	Item2	Item3	Item3	Item3	1

For the first row, none of the columns Current-Month6 contain either Item1, Item2, or Item3 so the Target is 0. For row 2, At least one of the key items (Item3) is in at least one of the select columns (Current) so the Target is 1. And finally, Item1, Item2, and Item3, are throughout the 6 columns so the target is 1.

I currently have this loop set up that looks at all the select columns in each row to see if they contain any of the three items. If so, the list is appends a 1 at the index, otherwise 0:

Target = []
for i in range(df.shape[0]):
    if (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item1')).any()) == True) or  (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item2')).any()) == True) or (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item3')).any()) == True):
        Target.append(1)
    else:
        Target.append(0)

I was wondering if there was a faster/more efficient way to do this, or a way to include multiple items in the .eq() function?

Thanks!

Asked By: BigDawg007

||

Source

Answer 1

df = pd.read_clipboard()

# get str cols
str_cols = df.select_dtypes(include=["object", "string"])

# find any number of references
row_ = str_cols.apply(lambda row: row.str.contains(r'Item1|Item2|Item3', regex=True, case=False, na=False)).any(axis=1)

# assign to target
df['target'] = np.where(row_.where(lambda x: x).notna(), 1, 0)

Answered By: John Stud

Python/Pandas – Check if multiple columns has any of three items in a list

Question:

Answers: