Python/Pandas – Check if multiple columns has any of three items in a list
Question:
I am creating a binary target variable that is 1 if any of these select columns (Current, Month1, Month2, Month3, Month4, Month5, Month6) has any number of these three strings (‘Item1’, ‘Item2’, ‘Item3’). If none of these columns have these items then the target variable for this row will be populated with 0. For example:
Current
Month1
Month2
Month3
Month4
Month5
Month6
Target
Item8
Item7
Item7
Item8
Item8
Item8
Item8
0
Item3
Item4
Item4
Item4
Item4
Item4
Item4
1
Item3
Item4
Item1
Item2
Item3
Item3
Item3
1
For the first row, none of the columns Current-Month6 contain either Item1, Item2, or Item3 so the Target is 0. For row 2, At least one of the key items (Item3) is in at least one of the select columns (Current) so the Target is 1. And finally, Item1, Item2, and Item3, are throughout the 6 columns so the target is 1.
I currently have this loop set up that looks at all the select columns in each row to see if they contain any of the three items. If so, the list is appends a 1 at the index, otherwise 0:
Target = []
for i in range(df.shape[0]):
if (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item1')).any()) == True) or (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item2')).any()) == True) or (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item3')).any()) == True):
Target.append(1)
else:
Target.append(0)
I was wondering if there was a faster/more efficient way to do this, or a way to include multiple items in the .eq() function?
Thanks!
Answers:
df = pd.read_clipboard()
# get str cols
str_cols = df.select_dtypes(include=["object", "string"])
# find any number of references
row_ = str_cols.apply(lambda row: row.str.contains(r'Item1|Item2|Item3', regex=True, case=False, na=False)).any(axis=1)
# assign to target
df['target'] = np.where(row_.where(lambda x: x).notna(), 1, 0)
I am creating a binary target variable that is 1 if any of these select columns (Current, Month1, Month2, Month3, Month4, Month5, Month6) has any number of these three strings (‘Item1’, ‘Item2’, ‘Item3’). If none of these columns have these items then the target variable for this row will be populated with 0. For example:
Current | Month1 | Month2 | Month3 | Month4 | Month5 | Month6 | Target |
---|---|---|---|---|---|---|---|
Item8 | Item7 | Item7 | Item8 | Item8 | Item8 | Item8 | 0 |
Item3 | Item4 | Item4 | Item4 | Item4 | Item4 | Item4 | 1 |
Item3 | Item4 | Item1 | Item2 | Item3 | Item3 | Item3 | 1 |
For the first row, none of the columns Current-Month6 contain either Item1, Item2, or Item3 so the Target is 0. For row 2, At least one of the key items (Item3) is in at least one of the select columns (Current) so the Target is 1. And finally, Item1, Item2, and Item3, are throughout the 6 columns so the target is 1.
I currently have this loop set up that looks at all the select columns in each row to see if they contain any of the three items. If so, the list is appends a 1 at the index, otherwise 0:
Target = []
for i in range(df.shape[0]):
if (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item1')).any()) == True) or (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item2')).any()) == True) or (((df[['Current', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']].iloc[i].eq('Item3')).any()) == True):
Target.append(1)
else:
Target.append(0)
I was wondering if there was a faster/more efficient way to do this, or a way to include multiple items in the .eq() function?
Thanks!
df = pd.read_clipboard()
# get str cols
str_cols = df.select_dtypes(include=["object", "string"])
# find any number of references
row_ = str_cols.apply(lambda row: row.str.contains(r'Item1|Item2|Item3', regex=True, case=False, na=False)).any(axis=1)
# assign to target
df['target'] = np.where(row_.where(lambda x: x).notna(), 1, 0)