Create an indicator flag based on the values of multiple columns in Python
Question:
I have the following DataFrame in pandas.
The table lists the products held by each customer.
- Input Table:
I want to create a flag/indicator column named "Apple_ind" that is 1 if the customer holds an apple product (Red Apple or Green Apple) and 0 otherwise. The resulting DataFrame should look like this:
- Output Table:
Answers:
Use DataFrame.isin with DataFrame.any if you need to test for an exact match in at least one column:
df['Apple_ind'] = df.isin(['Green Apple','Red Apple']).any(axis=1).astype(int)
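Because the original input table is not reproduced here, the following minimal sketch runs the exact-match one-liner on hypothetical data. The column names 'Cust No', 'Product 1' and 'Product 2' and all values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data: column names and values are assumptions.
df = pd.DataFrame({
    'Cust No': [1, 2, 3],
    'Product 1': ['Red Apple', 'Pears', 'Jackfruit'],
    'Product 2': ['Watermelon', 'Green Apple', None],
})

# isin builds a boolean frame (True where a cell equals one of the values);
# any(axis=1) collapses each row to True if at least one cell matched.
df['Apple_ind'] = df.isin(['Green Apple', 'Red Apple']).any(axis=1).astype(int)
print(df)
```

Only exact cell values match here, so an entry such as "red apple" (different case) would not be flagged.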
Alternatively, with numpy.where:
import numpy as np
df['Apple_ind'] = np.where(df.isin(['Green Apple','Red Apple']).any(axis=1), 1, 0)
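To illustrate that the numpy.where variant and the astype(int) variant produce the same flags, here is a small sketch on the same hypothetical data (column names and values are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column names and values are assumptions.
df = pd.DataFrame({
    'Cust No': [1, 2, 3],
    'Product 1': ['Red Apple', 'Pears', 'Jackfruit'],
    'Product 2': ['Watermelon', 'Green Apple', None],
})

# Row-wise "does any cell equal one of the apple products?"
mask = df.isin(['Green Apple', 'Red Apple']).any(axis=1)

# np.where returns a NumPy array: 1 where mask is True, 0 elsewhere.
via_where = np.where(mask, 1, 0)
# astype(int) keeps a pandas Series with the same 0/1 values.
via_astype = mask.astype(int)

print(via_where, via_astype.tolist())
```

np.where is handy when the two outcomes are something other than 1 and 0 (for example 'Y' and 'N'); for a plain 0/1 flag, astype(int) is slightly more direct.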
If you need to check for the substring apple (case-insensitively), use Series.str.contains on the non-numeric columns:
df['Apple_ind'] = (df.select_dtypes('object')
                     .apply(lambda x: x.str.contains('apple', case=False, na=False))
                     .any(axis=1)
                     .astype(int))
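A minimal sketch of the substring approach on hypothetical data (column names and values are assumptions). It selects the product columns by name with DataFrame.filter instead of select_dtypes('object'), since the dtype pandas reports for text columns can vary between versions. Note that case=False and na=False are arguments of str.contains, not of apply, and that a plain substring match also flags "Pineapple".

```python
import pandas as pd

# Hypothetical sample data: column names and values are assumptions.
df = pd.DataFrame({
    'Cust No': [1, 2, 3, 4],
    'Product 1': ['red apple', 'Pears', 'Jackfruit', 'Pineapple'],
    'Product 2': ['Watermelon', 'GREEN APPLE', None, 'Pears'],
})

# Pick the text columns by name (assumed naming scheme 'Product ...').
products = df.filter(like='Product')

# case=False ignores capitalisation; na=False treats missing cells as "no match".
df['Apple_ind'] = (products
                   .apply(lambda col: col.str.contains('apple', case=False, na=False))
                   .any(axis=1)
                   .astype(int))
print(df)
```

If "Pineapple" must not count, a word-boundary regex such as r'\bapple\b' can be passed to str.contains instead of the bare substring.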
To create an indicator for every product at once, you can do:
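import re
import pandas as pd
products = ['apple', 'pears', 'jackfruit', 'watermelon']
# \b = word boundary, so only whole product words match
pattern = re.compile(fr"\b({'|'.join(products)})\b", re.IGNORECASE)
ind = (df.melt('Cust No', ignore_index=False)['value']
         .str.extract(pattern, expand=False)
         .str.lower().dropna())
# reindex gives customers with no matching product an explicit 0 instead of NaN
ind = (pd.get_dummies(ind).groupby(level=0).max()
         .reindex(df.index, fill_value=0).astype(int)
         .add_suffix('_ind'))
out = pd.concat([df, ind], axis=1)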
Output:
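The original output table is not reproduced here, so the following self-contained sketch runs the same melt / extract / get_dummies steps on hypothetical data (column names and values are assumptions) so the result can be inspected. It passes the regex as a string with flags=re.IGNORECASE rather than as a precompiled pattern, which is the form Series.str.extract documents.

```python
import re
import pandas as pd

# Hypothetical sample data: column names and values are assumptions.
df = pd.DataFrame({
    'Cust No': [101, 102, 103, 104],
    'Product 1': ['Red Apple', 'Pears', 'Jackfruit', 'Pineapple'],
    'Product 2': ['Watermelon', 'Green Apple', None, None],
})

products = ['apple', 'pears', 'jackfruit', 'watermelon']
# \b word boundaries mean 'Pineapple' does NOT count as 'apple'.
pattern = rf"\b({'|'.join(products)})\b"

# melt stacks the product columns into one 'value' column; ignore_index=False
# keeps each customer's original row label so rows can be regrouped later.
values = df.melt('Cust No', ignore_index=False)['value']
found = values.str.extract(pattern, flags=re.IGNORECASE, expand=False)
found = found.str.lower().dropna()

# One dummy column per product; max() per row label marks any hit, and
# reindex gives customers with no recognised product an explicit 0.
ind = (pd.get_dummies(found)
         .groupby(level=0).max()
         .reindex(df.index, fill_value=0)
         .astype(int)
         .add_suffix('_ind'))
out = pd.concat([df, ind], axis=1)
print(out)
```

The indicator columns come out in lowercase (apple_ind, pears_ind, ...) because of the str.lower() step; rename them if the exact name Apple_ind is required.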