Create an indicator flag based on the values of multiple columns in Python

Question:

I have the below data frame in pandas:

This table includes product details held by each customer.

  • Input Table:

Product details at customer level

I want to create a flag/indicator column named "Apple_ind" that is 1 if a customer holds an apple product (Red Apple or Green Apple) in any column and 0 otherwise. The resulting data frame should look like this:

  • Output Table:


Asked By: Srikant D


Answers:

Use DataFrame.isin with DataFrame.any if you need to test for an exact match in at least one column:

df['Apple_ind'] = df.isin(['Green Apple','Red Apple']).any(axis=1).astype(int)

Alternative:

import numpy as np

df['Apple_ind'] = np.where(df.isin(['Green Apple','Red Apple']).any(axis=1), 1, 0)

If you need to check for the substring "apple" instead, use Series.str.contains on the non-numeric columns:

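# case=False belongs to str.contains, not to apply; na=False treats missing values as "no match"
df['Apple_ind'] = (df.select_dtypes('object')
                     .apply(lambda x: x.str.contains('apple', case=False, na=False))
                     .any(axis=1)
                     .astype(int))
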
Answered By: jezrael
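
To see how the exact-match and substring checks differ, here is a minimal sketch on made-up data. Only the 'Cust No' column name comes from this thread; the 'Product1'/'Product2' columns, their values, and the variable names are assumptions for illustration. The text columns are listed explicitly rather than selected by dtype, which is a small departure from the answer above.

```python
import pandas as pd

# Hypothetical sample data: only 'Cust No' is taken from the thread,
# the product columns and their values are invented for this sketch.
df = pd.DataFrame({
    'Cust No': [1, 2, 3, 4],
    'Product1': ['Green Apple', 'Pears', 'Watermelon', 'red apple pie'],
    'Product2': ['Pears', None, 'Red Apple', 'Jackfruit'],
})
product_cols = ['Product1', 'Product2']  # assumed names of the text columns

# Exact match: 1 if any cell equals one of the listed values.
exact = df.isin(['Green Apple', 'Red Apple']).any(axis=1).astype(int)

# Substring match: 1 if any product cell contains 'apple', ignoring case.
# na=False makes missing values count as "no match".
substring = (df[product_cols]
             .apply(lambda col: col.str.contains('apple', case=False, na=False))
             .any(axis=1)
             .astype(int))

print(exact.tolist())
print(substring.tolist())
```

The last customer holds 'red apple pie', which is flagged only by the substring check, so pick the variant that matches how product names are stored in your data.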

If you want to apply the same process for all products, you can do:

import re

products = ['apple', 'pears', 'jackfruit', 'watermelon']
pattern = re.compile(fr"\b({'|'.join(products)})\b", re.IGNORECASE)
ind = (df.melt('Cust No', ignore_index=False)['value']
         .str.extract(pattern, expand=False)
         .str.lower().dropna())
ind = pd.get_dummies(ind).groupby(level=0).max().add_suffix('_ind')
out = pd.concat([df, ind], axis=1)


Answered By: Corralien
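
For reference, a self-contained sketch of the same melt / extract / get_dummies idea on made-up data. The product columns and values are assumptions, and the sketch deviates from the answer in a few labeled ways: the pattern is passed as a string with flags instead of a compiled regex, product names are escaped with re.escape, dummies are created as integers, and customers with no matching product are filled with 0 via reindex.

```python
import re
import pandas as pd

# Hypothetical sample data: only 'Cust No' is taken from the thread.
df = pd.DataFrame({
    'Cust No': [1, 2, 3],
    'Product1': ['Green Apple', 'Pears', 'Watermelon'],
    'Product2': ['Pears', None, 'Red Apple'],
})

products = ['apple', 'pears', 'watermelon']
# Whole-word match on any product name; re.escape guards against regex metacharacters.
pattern = r"\b(" + "|".join(re.escape(p) for p in products) + r")\b"

# Long format: one row per (customer, product cell), keeping the original row index.
found = (df.melt('Cust No', ignore_index=False)['value']
           .str.extract(pattern, flags=re.IGNORECASE, expand=False)
           .str.lower()
           .dropna())

# One 0/1 column per product, collapsed back to one row per original index.
ind = (pd.get_dummies(found, dtype=int)
         .groupby(level=0).max()
         .reindex(df.index, fill_value=0)  # customers with no match get 0, not NaN
         .add_suffix('_ind'))

out = pd.concat([df, ind], axis=1)
print(out)
```

Products that never occur in the data produce no column at all in this sketch; if you need a fixed set of indicator columns, you would have to add the missing ones yourself.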