Python pandas how to update a column with value 1 if another column contains a certain word

Question:

df looks like this:

description and keybenefits (14)   brand_cooltouch (1711)  brand_easylogic (1712)
Lorem Ipsum cooltouch Lorem Ipsum
Lorem Ipsum easylogic Lorem Ipsum
Lorem Ipsum Lorem Ipsum

What I want:

  • When column description and keybenefits (14) contains the word ‘cooltouch’, column brand_cooltouch (1711) should be set to 1 (int).
  • When column description and keybenefits (14) contains the word ‘easylogic’, column brand_easylogic (1712) should be set to 1 (int).

Output that I want:

description and keybenefits (14)   brand_cooltouch (1711)  brand_easylogic (1712)
Lorem Ipsum cooltouch Lorem Ipsum                       1
Lorem Ipsum easylogic Lorem Ipsum                                               1
Lorem Ipsum Lorem Ipsum

Any help is very much appreciated.

Asked By: Isabella


Answers:

One can use pandas.Series.str.contains.

For the string cooltouch do the following

df['brand_cooltouch (1711)'] = df['description and keybenefits (14)'].str.contains('cooltouch', case=False).astype(int)

[Out]:

    description and keybenefits (14)  brand_cooltouch (1711)  brand_easylogic (1712)
0  Lorem Ipsum cooltouch Lorem Ipsum                       1                     None
1  Lorem Ipsum easylogic Lorem Ipsum                       0                     None
2            Lorem Ipsum Lorem Ipsum                       0                     None

For the string easylogic, do the following

df['brand_easylogic (1712)'] = df['description and keybenefits (14)'].str.contains('easylogic', case=False).astype(int)

[Out]:

    description and keybenefits (14)  brand_cooltouch (1711)  brand_easylogic (1712)
0  Lorem Ipsum cooltouch Lorem Ipsum                       1                     0
1  Lorem Ipsum easylogic Lorem Ipsum                       0                     1
2            Lorem Ipsum Lorem Ipsum                       0                     0
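
The two assignments above follow the same pattern, so they could be folded into a loop. This is only a minimal sketch, assuming the sample frame from the question. The `brand_columns` mapping is a hypothetical name introduced for illustration, and `regex=False` and `na=False` are additions not in the answer above: the first treats the word as a literal substring, the second counts a missing description as "no match" so that `astype(int)` does not fail on NaN.

```python
import pandas as pd

# Sample data rebuilt from the question (the brand columns start empty).
df = pd.DataFrame({
    "description and keybenefits (14)": [
        "Lorem Ipsum cooltouch Lorem Ipsum",
        "Lorem Ipsum easylogic Lorem Ipsum",
        "Lorem Ipsum Lorem Ipsum",
    ],
    "brand_cooltouch (1711)": [None, None, None],
    "brand_easylogic (1712)": [None, None, None],
})

# Hypothetical mapping: search word -> column to fill.
brand_columns = {
    "cooltouch": "brand_cooltouch (1711)",
    "easylogic": "brand_easylogic (1712)",
}

for word, column in brand_columns.items():
    df[column] = (
        df["description and keybenefits (14)"]
        # case=False: case-insensitive; regex=False: literal substring;
        # na=False: rows with a missing description count as no match.
        .str.contains(word, case=False, regex=False, na=False)
        .astype(int)
    )

print(df)
```

Adding another brand then only needs one more entry in the mapping.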

Notes:

  • case=False makes the match case-insensitive.
Answered By: Gonçalo Peres

Use Series.str.contains

df['brand_cooltouch (1711)'] = df['description and keybenefits (14)'].str.contains("cooltouch").astype(int)

Output

    description and keybenefits (14)  brand_cooltouch (1711)  brand_easylogic (1712)
0  Lorem Ipsum cooltouch Lorem Ipsum                       1                     NaN
1  Lorem Ipsum easylogic Lorem Ipsum                       0                     NaN
2            Lorem Ipsum Lorem Ipsum                       0                     NaN

If you do not want the resulting column to contain 1s and 0s, you could instead mark matches with '1' and leave the other rows as empty strings:

df.loc[df['description and keybenefits (14)'].str.contains("cooltouch"), ['brand_cooltouch (1711)']] = '1'
df.loc[~df['description and keybenefits (14)'].str.contains("cooltouch"), ['brand_cooltouch (1711)']] = ''

Output

    description and keybenefits (14) brand_cooltouch (1711)  brand_easylogic (1712)
0  Lorem Ipsum cooltouch Lorem Ipsum                      1                     NaN
1  Lorem Ipsum easylogic Lorem Ipsum                                            NaN
2            Lorem Ipsum Lorem Ipsum                                            NaN
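
If the goal is the output from the question (an integer 1 where the word is found, nothing written elsewhere), one option is to compute the mask once and assign only to the matching rows. A minimal sketch, assuming the sample data from the question plus one extra row with a missing description. The `mask` and `desc` names and the `na=False` argument are additions for illustration, not part of the answer above.

```python
import pandas as pd

desc = "description and keybenefits (14)"

# Sample data from the question, plus a row with a missing description.
df = pd.DataFrame({
    desc: [
        "Lorem Ipsum cooltouch Lorem Ipsum",
        "Lorem Ipsum easylogic Lorem Ipsum",
        "Lorem Ipsum Lorem Ipsum",
        None,
    ],
    "brand_cooltouch (1711)": [None, None, None, None],
})

# na=False keeps the mask strictly boolean even when a description is missing.
mask = df[desc].str.contains("cooltouch", na=False)

# Only the matching rows are written; all other cells keep their old value.
df.loc[mask, "brand_cooltouch (1711)"] = 1

print(df)
```

Because the non-matching rows are never assigned, any values already present in the column are preserved.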
Answered By: Mortz

You can use np.where. I’d suggest filling all cells where the condition is not met with NaN or 0. Here is a solution using np.nan:

import numpy as np

df["brand_cooltouch (1711)"] = np.where(df["description and keybenefits (14)"].str.contains("cooltouch"), 1, np.nan)
df["brand_easylogic (1712)"] = np.where(df["description and keybenefits (14)"].str.contains("easylogic"), 1, np.nan)
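
One thing to keep in mind with np.nan: NaN is a float, so the resulting column is float64 (1.0 / NaN) rather than int. If an integer column with missing values is wanted, pandas' nullable "Int64" dtype is one way to get it. A minimal sketch, assuming the sample data from the question; the `matches` and `flags` names are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

desc = "description and keybenefits (14)"

df = pd.DataFrame({
    desc: [
        "Lorem Ipsum cooltouch Lorem Ipsum",
        "Lorem Ipsum easylogic Lorem Ipsum",
        "Lorem Ipsum Lorem Ipsum",
    ],
})

matches = df[desc].str.contains("cooltouch", na=False)

# np.where returns a float64 array here because np.nan is a float.
flags = np.where(matches, 1, np.nan)

# Converting to the nullable "Int64" dtype keeps 1 as an integer
# and represents the missing cells as <NA>.
df["brand_cooltouch (1711)"] = pd.Series(flags, index=df.index).astype("Int64")

print(df.dtypes)
print(df)
```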
Answered By: TiTo