Pandas: Changing the value of a column based on a string existing in the column

Question:

I have a list of movies and I want to change the value of the column to 0 if the string "Action" exists in the column, or 1 if the string "Drama" exists. If both exist, change the value to 0, since the genre "Action" is more important.

For example, let's say I have the table below:

Genres
Action Comedy Adventure
Drama Crime Horror
Action Drama Adventure

I want it to change to this:

Genres
0
1
0

Any help will be greatly appreciated! Thank you!

Asked By: Nathan Pared

||

Answers:

Use numpy.select. Conditions are checked in order, and rows that match neither condition are set to NaN through the default parameter:

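import numpy as np

# if the column holds strings, test for substrings
m1 = df['Genres'].str.contains('Drama')
m2 = df['Genres'].str.contains('Action')

# alternative: if the column holds lists, test for membership instead
# (wrapped in np.array so the masks behave like boolean arrays)
m1 = np.array(['Drama' in x for x in df['Genres']])
m2 = np.array(['Action' in x for x in df['Genres']])

# the first matching condition wins, so 'Action' (m2) takes priority over 'Drama' (m1)
df['Genres'] = np.select([m2, m1], [0, 1], default=np.nan)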
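
To try this end to end, here is a minimal self-contained sketch using the three rows from the question. The fourth row ('Comedy Romance') and the names is_action and is_drama are assumptions added for illustration, to show what happens when neither genre is present:

```python
import numpy as np
import pandas as pd

# Rows 0-2 come from the question; row 3 is a hypothetical no-match row
df = pd.DataFrame({'Genres': ['Action Comedy Adventure',
                              'Drama Crime Horror',
                              'Action Drama Adventure',
                              'Comedy Romance']})

is_action = df['Genres'].str.contains('Action')
is_drama = df['Genres'].str.contains('Drama')

# np.select uses the first condition that is True for each row,
# so listing 'Action' first gives it priority over 'Drama'
df['Genres'] = np.select([is_action, is_drama], [0, 1], default=np.nan)
print(df)
```

Because the default is NaN, the resulting column is float rather than integer.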
Answered By: jezrael

You can use str.extractall to pull out your keywords and map them through a dictionary. In case of multiple matches, let’s take the minimum number (you can use another rule if needed):

import re

d = {'Action': 0, 'Drama': 1}

pattern = f"({'|'.join(map(re.escape, d))})"
# pattern = '(Action|Drama)'

df['Genres'] = (df['Genres'].str.extractall(pattern)[0]
                 .map(d).groupby(level=0).min()
               )

Output:

   Genres
0       0
1       1
2       0

Output if we add another row with no match:

   Genres
0     0.0
1     1.0
2     0.0
3     NaN
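
A minimal self-contained sketch of the same idea, for anyone who wants to reproduce it. The fourth row ('Comedy Romance') and the names priority and codes are assumptions added for illustration; the explicit reindex is optional, since assigning back to the column aligns on the index anyway:

```python
import re
import pandas as pd

# Rows 0-2 come from the question; row 3 is a hypothetical no-match row
df = pd.DataFrame({'Genres': ['Action Comedy Adventure',
                              'Drama Crime Horror',
                              'Action Drama Adventure',
                              'Comedy Romance']})

priority = {'Action': 0, 'Drama': 1}
pattern = '(' + '|'.join(map(re.escape, priority)) + ')'

# extractall returns one row per match, indexed by (original row, match number)
codes = (df['Genres'].str.extractall(pattern)[0]
           .map(priority)
           .groupby(level=0).min())   # lowest number wins within each original row

# rows without any match are missing from codes; reindex restores them as NaN
df['Genres'] = codes.reindex(df.index)
print(df)
```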
Answered By: mozway