Pandas: Changing the value of a column based on a string existing in the column
Question:
I have a list of movies and I want to change the value of the column to 0 if the string "Action" exists in the column, or to 1 if the string "Drama" exists. If both exist, then change the value to 0, since the genre "Action" is more important.
For example, let's say I have the table below:
Genres
Action Comedy Adventure
Drama Crime Horror
Action Drama Adventure
I want it to change to this:
Genres
0
1
0
Any help will be greatly appreciated! Thank you!
Answers:
Use numpy.select; rows that match neither condition are set to NaN through the default parameter:
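import numpy as np
# Option 1 (pick one): 'Genres' holds strings, so test for substrings
m1 = df['Genres'].str.contains('Drama')
m2 = df['Genres'].str.contains('Action')
# Option 2 (pick one): 'Genres' holds lists, so test membership instead
# (np.array is needed so the masks support & and |; plain lists do not)
m1 = np.array(['Drama' in x for x in df['Genres']])
m2 = np.array(['Action' in x for x in df['Genres']])
# The first matching condition wins, so 'Action' takes priority over 'Drama'
df['Genres'] = np.select([(m1 & m2) | m2, m1], [0, 1], default=np.nan)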
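To make the numpy.select approach easier to try, here is a minimal, self-contained sketch on the sample data from the question. The fourth row ('Comedy Romance') and the names is_action, is_drama and codes are assumptions added for illustration, not part of the original post. Because np.select returns the choice for the first condition that matches, listing the 'Action' mask first should be enough to give it priority.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical row with no match
df = pd.DataFrame({'Genres': ['Action Comedy Adventure',
                              'Drama Crime Horror',
                              'Action Drama Adventure',
                              'Comedy Romance']})

# Boolean masks for each keyword (substring test)
is_action = df['Genres'].str.contains('Action')
is_drama = df['Genres'].str.contains('Drama')

# Conditions are checked in order, so 'Action' wins when both match;
# rows matching neither condition fall back to the default (NaN)
codes = np.select([is_action, is_drama], [0, 1], default=np.nan)

print(codes)
```

Since the default is NaN, the result is a float array; if every row is guaranteed to match, an integer default such as -1 would keep the values as integers.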
You can extractall your keywords and map them through a mapping dictionary. In case of multiple matches, let's take the min number (you can use another rule if needed):
import re
d = {'Action': 0, 'Drama': 1}
pattern = f"({'|'.join(map(re.escape, d))})"
# pattern = '(Drama|Action)'
df['Genres'] = (df['Genres'].str.extractall(pattern)[0]
                .map(d).groupby(level=0).min()
               )
Output:
Genres
0 0
1 1
2 0
Output if we add another row with no match:
Genres
0 0.0
1 1.0
2 0.0
3 NaN
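The extractall approach can be broken into its intermediate steps, which may help when adapting the rule for multiple matches. This is only a sketch on the question's sample data: the fourth row and the names matches, codes and result are assumptions introduced here. The final reindex is there because this sketch keeps the result in a separate variable; when assigning straight back to df['Genres'], pandas aligns on the index and fills the missing rows with NaN in the same way.

```python
import re
import pandas as pd

# Sample data from the question, plus one hypothetical row with no match
df = pd.DataFrame({'Genres': ['Action Comedy Adventure',
                              'Drama Crime Horror',
                              'Action Drama Adventure',
                              'Comedy Romance']})

d = {'Action': 0, 'Drama': 1}
pattern = f"({'|'.join(map(re.escape, d))})"  # '(Action|Drama)'

# One row per match, indexed by (original row label, match number)
matches = df['Genres'].str.extractall(pattern)[0]

# Translate keywords to codes, then keep the smallest code per original row
codes = matches.map(d).groupby(level=0).min()

# Rows without any match are absent from `codes`; reindex restores them as NaN
result = codes.reindex(df.index)

print(result)
```

Swapping min() for max() would instead give priority to the keyword with the larger code.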