Multiple conditions for string variable

Question

I am trying to add a new column "profile_type" to a dataframe "df_new" which contains the string "Decision Maker" if the "job_title" has any one of the following words: (Head or VP or COO or CEO or CMO or CLO or Chief or Partner or Founder or Owner or CIO or CTO or President or Leaders),

"Key Influencer" if the "job_title" has any one of the following words: (Senior or Consultant or Manager or Learning or Training or Talent or HR or Human Resources or Consultant or L&D or Lead), and

"Influencer" for all other fields in "job_title".

For example, if the ‘job_title’ includes a row "Learning and Development Specialist", the code has to pull out just the word ‘Learning’ and segregate it as ‘Key Influencer’ under ‘profile_type’.

Asked By: Alisha A

||

Source

Answer 1

First, define a function that acts on a row of the dataframe, and returns what you want: in your case, 'Decision Maker' if the job_title contains any words in your list.

def is_key_worker(row):
    if (row["job_title"] == "CTO" or row["job_title"]=="Founder") # add more here.

Next, apply the function to your dataframe, along axis 1.

df_new["Key influencer"] = df_new.apply(is_key_worker, axis=1)

Answered By: butterflyknife

Answer 2

I would try something like this:

import numpy as np

dm_titles = ['Head', 'VP', 'COO', ...]
ki_titles = ['Senior ', 'Consultant', 'Manager', ...]


conditions = [
(any([word in  new_df['job_title'] for word in dm_titles])),
(any([word in  new_df['job_title'] for word in ki_titles])),
(all([word not in  new_df['job_title'] for word in dm_titles] + [word not in  new_df['job_title'] for word in ki_titles]))
]

values = ["Decision Maker", "Key Influencer", "Influencer"]

df_new['profile_type'] = np.select(conditions, values)

Let me know if you need any clarification!

Answered By: Hobanator

Answer 3

The below code worked for me.

import re
s1 = pd.Series(df['job_title'])

condition1 = s1.str.contains('Director|Head|VP|COO|CEO...', flags=re.IGNORECASE, regex=True)

condition2 = s1.str.contains('Senior|Consultant|Manager|Learning...', flags=re.IGNORECASE, regex=True)

df_new['profile_type'] = np.where(condition1 == True, 'Decision Maker', 
         (np.where(condition2 == True, 'Key Influencer', 'Influencer')))

Answered By: Alisha A

Multiple conditions for string variable

Question:

Answers: