Conditionally Fill a column with a single value in Python

Question:

I have a dataset in which I want to see whether someone has left their job to start a new one. To represent this, I took the column ‘Job’ and made a new column named ‘Latest_Job’, which is populated on all the rows of each person's history. I compare the two to see when, or if, a change has occurred. The issue is that I want a new column ‘Switch Jobs’ to be filled entirely with ‘Yes’ or ‘No’ for each person, depending on whether that person switched or not. Here is an example of what I have.

ID    Job      Latest_Job       Switch Jobs
 1    Sales       Sales               No 
 1    Sales       Sales               No
 2    Tech        Advisor             Yes 
 2    Tech        Advisor             Yes
 2    Advisor     Advisor             No 
 2    Advisor     Advisor             No 
 3    Sales       Manager             Yes 
 3    Manager     Manager             No 
 3    Manager     Manager             No 

What I would like instead is a ‘Yes’ on every row of the ‘Switch Jobs’ column for any ID where a change occurred, like this:

ID     Job     Latest_Job      Switch Jobs 
 1     Sales     Sales            No  
 1     Sales     Sales            No 
 2     Tech      Advisor          Yes  
 2     Tech      Advisor          Yes   
 2     Advisor   Advisor          Yes 
 2     Advisor   Advisor          Yes
 3     Sales     Manager          Yes 
 3     Manager   Manager          Yes
 3     Manager   Manager          Yes

The code I tried for changing the values was this:

if df['Switch Jobs'] == 'Yes':
     df['Switch Jobs'].groupby('ID')['Switch Jobs'].replace('No', 'Yes') 

This, however, threw ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Any suggestions to fix this?

Asked By: Coding_Nubie

||

Answers:

You can use groupby to achieve this:

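def update_switch_jobs(group):
    # If any row for this ID says 'Yes', mark every row for this ID as 'Yes'
    if group['Switch Jobs'].str.contains('Yes').any():
        group['Switch Jobs'] = 'Yes'
    return group

# group_keys=False keeps the original row index instead of adding ID as an index level
df = df.groupby('ID', group_keys=False).apply(update_switch_jobs)
print(df)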
Answered By: Abdulmajeed
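
For reference, the same idea can be written without apply. The following is a minimal sketch, not part of the answer above. It rebuilds the sample data from the question and assumes the row-level ‘Switch Jobs’ column already exists. The variable name switched is only illustrative.

```python
import pandas as pd

# Sample data copied from the question (row-level 'Switch Jobs' already filled in)
df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 2, 2, 3, 3, 3],
    'Job': ['Sales', 'Sales', 'Tech', 'Tech', 'Advisor', 'Advisor',
            'Sales', 'Manager', 'Manager'],
    'Latest_Job': ['Sales', 'Sales', 'Advisor', 'Advisor', 'Advisor', 'Advisor',
                   'Manager', 'Manager', 'Manager'],
    'Switch Jobs': ['No', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No', 'No'],
})

# True on every row whose ID has at least one 'Yes'
switched = df['Switch Jobs'].eq('Yes').groupby(df['ID']).transform('any')

# Convert the boolean flag back to the 'Yes'/'No' labels used in the question
df['Switch Jobs'] = switched.map({True: 'Yes', False: 'No'})
print(df)
```

Because transform returns a result aligned with the original rows, the index and the ‘ID’ column are left untouched.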

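You can use groupby(...).transform to broadcast the group maximum to every row of the group. The comparison of ‘Job’ with ‘Latest_Job’ is boolean, so the maximum is True (‘Yes’) whenever any row for that ID differs: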

df['Switch Jobs'] = (df['Job'].ne(df['Latest_Job']).groupby(df['ID'])
                              .transform('max').replace({True: 'Yes', False: 'No'}))
print(df)

# Output
   ID      Job Latest_Job Switch Jobs
0   1    Sales      Sales          No
1   1    Sales      Sales          No
2   2     Tech    Advisor         Yes
3   2     Tech    Advisor         Yes
4   2  Advisor    Advisor         Yes
5   2  Advisor    Advisor         Yes
6   3    Sales    Manager         Yes
7   3  Manager    Manager         Yes
8   3  Manager    Manager         Yes
Answered By: Corralien
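
As a side note on the ValueError in the question: comparing a column to ‘Yes’ produces a boolean Series with one value per row, and a plain if statement cannot reduce that to a single True or False. The sketch below reproduces the error on a small, made-up Series (the names s, mask, error_message and has_yes are illustrative only) and shows the explicit reduction with .any():

```python
import pandas as pd

s = pd.Series(['No', 'Yes', 'No'])
mask = s == 'Yes'            # element-wise comparison, gives a boolean Series

try:
    if mask:                 # a multi-element Series has no single truth value
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

has_yes = mask.any()         # explicitly reduce to one boolean
print(has_yes)
```

Reducing per group rather than over the whole column is what the groupby-based answers above do.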