Conditionally Fill a column with a single value in Python
Question:
I have a dataset where I want to see whether someone has left their job title to start a new job. To represent this, I took the column ‘Job’ and made a new column named ‘Latest_Job’, which is populated on all rows of that person's history. I compare the two to see when, or if, a change has occurred. The issue is that I want a new column ‘Switch Jobs’ to be filled entirely with ‘Yes’ or ‘No’ for each person, depending on whether that person switched or not. Here is an example of what I have:
ID  Job      Latest_Job  Switch Jobs
1   Sales    Sales       No
1   Sales    Sales       No
2   Tech     Advisor     Yes
2   Tech     Advisor     Yes
2   Advisor  Advisor     No
2   Advisor  Advisor     No
3   Sales    Manager     Yes
3   Manager  Manager     No
3   Manager  Manager     No
The problem is that I would like to see ‘Yes’ in the ‘Switch Jobs’ column on every row of an ID if there was a change anywhere for that ID, like this:
ID  Job      Latest_Job  Switch Jobs
1   Sales    Sales       No
1   Sales    Sales       No
2   Tech     Advisor     Yes
2   Tech     Advisor     Yes
2   Advisor  Advisor     Yes
2   Advisor  Advisor     Yes
3   Sales    Manager     Yes
3   Manager  Manager     Yes
3   Manager  Manager     Yes
The code I tried for changing the values was this:
    if df['Switch Jobs'] == 'Yes':
        df['Switch Jobs'].groupby('ID')['Switch Jobs'].replace('No', 'Yes')
This, however, threw a ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Any suggestions to fix this?
Answers:
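You can use groupby with apply to achieve this:
    def update_switch_jobs(group):
        # If any row for this ID is 'Yes', mark every row of the group 'Yes'.
        if group['Switch Jobs'].str.contains('Yes').any():
            group['Switch Jobs'] = 'Yes'
        return group

    # group_keys=False keeps the original row index; recent pandas versions
    # otherwise add 'ID' as an extra index level.
    df = df.groupby('ID', group_keys=False).apply(update_switch_jobs)
    print(df)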
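As a side note on the error in the question: comparing a column with == 'Yes' produces one boolean per row, so `if` has no single truth value to test. Reducing the result with .any(), as the function above does, is one way around that. A minimal sketch, where the small Series `s` and the names `mask` and `has_yes` are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for one ID's 'Switch Jobs' values.
s = pd.Series(["No", "Yes", "No"], name="Switch Jobs")

# Elementwise comparison: this is a boolean Series, not a single bool.
mask = s == "Yes"

try:
    if mask:  # Python asks the Series for one truth value and pandas refuses
        pass
except ValueError as err:
    print("ValueError:", err)

# Reduce to a single bool explicitly before using it in an `if`.
has_yes = bool(mask.any())
print(has_yes)
```

Use .all() instead of .any() when every row has to match rather than at least one.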
You can use groupby followed by transform to broadcast the highest value (True, which becomes ‘Yes’) to the entire group:
    df['Switch Jobs'] = (df['Job'].ne(df['Latest_Job']).groupby(df['ID'])
                         .transform('max').replace({True: 'Yes', False: 'No'}))
    print(df)
    # Output
       ID      Job  Latest_Job  Switch Jobs
    0   1    Sales       Sales           No
    1   1    Sales       Sales           No
    2   2     Tech     Advisor          Yes
    3   2     Tech     Advisor          Yes
    4   2  Advisor     Advisor          Yes
    5   2  Advisor     Advisor          Yes
    6   3    Sales     Manager          Yes
    7   3  Manager     Manager          Yes
    8   3  Manager     Manager          Yes
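For anyone who wants to run this end to end, here is a self-contained sketch. The DataFrame is rebuilt by hand from the question's table, and it swaps in transform('any') and map for the 'max' and replace used above, which I would expect to give the same result on a boolean column. The variable names `changed` and `switched` are my own.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 2, 3, 3, 3],
    "Job": ["Sales", "Sales", "Tech", "Tech", "Advisor", "Advisor",
            "Sales", "Manager", "Manager"],
    "Latest_Job": ["Sales", "Sales", "Advisor", "Advisor", "Advisor",
                   "Advisor", "Manager", "Manager", "Manager"],
})

# True on the rows where the job differs from the person's latest job.
changed = df["Job"].ne(df["Latest_Job"])

# 'any' is computed once per ID and broadcast back to every row of that ID.
switched = changed.groupby(df["ID"]).transform("any")

# Turn the booleans into the labels used in the question.
df["Switch Jobs"] = switched.map({True: "Yes", False: "No"})
print(df)
```

Because transform returns a result aligned to the original index, the assignment works without any merge or re-sorting.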