How to locate rows that don't meet a condition and append text to them in Python
Question:
I have an employment-history dataset that I process with numpy and pandas. My code looks for employees who report to a vacant manager position that another manager is covering in the meantime. Right now it partly works but needs to be refined. Here are the current data and code.
Code:
m = df.groupby(['Emp_ID', 'Reporting_Manager_ID'])['Manager_Name'].transform('first').ne(df['Manager_Name'])
df.loc[m,'Manager_Name'] += ' (Vacant)'
The output for this is:
Emp_ID  Reporting_Manager_ID  Manager_Name
1       4012                  John Wick
1       4012                  John Wick
2       2812                  Sarah Smith
2       2812                  Sarah Smith
2       2812                  John Wick (Vacant)
3       9236                  Peter Doe
3       9236                  John Wick (Vacant)
3       9236                  John Wick
4       1293                  John Wick
4       1293                  John Wick
John Wick's original Manager ID is 4012, and those rows should stay as they are. However, for the other Manager IDs that he takes over [2812, 9236, 1293], every row should show (Vacant).
Desired Output:
Emp_ID  Reporting_Manager_ID  Manager_Name
1       4012                  John Wick
1       4012                  John Wick
2       2812                  Sarah Smith
2       2812                  Sarah Smith
2       2812                  John Wick (Vacant)
3       9236                  Peter Doe
3       9236                  John Wick (Vacant)
3       9236                  John Wick (Vacant)
4       1293                  John Wick (Vacant)
4       1293                  John Wick (Vacant)
The dataset has 300+ Reporting Manager IDs, and this happens multiple times. Any suggestions on how to fix this?
Answers:
It is unclear how you decide which reporting ID belongs to a manager, but it looks like you take the first one that appears for each name:
report_id = df.groupby('Manager_Name')['Reporting_Manager_ID'].transform('first')
m = ~df['Reporting_Manager_ID'].eq(report_id)
df.loc[m, 'Manager_Name'] += ' (Vacant)'
print(df)
# Output
   Emp_ID  Reporting_Manager_ID        Manager_Name
0       1                  4012           John Wick
1       1                  4012           John Wick
2       2                  2812         Sarah Smith
3       2                  2812         Sarah Smith
4       2                  2812  John Wick (Vacant)
5       3                  9236           Peter Doe
6       3                  9236  John Wick (Vacant)
7       3                  9236  John Wick (Vacant)
8       4                  1293  John Wick (Vacant)
9       4                  1293  John Wick (Vacant)
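For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same idea. The helper name mark_vacant and the variable names home_id and covering are made up for illustration. It assumes, as above, that the first Reporting_Manager_ID seen for a name is that manager's own ID, so the result depends on row order.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Emp_ID": [1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "Reporting_Manager_ID": [4012, 4012, 2812, 2812, 2812,
                             9236, 9236, 9236, 1293, 1293],
    "Manager_Name": ["John Wick", "John Wick", "Sarah Smith", "Sarah Smith",
                     "John Wick", "Peter Doe", "John Wick", "John Wick",
                     "John Wick", "John Wick"],
})


def mark_vacant(frame):
    """Return a copy with ' (Vacant)' appended where a manager covers another ID.

    Assumption: the first Reporting_Manager_ID that appears for a given
    Manager_Name is that manager's own ("home") ID.
    """
    out = frame.copy()
    # Home ID per manager, broadcast back to every row of that manager.
    home_id = out.groupby("Manager_Name")["Reporting_Manager_ID"].transform("first")
    # True where the row's manager ID differs from the manager's home ID.
    covering = out["Reporting_Manager_ID"].ne(home_id)
    out.loc[covering, "Manager_Name"] += " (Vacant)"
    return out


result = mark_vacant(df)
print(result)
```

Working on a copy keeps the original frame unchanged, so running the function a second time on df does not append the suffix twice. If the first row for a manager is not guaranteed to carry their own ID, the "first" rule would need to be replaced with an explicit lookup of each manager's home ID.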