How to modify rows based on conditions in Python?

Question:

I have a dataset of employee history containing information on job, manager, etc. I am trying to detect when a manager has taken over for another manager in their absence. When that happens, I want to add (Sub) next to the name of the manager filling in.

This is the output I have:

Emp_ID    Job_Title      Manager_pos    Manager Name     Mgr_ID
   1        Sales            627         John Doe           12
   1        Sales            627         John Doe           12
   1        Sales            627         David Stern        4
   2        Tech             324         Mark Smith         7
   2        Tech             324         Henry Ford         13
   2        Tech             324         Henry Ford         13

This is the output I want:

Emp_ID    Job_Title     Manager_pos     Manager Name      Mgr_ID
  1        Sales            627           John Doe          12
  1        Sales            627           John Doe          12
  1        Sales            627           David Stern(Sub)  4  
  2        Tech             324           Mark Smith        7 
  2        Tech             324           Henry Ford(Sub)   13 
  2        Tech             324           Henry Ford(Sub)   13

I have tried using:

`np.where((df['Manager_pos'].head(1) == df['Manager_pos']) & (df['Manager Name'].head(1) != df['Manager Name'].tail(1)), df['Manager Name'] + 'Sub', df['Manager Name'])`

This code throws an error. Any suggestions?
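For reference, a minimal sketch of why the attempt fails and how the same np.where idea can be made to work. Besides any syntax problems, the comparison itself cannot succeed: `head(1)` returns a one-row Series, and pandas refuses to compare two Series whose index labels differ, raising a ValueError. The sketch assumes the column names from the desired-output table and replaces the `head(1)` comparison with a per-group `transform('first')`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Emp_ID':       [1, 1, 1, 2, 2, 2],
    'Manager_pos':  [627] * 3 + [324] * 3,
    'Manager Name': ['John Doe', 'John Doe', 'David Stern',
                     'Mark Smith', 'Henry Ford', 'Henry Ford'],
})

# head(1) is a one-row Series; comparing it with the full six-row column
# raises because the two Series do not share the same index labels
error_message = None
try:
    df['Manager_pos'].head(1) == df['Manager_pos']
except ValueError as exc:
    error_message = str(exc)
    print('ValueError:', exc)

# Working variant of the np.where idea: broadcast the first name of each
# (Emp_ID, Manager_pos) group onto every row, then compare row by row
first_name = df.groupby(['Emp_ID', 'Manager_pos'])['Manager Name'].transform('first')
df['Manager Name'] = np.where(df['Manager Name'] != first_name,
                              df['Manager Name'] + '(Sub)',
                              df['Manager Name'])
print(df)
```

This treats "first manager seen in the group" as the regular manager, so it assumes the rows are already in chronological order.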

Asked By: Coding_Nubie


Answers:

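Use a boolean mask. Within each (Emp_ID, Manager_pos) group, rank the manager names in descending alphabetical order; if the rank is greater than one, append ' (Sub)' to the Manager Name column: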

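cols = ['Emp_ID', 'Manager_pos']
# Dense-rank the names within each group, reverse-alphabetical first;
# every name ranked below the top one gets the suffix
m = df.groupby(cols)['Manager Name'].rank(method='dense', ascending=False).gt(1)

df.loc[m, 'Manager Name'] += ' (Sub)'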

Output:

>>> df
   Emp_ID Job_Title  Manager_pos       Manager Name  Mgr_ID
0       1     Sales          627           John Doe      12
1       1     Sales          627           John Doe      12
2       1     Sales          627  David Stern (Sub)       4
3       2      Tech          324         Mark Smith       7
4       2      Tech          324   Henry Ford (Sub)      13
5       2      Tech          324   Henry Ford (Sub)      13
Answered By: Corralien
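A self-contained version of the rank-based mask, wrapped in a hypothetical helper `mark_subs_by_rank` and assuming the column names `Emp_ID`, `Manager_pos` and `Manager Name`. It reproduces the desired output for the question's data, but `rank` orders the names alphabetically rather than by when each manager appeared; it gives the right answer here only because John Doe sorts after David Stern and Mark Smith after Henry Ford. The second, made-up group shows the regular manager being flagged when the names sort the other way:

```python
import pandas as pd

def mark_subs_by_rank(df):
    """Rank-based mask from the answer above, wrapped in a helper (hypothetical name)."""
    out = df.copy()
    cols = ['Emp_ID', 'Manager_pos']
    # Dense rank of names within each group; reverse-alphabetical name gets rank 1
    m = out.groupby(cols)['Manager Name'].rank(method='dense', ascending=False).gt(1)
    out.loc[m, 'Manager Name'] += ' (Sub)'
    return out

# The question's data (only the columns the mask needs)
question = pd.DataFrame({
    'Emp_ID':       [1, 1, 1, 2, 2, 2],
    'Manager_pos':  [627] * 3 + [324] * 3,
    'Manager Name': ['John Doe', 'John Doe', 'David Stern',
                     'Mark Smith', 'Henry Ford', 'Henry Ford'],
})
print(mark_subs_by_rank(question)['Manager Name'].tolist())

# Made-up group: the regular manager (Adam Lee) sorts before the substitute
swapped = pd.DataFrame({
    'Emp_ID':       [3, 3, 3],
    'Manager_pos':  [100] * 3,
    'Manager Name': ['Adam Lee', 'Adam Lee', 'Zoe Park'],
})
print(mark_subs_by_rank(swapped)['Manager Name'].tolist())
```

In the made-up group, Adam Lee is marked as the substitute even though Zoe Park is the one who appears later.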

Assuming you want to append '(sub)' whenever the manager differs from the first one within a group, use groupby.transform('first') to get the first name of each group, then boolean indexing:

m = (df.groupby(['Emp_ID', 'Manager_pos']) # for each group
     ['Manager Name'].transform('first')   # get first name
     .ne(df['Manager Name'])               # check if current row is different
    )

df.loc[m, 'Manager Name'] += '(sub)'

Output:

   Emp_ID Job_Title  Manager_pos      Manager Name  Mgr_ID
0       1     Sales          627          John Doe      12
1       1     Sales          627          John Doe      12
2       1     Sales          627  David Stern(sub)       4
3       2      Tech          324        Mark Smith       7
4       2      Tech          324   Henry Ford(sub)      13
5       2      Tech          324   Henry Ford(sub)      13
Answered By: mozway
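The `transform('first')` mask depends on row order rather than on alphabetical order, so it flags the substitute correctly as long as each employee's rows are in chronological order. Below is a sketch with a hypothetical helper `mark_subs_by_first`, run on a made-up group in which the regular manager's name sorts before the substitute's; reversing the rows changes who is treated as the substitute. The question shows no date column, so if the real data has one, sorting by it before building the mask is an assumption worth checking:

```python
import pandas as pd

def mark_subs_by_first(df):
    """transform('first') mask from the answer above, wrapped in a helper (hypothetical name)."""
    out = df.copy()
    # First name seen in each (Emp_ID, Manager_pos) group, broadcast to every row
    first = out.groupby(['Emp_ID', 'Manager_pos'])['Manager Name'].transform('first')
    m = first.ne(out['Manager Name'])
    out.loc[m, 'Manager Name'] += '(sub)'
    return out

# Made-up group: Adam Lee is the regular manager, Zoe Park fills in later
swapped = pd.DataFrame({
    'Emp_ID':       [3, 3, 3],
    'Manager_pos':  [100] * 3,
    'Manager Name': ['Adam Lee', 'Adam Lee', 'Zoe Park'],
})
print(mark_subs_by_first(swapped)['Manager Name'].tolist())

# The same rows in reverse order: now Zoe Park comes first and is kept as-is
print(mark_subs_by_first(swapped.iloc[::-1])['Manager Name'].tolist())
```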