How to get count of Employee activity by Manager in Python

Question:

I have a dataset, loaded with numpy and pandas, that tracks employees by ID number and records all of their activity, such as promotions, terminations, and job switches. I want to count the number of changes and group them by manager.

Here is a sample of the data. For reference: 1 = Yes, 0 = No.

ID     Date       Job_Title        ManagerID   Status  Terminated  Job_Change  Team_Change
 1   May 2022     Sales Rep            7       Active      0            0           0
 1   Oct 2022     Sales Consultant     7       Active      0            1           0
 1   Jan 2023     Sales Consultant     7       Active      0            0           0
 2   Feb 2022     Tech Advisor         3       Active      0            0           0
 2   May 2022     Tech Advisor         3       Termed      1            0           0
 3   Dec 2021     Sales Supervisor     7       Active      0            0           0
 3   Jan 2022     Tech Supervisor      10      Active      0            1           1
 3   Feb 2023     Tech Manager         10      Active      0            1           0
 

What I want the output to look like:

ManagerID    Terminated    Job_Change    Team_Change
    3            1             0              0
    7            0             1              0
    10           0             2              1

Is there a way to print this output without having to create a new DataFrame?

Asked By: Coding_Nubie

||

Answers:

You can achieve the desired output by using the Pandas library to group your dataset by the ManagerID and then aggregating the number of events in each category (Terminated, Job_Change, Team_Change).

Here’s a Python script using the Pandas library to do this:

import pandas as pd

# Assuming your data is stored in a DataFrame called 'data'

# Group the data by 'ManagerID' and sum the respective columns
result = data.groupby('ManagerID')[['Terminated', 'Job_Change', 'Team_Change']].sum()

# Reset the index to make 'ManagerID' a column again
result = result.reset_index()

print(result)

This script groups your data by the ManagerID and then calculates the sum of the Terminated, Job_Change, and Team_Change columns. The output will be a new DataFrame with the desired format.
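To make the approach easy to check, here is a minimal self-contained sketch that rebuilds the sample rows from the question and runs the same groupby-and-sum. The variable name `data` follows the answer above; `flag_columns` is just a name introduced here for illustration.

```python
import pandas as pd

# Sample rows copied from the question (1 = Yes, 0 = No).
data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3],
    "Date": ["May 2022", "Oct 2022", "Jan 2023", "Feb 2022",
             "May 2022", "Dec 2021", "Jan 2022", "Feb 2023"],
    "Job_Title": ["Sales Rep", "Sales Consultant", "Sales Consultant",
                  "Tech Advisor", "Tech Advisor", "Sales Supervisor",
                  "Tech Supervisor", "Tech Manager"],
    "ManagerID": [7, 7, 7, 3, 3, 7, 10, 10],
    "Status": ["Active", "Active", "Active", "Active",
               "Termed", "Active", "Active", "Active"],
    "Terminated": [0, 0, 0, 0, 1, 0, 0, 0],
    "Job_Change": [0, 1, 0, 0, 0, 0, 1, 1],
    "Team_Change": [0, 0, 0, 0, 0, 0, 1, 0],
})

# Hypothetical helper name: the 0/1 columns to total per manager.
flag_columns = ["Terminated", "Job_Change", "Team_Change"]

# Summing 0/1 flags per ManagerID gives the number of events per manager.
result = data.groupby("ManagerID")[flag_columns].sum().reset_index()

print(result)
```

Because the flags are 0/1, summing them is equivalent to counting the events, and the rows come out sorted by ManagerID (3, 7, 10), matching the layout requested in the question.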

You can use the pandas groupby() function with aggregation functions to count the changes in each category, assuming you have loaded the data into a variable 'df' with pd.read_csv():

grouped = df.groupby('ManagerID').agg({'Terminated': 'sum', 'Job_Change': 'sum', 'Team_Change': 'sum'})
print(grouped)

The resulting dataframe should have one row per unique ManagerID, with columns for the count of terminated employees, job changes, and team changes for each manager.
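On the "without creating a new DataFrame" part of the question: groupby aggregation always returns a new object and leaves the original untouched, but you do not have to keep that object around. A rough sketch, assuming the three flag columns from the question and a hypothetical helper named `format_manager_summary`, that builds the summary and turns it straight into printable text:

```python
import pandas as pd

# Only the columns needed for the summary, copied from the question's sample.
df = pd.DataFrame({
    "ManagerID": [7, 7, 7, 3, 3, 7, 10, 10],
    "Terminated": [0, 0, 0, 0, 1, 0, 0, 0],
    "Job_Change": [0, 1, 0, 0, 0, 0, 1, 1],
    "Team_Change": [0, 0, 0, 0, 0, 0, 1, 0],
})


def format_manager_summary(frame):
    """Return the per-manager totals as plain text (hypothetical helper)."""
    columns = ["Terminated", "Job_Change", "Team_Change"]
    # as_index=False keeps ManagerID as a regular column, so reset_index() is not needed.
    summary = frame.groupby("ManagerID", as_index=False)[columns].sum()
    # index=False drops the 0, 1, 2 row labels from the printed table.
    return summary.to_string(index=False)


print(format_manager_summary(df))
```

The temporary summary only lives inside the function, so nothing extra is left in your namespace, and `df` itself is not modified.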

Hope it helps

Answered By: Harshit Sharma