How to get count of Employee activity by Manager in Python
Question:
I have a dataset that uses numpy and pandas and looks at employees by ID # and all activity like promotion, termination, job switch, and etc. What I want to do is know how to count the number of changes and group them to the manager.
Here is a sample of the data. For Reference: 1 = Yes 0 = No
ID Date Job_Title ManagerID Status Terminated Job_Change Team_Change
1 May 2022 Sales Rep 7 Active 0 0 0
1 Oct 2022 Sales Consultant 7 Active 0 1 0
1 Jan 2023 Sales Consultant 7 Active 0 0 0
2 Feb 2022 Tech Advisor 3 Active 0 0 0
2 May 2022 Tech Advisor 3 Termed 1 0 0
3 Dec 2021 Sales Supervisor 7 Active 0 0 0
3 Jan 2022 Tech Supervisor 10 Active 0 1 1
3 Feb 2023 Tech Manager 10 Active 0 1 0
What I want the output to look like:
ManagerID Terminated Job_Change Team Change
3 1 0 0
7 0 1 0
10 0 2 1
Is their a way to print this output out without having to create a new dataframe?
Answers:
You can achieve the desired output by using the Pandas library to group your dataset by the ManagerID and then aggregating the number of events in each category (Terminated, Job_Change, Team_Change).
Here’s a Python script using the Pandas library to do this:
import pandas as pd
# Assuming your data is stored in a DataFrame called 'data'
# Group the data by 'ManagerID' and sum the respective columns
result = data.groupby('ManagerID')[['Terminated', 'Job_Change',
'Team_Change']].sum()
# Reset the index to make 'ManagerID' a column again
result = result.reset_index()
print(result)
This script groups your data by the ManagerID and then calculates the sum of the Terminated, Job_Change, and Team_Change columns. The output will be a new DataFrame with the desired format.
You can use pandas groupby() function and aggregation functions to count of changes for each status, assuming you’ve assigned a variable ‘df’ for pd.read_csv():
grouped = df.groupby(‘ManagerID’).agg({‘Terminated’: ‘sum’, ‘Job_Change’: ‘sum’, ‘Team_Change’: ‘sum’})
print(grouped)
The resulting dataframe should have one row per unique ManagerID, with columns for the count of terminated employees, job changes, and team changes for each manager.
Hope it helps
I have a dataset that uses numpy and pandas and looks at employees by ID # and all activity like promotion, termination, job switch, and etc. What I want to do is know how to count the number of changes and group them to the manager.
Here is a sample of the data. For Reference: 1 = Yes 0 = No
ID Date Job_Title ManagerID Status Terminated Job_Change Team_Change
1 May 2022 Sales Rep 7 Active 0 0 0
1 Oct 2022 Sales Consultant 7 Active 0 1 0
1 Jan 2023 Sales Consultant 7 Active 0 0 0
2 Feb 2022 Tech Advisor 3 Active 0 0 0
2 May 2022 Tech Advisor 3 Termed 1 0 0
3 Dec 2021 Sales Supervisor 7 Active 0 0 0
3 Jan 2022 Tech Supervisor 10 Active 0 1 1
3 Feb 2023 Tech Manager 10 Active 0 1 0
What I want the output to look like:
ManagerID Terminated Job_Change Team Change
3 1 0 0
7 0 1 0
10 0 2 1
Is their a way to print this output out without having to create a new dataframe?
You can achieve the desired output by using the Pandas library to group your dataset by the ManagerID and then aggregating the number of events in each category (Terminated, Job_Change, Team_Change).
Here’s a Python script using the Pandas library to do this:
import pandas as pd
# Assuming your data is stored in a DataFrame called 'data'
# Group the data by 'ManagerID' and sum the respective columns
result = data.groupby('ManagerID')[['Terminated', 'Job_Change',
'Team_Change']].sum()
# Reset the index to make 'ManagerID' a column again
result = result.reset_index()
print(result)
This script groups your data by the ManagerID and then calculates the sum of the Terminated, Job_Change, and Team_Change columns. The output will be a new DataFrame with the desired format.
You can use pandas groupby() function and aggregation functions to count of changes for each status, assuming you’ve assigned a variable ‘df’ for pd.read_csv():
grouped = df.groupby(‘ManagerID’).agg({‘Terminated’: ‘sum’, ‘Job_Change’: ‘sum’, ‘Team_Change’: ‘sum’})
print(grouped)
The resulting dataframe should have one row per unique ManagerID, with columns for the count of terminated employees, job changes, and team changes for each manager.
Hope it helps