How to find the overall average in pandas
Question:
I have the DataFrame below:
import pandas as pd
# initialize list of lists
data = [['A','Excel','1'], ['A','Word_soft','0'],['B','Excel','1'],['B','Word','1'],['C','Word','1'],['C','Word_soft','0'],['D','Java2','1'],['D','Java','1'],['E','PPT','0'], ['E','Word_soft','0']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['System','App','DevTool'])
I ran the following to get the count of each DevTool category in each System:
df.groupby(['System','DevTool'])['DevTool'].count()
I need to find the overall DevTool average for each category (1 and 0), calculated as follows.
Category 1 DevTool average:
(System A's category 1 DevTool count + System B's category 1 DevTool count + … and so on) / Number of systems that have a category 1 DevTool
= (1+2+1+2) / 4 = 1.5
Similarly, category 0 DevTool average:
(System A's category 0 DevTool count + System B's category 0 DevTool count + … and so on) / Number of systems that have a category 0 DevTool
= (1+1+2) / 3 = 1.33
To get these averages, I currently export the data to Excel every time and use Excel's built-in function. I am not sure how to perform this calculation directly in pandas and get the averages 1.5 and 1.33 for categories 1 and 0, respectively.
Answers:
Add a groupby.mean step (.groupby('DevTool').mean()):
(df.groupby(['System','DevTool'])['DevTool'].count()
.groupby('DevTool').mean()
)
You can also replace the first step with value_counts:
df[['System','DevTool']].value_counts(sort=False).groupby('DevTool').mean()
Output:
DevTool
0 1.333333
1 1.500000
Name: DevTool, dtype: float64
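For reference, here is a minimal self-contained sketch that runs both variants on the question's data and checks them against the hand-computed values. The variable names (counts, avg, avg_vc) are only illustrative, and groupby(level='DevTool') is used here as an explicit way of grouping on the index level; it is assumed to be equivalent to the .groupby('DevTool') call above for this data.

```python
import pandas as pd

data = [['A', 'Excel', '1'], ['A', 'Word_soft', '0'], ['B', 'Excel', '1'],
        ['B', 'Word', '1'], ['C', 'Word', '1'], ['C', 'Word_soft', '0'],
        ['D', 'Java2', '1'], ['D', 'Java', '1'], ['E', 'PPT', '0'],
        ['E', 'Word_soft', '0']]
df = pd.DataFrame(data, columns=['System', 'App', 'DevTool'])

# Step 1: number of rows per (System, DevTool) pair.
# Pairs that never occur (e.g. System B with DevTool '0') are simply absent,
# so they do not enter the denominator of the average.
counts = df.groupby(['System', 'DevTool'])['DevTool'].count()

# Step 2: average the per-system counts within each DevTool category.
avg = counts.groupby(level='DevTool').mean()

# Same calculation with value_counts as the first step.
avg_vc = (df[['System', 'DevTool']]
          .value_counts(sort=False)
          .groupby(level='DevTool')
          .mean())

print(avg)
print(avg_vc)
```

Note that DevTool holds the strings '0' and '1' in this data, so the results are looked up with avg['1'] and avg['0'] rather than with integer keys.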