How to find the overall average in pandas

Question:

I have the dataframe below:

import pandas as pd

# initialize list of lists
data = [['A','Excel','1'], ['A','Word_soft','0'],['B','Excel','1'],['B','Word','1'],['C','Word','1'],['C','Word_soft','0'],['D','Java2','1'],['D','Java','1'],['E','PPT','0'], ['E','Word_soft','0']]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['System','App','DevTool'])

I ran the following to get the count of each DevTool value in each System.

df.groupby(['System','DevTool'])['DevTool'].count()

I need to find the overall DevTool average for each category, 1 and 0, as follows.

Category 1 DevTool average:

(System A's category 1 DevTool count + System B's category 1 DevTool count + … and so on) / number of systems that have a category 1 DevTool

= (1+2+1+2) / 4 = 1.5

Similarly, the category 0 DevTool average:

(System A's category 0 DevTool count + System C's category 0 DevTool count + … and so on) / number of systems that have a category 0 DevTool

= (1+1+2) / 3 = 1.33

To perform this calculation, I currently export the data to Excel every time and use Excel's built-in functions to get the average. I am not sure how to do this directly in pandas and get the averages 1.5 and 1.33 for categories 1 and 0 respectively.

Asked By: John


Answers:

Add a groupby.mean step (.groupby('DevTool').mean()):

(df.groupby(['System','DevTool'])['DevTool'].count()
   .groupby('DevTool').mean()
)

You can also replace the first step with value_counts:

df[['System','DevTool']].value_counts(sort=False).groupby('DevTool').mean()

Output:

DevTool
0    1.333333
1    1.500000
Name: DevTool, dtype: float64
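
For anyone who wants to check the numbers end to end, here is a minimal self-contained sketch of the same two-step idea. It is an illustration rather than the exact code above: it uses size() in place of count() (the two agree here because the data has no missing values), and the variable names per_system and averages are made up for this example.

```python
import pandas as pd

# Same data as in the question; DevTool values are the strings '1' and '0'.
data = [['A', 'Excel', '1'], ['A', 'Word_soft', '0'],
        ['B', 'Excel', '1'], ['B', 'Word', '1'],
        ['C', 'Word', '1'], ['C', 'Word_soft', '0'],
        ['D', 'Java2', '1'], ['D', 'Java', '1'],
        ['E', 'PPT', '0'], ['E', 'Word_soft', '0']]
df = pd.DataFrame(data, columns=['System', 'App', 'DevTool'])

# Step 1: number of rows per (System, DevTool) pair.
# Pairs that never occur (e.g. System B with DevTool '0') are simply absent,
# so they are not counted as zeros in the average.
per_system = df.groupby(['System', 'DevTool']).size()

# Step 2: average those per-system counts within each DevTool category.
averages = per_system.groupby(level='DevTool').mean()

print(averages.round(2))
```

Because absent (System, DevTool) pairs never show up in the intermediate counts, the denominator is automatically "the number of systems that have that category", which matches the manual calculation in the question.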
Answered By: mozway