Pandas df.describe() – how do I extract values into Dataframe?

Question:

I am trying to build a naive Bayes classifier. After loading some data into a dataframe in Pandas, the describe function captures the data I want. I’d like to extract the mean and std for each column of the table but am unsure how to do that. I’ve tried things like:

df.describe([mean])
df.describe(['mean'])
df.describe().mean

None of these work. I was able to do something similar in R with summary but don’t know how to do it in Python. Can someone offer some advice?

Asked By: Vaslo

||

Answers:

Please try something like this:

df.describe(include='all').loc['mean']
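
A minimal sketch of what this returns, assuming a small made-up DataFrame (the column names height and label are hypothetical). With include='all', non-numeric columns are summarized as well, so their entry in the mean row comes back as NaN:

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column
df = pd.DataFrame({'height': [1.0, 2.0, 3.0],
                   'label': ['a', 'b', 'a']})

# The rows of describe() are the statistics, so .loc selects one of them
means = df.describe(include='all').loc['mean']
print(means)
# 'height' has a mean of 2.0; 'label' is not numeric, so its mean is NaN
```

If only numeric columns matter, dropping include='all' avoids the NaN entries.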
Answered By: milos.ai

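You were close. You don’t need the include argument. On a Series, describe() returns a Series indexed by statistic name, so you can write s.describe()['mean']. On a DataFrame the statistics are row labels rather than columns, so use df.describe().loc['mean'] instead (df.describe()['mean'] raises a KeyError unless a column happens to be named mean).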

For example:

import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
s.describe()['mean']
# 3.0

If you want both mean and std, pass a list of labels: s.describe()[['mean', 'std']] for a Series, or df.describe().loc[['mean', 'std']] for a DataFrame. For example,

s.describe()[['mean', 'std']]
# mean    3.000000
# std     1.581139
# dtype: float64
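
On a DataFrame, describe() puts the statistics in the row index, so the selection goes through .loc. The following is a minimal sketch, assuming a small made-up DataFrame with hypothetical columns A and B. The transpose is just one way to end up with a DataFrame that has one row per original column:

```python
import pandas as pd

# Hypothetical numeric data
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 30, 40, 50]})

# Select the two statistic rows, then transpose so each original
# column becomes a row and 'mean' / 'std' become the columns
stats = df.describe().loc[['mean', 'std']].T
print(stats)
```

The result is itself a DataFrame, so stats['mean'] and stats['std'] can be used directly in later calculations.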
Answered By: Sheldore

If you further want to extract specific column data then try:

df.describe()['FeatureName']['mean']

Replace mean with any other statistic you want to extract (for example std, min, or 50%).
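
As a sketch of the same lookup, assuming a made-up DataFrame whose only column is the placeholder FeatureName: the chained form above and a single .loc[row, column] call should give the same value.

```python
import pandas as pd

# Hypothetical data; 'FeatureName' stands in for a real column name
df = pd.DataFrame({'FeatureName': [2, 4, 6, 8]})

summary = df.describe()

# Chained lookup: pick the column first, then the statistic
chained = summary['FeatureName']['mean']

# Equivalent single lookup: statistic (row) first, then column
direct = summary.loc['mean', 'FeatureName']

print(chained, direct)
```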

Answered By: Ridhi Deo

You can try:

import pandas as pd
data = pd.read_csv('./FileName.csv')
data.describe().loc['mean']
Answered By: code_m19

I faced the same problem, and after trying these solutions one of them worked for me. Here I extracted the 75% value from the describe function. This is my code:

d = bank.groupby(by=['region', 'Gender']).get_group(('south Moravia', 'Female'))
d.cashwdn.describe()['75%']
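
A runnable sketch of the same idea, assuming a made-up bank table (the rows and values below are invented for illustration; only the column names come from the snippet above):

```python
import pandas as pd

# Invented data that mimics the column names used above
bank = pd.DataFrame({
    'region': ['south Moravia'] * 5 + ['Prague'] * 2,
    'Gender': ['Female'] * 5 + ['Male'] * 2,
    'cashwdn': [100, 200, 300, 400, 500, 50, 60],
})

# Pick one (region, Gender) group, then read the 75th percentile
d = bank.groupby(by=['region', 'Gender']).get_group(('south Moravia', 'Female'))
q75 = d.cashwdn.describe()['75%']

# The same number is available directly from Series.quantile
q75_direct = d['cashwdn'].quantile(0.75)
print(q75, q75_direct)
```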

Answered By: harsha

If you want the mean or the std of a column of your dataframe, you don’t need to go through describe(). Instead, the proper way would be to just call the respective statistical function on the column (which really is a pandas.core.series.Series). Here is an example:

import pandas as pd

# create dataframe with some numerical data
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'B': [8, 7, 6, 5, 4, 3, 2, 1, 0, 0]})

print(df['A'].mean()) # 5.5
print(df['B'].std())  # 2.8751811537130436

See the pandas documentation for the descriptive stats that are built into the pandas Series.
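
To get both statistics for every column at once without describe(), one option is DataFrame.agg with a list of function names. A minimal sketch, reusing the example data from above:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'B': [8, 7, 6, 5, 4, 3, 2, 1, 0, 0]})

# One row per statistic, one column per original column
stats = df.agg(['mean', 'std'])
print(stats)

# Individual values are then a plain .loc lookup
mean_a = stats.loc['mean', 'A']
std_b = stats.loc['std', 'B']
```

This computes only the statistics that are asked for, whereas describe() also computes the count, min, max, and quartiles.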

(Let me know if I am misunderstanding what you are trying to do here.)

Answered By: Daniel Schneider