Extended Describe Pandas and beyond
Question:
I am new to python and pandas. My question is related to that question:
Advanced Describe Pandas
Is it possible to add some functions to reply by noobie like:
geometric mean, weighted mean, harmonic mean, geometric standard deviation, etc.
import pandas as pd
def describex(data):
data = pd.DataFrame(data)
stats = data.describe()
skewness = data.skew()
kurtosis = data.kurtosis()
skewness_df = pd.DataFrame({'skewness':skewness}).T
kurtosis_df = pd.DataFrame({'kurtosis':kurtosis}).T
return stats.append([kurtosis_df,skewness_df])
So basically I am interested in adding something for example from scipy.stats that is not as these functions above originated from pandas. I want to have much more informations from descriptive statistics than standard describe offers. What I tried so far was adding more functions from pandas, and with that I am OK, but wasn’t able to attach more functions that are outside of pandas.
How do I do it, please ?
Answers:
There are a couple of things you could do.
One suggestion is to use the pandas-profiling library, which can generate a comprehensive report on the data including basic statistics, correlation analysis, data type analysis, missing values analysis, and more. This can be a very useful tool for quickly getting a comprehensive overview of the dataset.
Another suggestion is to use the scipy.stats library to add any advanced statistics to your custom function. The scipy.stats library probably has a function to compute any statistic you’re looking for.
For example,
import pandas as pd
import numpy as np
from scipy.stats import gmean
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
def describex(data):
data = pd.DataFrame(data)
stats = data.describe()
skewness = data.skew()
kurtosis = data.kurtosis()
skewness_df = pd.DataFrame({'skewness':skewness}).T
kurtosis_df = pd.DataFrame({'kurtosis':kurtosis}).T
gmean_df = pd.DataFrame(df.apply(gmean, axis=0),columns=['gmean']).T
return stats.append([kurtosis_df,skewness_df,gmean_df])
print(describex(df))
Hope this helps!
I am new to python and pandas. My question is related to that question:
Advanced Describe Pandas
Is it possible to add some functions to reply by noobie like:
geometric mean, weighted mean, harmonic mean, geometric standard deviation, etc.
import pandas as pd
def describex(data):
data = pd.DataFrame(data)
stats = data.describe()
skewness = data.skew()
kurtosis = data.kurtosis()
skewness_df = pd.DataFrame({'skewness':skewness}).T
kurtosis_df = pd.DataFrame({'kurtosis':kurtosis}).T
return stats.append([kurtosis_df,skewness_df])
So basically I am interested in adding something for example from scipy.stats that is not as these functions above originated from pandas. I want to have much more informations from descriptive statistics than standard describe offers. What I tried so far was adding more functions from pandas, and with that I am OK, but wasn’t able to attach more functions that are outside of pandas.
How do I do it, please ?
There are a couple of things you could do.
One suggestion is to use the pandas-profiling library, which can generate a comprehensive report on the data including basic statistics, correlation analysis, data type analysis, missing values analysis, and more. This can be a very useful tool for quickly getting a comprehensive overview of the dataset.
Another suggestion is to use the scipy.stats library to add any advanced statistics to your custom function. The scipy.stats library probably has a function to compute any statistic you’re looking for.
For example,
import pandas as pd
import numpy as np
from scipy.stats import gmean
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
def describex(data):
data = pd.DataFrame(data)
stats = data.describe()
skewness = data.skew()
kurtosis = data.kurtosis()
skewness_df = pd.DataFrame({'skewness':skewness}).T
kurtosis_df = pd.DataFrame({'kurtosis':kurtosis}).T
gmean_df = pd.DataFrame(df.apply(gmean, axis=0),columns=['gmean']).T
return stats.append([kurtosis_df,skewness_df,gmean_df])
print(describex(df))
Hope this helps!