Extended Describe Pandas and beyond

Question:

I am new to Python and pandas. My question is related to this question:
Advanced Describe Pandas

Is it possible to add more functions to the answer by noobie, such as:
geometric mean, weighted mean, harmonic mean, geometric standard deviation, etc.?

import pandas as pd

def describex(data):
    data = pd.DataFrame(data)
    stats = data.describe()
    skewness = data.skew()
    kurtosis = data.kurtosis()
    skewness_df = pd.DataFrame({'skewness': skewness}).T
    kurtosis_df = pd.DataFrame({'kurtosis': kurtosis}).T
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    return pd.concat([stats, kurtosis_df, skewness_df])

So basically I am interested in adding something, for example from scipy.stats, that does not originate from pandas like the functions above. I want much more information from descriptive statistics than the standard describe offers. So far I have tried adding more functions from pandas, and that works, but I wasn't able to attach functions from outside pandas.
How do I do it?

Asked By: Andrzej Andrzej

Answers:

There are a couple of things you could do.

One suggestion is to use the pandas-profiling library (since renamed ydata-profiling), which generates a report on the data covering basic statistics, correlations, data types, missing values, and more. It is a useful way to get a quick overview of a dataset.

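Another suggestion is to use the scipy.stats library to add advanced statistics to your custom function. It covers most of the statistics you mention: gmean (geometric mean), hmean (harmonic mean), and gstd (geometric standard deviation). A weighted mean can be computed with numpy.average and its weights argument.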

For example,

import numpy as np
import pandas as pd
from scipy.stats import gmean

# Sample data; values start at 1 because the geometric mean needs positive numbers
df = pd.DataFrame(np.random.randint(1, 100, size=(100, 4)), columns=list('ABCD'))

def describex(data):
    data = pd.DataFrame(data)
    stats = data.describe()
    skewness_df = pd.DataFrame({'skewness': data.skew()}).T
    kurtosis_df = pd.DataFrame({'kurtosis': data.kurtosis()}).T
    # Apply gmean to the function argument, not the global df
    gmean_df = pd.DataFrame({'gmean': data.apply(gmean, axis=0)}).T
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    return pd.concat([stats, kurtosis_df, skewness_df, gmean_df])

print(describex(df))
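
The same pattern extends to the other statistics from the question. The following is only a sketch under a few assumptions: describe_extended and its weights parameter are hypothetical names, not part of pandas or scipy, the data is assumed to be strictly positive (gmean, hmean and gstd are not meaningful otherwise), and the weights are assumed to have one entry per row.

```python
import numpy as np
import pandas as pd
from scipy import stats


def describe_extended(data, weights=None):
    """describe() plus extra rows; hypothetical helper, not a pandas API."""
    numeric = pd.DataFrame(data).select_dtypes(include="number")

    # Each entry is a Series indexed by column name.
    extra = {
        "skewness": numeric.skew(),
        "kurtosis": numeric.kurtosis(),
        # The next three assume strictly positive values.
        "gmean": numeric.apply(stats.gmean),
        "hmean": numeric.apply(stats.hmean),
        "gstd": numeric.apply(stats.gstd),
    }
    if weights is not None:
        # Weighted mean per column; weights needs one value per row.
        extra["wmean"] = numeric.apply(lambda col: np.average(col, weights=weights))

    # Transpose so that each statistic becomes a row, as in describe().
    extra_df = pd.DataFrame(extra).T
    return pd.concat([numeric.describe(), extra_df])


rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(1, 100, size=(100, 4)), columns=list("ABCD"))
print(describe_extended(demo, weights=np.arange(1, 101)))
```

Collecting the extra statistics in a dict keeps the function easy to extend: any function that reduces a column to a single number can be added as one more entry.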

Hope this helps!

Answered By: g.newt