How to loop over all columns and check data distribution using Fitter library?

Question:

I need to check data distributions of all my numeric columns in a dataset. I chose Fitter library to do so. I loop over all columns but have only one plot+summary table as an outcome instead. What is wrong with my code?

from fitter import Fitter
import numpy as np

df_numeric = df.select_dtypes(include=np.number).sample(n=5000)
num_cols = df_numeric.columns.tolist()

distr = ['cauchy',
 'chi2',
 'expon',
 'exponpow',
 'gamma',
 'beta',
 'lognorm', 
 'logistic',
 'norm',
 'powerlaw',
 'rayleigh',
 'uniform']

for col in num_cols:
    modif_col = df_numeric[col].fillna(0).values
    dist_fitter =  Fitter(modif_col, distributions=distr)
    dist_fitter.fit()
    dist_fitter.summary()

enter image description here

Maybe there is another approach to check distributions in a loop?

Asked By: mustafa00

||

Answers:

It looks like your code is correctly looping over all the numeric columns in the dataframe, fitting different distributions to each column using the Fitter library, and then printing a summary of the fitting results. However, you’re only seeing one plot and summary table as the outcome because you’re overwriting the plot and summary table for each iteration of the loop.

To see a separate plot and summary table for each column, you should move the calls to dist_fitter.summary() and dist_fitter.plot() inside the loop and make sure to give each plot and summary table a unique name or title, so you can distinguish them when viewing them.

Here is the code example you can use it

 import matplotlib.pyplot as plt

   for col in num_cols:
    modif_col = df_numeric[col].fillna(0).values
    dist_fitter =  Fitter(modif_col, distributions=distr)
    dist_fitter.fit()
    plt.figure()
    dist_fitter.plot()
    plt.title(col)
    plt.show()
    print(col)
    dist_fitter.summary()
Answered By: aman ali
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.