Speeding up pandas profiling analysis using check_correlation?

Question:

I'm using pandas profiling to generate a report. The dataset is very large, so to speed up processing I'm trying to turn off correlations. I used check_correlation, which I saw in another post, in this line:

a = prof.ProfileReport(df, title='Downloads', check_correlation=False)

which raises this error:

ValueError: Config parameter “check_correlation” does not exist.

Asked By: OCTAVIAN


Answers:

Please see this issue in the pandas-profiling project.

Answered By: knagaev

Since the configuration options changed in version 2, you could use:

import pandas_profiling

profile = df.profile_report(
    check_correlation_pearson=False,
    correlations={
        'pearson': False,
        'spearman': False,
        'kendall': False,
        'phi_k': False,
        'cramers': False,
        'recoded': False,
    },
)

to turn off correlations. However, it is still not as fast as version 1.4. You could also investigate other configurations here.

Answered By: Levent

This way didn't work for me, so I used:

a = prof.ProfileReport(df, title='Downloads', minimal=True)

Answered By: Romeu Fronzaroli

As of version 3.6+ you can do this:

profile = df.profile_report(
    title="Report without correlations",
    correlations={
        "auto": {"calculate": False},
        "pearson": {"calculate": False},
        "spearman": {"calculate": False},
        "kendall": {"calculate": False},
        "phi_k": {"calculate": False},
        "cramers": {"calculate": False},
    },
)

# or using a shorthand that is available for correlations
profile = df.profile_report(
    title="Report without correlations",
    correlations=None,
)

See also the docs here.
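
If you would rather not type out every key by hand, the nested dictionary above can be built programmatically. This is only a minimal sketch: the helper name correlations_off and the CORRELATION_NAMES tuple are hypothetical (not part of pandas-profiling), and the key names are simply the ones listed in the 3.6+ example above.

```python
# Hypothetical helper, not part of pandas-profiling: builds the nested
# "calculate: False" config shown above for each correlation name.
CORRELATION_NAMES = ("auto", "pearson", "spearman", "kendall", "phi_k", "cramers")


def correlations_off(names=CORRELATION_NAMES):
    """Return a dict mapping each correlation name to {"calculate": False}."""
    return {name: {"calculate": False} for name in names}


if __name__ == "__main__":
    # Print the generated config so it can be compared with the literal dict.
    for key, value in correlations_off().items():
        print(key, value)
```

Assuming the 3.6+ configuration format described above, the result could then be passed as df.profile_report(title="Report without correlations", correlations=correlations_off()).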

Answered By: petezurich