Speeding up pandas profiling analysis using check_correlation?
Question:
I am using pandas-profiling to generate a report. The dataset is very large, so to speed up processing I am trying to turn off correlations. I used check_correlation, which I saw in another post:
a = prof.ProfileReport(df, title='Downloads', check_correlation=False)
This raises the following error:
ValueError: Config parameter "check_correlation" does not exist.
Answers:
Please see this issue in the pandas-profiling project.
Since the configuration options changed in version 2, you could use:
import pandas_profiling
profile = df.profile_report(check_correlation_pearson=False,
                            correlations={'pearson': False,
                                          'spearman': False,
                                          'kendall': False,
                                          'phi_k': False,
                                          'cramers': False,
                                          'recoded': False})
This turns off correlations. However, it is still not as fast as version 1.4. You could also investigate other configurations here.
This way didn't work for me, so I used:
a = prof.ProfileReport(df, title='Downloads', minimal=True)
As of version 3.6+ you can do this:
profile = df.profile_report(
    title="Report without correlations",
    correlations={
        "auto": {"calculate": False},
        "pearson": {"calculate": False},
        "spearman": {"calculate": False},
        "kendall": {"calculate": False},
        "phi_k": {"calculate": False},
        "cramers": {"calculate": False},
    },
)
# or using a shorthand that is available for correlations
profile = df.profile_report(
    title="Report without correlations",
    correlations=None,
)
See also the docs here.
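If you would rather not type out the 3.6+ dictionary by hand, it can be built with a small helper. This is only a sketch: disabled_correlations and CORRELATION_NAMES are hypothetical names of my own, not part of pandas-profiling, and the method names are simply the ones listed in the 3.6+ answer. The list may differ in other versions.

```python
# Hypothetical helper (not part of pandas-profiling): builds the
# `correlations` argument in the shape used by the 3.6+ answer.
CORRELATION_NAMES = ("auto", "pearson", "spearman", "kendall", "phi_k", "cramers")


def disabled_correlations(keep=()):
    """Return a config dict that disables every correlation not named in `keep`."""
    return {name: {"calculate": name in keep} for name in CORRELATION_NAMES}


config = disabled_correlations()

# Intended usage, assuming pandas-profiling 3.6+ and an existing DataFrame `df`:
# profile = df.profile_report(title="Report without correlations", correlations=config)
```

Passing keep=("spearman",) would leave only Spearman enabled, which can be a middle ground when one correlation is still wanted in the report.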