# How do I make a correlation matrix for each subset of a column of my pandas dataframe?

## Question:

Here’s the head of my dataframe:

``````
``````

There are 100 different loggers and 10 different years. I want to subset the table by logger and compute the Pearson correlation of year with each of avg_max_temp, avg_min_temp, and tot_precipitation for each logger. Because there are 100 loggers, I’d expect the resulting dataframe to have 100 rows, with 3 output columns plus a column for the logger ID.

Here’s how I would do this analysis for all the data combined:

``````
# Create a new dataframe with the correlation values
corr_df = pd.DataFrame(df.corr(method='pearson'))

# Drop the columns and rows that are not needed
corr_df.drop(['year', 'yield'], axis=1, inplace=True)
corr_df.drop(['avg_max_temp', 'avg_min_temp', 'tot_precipitation', 'yield'], axis=0, inplace=True)
# Print the dataframe
print(corr_df)
``````

However, I can’t figure out how to do this for each of the 100 dataloggers. Any help would be hugely appreciated. Thanks in advance!

## Answer:

You can loop over a `groupby` object to visit the rows of `df` for each unique `logger`, compute the Pearson correlation coefficients for each group, and concatenate the results into your final `corr_df` DataFrame.

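``````
import pandas as pd

frames = []  # one small correlation dataframe per logger

for group, df_group in df.groupby('logger'):
    # Create a new dataframe with the correlation values for this logger
    group_corr_df = pd.DataFrame(df_group.corr(method='pearson'))

    # Drop the columns and rows that are not needed
    group_corr_df.drop(['year', 'yield'], axis=1, inplace=True)
    group_corr_df.drop(['avg_max_temp', 'avg_min_temp', 'tot_precipitation', 'yield'], axis=0, inplace=True)
    group_corr_df['logger'] = group  # label the rows with the logger ID
    frames.append(group_corr_df)

# Concatenate once at the end instead of growing corr_df inside the loop
corr_df = pd.concat(frames)
``````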
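
As a possible alternative to the explicit loop, the same per-logger correlations can be sketched with `groupby(...).apply` and `DataFrame.corrwith`, which correlates each selected column with `year` directly and so avoids building and trimming a full correlation matrix. The data below is synthetic and only mimics the column names from the question (3 loggers instead of 100); `value_cols`, `corr_with_year`, and `corr_by_logger` are names made up for this illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data; all values are made up.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "logger": np.repeat([1, 2, 3], 10),            # 3 loggers (the question has 100)
    "year": np.tile(np.arange(2010, 2020), 3),     # 10 years per logger
    "avg_max_temp": rng.normal(25, 3, size=30),
    "avg_min_temp": rng.normal(10, 3, size=30),
    "tot_precipitation": rng.normal(800, 100, size=30),
    "yield": rng.normal(5, 1, size=30),
})

value_cols = ["avg_max_temp", "avg_min_temp", "tot_precipitation"]


def corr_with_year(group: pd.DataFrame) -> pd.Series:
    """Pearson correlation of each value column with 'year' within one group."""
    return group[value_cols].corrwith(group["year"])


# Select only the needed columns before apply, so the grouping column
# is not passed into the function.
corr_by_logger = (
    df.groupby("logger")[["year"] + value_cols]
      .apply(corr_with_year)
      .reset_index()  # turn the logger index into a regular column
)

print(corr_by_logger)
```

With this approach the result has one row per logger and one column per variable, plus the `logger` column, which matches the shape described in the question.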