Apply calculation for dataframe columns for multiple dataframes at the same time

Question:

I am creating a separate dataframe for each unique value in a column. This works properly:

regions = dataDF['region'].unique().tolist()
df_dict = {name: dataDF.loc[dataDF['region'] == name] for name in regions}

However, I would now like to compute, for every newly created dataframe, the average temperature per row (the mean of tmax and tmin) and then the yearly mean of that average.

for df in df_dict:
    df['avg'] = (df['tmax'] + df['tmin'])/2
    df = pd.DataFrame(df.groupby(df['date'].dt.year)['avg'].mean())

Thanks for the help in advance.

Asked By: Max


Answers:

A dictionary of DataFrames is not necessary; you can group by region and year directly and aggregate in one step:

out = (dataDF[['tmax', 'tmin']].mean(axis=1)
                               .groupby([dataDF['region'], dataDF['date'].dt.year])
                               .mean())

Or:

out = (dataDF.assign(avg = dataDF[['tmax', 'tmin']].mean(axis=1), 
                     y = dataDF['date'].dt.year)
             .groupby(['region', 'y'])['avg']
             .mean())
Answered By: jezrael
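
For readers who still want one result per region, here is a minimal sketch that runs the grouped aggregation on a small made-up dataset and then repeats the calculation with a dictionary. The sample values and the `yearly` dictionary are assumptions for illustration only; the column names follow the question. The loop in the question most likely fails because iterating over a dict yields its keys (the region names, which are strings), and rebinding `df` inside the loop does not store anything back into the dictionary. Iterating over `.items()` and writing each result into a new dict avoids both problems.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dataDF = pd.DataFrame({
    'region': ['north', 'north', 'north', 'south', 'south', 'south'],
    'date': pd.to_datetime(['2020-01-01', '2020-07-01', '2021-01-01',
                            '2020-01-01', '2021-01-01', '2021-07-01']),
    'tmax': [10.0, 30.0, 12.0, 20.0, 22.0, 36.0],
    'tmin': [0.0, 20.0, 2.0, 10.0, 12.0, 24.0],
})

# One grouped aggregation: row-wise mean of tmax/tmin,
# then the mean of that value per (region, year).
out = (dataDF[['tmax', 'tmin']].mean(axis=1)
       .groupby([dataDF['region'], dataDF['date'].dt.year])
       .mean())

# Dictionary variant: one sub-frame per region.
df_dict = {name: group for name, group in dataDF.groupby('region')}

# Iterate over (key, value) pairs and keep each result under its region name.
yearly = {}
for name, df in df_dict.items():
    avg = (df['tmax'] + df['tmin']) / 2
    yearly[name] = avg.groupby(df['date'].dt.year).mean().to_frame('avg')
```

Both approaches should give the same numbers: `out` is a single Series indexed by region and year, while `yearly['north']` is a small DataFrame indexed by year with one `avg` column. Computing `avg` as a separate Series, rather than assigning a new column to each sub-frame, also sidesteps pandas' warning about setting values on a slice of another DataFrame.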