Creating new columns from pandas.DataFrame.groupby(['arg1', 'arg2']) with mean values
Question:
I have data similar to the following in a pandas.DataFrame:
import pandas as pd

df = pd.DataFrame({
    'Year'  : [2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002],
    'Month' : ['Aug', 'Aug', 'Sep', 'Sep', 'Aug', 'Aug', 'Sep', 'Sep'],
    'Day'   : [1, 2, 1, 2, 1, 2, 1, 2],
    'Value' : [1, 2, 3, 4, 5, 6, 7, 8] })
Now I group by ‘Month’ and ‘Year’, and calculate the mean value:
print(df.groupby(['Month', 'Year'])['Value'].mean())
The output looks like:
Month  Year
Aug    2001    1.5
       2002    5.5
Sep    2001    3.5
       2002    7.5
Now I want to create a new DataFrame that looks like this:
Year  Aug  Sep
2001  1.5  3.5
2002  5.5  7.5
Are there any functions in the pandas module that could help me with this? Thanks in advance!
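For reference, the intermediate result printed in the question is a pandas Series, not a DataFrame, indexed by a two-level MultiIndex (Month, Year). The task is therefore to move one index level into the columns. A minimal sketch that inspects that intermediate result, using the question's own data:

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002],
    'Month': ['Aug', 'Aug', 'Sep', 'Sep', 'Aug', 'Aug', 'Sep', 'Sep'],
    'Day': [1, 2, 1, 2, 1, 2, 1, 2],
    'Value': [1, 2, 3, 4, 5, 6, 7, 8],
})

# Grouping by two keys and taking the mean of one column yields a Series.
means = df.groupby(['Month', 'Year'])['Value'].mean()

print(type(means).__name__)       # Series
print(list(means.index.names))    # ['Month', 'Year']
print(means.loc[('Sep', 2002)])   # 7.5, the mean of 7 and 8
```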
Answers:
You can do this with pivot_table:
import pandas as pd

# The string 'mean' selects pandas' built-in mean; same result as np.mean here
table = pd.pivot_table(df, values='Value', index=['Year'],
                       columns=['Month'], aggfunc='mean')
Regards,
Jehona.
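A self-contained version of the pivot_table answer, built on the question's data, shows where each piece of the target layout comes from: `index` supplies the rows, `columns` supplies the column headers, and `aggfunc` combines the `Value` entries that share a (Year, Month) pair.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002],
    'Month': ['Aug', 'Aug', 'Sep', 'Sep', 'Aug', 'Aug', 'Sep', 'Sep'],
    'Day': [1, 2, 1, 2, 1, 2, 1, 2],
    'Value': [1, 2, 3, 4, 5, 6, 7, 8],
})

# Rows from 'Year', columns from 'Month', each cell the mean of 'Value'.
# Passing values as a single string (not a list) keeps the columns one level deep.
table = pd.pivot_table(df, values='Value', index=['Year'],
                       columns=['Month'], aggfunc='mean')
print(table)
```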
OP is not far from the desired goal. Since one is already using pandas.DataFrame.groupby and pandas.Series.mean, all one has to do is chain pandas.Series.unstack, as follows:
df_new = df.groupby(['Year', 'Month'])['Value'].mean().unstack()
[Out]:
Month Aug Sep
Year
2001 1.5 3.5
2002 5.5 7.5
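The answer reorders the keys to ['Year', 'Month'] so that the default `unstack()` (which moves the innermost level) moves Month. If one prefers to keep the question's original ['Month', 'Year'] ordering, a sketch that names the level to move gives the same frame:

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002],
    'Month': ['Aug', 'Aug', 'Sep', 'Sep', 'Aug', 'Aug', 'Sep', 'Sep'],
    'Day': [1, 2, 1, 2, 1, 2, 1, 2],
    'Value': [1, 2, 3, 4, 5, 6, 7, 8],
})

# Question's grouping order: 'Month' is the outer index level.
means = df.groupby(['Month', 'Year'])['Value'].mean()

# Move the 'Month' level into the columns, leaving 'Year' as the row index.
df_new = means.unstack('Month')

# Same frame as the answer's version, which unstacks the last level by default.
alt = df.groupby(['Year', 'Month'])['Value'].mean().unstack()
pd.testing.assert_frame_equal(df_new, alt)

print(df_new)
```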