Pandas add grouping to each column
Question:
I have a dataset that has one line per user per week that records whether or not the user has registered along with values of certain metrics:
import pandas as pd

cols = ["Worker ID", "Registered", "Week Ending", "Metric Value"]
rows = [
    ['A', True, '2022-08-06', 2],
    ['B', False, '2022-08-06', 3],
    ['C', False, '2022-08-06', 4],
    ['A', True, '2022-08-13', 3],
    ['B', False, '2022-08-13', 2],
    ['C', True, '2022-08-13', 5]
]
df = pd.DataFrame(columns=cols, data=rows)
I need to group by Week Ending and Registered, and aggregate by the number of unique worker IDs and the sum of the metrics. Normally this would look like this:
df.groupby(["Week Ending", "Registered"]).agg({'Worker ID': pd.Series.nunique, "Metric Value": sum})
This provides groupings as rows. I would like to have the registration group be along columns, something like:
Week Ending Date | Worker ID Reg True | Worker ID Reg False | Metric Value Reg True | Metric Value Reg False |
---|---|---|---|---|
2022-08-06 | 1 | 2 | 2 | 7 |
2022-08-13 | 2 | 1 | 8 | 2 |
Any ideas on how to do this?
Answers:
Use unstack, which moves the innermost index level of the grouped result (here Registered) into the columns:
df.groupby(["Week Ending", "Registered"]).agg({'Worker ID': pd.Series.nunique, "Metric Value": sum}).unstack()
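The result of unstack has two-level columns: the metric name on top and the Registered value below it. If single-level headers like the ones in the question are wanted, the levels can be joined afterwards. The following is only a sketch under a few assumptions: the name `wide` and the exact header format are illustrative, and the string aliases "nunique" and "sum" are used in place of pd.Series.nunique and the builtin sum.

```python
import pandas as pd

cols = ["Worker ID", "Registered", "Week Ending", "Metric Value"]
rows = [
    ["A", True, "2022-08-06", 2],
    ["B", False, "2022-08-06", 3],
    ["C", False, "2022-08-06", 4],
    ["A", True, "2022-08-13", 3],
    ["B", False, "2022-08-13", 2],
    ["C", True, "2022-08-13", 5],
]
df = pd.DataFrame(columns=cols, data=rows)

# Group, aggregate, then pivot the Registered level into the columns.
wide = (
    df.groupby(["Week Ending", "Registered"])
    .agg({"Worker ID": "nunique", "Metric Value": "sum"})
    .unstack("Registered")
)

# Columns are now tuples such as ("Worker ID", True); join them into one label.
wide.columns = [f"{metric} Reg {reg}" for metric, reg in wide.columns]

# Put the columns in the order shown in the question (True before False),
# then turn the "Week Ending" index back into a regular column.
wide = wide[
    [
        "Worker ID Reg True",
        "Worker ID Reg False",
        "Metric Value Reg True",
        "Metric Value Reg False",
    ]
].reset_index()

print(wide)
```

If some week had no rows for one of the Registered values, unstack would leave NaN in that cell; passing fill_value=0 to unstack is one way to handle that.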