Resample data to add missing hour values
Question:
I'm working with a DataFrame that looks like this:
                           trans_id  amount  month  day  hour
2018-08-18 12:59:59+00:00         1      46      8   18    12
2018-08-26 01:56:55+00:00         2      20      8   26     1
I intend to get the average ‘amount’ at each hour. I use the following code to do that:
df2 = df.groupby(['month', 'day', 'day_name', 'hour'], as_index = False)['amount'].sum()
That gives me the total amount for each (month, day, day_name, hour) combination, which is fine. But when I count the hours for each day, they do not all come to 24 as expected. I imagine this is because no transactions exist for some (month, day, day_name, hour) combinations.
My question is: how do I get all 24 hours for each day, regardless of whether they have records or not?
Thanks
Answers:
Use Series.unstack with DataFrame.stack:
df2 = (df.groupby(['month', 'day', 'day_name', 'hour'])['amount']
         .sum()
         .unstack(fill_value=0)
         .stack()
         .reset_index())
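Note that unstack can only create columns for hours that occur at least once somewhere in the data, so an hour that never appears on any day would still be missing. A minimal sketch of one way to force all 24 hours, assuming the month, day, day_name and hour columns are derived from the DatetimeIndex; the sample rows and the name `hourly` are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data shaped like the question's frame.
idx = pd.to_datetime([
    "2018-08-18 12:59:59+00:00",
    "2018-08-18 12:10:00+00:00",
    "2018-08-26 01:56:55+00:00",
])
df = pd.DataFrame({"trans_id": [1, 2, 3], "amount": [46, 4, 20]}, index=idx)
df["month"] = df.index.month
df["day"] = df.index.day
df["day_name"] = df.index.day_name()
df["hour"] = df.index.hour

# Every hour of the day, named so the level is still called "hour" after stacking.
all_hours = pd.Index(range(24), name="hour")

hourly = (
    df.groupby(["month", "day", "day_name", "hour"])["amount"]
      .sum()
      .unstack(fill_value=0)                      # hours become columns
      .reindex(columns=all_hours, fill_value=0)   # add hours never seen in the data
      .stack()                                    # back to one row per day and hour
      .reset_index(name="amount")
)

# Each (month, day) pair now has exactly 24 rows.
print(hourly.groupby(["month", "day"]).size())
```

Days with no transactions at all are still absent, because they never appear in the groupby result.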
I hope I'm not wrong, but you can try this:
df2 = df[['amount']].resample('h').sum()
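This needs a DatetimeIndex, which your data already has. It resamples the data into consecutive one-hour bins, running from the first timestamp to the last, and sums the values in each bin. Hours with no transactions are still included, with a sum of 0 rather than NaN. The first and last days are only covered from the first to the last transaction, so they can have fewer than 24 rows.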
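A minimal sketch of the resample approach, assuming a tz-aware DatetimeIndex like the one in the question; the sample rows and the names `hourly`, `full_range` and `hourly_full` are made up for illustration. The last step pads the result so the first and last days also cover all 24 hours:

```python
import pandas as pd

# Hypothetical sample data with a gap at 13:00.
idx = pd.to_datetime(["2018-08-18 12:59:59+00:00", "2018-08-18 14:05:00+00:00"])
df = pd.DataFrame({"amount": [46, 20]}, index=idx)

# One row per hour between the first and last timestamp; empty hours sum to 0.
hourly = df["amount"].resample("h").sum()

# Extend to whole days: from midnight of the first day to 23:00 of the last day.
full_range = pd.date_range(
    start=hourly.index.min().normalize(),
    end=hourly.index.max().normalize() + pd.Timedelta(hours=23),
    freq="h",
)
hourly_full = hourly.reindex(full_range, fill_value=0)

print(hourly)
print(len(hourly_full))
```

If month, day and hour columns are still needed, they can be recomputed from the resulting index (for example `hourly_full.index.hour`).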
Late but hope it helps.