Odd behavior in Pandas resample
Question:
I have a problem using pandas.DataFrame.resample with a DatetimeIndex.
My df is:
Column1
2022-08-03 08:48:34 9217.02
2022-08-17 17:14:39 6229.27
2022-08-31 17:17:00 6229.27
2022-09-14 18:12:14 5939.54
2022-09-30 17:51:48 6229.27
2022-10-14 15:26:14 5939.54
2022-10-31 16:29:14 5939.54
2022-11-15 18:10:27 5939.54
2022-11-30 18:10:23 5939.54
2022-12-19 10:53:21 5939.54
2022-12-20 16:26:08 2440.98
2022-12-30 18:30:25 6302.54
2023-01-13 19:24:22 6262.74
2023-01-31 16:51:44 6262.74
The desired output is a sum over two bins per month: one from the 1st through the 15th (including all hours of the 15th), and one from the 16th through the last day of the month, something like this:
Column1
2022-08-15 9217.02
2022-08-31 12458.54
2022-09-15 5939.54
2022-09-30 6229.27
2022-10-15 5939.54
2022-10-31 5939.54
2022-11-15 5939.54
2022-11-30 5939.54
2022-12-15 0.0
2022-12-31 14683.06
2023-01-15 6262.74
2023-01-31 6262.74
The output that I get:
df.resample('SM').sum()
Column1
2022-07-31 9217.02
2022-08-15 6229.27
2022-08-31 12168.81
2022-09-15 0.00
2022-09-30 12168.81
2022-10-15 0.00
2022-10-31 5939.54
2022-11-15 5939.54
2022-11-30 5939.54
2022-12-15 14683.06
2022-12-31 6262.74
2023-01-15 0.00
2023-01-31 6262.74
It looks like the time part is interfering with the resample function, so sum() is applied to the wrong bins.
I also tried df.resample('SM', label='right') to fix the first bin ("2022-08-15"), but that gave me an extra bin at the end instead ("2023-02-15").
Is there something I'm doing wrong?
Answers:
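Normalize the time component of the datetime index (set every timestamp to midnight), then resample with the arguments label='right' and closed='right'. The bin edges fall at midnight, so a timestamp later in the day on the 15th or on the last day of the month would otherwise land in the next bin.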
df.index = df.index.normalize()
df.resample('SM', label='right', closed='right').sum()
Result
Column1
2022-08-15 9217.02
2022-08-31 12458.54
2022-09-15 5939.54
2022-09-30 6229.27
2022-10-15 5939.54
2022-10-31 5939.54
2022-11-15 5939.54
2022-11-30 5939.54
2022-12-15 0.00
2022-12-31 14683.06
2023-01-15 6262.74
2023-01-31 6262.74
2023-02-15 0.00
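The result above still carries a trailing empty bin ("2023-02-15") that is not in the desired output. Below is a minimal, self-contained sketch of the same approach that also trims that bin. It rebuilds the sample data from the question; the names rows, daily, rule and last_label are only illustrative. It passes a pd.offsets.SemiMonthEnd() object instead of the 'SM' string, on the assumption that the two are equivalent here, since newer pandas releases appear to deprecate the 'SM' alias in favor of 'SME'.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("2022-08-03 08:48:34", 9217.02),
    ("2022-08-17 17:14:39", 6229.27),
    ("2022-08-31 17:17:00", 6229.27),
    ("2022-09-14 18:12:14", 5939.54),
    ("2022-09-30 17:51:48", 6229.27),
    ("2022-10-14 15:26:14", 5939.54),
    ("2022-10-31 16:29:14", 5939.54),
    ("2022-11-15 18:10:27", 5939.54),
    ("2022-11-30 18:10:23", 5939.54),
    ("2022-12-19 10:53:21", 5939.54),
    ("2022-12-20 16:26:08", 2440.98),
    ("2022-12-30 18:30:25", 6302.54),
    ("2023-01-13 19:24:22", 6262.74),
    ("2023-01-31 16:51:44", 6262.74),
]
df = pd.DataFrame(
    {"Column1": [value for _, value in rows]},
    index=pd.to_datetime([stamp for stamp, _ in rows]),
)

# Strip the time of day on a copy so the original timestamps stay intact.
daily = df.copy()
daily.index = daily.index.normalize()

# Semi-month-end offset (anchors on the 15th and on the last day of the month).
rule = pd.offsets.SemiMonthEnd()
result = daily.resample(rule, label="right", closed="right").sum()

# Keep bins up to the one that contains the last date in the data,
# which drops a trailing empty bin if resample produced one.
last_label = rule.rollforward(daily.index.max())
result = result.loc[:last_label]

print(result)
```

With the data from the question this should end at the 2023-01-31 bin, in line with the desired output. Working on a copy keeps the original timestamps available in df in case they are needed later.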