Odd behavior in Pandas resample

Question:

I have a problem using pandas.DataFrame.resample with a DatetimeIndex.
My DataFrame is:

                        Column1
 2022-08-03 08:48:34    9217.02
 2022-08-17 17:14:39    6229.27
 2022-08-31 17:17:00    6229.27
 2022-09-14 18:12:14    5939.54
 2022-09-30 17:51:48    6229.27
 2022-10-14 15:26:14    5939.54
 2022-10-31 16:29:14    5939.54
 2022-11-15 18:10:27    5939.54
 2022-11-30 18:10:23    5939.54
 2022-12-19 10:53:21    5939.54
 2022-12-20 16:26:08    2440.98
 2022-12-30 18:30:25    6302.54
 2023-01-13 19:24:22    6262.74
 2023-01-31 16:51:44    6262.74

The desired output is a sum over two bins per month: one from the 1st through the 15th (including every hour of the 15th), and one from the 16th through the last day of the month, something like this:

                        Column1
          2022-08-15     9217.02
          2022-08-31    12458.54
          2022-09-15     5939.54
          2022-09-30     6229.27
          2022-10-15     5939.54
          2022-10-31     5939.54
          2022-11-15     5939.54
          2022-11-30     5939.54
          2022-12-15        0.0   
          2022-12-31    14683.06 
          2023-01-15     6262.74
          2023-01-31     6262.74

The output that I get:

         df.resample('SM').sum()

                        Column1  
          2022-07-31     9217.02
          2022-08-15     6229.27
          2022-08-31    12168.81
          2022-09-15        0.00
          2022-09-30    12168.81
          2022-10-15        0.00
          2022-10-31     5939.54
          2022-11-15     5939.54
          2022-11-30     5939.54
          2022-12-15    14683.06
          2022-12-31     6262.74
          2023-01-15        0.00
          2023-01-31     6262.74

It looks like the time component is interfering with resample, so sum() is applied to the wrong bins.
I also tried df.resample('SM', label='right') to fix the first bin ("2022-08-15"), but that gave me an extra bin at the end instead: "2023-02-15".

Is there something I’m doing wrong?

Asked By: iXrst


Answers:

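Normalize the DatetimeIndex (this sets the time component of every timestamp to midnight), then resample with the arguments label='right' and closed='right'. Without those arguments the bins are closed and labelled on the left, which is why your first row shows up under 2022-07-31. With the time component left in place, a row such as 2022-08-31 17:17:00 falls after the midnight bin edge and lands in the next bin.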

df.index = df.index.normalize()
df.resample('SM', label='right', closed='right').sum()

Result

             Column1
2022-08-15   9217.02
2022-08-31  12458.54
2022-09-15   5939.54
2022-09-30   6229.27
2022-10-15   5939.54
2022-10-31   5939.54
2022-11-15   5939.54
2022-11-30   5939.54
2022-12-15      0.00
2022-12-31  14683.06
2023-01-15   6262.74
2023-01-31   6262.74
2023-02-15      0.00
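
As a side note on why the normalize step matters, here is a minimal check using plain Timestamp comparisons. It assumes the semi-month bin edges sit at midnight, which is what the outputs in the question suggest; the helper name in_right_closed_bin is made up for illustration and is not a pandas function.

```python
import pandas as pd

# One row from the question and the semi-month bin it is expected to fall into.
ts = pd.Timestamp("2022-08-31 17:17:00")
left_edge = pd.Timestamp("2022-08-15")   # assumed bin edge at midnight
right_edge = pd.Timestamp("2022-08-31")  # assumed bin edge at midnight


def in_right_closed_bin(t):
    # closed='right' corresponds to the interval (left_edge, right_edge]
    return left_edge < t <= right_edge


raw_inside = in_right_closed_bin(ts)                     # 17:17 is past the edge
normalized_inside = in_right_closed_bin(ts.normalize())  # midnight equals the edge
print(raw_inside, normalized_inside)  # False True
```

With the time of day kept, the row is later than the right edge and spills into the following bin; after normalize() it sits exactly on the edge and is included.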
Answered By: Shubham Sharma
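
If the trailing empty 2023-02-15 bin is unwanted, one possible alternative (a sketch, not part of the answer above) is to skip resample and label each row by hand, then group on that label. The helper half_month_ends and the variable names are hypothetical; the data is copied from the question.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame(
    {"Column1": [9217.02, 6229.27, 6229.27, 5939.54, 6229.27, 5939.54, 5939.54,
                 5939.54, 5939.54, 5939.54, 2440.98, 6302.54, 6262.74, 6262.74]},
    index=pd.to_datetime([
        "2022-08-03 08:48:34", "2022-08-17 17:14:39", "2022-08-31 17:17:00",
        "2022-09-14 18:12:14", "2022-09-30 17:51:48", "2022-10-14 15:26:14",
        "2022-10-31 16:29:14", "2022-11-15 18:10:27", "2022-11-30 18:10:23",
        "2022-12-19 10:53:21", "2022-12-20 16:26:08", "2022-12-30 18:30:25",
        "2023-01-13 19:24:22", "2023-01-31 16:51:44",
    ]),
)


def half_month_ends(month_starts):
    """Return (15th, last day) for each month-start timestamp (hypothetical helper)."""
    return month_starts + pd.Timedelta(days=14), month_starts + pd.offsets.MonthEnd(0)


# Label every row with the end of its half-month; the time of day is ignored.
mid, end = half_month_ends(df.index.to_period("M").to_timestamp())
labels = mid.where(df.index.day <= 15, end)
sums = df["Column1"].groupby(labels).sum()

# Build every half-month label in the covered range so empty bins appear as 0.
months = pd.period_range(labels.min(), labels.max(), freq="M").to_timestamp()
all_mid, all_end = half_month_ends(months)
all_labels = all_mid.union(all_end)
all_labels = all_labels[(all_labels >= labels.min()) & (all_labels <= labels.max())]
result = sums.reindex(all_labels, fill_value=0.0).to_frame()
print(result)
```

On the sample data this gives 12 rows, from 2022-08-15 to 2023-01-31, with 0.0 for 2022-12-15. It does not modify df.index and does not depend on the 'SM' frequency alias, at the cost of a few more lines than the resample call.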