DataFrame downsampling

Question:

My DataFrame has too much resolution in column H. I want to downsample that column and sum the values of the other columns within each downsampled bin.

My column H is a float and does not represent time. The other columns are counters of events. So when downsampling H, the values from other columns must be added.

> import pandas as pd
> df = pd.DataFrame(
      data=[
          [1.0, 4, 2],
          [1.5, 3, 2],
          [2.0, 3, 3],
          [2.5, 2, 5]
      ],
      columns=['H', 'A', 'B']
  )
> df
         H   A  B
     0  1.0  4  2
     1  1.5  3  2
     2  2.0  3  3
     3  2.5  2  5

I’d like column H to have an interval of 1.0 rather than 0.5, adding the values of the other columns:

         H   A  B
     0  1.0  7  4
     1  2.0  5  8

I can do this by grouping on the truncated value of H:

> def downsample(x):
      return int(x)

> df2 = df.groupby(df.H.apply(downsample)).sum()
> df2
     H  A  B
H
1  2.5  7  4
2  4.5  5  8

But then I’m left with a meaningless summed H column and H as the index, both of which must be cleaned up:

> del df2['H']
> df2.reset_index(inplace=True)
> df2
    H   A   B
0   1   7   4
1   2   5   8

Is there an easier way to do this?

Asked By: Raf


Answers:

You can drop the column before groupby:

df.drop(columns=['H']).groupby(df['H']//1).sum()

Output:

     A  B
H        
1.0  7  4
2.0  5  8
Answered By: Quang Hoang
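
The floor-division idea above can probably be extended to a bin width other than 1.0. The following is a minimal sketch, not taken from the answer itself: the name `width` is a hypothetical parameter introduced here, and `reset_index()` is added only to turn H back into an ordinary column, as in the desired output from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[[1.0, 4, 2], [1.5, 3, 2], [2.0, 3, 3], [2.5, 2, 5]],
    columns=['H', 'A', 'B'],
)

# Hypothetical bin width (assumption, not from the original post).
width = 1.0

# Map each H value to the lower edge of its bin, e.g. 1.5 -> 1.0.
bins = np.floor(df['H'] / width) * width

# Drop the raw H column so it is not summed, group by the bin edges,
# then move the bin edges from the index back into a column named H.
out = df.drop(columns=['H']).groupby(bins).sum().reset_index()
print(out)
```

With `width = 1.0` this should reproduce the grouping from the answer. Note that exact floating-point bin edges can be fragile for widths such as 0.1, so rounding the bin labels may be advisable in that case.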

Maybe this is what you are looking for:

df.set_index(df.H.apply(lambda x:int(x)))[['A', 'B']].groupby('H').sum()

Result:

   A  B
H      
1  7  4
2  5  8
Answered By: René
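
If H should end up as a regular integer column with a flat index, as in the question's desired output, one possible variant of the answers above is to select the counter columns after grouping and then reset the index. This is a sketch under the assumption that truncating H to an integer (as the question's `downsample` function does) is the intended binning.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1.0, 4, 2], [1.5, 3, 2], [2.0, 3, 3], [2.5, 2, 5]],
    columns=['H', 'A', 'B'],
)

# Group by H truncated to an integer, sum only the counter columns,
# then turn the group keys back into a column named H.
df2 = df.groupby(df['H'].astype(int))[['A', 'B']].sum().reset_index()
print(df2)
```

Selecting `[['A', 'B']]` after `groupby` keeps the raw H values out of the sum, so no column has to be deleted afterwards.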