DataFrame downsampling
Question:
In a DataFrame with too much resolution in column H, the goal is to downsample that column and sum the values of the other columns.
My column H is a float and does not represent time. The other columns are event counters, so when H is downsampled, the values in the other columns must be added together.
> import pandas as pd
> df = pd.DataFrame(
      data=[
          [1.0, 4, 2],
          [1.5, 3, 2],
          [2.0, 3, 3],
          [2.5, 2, 5]
      ],
      columns=['H', 'A', 'B']
  )
> df
     H  A  B
0  1.0  4  2
1  1.5  3  2
2  2.0  3  3
3  2.5  2  5
I’d like column H to have an interval of 1.0 rather than 0.5, adding up the values of the other columns:
     H  A  B
0  1.0  7  4
1  2.0  5  8
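To make the intended aggregation explicit: each output row sums the counters of the input rows whose H falls in a half-open interval [k, k + 1). A small check of the expected numbers, assuming the bins are closed on the left (which is what int() in the code below produces for positive H):

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1.0, 4, 2], [1.5, 3, 2], [2.0, 3, 3], [2.5, 2, 5]],
    columns=['H', 'A', 'B'],
)

# Rows whose H lies in [1.0, 2.0) collapse into the H = 1.0 output row,
# rows whose H lies in [2.0, 3.0) collapse into the H = 2.0 output row.
low = df[(df['H'] >= 1.0) & (df['H'] < 2.0)][['A', 'B']].sum()
high = df[(df['H'] >= 2.0) & (df['H'] < 3.0)][['A', 'B']].sum()

print(low.tolist())   # [7, 4]  -> A: 4 + 3, B: 2 + 2
print(high.tolist())  # [5, 8]  -> A: 3 + 2, B: 3 + 5
```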
I can do this with:
> def downsample(x):
      return int(x)
> df2 = df.groupby(df.H.apply(downsample)).sum()
> df2
     H  A  B
H           
1  2.5  7  4
2  4.5  5  8
But then I’m left with a meaningless summed H column (2.5 and 4.5), and the bin values end up in the index, so the result must be cleaned up:
> del df2['H']
> df2.reset_index(inplace=True)
> df2
   H  A  B
0  1  7  4
1  2  5  8
Is there an easier way to do this?
Answers:
You can drop column H before grouping:
df.drop(columns=['H']).groupby(df['H']//1).sum()
Output:
     A  B
H        
1.0  7  4
2.0  5  8
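The answer above leaves the bin values in the index. A minimal sketch that also moves them back into an H column with reset_index() and generalises the idea to an arbitrary bin width; the helper name downsample_sum and its width parameter are hypothetical, not part of pandas or the original answer:

```python
import pandas as pd


def downsample_sum(frame, col, width):
    """Hypothetical helper: sum all other columns over bins of `width` along `col`.

    Each row goes to the bin whose lower edge is floor(value / width) * width.
    """
    # Lower bin edge for every row; the Series keeps the name of `col`.
    edges = (frame[col] // width) * width
    return (
        frame.drop(columns=[col])  # keep only the counter columns
        .groupby(edges)            # group rows by their bin's lower edge
        .sum()                     # add up the counters inside each bin
        .reset_index()             # move the bin edges from the index back into a column
    )


df = pd.DataFrame(
    data=[[1.0, 4, 2], [1.5, 3, 2], [2.0, 3, 3], [2.5, 2, 5]],
    columns=['H', 'A', 'B'],
)
print(downsample_sum(df, 'H', 1.0))
```

With width=1.0 this gives the layout asked for in the question, with H as 1.0 and 2.0. For widths that are not exactly representable in binary floating point, such as 0.1, the computed edges may carry small rounding errors, so rounding them afterwards may be needed.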
Maybe this is what you are looking for:
df.set_index(df.H.apply(lambda x:int(x)))[['A', 'B']].groupby('H').sum()
Result:
   A  B
H      
1  7  4
2  5  8
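Another option, sketched here as an alternative rather than either answerer's method: the summed H column in the question appears because the grouping key is a separate Series derived from H, so pandas still treats column H as a value column. Overwriting H with its binned value first and then grouping by the column name avoids that, and as_index=False keeps H as an ordinary column. Floor division is used instead of int() because the two agree for positive H, but int() truncates negative values toward zero (both -0.5 and 0.5 would land in bin 0):

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1.0, 4, 2], [1.5, 3, 2], [2.0, 3, 3], [2.5, 2, 5]],
    columns=['H', 'A', 'B'],
)

# Overwrite H with the lower edge of its 1.0-wide bin. Floor division (// 1)
# rounds negative values down, whereas int() would truncate them toward zero.
binned = df.assign(H=df['H'] // 1)

# Grouping by the column *name* excludes H from the summed columns, and
# as_index=False keeps the bin values as an ordinary column instead of the index.
df2 = binned.groupby('H', as_index=False).sum()
print(df2)
```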