Resampling timeseries dataframe with multi-index

Question

Generate data:

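import pandas as pd
import numpy as np

# FREQ (minutes) was not defined in the original post; 10 is an assumed value
FREQ = 10

# User 1: 288 samples starting at 2020-10-01
df = pd.DataFrame(index=pd.date_range(freq=f'{FREQ}min', start='2020-10-01', periods=12 * 24))
df['col1'] = np.random.normal(size=df.shape[0])
# randint replaces the deprecated random_integers; its upper bound is exclusive
df['col2'] = np.random.randint(1, 101, size=df.shape[0])
df['uid'] = 1

# User 2: same timestamps, smaller value range for col2
df2 = pd.DataFrame(index=pd.date_range(freq=f'{FREQ}min', start='2020-10-01', periods=12 * 24))
df2['col1'] = np.random.normal(size=df2.shape[0])
df2['col2'] = np.random.randint(1, 51, size=df2.shape[0])
df2['uid'] = 2

# Stack both users and index the result by (timestamp, uid)
df3 = pd.concat([df, df2]).reset_index()
df3 = df3.set_index(['index', 'uid'])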

I am trying to resample the data to 30-minute intervals and specify how each column is aggregated, separately for each uid. I have many columns, and for each one I need to choose whether I want the mean, median, std, max, or min. Because the users share the same timestamps, the operation has to be done per user, which is why I set a MultiIndex and tried the following:

df3.groupby(pd.Grouper(freq='30Min',closed='right',label='right')).agg({
    "col1":  "max", "col2": "min", 'uid':'max'})

but I get the following error:

ValueError: MultiIndex has no single backing array. Use 'MultiIndex.to_numpy()' to get a NumPy array of tuples.

How can I do this operation?

Asked By: prof32

Answer 1

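When the index is a MultiIndex, pd.Grouper has to be told which level holds the datetimes, so pass level='index'. Add 'uid' as a second grouping key so that each user is aggregated separately: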

out = (df3.groupby([pd.Grouper(level='index', freq='30min', closed='right', label='right'), 'uid'])
          .agg({"col1":  "max", "col2": "min"}))
print(out)

# Output
                             col1  col2
index               uid                
2020-10-01 00:00:00 1   -0.222489    77
                    2   -1.490019    22
2020-10-01 00:30:00 1    1.556801    16
                    2    0.580076     1
2020-10-01 01:00:00 1    0.745477    12
...                           ...   ...
2020-10-02 23:00:00 2    0.272276    13
2020-10-02 23:30:00 1    0.378779    20
                    2    0.786048     5
2020-10-03 00:00:00 1    1.716791    20
                    2    1.438454     5

[194 rows x 2 columns]
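
Since the question mentions wanting a different statistic (mean, median, std, max, min) for each of many columns, the same grouping can be combined with pandas named aggregation. The following is only a minimal sketch under assumptions: the toy data, the seed, and the output column names (col1_mean, col2_median, and so on) are made up for illustration and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two users, 10-minute readings over three hours.
rng = np.random.default_rng(0)
idx = pd.date_range("2020-10-01", periods=18, freq="10min")
frames = []
for uid in (1, 2):
    part = pd.DataFrame(
        {
            "col1": rng.normal(size=len(idx)),
            "col2": rng.integers(1, 101, size=len(idx)),  # upper bound exclusive
        },
        index=idx,
    )
    part["uid"] = uid
    frames.append(part)

# Build the (index, uid) MultiIndex used in the question.
df3 = pd.concat(frames).rename_axis("index").set_index("uid", append=True)

# Group on the datetime level in 30-minute bins and on the uid level.
keys = [
    pd.Grouper(level="index", freq="30min", closed="right", label="right"),
    "uid",
]

# Named aggregation: output_column=(input_column, function).
out = df3.groupby(keys).agg(
    col1_mean=("col1", "mean"),
    col1_std=("col1", "std"),
    col2_min=("col2", "min"),
    col2_median=("col2", "median"),
)
print(out.head())
```

Each keyword in agg names one output column, so several statistics can be requested for the same input column without ending up with a two-level column header. Bins that contain a single observation give NaN for the std columns.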

Answered By: Corralien
