How to resample starting from the first element in pandas?

Question:

I am resampling the following table/data:

Timestamp  L_x   L_y    L_a     R_x     R_y     R_a
2403950   621.3 461.3   313     623.3   461.8   260
2403954   622.5 461.3   312     623.3   462.6   260
2403958   623.1 461.5   311     623.4   464     261
2403962   623.6 461.7   310     623.7   465.4   261
2403966   623.8 461.5   309     623.9   466.1   261
2403970   620.9 461.4   309     623.8   465.9   259
2403974   621.7 461.1   308     623     464.8   258
2403978   622.1 461.1   308     621.9   463.9   256
2403982   622.5 461.5   308     621     463.4   255
2403986   622.4 462.1   307     620.7   463.3   254

The table goes on and on like that.
The timestamps are in milliseconds. I did the following to resample the data into 100-millisecond bins:

  1. I changed the timestamp index into a datetime format

    df.index = pd.to_datetime((df.index.values*1e6).astype(int))

  2. I resampled it into 100-millisecond bins:

    df = df.resample('100L')

The resulting resampled data look like the following:

Timestamp  L_x   L_y    L_a     R_x     R_y     R_a
2403900   621.3 461.3   313     623.3   461.8   260
2404000   622.5 461.3   312     623.3   462.6   260
2404100   623.1 461.5   311     623.4   464     261
2404200   623.6 461.7   310     623.7   465.4   261
2404300   623.8 461.5   309     623.9   466.1   261

As we can see, the first bin starts at 2403900, which is 50 milliseconds before the first timestamp in the original table. I want the bins to start from the first timestamp of the original table instead, which is 2403950, like the following:

Timestamp  L_x   L_y    L_a     R_x     R_y     R_a
2403950   621.3 461.3   313     623.3   461.8   260
2404050   622.5 461.3   312     623.3   462.6   260
2404150   623.1 461.5   311     623.4   464     261
2404250   623.6 461.7   310     623.7   465.4   261
2404350   623.8 461.5   309     623.9   466.1   261
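
The behaviour described in the question can be reproduced with a small sketch. It assumes a recent pandas, where the millisecond alias is spelled "ms" and resample needs an explicit aggregation such as first(). The data is a synthetic stand-in for the table (one row every 4 ms starting at 2403950, with made-up L_x values), not the real measurements.

```python
import pandas as pd

# Synthetic stand-in for the question's table: one row every 4 ms,
# starting at 2403950 ms. The L_x values are made up.
ts_ms = list(range(2403950, 2403950 + 240, 4))
df = pd.DataFrame(
    {"L_x": [float(i) for i in range(len(ts_ms))]},
    index=pd.to_datetime(ts_ms, unit="ms"),
)

# Default resampling: bin edges are multiples of 100 ms counted from midnight.
default_bins = df.resample("100ms").first()

# Convert the first bin label back to integer milliseconds for comparison.
epoch = pd.Timestamp(0)
first_label_ms = (default_bins.index[0] - epoch) // pd.Timedelta(milliseconds=1)
print(first_label_ms)  # 2403900, i.e. 50 ms before the first sample
```

By default the bins are aligned to midnight of the first day rather than to the first sample, which is why the first label is rounded down to 2403900.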
Asked By: Same

Answers:

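You can specify a label offset with loffset. Note that loffset shifts only the bin labels (here by 50 ms); the bin edges themselves stay where they were. It is available in older pandas versions only (deprecated in 1.1, removed in 2.0):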

df.resample('100L', loffset='50L')

UPDATE

Of course you can always calculate the offset:

offset = df.index[0] % 100
df.index = pd.to_datetime((df.index.values*1e6).astype(int))
df.resample('100L', loffset='{}L'.format(offset))
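
Since loffset no longer exists in pandas 2.x, the same relabelling can be sketched by shifting the resampled index by hand. This is a minimal sketch on synthetic data (one row every 4 ms, made-up L_x values); count() is used as the aggregation only to make the bin contents visible.

```python
import pandas as pd

# Synthetic stand-in for the question's table (values are made up).
ts_ms = list(range(2403950, 2403950 + 240, 4))
df = pd.DataFrame(
    {"L_x": [float(i) for i in range(len(ts_ms))]},
    index=pd.to_datetime(ts_ms, unit="ms"),
)

# Remainder of the first timestamp relative to the 100 ms bin width.
offset_ms = ts_ms[0] % 100  # 50 for this data

counts = df.resample("100ms").count()
# Emulate loffset: move the labels, leave the bin contents untouched.
counts.index = counts.index + pd.Timedelta(milliseconds=offset_ms)

print(counts.index[0])          # label now corresponds to 2403950 ms
print(counts["L_x"].tolist())   # [13, 25, 22]
```

The first bin holds only 13 rows instead of 25 because it still covers 2403900 to 2404000. If the bins themselves should start at the first timestamp, the edges have to be moved, not just the labels.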
Answered By: Mike Müller

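A simpler solution for older pandas versions is to pass base to your resampling call. base is expressed in units of the frequency (milliseconds here), so base=50 moves the bin edges to ...950, ...050, ...150 and so on; base=1 would shift them by only 1 ms. To keep it general, compute the value as df.index[0] % 100 from the original millisecond index. (base was deprecated in pandas 1.1 and removed in 2.0.)

df = df.resample('100L', base=50)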
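
In pandas 1.1 and later, the documented replacement for base is the offset argument of resample. The following is a minimal sketch of that variant on synthetic data (one row every 4 ms, made-up L_x values), with the offset computed from the first timestamp rather than hard-coded.

```python
import pandas as pd

# Synthetic stand-in for the question's table (values are made up).
ts_ms = list(range(2403950, 2403950 + 240, 4))
df = pd.DataFrame(
    {"L_x": [float(i) for i in range(len(ts_ms))]},
    index=pd.to_datetime(ts_ms, unit="ms"),
)

# Shift the bin edges by the remainder of the first timestamp (50 ms here).
offset = pd.Timedelta(milliseconds=ts_ms[0] % 100)
counts = df.resample("100ms", offset=offset).count()

print(counts.index[0])          # label corresponds to 2403950 ms
print(counts["L_x"].tolist())   # [25, 25, 10]
```

Unlike a label shift, this moves the bin edges, so the first bin covers a full 100 ms starting at the first sample.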
Answered By: chrin

A dynamic solution that works with pandas Timestamp objects (often used to index time series data) is to use the origin argument of the resample method, available since pandas 1.1. resample needs a datetime-like index, so strictly numerical index values have to be converted first, as in the question:

df = df.resample("15min", origin=df.index[0])

Here "15min" is the sampling frequency, and origin=df.index[0] essentially says:

"start sampling the desired frequency at the first value found in this DataFrame's index"

AFAIK, this works for any number combined with a fixed-frequency time series offset alias, such as "15min" or "4H". Calendar-based aliases such as "1W" are anchored differently, so origin may not apply to them.
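
Applied to the question's 100 ms case, the origin approach can be sketched as follows. The data is synthetic (one row every 4 ms, made-up L_x values). The sketch also uses origin="start", a documented string option in pandas 1.1 and later that anchors the bins at the first index value without having to pass it explicitly.

```python
import pandas as pd

# Synthetic stand-in for the question's table (values are made up).
ts_ms = list(range(2403950, 2403950 + 240, 4))
df = pd.DataFrame(
    {"L_x": [float(i) for i in range(len(ts_ms))]},
    index=pd.to_datetime(ts_ms, unit="ms"),
)

# Anchor the bins at the first timestamp, passed explicitly ...
by_first = df.resample("100ms", origin=df.index[0]).mean()
# ... or via the "start" shortcut.
by_start = df.resample("100ms", origin="start").mean()

print(by_first.index[0] == df.index[0])  # True
print(by_first["L_x"].tolist())          # [12.0, 37.0, 54.5]
```

Both calls yield bins labelled 2403950, 2404050 and 2404150 ms for this data.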

Answered By: alphazwest