How to resample starting from the first element in pandas?
Question:
I am resampling the following table/data:
Timestamp L_x L_y L_a R_x R_y R_a
2403950 621.3 461.3 313 623.3 461.8 260
2403954 622.5 461.3 312 623.3 462.6 260
2403958 623.1 461.5 311 623.4 464 261
2403962 623.6 461.7 310 623.7 465.4 261
2403966 623.8 461.5 309 623.9 466.1 261
2403970 620.9 461.4 309 623.8 465.9 259
2403974 621.7 461.1 308 623 464.8 258
2403978 622.1 461.1 308 621.9 463.9 256
2403982 622.5 461.5 308 621 463.4 255
2403986 622.4 462.1 307 620.7 463.3 254
The table goes on and on like that.
The timestamps are in milliseconds. I did the following to resample the data into 100-millisecond bins:
- I changed the timestamp index into a datetime format:
df.index = pd.to_datetime((df.index.values*1e6).astype(int))
- I resampled it into 100-millisecond bins:
df = df.resample('100L')
The resulting resampled data look like the following:
Timestamp L_x L_y L_a R_x R_y R_a
2403900 621.3 461.3 313 623.3 461.8 260
2404000 622.5 461.3 312 623.3 462.6 260
2404100 623.1 461.5 311 623.4 464 261
2404200 623.6 461.7 310 623.7 465.4 261
2404300 623.8 461.5 309 623.9 466.1 261
As we can see, the first bin time is 2403900, which is 50 milliseconds behind the first timestamp index of the original table. But I wanted the bin time to start from the first timestamp index of the original table, which is 2403950, like the following:
Timestamp L_x L_y L_a R_x R_y R_a
2403950 621.3 461.3 313 623.3 461.8 260
2404050 622.5 461.3 312 623.3 462.6 260
2404150 623.1 461.5 311 623.4 464 261
2404250 623.6 461.7 310 623.7 465.4 261
2404350 623.8 461.5 309 623.9 466.1 261
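The behaviour described above can be reproduced with a minimal sketch. It assumes a recent pandas, where resample needs an explicit aggregation (.mean() here) and the millisecond alias is written "ms" rather than "L". Only the L_x column of the sample rows is used, and the names timestamps_ms, binned and first_label_ms are illustrative, not from the original post.

```python
import pandas as pd

# First ten rows of the sample table (L_x column only, for brevity).
timestamps_ms = [2403950, 2403954, 2403958, 2403962, 2403966,
                 2403970, 2403974, 2403978, 2403982, 2403986]
l_x = [621.3, 622.5, 623.1, 623.6, 623.8, 620.9, 621.7, 622.1, 622.5, 622.4]

# unit="ms" interprets the integers as milliseconds since the epoch.
df = pd.DataFrame({"L_x": l_x}, index=pd.to_datetime(timestamps_ms, unit="ms"))

# With the default origin, bin edges are aligned to midnight,
# so every edge is a multiple of 100 ms.
binned = df.resample("100ms").mean()

# Convert the first bin label back to integer milliseconds.
first_label_ms = int((binned.index[0] - pd.Timestamp(0)) // pd.Timedelta(milliseconds=1))
print(first_label_ms)
```

All ten sample rows fall into a single bin whose label is 2403900, not 2403950, which is the mismatch the question is about.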
Answers:
You can specify an offset:
df.resample('100L', loffset='50L')
UPDATE
Of course you can always calculate the offset:
offset = df.index[0] % 100
df.index = pd.to_datetime((df.index.values*1e6).astype(int))
df.resample('100L', loffset='{}L'.format(offset))
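One caveat worth sketching: as far as I know, loffset only moved the labels of the bins, not their edges, and it was deprecated in pandas 1.1 and is not accepted in pandas 2.x. The sketch below emulates it by shifting the resulting index by hand. The data is synthetic (one sample every 4 ms, an assumption made for illustration) and the names s, counts and labels_ms are hypothetical.

```python
import pandas as pd

# Synthetic timestamps (an assumption): one sample every 4 ms from 2403950 ms.
timestamps_ms = list(range(2403950, 2404150, 4))
s = pd.Series(1, index=pd.to_datetime(timestamps_ms, unit="ms"))

# Same offset computation as in the answer: remainder of the first timestamp.
offset_ms = timestamps_ms[0] % 100

# Bin on the default 100 ms grid, then shift only the labels.
counts = s.resample("100ms").count()
counts.index = counts.index + pd.Timedelta(milliseconds=offset_ms)

# Express the labels as integer milliseconds for easy comparison.
epoch = pd.Timestamp(0)
labels_ms = [int((t - epoch) // pd.Timedelta(milliseconds=1)) for t in counts.index]
print(labels_ms)
print(counts.tolist())
```

The labels now start at 2403950, but the bin sizes come out uneven (13, 25 and 12 samples), because the underlying edges are still at 2403900, 2404000 and so on. If the bins themselves should start at the first timestamp, the edges have to move as well.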
A much simpler (and general) solution is to just add base=1 to your resampling function:
df = df.resample('100L', base=1)
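For newer pandas versions, a hedged sketch of the same idea: base was deprecated in pandas 1.1 in favour of the offset keyword, and to my knowledge it is no longer accepted in pandas 2.x. For a 100 ms grid that should start at a timestamp ending in ...950, the shift needed is 50 ms. The data is synthetic (one sample every 4 ms, an assumption) and the variable names are illustrative.

```python
import pandas as pd

# Synthetic timestamps (an assumption): one sample every 4 ms from 2403950 ms.
timestamps_ms = list(range(2403950, 2404150, 4))
s = pd.Series(1, index=pd.to_datetime(timestamps_ms, unit="ms"))

# offset shifts the bin edges themselves, here by 50 ms off the default grid.
counts = s.resample("100ms", offset="50ms").count()

# Express the labels as integer milliseconds for easy comparison.
epoch = pd.Timestamp(0)
labels_ms = [int((t - epoch) // pd.Timedelta(milliseconds=1)) for t in counts.index]
print(labels_ms)
print(counts.tolist())
```

Here the bins start at 2403950 and 2404050 and each holds 25 samples, so both the labels and the edges line up with the first timestamp.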
A dynamic solution that also works with pandas Timestamp objects (often used to index time series data), or strictly numerical index values, is to use the origin argument with the resample method as such:
df = df.resample("15min", origin=df.index[0])
Here "15min" is the sampling frequency, and the origin=df.index[0] argument essentially says: "start sampling at the desired frequency from the first value found in this DataFrame's index".
AFAIK, this works for any combination of a numerical value + a valid time series offset alias (see the pandas documentation on offset aliases), such as "15min", "4H", "1W", etc.
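Applied to the 100 ms case from the question, the origin approach might look like the sketch below. It assumes pandas 1.1 or later, where origin is available, and it also tries origin="start", which the pandas documentation describes as using the first value of the time series. The data is synthetic (one sample every 4 ms, an assumption) and the names by_first, by_start and to_ms are hypothetical.

```python
import pandas as pd

# Synthetic timestamps (an assumption): one sample every 4 ms from 2403950 ms.
timestamps_ms = list(range(2403950, 2404150, 4))
s = pd.Series(1, index=pd.to_datetime(timestamps_ms, unit="ms"))

# Anchor the bin grid at the first index value, as in the answer.
by_first = s.resample("100ms", origin=s.index[0]).count()

# The "start" shortcut should anchor the grid at the first timestamp too.
by_start = s.resample("100ms", origin="start").count()

def to_ms(index):
    """Convert a DatetimeIndex to a list of integer milliseconds since the epoch."""
    epoch = pd.Timestamp(0)
    return [int((t - epoch) // pd.Timedelta(milliseconds=1)) for t in index]

print(to_ms(by_first.index), by_first.tolist())
print(to_ms(by_start.index), by_start.tolist())
```

Both calls should give bins labelled 2403950 and 2404050. Unlike a hard-coded offset, nothing here depends on knowing the remainder of the first timestamp in advance.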