pandas resample documentation
Question:
So I completely understand how to use resample, but the documentation does not do a good job explaining the options.
Most options in the resample function are pretty straightforward except for these two:
- rule : the offset string or object representing target conversion
- how : string, method for down- or re-sampling, default to ‘mean’
From looking at as many examples as I could find online, I can see that for rule you can use 'D' for day, 'xMin' for minutes, and 'xL' for milliseconds, but that is all I could find.
For how I have seen the following: 'first', np.max, 'last', 'mean', and 'n1n2n3n4...nx', where nx is the first letter of each column index.
So is there somewhere in the documentation that I am missing that displays every option for pandas.resample's rule and how inputs? If yes, where? I could not find it. If no, what are all the options for them?
Answers:
B business day frequency
C custom business day frequency (experimental)
D calendar day frequency
W weekly frequency
M month end frequency
SM semi-month end frequency (15th and end of month)
BM business month end frequency
CBM custom business month end frequency
MS month start frequency
SMS semi-month start frequency (1st and 15th)
BMS business month start frequency
CBMS custom business month start frequency
Q quarter end frequency
BQ business quarter end frequency
QS quarter start frequency
BQS business quarter start frequency
A year end frequency
BA, BY business year end frequency
AS, YS year start frequency
BAS, BYS business year start frequency
BH business hour frequency
H hourly frequency
T, min minutely frequency
S secondly frequency
L, ms milliseconds
U, us microseconds
N nanoseconds
See the timeseries documentation. It includes a list of offsets (and ‘anchored’ offsets), and a section about resampling.
Note that there isn’t a list of all the different how options, because it can be any NumPy array function, and any function that is available via groupby dispatching can be passed to how by name.
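As a rough sketch of that point, assuming a recent pandas version: there, the how argument is no longer accepted by resample, and the aggregation is called on the resampler object instead. A name such as 'max' or a custom callable can be passed to .agg. The sample series and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: one value per day for six days.
idx = pd.date_range("2021-01-01", periods=6, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Aggregation called as a method on the resampler (two-day bins).
mean_2d = s.resample("2D").mean()

# Aggregation passed by name, dispatched like a groupby method.
max_2d = s.resample("2D").agg("max")

# Any callable that reduces each bin to a scalar can be passed as well.
range_2d = s.resample("2D").agg(lambda x: x.max() - x.min())

print(mean_2d.tolist())   # [1.5, 3.5, 5.5]
print(max_2d.tolist())    # [2.0, 4.0, 6.0]
print(range_2d.tolist())  # [1.0, 1.0, 1.0]
```

The string form is usually the safer choice for built-in reductions, since it maps directly onto the resampler's own method of the same name.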
There’s more to it than this, but you’re probably looking for this list:
B business day frequency
C custom business day frequency (experimental)
D calendar day frequency
W weekly frequency
M month end frequency
BM business month end frequency
MS month start frequency
BMS business month start frequency
Q quarter end frequency
BQ business quarter end frequency
QS quarter start frequency
BQS business quarter start frequency
A year end frequency
BA business year end frequency
AS year start frequency
BAS business year start frequency
H hourly frequency
T minutely frequency
S secondly frequency
L milliseconds
U microseconds
Source: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
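The aliases in this list can be prefixed with an integer multiple (which is what the 'xMin' pattern in the question amounts to), and an offset object can be passed in place of the string. A small sketch with made-up data, using only the 'min' alias and pd.offsets.Minute:

```python
import pandas as pd

# Hypothetical sample data: six readings, five minutes apart.
idx = pd.date_range("2021-01-01 00:00", periods=6, freq="5min")
s = pd.Series(range(6), index=idx)

# Rule given as an alias string with a multiplier.
by_string = s.resample("15min").sum()

# The same rule given as an offset object.
by_object = s.resample(pd.offsets.Minute(15)).sum()

print(by_string.tolist())           # [3, 12]
print(by_string.equals(by_object))  # True
```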
If you are not sure what you will get, use this function:
from pandas.tseries.frequencies import to_offset
print(to_offset("7D")) # <7 * Days>
print(to_offset("W")) # <Week: weekday=6>
print(to_offset("M")) # <MonthEnd>
print(to_offset("m")) # <MonthEnd>
print(to_offset("min")) # <Minute>
For example, uppercase and lowercase "M" give the same offset here (unlike the usual convention of M = month and m = minute).
Be aware that the following two calls are not the same and give you different results:
s.resample("7d").mean()
s.resample("W").mean() # is not the same!
You can see the reason here: "Warning: The default values for label and closed is ‘left’ for all frequency offsets except for ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’ which all have a default of ‘right’."
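To make the difference concrete, here is a small sketch on made-up daily data starting on a Friday. The "7D" bins start at the first day of the data, while "W" is anchored on Sundays and uses closed='right' and label='right' by default. Passing label and closed explicitly changes the "W" result again, but the bins stay anchored on Sundays, so it still does not match "7D".

```python
import pandas as pd

# Hypothetical sample data: 14 daily values, 2021-01-01 is a Friday.
idx = pd.date_range("2021-01-01", periods=14, freq="D")
s = pd.Series(range(14), index=idx)

# Seven-day bins that start at the first day of the data.
seven_day = s.resample("7D").sum()

# Weekly bins anchored on Sunday, right-closed and right-labelled.
weekly = s.resample("W").sum()

# Same weekly anchor, but left-closed and left-labelled.
weekly_left = s.resample("W", label="left", closed="left").sum()

print(seven_day)
print(weekly)
print(weekly_left)
```

With this data, "7D" produces two bins labelled 2021-01-01 and 2021-01-08, whereas "W" produces three bins labelled with the Sundays 2021-01-03, 2021-01-10 and 2021-01-17.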