How to convert DatetimeIndexResampler to DataFrame?
Question:
I want to build a matrix from series, but before that I have to resample those series. However, to avoid processing the whole matrix twice with replace(np.nan, 0.0), I want to append the dataframes to a collecting dataframe and then remove the NaN values in one pass.
So instead of
user_activities = user.groupby(["DOC_ACC_DT", "DOC_ACTV_CD"]).agg("sum")["SUM_DOC_CNT"].unstack().resample("1D").replace(np.nan, 0)
df = df.append(user_activities[activity].rename(user_id))
I want
user_activities = user.groupby(["DOC_ACC_DT", "DOC_ACTV_CD"]).agg("sum")["SUM_DOC_CNT"].unstack().resample("1D")
df = df.append(user_activities[activity].rename(user_id))
but that is not working because user_activities is not a dataframe after resample().
The error suggests that I try apply(), but that method expects a parameter:
/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _make_wrapper(self, name)
507 "using the 'apply' method".format(kind, name,
508 type(self).__name__))
--> 509 raise AttributeError(msg)
510
511 # need to setup the selection
AttributeError: Cannot access callable attribute 'rename' of 'SeriesGroupBy' objects, try using the 'apply' method
How can I solve this issue?
Answers:
The interface to .resample changed in pandas 0.18.0 to be more groupby-like and hence more flexible, i.e. resample no longer returns a DataFrame: it is now "lazily evaluated" at the moment of the aggregation or interpolation.
I suggest reading the resample API changes: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#resample-api
See also:
- http://pandas.pydata.org/pandas-docs/stable/timeseries.html#resampling
- http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html
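To make the lazy evaluation described above concrete, here is a minimal sketch. The data and the column name "views" are made up for illustration and are not from the question. It shows that resample() alone gives a resampler object, and that calling a method such as asfreq() on it is what produces a DataFrame.

```python
import pandas as pd

# Hypothetical sample data: three observations with a one-day gap.
idx = pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-04"])
frame = pd.DataFrame({"views": [1.0, 2.0, 4.0]}, index=idx)

resampler = frame.resample("1D")  # lazy: no DataFrame has been built yet
daily = resampler.asfreq()        # materialise on a daily grid; the gap becomes NaN
filled = daily.fillna(0.0)        # replace the NaN values afterwards

print(type(resampler).__name__)   # name of the resampler class
print(type(daily).__name__)       # name of the materialised result class
```

asfreq() only re-indexes to the new frequency, so it fits when no aggregation is wanted; an aggregation such as sum() or mean() would be the alternative when several rows fall into the same day.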
For upsampling:
df.resample("1D").interpolate()
For downsampling, using the mean:
df.resample("1D").mean()
or using OHLC (i.e. open, high, low, close values, or first, maximal, minimal, last values):
df.resample("1D").ohlc()
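As a rough illustration of the three calls above, the following sketch runs them on small made-up series (the values and dates are assumptions chosen so the results are easy to check by hand).

```python
import pandas as pd

# Upsampling: one value every two days, filled in to daily by linear interpolation.
sparse = pd.Series([0.0, 2.0, 4.0],
                   index=pd.date_range("2016-01-01", periods=3, freq="2D"))
up = sparse.resample("1D").interpolate()

# Downsampling: six daily values collapsed into two-day bins.
dense = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                  index=pd.date_range("2016-01-01", periods=6, freq="D"))
down = dense.resample("2D").mean()   # one mean per two-day bin
bars = dense.resample("2D").ohlc()   # DataFrame with open/high/low/close columns

print(up)
print(down)
print(bars)
```

In each case the result is an ordinary Series or DataFrame again, so methods such as rename() or fillna() are available on it.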
One way is to use .aggregate. As per the docs, note first that .agg is an alias for it, and is preferred:
agg is an alias for aggregate. Use the alias.
It can be used as in the example below:
df.resample('1D').agg({'close': 'last', 'open': 'first'})
This returns a dataframe.
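Applied to the situation in the question, the same idea might look like the sketch below. The user ids and counts are hypothetical, and pd.concat is used here in place of df.append as an assumption about how the collecting step could be written; it is not the asker's code. Each series is aggregated right after resample(), so rename() works, and the NaN values are replaced once at the end.

```python
import pandas as pd

# Hypothetical per-user activity counts, indexed by access date.
users = {
    "u1": pd.Series([3, 1], index=pd.to_datetime(["2016-01-01", "2016-01-03"])),
    "u2": pd.Series([5], index=pd.to_datetime(["2016-01-02"])),
}

collected = []
for user_id, counts in users.items():
    # Aggregating turns the lazy resampler back into a plain Series.
    daily = counts.resample("1D").agg("sum")
    collected.append(daily.rename(user_id))

# Align all users on a shared date index; dates a user does not cover
# become NaN and are replaced with 0 in a single pass over the matrix.
matrix = pd.concat(collected, axis=1).fillna(0.0)
print(matrix)
```

Here each user ends up as a column; transposing with matrix.T would give one row per user if that layout is preferred.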