No numeric types to aggregate – change in groupby() behaviour?
Question:
I have a problem with some groupby code which I’m quite sure once ran (on an older pandas version). On 0.9, I get No numeric types to aggregate errors. Any ideas?
In [31]: data
Out[31]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2557 entries, 2004-01-01 00:00:00 to 2010-12-31 00:00:00
Freq: <1 DateOffset>
Columns: 360 entries, -89.75 to 89.75
dtypes: object(360)
In [32]: latedges = linspace(-90., 90., 73)
In [33]: lats_new = linspace(-87.5, 87.5, 72)
In [34]: def _get_gridbox_label(x, bins, labels):
   ....:     return labels[searchsorted(bins, x) - 1]
   ....:
In [35]: lat_bucket = lambda x: _get_gridbox_label(x, latedges, lats_new)
In [36]: data.T.groupby(lat_bucket).mean()
---------------------------------------------------------------------------
DataError Traceback (most recent call last)
<ipython-input-36-ed9c538ac526> in <module>()
----> 1 data.T.groupby(lat_bucket).mean()
/usr/lib/python2.7/site-packages/pandas/core/groupby.py in mean(self)
295 """
296 try:
--> 297 return self._cython_agg_general('mean')
298 except DataError:
299 raise
/usr/lib/python2.7/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
1415
1416 def _cython_agg_general(self, how, numeric_only=True):
-> 1417 new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
1418 return self._wrap_agged_blocks(new_blocks)
1419
/usr/lib/python2.7/site-packages/pandas/core/groupby.py in _cython_agg_blocks(self, how, numeric_only)
1455
1456 if len(new_blocks) == 0:
-> 1457 raise DataError('No numeric types to aggregate')
1458
1459 return new_blocks
DataError: No numeric types to aggregate
Answers:
How are you generating your data?
See how the output shows that your data is of ‘object’ type? The groupby operations specifically check whether each column has a numeric dtype first.
In [31]: data
Out[31]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2557 entries, 2004-01-01 00:00:00 to 2010-12-31 00:00:00
Freq: <1 DateOffset>
Columns: 360 entries, -89.75 to 89.75
dtypes: object(360)
look ↑
Did you initialize an empty DataFrame first and then fill it? If so, that’s probably why the behaviour changed with the new version: before 0.9, empty DataFrames were initialized to float type, but now they are of object type. In that case you can change the initialization to DataFrame(dtype=float).
You can also call frame.astype(float).
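As a rough sketch of the astype(float) suggestion: the frame below is a small hypothetical stand-in for the one in the question (the values and the 3x4 shape are made up, and the object dtype is forced explicitly rather than produced by filling an empty frame). The bin edges and labels are the ones from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: numeric values stored in
# object-dtype columns, with latitudes as column labels.
index = pd.date_range("2004-01-01", periods=3, freq="D")
lats = [-89.75, -89.25, 89.25, 89.75]
frame = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=index, columns=lats
).astype(object)
print(frame.dtypes)

# Convert every column to float before aggregating.
numeric = frame.astype(float)
print(numeric.dtypes)

# Bin edges and labels as defined in the question.
latedges = np.linspace(-90.0, 90.0, 73)
lats_new = np.linspace(-87.5, 87.5, 72)

def lat_bucket(lat):
    # Map a latitude label to the label of the bin it falls into.
    return lats_new[np.searchsorted(latedges, lat) - 1]

# Group the transposed frame (latitudes as index) by latitude bin.
band_means = numeric.T.groupby(lat_bucket).mean()
print(band_means)
```

If the columns hold strings that do not parse as numbers, astype(float) raises a ValueError, so this only helps when the values are numeric but stored as object.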
I got this error generating a data frame consisting of timestamps and data:
df = pd.DataFrame({'data':value}, index=pd.DatetimeIndex(timestamp))
Adding the suggested solution works for me:
df = pd.DataFrame({'data':value}, index=pd.DatetimeIndex(timestamp), dtype=float)
Thanks Chang She!
Example:
data
2005-01-01 00:10:00 7.53
2005-01-01 00:20:00 7.54
2005-01-01 00:30:00 7.62
2005-01-01 00:40:00 7.68
2005-01-01 00:50:00 7.81
2005-01-01 01:00:00 7.95
2005-01-01 01:10:00 7.96
2005-01-01 01:20:00 7.95
2005-01-01 01:30:00 7.98
2005-01-01 01:40:00 8.06
2005-01-01 01:50:00 8.04
2005-01-01 02:00:00 8.06
2005-01-01 02:10:00 8.12
2005-01-01 02:20:00 8.12
2005-01-01 02:30:00 8.25
2005-01-01 02:40:00 8.27
2005-01-01 02:50:00 8.17
2005-01-01 03:00:00 8.21
2005-01-01 03:10:00 8.29
2005-01-01 03:20:00 8.31
2005-01-01 03:30:00 8.25
2005-01-01 03:40:00 8.19
2005-01-01 03:50:00 8.17
2005-01-01 04:00:00 8.18
data
2005-01-01 00:00:00 7.636000
2005-01-01 01:00:00 7.990000
2005-01-01 02:00:00 8.165000
2005-01-01 03:00:00 8.236667
2005-01-01 04:00:00 8.180000
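The post does not show how the hourly averages were computed. One way that reproduces the first two rows of the result, assuming an hourly resample of the 10-minute series, is sketched here; the variable names are illustrative and the values are copied from the first two hours of the listing.

```python
import pandas as pd

# Values copied from the first two hours of the 10-minute listing.
values = [7.53, 7.54, 7.62, 7.68, 7.81, 7.95, 7.96, 7.95, 7.98, 8.06, 8.04]
timestamps = pd.date_range("2005-01-01 00:10", periods=len(values), freq="10min")

# dtype=float keeps the column numeric, as suggested in the answer.
df = pd.DataFrame({"data": values}, index=pd.DatetimeIndex(timestamps), dtype=float)

# Average within each hour (assumption: this is how the summary was built).
hourly = df.resample("60min").mean()
print(hourly)
```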
I got this done with:
data_frame.groupby(COL1).COL2.apply(np.mean).reset_index()
Got the same problem here, searched for so long just to realize my values were not floats but strings.
Here is what solved my issue:
df["column_name"] = pd.to_numeric(df["column_name"], downcast="float")
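A minimal sketch of the same fix on made-up data, in case it helps to see it end to end. The frame, the "group" column and the "n/a" entry are hypothetical; errors="coerce" is an optional extra that turns unparseable strings into NaN instead of raising, and downcast="float" is left out here because it is not needed for the aggregation to work.

```python
import pandas as pd

# Hypothetical frame whose numeric column was read in as strings.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b"],
        "column_name": ["1.5", "2.5", "4.0", "n/a"],
    }
)

# Parse the strings; values that cannot be parsed become NaN.
df["column_name"] = pd.to_numeric(df["column_name"], errors="coerce")
print(df.dtypes)

# The column is now numeric, so the groupby mean can aggregate it.
means = df.groupby("group")["column_name"].mean()
print(means)
```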
I got this error when calling the mean() method from groupby on a column which was an int/object data type. It was solved by casting the column to float like this:
df['column_name'] = df['column_name'].astype('float')