No numeric types to aggregate – change in groupby() behaviour?

Question

I have a problem with some groupy code which I’m quite sure once ran (on an older pandas version). On 0.9, I get No numeric types to aggregate errors. Any ideas?

In [31]: data
Out[31]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2557 entries, 2004-01-01 00:00:00 to 2010-12-31 00:00:00
Freq: <1 DateOffset>
Columns: 360 entries, -89.75 to 89.75
dtypes: object(360)

In [32]: latedges = linspace(-90., 90., 73)

In [33]: lats_new = linspace(-87.5, 87.5, 72)

In [34]: def _get_gridbox_label(x, bins, labels):
   ....:             return labels[searchsorted(bins, x) - 1]
   ....: 

In [35]: lat_bucket = lambda x: _get_gridbox_label(x, latedges, lats_new)

In [36]: data.T.groupby(lat_bucket).mean()
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-36-ed9c538ac526> in <module>()
----> 1 data.T.groupby(lat_bucket).mean()

/usr/lib/python2.7/site-packages/pandas/core/groupby.py in mean(self)
    295         """
    296         try:
--> 297             return self._cython_agg_general('mean')
    298         except DataError:
    299             raise

/usr/lib/python2.7/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
   1415 
   1416     def _cython_agg_general(self, how, numeric_only=True):
-> 1417         new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
   1418         return self._wrap_agged_blocks(new_blocks)
   1419 

/usr/lib/python2.7/site-packages/pandas/core/groupby.py in _cython_agg_blocks(self, how, numeric_only)
   1455 
   1456         if len(new_blocks) == 0:
-> 1457             raise DataError('No numeric types to aggregate')
   1458 
   1459         return new_blocks

DataError: No numeric types to aggregate

Asked By: andreas-h

||

Source

Answer 1

How are you generating your data?

See how the output shows that your data is of ‘object’ type? the groupby operations specifically check whether each column is a numeric dtype first.

In [31]: data
Out[31]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2557 entries, 2004-01-01 00:00:00 to 2010-12-31 00:00:00
Freq: <1 DateOffset>
Columns: 360 entries, -89.75 to 89.75
dtypes: object(360)

look ↑

Did you initialize an empty DataFrame first and then filled it? If so that’s probably why it changed with the new version as before 0.9 empty DataFrames were initialized to float type but now they are of object type. If so you can change the initialization to DataFrame(dtype=float).

You can also call frame.astype(float)

Answered By: Chang She

Answer 2

I got this error generating a data frame consisting of timestamps and data:

df = pd.DataFrame({'data':value}, index=pd.DatetimeIndex(timestamp))

Adding the suggested solution works for me:

df = pd.DataFrame({'data':value}, index=pd.DatetimeIndex(timestamp), dtype=float))

Thanks Chang She!

Example:

                     data
2005-01-01 00:10:00  7.53
2005-01-01 00:20:00  7.54
2005-01-01 00:30:00  7.62
2005-01-01 00:40:00  7.68
2005-01-01 00:50:00  7.81
2005-01-01 01:00:00  7.95
2005-01-01 01:10:00  7.96
2005-01-01 01:20:00  7.95
2005-01-01 01:30:00  7.98
2005-01-01 01:40:00  8.06
2005-01-01 01:50:00  8.04
2005-01-01 02:00:00  8.06
2005-01-01 02:10:00  8.12
2005-01-01 02:20:00  8.12
2005-01-01 02:30:00  8.25
2005-01-01 02:40:00  8.27
2005-01-01 02:50:00  8.17
2005-01-01 03:00:00  8.21
2005-01-01 03:10:00  8.29
2005-01-01 03:20:00  8.31
2005-01-01 03:30:00  8.25
2005-01-01 03:40:00  8.19
2005-01-01 03:50:00  8.17
2005-01-01 04:00:00  8.18
                     data
2005-01-01 00:00:00  7.636000
2005-01-01 01:00:00  7.990000
2005-01-01 02:00:00  8.165000
2005-01-01 03:00:00  8.236667
2005-01-01 04:00:00  8.180000

Answered By: user4952560

Answer 3

I got this done by :

data_frame.groupby(COL1).COL2.apply(np.mean).reset_index()

Answered By: hassan firasat

Answer 4

Got the same problem here, searched for so long just to realize my values were not floats but strings.

Here is what solved my issue:

df["column_name"] = pd.to_numeric(df["column_name"], downcast="float")

Answered By: Augustin

Answer 5

I got this error when calling the mean() method from groupby on a column which was an int/object data type. It was solved by casting the column as a float like this:

df['column_name'] = df['column_name'].astype('float')

Answered By: tsando

No numeric types to aggregate – change in groupby() behaviour?

Question:

Answers: