Rename result columns from Pandas aggregation ("FutureWarning: using a dict with renaming is deprecated")

Question:

I’m trying to do some aggregations on a pandas data frame. Here is a sample code:

import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})

Out[1]: 
      Amount      
         Sum Count
User              
user1   18.0     2
user2   20.5     3
user3   10.5     1

Which generates the following warning:

FutureWarning: using a dict with renaming is deprecated and will be
removed in a future version return super(DataFrameGroupBy,
self).aggregate(arg, *args, **kwargs)

How can I avoid this?

Asked By: Victor Mayrink

||

Answers:

Use groupby apply and return a Series to rename columns

Use the groupby apply method to perform an aggregation that

  • Renames the columns
  • Allows for spaces in the names
  • Allows you to order the returned columns in any way you choose
  • Allows for interactions between columns
  • Returns a single level index and NOT a MultiIndex

To do this:

  • create a custom function that you pass to apply
  • This custom function is passed each group as a DataFrame
  • Return a Series
  • The index of the Series will be the new columns

Create fake data

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                  'Score': [9, 1, 8, 7, 7, 6, 9]})

enter image description here

create custom function that returns a Series
The variable x inside of my_agg is a DataFrame

def my_agg(x):
    names = {
        'Amount mean': x['Amount'].mean(),
        'Amount std':  x['Amount'].std(),
        'Amount range': x['Amount'].max() - x['Amount'].min(),
        'Score Max':  x['Score'].max(),
        'Score Sum': x['Score'].sum(),
        'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
                                   'Score Sum', 'Score Max', 'Amount Score Sum'])

Pass this custom function to the groupby apply method

df.groupby('User').apply(my_agg)

enter image description here

The big downside is that this function will be much slower than agg for the cythonized aggregations

Using a dictionary with groupby agg method

Using a dictionary of dictionaries was removed because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on how to improve this functionality in the future on github Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.

df.groupby('User')['Amount'].agg(['sum', 'count'])

Output

       sum  count
User              
user1  18.0      2
user2  20.5      3
user3  10.5      1

It is still possible to use a dictionary to explicitly denote different aggregations for different columns, like here if there was another numeric column named Other.

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
              "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
              'Other': [1,2,3,4,5,6]})

df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})

Output

      Amount       Other          
         sum count   max       std
User                              
user1   18.0     2     6  3.535534
user2   20.5     3     5  1.527525
user3   10.5     1     4       NaN
Answered By: Ted Petrou

If you replace the internal dictionary with a list of tuples it gets rid of the warning message

import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": [("Sum", "sum"), ("Count", "count")]})
Answered By: Jacob Stevenson

This is what I did:

Create a fake dataset:

import pandas as pd
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                  'Score': [9, 1, 8, 7, 7, 6, 9]})
df

O/P:

    Amount  Score   User
0   10.0    9   user1
1   5.0 1   user2
2   8.0 8   user2
3   10.5    7   user3
4   7.5 7   user2
5   8.0 6   user1
6   9.0 9   user3

I first made the User the index, and then a groupby:

ans = df.set_index('User').groupby(level=0)['Amount'].agg([('Sum','sum'),('Count','count')])
ans

Solution:

    Sum Count
User        
user1   18.0    2
user2   20.5    3
user3   19.5    2
Answered By: JodeCharger100

Replace the inner dictionaries with a list of correctly named functions.

To rename the function I’m using this utility function:

def aliased_aggr(aggr, name):
    if isinstance(aggr,str):
        def f(data):
            return data.agg(aggr)
    else:
        def f(data):
            return aggr(data)
    f.__name__ = name
    return f

The group-by statement then becomes:


df.groupby(["User"]).agg({"Amount": [ 
    aliased_aggr("sum","Sum"),
    aliased_aggr("count","Count")
]

If you have bigger, reusable aggregation specs, you can convert them with

def convert_aggr_spec(aggr_spec):
    return {
        col : [ 
            aliased_aggr(aggr,alias) for alias, aggr in aggr_map.items() 
        ]  
        for col, aggr_map in aggr_spec.items() 
    }

So you can say

df.groupby(["User"]).agg(convert_aggr_spec({"Amount": {"Sum": "sum", "Count": "count"}}))

See also https://github.com/pandas-dev/pandas/issues/18366#issuecomment-476597674

Answered By: plankthom

Update for Pandas 0.25+ Aggregation relabeling

import pandas as pd

print(pd.__version__)
#0.25.0

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby("User")['Amount'].agg(Sum='sum', Count='count')

Output:

        Sum  Count
User              
user1  18.0      2
user2  20.5      3
user3  10.5      1
Answered By: Scott Boston

This worked for me, Pandas version 1.2.4

For each column we add a list which consists of tuples:

df.groupby('column to group by').agg(
{'column name': [('new column name', 'function to apply')]})

Example

# Create DataFrame
df=pd.DataFrame(data={'id':[1,1,2,3],'col1': [1,2,1,5], 'col2':[5,8,6,4]})

# Apply grouping 
grouped = df.groupby('id').agg({
                             'col1': [('name1', 'sum')], 
                             'col2': [('name2_mean', 'sum'), ('name2_custom_std', lambda x: np.std(x))]})

# Drop multi-index for columns and reset index
grouped.columns = grouped.columns.droplevel()
grouped.reset_index(inplace=True)

Result:

id name1 name2_mean name2_custom_std
0 1 3 13 1.5
1 2 1 6 0.0
2 3 5 4 0.0
Answered By: Anna
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.