# Applying custom functions to groupby objects in pandas

## Question:

I have the following pandas dataframe.

``````
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "bird_type": ["falcon", "crane", "crane", "falcon"],
        "avg_speed": [np.random.randint(50, 200) for _ in range(4)],
        "no_of_birds_observed": [np.random.randint(3, 10) for _ in range(4)],
        "reliability_of_data": [np.random.rand() for _ in range(4)],
    }
)

# The dataframe looks like this:
#   bird_type  avg_speed  no_of_birds_observed  reliability_of_data
# 0    falcon         66                     3             0.553841
# 1     crane        159                     8             0.472359
# 2     crane        158                     7             0.493193
# 3    falcon        161                     7             0.585865
``````

Now, I would like to compute the weighted average of the `avg_speed` and `reliability_of_data` columns for each `bird_type`, using `no_of_birds_observed` as the weights. For that I have the following simple function, which calculates a weighted average.

``````
def func(data, numbers):
    # Sum of value * weight over all pairs
    ans = 0
    for a, b in zip(data, numbers):
        ans = ans + a*b
    # Divide by the total weight once, after the loop
    ans = ans / sum(numbers)
    return ans
``````
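As a quick sanity check, `func` can be compared with NumPy's built-in weighted average. This is a minimal sketch that uses only the two falcon rows from the dataframe printed above; the variable names `speeds`, `counts`, `manual` and `vectorized` are chosen here for illustration.

```python
import numpy as np

def func(data, numbers):
    """Weighted average of `data` with weights `numbers`, as in the question."""
    ans = 0
    for a, b in zip(data, numbers):
        ans = ans + a * b
    return ans / sum(numbers)

# Falcon rows from the printed dataframe: avg_speed and no_of_birds_observed
speeds = [66, 161]
counts = [3, 7]

manual = func(speeds, counts)                     # (66*3 + 161*7) / (3 + 7)
vectorized = np.average(speeds, weights=counts)   # same formula, computed by NumPy
print(manual, vectorized)
```

Both values should agree, which suggests `np.average(..., weights=...)` can stand in for `func` in the aggregation.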

How can I apply `func` to both the `avg_speed` and `reliability_of_data` columns within each group?

I expect the answer to be a dataframe as follows:

``````
  bird_type  avg_speed  no_of_birds_observed  reliability_of_data
0    falcon     132.5                     10            0.5762578
# how: avg_speed = (66*3 + 161*7)/(3+7), no_of_birds_observed = 3+7, reliability_of_data = (0.553841*3 + 0.585865*7)/(3+7)
1     crane     158.53                    15            0.4820815
# how: avg_speed = (159*8 + 158*7)/(8+7), no_of_birds_observed = 8+7, reliability_of_data = (0.472359*8 + 0.493193*7)/(8+7)
``````

I saw this question, but could not fully understand or generalize its solution. I considered not asking, but according to this blog post by SO and this meta question, a question with a different example can be considered a "borderline duplicate". An answer would benefit me, and others will probably find it useful too, so I finally decided to ask.

## Answers:

To aggregate with `GroupBy.agg`, use `np.average` and pass `no_of_birds_observed` as its `weights` parameter, selecting the rows of the current group with `DataFrame.loc`:

``````
# For correct output, the index must be the default one (or at least have unique values)
df = df.reset_index(drop=True)

# Weighted average of a column; the weights are looked up via the group's index labels
f = lambda x: np.average(x, weights=df.loc[x.index, 'no_of_birds_observed'])
df1 = (df.groupby('bird_type', sort=False, as_index=False)
         .agg(avg=('avg_speed', f),
              no_of_birds=('no_of_birds_observed', 'sum'),
              reliability_of_data=('reliability_of_data', f)))
print(df1)
#   bird_type         avg  no_of_birds  reliability_of_data
# 0    falcon  132.500000           10             0.576258
# 1     crane  158.533333           15             0.482082
``````
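The same idea can be generalized to any list of value columns. The following is a sketch only: the helper name `weighted_agg` and its parameters are hypothetical, and the data are the values printed in the question rather than freshly generated random numbers.

```python
import numpy as np
import pandas as pd

# Values copied from the dataframe printed in the question
df = pd.DataFrame({
    "bird_type": ["falcon", "crane", "crane", "falcon"],
    "avg_speed": [66, 159, 158, 161],
    "no_of_birds_observed": [3, 8, 7, 7],
    "reliability_of_data": [0.553841, 0.472359, 0.493193, 0.585865],
})

def weighted_agg(frame, group_col, weight_col, value_cols):
    """Hypothetical helper: weighted mean of value_cols per group, plus total weight."""
    # The index lookup below relies on unique index labels
    frame = frame.reset_index(drop=True)

    def wavg(x):
        # x is one column restricted to one group; fetch the matching weights by label
        return np.average(x, weights=frame.loc[x.index, weight_col])

    # Build the named aggregations: one weighted mean per value column, one sum of weights
    named = {col: (col, wavg) for col in value_cols}
    named[weight_col] = (weight_col, "sum")
    return frame.groupby(group_col, sort=False, as_index=False).agg(**named)

out = weighted_agg(df, "bird_type", "no_of_birds_observed",
                   ["avg_speed", "reliability_of_data"])
print(out)
```

Because the named aggregations are built from a list, adding another weighted column only means extending `value_cols`.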

Don’t use a function with `apply`, rather perform a classical aggregation:

``````
cols = ['avg_speed', 'reliability_of_data']

# multiply relevant columns by no_of_birds_observed
# aggregate everything as sum
out = (df[cols].mul(df['no_of_birds_observed'], axis=0)
       .combine_first(df)
       .groupby('bird_type').sum()
      )

# divide the relevant columns by the sum of no_of_birds_observed
out[cols] = out[cols].div(out['no_of_birds_observed'], axis=0)
``````

Output:

``````
            avg_speed  no_of_birds_observed  reliability_of_data
bird_type
crane      158.533333                    15             0.482082
falcon     132.500000                    10             0.576258
``````
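For comparison, the asker's own `func` can also be applied group by group with `groupby(...).apply`. This is only a sketch: the helper name `summarize` is made up for illustration, the data are the values printed in the question, and this route is usually slower than a vectorized aggregation on large frames.

```python
import pandas as pd

# Values copied from the dataframe printed in the question
df = pd.DataFrame({
    "bird_type": ["falcon", "crane", "crane", "falcon"],
    "avg_speed": [66, 159, 158, 161],
    "no_of_birds_observed": [3, 8, 7, 7],
    "reliability_of_data": [0.553841, 0.472359, 0.493193, 0.585865],
})

def func(data, numbers):
    """Weighted average from the question."""
    ans = 0
    for a, b in zip(data, numbers):
        ans = ans + a * b
    return ans / sum(numbers)

def summarize(group):
    """Hypothetical helper: one row of results for a single bird_type group."""
    w = group["no_of_birds_observed"]
    return pd.Series({
        "avg_speed": func(group["avg_speed"], w),
        "no_of_birds_observed": w.sum(),
        "reliability_of_data": func(group["reliability_of_data"], w),
    })

# Select the value columns explicitly so the grouping column is not passed to summarize
value_cols = ["avg_speed", "no_of_birds_observed", "reliability_of_data"]
result = df.groupby("bird_type", sort=False)[value_cols].apply(summarize)
print(result)
```

The result is indexed by `bird_type`, and the summed counts come back as floats because each returned `pd.Series` mixes integer and float values.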