Pandas: apply different custom functions to different columns when using groupby

Question:

I want to use a "groupby" on my pandas DataFrame with a different custom function for each column. For example, if I have this as input:

annotator  event          interval_presence   duration
3          birds          [0,5]               5
3          birds          [7,9]               10
3          voices         [1,2]               10
3          traffic        [1,7]               7
5          voices         [4,7]               4
5          voices         [5,10]              6
5          traffic        [0,1]               4

Each item in "interval_presence" is a pandas Interval. When merging rows, I want to take the mean of the "duration" column and combine the intervals in "interval_presence" with "pd.arrays.IntervalArray" and "piso.union". So this would be the output:

annotator  event          interval_presence   duration
3          birds          [[0,5],[7,9]]       7.5
3          voices         [1,2]               10
3          traffic        [1,7]               7
5          voices         [4,10]              5
5          traffic        [0,1]               4
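A minimal sketch for reproducing the input above, assuming the intervals are closed on both ends to match the "[a,b]" notation (pd.Interval defaults to closed="right"). It also shows that the "duration" column of the expected output is just an ordinary grouped mean:

```python
import pandas as pd

# Sample input from the question.
# Assumption: intervals are closed on both ends ("[a,b]" notation).
rows = [
    (3, "birds", 0, 5, 5),
    (3, "birds", 7, 9, 10),
    (3, "voices", 1, 2, 10),
    (3, "traffic", 1, 7, 7),
    (5, "voices", 4, 7, 4),
    (5, "voices", 5, 10, 6),
    (5, "traffic", 0, 1, 4),
]
df = pd.DataFrame(
    {
        "annotator": [r[0] for r in rows],
        "event": [r[1] for r in rows],
        # Build an interval-dtype column directly from (left, right) pairs.
        "interval_presence": pd.arrays.IntervalArray.from_tuples(
            [(r[2], r[3]) for r in rows], closed="both"
        ),
        "duration": [r[4] for r in rows],
    }
)

# The "duration" part of the expected output is a plain grouped mean.
mean_duration = df.groupby(["annotator", "event"])["duration"].mean()
print(mean_duration)
```
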

I already know how to merge my intervals, thanks to the answer to the post "Pandas: how to merge rows by union of intervals". The solution is:

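import pandas as pd
import piso

# Parentheses let the method chain span several lines.
data = (data.groupby(['annotator', 'event'])['interval_presence']
        .apply(pd.arrays.IntervalArray)   # one IntervalArray per group
        .apply(piso.union)                # merge overlapping intervals
        .reset_index())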

But how can I apply a "mean" function to "duration" at the same time?

Asked By: M.Tailleur


Answers:

Use agg with a dict that maps each column to its own function. Try this:

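import pandas as pd
import piso

result = df.groupby(["annotator", "event"]).agg({
    "interval_presence": lambda s: piso.union(pd.arrays.IntervalArray(s)),  # merge intervals per group
    "duration": "mean",  # average duration per group
}).reset_index()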

Within the lambda, s is a series of pd.Interval objects.
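For readers who want to see what the per-column function receives without installing piso, here is a pandas-only sketch of the same pattern. merge_intervals is a hypothetical helper written for illustration, not piso's API: it merges overlapping or touching closed intervals and returns a tuple, whereas piso.union returns an IntervalArray. The sketch uses named aggregation, which is an equivalent spelling of the dict form above, on a subset of the question's data with intervals assumed closed on both ends.

```python
import pandas as pd


def merge_intervals(s: pd.Series) -> tuple:
    """Hypothetical stand-in for piso.union (not its API).

    `s` holds the pd.Interval objects of one group. Overlapping or touching
    closed intervals are merged; returning a tuple makes each group reduce
    to a single cell.
    """
    merged = []
    for iv in sorted(s, key=lambda i: i.left):
        if merged and iv.left <= merged[-1].right:
            last = merged[-1]
            merged[-1] = pd.Interval(last.left, max(last.right, iv.right), closed="both")
        else:
            merged.append(iv)
    return tuple(merged)


# Subset of the question's data; intervals assumed closed on both ends.
df = pd.DataFrame(
    {
        "annotator": [3, 3, 3, 5, 5],
        "event": ["birds", "birds", "voices", "voices", "voices"],
        "interval_presence": pd.arrays.IntervalArray.from_tuples(
            [(0, 5), (7, 9), (1, 2), (4, 7), (5, 10)], closed="both"
        ),
        "duration": [5, 10, 10, 4, 6],
    }
)

# Named aggregation: output column = (input column, function).
result = df.groupby(["annotator", "event"]).agg(
    interval_presence=("interval_presence", merge_intervals),
    duration=("duration", "mean"),
)
print(result.reset_index())
```

Swapping merge_intervals for the piso-based lambda gives the answer's version; only the interval-merging function changes, while the "mean" on "duration" stays the same.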

Answered By: Code Different