Custom aggregation that acts on more than one column in pandas

Question

Note that this question does not ask whether pandas can apply functions to more than one column during aggregation. Here is an example:

The data frame:

A x y
foo 0 0
foo 1 1
foo 2 2
foo 3 3
bar 0 2
bar 2 3
bar 4 4
bar 6 5

I want to group this table by column A and compute the linear regression y=k*x+b on each group. So we want to achieve this:

A k b
foo 1.0 0.0
bar 0.5 2.0

I tried grouping by column A and using the aggregate method:

def f(values):
    # placeholder: aggregate calls f with one argument
    pass

grouped = table.groupby('A')
grouped.aggregate(f)

However, I found that this method splits the table into Series, one per column, and passes each Series to f separately, so f cannot access two columns at the same time.
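To make the behaviour described above concrete, here is a minimal sketch that records what aggregate passes to f. The list seen_types and the parameter name values are only illustrative, and the details may vary between pandas versions.

```python
import pandas as pd

# The example table from the question
table = pd.DataFrame({
    'A': ['foo'] * 4 + ['bar'] * 4,
    'x': [0, 1, 2, 3, 0, 2, 4, 6],
    'y': [0, 1, 2, 3, 2, 3, 4, 5],
})

seen_types = []  # illustrative helper, only used to inspect the calls

def f(values):
    # Record the type of object that aggregate hands to the function
    seen_types.append(type(values).__name__)
    return values.sum()

result = table.groupby('A').aggregate(f)
print(set(seen_types))
print(result)
```

With a single grouping key and no extra arguments, each call should receive one column of one group, which is why x and y are never available together inside f.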

So, how can I write an "aggregation" function that acts on multiple columns in a split-apply-combine style?

Asked By: core_exe

Answer 1

If you need to process multiple columns together, use GroupBy.apply:

def f(x):
    # x is the sub-DataFrame for one group, with all of its columns
    print(x)

grouped = table.groupby('A').apply(f)
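As a minimal sketch of how this can be applied to the regression in the question, assuming numpy.polyfit with degree 1 is an acceptable way to get the slope and intercept (the name fit_line is hypothetical):

```python
import numpy as np
import pandas as pd

# The example table from the question
table = pd.DataFrame({
    'A': ['foo'] * 4 + ['bar'] * 4,
    'x': [0, 1, 2, 3, 0, 2, 4, 6],
    'y': [0, 1, 2, 3, 2, 3, 4, 5],
})

def fit_line(group):
    # group is a DataFrame holding the x and y columns of one group;
    # a degree-1 polyfit returns the coefficients as [slope, intercept]
    k, b = np.polyfit(group['x'], group['y'], 1)
    return pd.Series({'k': k, 'b': b})

# Select the value columns explicitly so only x and y reach fit_line
fits = table.groupby('A')[['x', 'y']].apply(fit_line)
print(fits)
```

Because fit_line returns a Series, the per-group results are combined into one DataFrame indexed by A with columns k and b.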

Answered By: jezrael

Answer 2

Use groupby.apply with scipy.stats.linregress:

import pandas as pd
from scipy.stats import linregress

out = (df.groupby('A', as_index=False)
         .apply(lambda g: pd.Series(linregress(g['x'], g['y'])[:2],
                                    index=['k', 'b']))
       )

NB. The first two values returned by linregress are the slope and the intercept, i.e. your k and b.

Output:

     A    k    b
0  bar  0.5  2.0
1  foo  1.0  0.0

Solution with a custom function:

import pandas as pd
from scipy.stats import linregress

def f(x):
    # x is the sub-DataFrame for one group
    t = linregress(x['x'], x['y'])
    return pd.Series({'k': t.slope, 'b': t.intercept})

out = df.groupby('A', as_index=False).apply(f)
print(out)

Output:

     A    k    b
0  bar  0.5  2.0
1  foo  1.0  0.0

Answered By: mozway
