Passing parameters to a function when aggregating using pandas and NumPy in Python

Question:

I have the following code and data frame:

import pandas as pd
import numpy as np

df = pd.DataFrame({
      'A': [1, 2, 3, 4, 5],
      'B': [6, 7, 8, 9, 10]})

I want to calculate the 0.25 quantile for the column ‘A’ and the 0.75 quantile for the column ‘B’ using np.quantile. I tried the following code:

df.agg({'A' : lambda x: np.quantile(a=x, q=0.25),
        'B' : lambda x: np.quantile(a=x, q=0.75)})

I obtain the following result:

     A     B
 0  1.0   6.0
 1  2.0   7.0
 2  3.0   8.0
 3  4.0   9.0
 4  5.0  10.0

However, I was expecting the following result or something similar:

A    2.0
B    9.0
dtype: float64

The problem seems to be that the lambda functions are applied to each element of the series rather than to the series as a whole, so each quantile is computed from a single value.
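
A quick way to see why the output mirrors the input is to call np.quantile on single values, which is what happens if the lambda is applied element by element. This is a minimal sketch using NumPy only; whether pandas tries the element-wise path first may depend on the pandas version, so treat that part as an assumption.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# The quantile of a single value is that value itself (as a float),
# whatever q is, so an element-wise call just echoes the column.
per_element = [float(np.quantile(a=v, q=0.25)) for v in values]
print(per_element)

# Called once on the whole column, the same function reduces it to one number.
whole_column = float(np.quantile(a=np.array(values), q=0.25))
print(whole_column)
```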

My question is: how can I obtain the expected result using the agg function from pandas and the quantile function from numpy, passing a different parameter for each column through lambda functions?

I have already read the posts "Python Pandas: Passing Multiple Functions to agg() with Arguments" and "Specifying arguments to pandas aggregate function", but their solutions only work when the data is grouped, not on an ungrouped data frame.

Asked By: luifrancgom


Answers:

This works well with Series.quantile:

df.agg({'A': lambda s: s.quantile(0.25),
        'B': lambda s: s.quantile(0.75)})

With numpy.quantile, you need to pass numpy arrays, not Series:

df.agg({'A' : lambda x: np.quantile(a=x.values, q=0.25),
        'B' : lambda x: np.quantile(a=x.values, q=0.75)})

Output:

A    2.0
B    9.0
dtype: float64

Answered By: mozway
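
If the quantile level differs per column, the functions can be generated from a mapping instead of being written by hand. The sketch below is one possible way to do it, not the answerer's method; `make_quantile` and `levels` are hypothetical names introduced here and are not part of pandas. A factory function is used instead of a lambda inside a loop so that each generated function keeps its own `q`.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10]})

def make_quantile(q):
    """Return a function that computes the q-quantile of a Series (hypothetical helper)."""
    def _quantile(s):
        # Series.quantile only exists on a Series, so this cannot be
        # evaluated on a single element by mistake.
        return s.quantile(q)
    return _quantile

# Hypothetical mapping of column name to quantile level.
levels = {'A': 0.25, 'B': 0.75}

result = df.agg({col: make_quantile(q) for col, q in levels.items()})
print(result)
```

The result should be a Series indexed by column name, matching the output shown in the answer above.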

You missed the axis parameter:

df.agg({'A' : lambda x: np.quantile(a=x, q=0.25, axis=0),
        'B' : lambda x: np.quantile(a=x, q=0.75, axis=0)})

A    2.0
B    9.0
dtype: float64

You can also use a partial function:

from functools import partial

q25 = partial(np.quantile, q=0.25, axis=0)
q75 = partial(np.quantile, q=0.75, axis=0)

df.agg({'A': q25, 'B': q75})

Answered By: Corralien
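
The following sketch, using NumPy only, illustrates what `axis=0` changes and that a `partial` object can stand in for the lambdas. With `axis=0`, a single value is rejected because it has no axis 0, while a one-dimensional array is reduced as usual. That pandas then calls the function on the whole column after the per-value call fails is an assumption about its fallback behaviour and may vary between versions.

```python
from functools import partial

import numpy as np

q25 = partial(np.quantile, q=0.25, axis=0)
q75 = partial(np.quantile, q=0.75, axis=0)

# A scalar is zero-dimensional, so asking for axis 0 raises an error
# (NumPy's AxisError, which is a subclass of ValueError).
try:
    q25(3)
    scalar_fails = False
except ValueError:
    scalar_fails = True

# On one-dimensional input the bound q and axis are applied as expected.
a_q25 = float(q25(np.array([1, 2, 3, 4, 5])))
b_q75 = float(q75(np.array([6, 7, 8, 9, 10])))
print(scalar_fails, a_q25, b_q75)
```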