Passing parameters to a function when aggregating with pandas and NumPy in Python
Question:
I have the following code and data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [6, 7, 8, 9, 10]})
I want to calculate the 0.25 quantile (25th percentile) of column ‘A’ and the 0.75 quantile (75th percentile) of column ‘B’ using np.quantile. I tried the following code:
(df.
agg({'A' : lambda x: np.quantile(a=x, q=0.25),
'B' : lambda x: np.quantile(a=x, q=0.75)}))
I obtain the following result:
A B
0 1.0 6.0
1 2.0 7.0
2 3.0 8.0
3 4.0 9.0
4 5.0 10.0
However, I was expecting the following result, or something similar:
A 2.0
B 9.0
dtype: float64
The problem is that the lambda functions are computing the quantile of each element of the series, rather than of the series as a whole.
My question is: how can I obtain the expected result using the agg function from pandas and the quantile function from numpy, passing a different parameter for each column through lambda functions?
I have already read the posts "Python Pandas: Passing Multiple Functions to agg() with Arguments" and "Specifying arguments to pandas aggregate function", but their solutions only work when the data is grouped.
Answers:
This works well with Series.quantile:
df.agg({'A': lambda s: s.quantile(0.25),
'B': lambda s: s.quantile(0.75)})
With numpy.quantile, pass the underlying NumPy arrays (x.values) rather than the Series:
df.agg({'A' : lambda x: np.quantile(a=x.values, q=0.25),
'B' : lambda x: np.quantile(a=x.values, q=0.75)})
Output:
A 2.0
B 9.0
dtype: float64
You missed the axis parameter:
>>> df.agg({'A' : lambda x: np.quantile(a=x, q=0.25, axis=0),
'B' : lambda x: np.quantile(a=x, q=0.75, axis=0)})
A 2.0
B 9.0
dtype: float64
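One possible explanation for why adding axis=0 changes the result is sketched below. It assumes a pandas version in which agg first tries the function on each element and only falls back to the whole column if that call raises; this is an assumption about pandas internals, and newer releases may behave differently. The helper name fails_on_scalar is hypothetical and used only for illustration.

```python
import numpy as np

# A single element is treated as a 0-d array, so its "quantile" is the value itself.
# This matches the element-wise output shown in the question.
scalar_result = float(np.quantile(3, 0.25))


def fails_on_scalar(value):
    """Return True if np.quantile rejects a scalar when axis=0 is requested."""
    try:
        np.quantile(value, 0.25, axis=0)
    except ValueError:  # NumPy's AxisError subclasses ValueError
        return True
    return False


# A 0-d array has no axis 0, so the call raises instead of returning the element.
scalar_rejected = fails_on_scalar(3)
```

Under that assumption, the error on a scalar is what pushes pandas to call the function on the full column, where axis=0 is valid.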
You can also use a partial function:
from functools import partial
q25 = partial(np.quantile, q=0.25, axis=0)
q75 = partial(np.quantile, q=0.75, axis=0)
df.agg({'A': q25, 'B': q75})
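If the exact behaviour of agg on your pandas version is in doubt, a minimal sketch that sidesteps agg and calls np.quantile once per column is shown here. This is an alternative to the answers above, not a use of agg, and the helper name column_quantiles is hypothetical.

```python
import numpy as np
import pandas as pd


def column_quantiles(frame, quantiles):
    """Compute one quantile per column from a {column: q} mapping."""
    return pd.Series(
        {
            # Each column is converted to a NumPy array and reduced to one number.
            col: float(np.quantile(frame[col].to_numpy(), q))
            for col, q in quantiles.items()
        }
    )


df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [6, 7, 8, 9, 10]})

result = column_quantiles(df, {'A': 0.25, 'B': 0.75})
```

Because each column is reduced explicitly, the result is a Series with one value per column whether or not agg would apply the function element by element.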