pandas pivot_table percentile / quantile
Question:
Is it possible to use percentile or quantile as the aggfunc in a pandas pivot table? I’ve tried both numpy.percentile and pandas quantile without success.
Answers:
Dummy data:
In [135]: df = pd.DataFrame([['a',2,3],
                             ['a',5,6],
                             ['a',7,8],
                             ['b',9,10],
                             ['b',11,12],
                             ['b',13,14]], columns=list('abc'))
np.percentile seems to work just fine?
In [140]: df.pivot_table(columns='a', aggfunc=lambda x: np.percentile(x, 50))
Out[140]:
a  a   b
b  5  11
c  6  12
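The session above can be reproduced as a standalone script. This is a minimal sketch: the names `median_np` and `median_pd` are made up for illustration. It also shows one possible reason the original attempts failed (an assumption, since the question does not include the failing code): `np.percentile` needs the percentile `q` as a second argument, so it cannot be passed bare as `aggfunc` and has to be wrapped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['a', 2, 3],
                   ['a', 5, 6],
                   ['a', 7, 8],
                   ['b', 9, 10],
                   ['b', 11, 12],
                   ['b', 13, 14]], columns=list('abc'))

# np.percentile requires q, so calling it with the data alone raises TypeError.
try:
    np.percentile(df['b'])
    bare_call_failed = False
except TypeError:
    bare_call_failed = True

# Wrapping the call in a lambda fixes q (here the 50th percentile).
median_np = df.pivot_table(columns='a', aggfunc=lambda x: np.percentile(x, 50))

# The pandas equivalent uses a fraction between 0 and 1 instead of a percentage.
median_pd = df.pivot_table(columns='a', aggfunc=lambda x: x.quantile(0.5))

print(median_np)
print(median_pd)
```

Note the different scales: `np.percentile` takes 0 to 100, while `Series.quantile` takes 0 to 1.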
The lambda solution works, but it produces column names such as "<lambda_0>", which need to be renamed later.
Instead of using a lambda (i.e. an unnamed function), we can define our own named functions. Each one should operate on a Series of values.
import numpy as np
import pandas as pd

df = pd.DataFrame([['a',2,3],
                   ['a',5,6],
                   ['a',7,8],
                   ['b',9,10],
                   ['b',11,12],
                   ['b',13,14]], columns=list('abc'))

# Each aggregation function receives a Series of values
def quantile_25(growth_vals: pd.Series):
    return growth_vals.quantile(.25)

def quantile_75(growth_vals: pd.Series):
    return growth_vals.quantile(.75)

df.pivot_table(columns='a', aggfunc=[quantile_25, np.median, quantile_75])
The resulting column names will correspond to the function names.
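If many quantiles are needed, writing one function per quantile gets repetitive. A possible variation, sketched here under the assumption that pivot_table labels each result by the function's `__name__` (as the named functions above suggest), is a small factory. `make_quantile` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def make_quantile(q):
    """Return a named aggregation function for the q-th quantile (0 <= q <= 1)."""
    def _quantile(values: pd.Series):
        return values.quantile(q)
    # Give the closure a descriptive name so the output column is labeled by it.
    _quantile.__name__ = f"quantile_{round(q * 100)}"
    return _quantile

df = pd.DataFrame([['a', 2, 3],
                   ['a', 5, 6],
                   ['a', 7, 8],
                   ['b', 9, 10],
                   ['b', 11, 12],
                   ['b', 13, 14]], columns=list('abc'))

result = df.pivot_table(
    columns='a',
    aggfunc=[make_quantile(0.25), make_quantile(0.5), make_quantile(0.75)],
)
print(result)
```

The columns of `result` form a two-level index: the function name on top and the values of column `a` underneath, so a single cell can be read with something like `result.loc['b', ('quantile_25', 'a')]`.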