# Category assigning based on percentile

## Question:

I have the following dataframe

``````
Group Country GDP
A     a       ***
A     b       ***
B     a       ***
B     b       ***
``````

I want to assign a category (high or low) to each GDP value, based on its percentile rank within its group, by creating a new column.
This is what I tried:

``````
def c(gr):
    ser = gr['gdp']
    p = np.nanpercentile(ser, 50)
    for i in ser:
        if i > p:
            return "high"
        else:
            return "low"

grouped = df.groupby('Group')
df['perf'] = grouped.apply(c)
``````

The `perf` column comes back as NaN. What am I doing wrong here?

## Answers:

Use `quantile` with `numpy.where` inside a custom function:

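``````
def c(gr):
    ser = gr['gdp']
    # q=0.5 (the median) is the default, so it can be omitted
    p = ser.quantile()
    # assign returns a new frame instead of modifying the group in place
    return gr.assign(perf=np.where(ser > p, 'high', 'low'))

# group_keys=False keeps the original row index; newer pandas versions
# otherwise prepend 'Group' as an extra index level
df = df.groupby('Group', group_keys=False).apply(c)
``````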
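
For context on why the attempt in the question returns NaN, here is a minimal sketch on a small made-up frame (the values are illustrative, not the asker's data). The function returns during the first loop iteration, so `apply` yields a single label per group, indexed by the group keys. Assigning that result to a column aligns on the index, and none of the group keys match the row labels of `df`. For brevity the function here takes the `gdp` Series directly rather than the whole group.

```python
import numpy as np
import pandas as pd

# Illustrative data only
df = pd.DataFrame({'Group': ['A', 'A', 'B', 'B'],
                   'gdp': [1.0, 3.0, 2.0, 5.0]})

def first_label(ser):
    p = np.nanpercentile(ser, 50)
    for i in ser:
        # returns on the first element, so the rest of the group is never seen
        return "high" if i > p else "low"

# one value per group, indexed by 'A' and 'B' rather than by row
per_group = df.groupby('Group')['gdp'].apply(first_label)

# column assignment aligns on the index: labels 0..3 never match 'A'/'B'
df['perf'] = per_group
print(per_group.index.tolist())
print(df['perf'].isna().all())
```

The fix is therefore to return something with the same index as the rows (as the function above does) or to broadcast a per-group value back to the rows.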

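This can be simplified with `transform`, which broadcasts each group's quantile back to every row of that group:

``````
# q=0.5 (the median) is the default for quantile
q = df.groupby('Group')['gdp'].transform('quantile')
df['perf1'] = np.where(df['gdp'] > q, 'high', 'low')
``````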
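
Since the question mentions a within-group percentile rank, an alternative sketch is to compute that rank explicitly with `rank(pct=True)` on the grouped column. This is a different technique from the `quantile` comparison above, not a rewrite of it, and the two can disagree at the boundary: in a three-row group the middle value has a percentile rank of 2/3 and is labeled high here, while it is not strictly above the median. The data and the column names `pct` and `perf_rank` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data only
df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B'],
                   'gdp': [1.0, 3.0, 2.0, 10.0, 20.0]})

# fraction of the group's non-missing values ranked at or below each row
df['pct'] = df.groupby('Group')['gdp'].rank(pct=True)

# the 0.5 cut-off is an assumption; any other threshold works the same way
df['perf_rank'] = np.where(df['pct'] > 0.5, 'high', 'low')
print(df)
```

Keeping the `pct` column makes it easy to switch to more than two categories later without recomputing anything.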

Sample:

``````
import numpy as np
import pandas as pd

np.random.seed(12)

N = 15
L = list('abcd')
df = pd.DataFrame({'Group': np.random.choice(L, N),
                   'gdp': np.random.rand(N)})
df = df.sort_values('Group').reset_index(drop=True)
# blank out some values to show how missing data is handled
df.loc[[0, 4, 5, 10, 13, 14], 'gdp'] = np.nan
# print(df)
``````

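``````
def c(gr):
    ser = gr['gdp']
    # q=0.5 (the median) is the default, so it can be omitted
    p = ser.quantile()
    # assign returns a new frame instead of modifying the group in place
    return gr.assign(perf=np.where(ser > p, 'high', 'low'))

# group_keys=False keeps the original row index; newer pandas versions
# otherwise prepend 'Group' as an extra index level
df = df.groupby('Group', group_keys=False).apply(c)

q = df.groupby('Group')['gdp'].transform('quantile')
df['perf1'] = np.where(df['gdp'] > q, 'high', 'low')
print(df)
   Group       gdp  perf perf1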
0      a       NaN   low   low
1      a  0.907267  high  high
2      a  0.456051   low   low
3      b  0.675998   low   low
4      b       NaN   low   low
5      b       NaN   low   low
6      b  0.563141   low   low
7      b  0.801265  high  high
8      c  0.372834   low   low
9      c  0.481530  high  high
10     c       NaN   low   low
11     d  0.082524   low   low
12     d  0.725954  high  high
13     d       NaN   low   low
14     d       NaN   low   low
``````
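
Note that rows with a missing `gdp` end up labeled `low` in the output above, because any comparison with NaN evaluates to False. If missing values should stay missing instead (an assumption about the desired behavior, not something the question states), one possible sketch is to mask the labels afterwards. The data here is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data only
df = pd.DataFrame({'Group': list('aaabbb'),
                   'gdp': [np.nan, 0.9, 0.4, 0.7, np.nan, 0.5]})

# quantile skips NaN, so each group's median uses only the present values
q = df.groupby('Group')['gdp'].transform('quantile')
labels = pd.Series(np.where(df['gdp'] > q, 'high', 'low'), index=df.index)

# keep a label only where gdp is present; the other rows become missing
df['perf'] = labels.where(df['gdp'].notna())
print(df)
```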

A similar approach, in the style of R, here using the 0.75 quantile as the threshold:

``````
df['output'] = (df.groupby('Group').gdp
                  .apply(lambda x: np.where(x > x.quantile(0.75), 'High', 'Low'))
                  .apply(pd.Series)
                  .stack()
                  .dropna()
                  .values)

df
Out[333]:
   Group       gdp output
0      a       NaN    Low
1      a  0.772128    Low
2      a  0.070406    Low
3      a  0.859301   High
4      a       NaN    Low
5      a       NaN    Low
6      b  0.681299   High
7      b  0.040839    Low
8      c  0.896475   High
9      c  0.726527    Low
10     c       NaN    Low
11     c  0.244783    Low
12     c  0.563001    Low
13     c       NaN    Low
14     d       NaN    Low
``````
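
The `.apply(pd.Series).stack().dropna().values` chain assigns the labels by position, so it only lines up with the rows while the frame is sorted by `Group`. A sketch of the same 0.75-quantile rule that aligns on the index instead, using `transform` with a lambda, might look like this. The data is made up and deliberately unsorted, and the name `q75` is just for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative, deliberately unsorted data
df = pd.DataFrame({'Group': ['b', 'a', 'b', 'a', 'a', 'b'],
                   'gdp': [0.2, 0.9, 0.8, 0.1, 0.5, 0.4]})

# 0.75 quantile of each row's own group, broadcast back to df's index
q75 = df.groupby('Group')['gdp'].transform(lambda s: s.quantile(0.75))
df['output'] = np.where(df['gdp'] > q75, 'High', 'Low')
print(df)
```

Because `transform` returns a Series with the same index as `df`, no sorting or reshaping is needed before the comparison.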