Categorize data into N categories where each category has the same number of data points but a different interval

Question:

I have a series of stock returns, approximately 5000 data points. I want to sort them into 5 categories, each containing almost the same number of data points.

For example, categorize the following data into 3 categories:

test = pd.DataFrame({'Returns': [0.003,0.005,0.02,0.01,0.1,0.9,-0.2,-0.13,-0.14,-0.03,0,0.001]})

The result should look something like this when calling:

test.value_counts()


Category:   number of data points
0                   4
1                   4
2                   4

The interval widths of the categories can differ.

Asked By: George


Answers:

Try pd.qcut, which bins by quantiles so that each bin holds roughly the same number of values:

test['cate'] = pd.qcut(test.Returns,3).cat.codes
test['cate'].value_counts()
Out[577]: 
0    4
1    4
2    4
Name: cate, dtype: int64
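
For the 5-category case in the question, the same idea can be sketched as follows. The data here is synthetic (normally distributed values standing in for real returns) and the variable names are illustrative assumptions, not part of the original question. Passing labels=False returns the integer codes directly, and retbins=True also returns the bin edges so the differing interval widths can be inspected.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for ~5000 stock returns (assumed data, not from the question).
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(loc=0.0, scale=0.02, size=5000), name="Returns")

# labels=False yields integer category codes 0..4;
# retbins=True additionally returns the quantile edges that delimit the bins.
codes, edges = pd.qcut(returns, q=5, labels=False, retbins=True)

# Number of observations per category (roughly equal by construction).
counts = codes.value_counts().sort_index()

# Width of each interval; these generally differ from bin to bin.
widths = np.diff(edges)

print(counts)
print(widths)
```

If many returns are exactly equal (for example a large number of zeros), some quantile edges can coincide and pd.qcut raises a ValueError. Passing duplicates='drop' avoids the error, at the cost of possibly producing fewer categories than requested.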
Answered By: BENY