Create new column and assign values from 1 to 100 based on percentiles

Question:

I am very new to pandas and to Python in general.
I have a dataframe with many columns, one of which is score_init:

+----------+
|score_init|
+----------+
|     38.27|
|     39.27|
|     29.16|
|     32.60|
|     40.45|
|     19.49|
|     48.27|
|     14.25|
|     32.64|
|     32.97|
+----------+

For this column I need to calculate the percentiles from 0 to 100 and, based on them, create an additional column (score_new): score_init values between the 99th and 100th percentiles (99% < score_init <= 100%) are assigned 1, values between the 98th and 99th percentiles (98% < score_init <= 99%) are assigned 2, values between the 97th and 98th percentiles (97% < score_init <= 98%) are assigned 3, and so on.
Values greater than the 100th percentile are also assigned 1, and values less than the 0th percentile are assigned 100.
Desired output:

+----------+----------+
|score_init| score_new|
+----------+----------+
|     38.27|        34|
|     39.27|        23|
|     29.16|        78|
|     32.60|        67|
|     40.45|        12|
|     19.49|        89|
|     48.27|         1|
|     14.25|       100|
|     32.64|        56|
|     32.97|        45|
+----------+----------+

UPDATED
Entire column values:

102, 25, 63, 9, 12, 17, 16, 5, 5, 13, 19, 10, 16, 9, 12, 5, 11, 13, 36, 30, 9, 4, 12, 21, 10, 32, 7, 23, 16, 39, 158, 600, 1682, 125, 99, 116, 22, 10, 60, 15, 8, 6, 7, 26, 19, 10, 8, 15, 34, 4, 33, 13, 15, 16, 16, 16, 5, 5, 73, 76, 102, 8, 9, 19, 64, 9, 8, 6, 9, 6, 70, 58, 42, 19, 40, 23, 27, 741, 47, 26, 24, 32, 22, 46, 51, 67, 39, 3, 26, 139, 25, 20, 16, 11, 100, 19, 22, 8, 47, 46, 16, 5, 42, 28, 19, 10, 9, 54, 133, 29, 39, 8, 10, 49, 112, 13, 8, 10, 18, 134, 6, 153, 11, 10, 8, 20, 37, 17, 17, 28, 34, 17, 15, 19, 29, 92, 16, 12, 15, 14, 46, 23, 10, 5, 32, 11, 19, 18, 5, 64, 42, 5, 62, 40, 12, 12, 14, 26, 10, 20, 26, 33, 23, 23, 24, 54, 18, 15, 24, 32, 34, 14, 59, 3, 45, 4, 19, 11, 4, 4, 5, 7, 6, 10, 5, 6, 23, 7, 17, 24, 41, 4, 5, 7, 7, 20, 18, 9, 4, 115, 6, 22, 5, 5, 145, 25, 37, 16, 22, 138, 48, 9, 16, 5, 4, 4, 9, 203, 130, 200, 94, 35, 5, 4, 14, 11, 20, 10, 21, 65, 15, 12, 48, 18
Asked By: Hilary


Answers:

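Use pd.qcut to cut the column into 100 quantile bins, then reverse the bin codes so that the highest bin maps to 1 and the lowest to 100: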

# assumes `import pandas as pd` and a DataFrame df with a score_init column
df['scores_new'] = 100 - pd.qcut(df['score_init'], 100).cat.codes

output:

   score_init  scores_new
0       38.27          34
1       39.27          23
2       29.16          78
3       32.60          67
4       40.45          12
5       19.49          89
6       48.27           1
7       14.25         100
8       32.64          56
9       32.97          45
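
For anyone who wants to run this end to end, here is a minimal self-contained sketch using the ten sample values from the question. The intermediate name `bins` is only illustrative and is not part of the original answer.

```python
import pandas as pd

# Sample column copied from the question.
df = pd.DataFrame({
    "score_init": [38.27, 39.27, 29.16, 32.60, 40.45,
                   19.49, 48.27, 14.25, 32.64, 32.97]
})

# qcut splits the values into 100 quantile bins; .cat.codes numbers those bins
# from 0 (lowest) to 99 (highest), so subtracting from 100 reverses the scale.
bins = pd.qcut(df["score_init"], 100)
df["scores_new"] = 100 - bins.cat.codes

print(df)
```

The smallest value lands in bin 0 and therefore gets 100, while the largest lands in bin 99 and gets 1, which matches the table above.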

You can generalize this to any number of quantiles, for example quartiles:

N = 4
df['scores_new'] = N - pd.qcut(df['score_init'], N).cat.codes
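
The generalization can be wrapped in a small function. This sketch has one caveat: by default pd.qcut raises a ValueError when the computed bin edges are not unique, which is likely for a column with many repeated values such as the full data posted in the question. The helper name `quantile_scores` and its `break_ties` flag are hypothetical, and ranking with method="first" before binning is an assumed workaround rather than part of the original answer. It also gives tied values different scores, which may or may not be acceptable.

```python
import pandas as pd


def quantile_scores(values: pd.Series, n: int, break_ties: bool = False) -> pd.Series:
    """Hypothetical helper: score 1 for the top quantile bin, n for the bottom."""
    if break_ties:
        # Assumption: replace each value by its rank, with ties broken by
        # order of appearance, so that all bin edges are distinct.
        values = values.rank(method="first")
    codes = pd.qcut(values, n).cat.codes  # 0 = lowest bin, n - 1 = highest
    return n - codes


# Quartiles on the sample column from the question.
sample = pd.Series([38.27, 39.27, 29.16, 32.60, 40.45,
                    19.49, 48.27, 14.25, 32.64, 32.97])
quartiles = quantile_scores(sample, 4)

# A small made-up column with repeated values; without break_ties=True,
# qcut would reject it because two quartile edges coincide.
repeated = pd.Series([5, 5, 5, 5, 1, 2, 3, 9])
tie_broken = quantile_scores(repeated, 4, break_ties=True)

print(quartiles.tolist())
print(tie_broken.tolist())
```

If tied values must share a score, passing duplicates="drop" to pd.qcut is another option, but it merges bins, so the codes no longer run from 0 to n - 1 and the subtraction would need adjusting.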
Answered By: mozway