Create new column and assign values from 1 to 100 based on percentiles
Question:
I am very new to pandas and Python in general.
I have a dataframe with many columns; one of them is score_init:
+----------+
|score_init|
+----------+
| 38.27|
| 39.27|
| 29.16|
| 32.60|
| 40.45|
| 19.49|
| 48.27|
| 14.25|
| 32.64|
| 32.97|
+----------+
For this column I need to calculate percentiles from 0 to 100 and, based on these values, create an additional column (score_new) where all score_init values that lie between the 99th and 100th percentiles (99% < score_init <= 100%) are assigned 1, those between the 98th and 99th percentiles (98% < score_init <= 99%) are assigned 2, those between the 97th and 98th percentiles (97% < score_init <= 98%) are assigned 3, and so on.
For values greater than the 100th percentile, also assign 1; for values less than the 0th percentile, assign 100.
Desired output:
+----------+----------+
|score_init| score_new|
+----------+----------+
| 38.27| 34|
| 39.27| 23|
| 29.16| 78|
| 32.60| 67|
| 40.45| 12|
| 19.49| 89|
| 48.27| 1|
| 14.25| 100|
| 32.64| 56|
| 32.97| 45|
+----------+----------+
UPDATED
Entire column values:
102, 25, 63, 9, 12, 17, 16, 5, 5, 13, 19, 10, 16, 9, 12, 5, 11, 13, 36, 30, 9, 4, 12, 21, 10, 32, 7, 23, 16, 39, 158, 600, 1682, 125, 99, 116, 22, 10, 60, 15, 8, 6, 7, 26, 19, 10, 8, 15, 34, 4, 33, 13, 15, 16, 16, 16, 5, 5, 73, 76, 102, 8, 9, 19, 64, 9, 8, 6, 9, 6, 70, 58, 42, 19, 40, 23, 27, 741, 47, 26, 24, 32, 22, 46, 51, 67, 39, 3, 26, 139, 25, 20, 16, 11, 100, 19, 22, 8, 47, 46, 16, 5, 42, 28, 19, 10, 9, 54, 133, 29, 39, 8, 10, 49, 112, 13, 8, 10, 18, 134, 6, 153, 11, 10, 8, 20, 37, 17, 17, 28, 34, 17, 15, 19, 29, 92, 16, 12, 15, 14, 46, 23, 10, 5, 32, 11, 19, 18, 5, 64, 42, 5, 62, 40, 12, 12, 14, 26, 10, 20, 26, 33, 23, 23, 24, 54, 18, 15, 24, 32, 34, 14, 59, 3, 45, 4, 19, 11, 4, 4, 5, 7, 6, 10, 5, 6, 23, 7, 17, 24, 41, 4, 5, 7, 7, 20, 18, 9, 4, 115, 6, 22, 5, 5, 145, 25, 37, 16, 22, 138, 48, 9, 16, 5, 4, 4, 9, 203, 130, 200, 94, 35, 5, 4, 14, 11, 20, 10, 21, 65, 15, 12, 48, 18
Answers:
Use qcut:
df['scores_new'] = 100 - pd.qcut(df['score_init'], 100).cat.codes
output:
score_init scores_new
0 38.27 34
1 39.27 23
2 29.16 78
3 32.60 67
4 40.45 12
5 19.49 89
6 48.27 1
7 14.25 100
8 32.64 56
9 32.97 45
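For reference, here is a self-contained sketch of the one-liner above, assuming the ten sample values from the question are the whole column. The intermediate name codes is only for illustration.

```python
import pandas as pd

# Sample column from the question.
df = pd.DataFrame({"score_init": [38.27, 39.27, 29.16, 32.60, 40.45,
                                  19.49, 48.27, 14.25, 32.64, 32.97]})

# qcut splits the column into 100 quantile-based bins; .cat.codes gives each
# row's bin index (0 = lowest bin, 99 = highest bin).
codes = pd.qcut(df["score_init"], 100).cat.codes

# Flip the scale so the highest bin maps to 1 and the lowest to 100.
df["scores_new"] = 100 - codes
print(df)
```

The subtraction is what reverses the direction: qcut numbers bins from the bottom up, while the question wants 1 for the top percentile.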
You can generalize to any number of quantiles; for example, for quartiles:
N = 4
df['scores_new'] = N - pd.qcut(df['score_init'], N).cat.codes
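One caveat, offered as a sketch and not as part of the approach above: qcut raises a ValueError ("Bin edges must be unique") when tied values make neighbouring bin edges coincide, which the many repeated values in the full column from the update would most likely trigger. A possible workaround is to bucket by rank instead. Note that this is a different definition of percentile from qcut's interpolated edges, so on untied data the two can give different numbers. The short series below is a made-up stand-in with ties, not the real column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with ties, standing in for the full column.
s = pd.Series([5, 5, 5, 5, 9, 12, 16, 16, 102, 1682], name="score_init")

n = s.count()                    # number of non-missing values
rank = s.rank(method="max")      # 1..n; tied values share their highest rank

# Percentile bucket 1..100: rank/n in (0.99, 1.00] -> 100, (0.98, 0.99] -> 99, ...
bucket = np.ceil(rank * 100 / n)

# Flip so the top bucket becomes 1 and the bottom bucket becomes 100.
# (astype(int) assumes there are no missing values in the column.)
score_new = (101 - bucket).astype(int)
print(pd.DataFrame({"score_init": s, "score_new": score_new}))
```

Computing rank * 100 / n, instead of multiplying a fractional rank by 100, keeps exact bucket boundaries from being pushed up by floating-point rounding.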