Create equally divided ID per group with min and max using Pandas

Question:

I have the following dataframe (sample):

import pandas as pd

min_id = 1
max_id = 10

data = [['A', 2], ['A', 3], ['A', 1], ['A', 4], ['A', 4], ['A', 2],
        ['B', 4], ['B', 5], ['B', 7], ['B', 4], ['B', 2],
        ['C', 1], ['C', 3], ['C', 2], ['C', 1], ['C', 5], ['C', 2], ['C', 1],
        ['D', 1], ['D', 1], ['D', 1], ['D', 1]]
df = pd.DataFrame(data = data, columns = ['group', 'val'])

   group  val
0      A    2
1      A    3
2      A    1
3      A    4
4      A    4
5      A    2
6      B    4
7      B    5
8      B    7
9      B    4
10     B    2
11     C    1
12     C    3
13     C    2
14     C    1
15     C    5
16     C    2
17     C    1
18     D    1
19     D    1
20     D    1
21     D    1

I would like to create a column called "id" that runs from a minimum of 1 (min_id) to a maximum of 10 (max_id) within each group, with equally spaced values in between. The spacing therefore depends on the number of rows in the group. Here is the desired output:

data = [['A', 2, 1], ['A', 3, 2.8], ['A', 1, 4.6], ['A', 4, 6.4], ['A', 4, 8.2], ['A', 2, 10],
        ['B', 4, 1], ['B', 5, 3.25], ['B', 7, 5.5], ['B', 4, 7.75], ['B', 2, 10],
        ['C', 1, 1], ['C', 3, 2.5], ['C', 2, 4], ['C', 1, 5.5], ['C', 5, 7], ['C', 2, 8.5], ['C', 1, 10],
        ['D', 1, 1], ['D', 1, 4], ['D', 1, 7], ['D', 1, 10]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'val', 'id'])

   group  val     id
0      A    2   1.00
1      A    3   2.80
2      A    1   4.60
3      A    4   6.40
4      A    4   8.20
5      A    2  10.00
6      B    4   1.00
7      B    5   3.25
8      B    7   5.50
9      B    4   7.75
10     B    2  10.00
11     C    1   1.00
12     C    3   2.50
13     C    2   4.00
14     C    1   5.50
15     C    5   7.00
16     C    2   8.50
17     C    1  10.00
18     D    1   1.00
19     D    1   4.00
20     D    1   7.00
21     D    1  10.00

Does anyone know how to create the column "id" automatically using pandas? Please note that the number of rows could be far greater than in the sample dataframe.

Asked By: Quinten

||

Answers:

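Use GroupBy.transform with numpy.linspace, which returns len(x) evenly spaced values from min_id to max_id (both endpoints included) for each group:

import numpy as np

df['ID'] = df.groupby('group')['group'].transform(lambda x: np.linspace(min_id, max_id, len(x)))
print(df)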

   group  val     ID
0      A    2   1.00
1      A    3   2.80
2      A    1   4.60
3      A    4   6.40
4      A    4   8.20
5      A    2  10.00
6      B    4   1.00
7      B    5   3.25
8      B    7   5.50
9      B    4   7.75
10     B    2  10.00
11     C    1   1.00
12     C    3   2.50
13     C    2   4.00
14     C    1   5.50
15     C    5   7.00
16     C    2   8.50
17     C    1  10.00
18     D    1   1.00
19     D    1   4.00
20     D    1   7.00
21     D    1  10.00
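
If the lambda turns out to be slow with many groups, the same values can probably be computed without a Python-level function call per group. The following is only a sketch of that idea, not the method above: it takes each row's 0-based position within its group (GroupBy.cumcount) and multiplies it by the per-group step (max_id - min_id) / (n - 1). The small sample frame and the single-row group 'E' are made up for illustration.

```python
import numpy as np
import pandas as pd

min_id, max_id = 1, 10

# Hypothetical sample: groups A and B from the question plus a one-row group E
df = pd.DataFrame({
    'group': list('AAAAAA') + list('BBBBB') + ['E'],
    'val': [2, 3, 1, 4, 4, 2, 4, 5, 7, 4, 2, 9],
})

g = df.groupby('group')['group']
pos = g.cumcount()          # 0-based row position within each group
size = g.transform('size')  # number of rows in each row's group

# Step between consecutive ids; for one-row groups the divisor is set to 1
# to avoid dividing by zero (pos is 0 there, so the id is simply min_id)
step = (max_id - min_id) / (size - 1).where(size > 1, 1)
df['id'] = min_id + pos * step

print(df)
```

Because the ids are built by multiplication rather than by linspace, the last digits may differ slightly due to floating-point rounding, so compare with np.isclose rather than == if exact endpoints matter.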
Answered By: jezrael