Create equally divided ID per group with min and max using Pandas

Question

I have the following dataframe (sample):

import pandas as pd

min_id = 1
max_id = 10

data = [['A', 2], ['A', 3], ['A', 1], ['A', 4], ['A', 4], ['A', 2],
        ['B', 4], ['B', 5], ['B', 7], ['B', 4], ['B', 2],
        ['C', 1], ['C', 3], ['C', 2], ['C', 1], ['C', 5], ['C', 2] ,['C', 1],
        ['D', 1], ['D', 1], ['D', 1], ['D', 1]]
df = pd.DataFrame(data = data, columns = ['group', 'val'])

   group  val
0      A    2
1      A    3
2      A    1
3      A    4
4      A    4
5      A    2
6      B    4
7      B    5
8      B    7
9      B    4
10     B    2
11     C    1
12     C    3
13     C    2
14     C    1
15     C    5
16     C    2
17     C    1
18     D    1
19     D    1
20     D    1
21     D    1

I would like to create a column called "id" which shows the id with a min value of 1 (min_id) and a max value of 10 (max_id) per group. So the values between min and max depend on the number of rows per group. Here you can see the desired output:

data = [['A', 2, 1], ['A', 3, 2.8], ['A', 1, 4.6], ['A', 4, 6.4], ['A', 4, 8.2], ['A', 2, 10],
        ['B', 4, 1], ['B', 5, 3.25], ['B', 7, 5.5], ['B', 4, 7.75], ['B', 2, 10],
        ['C', 1, 1], ['C', 3, 2.5], ['C', 2, 4], ['C', 1, 5.5], ['C', 5, 7], ['C', 2, 8.5] ,['C', 1, 10],
        ['D', 1, 1], ['D', 1, 4], ['D', 1, 7], ['D', 1, 10]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'val', 'id'])

   group  val     id
0      A    2   1.00
1      A    3   2.80
2      A    1   4.60
3      A    4   6.40
4      A    4   8.20
5      A    2  10.00
6      B    4   1.00
7      B    5   3.25
8      B    7   5.50
9      B    4   7.75
10     B    2  10.00
11     C    1   1.00
12     C    3   2.50
13     C    2   4.00
14     C    1   5.50
15     C    5   7.00
16     C    2   8.50
17     C    1  10.00
18     D    1   1.00
19     D    1   4.00
20     D    1   7.00
21     D    1  10.00

So I was wondering if anyone knows how to automatically create the column "id" using pandas? Please note that the number of rows could be way more then in the sample dataframe.

Asked By: Quinten

||

Source

Answer 1

Use GroupBy.transform with numpy.linspace:

df['ID']=df.groupby('group')['group'].transform(lambda x: np.linspace(min_id,max_id,len(x)))
print (df)

   group  val     ID
0      A    2   1.00
1      A    3   2.80
2      A    1   4.60
3      A    4   6.40
4      A    4   8.20
5      A    2  10.00
6      B    4   1.00
7      B    5   3.25
8      B    7   5.50
9      B    4   7.75
10     B    2  10.00
11     C    1   1.00
12     C    3   2.50
13     C    2   4.00
14     C    1   5.50
15     C    5   7.00
16     C    2   8.50
17     C    1  10.00
18     D    1   1.00
19     D    1   4.00
20     D    1   7.00
21     D    1  10.00

Answered By: jezrael

Create equally divided ID per group with min and max using Pandas

Question:

Answers: