Auto re-assign ids in a dataframe

Question

I have the following dataframe:

import pandas as pd
data = {'id': [542588, 542594, 542594, 542605, 542605, 542605, 542630, 542630],
 'label': [3, 3, 1, 1, 2, 0, 0, 2]}

df = pd.DataFrame(data)
df

      id   label
0   542588  3
1   542594  3
2   542594  1
3   542605  1
4   542605  2
5   542605  0
6   542630  0
7   542630  2

The id column contains large (6-digit) integers. I want to simplify it by renumbering the IDs starting from 10, so that 542588 becomes 10, 542594 becomes 11, and so on.

Required output:

   id  label
0  10      3
1  11      3
2  11      1
3  12      1
4  12      2
5  12      0
6  13      0
7  13      2

Asked By: arilwan



Answer 1

You can try groupby().ngroup(), which numbers each distinct id consecutively from 0, and then add 10:

df['id'] = df.groupby('id').ngroup().add(10)

print(df)

   id  label
0  10      3
1  11      3
2  11      1
3  12      1
4  12      2
5  12      0
6  13      0
7  13      2
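If you need this in several places, the one-liner can be wrapped in a small helper. This is a minimal sketch; the function name `reassign_ids` and its `col`/`start` parameters are hypothetical, not part of pandas.

```python
import pandas as pd

def reassign_ids(df: pd.DataFrame, col: str = "id", start: int = 10) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with `col` renumbered from `start`.

    groupby().ngroup() numbers the groups 0..n-1 in ascending key order
    (groupby sorts keys by default); adding `start` shifts that range.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    out[col] = out.groupby(col).ngroup() + start
    return out

data = {'id': [542588, 542594, 542594, 542605, 542605, 542605, 542630, 542630],
        'label': [3, 3, 1, 1, 2, 0, 0, 2]}
df = pd.DataFrame(data)
result = reassign_ids(df)
print(result['id'].tolist())  # [10, 11, 11, 12, 12, 12, 13, 13]
```

Returning a copy rather than modifying `df` in place is just a design choice here; assign back to `df` if you want the original overwritten.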

Answered By: Ynjxsjmh

Answer 2

A naive approach is to loop through the IDs and, each time you encounter an ID you haven’t seen before, map it in a dictionary to a new ID (starting at 10 and incrementing by 1 each time).

You can then replace the values of the id column using the map method.

new_ids = dict()
new_id = 10

for old_id in df['id']:
    if old_id not in new_ids:
        new_ids[old_id] = new_id
        new_id += 1

df['id'] = df['id'].map(new_ids)
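The same first-seen mapping can also be built without an explicit counter. Since Python 3.7, dicts preserve insertion order, so `dict.fromkeys` keeps each ID at its first occurrence, and `enumerate(..., start=10)` numbers them. This is an alternative sketch, not the answerer's code:

```python
import pandas as pd

data = {'id': [542588, 542594, 542594, 542605, 542605, 542605, 542630, 542630],
        'label': [3, 3, 1, 1, 2, 0, 0, 2]}
df = pd.DataFrame(data)

# dict.fromkeys drops duplicate IDs while keeping first-seen order;
# enumerate then assigns 10, 11, 12, ... in that order
new_ids = {old_id: new_id
           for new_id, old_id in enumerate(dict.fromkeys(df['id']), start=10)}

df['id'] = df['id'].map(new_ids)
print(df['id'].tolist())  # [10, 11, 11, 12, 12, 12, 13, 13]
```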

Answered By: 11574713

Answer 3

You can use factorize:

df['id'] = df['id'].factorize()[0] + 10

Output:

   id  label
0  10      3
1  11      3
2  11      1
3  12      1
4  12      2
5  12      0
6  13      0
7  13      2
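factorize actually returns a tuple `(codes, uniques)`; the answer above uses only the codes. Keeping `uniques` gives you a way back from the new IDs to the original ones. A minimal sketch:

```python
import pandas as pd

data = {'id': [542588, 542594, 542594, 542605, 542605, 542605, 542630, 542630],
        'label': [3, 3, 1, 1, 2, 0, 0, 2]}
df = pd.DataFrame(data)

# codes: 0-based integer per row; uniques: distinct ids in first-seen order
codes, uniques = df['id'].factorize()
df['id'] = codes + 10

# Reverse lookup: new id n corresponds to uniques[n - 10]
original = uniques.take((df['id'] - 10).to_numpy())
print(original.tolist())  # [542588, 542594, 542594, 542605, 542605, 542605, 542630, 542630]
```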

Note: factorize enumerates the keys in the order in which they occur in your data, while the groupby().ngroup() solution enumerates them in increasing order. You can get increasing order with factorize by sorting the data first; conversely, you can get the data order with groupby() by passing sort=False to it.
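To see the difference on data whose IDs are not already sorted, here is a sketch on a small hypothetical Series. It also uses factorize's `sort=True` parameter, which gives increasing order without sorting the data yourself:

```python
import pandas as pd

ids = pd.Series([542630, 542588, 542630, 542605])  # hypothetical, deliberately unsorted

# factorize(): numbered in order of first appearance
by_appearance_f = ids.factorize()[0] + 10
# factorize(sort=True): numbered in ascending key order
by_value_f = ids.factorize(sort=True)[0] + 10
# groupby().ngroup(): ascending key order (groupby sorts by default)
by_value_g = ids.groupby(ids).ngroup() + 10
# groupby(sort=False).ngroup(): order of first appearance
by_appearance_g = ids.groupby(ids, sort=False).ngroup() + 10

print(by_appearance_f.tolist())  # [10, 11, 10, 12]
print(by_value_f.tolist())       # [12, 10, 12, 11]
print(by_value_g.tolist())       # [12, 10, 12, 11]
print(by_appearance_g.tolist())  # [10, 11, 10, 12]
```

With the question's data both orders coincide, because the IDs there are already sorted.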

Answered By: Quang Hoang
