Auto re-assign ids in a dataframe

Question:

I have the following dataframe:

import pandas as pd
data = {'id': [542588, 542594, 542594, 542605, 542605, 542605, 542630, 542630],
 'label': [3, 3, 1, 1, 2, 0, 0, 2]}

df = pd.DataFrame(data)
df

       id  label
0  542588      3
1  542594      3
2  542594      1
3  542605      1
4  542605      2
5  542605      0
6  542630      0
7  542630      2

The id column contains large (6-digit) integers. I want to simplify them by renumbering from 10, so that 542588 becomes 10, 542594 becomes 11, and so on.

Required output:


   id  label
0  10      3
1  11      3
2  11      1
3  12      1
4  12      2
5  12      0
6  13      0
7  13      2
Asked By: arilwan


Answers:

You can try groupby().ngroup(), which numbers each group of equal ids starting from 0, and then add 10:

df['id'] = df.groupby('id').ngroup().add(10)
print(df)

   id  label
0  10      3
1  11      3
2  11      1
3  12      1
4  12      2
5  12      0
6  13      0
7  13      2
Answered By: Ynjxsjmh

A naive approach is to loop through the IDs: each time you encounter an ID you haven't seen before, associate it in a dictionary with a new ID (starting at 10 and incrementing by 1 each time).

You can then swap out the values of the ID column using the map method.

new_ids = dict()
new_id = 10

for old_id in df['id']:
    if old_id not in new_ids:
        new_ids[old_id] = new_id
        new_id += 1

df['id'] = df['id'].map(new_ids)
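
As a possible shorter variant of the same idea, the dictionary can be built from the unique ids in one step. This is only a sketch: it relies on Series.unique() returning values in order of first appearance, and it rebuilds the question's dataframe so that it runs on its own.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'id': [542588, 542594, 542594, 542605, 542605, 542605, 542630, 542630],
                   'label': [3, 3, 1, 1, 2, 0, 0, 2]})

# unique() keeps the order of first appearance, so enumerate() hands out
# 10, 11, 12, ... in the same order as the explicit loop
new_ids = {old_id: new_id
           for new_id, old_id in enumerate(df['id'].unique(), start=10)}

df['id'] = df['id'].map(new_ids)
print(df)
```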
Answered By: 11574713

You can use factorize:

df['id'] = df['id'].factorize()[0] + 10

Output:

   id  label
0  10      3
1  11      3
2  11      1
3  12      1
4  12      2
5  12      0
6  13      0
7  13      2

Note: factorize enumerates the keys in the order in which they occur in your data, while the groupby().ngroup() solution enumerates the keys in increasing order. You can mimic the increasing order with factorize by sorting the data first, or you can replicate the data order with groupby() by passing sort=False to it.
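
The difference does not show up with the question's data, because the ids there are already in increasing order. A small sketch with made-up, unsorted ids (30, 10, 30, 20, 10 are hypothetical values, not from the question) may make it clearer. Note that it uses factorize(sort=True) rather than sorting the data beforehand, which is an alternative to what is described above.

```python
import pandas as pd

# Hypothetical ids that are NOT in increasing order
df = pd.DataFrame({'id': [30, 10, 30, 20, 10]})

# factorize: numbered by order of first appearance (30, then 10, then 20)
by_appearance = df['id'].factorize()[0] + 10

# groupby().ngroup(): numbered by increasing key (10, then 20, then 30)
by_value = df.groupby('id').ngroup() + 10

# Swapping the two behaviours
fact_sorted = df['id'].factorize(sort=True)[0] + 10          # increasing key order
group_unsorted = df.groupby('id', sort=False).ngroup() + 10  # order of appearance

print(by_appearance.tolist())   # [10, 11, 10, 12, 11]
print(by_value.tolist())        # [12, 10, 12, 11, 10]
print(fact_sorted.tolist())     # [12, 10, 12, 11, 10]
print(group_unsorted.tolist())  # [10, 11, 10, 12, 11]
```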

Answered By: Quang Hoang