Creating an ID for every row based on the observations in each variable


I want to create a system in Python where each observation in a variable is mapped to a number. The numbers from the (in this case) 5 different variables together form a unique code. The first number corresponds to the first variable. When an observation in a different row is the same as one in an earlier row, the same number applies. As illustrated in the example, if Apple appears in rows 1 and 3, both IDs get a '1' as their first number.

The output should be a new column containing the ID. If all the observations in two rows are the same, their IDs will be the same. The picture below shows 5 variables leading to the unique ID on the right, which should be the output.

Example dataset

Asked By: pangelbird



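You can use pd.factorize on each column. It numbers the distinct values in order of first appearance, starting at 0 (hence the 1+ below), and the per-column codes are then joined row by row: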

df['UniqueID'] = (df.apply(lambda x: (1+pd.factorize(x)[0]).astype(str))
                    .agg(''.join, axis=1))

# Output
        Fruit     Toy Letter      Car Country UniqueID
0       Apple    Bear      A  Ferrari  Brazil    11111
1  Strawberry  Blocks      B  Peugeot   Chile    22222
2       Apple  Blocks      C  Renault   China    12333
3      Orange    Bear      D     Saab   China    31443
4      Orange    Bear      D  Ferrari   India    31414
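
If you want to try this end to end, here is a minimal, self-contained sketch. The DataFrame is typed in from the output table above, and `row_ids` is a hypothetical helper name, not part of pandas. It builds the codes with a dict comprehension instead of `df.apply`, which is only a stylistic choice for the sketch. The optional `sep` argument is my own addition: without a separator, a column with more than 9 distinct values could make the joined IDs ambiguous (codes 1, 11 and 11, 1 would both read "111").

```python
import pandas as pd

# Example data typed in from the output table above.
df = pd.DataFrame({
    "Fruit": ["Apple", "Strawberry", "Apple", "Orange", "Orange"],
    "Toy": ["Bear", "Blocks", "Blocks", "Bear", "Bear"],
    "Letter": ["A", "B", "C", "D", "D"],
    "Car": ["Ferrari", "Peugeot", "Renault", "Saab", "Ferrari"],
    "Country": ["Brazil", "Chile", "China", "China", "India"],
})
cols = list(df.columns)

# pd.factorize returns (codes, uniques); codes start at 0 in order of
# first appearance, so add 1 to start numbering at 1.
codes, uniques = pd.factorize(df["Fruit"])
fruit_codes = (codes + 1).tolist()  # [1, 2, 1, 3, 3]


def row_ids(frame, sep=""):
    """Hypothetical helper: factorize every column, then join the codes per row."""
    coded = pd.DataFrame(
        {name: 1 + pd.factorize(frame[name])[0] for name in frame.columns},
        index=frame.index,
    )
    return coded.astype(str).agg(sep.join, axis=1)


df["UniqueID"] = row_ids(df[cols])               # same IDs as in the table above
df["UniqueID_sep"] = row_ids(df[cols], sep="-")  # e.g. "3-1-4-4-3" for row 3
print(df)
```

Note that by default pd.factorize gives missing values the code -1, so with the +1 they would show up as 0 in the ID.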
Answered By: Corralien