How to tokenise list of integers in Python?

Question:

I have a very long list of 195 different integers, but they range from 0 to 2399. For example the number 90 occurs many times, while the number 7 doesn’t show up at all.

list = [90, 110, 113, 88, 90, 110, 90, 1370, 90]

I would like to ‘tokenise’ this, or turn it into a list of integers ranging from 0 to 195, while keeping the unique ID of the different values.

Basically, I’d like this output:

new_list = [1, 2, 3, 4, 1, 2, 1, 5, 1]

The goal is to be able to efficiently iterate over the list.

Asked By: Adam Hede

||

Answers:

As @cricket_007, I question your application. Iteration doesn’t vary with the magnitude of the number. However, if you have reason to need a dense set of IDs, then this is one possible solution. I’ve left the building loop simple to let you see how it works; there are some Pythonic improvements you could make, such as using the dictionary get method.

Build a dictionary to translate old to new IDs.
Then do the translation in one fell swoop.

my_list = [90, 110, 113, 88, 90, 110, 90, 1370, 90]

new_id_dict = {}
new_id = 0

for id in my_list:
    if id not in new_id_dict:
        new_id += 1
        new_id_dict[id] = new_id

new_list = [new_id_dict[id] for id in my_list]
print new_list

Output:

[1, 2, 3, 4, 1, 2, 1, 5, 1]
Answered By: Prune
d={}
new_list = [d[i] for i in values if d.setdefault(i,len(d)+1)]
Answered By: Joran Beasley
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.