# Indexing a list with a unique index

## Question:

I have a list, say `l = [10,10,20,15,10,20]`. I want to assign each unique value a certain “index” to get `[1,1,2,3,1,2]`.

This is my code:

``````
a = list(set(l))
res = [a.index(x) for x in l]
``````

Which turns out to be very slow.

`l` has 1M elements, and 100K unique elements. I have also tried map with lambda and sorting, which did not help. What is the ideal way to do this?

## Answers:

You can do this in `O(N)` time using a `defaultdict` and a list comprehension:

``````
>>> from itertools import count
>>> from collections import defaultdict
>>> lst = [10, 10, 20, 15, 10, 20]
>>> d = defaultdict(count(1).next)
>>> [d[k] for k in lst]
[1, 1, 2, 3, 1, 2]
``````

In Python 3 use `__next__` instead of `next`.
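
For Python 3, the same idea might look like the sketch below. It uses only the standard library; `lst` and `d` are the names from the example above, and `result` is a name introduced here for illustration.

```python
from collections import defaultdict
from itertools import count

lst = [10, 10, 20, 15, 10, 20]

# count(1).__next__ is called only when a key is missing,
# so each new value receives the next integer, starting from 1.
d = defaultdict(count(1).__next__)
result = [d[k] for k in lst]

print(result)   # [1, 1, 2, 3, 1, 2]
print(dict(d))  # {10: 1, 20: 2, 15: 3}
```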

If you’re wondering how it works:

The `default_factory` (here, `count(1).next`) passed to `defaultdict` is called only when a missing key is looked up. The first `10` is a missing key, so it gets the value 1. The next `10` is no longer missing, so the previously stored 1 is returned. `20` is again a missing key, so Python calls the `default_factory` again to get its value (2), and so on.

`d` at the end will look like this:

``````
>>> d
defaultdict(<method-wrapper 'next' of itertools.count object at 0x1057c83b0>,
{10: 1, 20: 2, 15: 3})
``````

The slowness of your code arises because `a.index(x)` performs a linear search and you perform that linear search for each of the elements in `l`. So for each of the 1M items you perform (up to) 100K comparisons.

The fastest way to transform one value to another is looking it up in a map. You’ll need to create the map and fill in the relationship between the original values and the values you want. Then retrieve the value from the map when you encounter another of the same value in your list.

Here is an example that makes a single pass through `l`. There may be room for further optimization to eliminate the need to repeatedly reallocate `res` when appending to it.

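``````
res = []
conversion = {}  # maps each original value to its assigned index
i = 0            # next index to assign (0-based; start at 1 to match the question's output)
for x in l:
    if x not in conversion:
        # first time this value is seen: assign it the next index
        value = conversion[x] = i
        i += 1
    else:
        value = conversion[x]
    res.append(value)
``````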
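
One way to avoid growing `res` through repeated `append` calls is to allocate it once and fill it by position. The following is a sketch of that idea rather than a measured optimization; whether it is actually faster depends on the interpreter. The function name `index_values` is hypothetical.

```python
def index_values(values):
    """Map each distinct value to a 0-based index in order of first appearance."""
    res = [0] * len(values)  # allocate the result list once
    conversion = {}
    for pos, x in enumerate(values):
        # setdefault stores len(conversion) only when x is new;
        # otherwise it returns the index already stored for x.
        res[pos] = conversion.setdefault(x, len(conversion))
    return res


l = [10, 10, 20, 15, 10, 20]
print(index_values(l))  # [0, 0, 1, 2, 0, 1]
```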

Your solution is slow because its complexity is `O(nm)` with `m` being the number of unique elements in `l`: `a.index()` is `O(m)` and you call it for every element in `l`.

To make it `O(n)`, get rid of `index()` and store indexes in a dictionary:

``````
>>> idx, indexes = 1, {}
>>> for x in l:
...     if x not in indexes:
...         indexes[x] = idx
...         idx += 1
...
>>> [indexes[x] for x in l]
[1, 1, 2, 3, 1, 2]
``````

If `l` contains only integers in a known range, you could also store indexes in a list instead of a dictionary for faster lookups.
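
As a sketch of that list-based variant, assuming every element is a non-negative integer no larger than a known bound (the parameter `max_value` and the function name `index_with_table` are illustrative, not part of the answer above):

```python
def index_with_table(values, max_value):
    """Assign 1-based indexes using a list as the lookup table.

    Assumes every element is an integer in the range 0..max_value.
    """
    table = [0] * (max_value + 1)  # 0 means "not seen yet"
    next_idx = 1
    res = []
    for x in values:
        if table[x] == 0:
            table[x] = next_idx
            next_idx += 1
        res.append(table[x])
    return res


l = [10, 10, 20, 15, 10, 20]
print(index_with_table(l, max_value=20))  # [1, 1, 2, 3, 1, 2]
```

The table takes memory proportional to the size of the value range, not to the number of unique values, so this only pays off when the range is reasonably small.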

It depends on whether the indexes have to follow that specific order. If the example must return:

``````
[1,1,2,3,1,2]
``````

then look at the other answers. However, if you only need a unique index for each unique number, there is a fast solution using NumPy:

``````
import numpy as np

l = [10,10,20,15,10,20]
a = np.array(l)
# x holds the sorted unique values; y holds, for each element of a, its position in x
x, y = np.unique(a, return_inverse=True)
``````

For this example, the output `y` is:

``````
y = [0,0,2,1,0,2]
``````

I tested this for 1,000,000 entries and it was done essentially immediately.
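
If the order of first appearance does matter, `np.unique` can still be used by also requesting `return_index` and ranking the unique values by where they first occur. This is a sketch that assumes `l` is a one-dimensional list of numbers; the names `first_pos`, `order`, `rank`, and `result` are introduced here for illustration.

```python
import numpy as np

l = [10, 10, 20, 15, 10, 20]
a = np.array(l)

# uniq: sorted unique values
# first_pos: position in a where each unique value first occurs
# inverse: for each element of a, its position in uniq
uniq, first_pos, inverse = np.unique(a, return_index=True, return_inverse=True)

# Rank the unique values by first occurrence instead of by sorted order.
order = np.argsort(first_pos)
rank = np.empty_like(order)
rank[order] = np.arange(len(order))

result = rank[inverse] + 1  # +1 for 1-based indexes
print(result.tolist())  # [1, 1, 2, 3, 1, 2]
```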

For completeness, you can also do it eagerly:

``````
from itertools import count

wordid = dict(zip(set(list_), count(1)))
``````

This uses a set to obtain the unique words in `list_`, pairs
each of those unique words with the next value from `count()` (which
counts upwards), and constructs a dictionary from the results.
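
To turn that mapping into the per-element result, look each element up in it. Note that a `set` has no defined iteration order, so the indexes are unique per value but need not follow the order of first appearance. A sketch, with sample contents assumed for `list_` and the name `res` introduced for illustration:

```python
from itertools import count

list_ = [10, 10, 20, 15, 10, 20]

# Build the whole mapping first (eagerly), then apply it.
wordid = dict(zip(set(list_), count(1)))
res = [wordid[x] for x in list_]

# Equal values share an index; which value gets which index
# depends on the set's iteration order.
print(res)
```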

Original answer, written by nneonneo.

You can use `collections.OrderedDict` to collect the unique items in order of first appearance, then enumerate them to build a dict that maps each item to its index. Building an `operator.itemgetter()` from the main list and calling it on that dict returns the corresponding index for each item:

``````
>>> from collections import OrderedDict
>>> from operator import itemgetter
>>> itemgetter(*lst)({j:i for i,j in enumerate(OrderedDict.fromkeys(lst),1)})
(1, 1, 2, 3, 1, 2)
``````
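
On Python 3.7 and later, a plain `dict` preserves insertion order, so the same approach can be sketched without `OrderedDict`. The names `positions` and `res` are introduced here for illustration:

```python
lst = [10, 10, 20, 15, 10, 20]

# dict.fromkeys keeps one key per distinct value, in first-appearance order.
positions = {value: i for i, value in enumerate(dict.fromkeys(lst), 1)}
res = [positions[x] for x in lst]

print(res)  # [1, 1, 2, 3, 1, 2]
```

A list comprehension is used here instead of `itemgetter(*lst)` because it avoids unpacking the whole list into call arguments, and because `itemgetter` returns a bare value rather than a tuple when given a single key.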