# How to find all occurrences of an element in a list

## Question:

`index()`

will give the first occurrence of an item in a list. Is there a neat trick which returns all indices in a list for an element?

## Answers:

You can use a list comprehension with `enumerate`

:

```
indices = [i for i, x in enumerate(my_list) if x == "whatever"]
```

The iterator `enumerate(my_list)`

yields pairs `(index, item)`

for each item in the list. Using `i, x`

as loop variable target unpacks these pairs into the index `i`

and the list item `x`

. We filter down to all `x`

that match our criterion, and select the indices `i`

of these elements.

How about:

```
In [1]: l=[1,2,3,4,3,2,5,6,7]
In [2]: [i for i,val in enumerate(l) if val==3]
Out[2]: [2, 4]
```

```
occurrences = lambda s, lst: (i for i,e in enumerate(lst) if e == s)
list(occurrences(1, [1,2,3,1])) # = [0, 3]
```

One more solution(sorry if duplicates) for all occurrences:

```
values = [1,2,3,1,2,4,5,6,3,2,1]
map(lambda val: (val, [i for i in xrange(len(values)) if values[i] == val]), values)
```

While not a solution for lists directly, `numpy`

really shines for this sort of thing:

```
import numpy as np
values = np.array([1,2,3,1,2,4,5,6,3,2,1])
searchval = 3
ii = np.where(values == searchval)[0]
```

returns:

```
ii ==>array([2, 8])
```

This can be significantly faster for lists (arrays) with a large number of elements vs some of the other solutions.

A solution using `list.index`

:

```
def indices(lst, element):
result = []
offset = -1
while True:
try:
offset = lst.index(element, offset+1)
except ValueError:
return result
result.append(offset)
```

It’s much faster than the list comprehension with `enumerate`

, for large lists. It is also much slower than the `numpy`

solution *if* you already have the array, otherwise the cost of converting outweighs the speed gain (tested on integer lists with 100, 1000 and 10000 elements).

**NOTE:** A note of caution based on Chris_Rands’ comment: this solution is faster than the list comprehension if the results are sufficiently sparse, but if the list has many instances of the element that is being searched (more than ~15% of the list, on a test with a list of 1000 integers), the list comprehension is faster.

If you are using Python 2, you can achieve the same functionality with this:

```
f = lambda my_list, value:filter(lambda x: my_list[x] == value, range(len(my_list)))
```

Where `my_list`

is the list you want to get the indexes of, and `value`

is the value searched. Usage:

```
f(some_list, some_element)
```

If you need to search for all element’s positions between *certain indices*, you can state them:

```
[i for i,x in enumerate([1,2,3,2]) if x==2 & 2<= i <=3] # -> [3]
```

You can create a defaultdict

```
from collections import defaultdict
d1 = defaultdict(int) # defaults to 0 values for keys
unq = set(lst1) # lst1 = [1, 2, 2, 3, 4, 1, 2, 7]
for each in unq:
d1[each] = lst1.count(each)
else:
print(d1)
```

`more_itertools.locate`

finds indices for all items that satisfy a condition.

```
from more_itertools import locate
list(locate([0, 1, 1, 0, 1, 0, 0]))
# [1, 2, 4]
list(locate(['a', 'b', 'c', 'b'], lambda x: x == 'b'))
# [1, 3]
```

`more_itertools`

is a third-party library `> pip install more_itertools`

.

Or Use `range`

(python 3):

```
l=[i for i in range(len(lst)) if lst[i]=='something...']
```

For (python 2):

```
l=[i for i in xrange(len(lst)) if lst[i]=='something...']
```

And then (both cases):

```
print(l)
```

Is as expected.

### Getting all the occurrences and the position of one or more (identical) items in a list

With enumerate(alist) you can store the first element (n) that is the index of the list when the element x is equal to what you look for.

```
>>> alist = ['foo', 'spam', 'egg', 'foo']
>>> foo_indexes = [n for n,x in enumerate(alist) if x=='foo']
>>> foo_indexes
[0, 3]
>>>
```

### Let’s make our function findindex

This function takes the item and the list as arguments and return the position of the item in the list, like we saw before.

```
def indexlist(item2find, list_or_string):
"Returns all indexes of an item in a list or a string"
return [n for n,item in enumerate(list_or_string) if item==item2find]
print(indexlist("1", "010101010"))
```

**Output**

```
[1, 3, 5, 7]
```

## Simple

```
for n, i in enumerate([1, 2, 3, 4, 1]):
if i == 1:
print(n)
```

Output:

```
0
4
```

Using filter() in python2.

```
>>> q = ['Yeehaw', 'Yeehaw', 'Googol', 'B9', 'Googol', 'NSM', 'B9', 'NSM', 'Dont Ask', 'Googol']
>>> filter(lambda i: q[i]=="Googol", range(len(q)))
[2, 4, 9]
```

## Using a `for-loop`

:

- Answers with
`enumerate`

and a list comprehension are more pythonic, but not necessarily faster. However, this answer is aimed at students who may not be allowed to use some of those built-in functions. - create an empty list,
`indices`

- create the loop with
`for i in range(len(x)):`

, which essentially iterates through a list of index locations`[0, 1, 2, 3, ..., len(x)-1]`

- in the loop, add any
`i`

, where`x[i]`

is a match to`value`

, to`indices`

```
def get_indices(x: list, value: int) -> list:
indices = list()
for i in range(len(x)):
if x[i] == value:
indices.append(i)
return indices
n = [1, 2, 3, -50, -60, 0, 6, 9, -60, -60]
print(get_indices(n, -60))
>>> [4, 8, 9]
```

- The functions,
`get_indices`

, are implemented with type hints. In this case, the list,`n`

, is a bunch of`int`

s, therefore we search for`value`

, also defined as an`int`

.

## Using a `while-loop`

and `.index`

:

- With
`.index`

, use`try-except`

for error handling, because a`ValueError`

will occur if`value`

is not in the`list`

.

```
def get_indices(x: list, value: int) -> list:
indices = list()
i = 0
while True:
try:
# find an occurrence of value and update i to that index
i = x.index(value, i)
# add i to the list
indices.append(i)
# advance i by 1
i += 1
except ValueError as e:
break
return indices
print(get_indices(n, -60))
>>> [4, 8, 9]
```

Here is a time performance comparison between using `np.where`

vs `list_comprehension`

. Seems like `np.where`

is faster on average.

```
# np.where
start_times = []
end_times = []
for i in range(10000):
start = time.time()
start_times.append(start)
temp_list = np.array([1,2,3,3,5])
ixs = np.where(temp_list==3)[0].tolist()
end = time.time()
end_times.append(end)
print("Took on average {} seconds".format(
np.mean(end_times)-np.mean(start_times)))
```

```
Took on average 3.81469726562e-06 seconds
```

```
# list_comprehension
start_times = []
end_times = []
for i in range(10000):
start = time.time()
start_times.append(start)
temp_list = np.array([1,2,3,3,5])
ixs = [i for i in range(len(temp_list)) if temp_list[i]==3]
end = time.time()
end_times.append(end)
print("Took on average {} seconds".format(
np.mean(end_times)-np.mean(start_times)))
```

```
Took on average 4.05311584473e-06 seconds
```

- There’s an answer using
`np.where`

to find the indices of a single value, which is not faster than a list-comprehension, if the time to convert a list to an array is included - The overhead of importing
`numpy`

and converting a`list`

to a`numpy.array`

probably makes using`numpy`

a less efficient option for most circumstances. A careful timing analysis would be necessary.- In cases where multiple functions/operations will need to be performed on the
`list`

, converting the`list`

to an`array`

, and then using`numpy`

functions will likely be a faster option.

- In cases where multiple functions/operations will need to be performed on the
- This solution uses
`np.where`

and`np.unique`

to find the indices of**all unique elements**in a list.- Using
`np.where`

on an array (including the time to convert the list to an array) is slightly slower than a list-comprehension on a list,**for finding all indices of all unique elements**. - This has been tested on an 2M element list with 4 unique values, and the size of the list/array and number of unique elements will have an impact.

- Using
- Other solutions using
`numpy`

on an array can be found in Get a list of all indices of repeated elements in a numpy array **Tested in**`[python 3.10.4, numpy 1.23.1]`

and`[python 3.11.0, numpy 1.23.4]`

```
import numpy as np
import random # to create test list
# create sample list
random.seed(365)
l = [random.choice(['s1', 's2', 's3', 's4']) for _ in range(20)]
# convert the list to an array for use with these numpy methods
a = np.array(l)
# create a dict of each unique entry and the associated indices
idx = {v: np.where(a == v)[0].tolist() for v in np.unique(a)}
# print(idx)
{'s1': [7, 9, 10, 11, 17],
's2': [1, 3, 6, 8, 14, 18, 19],
's3': [0, 2, 13, 16],
's4': [4, 5, 12, 15]}
```

`%timeit`

on a 2M element list with 4 unique `str`

elements

```
# create 2M element list
random.seed(365)
l = [random.choice(['s1', 's2', 's3', 's4']) for _ in range(2000000)]
```

### Functions

```
def test1():
# np.where: convert list to array and find indices of a single element
a = np.array(l)
return np.where(a == 's1')
def test2():
# list-comprehension: on list l and find indices of a single element
return [i for i, x in enumerate(l) if x == "s1"]
def test3():
# filter: on list l and find indices of a single element
return list(filter(lambda i: l[i]=="s1", range(len(l))))
def test4():
# use np.where and np.unique to find indices of all unique elements: convert list to array
a = np.array(l)
return {v: np.where(a == v)[0].tolist() for v in np.unique(a)}
def test5():
# list comprehension inside dict comprehension: on list l and find indices of all unique elements
return {req_word: [idx for idx, word in enumerate(l) if word == req_word] for req_word in set(l)}
```

### Function Call

```
%timeit test1()
%timeit test2()
%timeit test3()
%timeit test4()
%timeit test5()
```

### Result `python 3.10.4`

```
214 ms ± 19.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
85.1 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
146 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
365 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
360 ms ± 5.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

### Result `python 3.11.0`

```
209 ms ± 15.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
70.4 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
132 ms ± 4.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
371 ms ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
314 ms ± 15.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

A dynamic list comprehension based solution incase we do not know in advance which element:

```
lst = ['to', 'be', 'or', 'not', 'to', 'be']
{req_word: [idx for idx, word in enumerate(lst) if word == req_word] for req_word in set(lst)}
```

results in:

```
{'be': [1, 5], 'or': [2], 'to': [0, 4], 'not': [3]}
```

You can think of all other ways along the same lines as well but with `index()`

you can find only one index although you can set occurrence number yourself.

# Create a generator

Generators are fast and use a tiny memory footprint. They give you flexibility in how you use the result.

```
def indices(iter, val):
"""Generator: Returns all indices of val in iter
Raises a ValueError if no val does not occur in iter
Passes on the AttributeError if iter does not have an index method (e.g. is a set)
"""
i = -1
NotFound = False
while not NotFound:
try:
i = iter.index(val, i+1)
except ValueError:
NotFound = True
else:
yield i
if i == -1:
raise ValueError("No occurrences of {v} in {i}".format(v = val, i = iter))
```

The above code can be use to create a list of the indices: `list(indices(input,value))`

; use them as dictionary keys: `dict(indices(input,value))`

; sum them: `sum(indices(input,value))`

; in a for loop `for index_ in indices(input,value):`

; …etc… *without* creating an interim list/tuple or similar.

In a for loop you will get your next index back when you call for it, without waiting for all the others to be calculated first. That means: if you break out of the loop for some reason you save the time needed to find indices you never needed.

## How it works

- Call
`.index`

on the input`iter`

to find the next occurrence of

`val`

- Use the second parameter to
`.index`

to start at the point

*after*the last found occurrence - Yield the index
- Repeat until
`index`

raises a`ValueError`

## Alternative versions

I tried four different versions for flow control; two EAFP (using `try - except`

) and two TBYL (with a logical test in the `while`

statement):

- "WhileTrueBreak":
`while True:`

…`except ValueError: break`

. Surprisingly, this was usually a touch slower than option 2 and (IMV) less readable - "WhileErrFalse": Using a bool variable
`err`

to identify when a`ValueError`

is raised. This is generally the fastest*and more readable*than 1 - "RemainingSlice": Check whether val is in the remaining part of the input using slicing:
`while val in iter[i:]`

. Unsurprisingly, this does not scale well - "LastOccurrence": Check first where the last occurrence is, keep going
`while i < last`

The overall performance differences between 1,2 and 4 are negligible, so it comes down to personal style and preference. Given that `.index`

uses `ValueError`

to let you know it didn’t find anything, rather than e.g. returning `None`

, an EAFP-approach seems fitting to me.

Here are the 4 code variants and results from `timeit`

(in milliseconds) for different lengths of input and sparsity of matches

```
@version("WhileTrueBreak", versions)
def indices2(iter, val):
i = -1
while True:
try:
i = iter.index(val, i+1)
except ValueError:
break
else:
yield i
@version("WhileErrFalse", versions)
def indices5(iter, val):
i = -1
err = False
while not err:
try:
i = iter.index(val, i+1)
except ValueError:
err = True
else:
yield i
@version("RemainingSlice", versions)
def indices1(iter, val):
i = 0
while val in iter[i:]:
i = iter.index(val, i)
yield i
i += 1
@version("LastOccurrence", versions)
def indices4(iter,val):
i = 0
last = len(iter) - tuple(reversed(iter)).index(val)
while i < last:
i = iter.index(val, i)
yield i
i += 1
```

```
Length: 100, Ocurrences: 4.0%
{'WhileTrueBreak': 0.0074799987487494946, 'WhileErrFalse': 0.006440002471208572, 'RemainingSlice': 0.01221001148223877, 'LastOccurrence': 0.00801000278443098}
Length: 1000, Ocurrences: 1.2%
{'WhileTrueBreak': 0.03101000329479575, 'WhileErrFalse': 0.0278000021353364, 'RemainingSlice': 0.08278000168502331, 'LastOccurrence': 0.03986000083386898}
Length: 10000, Ocurrences: 2.05%
{'WhileTrueBreak': 0.18062000162899494, 'WhileErrFalse': 0.1810499932616949, 'RemainingSlice': 2.9145700042136014, 'LastOccurrence': 0.2049500006251037}
Length: 100000, Ocurrences: 1.977%
{'WhileTrueBreak': 1.9361200043931603, 'WhileErrFalse': 1.7280600033700466, 'RemainingSlice': 254.4725100044161, 'LastOccurrence': 1.9101499929092824}
Length: 100000, Ocurrences: 9.873%
{'WhileTrueBreak': 2.832529996521771, 'WhileErrFalse': 2.9984100023284554, 'RemainingSlice': 1132.4922299943864, 'LastOccurrence': 2.6660699979402125}
Length: 100000, Ocurrences: 25.058%
{'WhileTrueBreak': 5.119729996658862, 'WhileErrFalse': 5.2082200068980455, 'RemainingSlice': 2443.0577100021765, 'LastOccurrence': 4.75954000139609}
Length: 100000, Ocurrences: 49.698%
{'WhileTrueBreak': 9.372120001353323, 'WhileErrFalse': 8.447749994229525, 'RemainingSlice': 5042.717969999649, 'LastOccurrence': 8.050809998530895}
```