Find start and end positions of all occurrences within a string in Python
Question:
If you have a sequence:
example='abcdefabcdefabcdefg'
and you're searching for:
searching_for='abc'
what function would give you a list with all the positions?
positions=[(0,2),(6,8),(12,14)]
I created a list of windows that slides across 'example' three characters at a time, so it goes 'abc', 'bcd', 'cde', ...
windows=['abc', 'bcd', 'cde', 'def', 'efa', 'fab', 'abc', 'bcd', 'cde', 'def', 'efa', 'fab', 'abc', 'bcd', 'cde', 'def']
and used a for loop:
for i in windows:
    if i == 'abc':
That's where I get stuck...
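For reference, the sliding-window idea from the question can be finished by keeping each window's starting index next to the window, which `enumerate` provides. This is a minimal sketch, assuming the window width equals `len(searching_for)` and that end positions are inclusive as in the desired output; `find_with_windows` is a hypothetical helper name.

```python
def find_with_windows(example, searching_for):
    """Return (start, end) pairs, end inclusive, for every occurrence."""
    width = len(searching_for)
    # One substring of length `width` for every possible starting index.
    windows = [example[i:i + width] for i in range(len(example) - width + 1)]
    positions = []
    for i, window in enumerate(windows):
        if window == searching_for:
            # i is where this window starts inside `example`.
            positions.append((i, i + width - 1))
    return positions


print(find_with_windows('abcdefabcdefabcdefg', 'abc'))
# [(0, 2), (6, 8), (12, 14)]
```

Building the windows with `range(len(example) - width + 1)` also produces the final window 'efg', so no starting position is skipped.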
Answers:
You can use regular expressions; the match objects come with position information attached. Example using Python 3:
>>> import re
>>> example = 'abcdefabcdefabcdefg'
>>> for match in re.finditer('abc', example):
...     print(match.start(), match.end())
...
0 3
6 9
12 15
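As the output shows, `match.end()` points one past the last character, so subtracting 1 gives the inclusive pairs the question asks for. Note also that `re.finditer` only reports non-overlapping matches. A minimal sketch of one common workaround, a zero-width lookahead, is below; `find_overlapping` is a hypothetical helper name, and `re.escape` is used on the assumption that the search string should match literally.

```python
import re

def find_overlapping(text, sub):
    """Return inclusive (start, end) pairs, including overlapping occurrences."""
    # The lookahead matches the empty string wherever `sub` begins, so no
    # characters are consumed and a match can start at every position.
    # re.escape makes `sub` match literally rather than as a pattern.
    pattern = '(?=' + re.escape(sub) + ')'
    return [(m.start(), m.start() + len(sub) - 1)
            for m in re.finditer(pattern, text)]

# Plain finditer: non-overlapping matches, end converted to inclusive.
print([(m.start(), m.end() - 1) for m in re.finditer('aa', 'aaaa')])
# [(0, 1), (2, 3)]

print(find_overlapping('aaaa', 'aa'))
# [(0, 1), (1, 2), (2, 3)]

print(find_overlapping('abcdefabcdefabcdefg', 'abc'))
# [(0, 2), (6, 8), (12, 14)]
```

For 'abc' in the question's string there are no overlaps, so both approaches agree; the difference only shows up for search strings that can overlap themselves.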
The re module provides what you need.
import re
print([(m.start(0), m.end(0)) for m in re.finditer('abc', 'abcdefabcdefabcdefg')])
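Because `re.finditer` treats its first argument as a regular expression, a search string containing metacharacters such as `.` or `*` can match more than intended. Wrapping it in `re.escape` makes it match literally. A short sketch with a made-up input string:

```python
import re

text = 'a.c abc a.c'
needle = 'a.c'

# Unescaped: '.' is a regex wildcard, so 'abc' also matches.
loose = [(m.start(0), m.end(0)) for m in re.finditer(needle, text)]

# Escaped: re.escape turns '.' into '\.', so only the literal 'a.c' matches.
literal = [(m.start(0), m.end(0)) for m in re.finditer(re.escape(needle), text)]

print(loose)    # [(0, 3), (4, 7), (8, 11)]
print(literal)  # [(0, 3), (8, 11)]
```

For a plain alphanumeric string like 'abc' the escape changes nothing, so it is cheap insurance when the search string comes from user input.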
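This is elegantly expressed by a list comprehension:
# The end index is inclusive (the last character of the match), as asked.
positions = [(i, i + len(searching_for) - 1)
             for i in range(len(example))
             # startswith(prefix, i) tests at offset i without copying a slice
             if example.startswith(searching_for, i)]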
Note that it's often more useful to have the end index point just past the last character, as Python slices do, rather than at the last character as you asked for (and as the code above provides).
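Without regular expressions or slicing, `str.find` with a start argument can step through the string directly. This is a minimal sketch that returns end-exclusive pairs, following the note above; `find_all` is a hypothetical helper name. It resumes one character past each hit, so overlapping occurrences are included.

```python
def find_all(text, sub):
    """Return (start, end) pairs with `end` pointing just past each occurrence."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append((start, start + len(sub)))
        # Resume one character later so overlapping occurrences are found too.
        start = text.find(sub, start + 1)
    return positions


print(find_all('abcdefabcdefabcdefg', 'abc'))
# [(0, 3), (6, 9), (12, 15)]
```

With end-exclusive pairs, `text[start:end]` gives back exactly the matched substring, which is one practical reason the convention is usually preferred.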