How to use regex to find all overlapping matches

Question:

I’m trying to find every 10-digit series of numbers within a larger series of numbers using re in Python 2.6.

I’m easily able to grab non-overlapping matches, but I want every match in the number series. E.g.

in “123456789123456789”

I should get the following list:

[1234567891,2345678912,3456789123,4567891234,5678912345,6789123456,7891234567,8912345678,9123456789]

I’ve found references to a “lookahead”, but the examples I’ve seen only show pairs of numbers rather than larger groupings and I haven’t been able to convert them beyond the two digits.

Asked By: danspants


Answers:

Use a capturing group inside a lookahead. The lookahead captures the text you’re interested in, but the actual match is the zero-width substring before the lookahead, so the matches themselves are technically non-overlapping:

import re 
s = "123456789123456789"
matches = re.finditer(r'(?=(\d{10}))', s)
results = [int(match.group(1)) for match in matches]
# results: 
# [1234567891,
#  2345678912,
#  3456789123,
#  4567891234,
#  5678912345,
#  6789123456,
#  7891234567,
#  8912345678,
#  9123456789]
Answered By: mechanical_meat
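
To make the "zero-width" point concrete, here is a minimal sketch that records where each match starts and ends alongside the captured group. The variable name `spans` is only illustrative. Every match should have identical start and end positions, since the lookahead consumes no characters, while group 1 carries the ten digits seen from that position.

```python
import re

s = "123456789123456789"

# Each match is an empty string at position i; group 1 holds the ten digits
# that the lookahead saw starting at that position.
spans = [(m.start(), m.end(), m.group(1))
         for m in re.finditer(r'(?=(\d{10}))', s)]

for start, end, digits in spans:
    print(start, end, digits)
```

Because each match is empty, the regex engine advances by one character before trying again, which is what lets the captured groups overlap.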

I’m fond of regexes, but they are not needed here.

Simply

s =  "123456789123456789"

n = 10
li = [ s[i:i+n] for i in xrange(len(s)-n+1) ]
print '\n'.join(li)

Result:

1234567891
2345678912
3456789123
4567891234
5678912345
6789123456
7891234567
8912345678
9123456789
Answered By: eyquem
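
One difference from the regex answers: plain slicing returns every window, whether or not it contains only digits. If the input might contain other characters, a check is needed. The following is a rough sketch under that assumption; `digit_windows` is a hypothetical helper name, not something from the answer above, and `str.isdigit()` is used as the digit test, which is adequate for ASCII input.

```python
def digit_windows(s, n):
    """Return every length-n slice of s that consists only of digits."""
    windows = []
    for i in range(len(s) - n + 1):
        chunk = s[i:i + n]
        # Skip windows that contain anything other than digits.
        if chunk.isdigit():
            windows.append(chunk)
    return windows

plain = digit_windows("123456789123456789", 10)
mixed = digit_windows("12345x6789012345", 10)
print(plain)
print(mixed)
```

For the all-digit string from the question this gives the same nine windows as the slicing one-liner; for the second string only the window after the `x` qualifies.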

You can also try using the third-party regex module (not re), which supports overlapping matches.

>>> import regex as re
>>> s = "123456789123456789"
>>> matches = re.findall(r'\d{10}', s, overlapped=True)
>>> for match in matches: print(match)  # Python 2: print match
...
1234567891
2345678912
3456789123
4567891234
5678912345
6789123456
7891234567
8912345678
9123456789
Answered By: David C

Piggybacking on the accepted answer, the following currently works as well:

import re
s = "123456789123456789"
matches = re.findall(r'(?=(\d{10}))', s)
results = [int(match) for match in matches]
Answered By: Michael
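
The reason `re.findall` works here is that, when the pattern contains exactly one capturing group, it returns that group's text rather than the whole match. A small sketch contrasting the two cases (the variable names are only illustrative):

```python
import re

s = "123456789123456789"

# With a capturing group, findall returns the group's text for each match.
with_group = re.findall(r'(?=(\d{10}))', s)

# Without a group, findall returns the whole match, which is the empty
# string because the lookahead consumes nothing.
without_group = re.findall(r'(?=\d{10})', s)

print(with_group)
print(without_group)
```

Both calls find the same nine positions; only the version with the group returns usable text.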

Conventional way:

import re


S = '123456789123456789'
result = []
while len(S):
    m = re.search(r'\d{10}', S)
    if m:
        result.append(int(m.group()))
        S = S[m.start() + 1:]
    else:
        break
print(result)
Answered By: Avi Cohen
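
A possible variation on the loop above: instead of re-slicing the string on every iteration, a compiled pattern's `search` method accepts a start position, so the search can resume one character after the previous match began. This is only a sketch; `overlapping_matches` is a hypothetical helper name, and the `pos <= len(s)` guard is there so that patterns able to match the empty string cannot loop forever.

```python
import re

def overlapping_matches(pattern, s):
    """Collect matches of pattern in s, restarting one character after each match start."""
    compiled = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(s):
        m = compiled.search(s, pos)
        if m is None:
            break
        results.append(m.group())
        # Resume just past the start of this match so later matches may overlap it.
        pos = m.start() + 1
    return results

numbers = [int(x) for x in overlapping_matches(r'\d{10}', "123456789123456789")]
print(numbers)
```

This avoids copying the remainder of the string each time, which may matter for long inputs.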