How to match part of a string multiple overlapping times with regex

Question:

I need a Python regex that matches parts of a string multiple times, with the matches overlapping:

My String: aa-bbb-c-dd

I would like to have groups like this:

  1. aa-bbb
  2. bbb-c
  3. c-dd

Does somebody have an idea on how to do this?

Asked By: user1383029


Answers:

You can use lookahead to get overlapping matches:

(?=\b([A-Za-z]+-[A-Za-z]+)\b)

See the regex demo.

Details:

  • (?= – start of a positive lookahead that matches a location that is immediately followed with
    • \b – a word boundary
    • ([A-Za-z]+-[A-Za-z]+) – Group 1: one or more ASCII letters, -, one or more ASCII letters
    • \b – a word boundary
  • ) – end of the lookahead.
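
To illustrate why the lookahead matters, here is a small sketch using the string from the question. It compares the same pattern written as an ordinary consuming match and wrapped in a lookahead; the variable names are only for illustration.

```python
import re

text = "aa-bbb-c-dd"

# A consuming pattern: each match uses up its characters, so "bbb" cannot
# start a second match once it has ended the first one.
consuming = re.findall(r'\b[A-Za-z]+-[A-Za-z]+\b', text)

# The same pattern inside a lookahead: the overall match is empty, so the
# scan moves on one position at a time and Group 1 values can overlap.
overlapping = re.findall(r'(?=\b([A-Za-z]+-[A-Za-z]+)\b)', text)

print(consuming)    # ['aa-bbb', 'c-dd']
print(overlapping)  # ['aa-bbb', 'bbb-c', 'c-dd']
```

Because the pattern contains exactly one capturing group, re.findall returns the Group 1 text rather than the empty lookahead matches.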

In Python, use it with re.findall:

import re
text = "aaaa-bb-ccc-dd"
print( re.findall(r'(?=\b([A-Z]+-[A-Z]+)\b)', text, re.I) )
# => ['aaaa-bb', 'bb-ccc', 'ccc-dd']

See the Python demo. Note I changed [A-Za-z] to [A-Z] in the code since I made the regex matching case insensitive with the help of the re.I option. Make sure you use the r string literal prefix, or \b will be treated as a BACKSPACE char, \x08, and not as a word boundary.
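
The effect of the r prefix can be checked directly. The following is a minimal sketch, again using the string from the question, that runs the same pattern text with and without the prefix:

```python
import re

text = "aa-bbb-c-dd"

# In a regular string literal, Python converts \b to a backspace character
# before the re module ever sees the pattern.
assert '\b' == '\x08'
# In a raw string, \b stays as two characters: a backslash and the letter b.
assert len(r'\b') == 2

plain = re.findall('(?=\b([A-Z]+-[A-Z]+)\b)', text, re.I)
raw = re.findall(r'(?=\b([A-Z]+-[A-Z]+)\b)', text, re.I)

print(plain)  # []  (the pattern looks for literal backspace characters)
print(raw)    # ['aa-bbb', 'bbb-c', 'c-dd']
```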

Variations

  • (?=\b([^\W\d_]+-[^\W\d_]+)\b) – matching any Unicode letters
  • (?=(?<![^\W\d_])([^\W\d_]+-[^\W\d_]+)(?![^\W\d_])) – matching any Unicode letters and the boundaries are any non-letters
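
As a rough sketch of how the two variations differ, the snippet below runs them on two sample strings. The sample strings and variable names are made up for this illustration and are not from the original question.

```python
import re

# Variation 1: Unicode letters with \b word boundaries.
unicode_wb = r'(?=\b([^\W\d_]+-[^\W\d_]+)\b)'
# Variation 2: Unicode letters, with any non-letter acting as a boundary.
unicode_nl = r'(?=(?<![^\W\d_])([^\W\d_]+-[^\W\d_]+)(?![^\W\d_]))'

cyrillic = "альфа-бета-гамма"
print(re.findall(r'(?=\b([A-Z]+-[A-Z]+)\b)', cyrillic, re.I))  # []
print(re.findall(unicode_wb, cyrillic))  # ['альфа-бета', 'бета-гамма']

# Underscores and digits are word characters, so \b does not separate them
# from letters; the lookaround-based boundaries in variation 2 do.
mixed = "id_ab-cd_9"
print(re.findall(unicode_wb, mixed))  # []
print(re.findall(unicode_nl, mixed))  # ['ab-cd']
```
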
Answered By: Wiktor Stribiżew