combining split with findall

Question:

I’m splitting a string with some separator, but want the separator matches as well:

import re

s = "oren;moish30.4.200/-/v6.99.5/barbi"
print(re.split(r"\d+\.\d+\.\d+", s))
print(re.findall(r"\d+\.\d+\.\d+", s))

I can’t find an easy way to combine the 2 lists I get:

['oren;moish', '/-/v', '/barbi']
['30.4.200', '6.99.5']

Into the desired output:

['oren;moish', '30.4.200', '/-/v', '6.99.5', '/barbi']
Asked By: OrenIshShalom


Answers:

Try this:

import re
s = "oren;moish30.4.200/-/v6.99.5/barbi"
print([x for y in re.findall(r"(?:([A-Za-z;/-]+)|(\d+\.\d+\.\d+))", s) for x in y if x])

Result:

['oren;moish', '30.4.200', '/-/v', '6.99.5', '/barbi']
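The flattening comprehension is needed because re.findall returns one tuple per match when the pattern contains more than one capturing group, with an empty string for the group that did not take part. A minimal sketch of the intermediate result (the names pairs and flat are mine, not from the answer):

```python
import re

s = "oren;moish30.4.200/-/v6.99.5/barbi"
pattern = r"([A-Za-z;/-]+)|(\d+\.\d+\.\d+)"

# One (group 1, group 2) tuple per match; the unused group is ''.
pairs = re.findall(pattern, s)
print(pairs)
# [('oren;moish', ''), ('', '30.4.200'), ('/-/v', ''), ('', '6.99.5'), ('/barbi', '')]

# Flatten the tuples and drop the empty placeholders.
flat = [x for pair in pairs for x in pair if x]
print(flat)
# ['oren;moish', '30.4.200', '/-/v', '6.99.5', '/barbi']
```

Note that the character class only covers the characters that appear in this particular input; anything outside it would be skipped silently.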
Answered By: user56700

Another solution (regex101):

import re

s = "oren;moish30.4.200/-/v6.99.5/barbi"

x = re.findall(r"\d+\.\d+\.\d+|.+?(?=\d+\.\d+\.\d+|\Z)", s)
print(x)

Prints:

['oren;moish', '30.4.200', '/-/v', '6.99.5', '/barbi']
Answered By: Andrej Kesely

From the re.split docs:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

So just wrap your regex in a capturing group:

print(re.split(r"(\d+\.\d+\.\d+)", s))
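One thing to watch with the capturing-group split: when a separator sits at the very start or end of the string, re.split also returns an empty string on that side. A small sketch, assuming you want those dropped (split_keep is a hypothetical helper name, not part of re):

```python
import re

# Capturing group so the separators are kept in the result.
PATTERN = re.compile(r"(\d+\.\d+\.\d+)")

def split_keep(text):
    """Split on the pattern, keep the separators, drop empty pieces."""
    return [part for part in PATTERN.split(text) if part]

print(split_keep("oren;moish30.4.200/-/v6.99.5/barbi"))
# ['oren;moish', '30.4.200', '/-/v', '6.99.5', '/barbi']

# A separator at the start yields a leading '' from a plain split.
print(PATTERN.split("1.2.3/abc"))
# ['', '1.2.3', '/abc']
print(split_keep("1.2.3/abc"))
# ['1.2.3', '/abc']
```

If the empty strings carry meaning for you (for example to tell where the string started with a separator), skip the filter and use the plain split.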
Answered By: user2357112

You could use re.findall and a pattern to match:

\d+\.\d+\.\d+|\D+(?:\d(?!\d*\.\d+\.\d)\D*)*

Explanation

  • \d+\.\d+\.\d+ Match three runs of 1+ digits separated by single literal dots
  • | Or
  • \D+ Match 1+ chars other than a digit
  • (?: Non-capturing group, repeated as a whole
    • \d(?!\d*\.\d+\.\d) Match a single digit, asserting that it does not start the digits-and-dots pattern
    • \D* Match optional chars other than a digit
  • )* Close the non-capturing group and optionally repeat it

See a regex demo.

Example

import re

s = "oren;moish30.4.200/-/v6.99.5/barbi"
pattern = r"\d+\.\d+\.\d+|\D+(?:\d(?!\d*\.\d+\.\d)\D*)*"
print(re.findall(pattern, s))

Output

['oren;moish', '30.4.200', '/-/v', '6.99.5', '/barbi']
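The negative lookahead matters once the text between separators contains digits of its own. A rough comparison on a made-up input (the sample string and the simpler naive pattern are my assumptions for illustration, not part of the answer):

```python
import re

pattern = r"\d+\.\d+\.\d+|\D+(?:\d(?!\d*\.\d+\.\d)\D*)*"
# Simplified variant without the lookahead group, for contrast only.
naive = r"\d+\.\d+\.\d+|\D+"

# Hypothetical input: "5" is a lone digit, "1.2.3" is a real separator.
sample = "abc5def1.2.3x"

print(re.findall(pattern, sample))
# ['abc5def', '1.2.3', 'x']

# Without the lookahead group the lone digit is dropped and the text is cut in two.
print(re.findall(naive, sample))
# ['abc', 'def', '1.2.3', 'x']
```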
Answered By: The fourth bird