combining split with findall
Question:
I’m splitting a string with some separator, but want the separator matches as well:
import re
s = "oren;moish30.4.200/-/v6.99.5/barbi"
print(re.split(r"\d+\.\d+\.\d+", s))
print(re.findall(r"\d+\.\d+\.\d+", s))
I can’t find an easy way to combine the 2 lists I get:
['oren;moish', '/-/v', '/barbi']
['30.4.200', '6.99.5']
Into the desired output:
['oren;moish', '30.4.200', '/-/v', '6.99.5', '/barbi']
Answers:
Try this:
import re
s = "oren;moish30.4.200/-/v6.99.5/barbi"
print([x for y in re.findall(r"(?:([A-Za-z;/-]+)|(\d+\.\d+\.\d+))", s) for x in y if x])
Result:
['oren;moish', '30.4.200', '/-/v', '6.99.5', '/barbi']
Another solution (regex101):
import re
s = "oren;moish30.4.200/-/v6.99.5/barbi"
x = re.findall(r"\d+\.\d+\.\d+|.+?(?=\d+\.\d+\.\d+|\Z)", s)
print(x)
Prints:
['oren;moish', '30.4.200', '/-/v', '6.99.5', '/barbi']
From the re.split docs:
If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
So just wrap your regex in a capturing group:
print(re.split(r"(\d+\.\d+\.\d+)", s))
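As a self-contained sketch of the capturing-group approach: the first call reproduces the desired output for the string from the question. The second string, `edge`, is a made-up input (not from the question) that starts and ends with a separator, to show that re.split then returns empty strings at the ends, which you may want to filter out.

```python
import re

s = "oren;moish30.4.200/-/v6.99.5/barbi"

# The capturing group makes re.split keep the separators in the result.
parts = re.split(r"(\d+\.\d+\.\d+)", s)
print(parts)

# Hypothetical input that begins and ends with a separator:
# re.split emits an empty string before the first and after the last match.
edge = "1.2.3/middle/4.5.6"
raw = re.split(r"(\d+\.\d+\.\d+)", edge)
print(raw)

# Drop the empty strings if they are not wanted.
cleaned = [p for p in raw if p]
print(cleaned)
```

For the string in the question no filtering is needed, because it neither starts nor ends with a match.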
You could use re.findall and a pattern to match:
\d+\.\d+\.\d+|\D+(?:\d(?!\d*\.\d+\.\d)\D*)*
Explanation
\d+\.\d+\.\d+
Match three runs of 1+ digits separated by single literal dots
|
Or
\D+
Match 1+ chars other than a digit
(?:
Non-capturing group, repeated as a whole
\d(?!\d*\.\d+\.\d)
Match a single digit, asserting that it does not begin the digits-and-dots pattern
\D*
Match optional chars other than a digit
)*
Close the non-capturing group and optionally repeat it
See a regex demo.
Example
import re
s = "oren;moish30.4.200/-/v6.99.5/barbi"
pattern = r"\d+\.\d+\.\d+|\D+(?:\d(?!\d*\.\d+\.\d)\D*)*"
print(re.findall(pattern, s))
Output
['oren;moish', '30.4.200', '/-/v', '6.99.5', '/barbi']
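The same pattern can also be written with re.VERBOSE so that the explanation sits next to each part. This is only a sketch of that layout; the second input, `mixed`, is a made-up string (not from the question) chosen to show that a stray digit that does not begin a digits-and-dots run stays inside the surrounding text.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part is commented.
pattern = re.compile(r"""
    \d+\.\d+\.\d+            # three runs of digits separated by literal dots
  |                          # or
    \D+                      # one or more non-digit characters
    (?:                      # non-capturing group, repeated as a whole
        \d(?!\d*\.\d+\.\d)   # a digit that does not begin a digits-and-dots run
        \D*                  # optional non-digit characters
    )*                       # repeat the group zero or more times
""", re.VERBOSE)

s = "oren;moish30.4.200/-/v6.99.5/barbi"
result = pattern.findall(s)
print(result)

# Hypothetical input with a lone digit inside the text.
mixed = "abc1def2.3.4ghi"
mixed_result = pattern.findall(mixed)
print(mixed_result)
```

Unlike the character-class approach, this version does not depend on knowing in advance which characters can appear between the separators.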