Match everything before a set of characters but if they aren't present then match everything
Question:
I have a series of strings, some of which have a year string at the end in the format -2022. I'm looking to match everything up to but excluding the - before the 4-digit year string, but if there is no year present then I would like to return the entire string. The following:
import re
x = "itf-m15-cancun-15-men-2022"
re.search(r"^.+?(?=-\d\d\d\d)", x).group()
Gets me 'itf-m15-cancun-15-men', which is what I'm looking for. However, the following:
import re
x = "itf-m15-cancun-15-men"
re.search(r"^.+?(?=-\d\d\d\d)", x).group()
Errors, as no match is returned. How do I capture everything up to but excluding the - before the 4-digit year string, or return the whole string if the year string isn't present?
Answers:
Make the (?=-\d\d\d\d) conditional by adding a ? after it. (Tested in JavaScript)
/^.+?(?=-\d\d\d\d)?$/
Add OR end |$ inside your lookahead:
^.+?(?=-\d{4}|$)
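As a minimal sketch of how that pattern could be used from Python with the strings in the question: the helper name strip_year and the fallback for an empty input are assumptions added for illustration, not part of the answer.

```python
import re

# The lookahead succeeds either just before "-" plus four digits
# or at the end of the string, so the lazy .+? always has a stopping point.
PATTERN = re.compile(r"^.+?(?=-\d{4}|$)")

def strip_year(name: str) -> str:
    """Return name up to (not including) a -YYYY part, or name unchanged."""
    match = PATTERN.search(name)
    # .+? needs at least one character, so an empty string gives no match;
    # fall back to the input in that case (an assumption of this sketch).
    return match.group() if match else name

print(strip_year("itf-m15-cancun-15-men-2022"))  # itf-m15-cancun-15-men
print(strip_year("itf-m15-cancun-15-men"))       # itf-m15-cancun-15-men
```

Note that the lookahead is not tied to the end of the string, so the match stops at the first "-" followed by four digits wherever it occurs.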
Alternatively an explicit greedy alternation could be used here like in this demo.
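The linked demo is not reproduced here, so the following is only a guess at what a greedy alternation might look like in Python. The pattern and the helper name stem are assumptions for illustration, not necessarily what the demo used.

```python
import re

# First branch: a greedy run that must be followed by "-" plus four digits
# at the very end of the string.
# Second branch: no such suffix exists, so take the whole string.
GREEDY = re.compile(r"^(?:.+(?=-\d{4}$)|.+)")

def stem(name: str) -> str:
    """Drop a trailing -YYYY if present; otherwise return name unchanged."""
    match = GREEDY.search(name)
    # Fall back to the input when nothing matches (e.g. an empty string).
    return match.group() if match else name

print(stem("itf-m15-cancun-15-men-2022"))  # itf-m15-cancun-15-men
print(stem("itf-m15-cancun-15-men"))       # itf-m15-cancun-15-men
```

Because the first branch anchors the year to the end of the string, a four-digit group in the middle of a name is left alone in this version.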