Match everything before a set of characters but if they aren't present then match everything

Question:

I have a series of strings, some of which end with a year in the format -2022. I want to match everything up to, but excluding, the - before the 4-digit year; if no year is present, I want the entire string returned. The following:

import re
x = "itf-m15-cancun-15-men-2022"
re.search(r"^.+?(?=-\d\d\d\d)", x).group()

gives me 'itf-m15-cancun-15-men', which is what I'm looking for. However, the following:

import re
x = "itf-m15-cancun-15-men"
re.search(r"^.+?(?=-\d\d\d\d)", x).group()

raises an AttributeError, because re.search returns None when there is no match and None has no .group() method. How do I capture everything up to, but excluding, the - before the 4-digit year, or return the whole string if the year isn't present?

Asked By: Jossy

Answers:

Make the (?=-\d\d\d\d) optional by adding a ? after it. (Tested in JavaScript)

/^.+?(?=-\d\d\d\d)?$/
Answered By: Richard Henage

Add an alternative for end of string, |$, inside your lookahead:

^.+?(?=-\d{4}|$)
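
A minimal Python sketch of this pattern applied to the two strings from the question. The helper name strip_year is made up for illustration; it falls back to the input when nothing matches (for example, on an empty string).

```python
import re

# Lazy prefix, then a lookahead that accepts either
# "-" followed by four digits, or the end of the string.
PATTERN = re.compile(r"^.+?(?=-\d{4}|$)")

def strip_year(s):
    """Hypothetical helper: return s without a trailing -YYYY part."""
    m = PATTERN.search(s)
    # search() returns None on no match (e.g. empty input), so guard it.
    return m.group() if m else s

print(strip_year("itf-m15-cancun-15-men-2022"))  # itf-m15-cancun-15-men
print(strip_year("itf-m15-cancun-15-men"))       # itf-m15-cancun-15-men
```

Because the prefix is lazy, the match stops at the first -dddd it finds, wherever it is. If the year should only count at the very end of the string, (?=-\d{4}$|$) is a stricter variant.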

See demo at regex101

Alternatively, an explicit greedy alternation could be used here, as in this demo.
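
The linked demo isn't reproduced here, so the following is only one possible reading of "explicit greedy alternation", sketched in Python: try a greedy prefix that must be followed by -dddd, and otherwise fall back to the whole string. The pattern and the helper name strip_year_greedy are assumptions, not necessarily what the demo shows.

```python
import re

# First branch: greedy prefix that must be followed by "-" and four digits.
# Second branch: plain greedy match of the whole string as a fallback.
PATTERN = re.compile(r"^(?:.+(?=-\d{4})|.+)")

def strip_year_greedy(s):
    """Hypothetical helper: cut at the LAST -YYYY, else return s unchanged."""
    m = PATTERN.search(s)
    return m.group() if m else s

print(strip_year_greedy("itf-m15-cancun-15-men-2022"))  # itf-m15-cancun-15-men
print(strip_year_greedy("itf-m15-cancun-15-men"))       # itf-m15-cancun-15-men
print(strip_year_greedy("cup-2020-final-2022"))         # cup-2020-final
```

The last call shows the practical difference from a lazy prefix: the greedy version backtracks from the end, so it cuts at the last -dddd, not the first.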

Answered By: bobble bubble