Regex: Get "London, UK" from "Evolution Recruitment (Agency) (London, UK)"

Question:

I have this string:

>>> s = 'Evolution Recruitment (Agency) (London, UK)'

I want to get this part:

London, UK

Keep in mind that for the real case I’m working on the first brackets (agency) are not necessarily in the string.

I tried this:

>>> import re
>>> re.findall(r"\((.*?)\)$", s)
['Agency) (London, UK']

If I could make the regex read from right to left instead of left to right, this solution should work.

Is that possible? If not, is there another way to get the part London, UK?

Asked By: Bentley4


Answers:

If you replace .*? with [^(]* you should only capture the contents of the last set of brackets.

(You’re right that it would be more efficient to read this right-to-left – maybe you’d be better off not using a regular expression but manually checking the last character is a ), finding the last index of (, and using substring to get the content between the two?)
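A minimal sketch of the character-class swap, assuming the pattern in the question was meant to have escaped parentheses (`\(` and `\)`); the variable names are only for illustration:

```python
import re

s = 'Evolution Recruitment (Agency) (London, UK)'

# Lazy version from the question: the match starts at the first "("
# and stretches until a ")" that sits at the end of the string.
lazy = re.findall(r"\((.*?)\)$", s)

# Excluding "(" from the captured text prevents the match from
# starting at an earlier opening bracket.
no_open = re.findall(r"\(([^(]*)\)$", s)

print(lazy)     # ['Agency) (London, UK']
print(no_open)  # ['London, UK']
```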

Answered By: Rawling

In [8]: re.search(r".*[(](.*)[)]", s).groups()
Out[8]: ('London, UK',)

It just uses a greedy .* match to get to the last set of parentheses.
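As a quick check against the requirement in the question, here is a sketch that runs the same pattern on a string with and without the first pair of brackets. The second sample string is made up for illustration:

```python
import re

pattern = re.compile(r".*[(](.*)[)]")

# The second string is a hypothetical input without the "(Agency)" part.
samples = (
    'Evolution Recruitment (Agency) (London, UK)',
    'Evolution Recruitment (London, UK)',
)

results = []
for text in samples:
    match = pattern.search(text)
    # search() returns None when there are no parentheses at all.
    results.append(match.group(1) if match else None)

print(results)  # ['London, UK', 'London, UK']
```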

Alternatively, you could find all matching parentheses, and just use the last pair:

In [14]: re.findall(r'\(.*?\)', s)[-1]
Out[14]: '(London, UK)'

The regex approach is quite flexible. However, if you know the input’s well-formed and you just want the text inside the last set of parentheses:

In [11]: s[s.rfind('(')+1:s.rfind(')')]
Out[11]: 'London, UK'

This scans the string right-to-left, so it could potentially be fairly efficient (I haven’t profiled anything, so that’s just speculation).
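The slicing one-liner relies on the input being well-formed. A sketch of what happens when it is not, plus a guarded variant; `inside_last_parens` is a hypothetical helper name, not something from the answer:

```python
def inside_last_parens(text):
    """Hypothetical helper: text inside the last (...) pair, or None."""
    close = text.rfind(')')
    # Only look for "(" to the left of the last ")".
    open_ = text.rfind('(', 0, close) if close != -1 else -1
    if open_ == -1:
        return None
    return text[open_ + 1:close]


s = 'Evolution Recruitment (Agency) (London, UK)'
bad = 'No brackets here'

# Unguarded slice: rfind() returns -1 twice, so the slice becomes
# bad[0:-1] and silently drops the last character.
unguarded = bad[bad.rfind('(') + 1:bad.rfind(')')]

print(inside_last_parens(s))    # London, UK
print(inside_last_parens(bad))  # None
print(unguarded)                # No brackets her
```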

Answered By: NPE

This seems to work:

re.findall(r"\(([^)]+)\)$", s)

and it works with re.search as well:

re.search(r"\(([^)]+)\)$", s).group(1)

In words it says, look for a ( then start capturing anything that isn’t a ) until you see a ) at which point, stop capturing. Only keep it if the line ends after the ) — otherwise, it doesn’t count as a match.
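A small sketch of that behaviour, assuming the pattern is written with escaped outer parentheses. `trailing_parens` is a hypothetical wrapper, and the second input is invented to show the end-of-line condition. Note that `group(1)` is the captured text, while `group(0)` would be the whole match including the parentheses:

```python
import re

pattern = re.compile(r"\(([^)]+)\)$")


def trailing_parens(text):
    """Hypothetical helper: contents of a (...) group at the very end, else None."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(trailing_parens('Evolution Recruitment (Agency) (London, UK)'))  # London, UK
# Text after the closing bracket means the "$" condition fails.
print(trailing_parens('Evolution Recruitment (London, UK) Ltd'))       # None
```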

Answered By: mgilson

Just skip characters, and have a group with parentheses after the skipping:

>>> re.findall(r'.+(\(.+\))', s)
['(London, UK)']

You could anchor this to the end of the string ($) too, which might make it even safer.
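To illustrate what the anchor changes, here is a sketch comparing both variants, assuming the pattern's inner parentheses are escaped. The string with trailing text is a made-up example:

```python
import re

unanchored = r'.+(\(.+\))'
anchored = r'.+(\(.+\))$'

s = 'Evolution Recruitment (Agency) (London, UK)'
tail = 'Evolution Recruitment (London, UK) Ltd'  # hypothetical input

# On the original string both patterns find the last bracket pair.
print(re.findall(unanchored, s))     # ['(London, UK)']
print(re.findall(anchored, s))       # ['(London, UK)']

# With text after the closing bracket, only the unanchored one matches.
print(re.findall(unanchored, tail))  # ['(London, UK)']
print(re.findall(anchored, tail))    # []
```

Both variants keep the surrounding parentheses in the result, so slicing with `[1:-1]` is one way to drop them.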

Answered By: unwind