Python Regex Get Text Either side of Specific Characters

Question:

I have blocks of text that contain strings like the one below. I need to get the text on either side of "rt", including "rt" itself, but excluding text/numbers on other lines.

Example:

1.99

  Jim Smith rt Tom Ross

Random

So, here the desired result would be "Jim Smith rt Tom Ross".

I am new to regex and cannot get close. I think I need to lookahead and lookbehind then bound the result in some way but I’m struggling.

Any help would be appreciated.

Asked By: Robsmith


Answers:

We can use re.findall here with an appropriate regex pattern:

import re

inp = """1.99

  Jim Smith rt Tom Ross

Random"""

# Words joined by single spaces, then " rt ", then more space-separated words.
matches = re.findall(r'\w+(?: \w+)* rt \w+(?: \w+)*', inp)
print(matches)  # ['Jim Smith rt Tom Ross']

Explanation of regex:

  • \w+ matches a single word
  • (?: \w+)* followed by a space and another word, zero or more times
  • ' rt ' matches a space, followed by ‘rt’ and another space
  • \w+ matches another word
  • (?: \w+)* which is followed by a space and another word, zero or more times
Answered By: Tim Biegeleisen
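
As a minimal sketch of how the pattern from the answer above might be reused across a larger block of text: the helper name find_rt_phrases and the second sample entry are hypothetical and are not part of the original question. Because the pattern only joins words with a literal space, a match cannot run across a line break.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
RT_PATTERN = re.compile(r'\w+(?: \w+)* rt \w+(?: \w+)*')

def find_rt_phrases(text):
    """Return every 'words rt words' phrase that sits on a single line."""
    return RT_PATTERN.findall(text)

# Hypothetical sample: the second entry is invented for illustration.
sample = """1.99

  Jim Smith rt Tom Ross

Random

2.50

  Ann Lee rt Bob Stone Jr
"""

print(find_rt_phrases(sample))
# ['Jim Smith rt Tom Ross', 'Ann Lee rt Bob Stone Jr']
```

Note that \w does not match hyphens or apostrophes, so a name such as "Mary-Jane O'Neil" would only be partially captured by this pattern.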

With your shown samples, please try the following regex.

^\d+(?:\.\d+)?\n+\s+(.*?rt[^\n]+)\n+\s*\S+$

Python3 code: The code is written and tested in Python 3.x. It uses the findall function of Python's re module with the re.M flag enabled, so that ^ and $ match at the start and end of each line in the variable's value.

import re
var = """1.99

  Jim Smith rt Tom Ross

Random"""

re.findall(r'^\d+(?:\.\d+)?\n+\s+(.*?rt[^\n]+)\n+\s*\S+$', var, re.M)
['Jim Smith rt Tom Ross']

Explanation of regex:

^\d+            ##From the start of a line, match 1 or more digits.
(?:\.\d+)?      ##In an optional non-capturing group, match a literal dot followed by 1 or more digits.
\n+\s+          ##Followed by 1 or more newlines, then 1 or more whitespace characters.
(.*?rt[^\n]+)   ##In a CAPTURING GROUP, lazily match up to the string rt, then everything up to just before the next newline.
\n+\s*\S+$      ##Followed by newline(s), then 0 or more whitespace characters and 1 or more non-whitespace characters at the end of the line.
Answered By: RavinderSingh13
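
As a sketch only, the regex from the answer above could also be written with the re.X (verbose) flag so that the explanation lives next to each part of the pattern. The name PATTERN is an assumption made for illustration; the pattern itself is unchanged.

```python
import re

# Verbose form of the same regex: whitespace and '#' comments inside the
# pattern are ignored because of re.X; re.M makes ^ and $ match per line.
PATTERN = re.compile(r"""
    ^\d+(?:\.\d+)?      # a line starting with digits, optionally a decimal part
    \n+\s+              # one or more newlines, then leading whitespace
    (.*?rt[^\n]+)       # capture lazily up to 'rt', then the rest of that line
    \n+\s*\S+$          # newline(s), optional whitespace, a final non-space run
""", re.M | re.X)

var = """1.99

  Jim Smith rt Tom Ross

Random"""

print(PATTERN.findall(var))  # ['Jim Smith rt Tom Ross']
```

Since this pattern depends on a number line before and a non-space line after the target, it is tied more closely to the shown sample layout than a pattern that looks only at the "rt" line.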