python regex keep text between the last two occurrences of a character

Question:

As the title says, I want to extract the text between the last two occurrences of a character in a string.

I have:

'9500 anti-Xa IU/ml - 0,6 ml 5700 IU -'
'120 mg/ml – 0.165 ml -'
'300-300-300 IR/ml  or  IC/ml - 10 ml -'
'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -'

I want to have:

'0,6 ml 5700 IU'
'0.165 ml'
'10 ml'
'15 g'

I tried using -\s*.*- but it matches everything between the first and the last -. What’s the correct regex to use?

Asked By: Pedro Domingues

Answers:

You can use

[^-–—\s][^-–—]*?(?=\s*[-–—][^-–—]*$)

See the regex demo. Details:

  • [^-–—\s] – a char other than whitespace, -, – and —
  • [^-–—]*? – zero or more chars other than -, – and —, as few as possible
  • (?=\s*[-–—][^-–—]*$) – a positive lookahead that requires zero or more whitespaces, then a -, – or — char, and then zero or more chars other than -, – and — up to the end of the string, immediately to the right of the current location.
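
As a minimal sketch (not part of the original answer), the pattern could be applied in Python roughly as follows. The names `LAST_FIELD`, `extract_last_field` and `samples` are hypothetical; the strings are the samples from the question.

```python
import re

# Pattern from the answer above, with the \s escapes written out.
LAST_FIELD = re.compile(r'[^-–—\s][^-–—]*?(?=\s*[-–—][^-–—]*$)')

# Sample strings taken from the question.
samples = [
    '9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
    '120 mg/ml – 0.165 ml -',
    '300-300-300 IR/ml  or  IC/ml - 10 ml -',
    'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -',
]


def extract_last_field(text):
    """Return the first match of LAST_FIELD in text, or None if there is none."""
    match = LAST_FIELD.search(text)
    return match.group() if match else None


results = [extract_last_field(s) for s in samples]
print(results)  # ['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']
```

Because the lookahead only succeeds when a single dash-like char remains before the end of the string, earlier hyphens such as the one in anti-Xa are skipped.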
Answered By: Wiktor Stribiżew

With search:

import re
[re.search(r'[-–]\s*([^-–]+?)\s*[-–][^-–]*$', x).group(1) for x in l]

Or split:

[re.split(r'\s+[-–]\s*', x, maxsplit=2)[-2] for x in l]

output: ['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']

used input:

l = ['9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
     '120 mg/ml – 0.165 ml -',
     '300-300-300 IR/ml  or  IC/ml - 10 ml -',
     'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -'
    ]
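
The split approach above limits the number of splits counting from the left, which suits these samples because each contains exactly two separators. As a hedged sketch for strings that may contain more separators, one could split without a limit and still take the second-to-last piece. The helper name `last_field` and the constant `SEP` are hypothetical.

```python
import re

# Separator: whitespace, then a hyphen or en dash, then optional whitespace.
SEP = re.compile(r'\s+[-–]\s*')


def last_field(text):
    """Return the piece between the last two separators, or None if fewer than two."""
    parts = SEP.split(text)
    # Two separators produce at least three pieces (the last may be empty).
    return parts[-2] if len(parts) >= 3 else None


print(last_field('9500 anti-Xa IU/ml - 0,6 ml 5700 IU -'))  # 0,6 ml 5700 IU
print(last_field('a - b - c -'))                            # c
```

Hyphens without preceding whitespace, as in 300-300-300, are not treated as separators by this pattern.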

Answered By: mozway

Try to also match the blank space before the last -:

\s-\s(.*)\s-
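
As written, this pattern matches only the ASCII hyphen, while the second sample in the question uses an en dash (–) as its first separator. A possible adaptation, offered as a sketch rather than as the answerer's own code, replaces each - with the character class [-–]. The name `BETWEEN_DASHES` is hypothetical.

```python
import re

# Hypothetical variant: [-–] accepts either an ASCII hyphen or an en dash.
BETWEEN_DASHES = re.compile(r'\s[-–]\s(.*)\s[-–]')

# Sample strings taken from the question.
samples = [
    '9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
    '120 mg/ml – 0.165 ml -',
    '300-300-300 IR/ml  or  IC/ml - 10 ml -',
    'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -',
]

results = []
for text in samples:
    match = BETWEEN_DASHES.search(text)
    # group(1) is the text between the first spaced dash and the last one.
    results.append(match.group(1) if match else None)

print(results)  # ['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']
```

Note that the greedy (.*) starts at the first spaced dash, so this variant relies on each string containing exactly two such separators, as the samples do.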

By the way, maybe this website could help you next time you have a new regex issue.

Answered By: Gian Arauz

With your shown samples only, please try the following regex with Python code, written and tested in Python 3. Here is the online demo for the regex used.

import re

var="""9500 anti-Xa IU/ml - 0,6 ml 5700 IU -
120 mg/ml - 0.165 ml -
300-300-300 IR/ml  or  IC/ml - 10 ml -
Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -"""

[x.strip(' ') for x in re.findall(r'(?<=\s-|\s–)(.*?)(?=-)', var, re.M)]

Output will be as follows:

['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']

Explanation: this uses the findall function of Python 3’s re module with the regex r'(?<=\s-|\s–)(.*?)(?=-)', which captures the text that follows a whitespace plus - or – and stops before the next -. The strip function then removes the leading and trailing spaces from each match to give the expected output.

Answered By: RavinderSingh13