python regex keep text between the last two occurrences of a character
Question:
As the title says, I want to extract the text between the last two occurrences of a character in a string.
I have:
'9500 anti-Xa IU/ml - 0,6 ml 5700 IU -'
'120 mg/ml – 0.165 ml -'
'300-300-300 IR/ml or IC/ml - 10 ml -'
'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -'
I want to have:
'0,6 ml 5700 IU'
'0.165 ml'
'10 ml'
'15 g'
I tried using -\s*.*- but it matches everything between the first and the last -. What's the correct regex to use?
Answers:
You can use
[^-–—\s][^-–—]*?(?=\s*[-–—][^-–—]*$)
See the regex demo. Details:
[^-–—\s] : a char other than whitespace, -, – and —
[^-–—]*? : zero or more chars other than -, – and —, as few as possible
(?=\s*[-–—][^-–—]*$) : a positive lookahead that requires zero or more whitespaces, then a -, – or — char, and then zero or more chars other than -, – and — till the end of the string, immediately to the right of the current location.
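A minimal sketch of how the pattern above might be applied in Python, assuming the first match found by re.search is the one wanted. The names PATTERN, samples and last_segment are made up for illustration and are not part of the answer.

```python
import re

# The pattern from the answer, with the backslashes written out.
PATTERN = re.compile(r'[^-–—\s][^-–—]*?(?=\s*[-–—][^-–—]*$)')

samples = [
    '9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
    '120 mg/ml – 0.165 ml -',
    '300-300-300 IR/ml or IC/ml - 10 ml -',
    'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -',
]

def last_segment(text):
    """Return the text before the final dash, or None if nothing matches."""
    # The lookahead only succeeds when exactly one dash remains to the right,
    # so earlier segments cannot match.
    m = PATTERN.search(text)
    return m.group(0) if m else None

results = [last_segment(s) for s in samples]
print(results)
```

Because the first character class excludes whitespace and the lookahead swallows the spaces before the final dash, the match comes back already trimmed.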
With search:
import re
[re.search(r'[-–]\s*([^-–]+?)\s*[-–][^-–]*$', x).group(1) for x in l]
Or split:
[re.split(r'\s+[-–]\s*', x, maxsplit=2)[-2] for x in l]
output: ['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']
used input:
l = ['9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
'120 mg/ml – 0.165 ml -',
'300-300-300 IR/ml or IC/ml - 10 ml -',
'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -'
]
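The split version above caps the number of splits at 2, which fits the sample strings. If a string could contain more than two spaced dashes, one possible variation is to drop the cap and guard against short results. This is only a sketch: the helper name between_last_two and the extra test string are assumptions, not part of the answer.

```python
import re

# Separator: whitespace, then a hyphen / en dash / em dash, then optional whitespace.
SEPARATOR = re.compile(r'\s+[-–—]\s*')

def between_last_two(text):
    """Return the piece between the last two separators, or None if there are fewer than two."""
    parts = SEPARATOR.split(text)
    # Two separators produce at least three pieces; the wanted one is second to last.
    if len(parts) < 3:
        return None
    return parts[-2]

samples = [
    '9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
    '120 mg/ml – 0.165 ml -',
    '300-300-300 IR/ml or IC/ml - 10 ml -',
    'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -',
]

print([between_last_two(s) for s in samples])
print(between_last_two('a - b - c - d -'))  # hypothetical input with four separators
```

Requiring whitespace before the dash is what keeps hyphenated words such as anti-Xa or 300-300-300 from being split.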
Try to also match the blank space before the last -:
\s-\s(.*)\s-
By the way, maybe this website could help you next time you have a new regex issue.
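A quick check of the pattern above against the question's samples, as a sketch. As far as I can tell, the literal hyphen does not match the second sample, which uses an en dash as its first separator. The widened pattern with a character class is my own assumption, and the names literal, widened and capture are illustrative.

```python
import re

samples = [
    '9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
    '120 mg/ml – 0.165 ml -',
    '300-300-300 IR/ml or IC/ml - 10 ml -',
    'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -',
]

literal = re.compile(r'\s-\s(.*)\s-')        # the pattern as suggested
widened = re.compile(r'\s[-–]\s(.*)\s[-–]')  # assumption: also accept an en dash

def capture(pattern, text):
    """Return group 1 of the first match, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None

literal_results = [capture(literal, s) for s in samples]
widened_results = [capture(widened, s) for s in samples]
print(literal_results)
print(widened_results)
```

Both patterns start at the first spaced dash and use a greedy group, so they rely on each string containing exactly two spaced dashes.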
With your shown samples only, please try the following regex with Python code, written and tested in Python 3. Here is the online demo for the regex used.
import re
var="""9500 anti-Xa IU/ml - 0,6 ml 5700 IU -
120 mg/ml - 0.165 ml -
300-300-300 IR/ml or IC/ml - 10 ml -
Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -"""
[x.strip(' ') for x in re.findall(r'(?<=\s-|\s–)(.*?)(?=-)', var, re.M)]
Output will be as follows:
['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']
Explanation: this uses the findall function from Python 3's re module with the regex r'(?<=\s-|\s–)(.*?)(?=-)' to capture the text after a dash that follows whitespace, up to the next hyphen. The strip function then removes the leading and trailing spaces from each match to give the expected output.
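To make the two steps in the explanation visible, here is a small sketch that keeps the raw findall result before stripping. The variable names pattern, raw and cleaned are illustrative. The re.M flag is left out on the assumption that it is not needed, since the pattern contains no ^ or $ anchors.

```python
import re

var = """9500 anti-Xa IU/ml - 0,6 ml 5700 IU -
120 mg/ml - 0.165 ml -
300-300-300 IR/ml or IC/ml - 10 ml -
Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -"""

# The lookbehind needs "whitespace + dash" just before the match;
# the lazy group then runs up to the next hyphen on the same line.
pattern = r'(?<=\s-|\s–)(.*?)(?=-)'

raw = re.findall(pattern, var)          # step 1: captures still carry the surrounding spaces
cleaned = [x.strip(' ') for x in raw]   # step 2: trim the spaces

print(raw)
print(cleaned)
```

Both lookbehind alternatives are two characters wide, which is what lets Python's re module accept the alternation inside a lookbehind.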