Extract substring from dot until colon with Python regex
Question:
I have a string that resembles the following string:
'My substring1. My substring2: My substring3: My substring4'
Ideally, my aim is to extract ‘My substring2’ from this string with Python regex. However, I would also be pleased with a result that resembles ‘. My substring2:’
So far, I am able to extract
'. My substring2: My substring3:'
with
r"\.\s.*:"
Alternatively, by using Wiktor Stribiżew’s solution to a somewhat similar problem (posted in "How can i extract words from a string before colon and excluding \n from them in python using regex"), I have been able to extract
'My substring1. My substring2'
specifically with
r'^[^:-][^:]*'
However, after many hours of searching and trying (I am quite new to regex), I have been unable to combine the two results into a single regex that extracts ‘My substring2’ from the string above.
I would be eternally grateful if someone could help me find the correct regex to extract ‘My substring2’. Thanks!
Answers:
You might, for example, exclude the dot from the leading match as well, and use a capture group that matches any char except the colon:
^[^:-][^:.]*\.\s*([^:]+)
Explanation
^ matches the start of the string.
[^:-] matches a first char that is neither : nor -
[^:.]* optionally matches any chars except : or .
\.\s* matches a literal dot and optional whitespace chars.
([^:]+) is capture group 1, matching 1+ chars other than :
Or a bit shorter, if there cannot be a : or a - anywhere before the dot:
^[^:.-]+\.\s*([^:]+)
For example
import re
s = "My substring1. My substring2: My substring3: My substring4"
pattern = r"[^:-][^:.]*\.\s*([^:]+)"
m = re.match(pattern, s)
if m:
    print(m.group(1))
Output
My substring2
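As a minimal sketch of how the two patterns from this answer behave, the snippet below wraps them in a small helper. The names extract_after_dot, LONG_PATTERN and SHORT_PATTERN are hypothetical and introduced only for illustration. The helper returns None when the string does not match, so the caller does not have to check the match object.

```python
import re

# Both patterns are taken from the answer above, anchored at the start of the string.
LONG_PATTERN = re.compile(r"^[^:-][^:.]*\.\s*([^:]+)")
SHORT_PATTERN = re.compile(r"^[^:.-]+\.\s*([^:]+)")


def extract_after_dot(text, pattern=LONG_PATTERN):
    """Return capture group 1 of pattern, or None if text does not match.

    Hypothetical helper, for illustration only.
    """
    m = pattern.match(text)
    return m.group(1) if m else None


s = "My substring1. My substring2: My substring3: My substring4"
print(extract_after_dot(s))                 # My substring2
print(extract_after_dot(s, SHORT_PATTERN))  # My substring2
print(extract_after_dot("no dot here: x"))  # None
```

Both variants give the same result on the sample string. The last call shows the no-match case, where a colon appears before any dot.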
You can use a non-greedy regex (with ?):
import re
s = "My substring1. My substring2: My substring3: My substring4"
print(re.search(r"\.\s*(.*?):", s).group(1))
Prints:
My substring2
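To see why the ? matters, here is a small sketch comparing the greedy and the non-greedy quantifier on the same string. The variable names greedy and lazy are just for illustration.

```python
import re

s = "My substring1. My substring2: My substring3: My substring4"

# Greedy: .* runs to the end of the string and backtracks to the LAST colon.
greedy = re.search(r"\.\s*(.*):", s).group(1)

# Lazy: .*? expands one character at a time and stops at the FIRST colon.
lazy = re.search(r"\.\s*(.*?):", s).group(1)

print(greedy)  # My substring2: My substring3
print(lazy)    # My substring2
```

Note that re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on input without a dot followed by a colon. Checking the match first avoids that.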
With your shown samples, please try the following regex. The code was written and tested in Python 3.
import re
s = "My substring1. My substring2: My substring3: My substring4"
re.findall(r'^.*?\.\s([^:]+)(?:(?::\s[^:]*)+)$', s)
['My substring2']
OR use the following regex, a small tweak to the one above that drops the redundant outer non-capturing group (it still has a single capturing group):
^.*?\.\s([^:]+)(?::\s[^:]*)+$
Explanation: This uses the re.findall function of Python 3's re module. The variable s holds the value 'My substring1. My substring2: My substring3: My substring4', and the regex used is: ^.*?\.\s([^:]+)(?:(?::\s[^:]*)+)$
Explanation of regex: a piece-by-piece breakdown of the regex above.
^.*?\.\s ##Lazily matching from the start of the value up to the first literal dot followed by a whitespace char.
([^:]+) ##The one and only capturing group, which holds everything up to the next colon.
(?: ##Starting a non-capturing group here.
(?: ##Starting a 2nd non-capturing group here.
:\s[^:]* ##Matching a colon followed by a whitespace char, then everything up to the next colon.
)+ ##Closing the 2nd non-capturing group and matching 1 or more occurrences of it.
)$ ##Closing the first non-capturing group here, at the end of the value.
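The breakdown above can also be kept inside the pattern itself. The following is a sketch, not part of the original answer, that uses the re.VERBOSE flag so whitespace and # comments in the pattern are ignored. It uses the shorter variant with a single non-capturing group.

```python
import re

s = "My substring1. My substring2: My substring3: My substring4"

pattern = re.compile(
    r"""
    ^.*?\.\s          # lazily match from the start up to the first dot and whitespace char
    ([^:]+)           # capture group 1: everything before the next colon
    (?:               # non-capturing group, repeated one or more times
        :\s[^:]*      # a colon, one whitespace char, then any non-colon chars
    )+
    $                 # end of the string
    """,
    re.VERBOSE,
)

print(pattern.findall(s))  # ['My substring2']
```

Because the pattern has exactly one capturing group, findall returns a list of that group's text rather than the full match.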