Extract substring from dot until colon with Python regex

Question:

I have a string that resembles the following string:

'My substring1. My substring2: My substring3: My substring4'

Ideally, my aim is to extract ‘My substring2’ from this string with Python regex. However, I would also be pleased with a result that resembles ‘. My substring2:’

So far, I am able to extract

'. My substring2: My substring3:'

with

"\.\s.*:"

Alternatively, by using Wiktor Stribiżew’s solution to a somewhat similar problem posted in "How can i extract words from a string before colon and excluding \n from them in python using regex", I have been able to extract

'My substring1. My substring2'

specifically with

r'^[^:-][^:]*'

However, I have been unable, after many hours of searching and trying (I am quite new to regex), to combine the two results into a single effective regex expression that will extract ‘My substring2’ out of my aforementioned string.

I would be eternally grateful if someone could help me find the correct regex to extract ‘My substring2’. Thanks!

Asked By: Derk

||

Answers:

You might for example exclude matching the dot as well, and use a capture group matching any char except the :

^[^:-][^:.]*\.\s*([^:]+)

Explanation

  • ^ Start of string
  • [^:-] The first char can not be either : or -
  • [^:.]* Optionally match any char except : or .
  • \.\s* Match a dot and optional whitespace chars
  • ([^:]+) Capture group 1, match 1+ chars other than :
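
As a rough sketch of the breakdown above, the same pattern can be written with re.VERBOSE so that each part sits next to its comment. The sample string is taken from the question; the variable names are only illustrative.

```python
import re

# Sample input copied from the question.
s = "My substring1. My substring2: My substring3: My substring4"

pattern = re.compile(
    r"""
    ^           # start of string
    [^:-]       # first char: anything except ':' or '-'
    [^:.]*      # then any chars except ':' or '.'
    \.\s*       # a literal dot and optional whitespace
    ([^:]+)     # group 1: one or more chars other than ':'
    """,
    re.VERBOSE,
)

m = pattern.match(s)
print(m.group(1) if m else None)  # My substring2
```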

Or a bit shorter, if there can be no : . or - before the dot:

^[^:.-]+\.\s*([^:]+)

For example

import re

s = "My substring1. My substring2: My substring3: My substring4"
pattern = r"[^:-][^:.]*\.\s*([^:]+)"
m = re.match(pattern, s)
if m:
    print(m.group(1))

Output

My substring2
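
The shorter pattern can be used the same way. Below is a minimal sketch that wraps it in a helper; the name extract_after_dot is hypothetical, and returning None on a failed match is just one possible choice.

```python
import re

# Shorter pattern from this answer, with the backslashes written out.
PATTERN = re.compile(r"^[^:.-]+\.\s*([^:]+)")

def extract_after_dot(text):
    """Return the text between the first dot and the next colon, or None."""
    m = PATTERN.match(text)
    return m.group(1).strip() if m else None

print(extract_after_dot("My substring1. My substring2: My substring3: My substring4"))
print(extract_after_dot("no dot here: nothing to capture"))
```

The second call prints None because the string contains no dot. Note that this variant also fails to match when a - appears before the dot, since the hyphen is excluded by the first character class.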
Answered By: The fourth bird

You can use a non-greedy (lazy) match, written with ?:

import re

s = "My substring1. My substring2: My substring3: My substring4"

print(re.search(r"\.\s*(.*?):", s).group(1))

Prints:

My substring2
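
To illustrate what the ? changes, here is a small comparison of the lazy and greedy forms on the question's sample string. The variable names lazy and greedy are only for illustration.

```python
import re

s = "My substring1. My substring2: My substring3: My substring4"

# Lazy: (.*?) stops at the first colon after the dot.
lazy = re.search(r"\.\s*(.*?):", s)
# Greedy: (.*) runs to the last colon in the string.
greedy = re.search(r"\.\s*(.*):", s)

print(lazy.group(1))    # My substring2
print(greedy.group(1))  # My substring2: My substring3
```

re.search returns None when nothing matches, so for input that may lack a dot or a colon it is safer to check the result before calling .group(1).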
Answered By: Andrej Kesely

With your shown samples, please try the following regex. The code is written and tested in Python 3.

import re
s = "My substring1. My substring2: My substring3: My substring4"
print(re.findall(r'^.*?\.\s([^:]+)(?:(?::\s[^:]*)+)$', s))
# ['My substring2']

Or use the following regex, a small tweak to the one above that keeps the single capturing group and drops the outer non-capturing group:

^.*?\.\s([^:]+)(?::\s[^:]*)+$

Explanation: This uses the re.findall function from Python 3's re module. The variable s holds the value 'My substring1. My substring2: My substring3: My substring4', and the regex used is: ^.*?\.\s([^:]+)(?:(?::\s[^:]*)+)$

Explanation of the regex: a detailed breakdown of the regex above follows.

^.*?\.\s      ##Lazily match from the start of the string up to the first literal dot followed by a whitespace character.
([^:]+)       ##The one and only capturing group: everything up to (not including) the next colon.
(?:           ##Start of a non-capturing group.
  (?:         ##Start of a 2nd (nested) non-capturing group.
     :\s[^:]* ##Match a colon, a whitespace character, then everything up to the next colon.
  )+          ##Close the 2nd non-capturing group and match it 1 or more times.
)$            ##Close the first non-capturing group, anchored at the end of the string.
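
As a quick check, the sketch below runs both variants of the regex on the sample string, plus one input without any colon. The names nested and flat are only illustrative labels for the two patterns.

```python
import re

s = "My substring1. My substring2: My substring3: My substring4"

# Variant with the extra outer non-capturing group.
nested = r'^.*?\.\s([^:]+)(?:(?::\s[^:]*)+)$'
# Variant with the outer group removed.
flat = r'^.*?\.\s([^:]+)(?::\s[^:]*)+$'

print(re.findall(nested, s))  # ['My substring2']
print(re.findall(flat, s))    # ['My substring2']

# The pattern requires at least one ": ..." part after the captured text,
# so a string without a colon gives no match.
print(re.findall(flat, "My substring1. My substring2"))  # []
```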
Answered By: RavinderSingh13