How to make a regular expression pattern consider a comma before the start of line when using the ^ operator?
Question:
import re

# example 1: with a , before the capture group
input_text = "Hello how are you?, dfdfdfd fdfdfdf other text. hghhg"
# example 2: without a , (or .|,|;|\n) before the capture group
input_text = "dfdfdfd fdfdfdf other text. hghhg"

# No matter where you place ^ within the options, it is always considered first, ignoring the others.
fears_and_panics_match = re.search(
    r"(?:\.|,|;|\n|^)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n)",
    #r"(?:\.|,|;|\n)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n|$)",
    input_text, flags=re.IGNORECASE)

if fears_and_panics_match:
    print(fears_and_panics_match.group(1))
Why does this pattern, r"(?:\.|,|;|\n|^)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n)", capture Hello how are you?, dfdfdfd fdfdfdf no matter where I place the ^ in the alternation?
I need the regex to first try to find a comma , (or one of the other delimiters) and only then fall back to the beginning of the line ^.
Correct output in each case:
#for example 1
"dfdfdfd fdfdfdf"
#for example 2
"dfdfdfd fdfdfdf"
Answers:
It seems like you used the wrong operator.
Also, the caret means "beginning of"; you never specified what should come at that beginning, so my wild guess is that it took any character.
I have no idea how helpful that is to you. I try to keep my regexes as dumb as possible, which makes it easier for me to spot the issue.
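The guess about the caret can be checked directly. As far as I can tell, re.search tries start positions from left to right, and ^ is a zero-width assertion that succeeds at position 0, so the very first attempt already matches wherever ^ sits in the alternation; the lazy (.+?) then just extends until other text. A minimal sketch, assuming the question's pattern has its backslashes as written below (caret_first is a hypothetical reordering added only for comparison):

```python
import re

text = "Hello how are you?, dfdfdfd fdfdfdf other text. hghhg"

# The question's pattern, with ^ as the last alternative.
caret_last = r"(?:\.|,|;|\n|^)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n)"
# Hypothetical variant: the same pattern with ^ moved to the front.
caret_first = r"(?:^|\.|,|;|\n)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n)"

results = []
for pattern in (caret_last, caret_first):
    m = re.search(pattern, text, flags=re.IGNORECASE)
    # Record where the match starts and what group 1 captured.
    results.append((m.start(), m.group(1)))
    print(m.start(), repr(m.group(1)))
```

Both variants start the match at index 0 and capture everything up to other text, which is the behaviour described in the question.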
This worked on a string you provided:
"[a-zA-Z0-9\s?]*,?\s*(\w\s)*(?=other\stext)"
You can change your regex to optionally match some characters up to a ., , or ;, then capture from there until other text:
^(?:.*?[.,;])?\s*(?:(?:for|by)\s*me\s*)?(\w.*?)(?=\s*other\s*text)
It matches:
^ : beginning of line
(?:.*?[.,;])? : an optional string of characters ending with a ., , or ;
\s* : some spaces
(?:(?:for|by)\s*me\s*)? : the optional phrase for me or by me
(\w.*?) : a minimal number of characters, starting with a word character
(?=\s*other\s*text) : lookahead that asserts the next characters are other text
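The breakdown above can also be kept next to the pattern itself by using re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a sketch of the same regex in a different layout; the name verbose_pattern is made up for the example:

```python
import re

# Same pattern as above, spread over several lines with one comment per piece.
verbose_pattern = re.compile(
    r"""
    ^                         # beginning of line
    (?:.*?[.,;])?             # optional prefix ending in . , or ;
    \s*                       # some spaces
    (?:(?:for|by)\s*me\s*)?   # optional phrase "for me" or "by me"
    (\w.*?)                   # capture: minimal text starting with a word character
    (?=\s*other\s*text)       # lookahead: "other text" must come next
    """,
    re.VERBOSE,
)

m = verbose_pattern.search("Hello how are you?, dfdfdfd fdfdfdf other text. hghhg")
print(m.group(1))
```

Since none of the pieces contain a literal space or #, the pattern needs no extra escaping in verbose mode.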
Demo on regex101
In Python (note that by using re.match we don't need the ^ in the regex):
import re

strs = [
    'dfdfdfd fdfdfdf other text. hghhg',
    'Hello how are you?, dfdfdfd fdfdfdf other text.hghhg',
    'for me a word other text',
    'A semicolon first; then some words before other text'
]
# re.match anchors at the start of the string, so no leading ^ is needed
regex = r'(?:.*?[.,;])?\s*(?:(?:for|by)\s*me\s*)?(\w.*?)(?=\s*other\s*text)'
for s in strs:
    print(re.match(regex, s).group(1))
Output:
dfdfdfd fdfdfdf
dfdfdfd fdfdfdf
a word
then some words before
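One thing to watch with the loop above: re.match returns None when a string does not contain other text, so calling .group(1) on it would raise AttributeError. A small wrapper avoids that. This is a sketch, not part of the original answer; extract_phrase and PATTERN are hypothetical names, and re.IGNORECASE is carried over from the question's code on the assumption that it is still wanted:

```python
import re

# Pattern from the answer above; re.IGNORECASE is taken from the question's code.
PATTERN = re.compile(
    r"(?:.*?[.,;])?\s*(?:(?:for|by)\s*me\s*)?(\w.*?)(?=\s*other\s*text)",
    re.IGNORECASE,
)

def extract_phrase(text):
    """Return the text captured before "other text", or None if nothing matches."""
    m = PATTERN.match(text)
    return m.group(1) if m else None

print(extract_phrase("Hello how are you?, dfdfdfd fdfdfdf other text. hghhg"))
print(extract_phrase("nothing relevant here"))
```

The first call prints dfdfdfd fdfdfdf and the second prints None instead of raising.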