How to make a regular expression pattern consider a comma before the start of line when using the ^ operator?

Question

import re

#example 1  with a  ,  before capture group
input_text = "Hello how are you?, dfdfdfd fdfdfdf other text. hghhg"

#example 2 without a  , (or \.|,|;|\n) before capture group
input_text = "dfdfdfd fdfdfdf other text. hghhg"

#No matter what position you place ^ within the options, it always considers it first, ignoring the others.
fears_and_panics_match = re.search(
                                    r"(?:\.|,|;|\n|^)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n)", 
                                    #r"(?:\.|,|;|\n)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n|$)", 
                                    input_text, flags = re.IGNORECASE)


if fears_and_panics_match: print(fears_and_panics_match.group(1))

Why does this pattern r"(?:\.|,|;|\n|^)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n)" capture Hello how are you?, dfdfdfd fdfdfdf no matter where I place the ^ within the alternation?
I need the pattern to first try to find a comma , and only then fall back to the beginning of the line ^

Correct output in each case:

#for example 1
"dfdfdfd fdfdfdf"

#for example 2
"dfdfdfd fdfdfdf"

Asked By: Elektvocal95

Answer 1

It seems like you used the wrong operator.
Also, the caret means "beginning of"; you never specified what should be at that beginning, so my wild guess is that it took any character.

No idea how helpful that is to you. I try to keep my regexes as dumb as possible, which makes it easier for me to spot the issue.

This worked on the string you provided:
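
A small sketch of what is probably going on, assuming the behavior comes from how re.search scans the string: it tries every alternative at position 0 before moving on to position 1, and ^ always succeeds at position 0, so the leftmost match starts there regardless of where ^ sits in the alternation. The two patterns below are simplified versions of the one in the question (the for/by me part and the trailing punctuation are dropped), and the names caret_last and caret_first are only illustrative.

```python
import re

text = "Hello how are you?, dfdfdfd fdfdfdf other text. hghhg"

# Same alternation, with ^ placed last and then first.
caret_last = r"(?:\.|,|;|\n|^)\s*(.+?)\s*other\s*text"
caret_first = r"(?:^|\.|,|;|\n)\s*(.+?)\s*other\s*text"

for pattern in (caret_last, caret_first):
    m = re.search(pattern, text, flags=re.IGNORECASE)
    # Both matches start at index 0, because ^ is the only
    # alternative that can succeed at the first position tried.
    print(m.start(), repr(m.group(1)))

# 0 'Hello how are you?, dfdfdfd fdfdfdf'
# 0 'Hello how are you?, dfdfdfd fdfdfdf'
```

The comma alternative would only be tried as a starting point if no match were possible from position 0.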

"[a-zA-Z0-9\s?]*,?\s*(\w\s)*(?=other\stext)"

Answered By: wickedpanda

Answer 2

You can change your regex to optionally match some characters up to a ., , or ;; then capture from there until other text:

^(?:.*?[.,;])?\s*(?:(?:for|by)\s*me\s*)?(\w.*?)(?=\s*other\s*text)

It matches:

^ beginning of line
(?:.*?[.,;])? an optional string of characters finishing with a ., , or ;
\s* some spaces
(?:(?:for|by)\s*me\s*)? the optional phrase for me or by me
(\w.*?) a minimal number of characters, starting with a word character
(?=\s*other\s*text) lookahead that asserts the next characters are other text
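
The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a sketch of one way to write it; the variable name pattern is illustrative.

```python
import re

pattern = re.compile(
    r"""
    ^                         # beginning of line
    (?:.*?[.,;])?             # optional run of characters ending in . , or ;
    \s*                       # some spaces
    (?:(?:for|by)\s*me\s*)?   # the optional phrase "for me" or "by me"
    (\w.*?)                   # minimal capture, starting with a word character
    (?=\s*other\s*text)       # lookahead: "other text" must come next
    """,
    flags=re.VERBOSE,
)

m = pattern.search("Hello how are you?, dfdfdfd fdfdfdf other text. hghhg")
print(m.group(1))  # dfdfdfd fdfdfdf
```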

Demo on regex101

In Python (note that by using re.match we don't need the ^ in the regex):

import re

strs = [
  'dfdfdfd fdfdfdf other text. hghhg',
  'Hello how are you?, dfdfdfd fdfdfdf other text.hghhg',
  'for me a word other text',
  'A semicolon first; then some words before other text'
]
regex = r'(?:.*?[.,;])?\s*(?:(?:for|by)\s*me\s*)?(\w.*?)(?=\s*other\s*text)'
for s in strs:
    print(re.match(regex, s).group(1))

Output:

dfdfdfd fdfdfdf
dfdfdfd fdfdfdf
a word
then some words before
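
One caveat with the loop above: re.match returns None when a string does not contain other text, so calling .group(1) directly would raise AttributeError. A minimal sketch of a guard, where extract_before_other_text is a hypothetical helper name and not part of the original answer:

```python
import re
from typing import Optional

REGEX = re.compile(
    r'(?:.*?[.,;])?\s*(?:(?:for|by)\s*me\s*)?(\w.*?)(?=\s*other\s*text)'
)

def extract_before_other_text(s: str) -> Optional[str]:
    """Return the captured phrase, or None if the string does not match."""
    m = REGEX.match(s)
    return m.group(1) if m else None

print(extract_before_other_text('dfdfdfd fdfdfdf other text. hghhg'))  # dfdfdfd fdfdfdf
print(extract_before_other_text('no marker phrase here'))              # None
```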

Answered By: Nick
