Regex: Match = preceded by whitespace, or = as the first character
Question:
I was getting quite far with regex101, but now I am stuck.
I want to extract a string between "markers" using regex in Python 3.9.
For each of the following example lines I want to get foobar back. The "marker" is =, but that marker has some edge cases.
1. lore =foobar= ipsum (there is a space before the first = and after the second =)
2. lore =foobar=.
3. =foobar= ipsum
4. lore =foobar=
This is what should not match, because the =x is not allowed:
lore =foobar=x
This is the regex I am using (Python 3.9):
 =(.*?)=[ .]
(note the space at the beginning of the pattern!)
I can handle the characters following the second marker: a space or a period is allowed.
Numbers 1 and 2 are working, but 3 and 4 are not matched.
The case where no character (the line ending) follows the second marker is missing.
Also, at the beginning I don't know how to check for either no character before = or a space.
Answers:
You could write the pattern as:
(?:^| )=(.*?)=(?:[ .]|$)
The pattern in parts:
(?:^| ) is a non-capturing group with an alternation | that either asserts the start of the string or matches a space
= matches literally
(.*?) is capture group 1, matching any character (except a newline) as few times as possible
= matches literally
(?:[ .]|$) matches either a space or a dot, or asserts the end of the string
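As a quick check, here is a minimal sketch that runs this pattern against the five example lines from the question. The helper name extract and the samples list are only illustrative and not part of the original answer.

```python
import re

# Pattern from the answer: start of string or a space, then =...=,
# followed by a space, a dot, or the end of the string.
PATTERN = re.compile(r"(?:^| )=(.*?)=(?:[ .]|$)")

samples = [
    "lore =foobar= ipsum",
    "lore =foobar=.",
    "=foobar= ipsum",
    "lore =foobar=",
    "lore =foobar=x",  # should not match: x follows the second marker
]


def extract(line):
    """Return the text between the markers, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


for line in samples:
    print(repr(line), "->", extract(line))
```

The first four lines should yield foobar, and the last one should yield None.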
If there cannot be any equals sign in between, you might also write the pattern as:
(?<!\S)=([^=\n]*)=(?:[ .]|$)
The pattern in parts:
(?<!\S) asserts a whitespace boundary to the left (no non-whitespace character directly before)
= matches literally
([^=\n]*) is capture group 1, matching any character except = or a newline
= matches literally
(?:[ .]|$) matches either a space or a dot, or asserts the end of the string
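A small sketch of the second pattern applied to a multi-line string follows. Using re.MULTILINE (so that $ also matches at the end of each line) and re.findall is an assumption about how the input arrives; the answer itself only gives the pattern.

```python
import re

# Second pattern from the answer. re.MULTILINE is an assumption here:
# it lets $ match at the end of every line of a multi-line string.
PATTERN = re.compile(r"(?<!\S)=([^=\n]*)=(?:[ .]|$)", re.MULTILINE)

text = (
    "lore =foobar= ipsum\n"
    "lore =foobar=.\n"
    "=foobar= ipsum\n"
    "lore =foobar=\n"
    "lore =foobar=x\n"  # should not match
)

found = PATTERN.findall(text)
print(found)  # ['foobar', 'foobar', 'foobar', 'foobar']

# Unlike (.*?), the negated character class rejects an = between the markers.
print(PATTERN.search("=a=b= "))  # None
```

Because (?<!\S) is a lookbehind, it accepts any whitespace before the first marker (including a tab or the newline of the previous line) without consuming it.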