Why can't I get the correct answer with this regex?
Question:
I want to get 'Microsoft Edge' from 'Unknown - abc - Microsoft Edge', but my regex fails: it only returns ['- ', '- '] instead of 'Microsoft Edge'.
Here is my code:
import re
content = 'Unknown - abc - Microsoft Edge'
p = re.compile(r"- .*?")
print(p.findall(content))
Please help me.
Thank you very much.
Answers:
You could use rsplit in this particular case (content.rsplit('- ', 1)[-1]); however, if you insist on a regex, you can use:
p = re.compile(r"(?<=- )[^-]*$") # non "-" after "- "
print(p.findall(content))
or
p = re.compile(r"(?<=- )(?:(?!- ).)*$") # anything after "- " but not containing "- "
print(p.findall(content))
output: ['Microsoft Edge']
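To compare the rsplit suggestion with the two lookbehind patterns, the sketch below runs all three on the original title and on a hypothetical title, 'Unknown - Wi-Fi settings' (invented for illustration, not from the question), whose last segment contains a hyphen that is not followed by a space. The names NO_DASH, NO_DASH_SPACE and last_segment_split are illustrative only.

```python
import re

# The two patterns from the answer above, compiled once.
NO_DASH = re.compile(r"(?<=- )[^-]*$")               # last segment may not contain "-" at all
NO_DASH_SPACE = re.compile(r"(?<=- )(?:(?!- ).)*$")  # last segment may not contain "- "


def last_segment_split(title):
    # Hypothetical helper: split once from the right on "- " and keep the tail.
    return title.rsplit("- ", 1)[-1]


for title in ["Unknown - abc - Microsoft Edge", "Unknown - Wi-Fi settings"]:
    print(repr(title))
    print("  rsplit:        ", last_segment_split(title))
    print("  [^-]*$:        ", NO_DASH.findall(title))
    print("  (?:(?!- ).)*$: ", NO_DASH_SPACE.findall(title))
```

On the hypothetical title, the "[^-]*$" pattern finds nothing because "Wi-Fi" contains "-", while the "(?!- )" pattern and rsplit both return 'Wi-Fi settings'. Note also that rsplit returns the whole string when there is no "- " at all, whereas findall returns an empty list.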
Without knowing what other input values are possible and what should or shouldn't be matched, it is hard to design a robust pattern. For this input, p = re.compile(r"[^ ]* [^ ]*$"), which takes the last two space-separated words, could work.
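As a small illustration of why the input format matters: this pattern captures exactly the last two space-separated words, so it fits 'Microsoft Edge' but would cut a three-word name short. 'Unknown - abc - Google Chrome Beta' is a hypothetical title used only for this sketch.

```python
import re

# Last two space-separated "words" at the end of the string.
LAST_TWO_WORDS = re.compile(r"[^ ]* [^ ]*$")

edge = LAST_TWO_WORDS.findall("Unknown - abc - Microsoft Edge")
chrome = LAST_TWO_WORDS.findall("Unknown - abc - Google Chrome Beta")

print(edge)    # ['Microsoft Edge']
print(chrome)  # ['Chrome Beta'] (the first word of the name is lost)
```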
You get "- " twice as a result because the pattern "- .*?" ends with a non-greedy part, .*?, that matches as few characters as possible.
Since nothing follows this part of the pattern, the engine can settle for matching zero characters, leaving just "- " as each match.
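A short sketch of that behaviour on the question's string, contrasting the lazy quantifier with a greedy one and with a lazy one that is followed by an end-of-string anchor:

```python
import re

content = "Unknown - abc - Microsoft Edge"

# Lazy .*? with nothing after it: zero characters suffice, so each match is just "- ".
lazy = re.findall(r"- .*?", content)

# Greedy .* consumes the rest of the string from the first "- ".
greedy = re.findall(r"- .*", content)

# Lazy .*? followed by $ is forced to expand until the end of the string.
lazy_anchored = re.findall(r"- .*?$", content)

print(lazy)           # ['- ', '- ']
print(greedy)         # ['- abc - Microsoft Edge']
print(lazy_anchored)  # ['- abc - Microsoft Edge']
```

Adding $ gives the lazy part a rule to satisfy, but the match still starts at the first "- ", which is why the answers constrain what the last segment may contain.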
To get the Microsoft Edge part, you can use a capture group:
-\s*([^-]+)$
Explanation
-\s*  Match "-" and optional whitespace chars
([^-]+)  Capture group 1, match 1+ chars other than "-"
$  End of string
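import re
s = "Unknown - abc - Microsoft Edge"
pattern = r"-\s*([^-]+)$"
# findall returns the contents of capture group 1 for each match.
# re.M only changes how $ behaves on multi-line input; it has no effect here.
print(re.findall(pattern, s, re.M))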
Output
['Microsoft Edge']
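The sketch below restates the same pattern in re.VERBOSE mode (a rewrite for readability, not the answerer's code) so each piece carries the comment from the explanation, and uses re.search to show the difference between the whole match and capture group 1. The name LAST_PART is illustrative.

```python
import re

s = "Unknown - abc - Microsoft Edge"

# Same pattern as the answer's, with whitespace and comments ignored by re.VERBOSE.
LAST_PART = re.compile(r"""
    -\s*      # a '-' followed by optional whitespace
    ([^-]+)   # group 1: one or more chars other than '-'
    $         # end of string
""", re.VERBOSE)

m = LAST_PART.search(s)
if m:
    print(repr(m.group(0)))  # whole match: '- Microsoft Edge'
    print(repr(m.group(1)))  # captured text only: 'Microsoft Edge'
```

Because the pattern has exactly one group, re.findall returns the group's text rather than the whole match, which is why the output above contains no "- ".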