Why can't I get the correct answer with this regex?

Question:

I want to extract Microsoft Edge from the string Unknown - abc - Microsoft Edge

But I failed.

It only returned ['- ', '- '] rather than Microsoft Edge.

Here is my code:

import re

content = 'Unknown - abc - Microsoft Edge'
p = re.compile(r"- .*?")
print(p.findall(content))

Please help me.

Thank you very much.

Asked By: Danhui Xu

||

Answers:

You could use rsplit in this particular case (content.rsplit('- ', 1)[-1]). However, if you insist on a regex, you can use:

p = re.compile(r"(?<=- )[^-]*$")  # non "-" after "- "
print(p.findall(content))

or

p = re.compile(r"(?<=- )(?:(?!- ).)*$")  # anything after "- " but not containing "- "
print(p.findall(content))

output: ['Microsoft Edge']
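
For comparison, the rsplit alternative mentioned at the top of this answer might look like the following minimal sketch. The variable name last_part is only illustrative, and the sketch assumes the wanted text always follows the last "- " separator.

```python
content = "Unknown - abc - Microsoft Edge"

# Split once, starting from the right, on the "- " separator
# and keep the final piece.
last_part = content.rsplit("- ", 1)[-1]
print(last_part)  # Microsoft Edge

# If the separator is absent, rsplit returns the whole string unchanged.
no_separator = "Microsoft Edge".rsplit("- ", 1)[-1]
print(no_separator)  # Microsoft Edge
```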

Answered By: mozway

Without knowing what other input values are possible and what should or shouldn’t be matched, it is hard to design a robust pattern. For this particular input, p = re.compile(r"[^ ]* [^ ]*$") could work.
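
To illustrate why the range of possible inputs matters, here is a small sketch that runs this pattern on the original string and on two made-up inputs. The browser names in the extra strings are hypothetical test data, not taken from the question.

```python
import re

# Matches "<non-space run> <non-space run>" at the end of the string,
# i.e. effectively the last two space-separated tokens.
p = re.compile(r"[^ ]* [^ ]*$")

two_words = p.findall("Unknown - abc - Microsoft Edge")
three_words = p.findall("Unknown - abc - Google Chrome Beta")  # hypothetical input
one_word = p.findall("Unknown - abc - Firefox")                # hypothetical input

print(two_words)    # ['Microsoft Edge']
print(three_words)  # ['Chrome Beta']
print(one_word)     # ['- Firefox']
```

The pattern only gives the wanted result when the final field consists of exactly two words.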

Answered By: Andrew

You get '- ' twice in the result because the pattern - .*? ends with a non-greedy part, .*?, which matches as few characters as possible.

As nothing follows this part of the pattern, the engine can settle for matching zero characters, leaving just '- ' as the match.
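
A short sketch of this behaviour, using the string from the question (the variable names are only illustrative):

```python
import re

content = "Unknown - abc - Microsoft Edge"

# Nothing follows the lazy .*?, so it is satisfied with zero characters.
lazy = re.findall(r"- .*?", content)
print(lazy)  # ['- ', '- ']

# With $ after it, the lazy part has to expand until the end of the string,
# but the match still starts at the first "- ".
lazy_anchored = re.findall(r"- .*?$", content)
print(lazy_anchored)  # ['- abc - Microsoft Edge']

# The greedy version also runs from the first "- " to the end.
greedy = re.findall(r"- .*", content)
print(greedy)  # ['- abc - Microsoft Edge']
```

So switching to a greedy quantifier or adding an anchor is not enough on its own; the pattern also has to stop the match from starting at an earlier "- ".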

To get the Microsoft Edge part, you can use a capture group:

-\s*([^-]+)$

Explanation

  • -\s* Match - and optional whitespace chars
  • ([^-]+) Capture group 1, match 1+ chars other than -
  • $ End of string

import re

s = "Unknown - abc - Microsoft Edge"
pattern = r"-\s*([^-]+)$"
print(re.findall(pattern, s, re.M))

Output

['Microsoft Edge']

Answered By: The fourth bird