How to handle " in Regex Python

Question:

I am trying to capture fary_trigger_post with the regex below. However, I don’t understand why the match always includes a " at the end, which I don’t expect.
Any idea or suggestion?

import re

re.match(
    r'-instance[ "\']*(.+)[ "\']*$',
    '-instance "fary_trigger_post" '.strip(),
    flags=re.S).group(1)


'fary_trigger_post"'

Thank you.

Asked By: Jason T.

||

Answers:

Here’s what I’d use in your matching string, but it’s hard to provide a better answer without knowing all your cases:

r'-instance\s+"(.+)"\s*$'
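As a rough check of this pattern, the sketch below runs it against a few inputs. Only the first sample comes from the question; the other two are made up for illustration, and the unquoted one shows a case this pattern does not cover.

```python
import re

pattern = re.compile(r'-instance\s+"(.+)"\s*$')

# The first sample is from the question; the others are hypothetical.
samples = [
    '-instance "fary_trigger_post" ',
    '-instance   "fary_trigger_post"',
    '-instance fary_trigger_post',  # no quotes, so the pattern does not match
]

results = []
for s in samples:
    m = pattern.match(s)
    # Guard against None before calling .group()
    results.append(m.group(1) if m else None)

print(results)  # ['fary_trigger_post', 'fary_trigger_post', None]
```

Because the pattern can fail to match, calling .group(1) directly on the result of re.match would raise AttributeError for the unquoted input.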
Answered By: Byron

The (.+) is greedy: it consumes every character up to the end of the input, and because the trailing [ "']* is allowed to match nothing, the regex never has to give the closing quote back. If you modified your input to include characters after the final double quote (e.g. '-instance "fary_trigger_post" asdf'), you would find the double quote and the remaining characters in the output (e.g. fary_trigger_post" asdf). Instead of .+, try [^"']+ to capture all characters except the quotes. This should return what you expect.

re.match(r'-instance[ "\']*([^"\']+)[ "\'].*$', '-instance "fary_trigger_post" '.strip(), flags=re.S).group(1)

Also note that I changed the end of the expression to [ "'].*, which requires one space or quote right after the captured text and then matches whatever follows it.
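To make the difference concrete, here is a small side-by-side sketch using the extended input mentioned above. The variable names are only illustrative.

```python
import re

text = '-instance "fary_trigger_post" asdf'

# Original pattern: greedy (.+) runs to the end of the input.
greedy = re.match(r'-instance[ "\']*(.+)[ "\']*$', text, flags=re.S)

# Negated character class: the capture stops at the first quote.
negated = re.match(r'-instance[ "\']*([^"\']+)[ "\'].*$', text, flags=re.S)

print(greedy.group(1))   # fary_trigger_post" asdf
print(negated.group(1))  # fary_trigger_post
```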

Answered By: Matthew Hielsberg

When the regex evaluates group 1, i.e. (.+), it follows the match to the end of the string, because . matches any character one or more times and takes as many characters as it can. I would suggest using the following pattern:

r'-instance[ "\']*(.+)["\']+ *$'

This requires the regex to match the closing quotes and the trailing spaces separately, so they are not included in group 1.
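The sketch below illustrates why the quantifier on the trailing class matters: with * the class may match nothing, while with + the greedy group has to backtrack and hand the closing quote over. The variable names and the unquoted sample are assumptions added for illustration.

```python
import re

text = '-instance "fary_trigger_post" '.strip()

# Trailing class may match zero characters, so (.+) keeps the closing quote.
optional_tail = re.match(r'-instance[ "\']*(.+)[ "\']*$', text)

# Trailing class must match at least one quote, so (.+) has to give it back.
required_tail = re.match(r'-instance[ "\']*(.+)["\']+ *$', text)

# Hypothetical unquoted input: the mandatory quote means there is no match.
unquoted = re.match(r'-instance[ "\']*(.+)["\']+ *$', '-instance fary_trigger_post')

print(repr(optional_tail.group(1)))  # 'fary_trigger_post"'
print(repr(required_tail.group(1)))  # 'fary_trigger_post'
print(unquoted)                      # None
```

The trade-off is that this pattern only works when the name is actually quoted.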

Answered By: CK9