Regular expression syntax for "match nothing"?
Question:
I have a Python template engine that relies heavily on regular expressions. It builds patterns by concatenation, like:
re.compile( regexp1 + "|" + regexp2 + "*|" + regexp3 + "+" )
I can modify the individual substrings (regexp1, regexp2, etc.).
Is there a small, cheap expression that matches nothing, which I can use inside a template where I don’t want any matches? Unfortunately, a ‘+’ or ‘*’ is sometimes appended to the regexp atom, so I can’t use an empty string: that raises a “nothing to repeat” error.
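For reference, a minimal sketch that reproduces the error described above. The helper name build and the sample atoms "a" and "c" are made up for illustration; only the concatenation itself comes from the question.

```python
import re

def build(regexp1, regexp2, regexp3):
    # Same concatenation as in the question (build is a hypothetical helper).
    return re.compile(regexp1 + "|" + regexp2 + "*|" + regexp3 + "+")

try:
    # An empty middle atom yields the pattern "a|*|c+",
    # where the "*" has nothing in front of it to repeat.
    build("a", "", "c")
    error_message = None
except re.error as exc:
    error_message = str(exc)

print(error_message)
```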
Answers:
This shouldn’t match anything:
re.compile('$^')
So if you replace regexp1, regexp2 and regexp3 with ‘$^’, no match can be found in ordinary text, unless you are using multiline mode. Note that it can still match an empty string, where the end and the start coincide.
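A quick check of where '$^' does and does not match with Python's re module. The sample strings are arbitrary choices for illustration.

```python
import re

plain = re.compile('$^')
multiline = re.compile('$^', re.MULTILINE)

# Ordinary text: "$" and "^" never hold at the same position.
no_match_plain = plain.search('some text') is None

# Empty string: position 0 is both the end and the start, so this matches.
matches_empty = plain.search('') is not None

# Multiline mode: the empty line between the two newlines satisfies
# "$" (a newline follows) and "^" (a newline precedes) at once.
matches_blank_line = multiline.search('a\n\nb') is not None

# Without the flag, the same string does not match.
no_match_without_flag = plain.search('a\n\nb') is None
```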
After some tests I found a better solution:
re.compile('a^')
It is impossible to match, and it fails earlier than the previous solution. You can replace a with any other character and it will still be impossible to match.
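A small sanity check of that claim, as a sketch: after consuming an 'a', the position can be neither the start of the string nor directly after a newline, so '^' fails with or without re.MULTILINE. The sample strings are arbitrary.

```python
import re

never = re.compile('a^')
never_multiline = re.compile('a^', re.MULTILINE)

samples = ['', 'a', 'aaa', 'a\na', 'a\n\na', '\na\n']

# Collect any match found by either variant; every entry should be None.
results = [never.search(s) or never_multiline.search(s) for s in samples]
any_match = any(r is not None for r in results)
```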
Maybe '.{0}'?
To match an empty string, even in multiline mode, you can use \A\Z, so:
re.compile(r'\A\Z|\A\Z*|\A\Z+')
The difference is that \A and \Z are start and end of string, whilst ^ and $ can match start/end of lines, so $^|$^*|$^+ could potentially match a string containing newlines (if the flag is enabled).
And to fail to match anything (even an empty string), simply attempt to find content before the start of the string, e.g.:
re.compile(r'.\A|.\A*|.\A+')
Since no characters can come before \A (by definition), this will always fail to match.
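A sketch of the two bare atoms in Python, without the appended quantifiers. The flags are switched on deliberately to show they make no difference here, and the sample strings are arbitrary.

```python
import re

# \A and \Z are string anchors, so re.MULTILINE does not change them.
empty_only = re.compile(r'\A\Z', re.MULTILINE)
# With re.DOTALL the dot may even consume a newline; \A still fails after it.
never = re.compile(r'.\A', re.MULTILINE | re.DOTALL)

samples = ['', 'a', '\n', 'a\n\nb']

# Only the empty string satisfies \A\Z.
empty_only_hits = [s for s in samples if empty_only.search(s)]
# Nothing satisfies .\A, because a character would have to precede the start.
never_hits = [s for s in samples if never.search(s)]
```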
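You could use \z..
This is the absolute end of string, followed by two of anything. If + or * is tacked on the end, it still refuses to match anything, because the quantifier applies only to the last dot and the first dot still needs a character after the end of the string. (Python's re module writes the end-of-string anchor as \Z.)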
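A sketch of this idea in Python, assuming the anchor is spelled \Z as the re module expects. It plugs the atom into the same concatenation as the question and checks that it compiles with the quantifiers and matches none of a few arbitrary sample strings.

```python
import re

atom = r'\Z..'  # end of string, then two more characters: unsatisfiable

# Bare atom, and the atom with each quantifier appended.
variants = [re.compile(atom), re.compile(atom + '*'), re.compile(atom + '+')]

# The concatenation from the question, with every atom disabled.
combined = re.compile(atom + "|" + atom + "*|" + atom + "+")

samples = ['', 'ab', 'abc\n', 'a\n\nb']
any_match = any(p.search(s) for p in variants + [combined] for s in samples)
```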
Or use a list comprehension to drop the unused regexp entries and join the rest together. Something like:
re.compile('|'.join([x for x in [regexp1, regexp2, ...] if x is not None]))
Be sure to add some comments next to that line of code though 🙂
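A slightly fuller sketch of the filtering approach, under assumptions not stated in the answer: disabled atoms are represented as None, each atom carries its own quantifier suffix, and the helper name build_pattern is hypothetical. If every atom is disabled, it falls back to a pattern that cannot match.

```python
import re

def build_pattern(atoms):
    """Join the enabled atoms; atoms is a list of (regexp_or_None, suffix)."""
    # Skip disabled entries so no quantifier is left without an atom.
    parts = [regexp + suffix for regexp, suffix in atoms if regexp is not None]
    if not parts:
        # Nothing enabled: use an empty negative lookahead, which never matches.
        return re.compile(r'(?!)')
    return re.compile('|'.join(parts))

pattern = build_pattern([('foo', ''), (None, '*'), ('ba[rz]', '+')])
all_disabled = build_pattern([(None, ''), (None, '*'), (None, '+')])
```

This avoids the problem at the source rather than relying on a placeholder atom, at the cost of changing how the template assembles its pattern.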
(?!)
should always fail to match. It is a zero-width negative lookahead: if what is in the parentheses matches, the whole match fails. Since it has nothing in it, and an empty pattern matches at every position, it fails the match for anything (including nothing).
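A short check of the bare atom in Python, as a sketch with arbitrary sample strings. It also shows the atom sitting harmlessly in an alternation next to a real pattern.

```python
import re

never = re.compile(r'(?!)')

samples = ['', 'a', 'abc', '\n', 'a\n\nb']
# Every search should come back as None, including for the empty string.
results = [never.search(s) for s in samples]

# In an alternation, the dead branch is skipped and the other one still works.
mixed = re.compile(r'(?!)|foo')
mixed_hit = mixed.search('a foo b')
```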