Regular expression syntax for "match nothing"?

Question:

I have a Python template engine that relies heavily on regular expressions. It builds them by concatenation, like:

re.compile( regexp1 + "|" + regexp2 + "*|" + regexp3 + "+" )

I can modify the individual substrings (regexp1, regexp2 etc).

Is there any small and light expression that matches nothing, which I can use inside a template where I don’t want any matches? Unfortunately, sometimes ‘+’ or ‘*’ is appended to the regexp atom so I can’t use an empty string – that will raise a “nothing to repeat” error.

Asked By: grigoryvp


Answers:

This shouldn’t match anything:

re.compile('$^')

So if you replace regexp1, regexp2 and regexp3 with ‘$^’, it will be impossible to find a match, unless you are using multiline mode.
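
A quick way to check this is sketched below. The sample strings are made up for illustration. It suggests two caveats: an empty subject string still matches (its end and start coincide at position 0), and with re.MULTILINE the two anchors can meet on an empty line.

```python
import re

pattern = '$^'

# No flags: ordinary non-empty text does not match.
plain = re.search(pattern, 'abc')

# Empty subject: position 0 is both the end and the start, so this matches.
empty = re.search(pattern, '')

# MULTILINE: '$' and '^' can both hold at the empty line between the newlines.
blank_line = re.search(pattern, 'a\n\nb', re.MULTILINE)

print(plain, empty, blank_line)
```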


After some tests I found a better solution:

re.compile('a^')

It is impossible to match and will fail earlier than the previous solution. You can replace a with any other character and it will still be impossible to match.
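
A small sketch to convince yourself, using a handful of made-up sample strings: once 'a' has been consumed, the position can no longer be the start of the string or of a line, so the result should be the same with or without re.MULTILINE.

```python
import re

samples = ['', 'a', 'aa', 'a\na', '\na']

# Default flags: '^' only holds at position 0, which 'a' has already passed.
default_results = [re.search('a^', s) for s in samples]

# MULTILINE: '^' also holds right after a newline, never right after 'a'.
multiline_results = [re.search('a^', s, re.MULTILINE) for s in samples]

print(default_results)
print(multiline_results)
```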

Answered By: Nadia Alramli

Maybe '.{0}'?
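
For what it's worth, a quick check suggests this one matches the empty string rather than nothing, and that appending '*' to it is rejected because it already carries a quantifier. The helper name below is made up for the example.

```python
import re

# '.{0}' means "any character, zero times", i.e. an empty match.
match = re.search('.{0}', 'abc')
print(match.span())

def compiles(pattern):
    """Hypothetical helper: True if the re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A '*' directly after '{0}' is a second quantifier on the same atom.
print(compiles('.{0}*'))
```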

Answered By: Steef

To match an empty string, even in multiline mode, you can use \A\Z, so:

re.compile(r'\A\Z|\A\Z*|\A\Z+')

The difference is that \A and \Z are the start and end of the string, whilst ^ and $ can also match the start/end of lines, so $^|$^*|$^+ could potentially match a string containing newlines (if the flag is enabled).

And to fail to match anything (even an empty string), simply attempt to find content before the start of the string, e.g.:

re.compile(r'.\A|.\A*|.\A+')

Since no characters can come before \A (by definition), this will always fail to match.
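
Here is a rough sketch of both ideas in Python. The compiles() helper is made up for the example. One caveat, as far as I can tell from the re module: a quantifier placed directly after an anchor is rejected with "nothing to repeat", so the quantified alternatives may need adapting there.

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if the re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

only_empty = re.compile(r'\A\Z', re.MULTILINE)      # matches only ''
never = re.compile(r'.\A', re.MULTILINE | re.DOTALL)  # nothing precedes \A

samples = ['', 'a', '\n', 'a\n\nb']
print([bool(only_empty.search(s)) for s in samples])
print([bool(never.search(s)) for s in samples])

# Quantifying the anchor itself is refused by the parser.
print(compiles(r'.\A*'), compiles(r'\A\Z+'))
```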

Answered By: Peter Boughton

You could use:

\z..

This is the absolute end of string, followed by two of anything.

If + or * is tacked on the end, this still works, refusing to match anything.
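
A sketch of the same idea in Python. This assumes the \Z spelling for the absolute end of string, since as far as I know older versions of the re module do not accept \z. The appended quantifier lands on the last dot, so at least one character is still required after the end of the string.

```python
import re

base = r'\Z..'  # assumption: Python's \Z stands in for \z
samples = ['', 'a', 'ab', 'abc\n', 'a\n\nb']

results = {}
for suffix in ['', '*', '+']:
    pattern = re.compile(base + suffix, re.DOTALL)
    # True if the pattern matched any of the sample strings.
    results[suffix] = any(pattern.search(s) for s in samples)

print(results)
```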

Answered By: ShuggyCoUk

Or, use a list comprehension to remove the useless regexp entries and join to put the rest together. Something like:

re.compile('|'.join([x for x in [regexp1, regexp2, ...] if x is not None]))

Be sure to add some comments next to that line of code though 🙂
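
A slightly fuller sketch of that approach, keeping the '*' and '+' suffixes from the question. The build_pattern function, the NEVER fallback and the sample parts are all invented for illustration, not taken from the asker's engine.

```python
import re

NEVER = '(?!)'  # fallback used when every part is disabled

def build_pattern(parts):
    """parts: list of (regexp_or_None, suffix) pairs; None means 'disabled'."""
    active = [regexp + suffix for regexp, suffix in parts if regexp is not None]
    # An empty join would match everywhere, so fall back to a failing pattern.
    return re.compile('|'.join(active) if active else NEVER)

template = build_pattern([('foo', ''), (None, '*'), ('ba', '+')])
print(template.pattern)
print(template.search('xbaa').group())
```

The disabled entry never reaches the pattern at all, so no "match nothing" atom is needed unless every entry is disabled.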

Answered By: Mike Miller

(?!) should always fail to match. It is the zero-width negative look-ahead. If what is in the parentheses matches then the whole match fails. Given that it has nothing in it, it will fail the match for anything (including nothing).
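
A sketch of how this could slot into the concatenation from the question. One thing to watch: an appended '*' allows zero repetitions of whatever it quantifies, so the example uses '(?!)x' as the placeholder (my own variation, not part of the answer above). The quantifier then binds to the literal x while the look-ahead stays mandatory.

```python
import re

PLACEHOLDER = '(?!)x'  # hypothetical placeholder: failing look-ahead + dummy literal
samples = ['', 'x', 'xxx', 'a\nb']

# The placeholder alone, and with '*' or '+' appended, should never match.
never_matches = all(
    re.search(PLACEHOLDER + suffix, s) is None
    for suffix in ['', '*', '+']
    for s in samples
)
print(never_matches)

# Same shape as the question: regexp2 is disabled, the others still work.
regexp1, regexp2, regexp3 = 'foo', PLACEHOLDER, 'ba'
combined = re.compile(regexp1 + "|" + regexp2 + "*|" + regexp3 + "+")
print(combined.search('xxbaa').group())
```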

Answered By: Chas. Owens