How to exclude a character from a regex group?

Question:

I want to strip all non-alphanumeric characters EXCEPT the hyphen from a string (python).
How can I change this regular expression to match any non-alphanumeric char except the hyphen?

re.compile(r'[\W_]')
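A quick check of the current behaviour, assuming the intended pattern is [\W_] (non-word characters plus the underscore). The sample string is made up for illustration. Because the hyphen is a non-word character, this pattern strips it along with everything else:

```python
import re

# Assumed intended pattern: \W (non-word chars) plus the underscore.
pattern = re.compile(r"[\W_]")

text = "hy-phen_ated text!"  # hypothetical sample input
print(pattern.sub("", text))  # hyphenatedtext  (the hyphen is gone too)
```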

Thanks.

Asked By: atp


Answers:

You could just use a negated character class instead:

re.compile(r"[^a-zA-Z0-9-]")

This matches any character that is not an ASCII letter, an ASCII digit, or a hyphen. Like your current regex, it also matches the underscore.

>>> import re
>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens  *#"
>>> r.sub("",s)
'sometextwith----5hy-phens'

Notice that this also removes spaces (which may well be what you want).
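If you want to keep spaces, one option (a sketch, not part of the original answer) is to add a literal space to the negated class. The hyphen is kept as the last character so it is read literally rather than as a range:

```python
import re

# Space added to the allowed set; trailing "-" is a literal hyphen.
keep_spaces = re.compile(r"[^a-zA-Z0-9 -]")

s = "some#%te_xt&with--##%--5 hy-phens  *#"
print(repr(keep_spaces.sub("", s)))  # 'sometextwith----5 hy-phens  '
```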


Edit: SilentGhost has suggested that it is likely cheaper for the engine to process the string with a quantifier, in which case you can simply use:

re.compile(r"[^a-zA-Z0-9-]+")

The + causes each run of consecutive matching characters to be matched (and replaced) as a single unit, rather than one character at a time.
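A small sketch comparing the two patterns on the sample string above. re.subn returns the new string together with the number of substitutions, which shows that the result is identical while the quantified version performs fewer replacements:

```python
import re

s = "some#%te_xt&with--##%--5 hy-phens  *#"

single = re.compile(r"[^a-zA-Z0-9-]")   # one character per match
runs = re.compile(r"[^a-zA-Z0-9-]+")    # one whole run per match

# subn returns (new_string, number_of_substitutions_made)
print(single.subn("", s))  # ('sometextwith----5hy-phens', 12)
print(runs.subn("", s))    # ('sometextwith----5hy-phens', 6)
```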

Answered By: eldarerathis

\w matches alphanumerics (plus the underscore, so underscores will be kept); add in the hyphen, then negate the entire set: r"[^\w-]"
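A sketch checking this pattern against the same sample string. Two things differ from the [^a-zA-Z0-9-] version: \w includes the underscore, so underscores survive (adding _ as an alternative is one way to drop them as well), and for str patterns in Python 3, \w is Unicode-aware, so accented letters are kept. The "café" input is a hypothetical example:

```python
import re

s = "some#%te_xt&with--##%--5 hy-phens  *#"

# \w includes the underscore, so [^\w-] leaves underscores in place.
word_or_hyphen = re.compile(r"[^\w-]")
print(word_or_hyphen.sub("", s))  # somete_xtwith----5hy-phens

# One way to drop underscores too: add "_" as an alternative.
also_underscore = re.compile(r"[^\w-]|_")
print(also_underscore.sub("", s))  # sometextwith----5hy-phens

# \w matches non-ASCII letters such as "é"; [a-zA-Z0-9] does not.
accented = "caf\u00e9-au-lait!"
print(word_or_hyphen.sub("", accented))              # café-au-lait
print(re.sub(r"[^a-zA-Z0-9-]", "", accented))        # caf-au-lait
```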

Answered By: Ned Batchelder