In Python regex, are character classes the same as special sequences?

Question:

From the documentation link here https://docs.python.org/3/library/re.html

[] – (used to indicate a set of characters)

  • Character classes such as \w or \S (defined below) are also accepted inside a set

What are character classes? I am only familiar with special characters (*, +, ?, etc.) and special sequences (\n, \r, \s, etc.). Do character classes refer to the latter special sequences? Or are they something different altogether? If so, what do character classes include exactly?


The term "character classes" appears only twice on the entire page. It is poorly defined, and the page does not indicate whether \w and \S are just some members of the character classes or the only ones. Further testing reveals that \n can also be used within [] but something like \A cannot, which compounds the confusion.

Asked By: AlanSTACK


Answers:

Yes, it is a bit ill-defined, but at the same time I think it’s rather intuitive. In short, “character classes” are special characters, or “escape sequences” in the form \... representing groups of multiple characters, such as “all whitespace” \s, “all digits” \d, or “all non-whitespace” \S, and are a subset of those “special sequences”.

There are three character classes you should know:

  • digits, \d, corresponds to [0-9] (in Python 3, on str patterns it also matches other Unicode decimal digits unless the re.ASCII flag is set); note that \d+ does not match a floating-point number in one piece, as the . is not in \d
  • “word” characters, \w, corresponds to [a-zA-Z0-9_], but (in Python 2) does not include non-ASCII characters, such as umlauts, accented letters, etc.
  • whitespace, \s, such as spaces, tabs, newlines, etc.
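
As a quick illustration of the three classes, here is a small sketch. The sample string and variable names are made up for the example; only re.findall and re.split from the standard library are used.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "pi is 3.14, e_1 = 2"

# \d matches one digit, so \d+ stops at the decimal point
digit_runs = re.findall(r"\d+", text)

# \w matches letters, digits, and the underscore
words = re.findall(r"\w+", text)

# \s matches whitespace, so splitting on \s+ separates the tokens
tokens = re.split(r"\s+", text)

# To match a float in one piece, the dot has to be added explicitly
floats = re.findall(r"\d+\.\d+", text)

print(digit_runs)  # ['3', '14', '1', '2']
print(words)       # ['pi', 'is', '3', '14', 'e_1', '2']
print(tokens)      # ['pi', 'is', '3.14,', 'e_1', '=', '2']
print(floats)      # ['3.14']
```

Note how 3.14 is split into 3 and 14 by \d+, which is the floating-point caveat from the first bullet.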

Also, each character class can be “inverted” by using the respective capital letter, i.e. \W matches everything that is not in \w, and similarly for \D and \S.
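
The inversion, and the use of classes inside a set that the question quotes from the docs, can be checked with a short sketch like this one. The sample string and names are assumptions made for the example.

```python
import re

# Hypothetical sample text, chosen only for illustration
sample = "id_7 = 4.5\n"

pairs = [(r"\d", r"\D"), (r"\w", r"\W"), (r"\s", r"\S")]

# For every character, exactly one member of each pair matches it
complementary = all(
    bool(re.fullmatch(lower, ch)) != bool(re.fullmatch(upper, ch))
    for lower, upper in pairs
    for ch in sample
)

# Classes can be mixed with ordinary characters inside a set:
# here, "a digit or a literal dot"
numbers = re.findall(r"[\d.]+", sample)

# A negated set around \s behaves like \S
non_space_set = re.findall(r"[^\s]", sample)
non_space_class = re.findall(r"\S", sample)

print(complementary)                     # True
print(numbers)                           # ['7', '4.5']
print(non_space_set == non_space_class)  # True
```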

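Other “special sequences” comprise only a single character, such as newline \n or tab \t (although those are not really a part of the regex language, but just basic Python string escape sequences), and others represent abstract concepts such as “the boundary between a word and a non-word character” \b, or “at the beginning of the string” \A. This is also why \n works inside [] while \A does not: a set matches exactly one character, and \A stands for a position rather than a character.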
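
To make the difference between single-character escapes and position-based sequences concrete, here is a rough sketch. The sample string and names are invented for the example, and the re.MULTILINE comparison is added only to show what \A means.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "one two\nthree"

# \n and \t each denote one concrete character, so they work inside a set
parts = re.split(r"[\n\t ]", text)

# \b matches a position, not a character: the match is empty
boundary = re.search(r"\b", text)

# Inside a set, \b means the backspace character instead
backspace = re.fullmatch(r"[\b]", "\x08")

# \A matches only at the very start of the string, even with re.MULTILINE,
# whereas ^ then also matches after each newline
with_A = re.findall(r"\A\w+", text, re.MULTILINE)
with_caret = re.findall(r"^\w+", text, re.MULTILINE)

print(parts)                  # ['one', 'two', 'three']
print(boundary.span())        # (0, 0)
print(backspace is not None)  # True
print(with_A)                 # ['one']
print(with_caret)             # ['one', 'three']
```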

Answered By: tobias_k