regex split on uppercase, but ignore titlecase

Question:

How can I split "This Is ABC Title" into "This Is", "ABC", "Title" in Python? If I use [A-Z] as the regex, it is split into "This", "Is", "ABC", "Title". I do not want to split on whitespace.

Asked By: Omega


Answers:

You can use

re.split(r'\s*\b([A-Z]+)\b\s*', text)

Details:

  • \s* – zero or more whitespace characters
  • \b – word boundary
  • ([A-Z]+) – Capturing group 1: one or more ASCII uppercase letters
  • \b – word boundary
  • \s* – zero or more whitespace characters

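Note the use of a capturing group, which makes re.split include the captured substring in its output. Because the uppercase run must sit between two word boundaries, title-case words such as "This" do not match: there is no word boundary between "T" and "h".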
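
To illustrate the effect of the capturing group, here is a minimal sketch comparing the pattern with and without it. The variable names with_group and without_group are only illustrative and are not part of the original answer.

```python
import re

text = "This Is ABC Title"

# Capturing group: the matched all-uppercase word is kept in the result list.
with_group = re.split(r'\s*\b([A-Z]+)\b\s*', text)

# Non-capturing group: the all-uppercase word acts as a pure delimiter and is dropped.
without_group = re.split(r'\s*\b(?:[A-Z]+)\b\s*', text)

print(with_group)     # ['This Is', 'ABC', 'Title']
print(without_group)  # ['This Is', 'Title']
```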

See the Python demo:

import re
text = "This Is ABC Title"
print(re.split(r'\s*\b([A-Z]+)\b\s*', text))
# => ['This Is', 'ABC', 'Title']
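
One edge case worth keeping in mind: when the all-uppercase word is at the very start or end of the string, re.split returns empty strings at those positions. The sketch below filters them out; split_acronyms is a hypothetical helper name introduced here for illustration, not something from the original answer.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r'\s*\b([A-Z]+)\b\s*')

def split_acronyms(text):
    """Split on all-uppercase words, keep them, and drop empty pieces."""
    return [part for part in PATTERN.split(text) if part]

# Raw split leaves empty strings at both ends.
print(PATTERN.split("ABC Title Here XYZ"))  # ['', 'ABC', 'Title Here', 'XYZ', '']

# Filtered result.
print(split_acronyms("ABC Title Here XYZ"))  # ['ABC', 'Title Here', 'XYZ']
```

Note that a single-letter uppercase word such as "A" also matches [A-Z]+; if that is not wanted, [A-Z]{2,} could be used instead.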
Answered By: Wiktor Stribiżew