regex split on uppercase, but ignore titlecase
Question:
How can I split "This Is ABC Title" into "This Is", "ABC", "Title" in Python?
If I use [A-Z] as the regex, it is split into "This", "Is", "ABC", "Title".
I do not want to split on whitespace.
Answers:
You can use
re.split(r'\s*\b([A-Z]+)\b\s*', text)
Details:
\s*
– zero or more whitespace characters
\b
– word boundary
([A-Z]+)
– Capturing group 1: one or more ASCII uppercase letters
\b
– word boundary
\s*
– zero or more whitespace characters
Note that the capturing group makes re.split include the captured substring
in the output as well.
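To make the effect of the capturing group concrete, here is a small sketch that compares it with a non-capturing group, (?:...), in the same position. The variable names with_group and without_group are illustrative only and not part of the original answer.

```python
import re

text = "This Is ABC Title"

# Capturing group: the matched all-caps word is kept in the result.
with_group = re.split(r'\s*\b([A-Z]+)\b\s*', text)

# Non-capturing group: the all-caps word acts purely as a delimiter and is dropped.
without_group = re.split(r'\s*\b(?:[A-Z]+)\b\s*', text)

print(with_group)     # ['This Is', 'ABC', 'Title']
print(without_group)  # ['This Is', 'Title']
```

In both cases the word boundaries are what keep "This", "Is" and "Title" intact: a single capital followed by lowercase letters has no word boundary after it, so it never matches.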
See the Python demo:
import re
text = "This Is ABC Title"
print(re.split(r'\s*\b([A-Z]+)\b\s*', text))
# => ['This Is', 'ABC', 'Title']
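One thing to watch for: when the all-caps word is at the very start or end of the string, re.split returns an empty string on that side. A minimal sketch of one way to handle that, assuming empty pieces are never wanted (split_on_acronyms is a hypothetical helper name, not something from the answer above):

```python
import re

def split_on_acronyms(text):
    """Split text around standalone all-caps words, keeping those words."""
    parts = re.split(r'\s*\b([A-Z]+)\b\s*', text)
    # Drop the empty strings that appear when a match touches either end.
    return [part for part in parts if part]

print(split_on_acronyms("This Is ABC Title"))   # ['This Is', 'ABC', 'Title']
print(split_on_acronyms("NASA Launch Report"))  # ['NASA', 'Launch Report']
print(split_on_acronyms("Report From NASA"))    # ['Report From', 'NASA']
```

Keep in mind that a single-letter uppercase word such as "A" or "I" also counts as one or more uppercase letters between word boundaries, so it will be split out as well. Use [A-Z]{2,} instead of [A-Z]+ if that is not what you want.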