How to split Chinese and English words once only?

Question:

I am using Python and I would like to split the following string:

string = '小西 - 杏花 Siu Sai - Heng Fa'

I would like to split the string into 小西 - 杏花 and Siu Sai - Heng Fa. I have tried several approaches but still couldn’t split the string properly.

Thanks in advance

Asked By: Winston


Answers:

If the pattern you’re looking for is "a series of non-alphabetical characters, followed by a space, followed by a series of alphabetical characters, spaces and dashes", you can match it with a regular expression:

import re

text = '小西 - 杏花 Siu Sai - Heng Fa'

# \s matches a whitespace character; group 1 is the non-Latin part, group 2 the Latin part
m = re.match(r'([^a-zA-Z]+)\s([a-zA-Z\s-]+)', text)
print(f'"{m.group(1)}"')
print(f'"{m.group(2)}"')

Output:

"小西 - 杏花"
"Siu Sai - Heng Fa"

So, m.group(1) and m.group(2) will be the parts of the string you’re after.
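
Note that re.match returns None when the string does not fit the pattern, so m.group(1) would raise an AttributeError in that case. A minimal sketch of a wrapper that guards against this, assuming the same pattern as above (the function name split_cjk_latin is hypothetical, chosen only for illustration):

```python
import re

def split_cjk_latin(text):
    """Hypothetical helper: return (non_latin_part, latin_part),
    or None if the text does not fit the pattern."""
    m = re.match(r'([^a-zA-Z]+)\s([a-zA-Z\s-]+)', text)
    if m is None:
        # No leading non-Latin part followed by whitespace and Latin text
        return None
    return m.group(1), m.group(2)

print(split_cjk_latin('小西 - 杏花 Siu Sai - Heng Fa'))
print(split_cjk_latin('Siu Sai'))
```

The first call should print the two parts as a tuple, and the second should print None because the string starts with a Latin letter.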

Answered By: Grismar

One option is to split just before the first English letter and take the first and second elements of the result. Note that the first element keeps its trailing space, so strip it if needed.

import re

inputstring = '小西 - 杏花 Siu Sai - Heng Fa'
# The capturing group keeps everything from the first English letter onward
a = re.split(r'([a-zA-Z].*)', inputstring)
print(a)
# ['小西 - 杏花 ', 'Siu Sai - Heng Fa', '']
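
If the trailing space and the empty last element are unwanted, a possible variant is to split once on the whitespace that comes right before the first English letter. This is only a sketch, assuming the English part always starts with an ASCII letter; it uses a lookahead together with the maxsplit argument of re.split:

```python
import re

inputstring = '小西 - 杏花 Siu Sai - Heng Fa'

# Split on whitespace that is immediately followed by an ASCII letter.
# maxsplit=1 stops after the first such position, so the English part stays whole.
parts = re.split(r'\s+(?=[a-zA-Z])', inputstring, maxsplit=1)
print(parts)
```

Here the separating whitespace is consumed by the split, so neither part needs stripping.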
Answered By: Tushar