How to split Chinese and English words only once?
Question:
I am using Python and I would like to split the following string:
string = '小西 - 杏花 Siu Sai - Heng Fa'
I would like to split it into two parts: 小西 - 杏花 and Siu Sai - Heng Fa. I tried different ways but still couldn't split the string properly.
Thanks in advance
Answers:
If the pattern you're looking for is "a series of characters that are not ASCII letters, followed by a whitespace character, followed by a series of ASCII letters, whitespace and dashes":
import re
text = '小西 - 杏花 Siu Sai - Heng Fa'
m = re.match(r'([^a-zA-Z]+)\s([a-zA-Z\s-]+)', text)
print(f'"{m.group(1)}"')
print(f'"{m.group(2)}"')
Output:
"小西 - 杏花"
"Siu Sai - Heng Fa"
So, m.group(1) and m.group(2) will be the parts of the string you're after.
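Note that re.match returns None when the text does not fit the pattern (for example, when it starts with an English letter), so calling m.group() directly would raise an AttributeError. A minimal sketch of a guarded version follows; the function name split_once is hypothetical, and the pattern is the one from the answer above:

```python
import re

def split_once(text):
    """Return (non_latin_part, latin_part), or None if the pattern does not match."""
    m = re.match(r'([^a-zA-Z]+)\s([a-zA-Z\s-]+)', text)
    if m is None:
        # No leading run of non-ASCII-letter characters followed by whitespace
        return None
    return m.group(1), m.group(2)

print(split_once('小西 - 杏花 Siu Sai - Heng Fa'))
print(split_once('Siu Sai'))
```

The first call prints the two parts as a tuple; the second prints None, because the string has no leading non-English part.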
One option is to split just before the first English letter and take the first two elements of the result:
import re
inputstring = '小西 - 杏花 Siu Sai - Heng Fa'
a = re.split(r'([a-zA-Z].*)', inputstring)
print(a)
Output:
['小西 - 杏花 ', 'Siu Sai - Heng Fa', '']
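The re.split result keeps a trailing space on the first element and adds an empty string at the end. If that is unwanted, one possible variant is to locate the first ASCII letter with re.search and slice the string there. This is only a sketch: the function name split_at_first_latin is hypothetical, and it assumes the English part always comes last.

```python
import re

def split_at_first_latin(text):
    """Split text once, just before the first ASCII letter, and trim whitespace."""
    m = re.search(r'[a-zA-Z]', text)
    if m is None:
        # No English letters at all: everything belongs to the first part
        return text.strip(), ''
    return text[:m.start()].strip(), text[m.start():].strip()

chinese, english = split_at_first_latin('小西 - 杏花 Siu Sai - Heng Fa')
print(f'"{chinese}"')
print(f'"{english}"')
```

Slicing at m.start() guarantees a single split no matter how many English words follow, and strip() removes the separating space from both halves.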