python split string on multiple delimeters without regex
Question:
I have a string that I need to split on multiple characters without the use of regular expressions. for example, I would need something like the following:
>>>string="hello there[my]friend"
>>>string.split(' []')
['hello','there','my','friend']
is there anything in python like this?
Answers:
If you need multiple delimiters, re.split
is the way to go.
Without using a regex, it’s not possible unless you write a custom function for it.
Here’s such a function – it might or might not do what you want (consecutive delimiters cause empty elements):
>>> def multisplit(s, delims):
... pos = 0
... for i, c in enumerate(s):
... if c in delims:
... yield s[pos:i]
... pos = i + 1
... yield s[pos:]
...
>>> list(multisplit('hello there[my]friend', ' []'))
['hello', 'there', 'my', 'friend']
re.split
is the right tool here.
>>> string="hello there[my]friend"
>>> import re
>>> re.split('[] []', string)
['hello', 'there', 'my', 'friend']
In regex, [...]
defines a character class. Any characters inside the brackets will match. The way I’ve spaced the brackets avoids needing to escape them, but the pattern [[] ]
also works.
>>> re.split('[[] ]', string)
['hello', 'there', 'my', 'friend']
The re.DEBUG
flag to re.compile is also useful, as it prints out what the pattern will match:
>>> re.compile('[] []', re.DEBUG)
in
literal 93
literal 32
literal 91
<_sre.SRE_Pattern object at 0x16b0850>
(Where 32, 91, 93, are the ascii values assigned to
, [
, ]
)
Solution without regexp:
from itertools import groupby
sep = ' []'
s = 'hello there[my]friend'
print [''.join(g) for k, g in groupby(s, sep.__contains__) if not k]
I’ve just posted an explanation here https://stackoverflow.com/a/19211729/2468006
A recursive solution without use of regex. Uses only base python in contrast to the other answers.
def split_on_multiple_chars(string_to_split, set_of_chars_as_string):
# Recursive splitting
# Returns a list of strings
s = string_to_split
chars = set_of_chars_as_string
# If no more characters to split on, return input
if len(chars) == 0:
return([s])
# Split on the first of the delimiter characters
ss = s.split(chars[0])
# Recursive call without the first splitting character
bb = []
for e in ss:
aa = split_on_multiple_chars(e, chars[1:])
bb.extend(aa)
return(bb)
Works very similarly to pythons regular string.split(...)
, but accepts several delimiters.
Example use:
print(split_on_multiple_chars('my"example_string.with:funny?delimiters', '_.:;'))
Output:
['my"example', 'string', 'with', 'funny?delimiters']
If you’re not worried about long strings, you could force all delimiters to be the same using string.replace(). The following splits a string by both -
and ,
x.replace('-', ',').split(',')
If you have many delimiters you could do the following:
def split(x, delimiters):
for d in delimiters:
x = x.replace(d, delimiters[0])
return x.split(delimiters[0])
I have a string that I need to split on multiple characters without the use of regular expressions. for example, I would need something like the following:
>>>string="hello there[my]friend"
>>>string.split(' []')
['hello','there','my','friend']
is there anything in python like this?
If you need multiple delimiters, re.split
is the way to go.
Without using a regex, it’s not possible unless you write a custom function for it.
Here’s such a function – it might or might not do what you want (consecutive delimiters cause empty elements):
>>> def multisplit(s, delims):
... pos = 0
... for i, c in enumerate(s):
... if c in delims:
... yield s[pos:i]
... pos = i + 1
... yield s[pos:]
...
>>> list(multisplit('hello there[my]friend', ' []'))
['hello', 'there', 'my', 'friend']
re.split
is the right tool here.
>>> string="hello there[my]friend"
>>> import re
>>> re.split('[] []', string)
['hello', 'there', 'my', 'friend']
In regex, [...]
defines a character class. Any characters inside the brackets will match. The way I’ve spaced the brackets avoids needing to escape them, but the pattern [[] ]
also works.
>>> re.split('[[] ]', string)
['hello', 'there', 'my', 'friend']
The re.DEBUG
flag to re.compile is also useful, as it prints out what the pattern will match:
>>> re.compile('[] []', re.DEBUG)
in
literal 93
literal 32
literal 91
<_sre.SRE_Pattern object at 0x16b0850>
(Where 32, 91, 93, are the ascii values assigned to ,
[
, ]
)
Solution without regexp:
from itertools import groupby
sep = ' []'
s = 'hello there[my]friend'
print [''.join(g) for k, g in groupby(s, sep.__contains__) if not k]
I’ve just posted an explanation here https://stackoverflow.com/a/19211729/2468006
A recursive solution without use of regex. Uses only base python in contrast to the other answers.
def split_on_multiple_chars(string_to_split, set_of_chars_as_string):
# Recursive splitting
# Returns a list of strings
s = string_to_split
chars = set_of_chars_as_string
# If no more characters to split on, return input
if len(chars) == 0:
return([s])
# Split on the first of the delimiter characters
ss = s.split(chars[0])
# Recursive call without the first splitting character
bb = []
for e in ss:
aa = split_on_multiple_chars(e, chars[1:])
bb.extend(aa)
return(bb)
Works very similarly to pythons regular string.split(...)
, but accepts several delimiters.
Example use:
print(split_on_multiple_chars('my"example_string.with:funny?delimiters', '_.:;'))
Output:
['my"example', 'string', 'with', 'funny?delimiters']
If you’re not worried about long strings, you could force all delimiters to be the same using string.replace(). The following splits a string by both -
and ,
x.replace('-', ',').split(',')
If you have many delimiters you could do the following:
def split(x, delimiters):
for d in delimiters:
x = x.replace(d, delimiters[0])
return x.split(delimiters[0])