Python split string on multiple delimiters without regex

Question:

I have a string that I need to split on multiple characters without using regular expressions. For example, I need something like the following:

>>> string = "hello there[my]friend"
>>> string.split(' []')
['hello', 'there', 'my', 'friend']

Is there anything in Python like this?

Asked By: ewok


Answers:

If you need multiple delimiters, re.split is the way to go.

Without using a regex, it’s not possible unless you write a custom function for it.

Here’s such a function – it might or might not do what you want (consecutive delimiters cause empty elements):

>>> def multisplit(s, delims):
...     pos = 0
...     for i, c in enumerate(s):
...         if c in delims:
...             yield s[pos:i]
...             pos = i + 1
...     yield s[pos:]
...
>>> list(multisplit('hello there[my]friend', ' []'))
['hello', 'there', 'my', 'friend']
Answered By: ThiefMaster
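
The multisplit generator above yields an empty string wherever two delimiters are adjacent. A minimal sketch of a variant that skips those empty pieces, assuming they are unwanted (the name multisplit_nonempty is hypothetical and not part of the answer):

```python
def multisplit_nonempty(s, delims):
    """Yield the non-empty pieces of s between any characters in delims."""
    pos = 0
    for i, c in enumerate(s):
        if c in delims:
            if i > pos:  # skip the empty piece between adjacent delimiters
                yield s[pos:i]
            pos = i + 1
    if pos < len(s):  # skip an empty trailing piece
        yield s[pos:]


print(list(multisplit_nonempty('hello  there[[my]]friend', ' []')))
# ['hello', 'there', 'my', 'friend']
```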

re.split is the right tool here.

>>> string="hello there[my]friend"
>>> import re
>>> re.split('[] []', string)
['hello', 'there', 'my', 'friend']

In regex, [...] defines a character class: any one of the characters inside the brackets will match. Placing ] first inside the class avoids the need to escape it, but the escaped pattern [\[\] ] also works.

>>> re.split(r'[\[\] ]', string)
['hello', 'there', 'my', 'friend']
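
Ordering or escaping the brackets by hand gets fiddly once the delimiters come from a variable. One possible approach, sketched here under the assumption that every delimiter is a single character, is to let re.escape build the character class (the helper name split_on_chars is hypothetical):

```python
import re


def split_on_chars(text, delims):
    """Split text on any single character found in delims."""
    if not delims:
        return [text]  # an empty class '[]' would not be a valid pattern
    # re.escape backslash-escapes characters that are special in a pattern
    pattern = '[' + re.escape(delims) + ']'
    return re.split(pattern, text)


print(split_on_chars('hello there[my]friend', ' []'))
# ['hello', 'there', 'my', 'friend']
```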

The re.DEBUG flag to re.compile is also useful, as it prints out what the pattern will match:

>>> re.compile('[] []', re.DEBUG)
in 
  literal 93
  literal 32
  literal 91
<_sre.SRE_Pattern object at 0x16b0850>

(Here 32, 91, and 93 are the ASCII values of the space, [, and ] characters.)

Answered By: Daenyth
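
The literal numbers in the debug output are character codes. A quick way to check them, using only the built-in chr and ord functions:

```python
# Codes reported by re.DEBUG for the pattern '[] []'
codes = [93, 32, 91]
chars = [chr(n) for n in codes]
print(chars)  # [']', ' ', '[']

# ord goes the other way, from a character to its code
print([ord(c) for c in ' []'])  # [32, 91, 93]
```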

Solution without regexp:

from itertools import groupby
sep = ' []'
s = 'hello there[my]friend'
print([''.join(g) for k, g in groupby(s, sep.__contains__) if not k])

I’ve just posted an explanation here https://stackoverflow.com/a/19211729/2468006

Answered By: monitorius
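
To see why the groupby one-liner works, it can help to look at the groups before filtering. This sketch reuses sep and s from the answer; the names groups and words are introduced here for illustration:

```python
from itertools import groupby

sep = ' []'
s = 'hello there[my]friend'

# groupby collects consecutive characters that share the same key;
# the key is True for delimiter characters and False for everything else.
groups = [(is_delim, ''.join(chars)) for is_delim, chars in groupby(s, sep.__contains__)]
print(groups)
# [(False, 'hello'), (True, ' '), (False, 'there'), (True, '['),
#  (False, 'my'), (True, ']'), (False, 'friend')]

# Keeping only the non-delimiter runs gives the split result.
words = [text for is_delim, text in groups if not is_delim]
print(words)
# ['hello', 'there', 'my', 'friend']
```

Because adjacent delimiters fall into a single True group, this approach produces no empty strings.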

A recursive solution that does not use regex. It relies only on built-in string methods, with no imports.

def split_on_multiple_chars(string_to_split, set_of_chars_as_string):
    """Recursively split a string on every character in set_of_chars_as_string.

    Returns a list of strings.
    """
    s = string_to_split
    chars = set_of_chars_as_string

    # If no more characters to split on, return input
    if len(chars) == 0:
        return [s]

    # Split on the first of the delimiter characters
    ss = s.split(chars[0])

    # Recursive call without the first splitting character
    bb = []
    for e in ss:
        aa = split_on_multiple_chars(e, chars[1:])
        bb.extend(aa)
    return bb

Works very similarly to Python's regular str.split(...), but accepts several delimiters.

Example use:

print(split_on_multiple_chars('my"example_string.with:funny?delimiters', '_.:;'))

Output:

['my"example', 'string', 'with', 'funny?delimiters']
Answered By: Anton
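
The same result can be reached without recursion by splitting the pieces again for each delimiter in turn. A minimal sketch, assuming single-character delimiters as in the answer (the name split_on_chars_iterative is hypothetical):

```python
def split_on_chars_iterative(text, chars):
    """Split text on each character in chars, one delimiter at a time."""
    parts = [text]
    for ch in chars:
        # Re-split every piece produced so far on the current delimiter
        parts = [piece for part in parts for piece in part.split(ch)]
    return parts


print(split_on_chars_iterative('my"example_string.with:funny?delimiters', '_.:;'))
# ['my"example', 'string', 'with', 'funny?delimiters']
```

Like str.split with an explicit separator, this keeps an empty string between adjacent delimiters, so 'a__b' split on '_' gives ['a', '', 'b'].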

If you’re not worried about long strings, you could force all delimiters to be the same using str.replace(). The following splits a string x on both - and , characters:

x.replace('-', ',').split(',')

If you have many delimiters you could do the following:

def split(x, delimiters):
    for d in delimiters:
        x = x.replace(d, delimiters[0])
    return x.split(delimiters[0])
Answered By: Matt Falk
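
If the repeated replace passes are a concern on long strings, one alternative is to map all delimiters to a single character in one pass with str.translate. This is a sketch under the assumption that each delimiter is a single character; the name split_translate is hypothetical:

```python
def split_translate(text, delimiters):
    """Split text on any character in delimiters using one translate pass."""
    if not delimiters:
        return [text]
    first = delimiters[0]
    # Map every delimiter character to the first one
    table = str.maketrans({d: first for d in delimiters})
    return text.translate(table).split(first)


print(split_translate('a-b,c', '-,'))
# ['a', 'b', 'c']
```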