How to split by comma and strip white spaces in Python?

Question:

I have some python code that splits on comma, but doesn’t strip the whitespace:

>>> string = "blah, lots  ,  of ,  spaces, here "
>>> mylist = string.split(',')
>>> print mylist
['blah', ' lots  ', '  of ', '  spaces', ' here ']

I would rather end up with whitespace removed like this:

['blah', 'lots', 'of', 'spaces', 'here']

I am aware that I could loop through the list and strip() each item but, as this is Python, I’m guessing there’s a quicker, easier and more elegant way of doing it.

Asked By: Mr_Chimp

||

Answers:

Use list comprehension — simpler, and just as easy to read as a for loop.

my_string = "blah, lots  ,  of ,  spaces, here "
result = [x.strip() for x in my_string.split(',')]
# result is ["blah", "lots", "of", "spaces", "here"]

See: Python docs on List Comprehension
A good 2 second explanation of list comprehension.

Answered By: Sean Vieira

map(lambda s: s.strip(), mylist) would be a little better than explicitly looping. Or for the whole thing at once: map(lambda s:s.strip(), string.split(','))

Answered By: user470379

Just remove the white space from the string before you split it.

mylist = my_string.replace(' ','').split(',')
Answered By: user489041

I know this has already been answered, but if you end doing this a lot, regular expressions may be a better way to go:

>>> import re
>>> re.sub(r's', '', string).split(',')
['blah', 'lots', 'of', 'spaces', 'here']

The s matches any whitespace character, and we just replace it with an empty string ''. You can find more info here: http://docs.python.org/library/re.html#re.sub

Answered By: Brad Montgomery

Split using a regular expression. Note I made the case more general with leading spaces. The list comprehension is to remove the null strings at the front and back.

>>> import re
>>> string = "  blah, lots  ,  of ,  spaces, here "
>>> pattern = re.compile("^s+|s*,s*|s+$")
>>> print([x for x in pattern.split(string) if x])
['blah', 'lots', 'of', 'spaces', 'here']

This works even if ^s+ doesn’t match:

>>> string = "foo,   bar  "
>>> print([x for x in pattern.split(string) if x])
['foo', 'bar']
>>>

Here’s why you need ^s+:

>>> pattern = re.compile("s*,s*|s+$")
>>> print([x for x in pattern.split(string) if x])
['  blah', 'lots', 'of', 'spaces', 'here']

See the leading spaces in blah?

Clarification: above uses the Python 3 interpreter, but results are the same in Python 2.

Answered By: tbc0

I came to add:

map(str.strip, string.split(','))

but saw it had already been mentioned by Jason Orendorff in a comment.

Reading Glenn Maynard’s comment on the same answer suggesting list comprehensions over map I started to wonder why. I assumed he meant for performance reasons, but of course he might have meant for stylistic reasons, or something else (Glenn?).

So a quick (possibly flawed?) test on my box (Python 2.6.5 on Ubuntu 10.04) applying the three methods in a loop revealed:

$ time ./list_comprehension.py  # [word.strip() for word in string.split(',')]
real    0m22.876s

$ time ./map_with_lambda.py     # map(lambda s: s.strip(), string.split(','))
real    0m25.736s

$ time ./map_with_str.strip.py  # map(str.strip, string.split(','))
real    0m19.428s

making map(str.strip, string.split(',')) the winner, although it seems they are all in the same ballpark.

Certainly though map (with or without a lambda) should not necessarily be ruled out for performance reasons, and for me it is at least as clear as a list comprehension.

Answered By: Sean
s = 'bla, buu, jii'

sp = []
sp = s.split(',')
for st in sp:
    print st
Answered By: Parikshit Pandya

re (as in regular expressions) allows splitting on multiple characters at once:

$ string = "blah, lots  ,  of ,  spaces, here "
$ re.split(', ',string)
['blah', 'lots  ', ' of ', ' spaces', 'here ']

This doesn’t work well for your example string, but works nicely for a comma-space separated list. For your example string, you can combine the re.split power to split on regex patterns to get a “split-on-this-or-that” effect.

$ re.split('[, ]',string)
['blah',
 '',
 'lots',
 '',
 '',
 '',
 '',
 'of',
 '',
 '',
 '',
 'spaces',
 '',
 'here',
 '']

Unfortunately, that’s ugly, but a filter will do the trick:

$ filter(None, re.split('[, ]',string))
['blah', 'lots', 'of', 'spaces', 'here']

Voila!

Answered By: Dannid
import re
result=[x for x in re.split(',| ',your_string) if x!='']

this works fine for me.

Answered By: Zieng
import re
mylist = [x for x in re.compile('s*[,|s+]s*').split(string)]

Simply, comma or at least one white spaces with/without preceding/succeeding white spaces.

Please try!

Answered By: ghchoi
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.