Split string based on a regular expression

Question:

I have the output of a command in tabular form. I’m parsing this output from a result file and storing it in a string. Each element in a row is separated by one or more whitespace characters, so I’m using a regular expression to match one or more spaces and split on it. However, a single-space element is being inserted between every pair of elements:

>>> import re
>>> str1="a    b     c      d" # spaces are irregular
>>> str1
'a    b     c      d'
>>> str2=re.split("( )+", str1)
>>> str2
['a', ' ', 'b', ' ', 'c', ' ', 'd'] # 1 space element between!!!

Is there a better way to do this?

After each split str2 is appended to a list.

Asked By: gjois


Answers:

When you use re.split and the split pattern contains capturing groups, the groups are retained in the output. If you don’t want this, use a non-capturing group instead.
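
As a minimal sketch of that difference, reusing the string from the question (the variable names `with_capture` and `without_capture` are only illustrative):

```python
import re

str1 = "a    b     c      d"

# Capturing group: the text captured by the group (the last space of each
# run) is kept as its own element in the result.
with_capture = re.split("( )+", str1)

# Non-capturing group (?:...): the separators are discarded.
without_capture = re.split("(?: )+", str1)

print(with_capture)     # ['a', ' ', 'b', ' ', 'c', ' ', 'd']
print(without_capture)  # ['a', 'b', 'c', 'd']
```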

Answered By: BrenBarn

The str.split method will automatically remove all white space between items:

>>> str1 = "a    b     c      d"
>>> str1.split()
['a', 'b', 'c', 'd']

Docs are here: http://docs.python.org/library/stdtypes.html#str.split
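
A small sketch of why calling it with no argument matters, assuming the parsed line may also contain tabs or leading and trailing whitespace (the sample strings are made up for illustration):

```python
line = "  a    b\tc      d  "

# No argument: split on runs of any whitespace and ignore
# leading/trailing whitespace.
fields = line.split()

# Explicit single-space separator: every space is a delimiter,
# so consecutive spaces produce empty strings.
naive = "a  b".split(" ")

print(fields)  # ['a', 'b', 'c', 'd']
print(naive)   # ['a', '', 'b']
```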

Answered By: Trevor

By using parentheses, ( and ), you are creating a capturing group. If you simply remove them, you will not have this problem.

>>> str1 = "a    b     c      d"
>>> re.split(" +", str1)
['a', 'b', 'c', 'd']

However, there is no need for regex here: str.split without any delimiter specified will split on whitespace for you. This would be the best way in this case.

>>> str1.split()
['a', 'b', 'c', 'd']
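
For the tabular output described in the question, the same call can be applied to each line in turn. This is only a sketch: the `output` text below is invented sample data, not real command output.

```python
# Hypothetical command output; the column names and values are made up.
output = """\
name     size   owner
alpha    10     root
beta     2048   alice
"""

# Split every non-blank line on whitespace and collect the rows in a list.
rows = [line.split() for line in output.splitlines() if line.strip()]

print(rows)
# [['name', 'size', 'owner'], ['alpha', '10', 'root'], ['beta', '2048', 'alice']]
```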

If you really want a regex, you can use this (\s matches any whitespace character, and it’s clearer):

>>> re.split(r"\s+", str1)
['a', 'b', 'c', 'd']

Or you can find all runs of non-whitespace characters:

>>> re.findall(r'\S+', str1)
['a', 'b', 'c', 'd']
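
The two regex approaches differ when the string has leading or trailing whitespace, which is worth knowing when parsing padded lines. A short sketch, with a made-up `padded` string:

```python
import re

padded = "  a    b   "

# re.split treats the leading and trailing whitespace as separators too,
# so it yields empty strings at both ends.
split_result = re.split(r"\s+", padded)

# re.findall only returns the runs of non-whitespace characters.
findall_result = re.findall(r"\S+", padded)

print(split_result)    # ['', 'a', 'b', '']
print(findall_result)  # ['a', 'b']
```
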
Answered By: jamylak

It’s very simple, actually. Try this:

str1="a    b     c      d"
splitStr1 = str1.split()
print(splitStr1)  # ['a', 'b', 'c', 'd']
Answered By: damned