I have the output of a command in tabular form. I'm parsing this output from a result file and storing it in a string. The elements in each row are separated by one or more whitespace characters, so I'm using a regular expression that matches one or more spaces to split the row. However, an extra space element is being inserted between every pair of elements:
>>> import re
>>> str1 = "a b c d"  # spaces are irregular
>>> str1
'a b c d'
>>> str2 = re.split("( )+", str1)
>>> str2
['a', ' ', 'b', ' ', 'c', ' ', 'd']  # a space element between every item!!!
Is there a better way to do this?
After each split, str2 is appended to a list.
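A minimal sketch of the loop described in the question, assuming the result file's contents are already in a string. The sample text and the names result_text and rows are hypothetical; it uses str.split() with no argument, which the answers below recommend, instead of a regex.

```python
# Hypothetical tabular output; in the question this text comes from the result file.
result_text = """\
a  b   c d
e f    g  h

"""

rows = []
for line in result_text.splitlines():
    # With no argument, str.split() splits on runs of whitespace
    # and never returns empty strings for the gaps.
    fields = line.split()
    if fields:  # a blank line splits to [], so skip it
        rows.append(fields)

print(rows)  # [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]
```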
When you use re.split and the split pattern contains capturing groups, the groups are retained in the output. If you don't want this, use a non-capturing group, (?:...), instead.
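A short comparison of the two group types, assuming an input string with irregular spacing (the string and variable names are illustrative). Both patterns match the same separators; only the capturing version adds the matched text to the result.

```python
import re

str1 = "a  b   c d"  # irregular spacing (illustrative input)

# Capturing group: re.split() also returns the text captured by the group
# (for a repeated group, the last repetition, i.e. a single space).
with_capture = re.split(r"( )+", str1)

# Non-capturing group (?:...): groups without capturing, so only the pieces remain.
without_capture = re.split(r"(?: )+", str1)

print(with_capture)     # ['a', ' ', 'b', ' ', 'c', ' ', 'd']
print(without_capture)  # ['a', 'b', 'c', 'd']
```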
The str.split method, called with no arguments, automatically removes all whitespace between items:
>>> str1 = "a b c d"
>>> str1.split()
['a', 'b', 'c', 'd']
Docs are here: http://docs.python.org/library/stdtypes.html#str.split
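The no-argument form behaves differently from passing an explicit " " separator, which matters for irregularly spaced columns. A small illustration (the input string is made up):

```python
s = "  a  b\tc \n"

# No argument: split on runs of any whitespace, ignoring leading/trailing whitespace.
default_split = s.split()
print(default_split)  # ['a', 'b', 'c']

# Explicit " " separator: every single space is a boundary, so consecutive
# spaces produce empty strings, and tabs/newlines are not treated as separators.
space_split = s.split(" ")
print(space_split)    # ['', '', 'a', '', 'b\tc', '\n']
```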
By putting the space in parentheses ( ), you are capturing the group; if you simply remove the parentheses, you will not have this problem.
>>> str1 = "a b c d"
>>> re.split(" +", str1)
['a', 'b', 'c', 'd']
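One caveat with re.split(" +", ...): if a row starts or ends with spaces, the result gets an empty string at that end. A quick sketch with a made-up padded row:

```python
import re

padded = " a b  c "

# re.split() returns an empty string at each end where the text
# starts or ends with a separator match.
regex_parts = re.split(" +", padded)
print(regex_parts)  # ['', 'a', 'b', 'c', '']

# str.split() with no argument drops those empty strings.
plain_parts = padded.split()
print(plain_parts)  # ['a', 'b', 'c']
```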
However, there is no need for a regex: str.split without any delimiter specified will split on whitespace for you. This is the best way in this case.
>>> str1.split()
['a', 'b', 'c', 'd']
If you really want a regex, you can use this (\s represents any whitespace character, and it's clearer):
>>> re.split(r"\s+", str1)
['a', 'b', 'c', 'd']
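The advantage of \s+ over " +" shows up when a row mixes tabs, newlines, and spaces. A minimal comparison with a made-up row; the r prefix keeps the backslash intact for the regex engine.

```python
import re

row = "a\tb  c\nd"

# \s matches any whitespace character (space, tab, newline, ...), so \s+
# treats a run of mixed whitespace as a single separator.
ws_parts = re.split(r"\s+", row)
print(ws_parts)     # ['a', 'b', 'c', 'd']

# " +" only matches spaces, so tabs and newlines stay inside the pieces.
space_parts = re.split(r" +", row)
print(space_parts)  # ['a\tb', 'c\nd']
```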
Or you can find all runs of non-whitespace characters:
>>> re.findall(r'\S+', str1)
['a', 'b', 'c', 'd']
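The findall approach matches the fields rather than the separators, so it never produces empty strings for leading or trailing whitespace. A side-by-side sketch with an illustrative padded row:

```python
import re

padded = "  a b\tc  "

# re.split() yields empty strings for the leading/trailing whitespace ...
split_parts = re.split(r"\s+", padded)
print(split_parts)  # ['', 'a', 'b', 'c', '']

# ... whereas re.findall() returns only the non-whitespace runs themselves.
found_parts = re.findall(r"\S+", padded)
print(found_parts)  # ['a', 'b', 'c']
```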
It's very simple, actually. Try this:
str1 = "a b c d"
splitStr1 = str1.split()
print(splitStr1)  # ['a', 'b', 'c', 'd']