Splitting a string representation of a nested list into string representations of the sublists
Question:
I have the following:
str = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'
I want to split it so that I have an array of strings like
['[5.955894, 45.817792]', '[10.49238, 45.817792]', ...]
so that the [...] objects are elements of the array. It is important that the enclosing [ and ] are included. This is what I have so far:
re.split(r'\D,\s\D', str)
But that gives me:
['[5.955894, 45.817792', '10.49238, 45.817792', '10.49238, 47.808381', '5.955894, 47.808381]']
Answers:
I prefer to use re.findall and specify what I want, instead of trying to describe the delimiter for re.split:
>>> import re
>>> s = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'
>>> re.findall(r"\[[^\]]*\]", s)
['[5.955894, 45.817792]', '[10.49238, 45.817792]', '[10.49238, 47.808381]', '[5.955894, 47.808381]']
\[ matches a literal [
[^\]]* matches any run of characters other than ]
\] matches a literal ]
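The same pattern can also be written in verbose mode, so that each piece carries its own comment. This is only a sketch: the name SUBLIST is made up for this example, and it assumes the sublists are flat (no brackets inside brackets).

```python
import re

# SUBLIST is a hypothetical name chosen for this example.
SUBLIST = re.compile(
    r"""
    \[        # a literal opening bracket
    [^\]]*    # zero or more characters that are not a closing bracket
    \]        # a literal closing bracket
    """,
    re.VERBOSE,
)

s = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'
parts = SUBLIST.findall(s)
print(parts)
```

With re.VERBOSE, whitespace and # comments inside the pattern are ignored, so the regex itself is unchanged.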
You need to use re.split with a look-ahead:
>>> s = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'
>>> re.split(r",[ ]*(?=\[)", s)
['[5.955894, 45.817792]', '[10.49238, 45.817792]', '[10.49238, 47.808381]', '[5.955894, 47.808381]']
And don’t use str as a variable name. It shadows the built-in.
The pattern ,[ ]*(?=\[) matches a comma and any spaces after it, but only when they are followed by a [. The look-ahead does not consume the [, so it stays in the result.
You can even do it with a look-behind: (?<=\]),[ ]* will also work.
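To compare the two variants side by side, here is a small sketch. The variable names ahead and behind are just illustrative, and the input is the string from the question.

```python
import re

s = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'

# Look-ahead: split on a comma plus spaces only when a '[' comes next.
ahead = re.split(r",[ ]*(?=\[)", s)

# Look-behind: split on a comma plus spaces only when a ']' comes just before.
behind = re.split(r"(?<=\]),[ ]*", s)

print(ahead)
print(ahead == behind)
```

Neither assertion consumes the bracket it checks for, which is why the [ and ] survive the split.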
Here is a naive procedure I’ve written. I think it solves your problem, though it is probably not the best way.
>>> def split_string(strg, begin='[', end=']'):
...     myList = []
...     string = ''
...     for char in strg:
...         if char == begin:
...             string = ''  # start a new chunk at each opening bracket
...         string += char
...         if char == end:
...             myList.append(string)  # chunk is complete, brackets included
...     return myList
...
>>> strg = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'
>>> split_string(strg)
['[5.955894, 45.817792]', '[10.49238, 45.817792]', '[10.49238, 47.808381]', '[5.955894, 47.808381]']
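The loop above restarts at every opening bracket, so it only fits flat sublists like the ones in the question. If the sublists could themselves contain brackets (an assumption, the question does not show that case), a depth counter is one way to keep only the top-level chunks. The function name split_top_level is made up for this sketch.

```python
def split_top_level(text, begin="[", end="]"):
    """Return the top-level bracketed chunks of text, brackets included."""
    chunks = []
    depth = 0     # how many brackets are currently open
    start = None  # index where the current top-level chunk began
    for i, ch in enumerate(text):
        if ch == begin:
            if depth == 0:
                start = i
            depth += 1
        elif ch == end and depth > 0:
            depth -= 1
            if depth == 0:
                chunks.append(text[start:i + 1])
    return chunks


print(split_top_level('[5.955894, 45.817792], [10.49238, 45.817792]'))
print(split_top_level('[1, [2, 3]], [4]'))
```

Unbalanced closing brackets are silently skipped here; raising an error instead would be an equally reasonable choice.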
Following on from @nhahtdh’s comment: whether this is acceptable depends on how much you trust the input, since eval will execute whatever code the string contains.
In [510]: txt = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'
In [511]: lst = eval("[%s]" % txt)
In [512]: [str(x) for x in lst]
Out[512]:
['[5.955894, 45.817792]',
'[10.49238, 45.817792]',
'[10.49238, 47.808381]',
'[5.955894, 47.808381]']
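If the input is not fully trusted, ast.literal_eval from the standard library may be a safer stand-in for eval, because it only accepts Python literals (numbers, strings, lists and so on) and raises an error on anything else. A minimal sketch, reusing the txt string from above:

```python
import ast

txt = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'

# Wrap the text in brackets so it parses as one list of lists.
lst = ast.literal_eval("[%s]" % txt)

# Convert each sublist back to its string form.
parts = [str(x) for x in lst]
print(parts)
```

Note that this round-trips the numbers through float, so the strings come back in Python's own formatting, which need not match the input character for character (for example, 1.50 would come back as 1.5).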