Python: how to split a list into an unknown number of smaller lists based on a delimiter
Question:
I’ve got a list which contains the following strings:
MainList
'00:00'
'00:01'
'00:02'
'00:03'
'00:04'
'00:00'
'00:01'
'00:02'
'00:03'
'00:04'
I would like to split this into smaller lists, starting a new one whenever '00:00' is encountered, since '00:00' is the only element that won't change:
Desired output:
List1
'00:00'
'00:01'
'00:02'
'00:03'
'00:04'
List2
'00:00'
'00:01'
'00:02'
'00:03'
'00:04'
I tried looking at list slicing, but the problem is that the last value, and therefore the number of elements in each group, may change. Moreover, I'm not sure how many smaller lists I'll need, or how I'd dynamically create n smaller lists.
Answers:
Explicitly, you could do it like this:
    sep = '00:00'
    split_list = []
    for item in MainList:
        if item == sep:
            # the separator starts a new sublist
            split_list.append([item])
        else:
            # assumes MainList begins with sep, so a sublist already exists
            split_list[-1].append(item)
    print(split_list)
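The loop above relies on the list starting with the separator; if it does not, split_list[-1] raises an IndexError on the first item. As a minimal sketch (Python 3, and the name split_on_marker is made up for illustration), the same idea can be wrapped in a function that opens a first sublist when none exists yet:

```python
def split_on_marker(items, sep="00:00"):
    """Start a new sublist each time `sep` is seen (hypothetical helper)."""
    groups = []
    for item in items:
        # Open a new group on the separator, or when no group exists yet
        # (i.e. the input did not begin with the separator).
        if item == sep or not groups:
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups


main_list = ["00:00", "00:01", "00:02", "00:00", "00:01"]
print(split_on_marker(main_list))
# [['00:00', '00:01', '00:02'], ['00:00', '00:01']]
```

Because the result is a list of lists, there is no need to create List1, List2, ... as separate variables; the n-th group is simply groups[n].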
I usually do this:
    def splitby(lst, breaker='00:00'):
        current = []
        it = iter(lst)
        first = next(it)
        assert first == breaker, "`lst` must begin with `breaker`"
        current.append(first)
        for item in it:
            if item == breaker:
                yield current
                current = []
            current.append(item)
        yield current
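One caveat with the generator above: on an empty list, next(it) raises StopIteration inside the generator, which Python 3.7+ turns into a RuntimeError. A possible variant that avoids both that and the assert is sketched below; split_lazily is a hypothetical name, and unlike the original it accepts input that does not begin with the breaker (any leading items form their own chunk).

```python
def split_lazily(items, breaker="00:00"):
    """Yield one list per breaker-led run (hypothetical variant of splitby)."""
    current = []
    for item in items:
        # A breaker closes the previous chunk, if there is one.
        if item == breaker and current:
            yield current
            current = []
        current.append(item)
    if current:  # nothing is yielded for an empty input
        yield current


for chunk in split_lazily(["00:00", "00:01", "00:00", "00:01", "00:02"]):
    print(chunk)
```

Since it is a generator, chunks are produced one at a time; wrap the call in list(...) if all of them are needed at once.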
The inevitable itertools solution is a bit more general:
    from itertools import groupby

    class splitter(object):
        def __init__(self, breaker):
            self.breaker = breaker
            self.current_group = 0

        def __call__(self, item):
            # each breaker bumps the group number; groupby groups on that number
            if item == self.breaker:
                self.current_group += 1
            return self.current_group

        def group(self, items):
            return (list(v) for k, v in groupby(items, self))

    print(list(splitter('00:00').group(MainList)))
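The key idea in the groupby answer is that each item gets a group number equal to the count of breakers seen so far. As an alternative sketch of that same idea without a stateful class, the running count can be computed with itertools.accumulate; the function name group_by_breaker is an assumption for illustration, not part of the answer above.

```python
from itertools import accumulate, groupby


def group_by_breaker(items, breaker="00:00"):
    """Split items into lists, each starting at a breaker (hypothetical helper)."""
    items = list(items)
    # Running count of breakers seen so far: this is each item's group number.
    group_ids = accumulate(int(item == breaker) for item in items)
    pairs = zip(group_ids, items)
    return [
        [item for _, item in run]
        for _, run in groupby(pairs, key=lambda pair: pair[0])
    ]


print(group_by_breaker(["00:00", "00:01", "00:02", "00:00", "00:01"]))
# [['00:00', '00:01', '00:02'], ['00:00', '00:01']]
```

Items that appear before the first breaker get group number 0 and end up in a group of their own.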
Comprehensions are your best friend :). Just two lines:
>>> a=['00:00', '00:01', '00:02', '00:03', '00:00', '00:01', '00:02']
>>> found=[index for index,item in enumerate(a) if item=='00:00'] + [len(a)]
>>> [a[found[i]:found[i+1]] for i in range(len(found)-1)]
[['00:00', '00:01', '00:02', '00:03'], ['00:00', '00:01', '00:02']]
Here is what we do:
We search for delimiter positions and get a list which contains delimiter indexes:
>>> found=[index for index,item in enumerate(a) if item=='00:00']
>>> found
[0, 4]
We append len(a) so that the last sublist is included:
>>> found.append(len(a))
>>> found
[0, 4, 7]
Then we create the new lists by slicing a at each pair of consecutive indexes:
>>> [a[found[i]:found[i+1]] for i in range(len(found)-1)]
[['00:00', '00:01', '00:02', '00:03'], ['00:00', '00:01', '00:02']]
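The two comprehensions can also be packaged as a function, pairing each start index with the next one via zip instead of indexing with range. This is only a sketch; split_at_delimiter is a hypothetical name. Like the original two-liner, it drops any items that come before the first delimiter and returns an empty list when the delimiter never occurs.

```python
def split_at_delimiter(a, delimiter="00:00"):
    """Slice a between consecutive delimiter positions (hypothetical helper)."""
    found = [index for index, item in enumerate(a) if item == delimiter]
    # Pair each start index with the next one; len(a) closes the last slice.
    return [a[start:stop] for start, stop in zip(found, found[1:] + [len(a)])]


a = ["00:00", "00:01", "00:02", "00:03", "00:00", "00:01", "00:02"]
print(split_at_delimiter(a))
# [['00:00', '00:01', '00:02', '00:03'], ['00:00', '00:01', '00:02']]
```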
I could think of another way 🙂
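    def list_split(a):
        # a = ['00:00', '00:01', '00:02', '00:03', '00:00', '00:01', '00:02']
        # The first element of a is treated as the delimiter.
        output = []
        count = 0
        if len(a) < 1:
            output.append(a)
            return output
        # i indexes a[1:], so i + 1 is the position of item in a
        for i, item in enumerate(a[1:]):
            if item == a[0]:
                output.append(a[count:i+1])
                count = i + 1
        else:
            # the loop has no break, so this always runs and adds the final chunk
            output.append(a[count:])
        return output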