Splitting a list in Python by keyword

Question:

I have a list like the following:

lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a','a','start','b','b','b','end','a','a','a','a','start','b','b','end']

and my desired result is to split the list into sublists like this:

[['a', 'a', 'a'], ['start', 'b', 'end'], ['a', 'a','a'],['start','b','b','b','end'],['a','a','a','a'],['start','b','b','end']]

Here 'start' and 'end' are keywords. Is there any way to do something like .split(), splitting the list wherever an element matches a particular keyword?

So far I have written a function that finds the indices of 'start' and 'end', i.e. starting_ind = [3, 9, 18] and ending_ind = [5, 13, 21]. However, if I do

temp = []
for i in range(len(starting_ind)):
    x = lst[starting_ind[i]: ending_ind[i]]
    temp += x
print(temp)

the result is incorrect.

Asked By: LibbyB


Answers:

You can write it like this:

lst = ['a', 'a', 'a', 'start', 'b', 'end',
       'a', 'a', 'a', 'start', 'b', 'b', 'b', 'end',
       'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end']
temp = []
ind = [0, 3, 6, 9, 14, 18, 22]
for i in range(len(ind) - 1):
    x = lst[ind[i]: ind[i + 1]]
    temp.append(x)
print(temp)

and you will get:

[['a', 'a', 'a'], ['start', 'b', 'end'], ['a', 'a', 'a'], ['start', 'b', 'b', 'b', 'end'], ['a', 'a', 'a', 'a'], ['start', 'b', 'b', 'end']]
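The index list above is written out by hand. As a minimal sketch, the cut points could also be derived from the list itself, assuming every 'start' is eventually followed by an 'end'. The helper name cut_points is hypothetical and not part of the original answer:

```python
def cut_points(items, start='start', end='end'):
    """Return sorted slice boundaries around each start..end block."""
    # A set avoids duplicate boundaries, e.g. when the list begins with 'start'
    points = {0, len(items)}
    for i, item in enumerate(items):
        if item == start:
            points.add(i)        # a block begins at the keyword itself
        elif item == end:
            points.add(i + 1)    # a block ends just after the keyword
    return sorted(points)


lst = ['a', 'a', 'a', 'start', 'b', 'end',
       'a', 'a', 'a', 'start', 'b', 'b', 'b', 'end',
       'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end']
ind = cut_points(lst)
# Pair each boundary with the next one to get the slices
temp = [lst[i:j] for i, j in zip(ind, ind[1:])]
print(ind)
print(temp)
```

For the list in the question this produces the same ind as the hand-written one, so the slicing loop stays unchanged.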
Answered By: fed

This solution doesn’t require you to calculate indices beforehand:

lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a', 'a', 'start', 'b', 'b',
       'b', 'end', 'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end']
result = []
sublist = []

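for el in lst:
    if el == 'start':
        # Flush the elements collected before the keyword block,
        # skipping empty runs (e.g. when the list begins with 'start')
        if sublist:
            result.append(sublist)
        sublist = [el]
    else:
        sublist.append(el)
        if el == 'end':
            result.append(sublist)
            sublist = []
# Keep any elements that follow the last 'end'
if sublist:
    result.append(sublist)
print(result)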

Answered By: George Rylkov

If you can be certain that your keywords will always appear in pairs, and in the right order (i.e. there will never be a 'start' without an 'end' that follows it, at some point in the list), this should work:

l = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a','a','start','b','b','b','end','a','a','a','a','start','b','b','end']

def get_sublist(l):
    try: 
        return l[:l.index('end') + 1] if l.index('start') == 0 else l[:l.index('start')]
    except ValueError:
        return l

result = []

while l:
    sublist = get_sublist(l)
    result.append(sublist)
    l = l[len(sublist):]

print(result)

Gives the following result:

[['a', 'a', 'a'],
 ['start', 'b', 'end'],
 ['a', 'a', 'a'],
 ['start', 'b', 'b', 'b', 'end'],
 ['a', 'a', 'a', 'a'],
 ['start', 'b', 'b', 'end']]
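
The approach above relies on the keywords being correctly paired. A minimal sketch of checking that assumption before splitting might look like this; the function name keywords_are_paired is hypothetical and not part of the original answer:

```python
def keywords_are_paired(items, start='start', end='end'):
    """Return True if every start is closed by an end before the next start."""
    inside = False
    for item in items:
        if item == start:
            if inside:
                return False  # a second 'start' before the previous block closed
            inside = True
        elif item == end:
            if not inside:
                return False  # an 'end' with no open 'start'
            inside = False
    return not inside  # False if the last 'start' was never closed


l = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a', 'a', 'start', 'b', 'b',
     'b', 'end', 'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end']
print(keywords_are_paired(l))
print(keywords_are_paired(['a', 'start', 'b']))
```

If the check fails, it is probably safer to raise an error or handle the input separately than to rely on the ValueError fallback in get_sublist.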
Answered By: ojh

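Here's a possible way to use a regular expression to extract the patterns. Note that the pattern is written for this specific data: it assumes the elements outside the blocks are 'a', the elements inside are 'b', and no element contains an underscore. Please check if it's acceptable: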

import re

lst = ['a','a','a', 'start','b','end', 'a','a','a', 'start','b','b','b','end', 'a','a','a','a', 'start','b','b','end']
result = []
for e in re.findall('a_[a_]+|start[_b]+_end', '_'.join(lst)):
    result.append(e.strip('_').split('_'))
print(result)

Output is as desired:

[['a', 'a', 'a'],
 ['start', 'b', 'end'],
 ['a', 'a', 'a'],
 ['start', 'b', 'b', 'b', 'end'],
 ['a', 'a', 'a', 'a'],
 ['start', 'b', 'b', 'end']]

A better way is to use re.split with a capturing group, which keeps the matched start...end blocks in the result:

result = []
for e in re.split(r'(start[_b]+_end)', '_'.join(lst)):
    result.append(e.strip('_').split('_'))
print([x for x in result if x != ['']])

This gives the same output.
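
The patterns above hard-code the values 'a' and 'b'. As a rough sketch, the same re.split idea can be made independent of the element values, assuming that no element contains whitespace and that the keywords are correctly paired. The separator and the lookaround-based pattern are illustrative choices, not part of the original answer:

```python
import re

lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a', 'a', 'start', 'b', 'b',
       'b', 'end', 'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end']

SEP = ' '  # assumption: no element contains whitespace
joined = SEP.join(lst)

# (?<!\S) and (?!\S) make 'start' and 'end' match only as whole elements,
# so an element such as 'send' is not mistaken for the 'end' keyword.
# .*? is non-greedy, so each block stops at the first 'end' after its 'start'.
pattern = r'((?<!\S)start(?!\S).*?(?<!\S)end(?!\S))'

# The capturing group makes re.split keep the matched blocks in its output
result = [part.split() for part in re.split(pattern, joined) if part.strip()]
print(result)
```

This still goes through a string round trip, so every element has to be a string; for other element types the loop-based answers are a better fit.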

Answered By: Adrian Ang