Is it possible to iterate over elements in a list and return chunks of the list based on repeating characters?

Question:

I have data stored in a list that follows a pattern. The data comes from reading a file line by line and appending certain lines to a list. When I print the list, the data follows an order, which is great.

Can I read through that list to pick out chunks of data?

For example, if I have in the list:

['|CODE|', 'name', 'group', 'info', '|CODE_1|', 'name', 'group', 'info',] and so on…

Is it possible to return the strings between the |CODE| parts?

Also, the number of strings between each |CODE| string may vary, i.e. there may be 4 strings between them, or 6, or 10, or 1, etc.

Let me know if I can amend my question in any way 🙂

I haven’t got any code yet as I was unsure how to tackle the problem.

Asked By: Jack


Answers:

yourlist[yourlist.index('|CODE|')+1:yourlist.index('|CODE_1|')]

list.index returns the index of the first occurrence of each marker string. The rest is a slice of the list between those two indices.
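A minimal sketch of the same idea wrapped in a helper, assuming both markers are present in the list. The name `between` is hypothetical and not part of the answer above. Note that `list.index` raises `ValueError` when a marker is missing, and that its optional second argument restricts the search to positions after the first marker.

```python
def between(items, start_marker, end_marker):
    """Return the elements strictly between two marker strings."""
    # list.index raises ValueError if a marker is not in the list.
    start = items.index(start_marker) + 1
    # Look for the end marker only after the start marker.
    end = items.index(end_marker, start)
    return items[start:end]


yourlist = ['|CODE|', 'name', 'group', 'info', '|CODE_1|', 'name', 'group', 'info']
print(between(yourlist, '|CODE|', '|CODE_1|'))
# ['name', 'group', 'info']
```

This returns one chunk per call and needs both marker names up front, so it suits a lookup of a single known section better than splitting the whole list.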

Answered By: Mateo Vial

You need to follow a few steps:

  1. Create the final list you need,
  2. Iterate over the list, starting just after the first '|CODE|' element,
  3. Add each element to the current chunk of data,
  4. If you detect a new chunk, store the current chunk and create a new one.

lst = ["|CODE|", "name", "group", "info", "|CODE_1|", "name", "group", "info"]

chunk = []  # strings collected since the last marker
chunks = []  # final list of chunks
for e in lst[lst.index("|CODE|") + 1 :]:  # start just after the first marker
    if e.startswith("|CODE"):  # a new marker closes the current chunk
        chunks.append(chunk)
        chunk = []
    else:
        chunk.append(e)
if chunk:  # store the last chunk, which has no marker after it
    chunks.append(chunk)
print(chunks)
# [['name', 'group', 'info'], ['name', 'group', 'info']]

If you need to include the string that starts a chunk:

chunk = ["|CODE|"]  # start the first chunk with the first marker
...
        chunk = [e]  # each new chunk starts with its marker
# [['|CODE|', 'name', 'group', 'info'], ['|CODE_1|', 'name', 'group', 'info']]
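For reference, here is one way the two changed lines could fit into the full loop. This is a sketch that assumes the elided part is the same loop as in the first version; the variable `first` is introduced here only for readability.

```python
lst = ["|CODE|", "name", "group", "info", "|CODE_1|", "name", "group", "info"]

first = lst.index("|CODE|")  # position of the first marker
chunk = [lst[first]]  # start the first chunk with its marker
chunks = []
for e in lst[first + 1:]:
    if e.startswith("|CODE"):
        chunks.append(chunk)  # store the finished chunk
        chunk = [e]  # the new chunk starts with its marker
    else:
        chunk.append(e)
chunks.append(chunk)  # the last chunk always holds at least its marker
print(chunks)
# [['|CODE|', 'name', 'group', 'info'], ['|CODE_1|', 'name', 'group', 'info']]
```

Because every chunk now begins with a marker, the final `if chunk:` guard is no longer needed: the last chunk is never empty.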

How to approach this kind of algorithm:

You need to proceed step by step.

  1. You know that you need to iterate over the list, so write the for loop,

  2. You need a list of chunks, which is easy: create an empty list before the loop. It’s the final data structure you want to fill,

  3. You want short chunks of data to be stored inside chunks:

    1. Initialize the chunk as an empty list before the loop,
    2. Add the code that appends a chunk to chunks.
    3. When should a chunk be added? When you detect the "new chunk marker", a string starting with "|CODE". Write the if.
    4. Then reset the chunk to an empty list: you are now starting a new chunk,
  4. What if you don’t detect a new chunk? Add the value to the current chunk.

Each step is simple in itself, but the whole process can seem overwhelming. When you don’t know where to start, split the problem into small tasks to divide the complexity. Don’t hesitate to subdivide further if a step seems too complex to do easily. As you write the algorithm, validate each step by running your code and checking that the result matches your expectations.
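One way to check your work as you go is to wrap the loop in a function and assert on small inputs. The sketch below is an assumption-laden variant, not the exact code from this answer: `split_chunks` and its `prefix` parameter are hypothetical names, it ignores anything before the first marker, and it keeps an empty list for a marker that has nothing after it.

```python
def split_chunks(items, prefix="|CODE"):
    """Group the strings that follow each marker into their own list."""
    chunks = []
    chunk = None  # None until the first marker has been seen
    for e in items:
        if e.startswith(prefix):
            if chunk is not None:
                chunks.append(chunk)  # store the finished chunk
            chunk = []  # start a new chunk
        elif chunk is not None:
            chunk.append(e)
    if chunk is not None:
        chunks.append(chunk)  # store the last chunk
    return chunks


# Check each expectation separately, starting with the simplest case.
assert split_chunks([]) == []
assert split_chunks(["|CODE|", "name"]) == [["name"]]
# Chunks of different lengths, as described in the question.
data = ["|CODE|", "a", "|CODE_1|", "b", "c", "d", "|CODE_2|", "e", "f"]
assert split_chunks(data) == [["a"], ["b", "c", "d"], ["e", "f"]]
print("all checks passed")
```

If an assertion fails, the failing input tells you which step to revisit, which is easier than debugging the whole list at once.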

Answered By: Dorian Turba