Python Find x in file and add following lines until not indented

Question:

I have a file like the one below:

T01_JOB1
 T01_JOB1a
 T01_JOB1b
 T01_JOB1c
T01_JOB2
 T01_JOB2a
 T01_JOB2b
  T01_JOB2c
  T01_JOB2d
  T01_JOB2e
T01_JOB3
T01_JOB4
 T01_JOB4a
 T01_JOB4b
 T01_JOB4c
  T01_JOB5
   T01_JOB5a
    T01_JOB5b

All jobs without indentation are top-level jobs, so I have extracted those first:

topboxes=[]
for line in batchjobs:
    if line.startswith(b'T0'):
        topboxes.append(line)

Example of what I am trying to get:

topjobs = ('T01_JOB1', 'T01_JOB2', 'T01_JOB3', 'T01_JOB4')

jobgroups = {('T01_JOB1', 'T01_JOB1a', 'T01_JOB1b', 'T01_JOB1c'), ('T01_JOB2', 'T01_JOB2a', 'T01_JOB2b', 'T01_JOB2c', 'T01_JOB2d', 'T01_JOB2e'), ('T01_JOB3',), ('T01_JOB4', 'T01_JOB4a', 'T01_JOB4b', 'T01_JOB4c', 'T01_JOB5', 'T01_JOB5a', 'T01_JOB5b')}

(JOB5 is within the JOB4 tree, and that's OK.)

This comes out with [b'T01_JOB1\n', b'T01_JOB2\n', b'T01_JOB3\n', b'T01_JOB4\n'], which is fine.
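The b'...\n' entries appear because the file is read in binary mode, so each line is a bytes object that still ends with its newline. A minimal sketch, assuming UTF-8 content, with io.BytesIO standing in for the real file and a shortened copy of the listing as sample data: it decodes each line, drops the line ending, and treats any non-empty line that does not start with whitespace as top level, rather than relying on the T0 prefix. Opening the file in text mode (open(path) instead of open(path, 'rb')) would give str lines directly and make the decode step unnecessary.

```python
import io

# Stand-in for a file opened with open(path, 'rb'); sample lines are an assumption
# copied (shortened) from the question's listing.
raw = io.BytesIO(b"T01_JOB1\n T01_JOB1a\nT01_JOB2\n T01_JOB2a\nT01_JOB3\n")

topboxes = []
for raw_line in raw:
    # Decode bytes to str and strip the trailing line ending ('\n' or '\r\n').
    line = raw_line.decode("utf-8").rstrip("\r\n")
    # Top-level job: a non-empty line whose first character is not whitespace.
    if line and not line[0].isspace():
        topboxes.append(line)

print(topboxes)  # ['T01_JOB1', 'T01_JOB2', 'T01_JOB3']
```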

I now want to go through the file again and, for each top-level job, collect all the indented lines underneath it, stopping when I reach a line without indentation.

topboxesjobs={}

for line in batchjobs:
    for x in topboxes:
        if x in line:
            #print(x, 'imhere')
            topboxesjobs.append(x)
            if line.startswith(b' '):
                topboxesjobs.append(x,line)
                continue
                print(topboxesjobs)
            if line.startswith(b'T0'):
                exit()

But this isn’t working. Any suggestions please?

Asked By: Demo


Answers:

I have interpreted your question in the following way:

Since you want ALL the indented jobs, I see no need for the exit() in the for loop; otherwise you would always quit after JOB1.
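In other words, an unindented line does not have to stop the loop; it can simply start a new group. A minimal sketch of that idea, using the sample data from the question (where T01_JOB5 is indented under T01_JOB4) and producing tuples shaped like the jobgroups the question describes. A list is used here instead of a set so the original order is kept, and top-level jobs without sub-jobs (T01_JOB3) still get a one-element group.

```python
# Sample data copied from the question; T01_JOB5 is indented under T01_JOB4.
batchjobs = """\
T01_JOB1
 T01_JOB1a
 T01_JOB1b
 T01_JOB1c
T01_JOB2
 T01_JOB2a
 T01_JOB2b
  T01_JOB2c
  T01_JOB2d
  T01_JOB2e
T01_JOB3
T01_JOB4
 T01_JOB4a
 T01_JOB4b
 T01_JOB4c
  T01_JOB5
   T01_JOB5a
    T01_JOB5b""".splitlines()

groups = []
current = None
for line in batchjobs:
    if not line.strip():
        continue  # ignore blank lines
    if not line[0].isspace():
        # Unindented line: close the previous group by starting a new one
        # (this replaces the exit() from the question).
        current = [line]
        groups.append(current)
    elif current is not None:
        # Indented line at any depth belongs to the latest top-level job.
        current.append(line.strip())

jobgroups = [tuple(group) for group in groups]
for group in jobgroups:
    print(group)
```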

I have redesigned your code to go through the list again, picking up the top-level jobs as keys and using these keys to build a dictionary of sub-jobs.
I use defaultdict from the collections module of the standard library so I can append directly to a list inside the dictionary without having to initialize that list first.
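To illustrate what defaultdict saves here, a small standalone comparison with a plain dict (the keys and values are just sample job names): appending to a missing key of a plain dict raises KeyError unless the list is created first, for example with dict.setdefault(), whereas defaultdict(list) calls list() to create the empty list on first access.

```python
from collections import defaultdict

plain = {}
try:
    plain["T01_JOB1"].append("T01_JOB1a")  # no list exists yet for this key
except KeyError:
    # A plain dict needs the list created first; setdefault() does that in one step.
    plain.setdefault("T01_JOB1", []).append("T01_JOB1a")

auto = defaultdict(list)
# defaultdict(list) creates an empty list automatically on first access.
auto["T01_JOB1"].append("T01_JOB1a")

print(plain == dict(auto))  # True
```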

from collections import defaultdict
from pprint import pprint

batchjobs = """T01_JOB1
 T01_JOB1a
 T01_JOB1b
 T01_JOB1c
T01_JOB2
 T01_JOB2a
 T01_JOB2b
  T01_JOB2c
  T01_JOB2d
  T01_JOB2e
T01_JOB3
T01_JOB4
 T01_JOB4a
 T01_JOB4b
 T01_JOB4c
T01_JOB5
  T01_JOB5a
   T01_JOB5b""".splitlines()  # pseudo file-content

# get topboxes
topboxes=[]
for line in batchjobs:
    if line.startswith('T0'):
        topboxes.append(line)

# get dict for sub-jobs
topboxesjobs=defaultdict(list) # directly append to an empty list
job_key = None
for line in batchjobs:
    if line in topboxes:
        job_key = line  # new topbox found, this is the current key
    if line.startswith(' ') and job_key:
        # append sub-jobs in the dict with the current key
        # use strip() to get rid of indentation whitespace
        topboxesjobs[job_key].append(line.strip())

#pretty print the defaultdict
pprint(dict(topboxesjobs))

which hopefully gives you your desired output:

{'T01_JOB1': ['T01_JOB1a', 'T01_JOB1b', 'T01_JOB1c'],
 'T01_JOB2': ['T01_JOB2a', 'T01_JOB2b', 'T01_JOB2c', 'T01_JOB2d', 'T01_JOB2e'],
 'T01_JOB4': ['T01_JOB4a', 'T01_JOB4b', 'T01_JOB4c'],
 'T01_JOB5': ['T01_JOB5a', 'T01_JOB5b']}
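Note that T01_JOB3 does not appear in this output: defaultdict only creates a key when something is looked up under it, and T01_JOB3 has no indented lines. Also, in the sample data above T01_JOB5 is not indented (in the question it is), which is why it shows up as a separate key. If top-level jobs without sub-jobs should be kept, as with the (T01_JOB3) group in the question, one option is to create the entry as soon as a top-level job is seen. A minimal sketch with a shortened, illustrative input, using a plain dict with setdefault():

```python
# Shortened, illustrative input; T01_JOB3 has no indented sub-jobs.
batchjobs = ["T01_JOB2", " T01_JOB2a", "T01_JOB3", "T01_JOB4", " T01_JOB4a"]

topboxesjobs = {}
job_key = None
for line in batchjobs:
    if line and not line[0].isspace():
        job_key = line
        # Create the entry immediately so jobs without sub-jobs get an empty list.
        topboxesjobs.setdefault(job_key, [])
    elif line.strip() and job_key is not None:
        topboxesjobs[job_key].append(line.strip())

print(topboxesjobs)
# {'T01_JOB2': ['T01_JOB2a'], 'T01_JOB3': [], 'T01_JOB4': ['T01_JOB4a']}
```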
Answered By: Christian Karcher