Python Find x in file and add following lines until not indented
Question:
I have a file like below
T01_JOB1
 T01_JOB1a
 T01_JOB1b
 T01_JOB1c
T01_JOB2
 T01_JOB2a
 T01_JOB2b
 T01_JOB2c
 T01_JOB2d
 T01_JOB2e
T01_JOB3
T01_JOB4
 T01_JOB4a
 T01_JOB4b
 T01_JOB4c
 T01_JOB5
  T01_JOB5a
  T01_JOB5b
All jobs without indentation are top level, so I have collected those first.
topboxes=[]
for line in batchjobs:
    if line.startswith(b'T0'):
        topboxes.append(line)
Example of what I am trying to get:
topjobs = ('T01_JOB1', 'T01_JOB2', 'T01_JOB3', 'T01_JOB4')
jobgroups = {(T01_JOB1,T01_JOB1a,T01_JOB1b,T01_JOB1c),(T01_JOB2,T01_JOB2a,T01_JOB2b,T01_JOB2c,T01_JOB2d,T01_JOB2e),(T01_JOB3),(T01_JOB4,T01_JOB4a,T01_JOB4b,T01_JOB4c,T01_JOB5,T01_JOB5a,T01_JOB5b)}
(job5 is within job4 tree and that’s ok)
The topboxes loop above comes out with [b'T01_JOB1\n', b'T01_JOB2\n', b'T01_JOB3\n', b'T01_JOB4\n'], which is fine.
I now want to go through the list again and this time bring all the indents underneath, stopping when I get to a line without the indent.
topboxesjobs={}
for line in batchjobs:
    for x in topboxes:
        if x in line:
            #print(x, 'imhere')
            topboxesjobs.append(x)
            if line.startswith(b' '):
                topboxesjobs.append(x,line)
                continue
            print(topboxesjobs)
            if line.startswith(b'T0'):
                exit()
But this isn’t working. Any suggestions please?
Answers:
I have interpreted your question in the following way:
Since you want ALL the indented lines, I see no need for an exit() in the for loop; with it, you would always quit after job1.
So I have redesigned your code to go through the list again, picking up the top-level jobs as keys and using these keys to build a dictionary of sub-jobs.
I use defaultdict from the collections module of the standard library so that I can append directly to a list inside the dictionary without having to initialize that list first.
from collections import defaultdict
from pprint import pprint

batchjobs = """T01_JOB1
    T01_JOB1a
    T01_JOB1b
    T01_JOB1c
T01_JOB2
    T01_JOB2a
    T01_JOB2b
    T01_JOB2c
    T01_JOB2d
    T01_JOB2e
T01_JOB3
T01_JOB4
    T01_JOB4a
    T01_JOB4b
    T01_JOB4c
T01_JOB5
    T01_JOB5a
    T01_JOB5b""".splitlines()  # pseudo file-content

# get topboxes
topboxes = []
for line in batchjobs:
    if line.startswith('T0'):
        topboxes.append(line)

# get dict for sub-jobs
topboxesjobs = defaultdict(list)  # directly append to an empty list
job_key = None
for line in batchjobs:
    if line in topboxes:
        job_key = line  # new topbox found, this is the current key
    if line.startswith(' ') and job_key:
        # append sub-jobs in the dict with the current key
        # use strip() to get rid of indentation whitespace
        topboxesjobs[job_key].append(line.strip())

# pretty print the defaultdict
pprint(dict(topboxesjobs))
which hopefully gives you your desired output:
{'T01_JOB1': ['T01_JOB1a', 'T01_JOB1b', 'T01_JOB1c'],
 'T01_JOB2': ['T01_JOB2a', 'T01_JOB2b', 'T01_JOB2c', 'T01_JOB2d', 'T01_JOB2e'],
 'T01_JOB4': ['T01_JOB4a', 'T01_JOB4b', 'T01_JOB4c'],
 'T01_JOB5': ['T01_JOB5a', 'T01_JOB5b']}
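Note that T01_JOB3 is missing from this output: it has no indented lines, so the defaultdict never creates an entry for it. If you also want childless top-level jobs, and your lines are bytes with a trailing newline as in your question, something along these lines might work. This is only a sketch: the function name group_jobs and the sample list are made up for illustration, and it assumes that any line starting with whitespace belongs to the most recent unindented line.

```python
def group_jobs(lines):
    """Map each unindented line to the list of indented lines that follow it."""
    groups = {}
    current = None  # most recent top-level job seen so far
    for raw in lines:
        # accept bytes (file opened in 'rb' mode) as well as str
        line = raw.decode() if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if not line[0].isspace():
            current = line
            groups[current] = []  # entry exists even if no sub-jobs follow
        elif current is not None:
            groups[current].append(line.strip())
    return groups


# hypothetical sample, shaped like lines read from a file in binary mode
sample = [
    b"T01_JOB3\n",
    b"T01_JOB4\n",
    b" T01_JOB4a\n",
    b" T01_JOB5\n",
    b"  T01_JOB5a\n",
]
print(group_jobs(sample))
# {'T01_JOB3': [], 'T01_JOB4': ['T01_JOB4a', 'T01_JOB5', 'T01_JOB5a']}
```

Because an indented T01_JOB5 is treated like any other indented line, it ends up inside the T01_JOB4 group, which matches the "job5 is within job4 tree" case from the question.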