How to print a line containing a certain string along with the subsequent lines until a character is found

Question:

Sorry if the title is poorly worded. I have a large file full of data subsets, each with a unique identifier. I want to find the first line containing a given identifier and print that line along with every following line until the next data subset begins (that line will start with "<"). The data is structured as shown below:

<ID1|ID1_x
AAA
BBB
CCC
<ID2|ID2_x
DDD
EEE
FFF
<ID3|ID3_x
...

I would like to print:

<(ID2)
DDD
EEE
FFF

So far I have:

with open('file.txt') as f:
    for line in f:
        if 'ID2' in line:
            print(line)
            ...

Asked By: chrisphils26


Answers:

Try the code below:

found_id = False
with open('file.txt') as f:
    for line in f:
        if '<ID' in line:
            if '<ID2' in line:
                # the identifier is the text between '<' and the first '|'
                id_line_split = line.split('|')
                id_line = id_line_split[0][1:]
                print('<(' + id_line + ')')
                found_id = True
            else:
                found_id = False
        elif found_id:
            # remove the trailing line feed and carriage return
            print(line.rstrip('\r\n'))

Running the code above on my system with your file.txt produces this output:

<(ID2)
DDD
EEE
FFF
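Note that `'<ID2' in line` is a substring test, so it is also true for headers such as `<ID23|ID23_x` or `<ID20|...`. A minimal sketch that compares the identifier exactly and stops reading once the requested block ends might look like this. `extract_block` is a hypothetical helper name, and the sketch assumes every header has the form `<ID|...` as in the sample data; it returns the raw header line rather than the `<(ID2)` form printed above.

```python
from typing import Iterable, List


def extract_block(lines: Iterable[str], wanted_id: str) -> List[str]:
    """Return the header and data lines of the first record whose identifier equals wanted_id."""
    block: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # strip only the line ending, never letters
        if line.startswith("<"):
            if block:
                break  # the next record has started, so the wanted block is complete
            # The identifier is the text between '<' and the first '|'.
            if line[1:].split("|", 1)[0] == wanted_id:
                block.append(line)
        elif block:
            block.append(line)  # data line inside the wanted record
    return block


# Hypothetical sample: ID23 comes first to show it is not mistaken for ID2.
sample = ["<ID23|ID23_x\n", "XXX\n", "<ID2|ID2_x\n", "DDD\n", "EEE\n", "FFF\n", "<ID3|ID3_x\n", "GGG\n"]
for line in extract_block(sample, "ID2"):
    print(line)
```

To run it on the real file, pass the open file object from `with open('file.txt') as f:` as `lines`. Because the loop breaks as soon as the next header appears, the rest of a large file is never read.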

Second question (from comment)

To select both ID2 and ID23 (see the question in the comments on this answer), the program was changed as follows:

found_id = False
with open('file.txt') as f:
    for line in f:
        if '<ID' in line:
            if ('<ID2' in line) or ('<ID23' in line):
                # the identifier is the text between '<' and the first '|'
                id_line_split = line.split('|')
                id_line = id_line_split[0][1:]
                print('<(' + id_line + ')')
                found_id = True
            else:
                found_id = False
        elif found_id:
            # remove the trailing line feed and carriage return
            print(line.rstrip('\r\n'))
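Since `'<ID2' in line` is already true for `<ID23` headers, the `('<ID23' in line)` clause adds nothing, and the same test would also select `ID20`, `ID21`, and so on. A sketch that collects several records in one pass by looking the exact identifier up in a set is shown below. `extract_blocks` is a hypothetical name, and the `<ID|...` header layout is assumed; records come back in file order because dicts keep insertion order in Python 3.7+.

```python
from typing import Dict, Iterable, List, Optional, Set


def extract_blocks(lines: Iterable[str], wanted_ids: Set[str]) -> Dict[str, List[str]]:
    """Collect the data lines of the first record for each identifier in wanted_ids."""
    blocks: Dict[str, List[str]] = {}
    current: Optional[str] = None  # identifier of the record being collected
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("<"):
            record_id = line[1:].split("|", 1)[0]
            # Start collecting only on an exact identifier match,
            # and only for the first record carrying that identifier.
            if record_id in wanted_ids and record_id not in blocks:
                current = record_id
                blocks[current] = []
            else:
                current = None
        elif current is not None:
            blocks[current].append(line)
    return blocks


# Hypothetical sample: ID20 must not be picked up when asking for ID2.
sample = ["<ID2|ID2_x\n", "DDD\n", "<ID20|ID20_x\n", "YYY\n", "<ID23|ID23_x\n", "ZZZ\n"]
result = extract_blocks(sample, {"ID2", "ID23"})
for record_id, body in result.items():
    print("<(" + record_id + ")")
    for line in body:
        print(line)
```

Adding another identifier only means adding it to the set, instead of chaining more `or` clauses.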
Answered By: frankfalse