Check if an element from a list exists in a line of a txt file (not working) Python
Question:
I have tried many ways but I do not get any output at all. I have a list containing different types of strings:
lst=['ATCGG','GTAACGCT','AATCGAT',...]
and I have a text file as below:
>seq1
NNNGTAACGCTNNN
>seq2
NNNNNAATCGATNNNN
>seq3
NNNNNNNN
.
.
.
I want to print lines of the text file if any item in the list exists in the line. Based on the examples above, the desired output should be:
NNNGTAACGCTNNN
NNNNNAATCGATNNNN
I used the code below, but nothing is printed:
main_file = open('test_file.txt', 'r')
contn = main_file.read()
#print(contn)
for dna in contn:
    if any(i in dna for i in lst):
        print(dna)
Answers:
An explicit-loop version, which may be easier to follow:
def find_lines_containing_any(filename, wanted_list):
    with open(filename, 'r') as dna_file:
        for dna_line in dna_file:
            for wanted in wanted_list:
                if wanted in dna_line:
                    yield dna_line
                    break  # yield each line once, even if several items match
for dna in find_lines_containing_any('test_file.txt', lst):
    print(dna, end='')  # the line already ends with a newline
You need readlines() instead of read(). read() returns the whole file as a single string, so when you iterate over it, dna is just one character at a time.
contn = main_file.read()
should be
contn = main_file.readlines()
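To see the difference concretely, here is a minimal sketch. It uses io.StringIO as an in-memory stand-in for the file, and the sample contents are an assumption modeled on the question, not the real test_file.txt.

```python
import io

# Hypothetical in-memory stand-in for test_file.txt, so the sketch runs anywhere.
sample = ">seq1\nNNNGTAACGCTNNN\n>seq2\nNNNNNAATCGATNNNN\n"

# read() returns one str; iterating over a str yields single characters.
chars = list(io.StringIO(sample).read())

# readlines() returns a list of lines, each still ending in "\n".
lines = io.StringIO(sample).readlines()

print(chars[:3])  # ['>', 's', 'e']
print(lines[:2])  # ['>seq1\n', 'NNNGTAACGCTNNN\n']
```

Since no single character can contain a multi-character item from lst, the any(...) test in the question is never true, which is why nothing is printed.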
With for dna in contn, you are iterating over individual characters, because read() returns a str object. You can simply iterate over the file object itself:
main_file = open('test_file.txt', 'r')
for line in main_file:
    if any(i in line for i in lst):
        print(line)
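A possible refinement, sketched under a few assumptions: each line read from a file keeps its trailing newline, so print(line) leaves a blank line after every match. The helper name lines_containing_any below is hypothetical, and io.StringIO with three sample patterns stands in for the real file and the full lst.

```python
import io

def lines_containing_any(lines, patterns):
    """Yield each line (without its trailing newline) that contains any pattern."""
    for line in lines:
        line = line.rstrip("\n")
        if any(p in line for p in patterns):
            yield line

# Hypothetical sample data mirroring the question; an open file works the same way.
lst = ['ATCGG', 'GTAACGCT', 'AATCGAT']
sample = io.StringIO(">seq1\nNNNGTAACGCTNNN\n>seq2\nNNNNNAATCGATNNNN\n>seq3\nNNNNNNNN\n")

result = list(lines_containing_any(sample, lst))
for line in result:
    print(line)
# NNNGTAACGCTNNN
# NNNNNAATCGATNNNN
```

With a real file, the same function could be called as lines_containing_any(f, lst) inside a with open('test_file.txt') as f: block, which also closes the file when the loop finishes.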