Python: searching for a string in a txt file never finds it
Question:
I have been trying to debug my code, which searches for strings in two files, but I can’t understand why the strings are never found. I have been stuck on this for half a day. Could you help me understand the error, please?
The logic is as follows. First, lines in "try_ID.txt" are filtered out when len(re.findall("Ca", row)) == 0 or len(re.findall("Co", row)) == 0. For the remaining lines: if neither the Ca ID nor the Co ID from "try_ID.txt" appears in its file ("try.txt" for Ca, "try_C.txt" for Co), we go into the first if branch in my code; if only one of them is found, we go into one of the elif branches; if both are found, we go into the else branch.
The problem is that, with my code, every item goes into the first if branch (both not found). I don’t know why.
My code:
import re
with open("try_ID.txt", 'r') as fin, \
        open("try_C.txt", 'r') as co_splice, \
        open("try.txt", 'r') as ca_splice:
    for row in fin:
        if len(re.findall("Ca", row)) == 0 or len(re.findall("Co", row)) == 0:
            pass
        else:  # problem starts from here
            name = str(row.split()[1]) + "_blast"
            if not row.split()[1] in ca_splice.read() and not row.split()[2] in co_splice.read():
                print(row.split()[0:2])
            elif row.split()[1] in ca_splice.read() and not row.split()[2] in co_splice.read():
                print(row.split()[1] + "Ca")
            elif not row.split()[1] in ca_splice.read() and row.split()[2] in co_splice.read():
                print(row.split()[2] + "Co")
            else:
                ne_name = name + "recip"
                print(ne_name)
"try_ID.txt"
H21911 Ca29092.1t A05340.1
H21912 Ca19588.1t Co27353.1t A05270.1
H21913 Ca19590.1t Co14899.1t A05260.1
H21914 Ca19592.1t Co14897.1t A05240.1
H21915 Co14877.1t A05091.1
S25338 Ca12595.1t Co27352.1t A53970.1
S20778 Ca29091.1t Co24326.1t A61120.1
S26552 Ca20916.1t Co14730.1t A16155.1
"try_C.txt"
Co14730.1t;Co14730.2t
Co27352.1t;Co27352.2t;Co27352.3t;Co27352.4t;Co27352.5t
Co14732.1t;Co14732.2t
Co4217.1t;Co4217.2t
Co27353.1t;Co27353.2t
Co14733.1t;Co14733.2t
"try.txt"
Ca12595.1t;Ca12595.2t
Ca29091.1t;Ca29091.2t
Ca1440.1t;Ca1440.2t
Ca29092.1t;Ca29092.2t
Ca20916.1t;Ca20916.2t
The weird thing is that when I try a small piece of code like the one below, it does find the string.
row = "H20118 Ca12595.1t Co18779.1t A01010.1"
text_file = "try.txt"
with open(text_file, 'r') as fin:
    if row.split()[1] in fin.read():
        print(True)
    else:
        print(False)
I really don’t understand.
Answers:
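Calling read() on a file object consumes the whole file, so every later read() on the same object returns an empty string and the search fails. Your small test works because it calls read() only once. Read each file once before the loop, and split and search each row only once wherever possible. Try to keep it simple.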
with open("try_ID.txt", 'r') as fin, \
        open("try_C.txt", 'r') as co_splice, \
        open("try.txt", 'r') as ca_splice:
    # Read each lookup file exactly once, before the loop.
    co_text = co_splice.read()
    ca_text = ca_splice.read()
    for row in fin:
        # Keep only rows that contain both a Ca and a Co ID,
        # matching the filter in the question.
        if 'Ca' in row and 'Co' in row:
            zero, one, two, *_ = row.split()
            name = one + "_blast"
            one_in_ca = one in ca_text
            two_in_co = two in co_text
            if not one_in_ca and not two_in_co:
                print(zero, one, two)
            elif one_in_ca and not two_in_co:
                print(one + "Ca")
            elif not one_in_ca and two_in_co:
                print(two + "Co")
            else:
                ne_name = name + "recip"
                print(ne_name)
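To see why the original loop fails, here is a minimal sketch of what repeated read() calls do. It uses io.StringIO as a stand-in for an open file, which is an assumption made so the snippet runs without the files from the question; a file object returned by open() behaves the same way for read() and seek(). The variable names are only illustrative.

```python
import io

# io.StringIO stands in for an open text file; the contents are a
# shortened copy of "try.txt" from the question.
handle = io.StringIO("Ca12595.1t;Ca12595.2t\nCa29091.1t;Ca29091.2t\n")

first = handle.read()    # reads everything, leaving the position at the end
second = handle.read()   # nothing is left to read, so this is ""

found_first = "Ca12595.1t" in first
found_second = "Ca12595.1t" in second

handle.seek(0)           # rewind to the start
third = handle.read()    # the full contents are available again
found_third = "Ca12595.1t" in third

print(found_first, found_second, found_third)  # True False True
```

Rewinding with seek(0) before every search would also work, but reading each file once into a variable, as in the code above, avoids re-reading the files for every row.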