Python: searching for strings in a txt file never finds them

Question:

I have been trying to debug my code for searching strings in two files, but I can’t understand why the strings are never found. I have been stuck on this for half a day. Could you please help me understand the error?

The logic is as follows. First, every row of "try_ID.txt" that does not contain both "Ca" and "Co" is skipped (that is what len(re.findall("Ca", row)) == 0 or len(re.findall("Co", row)) == 0 does). For each remaining row, the Ca ID is looked up in "try.txt" and the Co ID in "try_C.txt". If neither ID is found, the row should go into the first if branch of my code; if only one of them is found, it should go into the matching elif branch; if both are found, it should go into the else branch.
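The four cases described above can be written as a small function that maps the two lookup results to a branch. This is only a sketch for illustration: classify and its return labels are hypothetical names, not part of the original code.

```python
def classify(ca_found: bool, co_found: bool) -> str:
    """Map the two lookups to the branch described in the question.

    ca_found: the row's Ca ID was found in "try.txt"
    co_found: the row's Co ID was found in "try_C.txt"
    """
    if not ca_found and not co_found:
        return "neither"   # first if
    elif ca_found and not co_found:
        return "ca_only"   # first elif
    elif not ca_found and co_found:
        return "co_only"   # second elif
    else:
        return "both"      # else


# Print the full truth table of the decision.
for ca in (False, True):
    for co in (False, True):
        print(ca, co, classify(ca, co))
```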

The problem is that, with my code, all the rows go into the first if branch (neither ID found). I don’t know why.

My code:

import re


with open("try_ID.txt", 'r') as fin, \
        open("try_C.txt", 'r') as co_splice, \
        open("try.txt", 'r') as ca_splice:
    for row in fin:
        if len(re.findall("Ca", row)) == 0 or len(re.findall("Co", row)) == 0:
            pass 
        else: # problem starts from here
            name = str(row.split()[1]) + "_blast"
            if not row.split()[1] in ca_splice.read() and not row.split()[2] in co_splice.read():
                print(row.split()[0:2])
            elif row.split()[1] in ca_splice.read() and not row.split()[2] in co_splice.read():
                print(row.split()[1] + "Ca")
            elif not row.split()[1] in ca_splice.read() and row.split()[2] in co_splice.read():
                print(row.split()[2] + "Co")
            else: 
                ne_name = name + "recip"
                print(ne_name)

"try_ID.txt"

H21911        Ca29092.1t    A05340.1
H21912        Ca19588.1t    Co27353.1t    A05270.1
H21913        Ca19590.1t    Co14899.1t    A05260.1
H21914        Ca19592.1t    Co14897.1t    A05240.1
H21915    Co14877.1t    A05091.1
S25338  Ca12595.1t  Co27352.1t  A53970.1
S20778  Ca29091.1t  Co24326.1t  A61120.1
S26552  Ca20916.1t  Co14730.1t  A16155.1

"try_C.txt"

Co14730.1t;Co14730.2t
Co27352.1t;Co27352.2t;Co27352.3t;Co27352.4t;Co27352.5t
Co14732.1t;Co14732.2t
Co4217.1t;Co4217.2t
Co27353.1t;Co27353.2t
Co14733.1t;Co14733.2t

"try.txt"

Ca12595.1t;Ca12595.2t
Ca29091.1t;Ca29091.2t
Ca1440.1t;Ca1440.2t
Ca29092.1t;Ca29092.2t
Ca20916.1t;Ca20916.2t

The weird thing is that when I try a small piece of code like the one below, it does find the string.

row = "H20118        Ca12595.1t    Co18779.1t    A01010.1"
text_file = "try.txt"
with open(text_file, 'r') as fin:
    if row.split()[1] in fin.read():
        print(True)
    else:
        print(False)

I really don’t understand.

Asked By: zzz


Answers:

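Read each file only once, before the loop, and split each row only once. A file object's read() returns everything from the current position to the end of the file and leaves the position at the end, so any later read() on the same object returns an empty string. In your loop, the first ca_splice.read() and co_splice.read() calls consume the whole of "try.txt" and "try_C.txt". Once both files have been read, every later membership test is made against '', which is always False, so each not ... in ... test is True and every remaining row lands in the first if branch. Your small test works because it calls read() only once. Try to keep it simple.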
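The effect of calling read() twice on the same file object can be seen in a few lines. Here io.StringIO stands in for an open text file (an assumption made only so the example runs without the real "try.txt"); its read() and seek() behave the same way for this purpose.

```python
import io

# io.StringIO behaves like a file opened in text mode.
ca_splice = io.StringIO("Ca12595.1t;Ca12595.2t\nCa29091.1t;Ca29091.2t\n")

first = ca_splice.read()   # reads everything; the position is now at the end
second = ca_splice.read()  # nothing left to read, so this is ''
print(repr(second))
print("Ca12595.1t" in first, "Ca12595.1t" in second)

# Rewinding with seek(0) makes the content readable again,
# but reading once into a variable is simpler.
ca_splice.seek(0)
print("Ca12595.1t" in ca_splice.read())
```

The second print shows True for the first read and False for the second, which is exactly why the loop stops finding anything.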

with open("try_ID.txt", 'r') as fin, \
        open("try_C.txt", 'r') as co_splice, \
        open("try.txt", 'r') as ca_splice:
    # Read each lookup file exactly once; a second read() would return ''.
    co_text = co_splice.read()
    ca_text = ca_splice.read()
    for row in fin:
        # Keep only rows that contain both a Ca and a Co ID (the question's filter).
        if 'Ca' in row and 'Co' in row:
            zero, one, two, *_ = row.split()  # split each row only once
            name = one + "_blast"
            one_in_ca = one in ca_text
            two_in_co = two in co_text
            if not one_in_ca and not two_in_co:
                print(zero, one, two)
            elif one_in_ca and not two_in_co:
                print(one + "Ca")
            elif not one_in_ca and two_in_co:
                print(two + "Co")
            else:
                ne_name = name + "recip"
                print(ne_name)
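A possible variation on the code above, sketched under the assumption that every ID in "try.txt" and "try_C.txt" is separated by ';' as in the samples: parse the lookup files into sets once and test exact membership instead of searching the whole file text for a substring. load_ids and report are hypothetical helper names, and io.StringIO stands in for the real files so the sketch runs on the sample data from the question.

```python
import io

# Sample data copied from the question; io.StringIO stands in for the real files.
ID_TEXT = """H21911        Ca29092.1t    A05340.1
H21912        Ca19588.1t    Co27353.1t    A05270.1
H21913        Ca19590.1t    Co14899.1t    A05260.1
H21914        Ca19592.1t    Co14897.1t    A05240.1
H21915    Co14877.1t    A05091.1
S25338  Ca12595.1t  Co27352.1t  A53970.1
S20778  Ca29091.1t  Co24326.1t  A61120.1
S26552  Ca20916.1t  Co14730.1t  A16155.1
"""
CO_TEXT = """Co14730.1t;Co14730.2t
Co27352.1t;Co27352.2t;Co27352.3t;Co27352.4t;Co27352.5t
Co14732.1t;Co14732.2t
Co4217.1t;Co4217.2t
Co27353.1t;Co27353.2t
Co14733.1t;Co14733.2t
"""
CA_TEXT = """Ca12595.1t;Ca12595.2t
Ca29091.1t;Ca29091.2t
Ca1440.1t;Ca1440.2t
Ca29092.1t;Ca29092.2t
Ca20916.1t;Ca20916.2t
"""


def load_ids(handle):
    """Return the set of IDs in a file whose lines look like 'Co14730.1t;Co14730.2t'."""
    # Assumption: IDs on a line are separated by ';' as in the samples.
    ids = set()
    for line in handle:
        ids.update(part.strip() for part in line.split(";") if part.strip())
    return ids


def report(id_handle, ca_ids, co_ids):
    """Apply the question's four-way logic to each row using exact set membership."""
    results = []
    for row in id_handle:
        if "Ca" not in row or "Co" not in row:
            continue  # skip rows lacking a Ca or a Co ID, as in the question
        zero, one, two, *_ = row.split()
        one_in_ca = one in ca_ids
        two_in_co = two in co_ids
        if not one_in_ca and not two_in_co:
            results.append(" ".join([zero, one, two]))
        elif one_in_ca and not two_in_co:
            results.append(one + "Ca")
        elif not one_in_ca and two_in_co:
            results.append(two + "Co")
        else:
            results.append(one + "_blastrecip")
    return results


ca_ids = load_ids(io.StringIO(CA_TEXT))
co_ids = load_ids(io.StringIO(CO_TEXT))
results = report(io.StringIO(ID_TEXT), ca_ids, co_ids)
for line in results:
    print(line)
```

To use it on the real files, pass open file objects instead of the io.StringIO stand-ins, e.g. with open("try.txt") as f: ca_ids = load_ids(f). Exact set lookups also avoid accidental matches where a shorter ID happens to occur inside a longer one in the file text.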
Answered By: wwii