startswith TypeError in function
Question:
Here is the code:
def readFasta(filename):
    """ Reads a sequence in Fasta format """
    fp = open(filename, 'rb')
    header = ""
    seq = ""
    while True:
        line = fp.readline()
        if (line == ""):
            break
        if (line.startswith('>')):
            header = line[1:].strip()
        else:
            seq = fp.read().replace('\n','')
            seq = seq.replace('\r','') # for windows
            break
    fp.close()
    return (header, seq)

FASTAsequence = readFasta("MusChr01.fa")
The error I’m getting is:
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
But the first argument to startswith is supposed to be a string according to the docs… so what is going on?
I’m assuming I’m using at least Python 3 since I’m using the latest version of LiClipse.
Answers:
It’s because you’re opening the file in bytes mode ('rb'), so line is a bytes object and you’re calling bytes.startswith(), not str.startswith().
You need to use line.startswith(b'>'), where the b prefix makes '>' a bytes literal.
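Here is a minimal sketch of what the function might look like if it stays in binary mode. The name read_fasta_bytes is mine, not from the question, and it assumes the file holds a single FASTA record. Note that the empty-string check and the replace() arguments need to be bytes as well, and that this version keeps the first sequence line, which the original loop appears to discard:

```python
def read_fasta_bytes(filename):
    """Read a single-record FASTA file in binary mode.

    Hypothetical rewrite of the question's readFasta; returns
    (header, seq) as bytes objects, not str.
    """
    header = b""
    seq = b""
    with open(filename, "rb") as fp:
        while True:
            line = fp.readline()
            if line == b"":            # EOF in binary mode is empty bytes, not ""
                break
            if line.startswith(b">"):  # bytes.startswith() needs a bytes argument
                header = line[1:].strip()
            else:
                # Keep the line just read, then append the rest of the file.
                seq = (line + fp.read()).replace(b"\n", b"").replace(b"\r", b"")
                break
    return header, seq
```

Call .decode() on the results if you want str values back.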
I don’t have your file to test with, but try opening it in text mode with an explicit UTF-8 encoding in the open() call:
fp = open(filename, 'r', encoding='utf-8')
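For comparison, a sketch of the whole function in text mode, where every line is a str and the original string arguments work unchanged. The name read_fasta_text and the encoding parameter are my own additions, and it assumes a single-record file that really is UTF-8 (or plain ASCII):

```python
def read_fasta_text(filename, encoding="utf-8"):
    """Read a single-record FASTA file in text mode.

    Hypothetical rewrite of the question's readFasta; returns
    (header, seq) as str objects.
    """
    header = ""
    seq = ""
    with open(filename, "r", encoding=encoding) as fp:
        while True:
            line = fp.readline()
            if line == "":             # EOF in text mode is an empty str
                break
            if line.startswith(">"):   # str.startswith() with a str argument
                header = line[1:].strip()
            else:
                # Keep the line just read, then append the rest of the file.
                # Text mode already turns "\r\n" into "\n" by default, so the
                # "\r" replace is only a safeguard.
                seq = (line + fp.read()).replace("\n", "").replace("\r", "")
                break
    return header, seq
```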
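If you need to keep opening the file in binary mode, replacing each 'STR' with bytes('STR'.encode('utf-8')) works for me. The bytes() wrapper is redundant, since str.encode() already returns bytes, so 'STR'.encode('utf-8') is enough.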
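A quick way to check this for yourself; the sample line below is made up for illustration:

```python
line = b">chr1 test\n"          # made-up example of a line read in 'rb' mode

marker = ">".encode("utf-8")    # str.encode() returns a bytes object
assert marker == b">"           # same value as the bytes literal
assert bytes(marker) == marker  # wrapping in bytes() changes nothing

print(line.startswith(marker))  # accepted, because both sides are bytes

try:
    line.startswith(">")        # str argument against a bytes object
except TypeError as exc:
    print("TypeError:", exc)    # the error reported in the question
```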