Concatenate multiple strings and dictionary values

Question:

I have a dictionary and an input file containing a list of sequences. Each dictionary key is an amino acid, and its value is the vector for that amino acid.

I am trying to get an output like this:

MNTFSQVWVFSDTPSRLPELMNGAQALANQ:000000000010000000000000000000010000000000000000000000001000000010000000000000000000000000000001000000000000000001000000000000000000000001000000000000000000001000000000000000000100000010000000000000000000000000000001000000100000000000000000000000000000000010000000000000001000000000000000000000010000000000000000001000000000000010000000000000000000000010000000000100000000000000000000000010000000000000000000001000000000000000000001000000000000010000000000000010000000000000000000000000000000010000001000000000000000000000000000100000000000100000000000000000000000000000010000000000000000000001000000 
NTFSQVWVFSDTPSRLPELMNGAQALANQI:000000000001000000000000000000000000100000001000000000000000000000000000000100000000000000000100000000000000000000000100000000000000000000100000000000000000010000001000000000000000000000000000000100000010000000000000000000000000000000001000000000000000100000000000000000000001000000000000000000100000000000001000000000000000000000001000000000010000000000000000000000001000000000000000000000100000000000000000000100000000000001000000000000001000000000000000000000000000000001000000100000000000000000000000000010000000000010000000000000000000000000000001000000000000000000000100000000000001000000000000
TFSQVWVFSDTPSRLPELMNGAQALANQIN:000000000000000010000000100000000000000000000000000000010000000000000000010000000000000000000000010000000000000000000010000000000000000001000000100000000000000000000000000000010000001000000000000000000000000000000000100000000000000010000000000000000000000100000000000000000010000000000000100000000000000000000000100000000001000000000000000000000000100000000000000000000010000000000000000000010000000000000100000000000000100000000000000000000000000000000100000010000000000000000000000000001000000000001000000000000000000000000000000100000000000000000000010000000000000100000000000000000000000100000000

This is the code that I have so far.
I have written a loop that reads every sequence from the file. For each sequence, I am trying to
join the vectors of its amino acids into one string and output it together with the original sequence.

vecAa = {
"A":"10000000000000000000", 
"C":"01000000000000000000", 
"D":"00100000000000000000", 
"E":"00010000000000000000", 
"F":"00001000000000000000",
"G":"00000100000000000000", 
"H":"00000010000000000000", 
"I":"00000001000000000000", 
"L":"00000000100000000000",
"K":"00000000010000000000",
"M":"00000000001000000000",
"N":"00000000000100000000",
"P":"00000000000010000000",
"Q":"00000000000001000000",
"R":"00000000000000100000", 
"S":"00000000000000010000",
"T":"00000000000000001000",
"V":"00000000000000000100",
"W":"00000000000000000010",
"Y":"00000000000000000001",
 }

with open("/home/example.txt", "r") as f:
    for line in f:
        x = line
        print(x)
        out = ([vecAa[value] for value in x ])

However, I am getting the following error:

Traceback (most recent call last):
  File "vector.py", line 28, in <module>
    out = ([vecAa[value] for value in x ])
  File "vector.py", line 28, in <listcomp>
    out = ([vecAa[value] for value in x ])
KeyError: '\n'

How can I resolve this?

Asked By: Vykov

||

Answers:

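Each line read from the file ends with a newline character ('\n'), which is not a key in the dictionary. You can remove the '\n' from the line: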

x = line.strip()

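or read the lines without their trailing '\n' characters (note that f.readlines() keeps them, so split the file contents instead):

for line in f.read().splitlines():
    ...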
Answered By: Yevhen Kuzmovych
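
For reference, here is a minimal sketch of how the stripped line could be combined with its vectors to produce the "SEQUENCE:vector" output the question asks for. It is not part of the answer above. The names AMINO_ACIDS, vec_aa, encode_line and encode_lines are hypothetical, the dictionary is rebuilt programmatically in the same key order as the question's vecAa, and skipping blank lines is an assumption about the input file.

```python
# Amino acid letters in the same order as the keys of vecAa in the question.
AMINO_ACIDS = "ACDEFGHILKMNPQRSTVWY"

# Rebuild the one-hot table: the i-th letter gets a '1' at position i.
vec_aa = {
    aa: "0" * i + "1" + "0" * (len(AMINO_ACIDS) - i - 1)
    for i, aa in enumerate(AMINO_ACIDS)
}


def encode_line(line):
    """Return 'SEQ:vector' for one line, or None for a blank line."""
    seq = line.strip()  # drops the trailing '\n' that caused the KeyError
    if not seq:
        return None  # assumption: blank lines are skipped
    # Join the 20-character vector of every amino acid into one string.
    return seq + ":" + "".join(vec_aa[aa] for aa in seq)


def encode_lines(lines):
    """Encode every non-blank line from an iterable of lines (e.g. a file)."""
    return [enc for enc in map(encode_line, lines) if enc is not None]


# Usage with the file from the question:
#     with open("/home/example.txt", "r") as f:
#         for row in encode_lines(f):
#             print(row)
```

A letter that is not in the table (for example a lowercase character or 'X') would still raise a KeyError here, which may be the desired behaviour for catching malformed input.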