Python – DNA Transcription using a for loop

Question:

dictionary = {1: ['A', 'U'],
              2: ['C', 'G'],
              3: ['G', 'C'],
              4: ['T', 'A']}

def transcribe(S):
    """Converts a single-character c from DNA
       nucleotide to its complementary RNA nucleotide
    """
    if S == '':
        return ''
    for i in dictionary:
        S = S.replace(dictionary[i][0], dictionary[i][1])
    return S

Above is my code so far. Below are the tests I am running.


print("Function 6 Tests")
print( "transcribe('ACGTTGCA')             should be  'UGCAACGU' :",  transcribe('ACGTTGCA') )
print( "transcribe('ACG TGCA')             should be  'UGCACGU' :",  transcribe('ACG TGCA') )  # Note that the space disappears
print( "transcribe('GATTACA')              should be  'CUAAUGU' :",  transcribe('GATTACA') )
print( "transcribe('cs5')                  should be  ''  :",  transcribe('cs5') ) # Note that other characters disappear
print( "transcribe('')                     should be  '' :",  transcribe('') )   # Empty strings!
Function 6 Tests
transcribe('ACGTTGCA')             should be  'UGCAACGU' : UCCAACCU
transcribe('ACG TGCA')             should be  'UGCACGU' : UCC ACCU
transcribe('GATTACA')              should be  'CUAAUGU' : CUAAUCU
transcribe('cs5')                  should be  ''  : cs5
transcribe('')                     should be  '' : 

Above are the results I am getting.

1) I don't understand why C is not converted to G even though I listed it in the dictionary.
2) Is there a way to modify the first if statement so that anything other than A, T, C, or G results in '' being printed?
3) Also, how do I get rid of the space between ACG and TGCA?

Asked By: T.Mok


Answers:

Consider:

>>> a = "hello"
>>> a = a.replace('l', 'x')
>>> a
'hexxo'
>>> a = a.replace('x', 'l')
>>> a
'hello'
>>>

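You have an entry that converts C to G, but a later entry then converts G back to C. Each replace call scans the whole string, so the Gs that were just produced from Cs are turned back into Cs along with the original Gs.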
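
To see this on the question's own input, here is a small trace of the original loop that records the string after each replace call. It reuses the question's dictionary; the helper name trace_replacements is made up for this illustration.

```python
# The question's original mapping: each value is [from_char, to_char].
dictionary = {1: ['A', 'U'],
              2: ['C', 'G'],
              3: ['G', 'C'],
              4: ['T', 'A']}

def trace_replacements(s):
    """Hypothetical helper: apply the replacements in order, recording each step."""
    steps = [s]
    for key in dictionary:
        old, new = dictionary[key]
        s = s.replace(old, new)
        steps.append(s)
    return steps

for step in trace_replacements('ACGTTGCA'):
    print(step)
# ACGTTGCA
# UCGTTGCU   (A -> U)
# UGGTTGGU   (C -> G)
# UCCTTCCU   (G -> C: the Gs created in the previous step are turned back)
# UCCAACCU   (T -> A)
```

The last line is the wrong result reported in the question.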

Try a dictionary that maps each character directly to its replacement:

d = {'A': 'U', 'C': 'G', 'G': 'C', 'T': 'A'}

Now you can do something like the following, where you only convert each character once.

>>> d = {'A': 'U', 'C': 'G', 'G': 'C', 'T': 'A'}
>>> d
{'A': 'U', 'C': 'G', 'T': 'A', 'G': 'C'}
>>> ''.join(d[ch] for ch in "ACTG")
'UGAC'
>>>

This assumes that the string you're working on contains only A, C, G, or T. Any other character would raise a KeyError, because d[ch] has no entry for it.

Answered By: Chris

replace replaces every instance of a character in the string. For ACGTTGCA, once every C has been replaced by G, the later G-to-C replacement turns those newly created Gs back into Cs, together with the original Gs.

Turn the dictionary into a mapping from each letter in S to its replacement letter, then look up each letter once in a loop:

# build a dictionary that maps the first element of each list to the second
# (dictionary is the original mapping from the question)
d = {k: v for k, v in dictionary.values()}

def transcribe(S):
    """
       Converts a DNA string S to its complementary RNA string,
       dropping any character that is not A, C, G, or T
    """
    # look up each character of S; characters missing from d become ''
    return ''.join([d.get(k, '') for k in S])
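
As a quick check, the lookup approach can be run against the five test cases from the question. This is a minimal sketch that defines the mapping directly (under the illustrative name RNA_COMPLEMENT) instead of deriving it from the original dictionary, so that it runs on its own.

```python
# Illustrative name; maps each DNA nucleotide to its RNA complement.
RNA_COMPLEMENT = {'A': 'U', 'C': 'G', 'G': 'C', 'T': 'A'}

def transcribe(S):
    """Return the RNA complement of S, dropping any character that is not A, C, G, or T."""
    return ''.join(RNA_COMPLEMENT.get(ch, '') for ch in S)

# Inputs and expected values are the ones listed in the question.
cases = {
    'ACGTTGCA': 'UGCAACGU',
    'ACG TGCA': 'UGCACGU',   # the space disappears
    'GATTACA': 'CUAAUGU',
    'cs5': '',               # other characters disappear
    '': '',                  # empty string
}
for dna, expected in cases.items():
    assert transcribe(dna) == expected, (dna, transcribe(dna))
```

Because get falls back to '' for unknown characters, no special if statement is needed for spaces, other characters, or the empty string.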

Maybe it's worth considering an intermediate alphabet, so that no replacement can overwrite the result of an earlier one:

from typing import Dict, Final

DNA_2_RNA: Final[Dict[str, str]] = {
    "A": "1",
    "C": "2",
    "G": "3",
    "T": "4"
}

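def transcribe(dna: str) -> str:  # returns the RNA string

    temp = ""

    # first pass: DNA letters -> placeholder digits; anything else is dropped
    for nucleotide in dna:
        temp += DNA_2_RNA.get(nucleotide, "")

    # second pass: placeholder digits -> RNA letters
    # (no digit is also a replacement target, so nothing is overwritten)
    return (
        temp
            .replace("1", "U")
            .replace("2", "G")
            .replace("3", "C")
            .replace("4", "A")
    )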
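
A different technique from the one above: str.translate applies a whole character mapping in a single pass, so no intermediate alphabet is needed. What follows is only a sketch, assuming that characters outside A, C, G, T should be dropped as in the question's tests; the names RNA_TABLE and transcribe_translate are illustrative.

```python
# Translation table built once: A->U, C->G, G->C, T->A.
RNA_TABLE = str.maketrans("ACGT", "UGCA")

def transcribe_translate(dna: str) -> str:
    """Illustrative variant: filter out non-nucleotides, then map all at once."""
    # Keep only the four DNA letters (spaces, digits, lowercase etc. are dropped).
    kept = "".join(ch for ch in dna if ch in "ACGT")
    # translate looks up every character in the table in one pass,
    # so a G produced from a C is never converted a second time.
    return kept.translate(RNA_TABLE)

print(transcribe_translate("ACGTTGCA"))  # UGCAACGU
```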

Answered By: OrenIshShalom