all combinations of DNA characters in a string of length 4

Question:

I am trying to generate a list of all possible DNA sequences of length four using the four characters A, T, C, G. There are 4^4 (256) different combinations in total. Repeats are included, so AAAA is allowed.
I have looked at itertools.combinations_with_replacement(iterable, r);
however, the output changes depending on the order of the input string, i.e.

itertools.combinations_with_replacement('ATCG', 4) # gives different results from...
itertools.combinations_with_replacement('ATGC', 4)

Because of this, I tried combining itertools.combinations_with_replacement(iterable, r) with itertools.permutations(), passing each output of itertools.permutations() to itertools.combinations_with_replacement(), as defined below:

import itertools

def allCombinations(s, strings):
    perms = list(itertools.permutations(s, 4))
    allCombos = []
    for perm in perms:
        combo = list(itertools.combinations_with_replacement(perm, 4))
        allCombos.append(combo)
    for combos in allCombos:
        for tup in combos:
            strings.append("".join(str(x) for x in tup))

However, running allCombinations('ATCG', li) with li = [] and then taking
list(set(li)) still only produces 136 unique sequences, rather than 256.

There must be an easy way to do this, maybe generating a power set and then filtering for length 4?

Asked By: izaak_pyzaak

||

Answers:

You could just try this:

l = []

s = 'ATCG'

for a in s:
    n1 = a
    for b in s:
        n2 = n1 + b
        for c in s:
            n3 = n2 + c
            for d in s:
                l.append(n3+d)
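
The nested loops above are fixed at length 4. As a minimal sketch of the same idea for an arbitrary length, the partial sequences can be extended one position at a time. The helper name all_sequences and its parameters are hypothetical, introduced only for illustration:

```python
def all_sequences(alphabet, length):
    # Start from the single empty sequence, then extend every partial
    # sequence by every character, once per position.
    sequences = ['']
    for _ in range(length):
        sequences = [prefix + ch for prefix in sequences for ch in alphabet]
    return sequences

l = all_sequences('ATCG', 4)
print(len(l))  # 256
```

Each pass multiplies the number of sequences by len(alphabet), so four passes over 'ATCG' give 4 * 4 * 4 * 4 = 256 strings, in the same order as the nested loops.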
Answered By: Pax Vobiscum

You can achieve this by using itertools.product. It gives the Cartesian product of the passed iterables:

import itertools

a = 'ACTG'

print(len(list(itertools.product(a, a, a, a))))
# or even better, as @ayhan commented:
# print(len(list(itertools.product(a, repeat=4))))
>> 256

But it returns tuples, so if you are looking for strings:

for output in itertools.product(a, repeat=4):
    print(''.join(output))

>> AAAA
   AAAC
   .
   .
   GGGG
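
To collect the strings in a list, and to see why combinations_with_replacement falls short, here is a small sketch. The variable names bases, sequences and unordered are just for illustration:

```python
import itertools

bases = 'ACTG'

# product treats each position independently, so order matters:
# 'AACG' and 'GCAA' are both generated.
sequences = [''.join(p) for p in itertools.product(bases, repeat=4)]

# combinations_with_replacement treats a selection as unordered and
# emits each multiset of characters only once, in input order.
unordered = [''.join(c) for c in itertools.combinations_with_replacement(bases, 4)]

print(len(sequences))               # 256
print(len(unordered))               # 35
print(sequences[0], sequences[-1])  # AAAA GGGG
```

That is why changing the order of the input string changes the output of combinations_with_replacement: the same 35 multisets are produced, just spelled in a different character order.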
Answered By: DeepSpace