Finding unique combinations of two sequences (strings) in Python

Question:

Hello, I have a question.

I have two lists like this:

list1 = ['G','C','A','T','C','A']

list2 = ['G','A','*','T','AC','A']

I want to find, in Python, all possible combinations of these two sequences, taking the element from either list at each position.

For example, the result would be:

(GCATCA), (GA*TACA), (GAATCA), (GA*TCA), (GC*TCA), (GC*TACA), (GCATACA), (GAATACA)

Thank you for your answers.

Asked By: Théo Durand

||

Answers:

First, let's get the pair of elements at each position:

pairs = zip(list1, list2)

Now let's build a combination for every choice at every position.
To do so, we will use itertools.product.

from itertools import product

combinations = product(*zip(list1, list2))

Let's get rid of all duplicate combinations.

unique_combinations = set(combinations)

Finally, let's join each combination into a single string:

all_combinations = [''.join(bases) for bases in unique_combinations]

Here all_combinations will be the desired output (a set is unordered, so the order may vary between runs):

['GC*TCA', 'GC*TACA', 'GCATCA', 'GAATACA', 'GCATACA', 'GA*TACA', 'GAATCA', 'GA*TCA']

Of course, you can get it in a single line:

[''.join(bases) for bases in set(product(*zip(list1, list2)))]
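For convenience, the steps above can be put together into one runnable script. This is only a sketch of the same approach; the final sorted() call is an addition that is not part of the answer, used here just to make the printed order reproducible.

```python
from itertools import product

list1 = ['G', 'C', 'A', 'T', 'C', 'A']
list2 = ['G', 'A', '*', 'T', 'AC', 'A']

# Pair up the elements found at the same position in both lists.
pairs = list(zip(list1, list2))

# Pick one element from every pair, in every possible way (2**6 = 64 tuples).
combinations = product(*pairs)

# A set keeps only one copy of each tuple.
unique_combinations = set(combinations)

# Join each tuple into a string; sorting is an addition for a stable order.
all_combinations = sorted(''.join(bases) for bases in unique_combinations)

print(all_combinations)
```

Without the sorted() call, the strings are the same but may come out in a different order on each run.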

Answered By: Jorge Luis

This is like Jorge Luis’s answer, but it also deduplicates earlier, at each position. That way we produce only the 8 unique combinations directly, instead of producing 64 and throwing away 56 of them.

from itertools import product

list1 = ['G','C','A','T','C','A']
list2 = ['G','A','*','T','AC','A']

result = {*map(''.join, product(*map(set, zip(list1, list2))))}

print(result)

Output:

{'GAATCA', 'GA*TCA', 'GA*TACA', 'GC*TACA',
 'GCATACA', 'GC*TCA', 'GCATCA', 'GAATACA'}
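The 64 versus 8 figures can be checked with a short count. This is a sketch, assuming Python 3.8+ for math.prod; the variable names are illustrative and not taken from the answer.

```python
from itertools import product
from math import prod

list1 = ['G', 'C', 'A', 'T', 'C', 'A']
list2 = ['G', 'A', '*', 'T', 'AC', 'A']

pairs = list(zip(list1, list2))

# Without per-position deduplication: 2 choices at each of the 6 positions.
total_without_dedup = sum(1 for _ in product(*pairs))

# With per-position deduplication: multiply the number of distinct
# elements at each position.
distinct_per_position = [len(set(pair)) for pair in pairs]
total_with_dedup = prod(distinct_per_position)

print(total_without_dedup)    # number of tuples before deduplication
print(distinct_per_position)  # distinct choices at each position
print(total_with_dedup)       # number of tuples after deduplication
```

Only three positions differ between the two lists, so the deduplicated product has 2 * 2 * 2 = 8 elements, while the plain product has 2**6 = 64.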

Alternatively, without itertools:

list1 = ['G','C','A','T','C','A']
list2 = ['G','A','*','T','AC','A']

result = {''}
for pair in zip(list1, list2):
    result = {
        r + p
        for p in set(pair)
        for r in result
    }

print(result)
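If a reproducible order matters, one possible variation of the loop above uses dict.fromkeys instead of set to deduplicate each pair while keeping its order. The function name unique_combinations is hypothetical and not part of the answer. Note that this variant only deduplicates per position, so it could return repeated strings for inputs where different choices concatenate to the same text (that does not happen with the lists in the question).

```python
def unique_combinations(list1, list2):
    """Hypothetical helper: build the combinations in a deterministic order."""
    results = ['']
    for pair in zip(list1, list2):
        # dict.fromkeys removes duplicates but keeps first-seen order.
        choices = list(dict.fromkeys(pair))
        # Extend every partial string with every choice at this position.
        results = [r + p for r in results for p in choices]
    return results


list1 = ['G', 'C', 'A', 'T', 'C', 'A']
list2 = ['G', 'A', '*', 'T', 'AC', 'A']

ordered_result = unique_combinations(list1, list2)
print(ordered_result)
```

The first string returned is always the one built entirely from list1, and the last is the one that takes the list2 element wherever the lists differ.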


Answered By: Kelly Bundy