Replace combinations of characters

Question:

I have a string 27AAGCB5913L2ZF. If any of A or J or K appear in the string then it I need to change them to all possible combinations of the three letters. If I pass the above string input to the program then the output should be like this

27AAGCB5913L2ZF
27AJGCB5913L2ZF
27AKGCB5913L2ZF
27JAGCB5913L2ZF
27KAGCB5913L2ZF
27KJGCB5913L2ZF
27JKGCB5913L2ZF
27JJGCB5913L2ZF
27KKGCB5913L2ZF

The letters may be present anywhere in string. If just a single letter is present then it must be replaced by A, J and K in turn. For example the output for the string 27ABGCB5913L2ZF should be as follows

27ABGCB5913L2ZF
27JBGCB5913L2ZF
27kBGCB5913L2ZF

I can search for a given character with code like this

while ( $string =~ /(B)/g ) {
    say $1, ' at ', pos $string;
}

How can I produce all possible strings if there could be any number of A, J, or K in any positions?

Asked By: Amol

||

Answers:

The collection of combinations that you want of your key letters A, J, and K is called the
Cartesian product.
In Python, you can use
itertools.product to generate them.

Firstly, we need to find the positions of all of the key letters in the input string. The easy way to do that uses the built-in
enumerate
function. Once I know those positions, and how many key letters the string contains, we can generate each item of the Cartesian product, replace the key letters, and print the new string.

In Python, strings are immutable (they can’t be changed), so I convert the string to a list of characters, replace the characters in the key positions, and then build a new string from the list using the
str.join
method.

The following code will work with both versions 2 and 3 of Python

Python

from itertools import product

def make_patterns(s):

    keyletters = 'AJK'

    # Convert input string into a list so we can easily substitute letters
    seq = list(s)

    # Find indices of key letters in seq
    indices = [ i for i, c in enumerate(seq) if c in keyletters ]

    # Generate key letter combinations & place them into the list
    for t in product(keyletters, repeat=len(indices)):
        for i, c in zip(indices, t):
            seq[i] = c
        print(''.join(seq))

# Test

data = (
    '1ABC2',
    '27AAGCB5913L2ZF',
    '3A4J',
    '5K67KA',
)

for s in data:
    print('nInput:', s)
    make_patterns(s)

output

Input: 1ABC2
1ABC2
1JBC2
1KBC2

Input: 27AAGCB5913L2ZF
27AAGCB5913L2ZF
27AJGCB5913L2ZF
27AKGCB5913L2ZF
27JAGCB5913L2ZF
27JJGCB5913L2ZF
27JKGCB5913L2ZF
27KAGCB5913L2ZF
27KJGCB5913L2ZF
27KKGCB5913L2ZF

Input: 3A4J
3A4A
3A4J
3A4K
3J4A
3J4J
3J4K
3K4A
3K4J
3K4K

Input: 5K67KA
5A67AA
5A67AJ
5A67AK
5A67JA
5A67JJ
5A67JK
5A67KA
5A67KJ
5A67KK
5J67AA
5J67AJ
5J67AK
5J67JA
5J67JJ
5J67JK
5J67KA
5J67KJ
5J67KK
5K67AA
5K67AJ
5K67AK
5K67JA
5K67JJ
5K67JK
5K67KA
5K67KJ
5K67KK

With a minor change, we can turn our function into a generator. That lets you loop over the output strings easily, or turn them into a list if you want.

Python

from itertools import product

def make_patterns(s):

    keyletters = 'AJK'

    # Convert input string into a list so we can easily substitute letters
    seq = list(s)

    # Find indices of key letters in seq
    indices = [i for i, c in enumerate(seq) if c in keyletters]

    # Generate key letter combinations & place them into the list
    for t in product(keyletters, repeat=len(indices)):
        for i, c in zip(indices, t):
            seq[i] = c
        yield ''.join(seq)

# Test

print(list(make_patterns('A12K')))

for s in make_patterns('3KJ4'):
    print(s)

output

['A12A', 'A12J', 'A12K', 'J12A', 'J12J', 'J12K', 'K12A', 'K12J', 'K12K']
3AA4
3AJ4
3AK4
3JA4
3JJ4
3JK4
3KA4
3KJ4
3KK4
Answered By: PM 2Ring

This can be done in Perl by using the glob operator. glob is intended for finding matching files, but if no generic wildcard characters (*, ? or [...]) are included in the pattern then it will simply return all possible matches, whether they exist as files or not

This Perl code uses a substitution to form a glob pattern by replacing all occurrences of A, J, or K with the multiple pattern {A,J,K}. Submitting the result to glob gives us the required output

use strict;
use warnings 'all';
use feature 'say';

for my $s ( qw/ 27AAGCB5913L2ZF 27KBGCB5913L2ZF / ) {

    (my $patt = $s) =~ s/[AJK]/{A,J,K}/g;

    say for glob $patt;
    say "";
}

output

27AAGCB5913L2ZF
27AJGCB5913L2ZF
27AKGCB5913L2ZF
27JAGCB5913L2ZF
27JJGCB5913L2ZF
27JKGCB5913L2ZF
27KAGCB5913L2ZF
27KJGCB5913L2ZF
27KKGCB5913L2ZF

27ABGCB5913L2ZF
27JBGCB5913L2ZF
27KBGCB5913L2ZF
Answered By: Borodin

Here is a recursive solution in Perl:

my $str = '27AAGCB5913L2ZF';
my @replace = qw (A J K);
print_string( $str);

sub print_string {
    my ( $str, $replace, $start) = @_;

    if (defined $replace) {
        substr( $str, $start, 1) = $replace;
    }
    else {
        $start = -1;
    }
    pos( $str) = $start +1;
    if ($str =~ /G.*?(A|J|K)/g) {
        my $cur_start = $-[-1];
        print_string( $str, $_, $cur_start) for @replace;
    }
    else {
        say $str;
    }
}

Output:

27AAGCB5913L2ZF
27AJGCB5913L2ZF
27AKGCB5913L2ZF
27JAGCB5913L2ZF
27JJGCB5913L2ZF
27JKGCB5913L2ZF
27KAGCB5913L2ZF
27KJGCB5913L2ZF
27KKGCB5913L2ZF
Answered By: Håkon Hægland

A short solution:

Start by splitting the string into tuples of characters that may appear at each position

>>> s = "27AAGCB5913L2ZF"
>>> p2 = [("A","J","K") if c in "AJK" else (c,) for c in s]
>>> p2
[('2',), ('7',), ('A', 'J', 'K'), ('A', 'J', 'K'), ('G',), ('C',), ('B',), ('5',), ('9',), ('1',), ('3',), ('L',), ('2',), ('Z',), ('F',)]

Then this function will assemble the list of tuples back into a string:

def assemble(t, s=''):
    if t:
        for c in t[0]:
            assemble(t[1:], s+c)
    else:
        print(s)

>>> assemble(p2)
27AAGCB5913L2ZF
27AJGCB5913L2ZF
27AKGCB5913L2ZF
27JAGCB5913L2ZF
27JJGCB5913L2ZF
27JKGCB5913L2ZF
27KAGCB5913L2ZF
27KJGCB5913L2ZF
27KKGCB5913L2ZF
Answered By: BoarGules
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.