Get all possible combinations of replacing a substring

Question:

Python combinations of text given substring and arbitrary replacement

I have a string:

"foo bar foo foo"

Given a substring of that string, for example "foo", I want every combination obtained by either keeping or replacing each occurrence of "foo" with some arbitrary string (which may have a different length).

For example:

>>> combinations("foo bar foo foo", "foo", "fooer")
{
    "foo bar foo foo",
    "fooer bar foo foo",
    "foo bar fooer foo",
    "foo bar foo fooer",
    "fooer bar fooer foo",
    "fooer bar foo fooer",
    "fooer bar fooer fooer",
    "foo bar fooer fooer",
}

I have searched already and I can’t find anything that could help me.

I know I need itertools.product for the combinations, but I get stuck when the substring appears more than once in the string and the substring and its replacement have different lengths.

So far, I can get the indices where each replacement has to start:

def indices_substring(a_str, sub):
    """https://stackoverflow.com/a/4665027/9288003"""
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches
Asked By: Guillem

||

Answers:

You can follow this recipe:

  1. Split the string into a list of words.
  2. Find the indices of the words you want to replace.
  3. Create the power-set of those indices.
  4. Iterate over the power-set and replace the words in the indices of each set.

1. Split the string into a list of words

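Easy enough for any Python user (note that this matches whole words only, and that split() with no arguments collapses runs of whitespace):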

words = "foo bar foo foo".split()

If the string is not necessarily space-separated, you can split with a regex instead. The capturing group keeps the matched "foo" pieces in the resulting list:

import re

words = re.split("(foo)", "foobarfoofoo")
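To see what the regex split actually returns, here is a small sketch. The re.escape call is an addition that is not in the original answer; it is assumed to be wanted whenever the substring might contain regex metacharacters such as "." or "+".

```python
import re

to_find = "foo"  # the substring from the question

# re.escape is an assumption added here: it makes the substring match literally.
pattern = f"({re.escape(to_find)})"

# The capturing group makes re.split keep the separators in the result.
words = re.split(pattern, "foobarfoofoo")
print(words)
# ['', 'foo', 'bar', 'foo', '', 'foo', '']
```

The empty strings are the (empty) text before, between, and after adjacent matches. They are harmless: joining the list with '' reproduces the original string exactly.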

2. Find the indices of the words you want to replace

This can be done with a pretty simple list-comprehension:

indices = [i for i, v in enumerate(words) if v == "foo"]

3. Create the power-set of those indices

The official itertools Recipes page has one for a power-set:

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

With that function, this step is a single line (the result is a lazy iterator, so it can only be consumed once):

power_set = powerset(indices)
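As an illustration, this is what the power-set looks like for the example string, where "foo" sits at positions 0, 2 and 3 of the word list. The sketch repeats the recipe so it runs on its own, and also shows that the returned iterator is exhausted after one pass.

```python
from itertools import chain, combinations

def powerset(iterable):
    # Same itertools recipe as above, repeated so this snippet is self-contained.
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

indices = [0, 2, 3]  # positions of "foo" in ["foo", "bar", "foo", "foo"]

subsets = list(powerset(indices))
print(subsets)
# [(), (0,), (2,), (3,), (0, 2), (0, 3), (2, 3), (0, 2, 3)]

# The iterator is one-shot: a second pass over the same object yields nothing.
power_set = powerset(indices)
first_pass = list(power_set)
second_pass = list(power_set)
print(len(first_pass), len(second_pass))
# 8 0
```

Each tuple is one choice of which occurrences to replace; the empty tuple corresponds to leaving the string unchanged.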

4. Iterate over the power-set and replace the words in the indices of each set

For each set of indices in the power-set, make a fresh copy of the words list, replace the words at those indices, and join the list back into a string:

for replacements in powerset(indices):
    new_words = list(words)
    for index in replacements:
        new_words[index] = "fooer"
    print(" ".join(new_words))

* If using the regex version, join with ''.join(...) instead, because the split pieces already contain the original separators.
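Since the question mentions itertools.product, here is a sketch of the same idea expressed that way. It is an alternative to the power-set recipe, not part of the original answer: the function name replacement_variants is hypothetical, and it assumes a non-empty substring and non-overlapping matches.

```python
from itertools import product

def replacement_variants(text, sub, repl):
    """Hypothetical helper: all strings obtained by keeping or replacing each occurrence of sub."""
    # Splitting on sub leaves only the text between occurrences.
    pieces = text.split(sub)  # raises ValueError if sub is empty
    n_occurrences = len(pieces) - 1

    results = set()
    # For every occurrence, independently choose the original or the replacement.
    for choice in product((sub, repl), repeat=n_occurrences):
        out = [pieces[0]]
        for filler, piece in zip(choice, pieces[1:]):
            out.append(filler)
            out.append(piece)
        results.add("".join(out))
    return results

for variant in sorted(replacement_variants("foo bar foo foo", "foo", "fooer")):
    print(variant)
```

Because the string is rebuilt from pieces rather than edited in place, differing lengths of the substring and its replacement never shift any positions.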

Full code

All together this will look like:

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

s = "foo bar foo foo"
to_find = "foo"
to_replace = "fooer"

words = s.split()
# regex (needs "import re"): words = re.split(f"({re.escape(to_find)})", s)
indices = [i for i, v in enumerate(words) if v == to_find]
for replacements in powerset(indices):
    new_words = list(words)
    for index in replacements:
        new_words[index] = to_replace
    print(" ".join(new_words))
    # regex: print(''.join(new_words))

Which gives:

foo bar foo foo
fooer bar foo foo
foo bar fooer foo
foo bar foo fooer
fooer bar fooer foo
fooer bar foo fooer
foo bar fooer fooer
fooer bar fooer fooer
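The index-based approach from the question can also be made to work with replacements of a different length, by applying the replacements from right to left so that earlier start offsets stay valid. The sketch below reuses the question's indices_substring generator; variants_by_index is a hypothetical name, and a non-empty substring is assumed (an empty one would make the generator loop forever).

```python
from itertools import chain, combinations

def indices_substring(a_str, sub):
    # Generator from the question: start offsets of non-overlapping matches.
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += len(sub)

def variants_by_index(text, sub, repl):
    """Hypothetical helper: returns the set of all keep/replace variants."""
    starts = list(indices_substring(text, sub))
    subsets = chain.from_iterable(
        combinations(starts, r) for r in range(len(starts) + 1)
    )
    results = set()
    for subset in subsets:
        new_text = text
        # Right to left: changing the tail never moves the offsets before it.
        for start in sorted(subset, reverse=True):
            new_text = new_text[:start] + repl + new_text[start + len(sub):]
        results.add(new_text)
    return results

print(len(variants_by_index("foo bar foo foo", "foo", "fooer")))
# 8
```

This returns a set, as in the question's example, so the order of the variants is not defined.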
Answered By: Tomerikoo