Get all possible combinations of replacing a substring
Question:
Python combinations of text given substring and arbitrary replacement
I have a string:
"foo bar foo foo"
And given a substring in that string, for example: "foo"
I want to get every combination obtained by replacing occurrences of that "foo"
with some arbitrary string (which may have a different length).
For example:
>>> combinations("foo bar foo foo", "foo", "fooer")
{
"foo bar foo foo",
"fooer bar foo foo",
"foo bar fooer foo",
"foo bar foo fooer",
"fooer bar fooer foo",
"fooer bar foo fooer",
"fooer bar fooer fooer",
"foo bar fooer fooer",
}
I have searched already and I can’t find anything that could help me.
I know I have to use itertools.product for the combinations, but I get stuck when the substring occurs more than once in the same string and the substring and its replacement have different lengths.
So far, I can get the indices where each replacement has to start:
def indices_substring(a_str, sub):
    """https://stackoverflow.com/a/4665027/9288003"""
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += len(sub)  # use start += 1 to find overlapping matches
Answers:
You can use the following recipe:
- Split the string into a list of words.
- Find the indices of the words you want to replace.
- Create the power-set of those indices.
- Iterate over the power-set and replace the words in the indices of each set.
1. Split the string into a list of words
Easy enough for any Python user:
words = "foo bar foo foo".split()
If the string is not necessarily space-separated, you can split with a regex instead; the capturing group keeps the matched "foo" items in the resulting list:
import re
words = re.split("(foo)", "foobarfoofoo")
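To make the regex variant concrete, here is a small sketch of what the split returns. The use of re.escape is an addition not in the original answer; it is assumed here so that a search string containing regex metacharacters (such as "." or "+") is matched literally.

```python
import re

text = "foobarfoofoo"
to_find = "foo"

# re.escape treats the search string literally; the parentheses form a
# capturing group, so the matched separators are kept in the result.
parts = re.split(f"({re.escape(to_find)})", text)
print(parts)  # ['', 'foo', 'bar', 'foo', '', 'foo', '']

# Joining with the empty string rebuilds the original text exactly.
print("".join(parts) == text)  # True
```

The empty strings mark places where two matches are adjacent or where a match touches the start or end of the text. They are harmless, because the pieces are later joined with ''.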
2. Find the indices of the words you want to replace
This can be done with a pretty simple list-comprehension:
indices = [i for i, v in enumerate(words) if v == "foo"]
3. Create the power-set of those indices
The official itertools Recipes page has one for a power-set:
from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
With that function, this step is a one-liner:
power_set = powerset(indices)
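As a quick illustration, this is what the power-set looks like for the indices of the example string. The list [0, 2, 3] is assumed to be the result of step 2 for "foo bar foo foo".split(); the function is repeated so the snippet runs on its own.

```python
from itertools import chain, combinations

def powerset(iterable):
    # Every subset of the input, from the empty tuple up to the full set.
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

indices = [0, 2, 3]  # positions of "foo" in ["foo", "bar", "foo", "foo"]
subsets = list(powerset(indices))
print(subsets)
# [(), (0,), (2,), (3,), (0, 2), (0, 3), (2, 3), (0, 2, 3)]
print(len(subsets))  # 8, i.e. 2 ** len(indices)
```

Each tuple is one choice of which occurrences to replace, so n occurrences yield 2**n output strings. Note that powerset returns a lazy iterator, which can be consumed only once.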
4. Iterate over the power-set and replace the words in the indices of each set
For this, we first create a copy of the words list to work on, then iterate over the indices in each item of the power-set and replace the words at those indices. Finally, we join the list:
for replacements in powerset(indices):
    new_words = list(words)
    for index in replacements:
        new_words[index] = "fooer"
    print(" ".join(new_words))
* if using the regex version, it should be ''.join(...)
Full code
All together this will look like:
from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

s = "foo bar foo foo"
to_find = "foo"
to_replace = "fooer"

# Split into words; each word equal to to_find is a candidate for replacement.
words = s.split()
# regex (needs `import re`): words = re.split(f"({re.escape(to_find)})", s)
indices = [i for i, v in enumerate(words) if v == to_find]

# One output string per subset of the candidate indices.
for replacements in powerset(indices):
    new_words = list(words)
    for index in replacements:
        new_words[index] = to_replace
    print(" ".join(new_words))
    # regex: print(''.join(new_words))
Which gives:
foo bar foo foo
fooer bar foo foo
foo bar fooer foo
foo bar foo fooer
fooer bar fooer foo
fooer bar foo fooer
foo bar fooer fooer
fooer bar fooer fooer
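Since the question mentions itertools.product and asks for a set, here is an alternative sketch along those lines. It is not part of the recipe above: the function name replacement_combinations is hypothetical, and the approach assumes non-overlapping, left-to-right matches, which is how str.split finds them. Because it splits on the substring itself rather than on whitespace, the original spacing is preserved and the string does not need to be space-separated.

```python
from itertools import product

def replacement_combinations(text, old, new):
    """Return every string obtained by replacing any subset of the
    non-overlapping occurrences of `old` in `text` with `new`."""
    if not old:
        raise ValueError("old must be a non-empty string")
    pieces = text.split(old)   # the text between consecutive occurrences
    n = len(pieces) - 1        # number of occurrences found
    results = set()
    # For each occurrence, choose independently: keep `old` or use `new`.
    for choice in product((old, new), repeat=n):
        out = [pieces[0]]
        for filler, piece in zip(choice, pieces[1:]):
            out.append(filler)
            out.append(piece)
        results.add("".join(out))
    return results

for line in sorted(replacement_combinations("foo bar foo foo", "foo", "fooer")):
    print(line)
```

For the example string this yields the same eight strings as the power-set version, returned as a set, so the order is not significant.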