How to replace all string patterns except for one?
Question:
I have a string and a pattern that I’m trying to replace:
my_string = "this is my string, it has [012] numbers and [1123] other things, like [2] cookies"
# pattern = all numbers between the brackets, and the brackets
I want to replace all of those patterns except for one with some other pattern:
new_pattern = "_new_pattern_"
And I need to do this N times, where N is the number of times the pattern appears (in this case 3).
I know I can replace every occurrence of the pattern using a regex:
import re
re.sub(r'\[\d+\]', new_pattern, my_string)
But I don’t know how to do it for all patterns except for one.
Examples:
#1
my_string = "this is my string, it has [012] numbers and [1123] other things, like [2] cookies"
expected_output = [
    "this is my string, it has [012] numbers and _new_pattern_ other things, like _new_pattern_ cookies",
    "this is my string, it has _new_pattern_ numbers and [1123] other things, like _new_pattern_ cookies",
    "this is my string, it has _new_pattern_ numbers and _new_pattern_ other things, like [2] cookies"
]
#2
my_string = "this is my string"
expected_output = ["this is my string"]
#3
my_string = "this is my string [111]"
expected_output = ["this is my string [111]"]
#4
my_string = "this is my string [111] and this [111]"
expected_output = ["this is my string [111] and this _new_pattern_",
                   "this is my string _new_pattern_ and this [111]"]
To clarify, I want to do it for all matches except for one of them, N times (so if there are N matches, I want to make N-1 replacements, in all possible variations).
Answers:
This code is what I’ve come up with:
import itertools
import re
from typing import List
examples: List[str] = [
    "this is my string",
    "this is my string [111]",
    "this is my string [111] and this [111]",
    "this is my string [111] and this [111] and that [111]"
]
outputs: List[List] = []
# Brackets are escaped so they match literally
pattern: str = r"\[[0-9]+\]"
new_pattern: str = "_new_pattern_"
for example in examples:
    matches: List = re.findall(pattern, example)
    # With fewer than two matches there is nothing to replace
    if len(matches) < 2:
        outputs.append([example])
        continue
    output: List = []
    # Each combination holds the spans of the N-1 matches to replace
    for comb in itertools.combinations([(x.start(), x.end()) for x in re.finditer(pattern, example)], len(matches) - 1):
        curr_start: int = 0
        curr_end: int = 0
        curr_out: str = ""
        for cell in comb:
            curr_end = cell[0]
            # Copy the text up to this match, then substitute the match
            curr_out += example[curr_start:curr_end] + new_pattern
            curr_start = cell[1]
        # Append whatever follows the last replaced match
        curr_out += example[curr_start:]
        output.append(curr_out)
    outputs.append(output)
for output in outputs:
    print(output)
The results:
['this is my string']
['this is my string [111]']
['this is my string _new_pattern_ and this [111]', 'this is my string [111] and this _new_pattern_']
['this is my string _new_pattern_ and this _new_pattern_ and that [111]', 'this is my string _new_pattern_ and this [111] and that _new_pattern_', 'this is my string [111] and this _new_pattern_ and that _new_pattern_']
Here I've generated a list of (start_index, stop_index) pairs, one per match, and then every combination containing all but one of them (itertools.combinations([(x.start(), x.end()) for x in re.finditer(pattern, example)], len(matches) - 1)). Each combination lists the matches to be replaced, so I rebuild the string by copying the text between those matches and inserting the new pattern in their place.
There may be a quicker or shorter way to achieve this, but this is the logic to follow.
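The same idea can be written without itertools by looping over the index of the match to keep. The following is only a sketch, not part of the answer above: the function name replace_all_but_one and its parameter names are made up for illustration, and it assumes that strings with fewer than two matches should be returned unchanged, as in the question's examples #2 and #3.

```python
import re
from typing import List


def replace_all_but_one(text: str, pattern: str, replacement: str) -> List[str]:
    """Return one variant per match, each keeping exactly one match intact."""
    # (start, end) span of every match, in order of appearance
    spans = [m.span() for m in re.finditer(pattern, text)]
    if len(spans) < 2:
        # Zero or one match: nothing to replace
        return [text]
    variants = []
    for keep in range(len(spans)):
        pieces = []
        cursor = 0
        for i, (start, end) in enumerate(spans):
            # Text between the previous match and this one
            pieces.append(text[cursor:start])
            # Keep the chosen match, substitute every other one
            pieces.append(text[start:end] if i == keep else replacement)
            cursor = end
        # Text after the last match
        pieces.append(text[cursor:])
        variants.append("".join(pieces))
    return variants


print(replace_all_but_one(
    "this is my string [111] and this [111]", r"\[[0-9]+\]", "_new_pattern_"))
```

The variants come out in the order of the kept match (first match kept first), which is the order used in the question's expected output.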
Here I use re.split with a capturing group to find the patterns and get a list alternating between non-pattern and pattern strings. Then I replace all the old patterns with the new pattern in that list. Finally, for each old pattern, I copy the list, put just that one pattern back in, and join the copy into a string.
import re
import pprint
my_string = "foo [012] bar [1123] stack [2] overflow"
new_pattern = "NP"
parts = re.split(r'(\[\d+\])', my_string)
old_patterns = parts[1::2]
parts[1::2] = [new_pattern] * len(old_patterns)
result = []
for i, old in enumerate(old_patterns):
    copy = parts[:]
    # Matches sit at the odd indices of the split list
    copy[1 + 2*i] = old
    result.append(''.join(copy))
# No match at all: return the string unchanged
if not result:
    result.append(my_string)
pprint.pp(result)
Output:
['foo [012] bar NP stack NP overflow',
'foo NP bar [1123] stack NP overflow',
'foo NP bar NP stack [2] overflow']
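Another option, not taken from either answer, is to let re.sub do the substitution and pass it a function that counts matches and leaves the chosen one untouched. This is a sketch under assumptions: the names variants_with_sub and repl are hypothetical, and strings with fewer than two matches are returned as-is.

```python
import re
from itertools import count


def variants_with_sub(text, pattern, replacement):
    """Build each variant with re.sub and a counting replacement function."""
    n = len(re.findall(pattern, text))
    if n < 2:
        # Zero or one match: nothing to replace
        return [text]
    results = []
    for keep in range(n):
        counter = count()  # fresh match counter for this pass

        def repl(match, keep=keep, counter=counter):
            # Return the original text for the kept match,
            # the replacement for every other match
            return match.group(0) if next(counter) == keep else replacement

        results.append(re.sub(pattern, repl, text))
    return results


print(variants_with_sub("foo [012] bar [1123] stack [2] overflow", r"\[\d+\]", "NP"))
```

Because the replacement is returned from a function, re.sub inserts it literally, so backslashes in the new pattern are not interpreted as group references.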