How to replace all string patterns except for one?

Question:

I have a string and a pattern that I’m trying to replace:

my_string = "this is my string, it has [012] numbers and [1123] other things, like [2] cookies"
# pattern = all numbers between the brackets, and the brackets 

I want to replace all of those patterns except for one with some other pattern:

new_pattern = "_new_pattern_"

I need to do this N times, where N is the number of times the pattern appears (in this case 3).

I know I can replace all occurrences of the pattern using a regex:

import re
re.sub(r'\[\d+\]', new_pattern, my_string)

But I don’t know how to do it for all patterns except for one.

Examples:
#1
my_string = "this is my string, it has [012] numbers and [1123] other things, like [2] cookies"
expected_output = [
    "this is my string, it has [012] numbers and _new_pattern_ other things, like _new_pattern_ cookies",
    "this is my string, it has _new_pattern_ numbers and [1123] other things, like _new_pattern_ cookies",
    "this is my string, it has _new_pattern_ numbers and _new_pattern_ other things, like [2] cookies"
]
#2
my_string = "this is my string"
expected_output = ["this is my string"]
#3
my_string = "this is my string [111]"
expected_output = ["this is my string [111]"]
#4
my_string = "this is my string [111] and this [111]"
expected_output = ["this is my string [111] and this _new_pattern_",
                   "this is my string _new_pattern_ and this [111]"]

To clarify, I want to do it for all matches except for one of them, N times (so if there are N matches, I want to make N-1 replacements, in all possible variations)

Asked By: Penguin

Answers:

This code is what I’ve come up with:

import itertools
import re
from typing import List

examples: List[str] = [
    "this is my string",
    "this is my string [111]",
    "this is my string [111] and this [111]",
    "this is my string [111] and this [111] and that [111]"
]
outputs: List[List] = []

pattern: str = r"\[[0-9]+\]"
new_pattern: str = "_new_pattern_"
for example in examples:
    matches: List = re.findall(pattern, example)
    if len(matches) < 2:
        outputs.append([example])
        continue
    output: List = []
    for comb in itertools.combinations([(x.start(), x.end()) for x in re.finditer(pattern, example)], len(matches) - 1):
        curr_start: int = 0
        curr_end: int = 0
        curr_out: str = ""
        for cell in comb:
            curr_end = cell[0]
            curr_out += example[curr_start:curr_end] + new_pattern
            curr_start = cell[1]
        curr_out += example[curr_start:]
        output.append(curr_out)
    outputs.append(output)

for output in outputs:
    print(output)

The results:

['this is my string']
['this is my string [111]']
['this is my string _new_pattern_ and this [111]', 'this is my string [111] and this _new_pattern_']
['this is my string _new_pattern_ and this _new_pattern_ and that [111]', 'this is my string _new_pattern_ and this [111] and that _new_pattern_', 'this is my string [111] and this _new_pattern_ and that _new_pattern_']

Here I build a list of (start_index, end_index) pairs, one per match, and generate every combination that contains all but one of them (itertools.combinations([(x.start(), x.end()) for x in re.finditer(pattern, example)], len(matches) - 1)). Each combination lists the spans to replace: I walk through them in order, copying the text between spans and substituting new_pattern for each span, which produces the desired output.
There may be a quicker or shorter way to achieve this, but this is the logic to follow.
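
One shorter variant of the same keep-one, replace-the-rest logic is sketched below. It assumes a hypothetical helper name, keep_one_variants, and uses a replacement callback with re.sub instead of slicing by span; it is an illustration, not the code from this answer. It emits the variants in the question's order (first match kept first), which is the reverse of the order produced by itertools.combinations above.

```python
import re


def keep_one_variants(text, new_pattern, pattern=r"\[\d+\]"):
    """Return one string per match; each keeps that match and replaces the rest.

    keep_one_variants is a hypothetical name used only for this sketch.
    """
    n = len(re.findall(pattern, text))
    if n < 2:
        # Zero or one match: nothing to replace, as in the question's examples.
        return [text]

    variants = []
    for keep in range(n):
        counter = iter(range(n))  # yields the index of each match, in order

        def repl(match, keep=keep, counter=counter):
            # Keep the match whose index equals `keep`; replace all others.
            return match.group(0) if next(counter) == keep else new_pattern

        variants.append(re.sub(pattern, repl, text))
    return variants


for variant in keep_one_variants(
    "this is my string [111] and this [111]", "_new_pattern_"
):
    print(variant)
```

Because the replacement is returned from a function, re.sub inserts it literally, so a new_pattern containing backslashes would not be interpreted as a group reference.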

Answered By: GregoirePelegrin

Here I use re.split with a capturing group to find the patterns, which gives a list alternating between non-pattern and pattern strings. I then replace all the old patterns in that list with the new pattern. Finally, for each old pattern, I copy the list, put just that one pattern back in, and join the copy into a string.

import re
import pprint

my_string = "foo [012] bar [1123] stack [2] overflow"
new_pattern = "NP"

parts = re.split(r'(\[\d+\])', my_string)
old_patterns = parts[1::2]
parts[1::2] = [new_pattern] * len(old_patterns)
result = []
for i, old in enumerate(old_patterns):
    copy = parts[:]
    copy[1 + 2*i] = old
    result.append(''.join(copy))
if not result:
    result.append(my_string)

pprint.pp(result)

Output:

['foo [012] bar NP stack NP overflow',
 'foo NP bar [1123] stack NP overflow',
 'foo NP bar NP stack [2] overflow']
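
The 1 + 2*i indexing in the code above relies on how re.split behaves when the pattern contains a capturing group. The small sketch below (the variable names trailing and no_match are only illustrative) prints the intermediate lists to show that matches always land at odd indices, including in the edge cases from the question.

```python
import re

pattern = r'(\[\d+\])'  # capturing group, so re.split keeps the matches

parts = re.split(pattern, "foo [012] bar [1123] stack [2] overflow")
print(parts)
# ['foo ', '[012]', ' bar ', '[1123]', ' stack ', '[2]', ' overflow']

# A match at the very end still leaves a (possibly empty) string after it,
# so matches always sit at the odd indices 1, 3, 5, ...
trailing = re.split(pattern, "this is my string [111]")
print(trailing)
# ['this is my string ', '[111]', '']

# With no match at all, the list holds only the original string.
no_match = re.split(pattern, "this is my string")
print(no_match)
# ['this is my string']
```

In the no-match case the slice parts[1::2] is empty, so the loop produces nothing and the final `if not result` fallback supplies the unchanged string.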

Answered By: Kelly Bundy