How do I replace parts of a string with various combinations from a lookup in Python?

Question:

I have the following code, which replaces every element with its short form from the lookup:

case = ["MY_FIRST_RODEO"]
lookup = {'MY': 'M', 'FIRST': 'FRST', 'RODEO' : 'RD', 'FIRST_RODEO': 'FRD', 'MY_FIRST': 'MF', 'MY_FIRST_RODEO': 'MFR'}
case_mod = []
for string in case:
    words = string.split("_")
    new_string = [lookup[word] for word in words]
    case_mod.append("_".join(new_string))
print(case_mod)

This returns:

['M_FRST_RD']

However, I want it to additionally return all possibilities, since the lookup also has short forms for MY_FIRST, FIRST_RODEO, and MY_FIRST_RODEO. So, I want the following returned:

['M_FRST_RD', 'MF_RD', 'M_FRD', 'MFR']

I was able to write code to break the original list into all possibilities as follows:

case = ["MY_FIRST_RODEO"]
result = []
for string in case:
    words = string.split("_")
    n = len(words)
    for i in range(n):
        result.append("_".join(words[:i + 1]))
        for j in range(i + 1, n):
            result.append("_".join(words[i:j + 1]))
            result.extend(words)
result = list(dict.fromkeys(result))
print(result)

to return:

['MY', 'MY_FIRST', 'FIRST', 'RODEO', 'MY_FIRST_RODEO', 'FIRST_RODEO']

But somehow I can’t make the connection between the two solutions. Any help would be greatly appreciated.

Asked By: flying_fluid_four

Answers:

One thing you could try is the following:

from itertools import combinations

string = "MY_FIRST_RODEO"
lookup = {'MY': 'M', 'FIRST': 'FRST', 'RODEO': 'RD', 'FIRST_RODEO': 'FRD', 'MY_FIRST': 'MF', 'MY_FIRST_RODEO': 'MFR'}

# Positions of every underscore in the string
underscores = [i for i, c in enumerate(string) if c == "_"]
length = len(string)
results = []
# r = number of underscores used as split points, from all of them down to none
for r in range(len(underscores), -1, -1):
    for parts in combinations(underscores, r):
        # (start, end) slice bounds of each segment between the chosen split points
        limits = ((a + 1, b) for a, b in zip((-1,) + parts, parts + (length,)))
        results.append("_".join(lookup[string[a:b]] for a, b in limits))
print(results)

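First, record the indices of the underscores in string, then use combinations (from the standard library module itertools) to choose every subset of those indices as split points. Each subset gives one partition of string along its underscores; every segment of the partition is looked up and the short forms are joined with underscores again. Note that lookup[...] raises a KeyError if a segment has no entry in lookup. (I’ve left out the outer loop over case since it isn’t needed to show the proposed mechanism.)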

Result here:

['M_FRST_RD', 'M_FRD', 'MF_RD', 'MFR']
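
To connect this back to the loop over case in the question, the same idea could be wrapped in a function along the following lines. This is only a sketch: the function name abbreviations and the sep parameter are made up for illustration, it cuts a list of words rather than slicing the string by character index, and it assumes that a partition containing a segment with no entry in lookup should simply be skipped.

```python
from itertools import combinations


def abbreviations(string, lookup, sep="_"):
    """Return the short form of every partition of `string` along `sep`.

    Hypothetical helper, not from the original answer.
    """
    words = string.split(sep)
    gaps = range(1, len(words))  # positions between words where a cut may be made
    results = []
    # Try the largest number of cuts first, so the fully split form comes first.
    for r in range(len(words) - 1, -1, -1):
        for cuts in combinations(gaps, r):
            bounds = (0, *cuts, len(words))
            segments = [sep.join(words[a:b]) for a, b in zip(bounds, bounds[1:])]
            # Assumption: skip partitions with a segment that is missing from lookup.
            if all(segment in lookup for segment in segments):
                results.append(sep.join(lookup[segment] for segment in segments))
    return results


case = ["MY_FIRST_RODEO"]
lookup = {'MY': 'M', 'FIRST': 'FRST', 'RODEO': 'RD', 'FIRST_RODEO': 'FRD',
          'MY_FIRST': 'MF', 'MY_FIRST_RODEO': 'MFR'}

# Flat list with the abbreviations of every string in case
case_mod = [short for string in case for short in abbreviations(string, lookup)]
print(case_mod)  # ['M_FRST_RD', 'M_FRD', 'MF_RD', 'MFR']
```

Skipping unknown segments avoids the KeyError that a direct lookup would raise; if a missing entry should count as an error instead, the membership check can be dropped.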
Answered By: Timus