Order of dictionary key/values changes output of script

Question:

The problem occurs in a script processing lines of text like the following:

import re  

combo_dict = {  
    'Mngr': 'Manager',
    'Shp': 'Shop'   
}


rexCapDoubleWord = r'(?s)\s([A-z]+\s[A-z]+)$'
fileLines = ['001 AALTONEN Alan Roy 2 Berkeley_Road,_Welltown Shp Mngr']

for fileLine in fileLines:

    words = []
    try:
        regexCapDouble = re.search(rexCapDoubleWord, fileLine).group(1)
    
        if(regexCapDouble):
            words = regexCapDouble.split(" ")
                   
            for key, value in combo_dict.items():
                if(key == words[0]):
                # replace the capture group with the dictionary key value found
                    SubWordOne = re.sub(words[0], value, fileLine)
                else:
                    SubWordOne = fileLine
                            
            for key, value in combo_dict.items():
                if(key == words[1]):
                # replace the capture group with the dictionary key value found
                    SubWordTwo = re.sub(words[1], value, SubWordOne) 
                    fileLines = list(map(lambda x: x.replace(fileLine, SubWordTwo), fileLines)) 
                else:
                    fileLines = list(map(lambda x: x.replace(fileLine, SubWordOne), fileLines))

    except AttributeError:
        regexCapDouble = None

for fileLine in fileLines:
    print(fileLine)

This simple example outputs:

001 AALTONEN Alan Roy 2 Berkeley_Road,_Welltown Shop Manager

But if the dictionary contents are reversed:

combo_dict = {  
    'Shp': 'Shop',
    'Mngr': 'Manager'   
}

Output:

001 AALTONEN Alan Roy 2 Berkeley_Road,_Welltown Shp Manager

I can’t see what I’m doing wrong. Is it the way I’m accessing the dictionary keys? I want to clear this up because my use case gets more complicated from here. I would appreciate any suggestions.

Asked By: Dave

||

Answers:

The output depends on the order of the keys because of how the first loop is written. SubWordOne is reassigned on every iteration, and the else branch resets it to the original line. In the first example the keys are in the order ‘Mngr’ followed by ‘Shp’, so the match on ‘Shp’ (words[0]) happens on the last iteration and the replacement survives. The second loop then replaces ‘Mngr’ with ‘Manager’.

In the second example the keys are in the reverse order. The first loop replaces ‘Shp’ with ‘Shop’ on its first iteration, but the next iteration (‘Mngr’, which does not equal words[0]) runs the else branch and sets SubWordOne back to the unmodified line. Only the ‘Mngr’ replacement from the second loop survives, which gives ‘Shp Manager’.

One way to make the iteration order explicit is an ordered dictionary, such as collections.OrderedDict. Note that plain dicts already preserve insertion order in Python 3.7 and later, so this only pins the key order that happens to work for this input; it does not remove the order dependence itself.

For example:

import re
from collections import OrderedDict

combo_dict = OrderedDict({  
    'Mngr': 'Manager',
    'Shp': 'Shop'   
})

rexCapDoubleWord = r'(?s)\s([A-z]+\s[A-z]+)$'
fileLines = ['001 AALTONEN Alan Roy 2 Berkeley_Road,_Welltown Shp Mngr']

for fileLine in fileLines:

    words = []
    try:
        regexCapDouble = re.search(rexCapDoubleWord, fileLine).group(1)
    
        if(regexCapDouble):
            words = regexCapDouble.split(" ")
                   
            for key, value in combo_dict.items():
                if(key == words[0]):
                # replace the capture group with the dictionary key value found
                    SubWordOne = re.sub(words[0], value, fileLine)
                else:
                    SubWordOne = fileLine
                            
            for key, value in combo_dict.items():
                if(key == words[1]):
                # replace the capture group with the dictionary key value found
                    SubWordTwo = re.sub(words[1], value, SubWordOne) 
                    fileLines = list(map(lambda x: x.replace(fileLine, SubWordTwo), fileLines)) 
                else:
                    fileLines = list(map(lambda x: x.replace(fileLine, SubWordOne), fileLines))

    except AttributeError:
        regexCapDouble = None

for fileLine in fileLines:
    print(fileLine)
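
To see where the order dependence comes from, it may help to isolate the first loop. The sketch below is a simplified stand-in for the question's code, not a copy of it: the function names first_loop and first_loop_fixed are hypothetical, and str.replace is used in place of re.sub to keep it short. It shows the else branch discarding an earlier match, and a direct key lookup that avoids the problem.

```python
def first_loop(word, line, mapping):
    """Mimic the question's first loop: every non-matching key resets the result."""
    result = line
    for key, value in mapping.items():
        if key == word:
            result = line.replace(word, value)
        else:
            result = line  # overwrites an earlier successful replacement
    return result


def first_loop_fixed(word, line, mapping):
    """Look the word up directly, so iteration order no longer matters."""
    return line.replace(word, mapping.get(word, word))


line = "Welltown Shp Mngr"
print(first_loop("Shp", line, {"Mngr": "Manager", "Shp": "Shop"}))        # Welltown Shop Mngr
print(first_loop("Shp", line, {"Shp": "Shop", "Mngr": "Manager"}))        # Welltown Shp Mngr
print(first_loop_fixed("Shp", line, {"Shp": "Shop", "Mngr": "Manager"}))  # Welltown Shop Mngr
```

With the loop version, the replacement only survives when the matching key is the last one iterated, which is why swapping the two dictionary entries changes the output.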

Answered By: Branch

Dicts are designed for lookup by key, so there is no need to iterate over one to find a match. Since your end goal is apparently to replace the last two words of each line with their mapped values when a mapping exists, you can split each line on spaces and pass each of the last two words through the get method of combo_dict, which returns the mapped value or falls back to the word itself:

for line in fileLines:
    words = line.split(' ')
    words[-2:] = (combo_dict.get(word, word) for word in words[-2:])
    print(' '.join(words))

This outputs:

001 AALTONEN Alan Roy 2 Berkeley_Road,_Welltown Shop Manager

Demo: https://replit.com/@blhsing/ValuableFuzzyOolanguage
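
If the use case grows, the same idea could be wrapped in a small helper. This is only a sketch under assumptions: the function name expand_trailing_words and its count parameter are hypothetical and do not appear in the question. It checks that both key orders give the same result.

```python
def expand_trailing_words(line, mapping, count=2):
    """Replace each of the last `count` words with its mapped value, if any."""
    if count <= 0:
        return line  # guard: words[-0:] would select the whole list
    words = line.split(' ')
    words[-count:] = [mapping.get(word, word) for word in words[-count:]]
    return ' '.join(words)


line = '001 AALTONEN Alan Roy 2 Berkeley_Road,_Welltown Shp Mngr'
forward = {'Mngr': 'Manager', 'Shp': 'Shop'}
reverse = {'Shp': 'Shop', 'Mngr': 'Manager'}

print(expand_trailing_words(line, forward))
print(expand_trailing_words(line, reverse))
# Both print: 001 AALTONEN Alan Roy 2 Berkeley_Road,_Welltown Shop Manager
```

Because each word is looked up by key, no loop over the dictionary is involved and the order of its entries cannot affect the output.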

Answered By: blhsing