# Count Each String In A List with One Character Mismatch

## Question:

I have a list of strings:

```python
>>> my_list = 'AAA AAA BBB BBB DDD DDD DDA'.split()
>>> my_list
['AAA', 'AAA', 'BBB', 'BBB', 'DDD', 'DDD', 'DDA']
```

I need to count how often each element appears in the list. However, if two strings differ by a single character (one mismatch), they should be treated as the same string and counted together.

For exact matches I normally count like this:

```python
my_list.count('AAA')
```

However, I'm not sure how to implement the mismatch part. I was thinking of running two nested `for` loops, comparing each pair of strings and incrementing a count when they are close enough, but that would be O(n^2).

Desired output:

```
AAA 2
BBB 2
DDD 3
DDA 3
```

What would be the ideal way of getting the desired output? Any suggestions would be appreciated. Thanks!

## Answers:

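Let’s start with an unoptimized method to test whether two words are "close". You might look up or import a real library that computes the Levenshtein distance rather than use my half-baked approach, which only compares equal-length strings position by position (a Hamming distance of at most 1), so it does not handle insertions or deletions: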

```python
def is_close_enough(word1, word2):
    # Hamming distance <= 1: same length, at most one differing position.
    # (Not true Levenshtein distance: insertions/deletions are not handled.)
    if word1 == word2:
        return True

    if len(word1) != len(word2):
        return False

    return sum(c1 == c2 for c1, c2 in zip(word1, word2)) >= len(word1) - 1

print(is_close_enough("dog", "bog"))
print(is_close_enough("dog", "bot"))
print(is_close_enough("dog", "cat"))
print(is_close_enough("dog", "dogo"))
```

That should give you:

```
True
False
False
False
```

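Now let’s use `is_close_enough` together with your base list of words, first counting exact duplicates with `collections.Counter` so that only distinct words are compared against each other.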

```python
import collections

def is_close_enough(word1, word2):
    # Hamming distance <= 1: same length, at most one differing position.
    if word1 == word2:
        return True

    if len(word1) != len(word2):
        return False

    return sum(c1 == c2 for c1, c2 in zip(word1, word2)) >= len(word1) - 1

my_list = 'AAA AAA BBB BBB DDD DDD DDA'.split()
my_list_counted = collections.Counter(my_list)  # AAA: 2, BBB: 2, DDD: 2, DDA: 1

# For each distinct word, add up the counts of every distinct word
# that is close enough to it (including the word itself).
print({
    word1: sum(
        count2
        for word2, count2
        in my_list_counted.items()
        if is_close_enough(word1, word2)
    )
    for word1
    in my_list_counted
})
```

That should give you:

```
{'AAA': 2, 'BBB': 2, 'DDD': 3, 'DDA': 3}
```
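The comprehension above still compares every distinct word against every other distinct word, which is the O(n^2) pattern the question was worried about (with n being the number of distinct words). If that gets too slow for a large list, one alternative, sketched here as my own approach rather than anything from a library, is to bucket words by "the word with one position masked out". Two equal-length words share the bucket for position `i` exactly when they agree everywhere except possibly at `i`, so neighbours are found by lookups instead of pairwise comparisons. It uses the same rule as `is_close_enough` (substitutions only, equal lengths); `close_counts_bucketed` is a hypothetical name.

```python
import collections


def close_counts_bucketed(words):
    """For each distinct word, total the occurrences of all words of the same
    length that differ from it in at most one position (Hamming distance <= 1)."""
    counts = collections.Counter(words)

    # Bucket key (i, prefix, suffix) is the word with position i masked out.
    buckets = collections.Counter()
    for word, n in counts.items():
        for i in range(len(word)):
            buckets[(i, word[:i], word[i + 1:])] += n

    result = {}
    for word, n in counts.items():
        # The word itself sits in all len(word) of its buckets; a neighbour that
        # differs only at position i sits in exactly one of them (bucket i).
        total = sum(buckets[(i, word[:i], word[i + 1:])] for i in range(len(word)))
        result[word] = total - (len(word) - 1) * n
    return result


my_list = 'AAA AAA BBB BBB DDD DDD DDA'.split()
print(close_counts_bucketed(my_list))  # {'AAA': 2, 'BBB': 2, 'DDD': 3, 'DDA': 3}
```

This builds about n * L keys for words of length L, so the cost is roughly O(n * L^2) for the slicing instead of O(n^2 * L) for all pairwise comparisons. For a handful of short words, the simpler comprehension is perfectly fine.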

Addendum:

If you had a specific list of interesting words to look up rather than wanting all matches, you would iterate through that list instead:

```python
import collections

def is_close_enough(word1, word2):
    # Hamming distance <= 1: same length, at most one differing position.
    if word1 == word2:
        return True

    if len(word1) != len(word2):
        return False

    return sum(c1 == c2 for c1, c2 in zip(word1, word2)) >= len(word1) - 1

my_interesting_words = ["AAA", "DDA"]
my_list = 'AAA AAA BBB BBB DDD DDD DDA'.split()
my_list_counted = collections.Counter(my_list)

# Same summation, but only for the words we care about.
print({
    word1: sum(
        count2
        for word2, count2
        in my_list_counted.items()
        if is_close_enough(word1, word2)
    )
    for word1
    in my_interesting_words
})  # {'AAA': 2, 'DDA': 3}
```
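Both variants can be folded into one function with an optional list of targets. This is a sketch: `close_counts` is a hypothetical helper name, and the `is_close_enough` below is an equivalent, shorter form of the one above (it counts mismatches instead of matches, so the `word1 == word2` shortcut is no longer needed). It also prints the result in the `WORD COUNT` layout from the question:

```python
import collections


def is_close_enough(word1, word2):
    # Same rule as above: equal length and at most one differing position.
    if len(word1) != len(word2):
        return False
    return sum(c1 != c2 for c1, c2 in zip(word1, word2)) <= 1


def close_counts(words, targets=None):
    """Hypothetical helper: for each target (default: every distinct word),
    sum the counts of all words that are close enough to it."""
    counted = collections.Counter(words)
    if targets is None:
        targets = counted  # iterating a Counter yields its distinct words
    return {
        target: sum(n for word, n in counted.items() if is_close_enough(target, word))
        for target in targets
    }


my_list = 'AAA AAA BBB BBB DDD DDD DDA'.split()

# All distinct words, in the "WORD COUNT" layout from the question.
for word, n in close_counts(my_list).items():
    print(word, n)

# Only the interesting words, as in the addendum.
print(close_counts(my_list, ["AAA", "DDA"]))
```

With the sample list this prints `AAA 2`, `BBB 2`, `DDD 3` and `DDA 3` on separate lines, followed by `{'AAA': 2, 'DDA': 3}`. A target that does not occur in the list at all simply gets 0.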