Fuzzy match values against a list of tuples in Python

Question:

I'm struggling with how to do this in a Pythonic way. I have a list of (first name, last name) tuples, which we can call names:

[('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]

I also have two variables:

First_name = 'Jimm'

Last_name = 'Smitn'

I want to iterate through this list of first and last names, fuzzy match each entry against these values, and return the tuple that is closest to the specified First_name and Last_name.

Asked By: ChessGuy


Answers:

You can implement fuzzy matching by using max() to pick the entry with the highest match ratio returned by difflib.SequenceMatcher.

To do this, pass max() a lambda as its key argument that returns the match ratio for each entry. My example uses SequenceMatcher.ratio(), but if performance matters you could also try SequenceMatcher.quick_ratio() and SequenceMatcher.real_quick_ratio(), which are faster but only return upper bounds on ratio(), so they can rank candidates differently.

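from difflib import SequenceMatcher

lst = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name = 'Jimm'
last_name = 'Smitn'

# SequenceMatcher caches information about its second sequence (b),
# so set the fixed query once as b and swap each candidate in as a.
matcher = SequenceMatcher(b=first_name + ' ' + last_name)
# set_seq1() returns None, so "or" falls through to matcher.ratio().
match_first_name, match_last_name = max(lst,
    key=lambda x: matcher.set_seq1(' '.join(x)) or matcher.ratio())

print(first_name, last_name, '-', match_first_name, match_last_name)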
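If performance matters, the upper bounds from real_quick_ratio() and quick_ratio() can be used to skip candidates that cannot beat the best ratio seen so far, much as difflib.get_close_matches() uses them to filter against a cutoff before calling ratio(). The following is a minimal sketch under that assumption; best_match is a hypothetical helper name, and the query is the first and last name joined by a space, as in the code above.

```python
from difflib import SequenceMatcher


def best_match(query, candidates):
    """Hypothetical helper: return the candidate tuple whose space-joined
    text has the highest SequenceMatcher.ratio() against query, or None
    if candidates is empty."""
    matcher = SequenceMatcher()
    # The query is fixed, so set it once as seq2 (SequenceMatcher caches seq2 data).
    matcher.set_seq2(query)
    best, best_ratio = None, -1.0
    for candidate in candidates:
        matcher.set_seq1(' '.join(candidate))
        # Both cheap ratios are upper bounds on ratio(): if either one cannot
        # exceed the best ratio so far, the exact ratio cannot either.
        if matcher.real_quick_ratio() <= best_ratio:
            continue
        if matcher.quick_ratio() <= best_ratio:
            continue
        ratio = matcher.ratio()
        # Strict ">" keeps the first of several equally good candidates.
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best


names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name, last_name = 'Jimm', 'Smitn'
print(best_match(first_name + ' ' + last_name, names))  # ('Jimmy', 'Smith')
```

Because a candidate only replaces the current best on a strictly higher ratio, ties resolve to the earliest entry, the same way max() behaves.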
Answered By: Olvin Roght

Another approach is to rank each name by how many distinct characters it shares with the query, using set intersections.

names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name = "Jimm"
last_name = "Smitn"

setf = set(first_name)
# {'m', 'i', 'J'}
setl = set(last_name)
# {'t', 'n', 'm', 'i', 'S'}

ranked = [(len(setf & set(f)) + len(setl & set(l)), f, l) for f, l in names]
# [(7, 'Jimmy', 'Smith'), (4, 'James', 'Wilson'), (1, 'Hugh', 'Laurie')]

best_match = max(ranked, key=lambda x: x[0])[1:]
# ('Jimmy', 'Smith')
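The set version counts each shared letter only once, so the repeated m in Jimm adds nothing extra. A minimal variant, assuming repeated letters should count, swaps set for collections.Counter, whose & operator keeps the minimum count of each element. overlap is a hypothetical helper name. Like the set version, it ignores letter order and is case-sensitive.

```python
from collections import Counter


def overlap(a, b):
    # Multiset intersection: each shared character counts
    # min(count in a, count in b) times.
    return sum((Counter(a) & Counter(b)).values())


names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name = "Jimm"
last_name = "Smitn"

# Score each (first, last) tuple directly inside max(); no ranked list needed.
# Scores here: Jimmy Smith -> 8, James Wilson -> 4, Hugh Laurie -> 1.
best_match = max(names, key=lambda n: overlap(first_name, n[0]) + overlap(last_name, n[1]))
print(best_match)  # ('Jimmy', 'Smith')
```

Passing the scoring function as max()'s key avoids building the intermediate ranked list and slicing the score back off afterwards.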
Answered By: Chris