Fuzzy Match values to list of list python

Question

I'm struggling to do this in a Pythonic way. I have a list of lists (here, a list of tuples), which we can call names:

[('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]

I also have two variables:

First_name = 'Jimm'

Last_name = 'Smitn'

I want to iterate through this list of first and last names, fuzzy match each entry against these values, and return the entry that is closest to the given First_name and Last_name.

Asked By: ChessGuy

Answer 1

You can implement fuzzy matching by using max() to pick the candidate with the best match ratio, as computed by difflib.SequenceMatcher.

To do this, pass a lambda as the key argument that returns the match ratio for each candidate. In my example I use SequenceMatcher.ratio(), but if performance matters you could also try SequenceMatcher.quick_ratio() and SequenceMatcher.real_quick_ratio(), which are faster but only return upper bounds on ratio().

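from difflib import SequenceMatcher

lst = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name = 'Jimm'
last_name = 'Smitn'

# SequenceMatcher caches information about its second sequence (b), so keep
# the fixed query there and swap each candidate in with set_seq1().
matcher = SequenceMatcher(b=first_name + ' ' + last_name)
# set_seq1() returns None, so `or` falls through to the ratio() score.
match_first_name, match_last_name = max(lst,
    key=lambda x: matcher.set_seq1(' '.join(x)) or matcher.ratio())

print(first_name, last_name, '-', match_first_name, match_last_name)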
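
The performance tip about quick_ratio() and real_quick_ratio() works best as a pruning step: both are cheap upper bounds on ratio(), so a candidate whose bound cannot beat the current best can be skipped without computing the full ratio. The sketch below is not part of the original answer; the helper name best_match is hypothetical, and like max() it keeps the first candidate on ties.

```python
from difflib import SequenceMatcher


def best_match(query, candidates):
    """Return (candidate, ratio) for the tuple whose joined text best matches query.

    Hypothetical helper: returns (None, -1.0) if candidates is empty.
    """
    matcher = SequenceMatcher(b=query)  # b is cached, so keep the query fixed there
    best, best_ratio = None, -1.0
    for cand in candidates:
        matcher.set_seq1(' '.join(cand))
        # real_quick_ratio() and quick_ratio() are cheap upper bounds on ratio();
        # if a bound cannot beat the current best, skip the expensive ratio() call.
        if matcher.real_quick_ratio() <= best_ratio:
            continue
        if matcher.quick_ratio() <= best_ratio:
            continue
        r = matcher.ratio()
        if r > best_ratio:
            best, best_ratio = cand, r
    return best, best_ratio


names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
match, score = best_match('Jimm Smitn', names)
print(match, round(score, 3))  # ('Jimmy', 'Smith') 0.857
```

For a short list the saving is negligible, but with many candidates most of them can be rejected by the bounds alone.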

Answered By: Olvin Roght

Answer 2

Another possible approach is to use set intersections: score each name by how many distinct characters it shares with the given first and last names.

names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name = "Jimm"
last_name = "Smitn"

setf = set(first_name)
# {'m', 'i', 'J'}
setl = set(last_name)
# {'t', 'n', 'm', 'i', 'S'}

ranked = [(len(setf & set(f)) + len(setl & set(l)), f, l) for f, l in names]
# [(7, 'Jimmy', 'Smith'), (4, 'James', 'Wilson'), (1, 'Hugh', 'Laurie')]

best_match = max(ranked, key=lambda x: x[0])[1:]
# ('Jimmy', 'Smith')
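
The raw intersection count above is case-sensitive ('S' does not match 's') and tends to favor candidates that simply contain more distinct letters. A possible refinement, not part of the original answer, is to lowercase both sides and divide each overlap by the size of the union (Jaccard similarity). The helper name jaccard is illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of the character sets of a and b, ignoring case."""
    sa, sb = set(a.lower()), set(b.lower())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name, last_name = 'Jimm', 'Smitn'

# Score first and last names separately, then add the two similarities.
best = max(names, key=lambda n: jaccard(first_name, n[0]) + jaccard(last_name, n[1]))
print(best)  # ('Jimmy', 'Smith')
```

Like the original, this ignores character order and repetition, so it is a coarse filter rather than a true edit-distance measure.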

Answered By: Chris
