Fuzzy match values against a list of lists in Python
Question:
I'm struggling to do this in a Pythonic way. I have a list of tuples, which we can call names:
[('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
And then I have two variables:
First_name = 'Jimm'
Last_name = 'Smitn'
I want to iterate through this list of first and last names, fuzzy match each pair against these values, and return the tuple that is closest to the specified First_name and Last_name.
Answers:
You can implement fuzzy matching by using max() to pick the candidate with the best match ratio returned by difflib.SequenceMatcher().
To implement this, pass a lambda as the key argument of max() that returns the match ratio for each candidate. This example uses SequenceMatcher.ratio(), but if performance is important you should also try SequenceMatcher.quick_ratio() and SequenceMatcher.real_quick_ratio(), which are cheaper to compute and return upper bounds on ratio().
from difflib import SequenceMatcher

lst = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name = 'Jimm'
last_name = 'Smitn'

# Compare the query "first last" against each candidate joined the same way.
matcher = SequenceMatcher(a=first_name + ' ' + last_name)
# set_seq2() returns None, so `or` falls through to matcher.ratio() as the key.
match_first_name, match_last_name = max(
    lst,
    key=lambda x: matcher.set_seq2(' '.join(x)) or matcher.ratio(),
)
print(first_name, last_name, '-', match_first_name, match_last_name)
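If you need this in more than one place, the same idea can be wrapped in a function. The following is only a sketch: best_match and its cutoff parameter are names made up for illustration, not part of difflib. It relies on two documented behaviours of SequenceMatcher: it caches its analysis of the second sequence (so the fixed query goes in seq2 and the candidates are swapped in as seq1), and real_quick_ratio() and quick_ratio() are upper bounds on ratio(), so they can rule out a candidate before the expensive call.

```python
from difflib import SequenceMatcher


def best_match(first_name, last_name, names, cutoff=0.0):
    """Return the (first, last) tuple in `names` closest to the query.

    Hypothetical helper: returns None if no candidate scores above `cutoff`.
    """
    matcher = SequenceMatcher()
    # The query is fixed, so set it once as seq2 (the cached sequence).
    matcher.set_seq2(f"{first_name} {last_name}")

    best, best_score = None, cutoff
    for candidate in names:
        matcher.set_seq1(" ".join(candidate))
        # Cheap upper bounds first: skip candidates that cannot beat the best.
        if matcher.real_quick_ratio() <= best_score:
            continue
        if matcher.quick_ratio() <= best_score:
            continue
        score = matcher.ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best


names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
print(best_match('Jimm', 'Smitn', names))
```

Passing a cutoff (for example 0.6) lets the caller reject weak matches instead of always getting the least bad candidate back.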
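Another option is to use set intersections: score each candidate by how many distinct characters its first and last name share with the query, then take the highest score.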
names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name = "Jimm"
last_name = "Smitn"
setf = set(first_name)
# {'m', 'i', 'J'}
setl = set(last_name)
# {'t', 'n', 'm', 'i', 'S'}
ranked = [(len(setf & set(f)) + len(setl & set(l)), f, l) for f, l in names]
# [(7, 'Jimmy', 'Smith'), (4, 'James', 'Wilson'), (1, 'Hugh', 'Laurie')]
best_match = max(ranked, key=lambda x: x[0])[1:]
# ('Jimmy', 'Smith')
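One thing to keep in mind with the set approach is that it ignores character order and repeated letters, is case-sensitive, and gives longer names more chances to score. A possible variation, sketched below, normalizes each score with the Jaccard index (shared characters divided by all distinct characters) and folds case first. The function names jaccard and best_match_by_sets are made up for this example.

```python
def jaccard(a, b):
    """Jaccard index of the character sets of two strings, ignoring case."""
    set_a, set_b = set(a.casefold()), set(b.casefold())
    union = set_a | set_b
    # Two empty strings are treated as identical.
    return len(set_a & set_b) / len(union) if union else 1.0


def best_match_by_sets(first_name, last_name, names):
    """Return the (first, last) tuple with the highest combined Jaccard score."""
    return max(
        names,
        key=lambda name: jaccard(first_name, name[0]) + jaccard(last_name, name[1]),
    )


names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
print(best_match_by_sets('Jimm', 'Smitn', names))
```

Because only the sets of characters are compared, anagrams are indistinguishable: jaccard('Smitn', 'nitmS') is 1.0. If order matters for your data, the SequenceMatcher approach is the safer choice.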