String edit distance in Python
Question:
I need to check whether the edit distance between two strings in Python (the minimal number of changes: character removal, addition, and transposition) is greater than 1.
I can implement it myself, but I bet an existing package would save me the effort. I wasn't able to find one that I could identify as commonly used. Are there any?
Answers:
Many implementations of the algorithm you need already exist; one of them belongs to a well-documented library called NLTK.
Yes, strsimpy can be used. See https://pypi.org/project/strsimpy/
I hope this is what you are looking for. Here is a usage example:
from strsimpy.levenshtein import Levenshtein
levenshtein = Levenshtein()
levenshtein.distance('1234', '123') # 1 (deletion/insertion)
levenshtein.distance('1234', '12345') # 1 (deletion/insertion)
levenshtein.distance('1234', '1235') # 1 (substitution)
levenshtein.distance('1234', '1324') # 2 (substitutions)
levenshtein.distance('1234', 'ABCD') # 4 (substitutions)
strsimpy provides many other metrics as well.
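Note that the question counts a transposition as a single change, while plain Levenshtein scores '1234' vs '1324' as 2, as the example above shows. Here is a rough pure-Python sketch of the optimal string alignment variant, which additionally allows swapping two adjacent characters at a cost of 1. This is not strsimpy's code, and the function name osa_distance is made up for illustration:

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (illustrative sketch).

    Insertions, deletions, substitutions, and swaps of two adjacent
    characters each cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition: "xy" in a lines up with "yx" in b.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance('1234', '1324'))  # swap of '2' and '3' counts once
print(osa_distance('1234', 'ABCD'))
```

With this variant '1234' vs '1324' comes out as 1 instead of 2, which matches the definition in the question.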
You can use the NLTK package. Its edit_distance function computes the Levenshtein edit distance, which should be what you're looking for.
Example:
import nltk
s1 = "abc"
s2 = "ebcd"
nltk.edit_distance(s1, s2) # output: 2
Reference:
https://tedboy.github.io/nlps/generated/generated/nltk.edit_distance.html
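Since the question only asks whether the distance is greater than 1, the full distance is not strictly needed. As a minimal sketch (the function name is hypothetical, and it assumes the allowed changes are a single insertion, deletion, substitution, or adjacent transposition), a linear-time check could look like this:

```python
def distance_greater_than_one(a: str, b: str) -> bool:
    """Return True if a and b are more than one edit apart (sketch).

    One edit means a single insertion, deletion, substitution,
    or swap of two adjacent characters.
    """
    if a == b:
        return False
    if abs(len(a) - len(b)) > 1:
        return True

    # Skip the common prefix; any single edit must start at the first mismatch.
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    rest_a, rest_b = a[i:], b[i:]

    if len(rest_a) == len(rest_b):
        # Substitution: everything after the mismatching character agrees.
        if rest_a[1:] == rest_b[1:]:
            return False
        # Adjacent transposition: the first two characters are swapped.
        if (len(rest_a) >= 2
                and rest_a[0] == rest_b[1]
                and rest_a[1] == rest_b[0]
                and rest_a[2:] == rest_b[2:]):
            return False
        return True

    # Lengths differ by one: drop one character from the longer remainder.
    longer, shorter = (rest_a, rest_b) if len(rest_a) > len(rest_b) else (rest_b, rest_a)
    return longer[1:] != shorter


print(distance_greater_than_one("abc", "ebcd"))   # substitution plus insertion
print(distance_greater_than_one("1234", "1324"))  # one adjacent swap
```

This avoids building the full dynamic-programming table, which may matter when comparing many long strings.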