String edit distance in Python

Question:

I need to check whether the edit distance between two strings in Python (the minimal number of changes: character removal, addition, and transposition) is greater than 1.

I can implement it myself, but I bet there are existing packages that would save me the effort. I wasn’t able to find one that I could identify as commonly used. Are there any?

Asked By: yuvalm2


Answers:

There are many implementations of the algorithm you need. The following one belongs to NLTK, a well-documented library:

https://www.nltk.org/_modules/nltk/metrics/distance.html
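
For reference, the algorithm behind these implementations can be sketched in a few lines of plain Python. This is a minimal illustration of the standard dynamic-programming approach to Levenshtein distance, not NLTK's actual code; the function name `levenshtein` is only an illustrative choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of single-character insertions, deletions, and substitutions."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(levenshtein("abc", "ebcd"))    # 2
print(levenshtein("1234", "1324"))   # 2: a swap costs two substitutions here
```

Note that plain Levenshtein distance has no transposition operation, so swapping two adjacent characters counts as two edits.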

Yes, strsimpy can be used; see https://pypi.org/project/strsimpy/
I hope this is what you are looking for. Here is a usage example:

from strsimpy.levenshtein import Levenshtein

levenshtein = Levenshtein()
levenshtein.distance('1234', '123')   # 1 (deletion)
levenshtein.distance('1234', '12345') # 1 (insertion)
levenshtein.distance('1234', '1235')  # 1 (substitution)
levenshtein.distance('1234', '1324')  # 2 (substitutions; plain Levenshtein does not count a swap as one edit)
levenshtein.distance('1234', 'ABCD')  # 4 (substitutions)

strsimpy also provides many other metrics.
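
Since the question counts a transposition as a single change, a metric that includes it may fit better than plain Levenshtein. The sketch below is a plain-Python version of the optimal string alignment distance (a restricted form of Damerau-Levenshtein), written only to illustrate the idea; it is not strsimpy's code, and the name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance with insertion, deletion, substitution, and adjacent transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] is the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Two adjacent characters swapped: count the swap as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("1234", "1324"))  # 1 (one transposition)
print(osa_distance("1234", "ABCD"))  # 4 (substitutions)
```

With this metric, '1234' and '1324' are at distance 1 rather than 2.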

Answered By: scorpi03

NLTK provides an edit_distance function that you can use. It computes the Levenshtein edit distance, which should be what you’re looking for.

Example:

import nltk
s1 = "abc"
s2 = "ebcd"
nltk.edit_distance(s1, s2) # output: 2

Reference:
https://tedboy.github.io/nlps/generated/generated/nltk.edit_distance.html
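
If the only goal is to know whether the distance is greater than 1, as in the question, the full distance is not strictly needed. Below is a rough standard-library sketch of a linear-time check that treats a removal, an addition, a substitution, or a swap of two adjacent characters as one change. The function name `distance_greater_than_one` is made up for this example and is not part of NLTK.

```python
def distance_greater_than_one(a: str, b: str) -> bool:
    """Return True if a and b are more than one edit apart."""
    if a == b:
        return False  # distance 0
    if abs(len(a) - len(b)) > 1:
        return True   # at least two insertions or deletions are needed
    # Make a the longer string (or leave as-is when lengths are equal).
    if len(a) < len(b):
        a, b = b, a
    # Find the first position where the strings differ.
    i = 0
    while i < len(b) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        if a[i + 1:] == b[i + 1:]:
            return False  # exactly one substitution
        swapped = a[i:i + 2] == b[i:i + 2][::-1]
        if swapped and a[i + 2:] == b[i + 2:]:
            return False  # exactly one adjacent transposition
        return True
    # Lengths differ by one: dropping a[i] must make the rest line up.
    return a[i + 1:] != b[i:]


print(distance_greater_than_one("1234", "1324"))  # False (one transposition)
print(distance_greater_than_one("abc", "ebcd"))   # True
```

For the library route, nltk.edit_distance also accepts a transpositions argument (off by default), which is worth checking in the version you have installed.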

Answered By: jrchew