Hamming distance between two strings in Python
Question:
I need to find the Hamming distance between two strings:
chaine1 = 6fb17381822a6ca9b02153d031d5d3da
chaine2 = a242eace2c57f7a16e8e872ed2f2287d
The XOR function didn’t work, and my search on the web was not very successful.
I tried to modify something I found on the web, but there’s some invalid syntax…:
assert len (chaine1) == len(chaine2)
return sum(chaine1 != chaine2 for chaine1, chaine2 in zip(chaine1, chaine2))
if __name__=="__main__":
chaine1 = hashlib.md5("chaine1".encode()).hexdigest()
chaine2 = hashlib.md5("chaine2".encode()).hexdigest()
print hamming_distance(chaine1, chaine2)
How could I proceed?
Answers:
The following program calculates the Hamming distance in two different ways.
import hashlib

def hamming_distance(chaine1, chaine2):
    return sum(c1 != c2 for c1, c2 in zip(chaine1, chaine2))

def hamming_distance2(chaine1, chaine2):
    return len(list(filter(lambda x: ord(x[0]) ^ ord(x[1]), zip(chaine1, chaine2))))

if __name__ == "__main__":
    chaine1 = hashlib.md5("chaine1".encode()).hexdigest()
    chaine2 = hashlib.md5("chaine2".encode()).hexdigest()
    #chaine1 = "6fb17381822a6ca9b02153d031d5d3da"
    #chaine2 = "a242eace2c57f7a16e8e872ed2f2287d"
    assert len(chaine1) == len(chaine2)
    print(hamming_distance(chaine1, chaine2))
    print(hamming_distance2(chaine1, chaine2))
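The reason you get Invalid syntax: ...
is probably that your code has no indentation, which Python requires. Note also that in Python 3 print is a function, so print hamming_distance(chaine1, chaine2) has to be written as print(hamming_distance(chaine1, chaine2)).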
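Since the question mentions XOR: if what is wanted is the number of differing bits between the two digests, rather than the number of differing hex characters, a minimal sketch could look like this. The name bit_hamming_distance is hypothetical (it does not appear in the question), and the sketch assumes both inputs are valid hexadecimal strings of equal length.

```python
def bit_hamming_distance(hex1, hex2):
    """Count differing bits between two equal-length hex strings.

    Hypothetical helper for illustration; assumes valid hexadecimal input.
    """
    if len(hex1) != len(hex2):
        raise ValueError("inputs must have the same length")
    # XOR leaves a 1 bit at every position where the two values differ.
    diff = int(hex1, 16) ^ int(hex2, 16)
    # Counting the 1 bits of the XOR gives the bit-level Hamming distance.
    return bin(diff).count("1")


if __name__ == "__main__":
    chaine1 = "6fb17381822a6ca9b02153d031d5d3da"
    chaine2 = "a242eace2c57f7a16e8e872ed2f2287d"
    print(bit_hamming_distance(chaine1, chaine2))
```

In Python the ^ operator is defined for integers, not for strings, which may be why applying XOR to the two strings directly did not work; converting with int(..., 16) first avoids that.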
First, let's review the definition of the Hamming distance between two strings:
The Hamming distance between two strings of equal length is the number of positions at which the corresponding characters differ. Equivalently, it is the minimum number of character substitutions required to turn one string into the other.
Here is one solution:
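def hamming(s1, s2):
    # The Hamming distance is only defined for strings of equal length.
    if len(s1) != len(s2):
        raise ValueError("Strings are not of equal length")
    result = 0
    for x, (i, j) in enumerate(zip(s1, s2)):
        if i != j:
            # Report each mismatching pair and its position.
            print(f'char not match {i, j} in {x}')
            result += 1
    return result

s1 = "rover"
s2 = "river"
print(hamming(s1, s2))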
Result: char not match ('o', 'i') in 1, followed by the distance, 1.
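If the mismatching positions are needed as data rather than printed, a variant can collect them in a list. This is only a sketch, not the answer's code: the name hamming_mismatches is made up here, and raising ValueError on unequal lengths is a design choice of this sketch.

```python
def hamming_mismatches(s1, s2):
    """Return (index, char1, char2) for every position where s1 and s2 differ."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(s1, s2)) if a != b]


mismatches = hamming_mismatches("rover", "river")
print(mismatches)       # [(1, 'o', 'i')]
print(len(mismatches))  # 1, the Hamming distance
```

The length of the returned list is the Hamming distance, so the caller gets both the count and the details from one call.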
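Another option is SciPy's distance.hamming:
from scipy.spatial import distance

DNA1 = list("GAGCCTACTAACGGGAT")
DNA2 = list("CATCGTAATGACGGCCT")
# distance.hamming returns the fraction of differing positions,
# so multiply by the length to get the count.
d = round(distance.hamming(DNA1, DNA2) * len(DNA1))
print(d)  # 7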
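Where SciPy is not available, the same count, together with the normalized fraction that distance.hamming reports, can be sketched in plain Python. The name normalized_hamming is hypothetical and used only for this illustration.

```python
def normalized_hamming(a, b):
    """Return (count, fraction) of positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if len(a) == 0:
        # Avoid dividing by zero for empty input.
        return 0, 0.0
    count = sum(x != y for x, y in zip(a, b))
    return count, count / len(a)


count, fraction = normalized_hamming("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT")
print(count)  # 7
```

Here fraction is 7/17, which is why the SciPy snippet multiplies by len(DNA1) and rounds to recover the integer count.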