Calculate Distance Metric between Homomorphic Encrypted Vectors

Question:

Is there a way to calculate a distance metric (euclidean or cosine similarity or manhattan) between two homomorphically encrypted vectors?

Specifically, I’m looking to generate embeddings of documents (using a transformer), homomorphically encrypting those embeddings, and wanting to calculate a distance metric between embeddings to obtain document similarity scores.

I have evaluated libraries like concrete-numpy, TenSEAL, and Pyfhel (HE libraries) and each library appears to lack a specific mathematical operation be it division, cumulative sum, or absolute value that prevents generating any of the listed distance metrics above.

(I did find this: https://github.com/ibarrond/Pyfhel/blob/master/examples/Demo_8_HammingDist.py which calculates hamming distance between encrypted vectors, but this metric doesn’t help with document similarity).

Asked By: Brian Behe

||

Answers:

[Credit goes to ibarrond – answer found here: https://github.com/ibarrond/Pyfhel/issues/155]

There is indeed! You just need to rely on a few tricks to overcome the limitations of supported operations in CKKS/BFV schemes (mainly additions and multiplications):

Cosine Similarity: Formulated as CS(x, y) = (sum(xᵢ * yᵢ))/(||x|| * ||y||), it would require a division and a norm.

The trick: Normalize the vectors x’ = x / ||x|| and y’ = y / ||y||, encrypt x’ and y’ and perform a simple scalar product sum(xᵢ’ * yᵢ’) to obtain the cosine similarity (Check Demo_7_ScalarProd.py on how to do that).

Euclidean Distance: Formulated as ED(x, y) = sqrt( sum(xᵢ² – yᵢ²)), it would require a square root.

The trick: Settle with the Squared Euclidean Distance instead, where SED = sum(xᵢ² – yᵢ²)). In Pyfhel you can perform element-wise squaring natively (Demo_3), followed by Pyfhel.cumul_add for cumulative sum (some examples of this in Demo_7 and Demo_8).

Manhattan Distance: Formulated as MD(x, y) = sum(|xᵢ – yᵢ|), it would require computing the absolute value.

The trick: If you encrypt binary values only (that is, x̂, ŷ s.t. x̂ᵢ , ŷᵢ) ∈ {0,1} ∀ i), you can reformulate MD(x̂, ŷ) = sum((x̂ᵢ – ŷᵢ)²) = HD(x̂, ŷ), the Hamming Distance, which is implemented in Demo_8. For non-binary vectors you can at least compute the Squared Manhattan Distance SMD(x, y) = sum((xᵢ – yᵢ)²), missing a square root to get the MD.

General Tip: Leverage on the operations you can perform before encrypting (e.g., normalization) and after decrypting (e.g., square root of the result) to avoid computing non-linear functions in FHE!

Demos referenced can be found here: https://github.com/ibarrond/Pyfhel/tree/master/examples

Answered By: Brian Behe