Pairwise Kullback-Leibler (or Jensen-Shannon) divergence distance matrix in Python

Question:

I have two matrices X and Y (in most of my cases they are similar). Now I want to calculate the pairwise KL divergence between all rows and output them in a matrix, e.g.:

X = [[0.1, 0.9], [0.8, 0.2]]

The function should then take kl_divergence(X, X) and compute the pairwise KL divergence for each pair of rows of the two matrices. The output would be a 2×2 matrix.

Is there already an implementation for this in Python? If not, it should be quite simple to calculate, but I'd like some kind of matrix implementation because I have a lot of data and need to keep the runtime as low as possible. Alternatively, the Jensen-Shannon divergence would also be fine; that might even be a better solution for me.

Asked By: fsociety


Answers:

Note that KL divergence is essentially a dot product of P(i) and log(P(i)/Q(i)). So one option is to form one array of the P(i) values and another of the log(P(i)/Q(i)) values (one row for each KL divergence you want to calculate), then perform dot products.
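A minimal NumPy sketch of that idea (the function name pairwise_kl and the broadcasting layout are my own; it assumes all entries are strictly positive so the log is well defined):

import numpy as np

def pairwise_kl(X, Y):
    """D[i, j] = KL(X[i] || Y[j]) = sum_k X[i, k] * log(X[i, k] / Y[j, k]).

    X: (n, d) array, Y: (m, d) array; rows are probability distributions
    with strictly positive entries (otherwise handle zeros yourself).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # log(P/Q) for every (row of X, row of Y) pair via broadcasting: shape (n, m, d)
    log_ratio = np.log(X[:, None, :] / Y[None, :, :])
    # dot product of each P row with the matching log-ratio slice
    return np.einsum('ik,ijk->ij', X, log_ratio)

X = [[0.1, 0.9], [0.8, 0.2]]
print(pairwise_kl(X, X))  # 2x2 matrix, zeros on the diagonal

The intermediate (n, m, d) array trades memory for speed; for very large inputs you may prefer to loop over chunks of rows instead.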

Answered By: jrennie

There is a new(ish) library called dit which has JSD implemented, as well as mutual information and many other distance metrics:

import dit

# Two distributions over the same outcomes A, B, C
foo = dit.Distribution(['A', 'B', 'C'], [0.5, 0.5, 0.0])
bar = dit.Distribution(['A', 'B', 'C'], [0.1, 0.0, 0.9])

dit.divergences.jensen_shannon_divergence([foo, bar])
# 0.80499327350549388

The docs could use a bit of work, but it looks promising.

http://docs.dit.io/en/latest/generalinfo.html#quickstart
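Since the question asks for a pairwise matrix, here is a minimal sketch that loops the call above over all row pairs (pairwise_jsd and the outcome labels are assumptions of mine; each row must be a pmf over the same outcomes, and the plain Python loop will be slower than a vectorized approach):

import numpy as np
import dit

def pairwise_jsd(X, outcomes):
    """JS divergence between every pair of rows of X, each row a pmf over `outcomes`."""
    X = np.asarray(X, dtype=float)
    dists = [dit.Distribution(outcomes, row) for row in X]
    n = len(dists)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = dit.divergences.jensen_shannon_divergence([dists[i], dists[j]])
            D[i, j] = D[j, i] = d  # JSD is symmetric; diagonal stays 0
    return D

X = [[0.1, 0.9], [0.8, 0.2]]
print(pairwise_jsd(X, ['A', 'B']))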

Answered By: hurfdurf