List of pairwise distances from full pairwise distance matrix

Question:

All the answers I have found deal with the opposite problem: generating a pairwise distance matrix from a list of pairwise distances.
Suppose I have a full pairwise distance matrix in this format:


    1       2       3       4       5
1   0.000   1.154   1.235   1.297   0.960   
2   1.154   0.000   0.932   0.929   0.988
3   1.235   0.932   0.000   0.727   1.244
4   1.297   0.929   0.727   0.000   1.019
5   0.960   0.988   1.244   1.019   0.000

And in the end I need a pairwise distance list like:

1   2   1.154
1   3   1.235
1   4   1.297
1   5   0.960
2   3   0.932
... ... ... 

How would I proceed? I haven't yet found a function to "slice" the dataframe above the diagonal of 0s. It would also be great if you could give me a hint about how well this scales with matrix size, because the original distance matrix is about 100,000×100,000 items.

Many, many thanks in advance!

I figured out that I can use the pandas DataFrame.stack() function to get the list. But in that case I get every distance twice, because the matrix is symmetric, and I also get the distances of 0 between identical items.

Asked By: Philipp O.

||

Answers:

Use numpy to extract the values above the diagonal:

>>> import numpy as np
>>> df.values[np.triu_indices(df.shape[0], k=1)]
array([1.154, 1.235, 1.297, 0.96 , 0.932, 0.929, 0.988, 0.727, 1.244,
       1.019])
Answered By: Corralien
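
On the scaling question: the number of pairs above the diagonal is n(n-1)/2, so both the work and the output grow quadratically with the matrix size. Below is a rough back-of-the-envelope sketch, assuming 8-byte float values and 8-byte integer indices (typical on 64-bit platforms, but an assumption here). pair_list_bytes is a hypothetical helper, not a numpy function.

```python
import numpy as np


def pair_list_bytes(n, value_bytes=8, index_bytes=8):
    """Estimate the size of the pair list for an n x n distance matrix.

    Hypothetical helper; the byte sizes are assumptions (float64 / int64).
    """
    n_pairs = n * (n - 1) // 2          # entries strictly above the diagonal
    return n_pairs, n_pairs * (value_bytes + 2 * index_bytes)


# Sanity check against numpy on the 5 x 5 example: 10 pairs.
rows, cols = np.triu_indices(5, k=1)
assert len(rows) == pair_list_bytes(5)[0] == 10

n_pairs, total_bytes = pair_list_bytes(100_000)
print(n_pairs)                           # 4999950000
print(f"{total_bytes / 1e9:.1f} GB")     # 120.0 GB (values plus both index arrays)
```

So for 100,000 items the values alone come to roughly 40 GB, on top of the roughly 80 GB that a dense float64 matrix of that size already occupies. Building the whole list in memory in one go is therefore unlikely to work on most machines.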

Use numpy's triu_indices_from:

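import numpy as np
import pandas as pd

a = df.to_numpy()
# Row and column positions of every entry strictly above the diagonal
idx, col = np.triu_indices_from(a, k=1)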

out = pd.DataFrame({'index': df.index[idx],
                    'columns': df.columns[col],
                    'value': a[idx, col]})

Output:

   index columns  value
0      1       2  1.154
1      1       3  1.235
2      1       4  1.297
3      1       5  0.960
4      2       3  0.932
5      2       4  0.929
6      2       5  0.988
7      3       4  0.727
8      3       5  1.244
9      4       5  1.019
Answered By: mozway
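
For a very large matrix, one option is to build the pair list a block of rows at a time instead of all at once, so that only a small slice of the index arrays exists at any moment. The following is a minimal sketch under that assumption, not a tuned implementation. iter_pair_chunks and block_rows are hypothetical names introduced here, and the example frame simply reproduces the 5×5 matrix from the question.

```python
import numpy as np
import pandas as pd


def iter_pair_chunks(df, block_rows=1000):
    """Yield the pairs above the diagonal of a square DataFrame in chunks.

    Hypothetical helper for illustration; block_rows controls chunk size.
    """
    a = df.to_numpy()
    n = a.shape[0]
    cols = np.arange(n)
    for start in range(0, n, block_rows):
        rows = np.arange(start, min(start + block_rows, n))
        # True where the column position is strictly right of the diagonal
        mask = cols[None, :] > rows[:, None]
        r_local, c = np.nonzero(mask)
        if r_local.size == 0:      # the last row has no pairs of its own
            continue
        r = rows[r_local]
        yield pd.DataFrame({'index': df.index[r],
                            'columns': df.columns[c],
                            'value': a[r, c]})


labels = [1, 2, 3, 4, 5]
df = pd.DataFrame([[0.000, 1.154, 1.235, 1.297, 0.960],
                   [1.154, 0.000, 0.932, 0.929, 0.988],
                   [1.235, 0.932, 0.000, 0.727, 1.244],
                   [1.297, 0.929, 0.727, 0.000, 1.019],
                   [0.960, 0.988, 1.244, 1.019, 0.000]],
                  index=labels, columns=labels)

# Small block size just to exercise the chunking on the example.
out = pd.concat(iter_pair_chunks(df, block_rows=2), ignore_index=True)
```

In practice each chunk could be appended to a file (for example with DataFrame.to_csv and mode='a') rather than concatenated. Concatenating everything, as done here for the small example, brings back the full memory cost.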