Melt the Upper Triangular Matrix of a Pandas Dataframe

Question:

Given a square pandas DataFrame of the following form:

``````   a  b  c
a  1 .5 .3
b .5  1 .4
c .3 .4  1
``````

How can the upper triangle be melted to get a matrix of the following form

`````` Row     Column    Value
a        a       1
a        b       .5
a        c       .3
b        b       1
b        c       .4
c        c       1

#Note the combination a,b is only listed once.  There is no b,a listing
``````

I’m more interested in an idiomatic pandas solution, a custom indexer would be easy enough to write by hand…

First I convert lower values of `df` to `NaN` by `where` and `numpy.triu` and then `stack`, `reset_index` and set column names:

``````import numpy as np

print df
a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

print np.triu(np.ones(df.shape)).astype(np.bool)
[[ True  True  True]
[False  True  True]
[False False  True]]

df = df.where(np.triu(np.ones(df.shape)).astype(np.bool))
print df
a    b    c
a   1  0.5  0.3
b NaN  1.0  0.4
c NaN  NaN  1.0

df = df.stack().reset_index()
df.columns = ['Row','Column','Value']
print df

Row Column  Value
0   a      a    1.0
1   a      b    0.5
2   a      c    0.3
3   b      b    1.0
4   b      c    0.4
5   c      c    1.0
``````

Building from solution by @jezrael, boolean indexing would be a more explicit approach:

``````import numpy
from pandas import DataFrame

df = DataFrame({'a':[1,.5,.3],'b':[.5,1,.4],'c':[.3,.4,1]},index=list('abc'))
print df,'n'
keep = np.triu(np.ones(df.shape)).astype('bool').reshape(df.size)
print df.stack()[keep]
``````

output:

``````     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

a  a    1.0
b    0.5
c    0.3
b  b    1.0
c    0.4
c  c    1.0
dtype: float64
``````

Also buildin on solution by @jezrael, here’s a version adding a function to do the inverse operation (from xy to matrix), usefull in my case to work with covariance / correlation matrices.

``````def matrix_to_xy(df, columns=None, reset_index=False):
bool_index = np.triu(np.ones(df.shape)).astype(bool)
xy = (
df.where(bool_index).stack().reset_index()
if reset_index
else df.where(bool_index).stack()
)
if reset_index:
xy.columns = columns or ["row", "col", "val"]
return xy

def xy_to_matrix(xy):
df = xy.pivot(*xy.columns).fillna(0)
df_vals = df.to_numpy()
df = pd.DataFrame(
np.triu(df_vals, 1) + df_vals.T, index=df.index, columns=df.index
)
return df
df = pd.DataFrame(
{"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
index=list("abc"),
)
print(df)
xy = matrix_to_xy(df, reset_index=True)
print(xy)
mx = xy_to_matrix(xy)
print(mx)
``````

output:

``````     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

row col  val
0   a   a  1.0
1   a   b  0.5
2   a   c  0.3
3   b   b  1.0
4   b   c  0.4
5   c   c  1.0

row    a    b    c
row
a    1.0  0.5  0.3
b    0.5  1.0  0.4
c    0.3  0.4  1.0
``````
Categories: questions Tags: , , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.