# Pandas percentage change matrix

## Question:

I have a data frame:

| | product | cost |
|---|---|---|
| 0 | product a | 56 |
| 1 | product b | 59 |
| 2 | product c | 104 |

I’d like to make a percentage change matrix like:

| | product a | product b | product c |
|---|---|---|---|
| product a | | -5.08% | -46.15% |
| product b | 5.36% | | -43.30% |
| product c | 85.71% | 76.27% | |

There could be any number of products.

1. How do I do this using pandas?

2. How do I get the highest / lowest percentage change products? i.e. Highest: product a vs. product c. Lowest: product c vs. product a.

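``````
import numpy as np
import pandas as pd

# convert columns to arrays
idx = df['product'].to_numpy()
cost = df['cost'].to_numpy()

# compute the percentage change using broadcasting:
# entry [i, j] is (cost[i] - cost[j]) / cost[j] * 100
pct = ((cost[:, None] - cost) / cost * 100).round(2)

# optional, set NaNs in the diagonal
# (done on the array itself, since out.values may be a copy or read-only)
np.fill_diagonal(pct, np.nan)

# convert to DataFrame
out = pd.DataFrame(pct, index=idx, columns=idx)

print(out)
``````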

Output:

``````
           product a  product b  product c
product a        NaN      -5.08     -46.15
product b       5.36        NaN     -43.27
product c      85.71      76.27        NaN
``````
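To make the broadcasting step above easier to follow, here is a minimal sketch that splits it into its parts and spot-checks one cell against the scalar formula. The intermediate names `diff` and `pct` are only illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"product": ["product a", "product b", "product c"],
                   "cost": [56, 59, 104]})

cost = df["cost"].to_numpy(dtype=float)

# cost[:, None] has shape (n, 1) and cost has shape (n,), so the
# subtraction broadcasts to an (n, n) array: diff[i, j] = cost[i] - cost[j].
diff = cost[:, None] - cost

# Dividing by cost (shape (n,)) divides every entry in column j by cost[j].
pct = diff / cost * 100

labels = df["product"].to_numpy()
out = pd.DataFrame(pct, index=labels, columns=labels)

# Spot-check one cell against the scalar formula.
expected = (56 - 59) / 59 * 100
assert np.isclose(out.loc["product a", "product b"], expected)
print(out.round(2))
```

Because nothing here depends on the number of rows, the same lines should work unchanged for any number of products.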

## question 1

Here is a short way to do the math:

``````
import pandas as pd
import numpy as np

df = pd.DataFrame([
    ["product a", 56],
    ["product b", 59],
    ["product c", 104]
], columns=["product", "cost"])

# every row of m holds the full cost vector, so m[i, j] == cost[j]
n = len(df)
m = pd.DataFrame(
    data=np.array(df.cost) * np.ones((n, n)),
    index=df["product"],
    columns=df["product"],
)
m.index.name = None
m.columns.name = None
m = (m.T - m) / m  # this is where the actual calculation happens
m
``````

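The result is the matrix of fractional changes (multiply by 100 to get percentages), with zeros on the diagonal.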

## question 2

``````
# row label of the largest change in each column
# (the diagonal is set to -inf so that a product is never matched with itself)
(m + np.diag(np.full(len(df), -np.inf))).idxmax(axis=0)
``````

``````
# row label of the smallest change in each column
# (the diagonal is set to +inf for the same reason)
(m + np.diag(np.full(len(df), np.inf))).idxmin(axis=0)
``````

### edit

The OP asks for the single highest / lowest value in the matrix `m`:

``````
# (row, column) labels of the largest value
(m + np.diag(np.full(len(df), -np.inf))).stack().idxmax()

# (row, column) labels of the smallest value
(m + np.diag(np.full(len(df), +np.inf))).stack().idxmin()
``````
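As an alternative to adding ±inf to the diagonal, the same lookup can be sketched in plain NumPy by masking the diagonal with NaN and using the NaN-aware argmax/argmin. This is a sketch, not part of the answer above; the names `vals`, `highest` and `lowest` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"product": ["product a", "product b", "product c"],
                   "cost": [56, 59, 104]})

cost = df["cost"].to_numpy(dtype=float)
labels = df["product"].to_list()

# Fractional change matrix in the same orientation as m:
# vals[i, j] = (cost[i] - cost[j]) / cost[j]
vals = (cost[:, None] - cost) / cost

# Hide the diagonal so a product is never compared with itself.
np.fill_diagonal(vals, np.nan)

# nanargmax / nanargmin return a flat position; unravel_index turns it
# back into (row, column) positions.
hi_row, hi_col = np.unravel_index(np.nanargmax(vals), vals.shape)
lo_row, lo_col = np.unravel_index(np.nanargmin(vals), vals.shape)

highest = (labels[hi_row], labels[hi_col])
lowest = (labels[lo_row], labels[lo_col])
print("highest:", highest)
print("lowest:", lowest)
```

Each tuple is read as (row product, column product), matching the layout of `m`.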

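Another possible solution uses SciPy's spatial distance functions with a custom function (`perc_change`) to calculate the percentages. Matrices `mat1` and `mat2` supply, respectively, the values below and above the main diagonal of the final dataframe. Note that the result is the transpose of the matrix requested in the question: each cell holds the change from the row product to the column product.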

``````
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def perc_change(u, v):
    return (v - u) / u * 100

# pdist evaluates each pair (i < j) only once, so two passes are needed
mat1 = squareform(pdist(df[['cost']].values, lambda u, v: perc_change(v[0], u[0])))
mat2 = squareform(pdist(df[['cost']].values, lambda u, v: perc_change(u[0], v[0])))

# lower triangle from mat1, upper triangle from mat2
mat = np.tril(mat1) + np.triu(mat2)

pd.DataFrame(mat, columns=df['product'].to_list(), index=df['product'].to_list())
``````

Output:

``````
           product a  product b  product c
product a   0.000000   5.357143  85.714286
product b  -5.084746   0.000000  76.271186
product c -46.153846 -43.269231   0.000000
``````
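The output above has each cell computed relative to the row product, whereas the matrix in the question is relative to the column product. A minimal sketch of how the two layouts relate, using broadcasting instead of `pdist` (the names `row_to_col` and `question_layout` are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"product": ["product a", "product b", "product c"],
                   "cost": [56, 59, 104]})

cost = df["cost"].to_numpy(dtype=float)
labels = df["product"].to_list()

# Orientation of the pdist-based output:
# entry [i, j] = (cost[j] - cost[i]) / cost[i] * 100
row_to_col = pd.DataFrame((cost - cost[:, None]) / cost[:, None] * 100,
                          index=labels, columns=labels)

# Orientation used in the question:
# entry [i, j] = (cost[i] - cost[j]) / cost[j] * 100
question_layout = row_to_col.T

print(question_layout.round(2))
```

So if the question's layout is needed, transposing the final dataframe should be enough.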

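Here is a way using `dot` and `np.diag`:

``````
df = df.set_index('product')
# outer product: df2[i, j] == cost[i] * cost[j]
df2 = df.dot(df.T)
# cost[i]**2 / (cost[i] * cost[j]) - 1 == (cost[i] - cost[j]) / cost[j]
df2 = df2.rdiv(np.diag(df2.to_numpy()), axis=0).sub(1)
``````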

Output:

``````
product    product a  product b  product c
product
product a   0.000000  -0.050847  -0.461538
product b   0.053571   0.000000  -0.432692
product c   0.857143   0.762712   0.000000
``````
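Several of the answers return fractions rather than the percentage strings shown in the question. One possible way to get that presentation, sketched here under the assumption that an empty diagonal is wanted (the helper `fmt` and the names `frac_df` and `formatted` are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"product": ["product a", "product b", "product c"],
                   "cost": [56, 59, 104]})

cost = df["cost"].to_numpy(dtype=float)
labels = df["product"].to_list()

# Fractional changes: frac[i, j] = (cost[i] - cost[j]) / cost[j]
frac = (cost[:, None] - cost) / cost
np.fill_diagonal(frac, np.nan)  # blank out self-comparisons
frac_df = pd.DataFrame(frac, index=labels, columns=labels)

def fmt(x):
    """Format a fraction as a percentage string; NaN becomes an empty cell."""
    return "" if pd.isna(x) else f"{x:.2%}"

# Apply the formatter column by column.
formatted = frac_df.apply(lambda col: col.map(fmt))
print(formatted)
```

The formatted frame holds strings, so it is only suitable for display; keep the numeric frame for any further calculation such as finding the highest or lowest change.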