Pandas percentage change matrix

Question:

I have a data frame:

     product  cost
0  product a    56
1  product b    59
2  product c   104

I’d like to make a percentage change matrix like:

           product a  product b  product c
product a                -5.08%    -46.15%
product b      5.36%               -43.27%
product c     85.71%     76.27%

There could be n number of products.

  1. How do I do this using pandas?

  2. How do I get the highest / lowest percentage change products? i.e. Highest: product a vs. product c. Lowest: product c vs. product a.

Thank you for your help.

Asked By: Hal

Answers:

Use broadcasting:

import numpy as np
import pandas as pd

# convert columns to arrays
idx = df['product'].to_numpy()
cost = df['cost'].to_numpy()

# compute the percentage change using broadcasting
pct = ((cost[:, None] - cost) / cost * 100).round(2)

# optional, set NaNs in the diagonal
# (done on the array, since DataFrame.values may be read-only)
np.fill_diagonal(pct, np.nan)

# convert to DataFrame
out = pd.DataFrame(pct, index=idx, columns=idx)

print(out)

Output:

           product a  product b  product c
product a        NaN      -5.08     -46.15
product b       5.36        NaN     -43.27
product c      85.71      76.27        NaN
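
To see why the broadcast expression works, it may help to compare it with an explicit double loop. The sketch below is not part of the answer above; it hard-codes the sample costs from the question, and pct_change_loop is a hypothetical helper introduced only for illustration.

```python
import numpy as np

cost = np.array([56, 59, 104], dtype=float)  # sample costs from the question


def pct_change_loop(values):
    """Hypothetical reference version: entry [i, j] is the percentage
    change of values[i] relative to values[j]."""
    n = len(values)
    result = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            result[i, j] = (values[i] - values[j]) / values[j] * 100
    return result


# cost[:, None] has shape (n, 1) and cost has shape (n,), so the
# subtraction broadcasts to an (n, n) array of pairwise differences.
# Dividing by cost then divides every entry in column j by cost[j].
broadcast = (cost[:, None] - cost) / cost * 100

print(np.allclose(broadcast, pct_change_loop(cost)))
print(broadcast.round(2))
```

The loop is O(n²) in Python, while the broadcast version does the same arithmetic inside NumPy, which is likely to matter once n grows beyond a handful of products.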
Answered By: mozway

question 1

Here is a short way to do the math

import pandas as pd
import numpy as np

df = pd.DataFrame([
    ["product a", 56],
    ["product b", 59],
    ["product c", 104]
], columns=["product", "cost"])

m = pd.DataFrame(
    data=np.array(df.cost) * np.ones((len(df), len(df))),  # every row holds all costs; works for any n
    index=df["product"],
    columns=df["product"],
)
m.index.name = None
m.columns.name = None
m = (m.T - m) / m  # the actual calculation: entry [i, j] = (cost_i - cost_j) / cost_j
m

The result is the matrix of changes as fractions (multiply by 100 for percentages).
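
If the matrix should look like the one in the question, with percent signs and an empty diagonal, one possible approach is to format the fractions as strings. This is a minimal sketch rather than part of the answer; it rebuilds m from the sample data, and the name formatted is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product": ["product a", "product b", "product c"],
    "cost": [56, 59, 104],
})

names = df["product"].to_list()
cost = df["cost"].to_numpy(dtype=float)

# same fractions as m above: entry [i, j] = (cost_i - cost_j) / cost_j
m = pd.DataFrame((cost[:, None] - cost) / cost, index=names, columns=names)

# format every entry as a percentage string with two decimals
formatted = m.apply(lambda col: col.map("{:.2%}".format))

# blank out the diagonal, where a product is compared with itself
for name in names:
    formatted.loc[name, name] = ""

print(formatted)
```

The formatted frame holds strings, so it is only suitable for display; keep the numeric m for any further calculation.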

question 2

# for each column, the product with the largest change
# (the diagonal is set to -inf so that a product is never compared with itself)
(m + np.diag(np.full(len(df), -np.inf))).idxmax(axis=0)

# for each column, the product with the smallest change
# (here the diagonal is set to +inf for the same reason)
(m + np.diag(np.full(len(df), np.inf))).idxmin(axis=0)


edit

The OP asks for the single highest / lowest value in the matrix m:

# index of largest value
(m + np.diag(np.full(len(df), -np.inf))).stack().idxmax()

# index of smallest value
(m + np.diag(np.full(len(df), +np.inf))).stack().idxmin()
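
For reference, the expressions above can be run end to end on the sample data. The following sketch rebuilds m with broadcasting instead of np.ones, and the variable names (no_self_max, overall_max and so on) are assumptions chosen for readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product": ["product a", "product b", "product c"],
    "cost": [56, 59, 104],
})

names = df["product"].to_list()
cost = df["cost"].to_numpy(dtype=float)
m = pd.DataFrame((cost[:, None] - cost) / cost, index=names, columns=names)

n = len(df)
# push the diagonal out of the way so self-comparisons never win
no_self_max = m + np.diag(np.full(n, -np.inf))
no_self_min = m + np.diag(np.full(n, np.inf))

# one answer per column: which row has the largest / smallest change
per_column_max = no_self_max.idxmax(axis=0)
per_column_min = no_self_min.idxmin(axis=0)

# one answer for the whole matrix, as a (row label, column label) tuple
overall_max = no_self_max.stack().idxmax()
overall_min = no_self_min.stack().idxmin()

print(per_column_max)
print(per_column_min)
print(overall_max, overall_min)
```

On this data the overall maximum is the pair (product c, product a), a change of about 85.71%, and the overall minimum is (product a, product c), about -46.15%.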

Answered By: Klops

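Another possible solution uses SciPy's spatial distance functions with a custom function (perc_change) to calculate the percentages. The matrices mat1 and mat2 supply, respectively, the values below and above the main diagonal of the final dataframe. Note that the result is the transpose of the matrix requested in the question; use mat.T to match that orientation.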

import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def perc_change(u, v):
    return (v - u) / u * 100

mat1 = squareform(pdist(df[['cost']].values, lambda u, v: perc_change(v[0], u[0])))
mat2 = squareform(pdist(df[['cost']].values, lambda u, v: perc_change(u[0], v[0])))

mat = np.tril(mat1) + np.triu(mat2)

pd.DataFrame(mat, columns=df['product'].to_list(), index=df['product'].to_list())

Output:

           product a  product b  product c
product a   0.000000   5.357143  85.714286
product b  -5.084746   0.000000  76.271186
product c -46.153846 -43.269231   0.000000
Answered By: PaulS

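Here is a way using dot and np.diag:

import numpy as np
df = df.set_index('product')
df2 = df.dot(df.T)  # outer product: df2[i, j] = cost_i * cost_j
# diagonal (cost_i ** 2) divided by row i gives cost_i / cost_j; then subtract 1
df2 = df2.rdiv(np.diag(df2.to_numpy()), axis=0).sub(1)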

Output:

product    product a  product b  product c
product                                   
product a   0.000000  -0.050847  -0.461538
product b   0.053571   0.000000  -0.432692
product c   0.857143   0.762712   0.000000
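
The reason this works is that cost_i² / (cost_i · cost_j) - 1 simplifies to (cost_i - cost_j) / cost_j. A small NumPy check of that identity, using the sample costs from the question and variable names chosen here for illustration, might look like this:

```python
import numpy as np

cost = np.array([56, 59, 104], dtype=float)  # sample costs from the question

outer = np.outer(cost, cost)          # outer[i, j] = cost[i] * cost[j]
squares = np.diag(outer)              # squares[i] = cost[i] ** 2
ratio = squares[:, None] / outer - 1  # cost[i] / cost[j] - 1

# direct definition of the change of cost[i] relative to cost[j]
direct = (cost[:, None] - cost) / cost

print(np.allclose(ratio, direct))
```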
Answered By: rhug123