Pairwise row difference – Python

Question:

I would like to create a matrix showing the deviation between alternatives by pairwise comparison of rows. This is the head of my DataFrame, which has 1660 rows and 8 columns (A: alternatives, C: criteria).

     C1     C2     C3
A1  1996    36     0.18
A2  1373    15     0.53
A3  1472    19     0.29

First I want to create two new columns: the first holds the names of the compared alternatives, and the second holds the difference between them. For each criterion, the result should look like this:

         C1   C2   C3
A1-A2   623  ...  ...
A1-A3   524  ...  ...
A2-A1  -623  ...  ...
A2-A3   -99  ...  ...
A3-A1  -524  ...  ...
A3-A2    99  ...  ...

I'm using Python with pandas DataFrames.
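The layout described above, one column naming each pair plus the differences for each criterion, can be sketched with a cross join. This is an editor's illustration, not taken from either answer below; it assumes pandas 1.2 or later (for `merge(how="cross")`) and uses only the three sample rows shown above. The names `left`, `pairs` and `result` are arbitrary.

```python
import pandas as pd

# Sample data from the question (only the first three alternatives)
df = pd.DataFrame(
    {"C1": [1996, 1373, 1472], "C2": [36, 15, 19], "C3": [0.18, 0.53, 0.29]},
    index=["A1", "A2", "A3"],
)

# Turn the alternative names into a regular column called "a"
left = df.rename_axis("a").reset_index()

# Cross join the alternatives with themselves (pandas >= 1.2);
# overlapping column names get the suffixes _x (first) and _y (second)
pairs = left.merge(left, how="cross", suffixes=("_x", "_y"))

# Drop the pairs that compare an alternative with itself
pairs = pairs[pairs["a_x"] != pairs["a_y"]]

# One row per pair: the pair name plus the difference for each criterion
result = pd.DataFrame({"pair": pairs["a_x"] + "-" + pairs["a_y"]})
for col in df.columns:
    result[col] = pairs[f"{col}_x"] - pairs[f"{col}_y"]
result = result.reset_index(drop=True)
print(result)
```

A cross join lists every left row against every right row in order, so the pairs come out as A1-A2, A1-A3, A2-A1 and so on without any sorting step.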

Asked By: Kostas Pl


Answers:

Here's one way to compute the pairwise row differences:

# Necessary imports
import pandas as pd

# Define some sample data
df = pd.DataFrame(
    [
        [1996, 36, 0.18],
        [1373, 15, 0.53],
        [1472, 19, 0.29],
    ],
    index=["A1", "A2", "A3"],
    columns=["C1", "C2", "C3"],
)

# Build every ordered pair of distinct index labels, for example A1-A2, A1-A3, ...
combs = df.index.to_series().apply(lambda i: i + "-" + df.index.to_series().drop(i))
combs = combs.melt(value_vars=combs.columns).dropna()['value']
print(combs)
# Prints
#
# 0    A1-A2
# 2    A3-A2
# 3    A1-A3
# 4    A2-A3
# 7    A2-A1
# 8    A3-A1
# Name: value, dtype: object

# Create a new dataframe with each pairwise row difference
row_diffs = pd.DataFrame(
    [
        [
            df[col][comb.split("-")[0]] - df[col][comb.split("-")[1]]
            for col in df.columns
        ]
        for comb in combs.values
    ],
    columns=df.columns,
    index=combs,
).sort_index()
print(row_diffs)
# Prints
#
#         C1  C2    C3
# value               
# A1-A2  623  21 -0.35
# A1-A3  524  17 -0.11
# A2-A1 -623 -21  0.35
# A2-A3  -99  -4  0.24
# A3-A1 -524 -17  0.11
# A3-A2   99   4 -0.24
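The label-building step above (`apply` followed by `melt`) is fairly indirect, and the row numbers it prints depend on the column order pandas chooses. As a hedged alternative, not part of the original answer, `itertools.permutations` from the standard library yields the same ordered pairs directly. Subtracting the two row selections after resetting their indexes keeps each column's own dtype, so C1 and C2 stay integers.

```python
from itertools import permutations

import pandas as pd

df = pd.DataFrame(
    [[1996, 36, 0.18], [1373, 15, 0.53], [1472, 19, 0.29]],
    index=["A1", "A2", "A3"],
    columns=["C1", "C2", "C3"],
)

# permutations(labels, 2) yields every ordered pair of distinct labels:
# (A1, A2), (A1, A3), (A2, A1), (A2, A3), (A3, A1), (A3, A2)
pairs = list(permutations(df.index, 2))
first = [a for a, _ in pairs]
second = [b for _, b in pairs]

# Select the rows for each side of the pairs, align them by position,
# and subtract the second row from the first
row_diffs = df.loc[first].reset_index(drop=True) - df.loc[second].reset_index(drop=True)
row_diffs.index = [f"{a}-{b}" for a, b in pairs]
print(row_diffs)
```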
Answered By: Ingwersen_erik

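You can take the Cartesian product of the DataFrame's index with itself using pd.MultiIndex.from_product, then filter out the pairs where both labels are the same. Finally, use the two levels of the MultiIndex to select rows from df and subtract them.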

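import pandas as pd

# Every ordered pair of index labels, minus the pairs that compare a row with itself
index = pd.MultiIndex.from_product([df.index, df.index])
index = index[index.get_level_values(0) != index.get_level_values(1)]
# Subtract the second row of each pair from the first and label the result "Ai-Aj"
out = pd.DataFrame(df.loc[index.get_level_values(0)].to_numpy() - df.loc[index.get_level_values(1)].to_numpy(),
                   index=index.map('{0[0]}-{0[1]}'.format),
                   columns=df.columns)
print(out)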

          C1    C2    C3
A1-A2  623.0  21.0 -0.35
A1-A3  524.0  17.0 -0.11
A2-A1 -623.0 -21.0  0.35
A2-A3  -99.0  -4.0  0.24
A3-A1 -524.0 -17.0  0.11
A3-A2   99.0   4.0 -0.24
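The same idea can be expressed with integer positions in NumPy instead of label lookups through `df.loc`. This is an editor's sketch, not the answerer's code: the off-diagonal cells of an n × n boolean matrix give every ordered pair (i, j) with i != j, and `values[i] - values[j]` subtracts the matching rows in one vectorized step. As in the answer above, `to_numpy()` turns the mixed int/float columns into a single float64 array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1996, 36, 0.18], [1373, 15, 0.53], [1472, 19, 0.29]],
    index=["A1", "A2", "A3"],
    columns=["C1", "C2", "C3"],
)

values = df.to_numpy()  # mixed int/float columns become one float64 array
n = len(df)

# Positions (i, j) of every off-diagonal cell, i.e. every ordered pair with i != j,
# returned in row-major order: (0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)
i, j = np.nonzero(~np.eye(n, dtype=bool))

out = pd.DataFrame(
    values[i] - values[j],  # row i minus row j for each pair
    index=[f"{df.index[a]}-{df.index[b]}" for a, b in zip(i, j)],
    columns=df.columns,
)
print(out)
```

Whichever method is used, the result grows quadratically: with the question's 1660 rows there are 1660 × 1659 = 2,753,940 pairs, which for 8 float64 columns is roughly 176 MB.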
Answered By: Ynjxsjmh