Pairwise row difference – Python
Question:
I would like to create a matrix showing deviation by pairwise comparison of columns. This is the head of my dataframe. My dataframe consists of 1660 rows and 8 columns (A:alternatives, C:criteria).
C1 C2 C3
A1 1996 36 0.18
A2 1373 15 0.53
A3 1472 19 0.29
First I want to create two new columns. The first one consists of the names of my alternatives and the second one the difference between the alternatives. The result should look like this for each criterion:
C1 C2 C3
A1-A2 623 .
A1-A3 524 .
A2-A1 -623
A2-A3 -99
A3-A1 -524
A3-A2 99
Im using Python languange and dataframes.
Answers:
Here’s one way to do your pairwise row difference:
# Necessary imports
import pandas as pd
# Define some sample data
df = pd.DataFrame(
[
[1996, 36, 0.18],
[1373, 15, 0.53],
[1472, 19, 0.29],
],
index=["A1", "A2", "A3"],
columns=["C1", "C2", "C3"],
)
# Find the different combinations of the indexes, for example A1-A2, A1-A3, ...
combs = df.index.to_series().apply(lambda i: i + "-" + df.index.to_series().drop(i))
combs = combs.melt(value_vars=combs.columns).dropna()['value']
print(combs)
# Prints
#
# 0 A1-A2
# 2 A3-A2
# 3 A1-A3
# 4 A2-A3
# 7 A2-A1
# 8 A3-A1
# Name: value, dtype: object
# Create a new dataframe with each pairwise row difference
row_diffs = pd.DataFrame(
[
[
df[col][comb.split("-")[0]] - df[col][comb.split("-")[1]]
for col in df.columns
]
for comb in combs.values
],
columns=df.columns,
index=combs,
).sort_index()
print(row_diffs)
# Prints
#
# C1 C2 C3
# value
# A1-A2 623 21 -0.35
# A1-A3 524 17 -0.11
# A2-A1 -623 -21 0.35
# A2-A3 -99 -4 0.24
# A3-A1 -524 -17 0.11
# A3-A2 99 4 -0.24
You can do a cartesian product of df index with pd.MultiIndex.from_product
then filter the same value. Then index the df with the MultiIndex.
index = pd.MultiIndex.from_product([df.index, df.index])
index = index[index.get_level_values(0) != index.get_level_values(1)]
out = pd.DataFrame(df.loc[index.get_level_values(0)].to_numpy() - df.loc[index.get_level_values(1)].to_numpy(),
index=index.map('{0[0]}-{0[1]}'.format),
columns=df.columns)
print(out)
C1 C2 C3
A1-A2 623.0 21.0 -0.35
A1-A3 524.0 17.0 -0.11
A2-A1 -623.0 -21.0 0.35
A2-A3 -99.0 -4.0 0.24
A3-A1 -524.0 -17.0 0.11
A3-A2 99.0 4.0 -0.24
I would like to create a matrix showing deviation by pairwise comparison of columns. This is the head of my dataframe. My dataframe consists of 1660 rows and 8 columns (A:alternatives, C:criteria).
C1 C2 C3
A1 1996 36 0.18
A2 1373 15 0.53
A3 1472 19 0.29
First I want to create two new columns. The first one consists of the names of my alternatives and the second one the difference between the alternatives. The result should look like this for each criterion:
C1 C2 C3
A1-A2 623 .
A1-A3 524 .
A2-A1 -623
A2-A3 -99
A3-A1 -524
A3-A2 99
Im using Python languange and dataframes.
Here’s one way to do your pairwise row difference:
# Necessary imports
import pandas as pd
# Define some sample data
df = pd.DataFrame(
[
[1996, 36, 0.18],
[1373, 15, 0.53],
[1472, 19, 0.29],
],
index=["A1", "A2", "A3"],
columns=["C1", "C2", "C3"],
)
# Find the different combinations of the indexes, for example A1-A2, A1-A3, ...
combs = df.index.to_series().apply(lambda i: i + "-" + df.index.to_series().drop(i))
combs = combs.melt(value_vars=combs.columns).dropna()['value']
print(combs)
# Prints
#
# 0 A1-A2
# 2 A3-A2
# 3 A1-A3
# 4 A2-A3
# 7 A2-A1
# 8 A3-A1
# Name: value, dtype: object
# Create a new dataframe with each pairwise row difference
row_diffs = pd.DataFrame(
[
[
df[col][comb.split("-")[0]] - df[col][comb.split("-")[1]]
for col in df.columns
]
for comb in combs.values
],
columns=df.columns,
index=combs,
).sort_index()
print(row_diffs)
# Prints
#
# C1 C2 C3
# value
# A1-A2 623 21 -0.35
# A1-A3 524 17 -0.11
# A2-A1 -623 -21 0.35
# A2-A3 -99 -4 0.24
# A3-A1 -524 -17 0.11
# A3-A2 99 4 -0.24
You can do a cartesian product of df index with pd.MultiIndex.from_product
then filter the same value. Then index the df with the MultiIndex.
index = pd.MultiIndex.from_product([df.index, df.index])
index = index[index.get_level_values(0) != index.get_level_values(1)]
out = pd.DataFrame(df.loc[index.get_level_values(0)].to_numpy() - df.loc[index.get_level_values(1)].to_numpy(),
index=index.map('{0[0]}-{0[1]}'.format),
columns=df.columns)
print(out)
C1 C2 C3
A1-A2 623.0 21.0 -0.35
A1-A3 524.0 17.0 -0.11
A2-A1 -623.0 -21.0 0.35
A2-A3 -99.0 -4.0 0.24
A3-A1 -524.0 -17.0 0.11
A3-A2 99.0 4.0 -0.24