How to append all possible pairings as a column in pandas

Question:

I have a dataframe like below:

Class  Value1   Value2
A      2        1
B      3        3
C      4        5

I want to generate all possible pairings, so that my output dataframe looks like this:

Class   Value1   Value2   A_Value1   A_Value2   B_Value1   B_Value2   C_Value1   C_Value2
A       2        1        2          1          3          3          4          5
B       3        3        2          1          3          3          4          5
C       4        5        2          1          3          3          4          5

Please assume there are nearly 1000 such classes. Is there an efficient way to do this? Ultimately, I want to find the difference between Value1 and Value2 across each pairing.

EDIT:
A_B_Value is computed with the formula:

A_B_Value = absolute(ClassA_value1 - ClassB_value1) + absolute(ClassA_value2 - ClassB_value2)

Class   Value1   Value2   A_B_Value   A_C_Value   B_C_Value
A       2        1        3           6           3
B       3        3        3           6           3
C       4        5        3           6           3

Thank you

Asked By: python_interest

||

Answers:

If you need to subtract the columns Value1 and Value2 and append the results as new columns, create a dictionary and add its entries with DataFrame.assign:

d = dict(zip(df['Class'].add('_diff'), 
             df['Value1'].sub(df['Value2'])))
print (d)
{'A_diff': 1, 'B_diff': 0, 'C_diff': -1}

df = df.assign(**d)
print (df)
  Class  Value1  Value2  A_diff  B_diff  C_diff
0     A       2       1       1       0      -1
1     B       3       3       1       0      -1
2     C       4       5       1       0      -1

EDIT: You can create all combinations with itertools.combinations, compute the differences in a dictionary comprehension, and finally create the new columns with DataFrame.assign:

from itertools import combinations

df1 = df.set_index('Class')
cols = list(combinations(df1.index,2))

d = {f'{a}_{b}_Value' : abs(df1.loc[a, 'Value1'] - df1.loc[b, 'Value1']) + 
                        abs(df1.loc[a, 'Value2'] - df1.loc[b, 'Value2']) for a, b in cols}
df = df.assign(**d)
print (df)
  Class  Value1  Value2  A_B_Value  A_C_Value  B_C_Value
0     A       2       1          3          6          3
1     B       3       3          3          6          3
2     C       4       5          3          6          3

EDIT1: Because performance is important, here is a vectorized solution inspired by this:

import numpy as np

#convert Class to index
df1 = df.set_index('Class')

#convert DataFrame to 2d array
v = df1.to_numpy()
#get indices of combinations
i, j = np.tril_indices(len(df1.index), -1)

#select values of the first column (Value1 is at position 0)
out1_1 = v[i, 0]
out1_2 = v[j, 0]

#select values of the second column (Value2 is at position 1)
out2_1 = v[i, 1]
out2_2 = v[j, 1]

#new column names from the combinations
cols = [f'{a}_{b}_Value' for a, b in zip(df1.index[j], df1.index[i])]

#new values in array
arr = np.abs(out1_1 - out1_2) + np.abs(out2_1 - out2_2)

#append the new columns
df = df.assign(**dict(zip(cols, arr)))
print (df)
  Class  Value1  Value2  A_B_Value  A_C_Value  B_C_Value
0     A       2       1          3          6          3
1     B       3       3          3          6          3
2     C       4       5          3          6          3
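
With nearly 1000 classes there are 1000 * 999 / 2 = 499,500 pairs, so appending one column per pair makes a very wide frame. If the pairwise values themselves are the goal, a square distance matrix may be easier to work with. The following is only a sketch of that alternative, using NumPy broadcasting rather than the column-per-pair layout asked for in the question; the names `classes`, `dist` and `pairwise` are made up for illustration.

```python
import numpy as np
import pandas as pd

# sample data from the question
df = pd.DataFrame({'Class': ['A', 'B', 'C'],
                   'Value1': [2, 3, 4],
                   'Value2': [1, 3, 5]})

classes = df['Class'].tolist()
v = df[['Value1', 'Value2']].to_numpy()

# v[:, None, :] has shape (n, 1, 2) and v[None, :, :] has shape (1, n, 2);
# broadcasting their difference gives an (n, n, 2) array, and summing the
# absolute differences over the last axis applies the formula from the question
dist = np.abs(v[:, None, :] - v[None, :, :]).sum(axis=2)

# label rows and columns by class so that pairwise.loc['A', 'B'] is A_B_Value
pairwise = pd.DataFrame(dist, index=classes, columns=classes)
print(pairwise)
```

The matrix is symmetric with zeros on the diagonal, so each pair appears twice. If SciPy is available, `scipy.spatial.distance.pdist` with `metric='cityblock'` should give the same distances in condensed form.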

Performance comparison:

import numpy as np
import pandas as pd
from itertools import combinations

#DataFrame with 100 rows
N = 100
df = pd.DataFrame({'Class':[f'val{i+1}' for i in np.arange(N)],
                   'Value1':np.random.randint(100, size=N),
                   'Value2':np.random.randint(100, size=N)})

# print (df)

%%timeit
df1 = df.set_index('Class')

v = df1.to_numpy()
i, j = np.tril_indices(len(df1.index), -1)

#first column Value1 is 0
v1_1 = v[i, 0]
v1_2 = v[j, 0]

#second column Value2 is 1
v2_1 = v[i, 1]
v2_2 = v[j, 1] 

cols = [f'{a}_{b}_Value' for a, b in zip(df1.index[j], df1.index[i])]
arr = np.abs(v1_1-v1_2) + np.abs(v2_1-v2_2)
 
df.assign(**dict(zip(cols, arr)))


303 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
tmp = df.set_index('Class')

cols = list(combinations(tmp.index,2))
idx1, idx2 = map(list, zip(*cols))

v1_1 = tmp.loc[idx1, 'Value1'].to_numpy()
v1_2 = tmp.loc[idx2, 'Value1'].to_numpy()
v2_1 = tmp.loc[idx1, 'Value2'].to_numpy()
v2_2 = tmp.loc[idx2, 'Value2'].to_numpy()

df[[f'{x1}_{x2}_Value' for x1, x2 in cols]
  ] = np.repeat((abs(v1_1-v1_2)+abs(v2_1-v2_2))[None], len(df), axis=0)
  
733 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
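
The `%%timeit` cells above only run in IPython or Jupyter. As a rough sketch, the same comparison can be reproduced in a plain script with the standard `timeit` module. The function names `with_tril` and `with_combinations`, the fixed seed and the repeat counts are assumptions made for this example, and the second function works on a copy so that every run starts from the same input. Timings depend on the machine and the pandas version, so none are shown here.

```python
import timeit
from itertools import combinations

import numpy as np
import pandas as pd

N = 100
rng = np.random.default_rng(0)  # fixed seed (an assumption, for repeatability)
df = pd.DataFrame({'Class': [f'val{i + 1}' for i in range(N)],
                   'Value1': rng.integers(100, size=N),
                   'Value2': rng.integers(100, size=N)})


def with_tril(df):
    # index pairs come from the lower triangle, so i > j
    df1 = df.set_index('Class')
    v = df1.to_numpy()
    i, j = np.tril_indices(len(df1.index), -1)
    cols = [f'{a}_{b}_Value' for a, b in zip(df1.index[j], df1.index[i])]
    arr = np.abs(v[i, 0] - v[j, 0]) + np.abs(v[i, 1] - v[j, 1])
    return df.assign(**dict(zip(cols, arr)))


def with_combinations(df):
    # label pairs come from itertools.combinations
    tmp = df.set_index('Class')
    pairs = list(combinations(tmp.index, 2))
    idx1, idx2 = map(list, zip(*pairs))
    d1 = np.abs(tmp.loc[idx1, 'Value1'].to_numpy() - tmp.loc[idx2, 'Value1'].to_numpy())
    d2 = np.abs(tmp.loc[idx1, 'Value2'].to_numpy() - tmp.loc[idx2, 'Value2'].to_numpy())
    out = df.copy()  # copy so the input frame is not modified between runs
    out[[f'{a}_{b}_Value' for a, b in pairs]] = np.repeat((d1 + d2)[None], len(df), axis=0)
    return out


# sanity check: both approaches should produce the same values per column name
a, b = with_tril(df), with_combinations(df)
new_cols = [c for c in a.columns if c.endswith('_Value')]
same = np.array_equal(a[new_cols].to_numpy(), b[new_cols].to_numpy())
print('same values:', same)

# best of three single runs for each approach
for func in (with_tril, with_combinations):
    best = min(timeit.repeat(lambda: func(df), number=1, repeat=3))
    print(f'{func.__name__}: {best * 1000:.1f} ms')
```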
Answered By: jezrael

You can stack and flatten the MultiIndex, then perform a cross merge (how='cross' requires pandas 1.2 or later):

s = df.set_index('Class').stack()
s.index = s.index.map('_'.join)
out = df.merge(s.to_frame().T, how='cross')

Output:

  Class  Value1  Value2  A_Value1  A_Value2  B_Value1  B_Value2  C_Value1  C_Value2
0     A       2       1         2         1         3         3         4         5
1     B       3       3         2         1         3         3         4         5
2     C       4       5         2         1         3         3         4         5
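
To make the intermediate steps easier to follow, here is the same approach as a self-contained sketch that builds the sample frame from the question. The name `wide` is introduced only for this illustration.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({'Class': ['A', 'B', 'C'],
                   'Value1': [2, 3, 4],
                   'Value2': [1, 3, 5]})

# stack: one value per (Class, column name) pair, held in a MultiIndex Series
s = df.set_index('Class').stack()

# flatten each (Class, column name) tuple into a single label such as 'A_Value1'
s.index = s.index.map('_'.join)

# turn the Series into a one-row frame whose columns are the flattened labels
wide = s.to_frame().T

# the cross merge pairs every row of df with that single row
out = df.merge(wide, how='cross')
print(out)
```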

Vectorized numerical computation of the pairwise values:

import numpy as np
from itertools import combinations

tmp = df.set_index('Class')

cols = list(combinations(tmp.index,2))
idx1, idx2 = map(list, zip(*cols))

v1_1 = tmp.loc[idx1, 'Value1'].to_numpy()
v1_2 = tmp.loc[idx2, 'Value1'].to_numpy()
v2_1 = tmp.loc[idx1, 'Value2'].to_numpy()
v2_2 = tmp.loc[idx2, 'Value2'].to_numpy()

df[[f'{x1}_{x2}_Value' for x1, x2 in cols]
  ] = np.repeat((abs(v1_1-v1_2)+abs(v2_1-v2_2))[None], len(df), axis=0)

print(df)

Output:

  Class  Value1  Value2  A_B_Value  A_C_Value  B_C_Value
0     A       2       1          3          6          3
1     B       3       3          3          6          3
2     C       4       5          3          6          3
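
The `np.repeat(...[None], len(df), axis=0)` step is what copies the single row of pairwise values onto every row of the frame. A small sketch of just that step, using the three values from the output above (the names `pair_values`, `row` and `tiled` are illustrative):

```python
import numpy as np

pair_values = np.array([3, 6, 3])   # A_B, A_C and B_C values from the example
n_rows = 3                          # stands in for len(df)

# indexing with None adds a leading axis: shape (3,) becomes (1, 3)
row = pair_values[None]

# repeat that single row n_rows times along axis 0: shape (3, 3)
tiled = np.repeat(row, n_rows, axis=0)
print(tiled.shape)
```

Every row of `tiled` holds the same values, which is why each pair column is constant in the output.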
Answered By: mozway