Market Basket Association Analysis python or SQL

Question

I am using a dataset as below. Rows show invoice numbers, columns show products. I want to show the number of products on the same invoice as a matrix (i.e. there will be products in both rows and columns, the intersection of the row and column will show how many times those 2 products are on the same invoice. How can I do that? Thanks.

Note: ‘1’ indicates that the product is included in that invoice, and ‘0’ indicates that it is not.
finally i want to get a matrix like in the picture. intersecting cells show the number of sales of the relevant product pair.

import pandas as pd
ids = ['invoice_1','invoice_2','invoice_3','invoice_4','invoice_5','invoice_6']
A= [0,0,1,0,1,1]
B= [0,1,1,0,1,1]
C= [1,1,1,0,1,0]
D= [1,0,0,1,1,0]
df=pd.DataFrame.from_dict({'A':A, 'B':B, 'C':C, 'D':D})
df.index=ids

Actually I want to get Table 2 from Table 1. AA=3 because product A is included in 3 invoices (rows) totally. AB=4 because A and B are included in 4 invoices (rows) together.
Note: Even if the AA, BB, CC , DD cells are not full, it does not matter. Binary products (like AB, DC etc.) are important to me.

Table 1
                A   B   C   D
    invoice_1   0   0   1   1
    invoice_2   0   1   1   0
    invoice_3   1   1   1   0
    invoice_4   0   0   0   1
    invoice_5   1   1   1   1
    invoice_6   1   1   0   0
    invoice_7   1   1   0   0
    
Table 2    
        A   B   C   D
    A   4   4   2   1
    B   4   4   3   1
    C   2   3   4   2
    D   1   1   2   3

Asked By: Bugra Varol

||

Source

Answer 1

The OP made two mistakes here. The first one is the input to generate the intended Table 1 should be:

import pandas as pd

ids = ['invoice_1', 'invoice_2', 'invoice_3', 'invoice_4', 'invoice_5', 'invoice_6', 'invoice_7']
A = [0, 0, 1, 0, 1, 1, 1]
B = [0, 1, 1, 0, 1, 1, 1]
C = [1, 1, 1, 0, 1, 0, 0]
D = [1, 0, 0, 1, 1, 0, 0]
df = pd.DataFrame(data={'A': A, 'B': B, 'C': C, 'D': D}, index=ids)

Table 1
                A   B   C   D
    invoice_1   0   0   1   1
    invoice_2   0   1   1   0
    invoice_3   1   1   1   0
    invoice_4   0   0   0   1
    invoice_5   1   1   1   1
    invoice_6   1   1   0   0
    invoice_7   1   1   0   0

Table 2 is actually the dot product of 2 matrices, i.e. matrix df and its transpose. One can use numpy.dot to produce it.

import numpy as np

pd.DataFrame(data=np.dot(df.T, df), index=df.columns, columns=df.columns)

Table 2    
        A   B   C   D
    A   4   4   2   1
    B   4   5   3   1
    C   2   3   4   2
    D   1   1   2   3

The second mistake I pointed out is BB where it should be 5 instead of 4.

Answered By: Anastasiya-Romanova 秀

Market Basket Association Analysis python or SQL

Question:

Answers: