Common words in columns of two different pandas DataFrames

Question:

A

x  disc
a  ‘tall’, ‘short’, ‘medium’
b  ‘small’, ‘long’, ‘short’

B

y
‘tall’, ‘short’
‘short’, ‘long’
‘small’, ‘tall’

Desired output:

x  disc                        tall short  short long
a  ‘tall’, ‘short’, ‘medium’   1           0
b  ‘small’, ‘long’, ‘short’    0           1
Asked By: user18325356

||

Answers:

Convert the values to sets and, for each row of B, add a new column to A that flags whether all of that row's words appear in disc:

for x in B['y']:
    s = set(x.split(', '))
    A[x] = [int(set(y.split(', ')) >= s) for y in A['disc']]

If necessary, remove the columns that contain only 0 by adding:

out = A.loc[:, A.ne(0).any()]
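
For reference, here is a self-contained sketch of the loop above together with the optional filtering step. It assumes that disc and y hold plain comma-separated strings without quote characters, and the sample frames are made up to mirror the question. The function name add_phrase_indicators and its keep_empty parameter are hypothetical, introduced only for this illustration. Unlike the snippet above, the flags are built in a separate frame so that the all-zero filter only inspects the new columns.

```python
import pandas as pd

# Sample data mirroring the question (assumed to be plain strings).
A = pd.DataFrame({
    "x": ["a", "b"],
    "disc": ["tall, short, medium", "small, long, short"],
})
B = pd.DataFrame({"y": ["tall, short", "short, long", "small, tall"]})


def add_phrase_indicators(A, B, keep_empty=True):
    """Hypothetical helper: one 0/1 column per row of B['y']."""
    flags = pd.DataFrame(index=A.index)
    # Split each disc string once, so the sets are reused for every phrase.
    word_sets = [set(d.split(", ")) for d in A["disc"]]
    for phrase in B["y"]:
        wanted = set(phrase.split(", "))
        # 1 if the row's words are a superset of the wanted words, else 0.
        flags[phrase] = [int(ws >= wanted) for ws in word_sets]
    if not keep_empty:
        # Keep only the columns that contain at least one non-zero value.
        flags = flags.loc[:, flags.ne(0).any()]
    return A.join(flags)


out = add_phrase_indicators(A, B, keep_empty=False)
print(out)
```

With keep_empty=True the 'small, tall' column is kept even though no row of A contains both words.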
Answered By: jezrael

You can use set comparison with numpy broadcasting (this assumes that disc and y hold lists of words, as in the output below):

out = A.join(pd.DataFrame((A['disc'].apply(set).to_numpy()[:,None]
                           >= B['y'].apply(set).to_numpy()).astype(int),
                          columns=B['y'].apply(' '.join), index=A.index)
             )

Output:

   x                   disc  tall short  short long  small tall
0  a  [tall, short, medium]           1           0           0
1  b   [small, long, short]           0           1           0

If you want only the matches:

tmp = pd.DataFrame((A['disc'].apply(set).to_numpy()[:,None]
                     >= B['y'].apply(set).to_numpy()),
                    columns=B['y'].apply(' '.join), index=A.index)
                   
out = A.join(tmp.loc[:, tmp.any()].astype(int))

Output:

   x                   disc  tall short  short long
0  a  [tall, short, medium]           1           0
1  b   [small, long, short]           0           1
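
If disc and y hold comma-separated strings rather than lists, one possible adaptation is to split them before building the sets. The following is a minimal sketch under that assumption, with sample data made up to mirror the question; here the column labels come out as the original strings of B['y'].

```python
import pandas as pd

# Sample data mirroring the question (assumed to be plain strings).
A = pd.DataFrame({
    "x": ["a", "b"],
    "disc": ["tall, short, medium", "small, long, short"],
})
B = pd.DataFrame({"y": ["tall, short", "short, long", "small, tall"]})

# Split the strings into word lists, then turn each list into a set.
a_sets = A["disc"].str.split(", ").apply(set).to_numpy()
b_sets = B["y"].str.split(", ").apply(set).to_numpy()

# Broadcast a (len(A), 1) column against a (len(B),) row: each cell is
# True when the set from A is a superset of the set from B.
matches = (a_sets[:, None] >= b_sets).astype(bool)

tmp = pd.DataFrame(matches, columns=B["y"].tolist(), index=A.index)

# Keep only the columns with at least one match and convert to 0/1.
out = A.join(tmp.loc[:, tmp.any()].astype(int))
print(out)
```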
Answered By: mozway