Check if string in df1 exists anywhere in df2 and return the matching column name(s) in df1

Question:

I’m trying to check for string matches between two dataframes and return the name of the column the match was found in.

I need to include all matches found in an exact match to be returned as a list in df1.

My Dataframes

df1 (Data to lookup is in the title column)

    title  counts
0    nike       3
1  adidas       2
2   rebok       2

df2 (Dataframe to search for matches in)

          0       1         2
0      nike  adidas     rebok
1      nike   rebok    hitech
2  converse    puma  converse
3    hitech  adidas      nike

Desired Output

    title  counts  matching_cols
0    nike       3  [0,0,2]
1  adidas       2  [1,1]
2   rebok       2  [1,2]

Minimum Reproducible Example

# make df1
data = [['nike', 3], ['adidas', 2], ['rebok', 2]]
df_counts = pd.DataFrame(data, columns=['title', 'counts'])

# make df2
lu_data = [{0: 'nike', 1: 'adidas', 2: 'rebok'},
        {0: 'nike', 1: 'rebok', 2: 'hitech'},
        {0: 'converse', 1: 'puma', 2: 'converse'},
        {0: 'hitech', 1: 'adidas', 2: 'nike'}
        ]

df_words = pd.DataFrame(lu_data)

What I’ve tried:
I’m a little bit stuck reading a lot of similar threads on stackoverflow / google. The issue I’m having is most solutions are looking to match the name of the column, whereas I’m not looking to return the name of a column when a string matches.

Asked By: Lee Roy

||

Answers:

Melt the dataframe then groupby and agg with list to create a mapping dict that maps title to list of column names, then use this dictionary to substitute values in title column of df_counts

d = df_words.melt().groupby('value')['variable'].agg(list)
df_counts['matching_cols'] = df_counts['title'].map(d)

Result

    title  counts matching_cols
0    nike       3     [0, 0, 2]
1  adidas       2        [1, 1]
2   rebok       2        [1, 2]
Answered By: Shubham Sharma
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.