How to merge DataFrames on semicolon-separated categories in Python?

Question:

I have two DataFrames: Product and Users.

A product can belong to multiple categories, separated by semicolons.

A user's interests can likewise span multiple categories, also separated by semicolons.

I need to find, for each user, all contentIds whose categories overlap with that user's interest categories.

I tried splitting the category columns of both DataFrames (Product and Users) and then using isin(), but I get this error:

users['intrestcategory'].str.split(";", n=1, expand=True)

A value is trying to be set on a copy of a slice from a DataFrame
ValueError: Wrong number of items passed 0, placement implies 1
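The messages above don't pin down which statement failed, so the following is only a sketch of one likely shape mismatch. It uses a hypothetical `users` frame rebuilt from the sample data below. With `expand=True`, `str.split` returns a DataFrame with one column per piece, which cannot be stored back into a single column; without it, each cell holds a Python list, which is the shape `explode` works with.

```python
import pandas as pd

# Hypothetical stand-in for the question's Users frame (values from the sample data)
users = pd.DataFrame({"userid": [2, 3, 4], "intrestcategories": ["12;2", "3", "15"]})

# expand=True returns a DataFrame with one column per split piece (padded with None),
# so it is not a single column that can replace 'intrestcategories'
pieces = users["intrestcategories"].str.split(";", n=1, expand=True)
print(pieces.shape)  # (3, 2)

# Without expand, each cell becomes a list and the result stays one column
users["intrestcategories"] = users["intrestcategories"].str.split(";")
print(users["intrestcategories"].tolist())  # [['12', '2'], ['3'], ['15']]
```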

Below are samples of the DataFrames:

  1. Product
contentId  Categories
1
2          12;2
3
4          2
5          3;15
6          15
7
8          20
9          20;2

  2. Users
userid  intrestcategories
2       12;2
3       3
4       15

  3. Final output
userid  contentId
2       4
2       2
2       9
3       5
4       5
4       6
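To make the matching rule explicit: a (userid, contentId) pair belongs in the output when the user's interest categories and the product's categories share at least one value. The plain-Python sketch below assumes that reading and treats the blank Categories cells for contentIds 1, 3 and 7 as empty; the names `products`, `users` and `to_set` are hypothetical.

```python
# Sample data from the question; contentIds 1, 3 and 7 are assumed to have no categories
products = {1: "", 2: "12;2", 3: "", 4: "2", 5: "3;15", 6: "15", 7: "", 8: "20", 9: "20;2"}
users = {2: "12;2", 3: "3", 4: "15"}

def to_set(cats):
    """Split a semicolon-separated string into a set, ignoring empty pieces."""
    return {c for c in cats.split(";") if c}

# Keep a (userid, contentId) pair whenever the two category sets overlap
pairs = sorted(
    (uid, cid)
    for uid, ucats in users.items()
    for cid, pcats in products.items()
    if to_set(ucats) & to_set(pcats)
)
print(pairs)
# [(2, 2), (2, 4), (2, 9), (3, 5), (4, 5), (4, 6)]
```

These are the same six pairs as the expected output, just sorted.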
Asked By: Mohd Waseem


Answers:

First, use explode (pandas >= 0.25.0) to turn the semicolon-separated categories in each cell into one row per category, then merge on the category columns and drop the duplicate (userid, contentId) pairs:

import pandas as pd
from numpy import nan

# Sample data (contentId is float because it contains NaN)
dfp = pd.DataFrame({'contentId': {0: nan, 1: 2.0, 2: nan, 3: 4.0, 4: 5.0, 5: 6.0, 6: nan, 7: 8.0, 8: 9.0}, 'Categories': {0: '1', 1: '12;2', 2: '3', 3: '2', 4: '3;15', 5: '15', 6: '7', 7: '20', 8: '20;2'}})
dfu = pd.DataFrame({'intrestcategories': {0: '12;2', 1: '3', 2: '15'}, 'userid': {0: 2, 1: 3, 2: 4}})

# Split each category string into a list, then give each category its own row
dfp['Categories'] = dfp['Categories'].str.split(';')
dfp = dfp.explode('Categories')

dfu['intrestcategories'] = dfu['intrestcategories'].str.split(';')
dfu = dfu.explode('intrestcategories')

# Drop rows with a missing contentId, join on category, keep unique (userid, contentId) pairs
dfp.dropna().merge(dfu, left_on='Categories', right_on='intrestcategories')[['userid', 'contentId']].drop_duplicates().astype(int)

Result:

    userid  contentId
0        2          2
2        2          4
3        2          9
4        3          5
5        4          5
6        4          6
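The sample frame above reads the question's table with contentId missing for some rows. If instead every row has a contentId and it is the Categories cell that is blank (contentIds 1, 3 and 7), the same explode-and-merge approach still works once the missing categories are dropped. A sketch under that assumption; the helper column name `cat` is hypothetical, and the final sort is only there to make the output order deterministic:

```python
import pandas as pd

# Assumed layout: every row has a contentId; contentIds 1, 3 and 7 have no categories
product = pd.DataFrame({
    "Categories": [None, "12;2", None, "2", "3;15", "15", None, "20", "20;2"],
    "contentId": range(1, 10),
})
users = pd.DataFrame({"userid": [2, 3, 4], "intrestcategories": ["12;2", "3", "15"]})

# One row per (item, category); missing Categories stay missing after split/explode
p = product.assign(cat=product["Categories"].str.split(";")).explode("cat")
u = users.assign(cat=users["intrestcategories"].str.split(";")).explode("cat")

result = (
    p.dropna(subset=["cat"])              # discard products without categories
     .merge(u, on="cat")                  # inner join on the shared category
     [["userid", "contentId"]]
     .drop_duplicates()                   # e.g. user 2 matches contentId 2 twice (12 and 2)
     .sort_values(["userid", "contentId"])
     .reset_index(drop=True)
)
print(result)
```

The result holds the same six (userid, contentId) pairs as the question's expected output. Because contentId never contains NaN here, no `astype(int)` is needed.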
Answered By: Stef