(vectorization) Loop through two dataframes cell by cell and find whether one is part of the other

Question:

I have one dataframe that contains color and material parameters, and another that contains data. I want to check, cell by cell, whether the data dataframe contains any of the values in the parameter dataframe.
I know that I should use vectorization, but I am not sure how.

import pandas as pd

parameter = pd.DataFrame({'color': ['red', 'blue', 'green'],
                          'material': ['wood', 'metal', 'plastic']})


data = pd.DataFrame({'name': ['my blue color', 'red chair', 'green rod'],
                     'description': ['it is a great color', 'made with wood', 'made with metal']})

I want to create new columns that contain the matched parameters. This is the output I need:

data['attribute'] = ['blue', 'red', 'green']
data['attribute2'] = ['', 'wood', 'metal']
print(data)
            name          description  attribute  attribute2
0  my blue color  it is a great color       blue
1      red chair       made with wood        red        wood
2      green rod      made with metal      green       metal
    
Asked By: Sha tha


Answers:

The following code checks each name against every color and each description against every material, and stores the matches in new columns (several matches in one cell are joined with commas).

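# For each name, keep every color that occurs in it as a substring
data['attribute'] = data['name'].apply(lambda name: ','.join([c for c in parameter['color'].tolist() if c in name]))
# Same idea for each description and the list of materials
data['attribute2'] = data['description'].apply(lambda desc: ','.join([m for m in parameter['material'].tolist() if m in desc]))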
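Since the question asks about vectorization, here is a sketch of the same idea using pandas' `.str` accessor instead of `apply` with a lambda. It builds one regex alternation per parameter column, e.g. `\b(?:red|blue|green)\b`. `alternation` is a hypothetical helper name. The `\b` word boundaries are my assumption about the desired matching: unlike the `in` test, they stop "red" from matching inside a longer word. Note that `.str` methods on object-dtype columns still loop in Python internally, so treat this as tidier code rather than a guaranteed speedup.

```python
import re

import pandas as pd

parameter = pd.DataFrame({'color': ['red', 'blue', 'green'],
                          'material': ['wood', 'metal', 'plastic']})
data = pd.DataFrame({'name': ['my blue color', 'red chair', 'green rod'],
                     'description': ['it is a great color', 'made with wood', 'made with metal']})


def alternation(words):
    """Hypothetical helper: one regex matching any of `words` as a whole word."""
    # re.escape guards against parameter values that contain regex metacharacters
    return r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'


# str.findall gives a list of matches per row; str.join turns each list into a
# comma-separated string ('' when the list is empty)
data['attribute'] = data['name'].str.findall(alternation(parameter['color'])).str.join(',')
data['attribute2'] = data['description'].str.findall(alternation(parameter['material'])).str.join(',')

print(data)
```

One behavioural difference: `findall` lists matches in the order they appear in the text and repeats a word that occurs twice. The list comprehension instead lists each parameter at most once, in parameter order.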

Output:

            name          description  attribute  attribute2
0  my blue color  it is a great color       blue
1      red chair       made with wood        red        wood
2      green rod      made with metal      green       metal
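A caveat about the `c in name` test in the code above: it checks for substrings, not words. The sketch below uses a made-up input, "a colored chair", to show the effect, and then swaps in a word-boundary regex (`re.search` with `\b`) inside the same list-comprehension style. Whether whole-word matching is what you want depends on your data.

```python
import re

colors = ['red', 'blue', 'green']
text = 'a colored chair'  # hypothetical example input

# Plain substring test, as in the answer: 'red' is found inside 'colored'
substring_hits = [c for c in colors if c in text]

# Whole-word test: \b requires a word boundary on both sides of the color
word_hits = [c for c in colors if re.search(r'\b' + re.escape(c) + r'\b', text)]

print(substring_hits)  # ['red']
print(word_hits)       # []
```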
Answered By: luangtatipsy