How to create a new column containing a list of similar strings?

Question

I've been trying to add a new column to a pandas dataframe that holds, as a list, all the strings similar to the one in its row.

This is the original pandas dataframe:

import pandas as pd

d = {'product_name': ['2 pack liner socks', '2 pack logo liner socks', 'b.bare Hipster', 'Lady BARE Hipster Panty'], 'id': [13, 12, 11, 10]}
df = pd.DataFrame(data=d)

I would like to get a dataframe that looks like this:

# product_name                  # id          # group
  2 pack liner socks             13           ['2 pack liner socks', '2 pack logo liner socks']
  2 pack logo liner socks        12           ['2 pack liner socks', '2 pack logo liner socks']
  b.bare Hipster                 11           ['b.bare Hipster', 'Lady BARE Hipster Panty']
  Lady BARE Hipster Panty        10           ['b.bare Hipster', 'Lady BARE Hipster Panty']

I tried the following:

import thefuzz
from thefuzz import process


df["group"] = df["product_name"].apply(lambda x: process.extractOne(x, df["product_name"], scorer=fuzz.partial_ratio)[0])

And it throws the next error:

NameError: name 'fuzz' is not defined

How can I fix this code? Alternatively, are there other approaches to solve this?

Asked By: AlSub

Answer 1

You need to import fuzz: from thefuzz import process, fuzz. However, calling process.extractOne with all the values in product_name as choices will always return the row's own value, because it is a 100% match. So filter that value out of the choices with df["product_name"].loc[df['product_name'] != x]:

from thefuzz import process, fuzz


df['group'] = df["product_name"].apply(lambda x: sorted([x, process.extractOne(x, df["product_name"].loc[df['product_name'] != x],
                                                                               scorer=fuzz.partial_ratio)[0]]))

              product_name  id                                          group
0       2 pack liner socks  13  [2 pack liner socks, 2 pack logo liner socks]
1  2 pack logo liner socks  12  [2 pack liner socks, 2 pack logo liner socks]
2           b.bare Hipster  11      [Lady BARE Hipster Panty, b.bare Hipster]
3  Lady BARE Hipster Panty  10      [Lady BARE Hipster Panty, b.bare Hipster]
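
Note that extractOne returns only the single best match, so each group above holds at most two names. As a rough sketch of another approach, the lists can be built from every name that scores above a cutoff. This example swaps in difflib.SequenceMatcher from the standard library instead of thefuzz, so its scores are not the same as fuzz.partial_ratio. The helper names similarity and similar_groups and the 0.6 threshold are assumptions made for illustration and would need tuning on real data.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (stdlib stand-in for a thefuzz scorer)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_groups(names, threshold=0.6):
    """For each name, list every name (itself included) scoring >= threshold.

    The threshold is an assumed value chosen for this small example.
    """
    return [
        [other for other in names if similarity(name, other) >= threshold]
        for name in names
    ]


names = ['2 pack liner socks', '2 pack logo liner socks',
         'b.bare Hipster', 'Lady BARE Hipster Panty']
groups = similar_groups(names)

for name, group in zip(names, groups):
    print(name, '->', group)
```

With the dataframe from the question, df['group'] = similar_groups(df['product_name'].tolist()) should attach these lists as a column. Each name is compared against every other name, so this is quadratic in the number of rows and is only practical for fairly small tables.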

Answered By: It_is_Chris
