Checking if a string contains a value from a dictionary and returning the appropriate key

Question:

I want to check whether a string in a Pandas column contains a word from a dictionary and, if there is a match, create a new column with the corresponding dictionary key as its value. For example:
dict = {'Car': ['Merc', 'BMW', 'Ford', 'Suzuki'], 'MotorCycle': ['Harley', 'Yamaha', 'Triump']}

df

Person Sentence
A ‘He drives a Merc’
B ‘He rides a Harley’

should return

Person Sentence Vehicle
A ‘He drives a Merc’ ‘Car’
B ‘He rides a Harley’ ‘MotorCycle’
Asked By: dratoms

Answers:

One solution is to build an inverted dictionary from dct (mapping each word to its key) and look up each word of the sentence after splitting it with str.split:

dct = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}

# invert the mapping: each word points to its key, e.g. "Merc" -> "Car"
dct_inv = {i: k for k, v in dct.items() for i in v}


def find_word(x):
    for w in x.strip(" '").split():
        if w in dct_inv:
            return dct_inv[w]
    return None


df["Vehicle"] = df["Sentence"].apply(find_word)
print(df)

Prints:

  Person             Sentence     Vehicle
0      A   'He drives a Merc'         Car
1      B  'He rides a Harley'  MotorCycle
Answered By: Andrej Kesely
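
A note on the str.split approach above: splitting on whitespace leaves punctuation attached to the words, so a sentence such as "He drives a Merc." would not match, and the lookup is case-sensitive. The following is a minimal sketch of one possible variation, not part of the original answer, that tokenises with a regular expression and compares lower-cased words. The function name find_vehicle and the sample sentences are made up for illustration.

```python
import re

dct = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}

# inverted lookup with lower-cased words: "merc" -> "Car", "harley" -> "MotorCycle", ...
dct_inv = {word.lower(): key for key, words in dct.items() for word in words}


def find_vehicle(sentence):
    # hypothetical helper: \w+ keeps only word characters, so quotes and
    # trailing punctuation never end up inside a token
    for token in re.findall(r"\w+", sentence):
        key = dct_inv.get(token.lower())
        if key is not None:
            return key
    return None


# invented sample sentences with quotes, punctuation and different casing
results = [find_vehicle(s) for s in ["'He drives a merc.'", "He rides a Harley!", "She walks"]]
```

With a DataFrame it would be applied the same way as in the answer: df["Vehicle"] = df["Sentence"].apply(find_vehicle).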

You can invert the dictionary and use a regex + map:

import re

dic = {'Car': ['Merc', 'BMW', 'Ford', 'Suzuki'],
       'MotorCycle': ['Harley', 'Yamaha', 'Triump']}

# invert dictionary
d = {k:v for v,l in dic.items()
     for k in l}

# build a regex alternation from all the words
regex = f'({"|".join(map(re.escape, d))})'

# map vehicle from match
df['Vehicle'] = df['Sentence'].str.extract(regex, expand=False).map(d)

Output:

  Person           Sentence     Vehicle
0      A   He drives a Merc         Car
1      B  He rides a Harley  MotorCycle
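
One thing to be aware of: the pattern above matches substrings, so "Ford" would also be found inside a longer word such as "Fordham". If only whole words should count, a possible adjustment (an assumption about the desired behaviour, not something stated in the question) is to wrap the alternation in \b word boundaries. The sample rows below are invented for the demonstration.

```python
import re

import pandas as pd

dic = {'Car': ['Merc', 'BMW', 'Ford', 'Suzuki'],
       'MotorCycle': ['Harley', 'Yamaha', 'Triump']}

# invert dictionary: word -> key
d = {word: key for key, words in dic.items() for word in words}

# illustrative data; the third sentence contains "Ford" only as part of a longer word
df = pd.DataFrame({'Person': ['A', 'B', 'C'],
                   'Sentence': ['He drives a Merc',
                                'He rides a Harley',
                                'He studied at Fordham']})

# \b on both sides restricts the match to whole words
regex = r'\b(' + '|'.join(map(re.escape, d)) + r')\b'

# rows without a whole-word match are left as NaN
df['Vehicle'] = df['Sentence'].str.extract(regex, expand=False).map(d)
vehicles = df['Vehicle'].tolist()
```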

Several matches:

If a sentence can contain several of the words, use extractall, then either aggregate the matches or, as here, unstack them into separate columns:

df.join(df['Sentence'].str.extractall(regex)[0].map(d).unstack().add_prefix('Vehicle_'))

Output:

  Person                     Sentence   Vehicle_0   Vehicle_1
0      A             He drives a Merc         Car         NaN
1      B            He rides a Harley  MotorCycle         NaN
2      C  She has a Merc and a Harley         Car  MotorCycle

With aggregation (joining all matched keys into a single column):

df['Vehicle'] = df['Sentence'].str.extractall(regex)[0].map(d).groupby(level=0).agg(','.join)

Output:

  Person                     Sentence         Vehicle
0      A             He drives a Merc             Car
1      B            He rides a Harley      MotorCycle
2      C  She has a Merc and a Harley  Car,MotorCycle
Answered By: mozway
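
If a sentence mentions two words from the same category, the ','.join aggregation above would repeat the key (for example "Car,Car"). Below is a small sketch of one way to keep each key once, in order of first appearance, using dict.fromkeys. The sample sentences are invented, and this is an optional refinement rather than part of the answer.

```python
import re

import pandas as pd

dic = {'Car': ['Merc', 'BMW', 'Ford', 'Suzuki'],
       'MotorCycle': ['Harley', 'Yamaha', 'Triump']}

# invert dictionary: word -> key
d = {word: key for key, words in dic.items() for word in words}
regex = f'({"|".join(map(re.escape, d))})'

# illustrative data: two cars, one of each, and no match at all
df = pd.DataFrame({'Person': ['A', 'B', 'C'],
                   'Sentence': ['He has a Merc and a BMW',
                                'She has a Merc and a Harley',
                                'They walk']})

# one row per match, indexed by (original row, match number)
matches = df['Sentence'].str.extractall(regex)[0].map(d)

# dict.fromkeys drops repeated keys while preserving first-seen order
df['Vehicle'] = matches.groupby(level=0).agg(lambda s: ','.join(dict.fromkeys(s)))
vehicles = df['Vehicle'].tolist()
```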