String matching with dictionary keys in Python

Question:

I have a list of strings and a dictionary. For example:

l = ["apple fell on Newton", "lemon is yellow", "grass is greener"]
d = {"apple": "fruits", "lemon": "vegetable"}

The task is to match each string in the list against the keys of the dictionary and, where a key matches, return that key's value.

I am currently using the approach below, which is very slow. Is there a more efficient technique?

lmb_extract_type = lambda post: list(filter(None, set(d.get(w) if w in post.lower().split() else None for w in d)))

df['type'] = df['sentences'].apply(lmb_extract_type)
Asked By: SK Singh


Answers:

The data frame has a single column with a string (e.g. "apple fell on Newton") in each row. For each row, I have to match the string against the dictionary keys and return the value of the matching key.

The list has around 40-50 million elements, so this takes a lot of time.

IIUC, based on your comments, you can solve this with str.extract and replace, both of which are vectorized pandas methods, so no explicit Python loop is needed.

  1. To use str.extract, build a regex pattern from the keys of the dictionary. This extracts only the keywords apple or lemon.
  2. Then use the dictionary d to replace each extracted keyword with its corresponding value.
import pandas as pd

l = ["apple fell on Newton", "lemon is yellow", "grass is greener"]
d = {"apple": "fruits", "lemon": "vegetable"}

df = pd.DataFrame(l, columns=['sentences'])  # Single-column dataframe to demonstrate

pattern = '(' + '|'.join(d.keys()) + ')'  # Regex alternation of the keys: (apple|lemon)
df['type'] = df.sentences.str.extract(pattern).replace(d)
print(df)
              sentences       type
0  apple fell on Newton     fruits
1       lemon is yellow  vegetable
2      grass is greener        NaN
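
One caveat with a pattern built this way is that it also matches inside longer words (for example "apple" inside "pineapple") and that it is case-sensitive. The sketch below is one possible variation rather than a definitive implementation: it assumes whole-word, case-insensitive matching is wanted, escapes the keys with re.escape, and wraps the alternation in \b word boundaries. The extra "pineapple is sweet" row is made up purely for illustration.

```python
import re

import pandas as pd

# Sample data; the last sentence is a hypothetical row added to show the boundary check
sentences = [
    "apple fell on Newton",
    "lemon is yellow",
    "grass is greener",
    "pineapple is sweet",
]
d = {"apple": "fruits", "lemon": "vegetable"}

df = pd.DataFrame(sentences, columns=["sentences"])

# Escape each key so regex metacharacters are treated literally,
# and require word boundaries on both sides of the match
pattern = r"\b(" + "|".join(re.escape(key) for key in d) + r")\b"

# expand=False returns a Series for a single capture group;
# map() looks each extracted keyword up in d and leaves non-matches as NaN
df["type"] = (
    df["sentences"]
    .str.lower()
    .str.extract(pattern, expand=False)
    .map(d)
)
print(df)
```

Like the original, this only reports the first keyword found in each sentence.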
Answered By: Akshay Sehgal

Try applying a lambda function to each row and storing the matched values as a comma-separated string in the dataframe.

df['New_Col'] = df['sentences'].apply(lambda l: ', '.join([value for key, value in d.items() if key in l]))
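
If whole-word matching is acceptable, a per-row function can also look up each word of the sentence in the dictionary instead of scanning every key, so the cost per row depends on the sentence length rather than on the dictionary size. The following is a minimal sketch under that assumption; extract_types is a hypothetical helper name, not part of pandas.

```python
d = {"apple": "fruits", "lemon": "vegetable"}


def extract_types(sentence, mapping):
    """Return the sorted, de-duplicated values whose keys occur as whole words."""
    words = sentence.lower().split()
    # Dictionary membership is a hash lookup, so each word is checked in O(1) on average
    return sorted({mapping[word] for word in words if word in mapping})


sentences = ["apple fell on Newton", "lemon is yellow", "grass is greener"]
result = [extract_types(sentence, d) for sentence in sentences]
print(result)  # [['fruits'], ['vegetable'], []]
```

With a data frame, the same function could be passed to apply, e.g. df['sentences'].apply(extract_types, args=(d,)).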
Answered By: Tejas Sutar