Looking for values from the list among column values

Question:

I am looking for a way to do something like this:
(general example)

def categorizer(value):
   toys = ['ball', 'bear', 'lego']
   food = ['pizza', 'ice-cream', 'cake']
   if value in toys:
      return 'toys'
   if value in food:
      return 'food'
   else:
      return 'empty'

df['purchases_category'] = df['purchases'].apply(categorizer)

It should run on a column that looks like the first one below and produce the second column as the result:
Table

With this exact function, the new column is filled entirely with 'empty' values, even though the column definitely contains words from the lists. How can this be fixed?

Thanks.

Asked By: Moons eat Stars

||

Answers:

In the check if value in toys:, value is the whole string, for example "red ball from…", and that string is not an element of the toys list. The same goes for food. Instead, you might want to check whether any element of toys/food occurs inside value. Perhaps this would answer your concern?

import pandas as pd


def categorizer(value):
   toys = ['ball', 'bear', 'lego']
   food = ['pizza', 'ice-cream', 'cake']
   for toy in toys:
      if toy in value:
         return 'toys'
   for f in food:
      if f in value:
         return 'food'
   return 'empty'


df = pd.DataFrame({
   "purchases": ["red ball from the shop nearby", "chocolate cake cafe", "teddy bear extra"]
})
df['purchases_category'] = df['purchases'].apply(categorizer)
print(df)
Answered By: Jobo Fernandez
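
To illustrate the difference described above, here is a minimal sketch comparing the two checks on one of the sample strings. The variable names whole_string_match and keyword_match are made up for this illustration and are not part of the original answer.

```python
toys = ['ball', 'bear', 'lego']
value = "red ball from the shop nearby"

# List membership: is the whole string equal to one of the list elements?
whole_string_match = value in toys

# Substring containment: does any keyword occur somewhere inside the string?
keyword_match = any(toy in value for toy in toys)

print(whole_string_match)  # False
print(keyword_match)       # True
```

The first check is what the function in the question does, which is why every row falls through to 'empty'.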

The problem is that if value in toys checks whether the full purchase string is an element of the list toys, while you presumably want to check whether any keyword from toys occurs in the purchase string.

You can do something like this, which is easier to generalize when you want to add other categories.

import pandas as pd


def categorizer(value):
    category_keywords = {
        'toys': ['ball', 'bear', 'lego'],
        'food': ['pizza', 'ice-cream', 'cake']
    }
    
    for category, keywords in category_keywords.items(): 
        if any(kw in value for kw in keywords):
            return category 
    
    return 'empty'


df = pd.DataFrame({
    "purchases": ["red ball from the shop nearby", 
                  "chocolate cake cafe", 
                  "teddy bear extra"]
})  

df['purchases_category'] = df['purchases'].apply(categorizer)

Output:

>>> df

                       purchases purchases_category
0  red ball from the shop nearby               toys
1            chocolate cake cafe               food
2               teddy bear extra               toys
Answered By: Rodalm
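
For larger frames, the same keyword lookup could also be done without a Python-level function per row. The following is only a sketch of one possible alternative, not something proposed in the answers above: it assumes numpy is available, builds one regular expression per category, and uses Series.str.contains together with numpy.select. The name conditions and the extra "office chair" row are additions made for this illustration.

```python
import re

import numpy as np
import pandas as pd

category_keywords = {
    'toys': ['ball', 'bear', 'lego'],
    'food': ['pizza', 'ice-cream', 'cake'],
}

df = pd.DataFrame({
    "purchases": ["red ball from the shop nearby",
                  "chocolate cake cafe",
                  "teddy bear extra",
                  "office chair"]  # hypothetical row with no keyword
})

# One boolean mask per category: True where any keyword occurs in the text.
# re.escape keeps characters in the keywords from being read as regex syntax.
conditions = [
    df['purchases']
    .str.contains('|'.join(map(re.escape, keywords)), regex=True, na=False)
    .to_numpy(dtype=bool)
    for keywords in category_keywords.values()
]

# np.select takes the first category whose mask is True, else 'empty'.
df['purchases_category'] = np.select(
    conditions, list(category_keywords), default='empty'
)
print(df)
```

As with the loop-based versions, this is plain substring matching, so 'ball' would also match inside a word such as 'football'. If only whole words should count, wrapping each pattern in \b word boundaries is one way to restrict it.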