Apply function to find elements not in a list

Question:

I want to apply a function that returns the elements not found in a reference list. The result I want is the following:

import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

data = [['ACTIVE BODY', ['Chive & Garlic', 'The Big Smoke'], ['Jalapeno & Lemon', 'Spinach & Artichoke']],
        ['AG VALLEY FOODS', ['Chive & Garlic', 'Spinach & Artichoke'], ['The Big Smoke', 'Jalapeno & Lemon']],
        ['ALIM MICHEL HALLORAN', ['The Big Smoke', 'Chive & Garlic'], ['Jalapeno & Lemon', 'Spinach & Artichoke']],
        ['ALIMENTATION IAN DES', ['The Big Smoke', 'Jalapeno & Lemon'],['Chive & Garlic', 'Spinach & Artichoke']]]

df = pd.DataFrame(data, columns=['store', 'products', 'missing_products'])


Here, missing_products is the list of products from product_list that are not found in the array in the products column.

I tried the following function, but it's not working as intended:

def gap(row):
    for item in product_list:
        if item not in row:
            return item

Note that each value in the products column is a NumPy array, not a list of strings. I'm not sure whether this affects anything.

[['ACADEMIE DU GOURMET ACADEMY INC', array([nan], dtype=object)],
 ['ACTIVE BODY',
  array(['Chive & Garlic', 'Garlic Tzatziki', 'The Big Smoke'], dtype=object)],
 ['AG VALLEY FOODS',
  array(['Chive & Garlic', 'Spinach & Artichoke'], dtype=object)],
 ['ALIM MICHEL HALLORAN',
  array(['The Meadow', 'The Big Smoke', 'Chive & Garlic',
         'Jalapeno & Lemon', 'Dill & Truffle'], dtype=object)],
 ['ALIMENTATION IAN DES',
  array(['The Big Smoke', 'Jalapeno & Lemon'], dtype=object)]]

Thanks in advance for the help!

Asked By: Alejandro L


Answers:

Your function returns at the first missing item. Instead, create a helper list and append every item that is not found:

def gap(row):
    out = []
    for item in product_list:
        if item not in row:
            out.append(item)
    return out

Alternative with list comprehension:

def gap(row):
    return [item for item in product_list if item not in row]


Either version is then applied to the products column:

df['missing_products1'] = df['products'].apply(gap)

Solution using only a list comprehension:

df['missing_products1'] = [[item for item in product_list if item not in row] for row in df['products']]
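As for the question of whether arrays behave differently from lists: the membership test `in` also works on NumPy arrays, so the same function should apply unchanged. The following is a minimal sketch under that assumption. The names `cells` and `result` are made up for illustration, and the two sample arrays are copied from the question.

```python
import numpy as np
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

def gap(row):
    # `item not in row` works for lists and for NumPy arrays alike
    return [item for item in product_list if item not in row]

# Hypothetical sample: each cell is an object array, as in the question
cells = pd.Series([
    np.array([np.nan], dtype=object),
    np.array(['Chive & Garlic', 'Spinach & Artichoke'], dtype=object),
])

result = cells.apply(gap)
```

A cell that contains only NaN matches none of the products, so every product is reported as missing for that store.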
Answered By: jezrael

You could instead build a binary data frame with one column per product, where a cell is 1 if the store has the product and 0 if it does not.

That representation is more general than storing lists inside the data frame.
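The answer gives no code, so here is one possible sketch of such a binary data frame, built with `explode` and `pd.crosstab`. The names `exploded`, `indicator` and `missing` are illustrative, and the sample data is taken from the question. Stores whose products are all NaN would drop out of the cross-tabulation and would need separate handling.

```python
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

df = pd.DataFrame({
    'store': ['ACTIVE BODY', 'AG VALLEY FOODS',
              'ALIM MICHEL HALLORAN', 'ALIMENTATION IAN DES'],
    'products': [['Chive & Garlic', 'The Big Smoke'],
                 ['Chive & Garlic', 'Spinach & Artichoke'],
                 ['The Big Smoke', 'Chive & Garlic'],
                 ['The Big Smoke', 'Jalapeno & Lemon']],
})

# One row per (store, product) pair
exploded = df.explode('products')

# Count pairs, keep only reference products, and cap counts at 1
indicator = (pd.crosstab(exploded['store'], exploded['products'])
               .reindex(columns=product_list, fill_value=0)
               .clip(upper=1))

# Missing products per store are the columns whose flag is 0
missing = {store: [p for p in product_list if flags[p] == 0]
           for store, flags in indicator.iterrows()}
```

With this layout, questions such as "which stores lack a given product" become plain column filters, for example `indicator[indicator['The Big Smoke'] == 0]`.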

Answered By: Mohamed Maatar

I would recommend using set operations; this should be the most efficient approach:

S = set(product_list)

df['missing_products'] = [list(S.difference(x)) for x in df['products']]

Output:

                  store                               products  
0           ACTIVE BODY        [Chive & Garlic, The Big Smoke]   
1       AG VALLEY FOODS  [Chive & Garlic, Spinach & Artichoke]   
2  ALIM MICHEL HALLORAN        [The Big Smoke, Chive & Garlic]   
3  ALIMENTATION IAN DES      [The Big Smoke, Jalapeno & Lemon]   

                          missing_products  
0  [Spinach & Artichoke, Jalapeno & Lemon]  
1        [Jalapeno & Lemon, The Big Smoke]  
2  [Spinach & Artichoke, Jalapeno & Lemon]  
3    [Spinach & Artichoke, Chive & Garlic]
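As the output shows, a set difference does not keep the order of product_list. If the order matters, one possible variant (a sketch, not part of the original answer) uses a set only for the membership test and iterates over product_list itself. The helper name `missing_in_order` is made up for illustration.

```python
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

df = pd.DataFrame({
    'store': ['ACTIVE BODY', 'AG VALLEY FOODS',
              'ALIM MICHEL HALLORAN', 'ALIMENTATION IAN DES'],
    'products': [['Chive & Garlic', 'The Big Smoke'],
                 ['Chive & Garlic', 'Spinach & Artichoke'],
                 ['The Big Smoke', 'Chive & Garlic'],
                 ['The Big Smoke', 'Jalapeno & Lemon']],
})

def missing_in_order(row):
    # Build the set once per row so each lookup is a hash check
    present = set(row)
    # Iterating over product_list keeps the reference order
    return [item for item in product_list if item not in present]

df['missing_products'] = [missing_in_order(row) for row in df['products']]
```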
Answered By: mozway