Apply function to find elements not in a list
Question:
I want to apply a function that returns the elements not found in a reference list. What I want to get is the following.
import pandas as pd
product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']
data = [['ACTIVE BODY', ['Chive & Garlic', 'The Big Smoke'], ['Jalapeno & Lemon', 'Spinach & Artichoke']],
        ['AG VALLEY FOODS', ['Chive & Garlic', 'Spinach & Artichoke'], ['The Big Smoke', 'Jalapeno & Lemon']],
        ['ALIM MICHEL HALLORAN', ['The Big Smoke', 'Chive & Garlic'], ['Jalapeno & Lemon', 'Spinach & Artichoke']],
        ['ALIMENTATION IAN DES', ['The Big Smoke', 'Jalapeno & Lemon'], ['Chive & Garlic', 'Spinach & Artichoke']]]
df = pd.DataFrame(data, columns=['store', 'products', 'missing_products'])
where missing_products holds, as a list, the products from product_list that are not found in the array in the products column.
I tried the following function, but it's not working as intended:
def gap(row):
    for item in product_list:
        if item not in row:
            return item
Note that each value in the products column is a NumPy array, not a list of strings. I'm not sure whether this affects anything.
[['ACADEMIE DU GOURMET ACADEMY INC', array([nan], dtype=object)],
['ACTIVE BODY',
array(['Chive & Garlic', 'Garlic Tzatziki', 'The Big Smoke'], dtype=object)],
['AG VALLEY FOODS',
array(['Chive & Garlic', 'Spinach & Artichoke'], dtype=object)],
['ALIM MICHEL HALLORAN',
array(['The Meadow', 'The Big Smoke', 'Chive & Garlic',
'Jalapeno & Lemon', 'Dill & Truffle'], dtype=object)],
['ALIMENTATION IAN DES',
array(['The Big Smoke', 'Jalapeno & Lemon'], dtype=object)]]
Thanks in advance for the help!
Answers:
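Your function returns as soon as it finds the first missing item. Create a helper list, append every item that is not found, and return the list after the loop:
def gap(row):
    out = []
    for item in product_list:
        if item not in row:
            out.append(item)
    return out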
Alternative with a list comprehension:
def gap(row):
    return [item for item in product_list if item not in row]

df['missing_products1'] = df['products'].apply(gap)
List comprehension only solution:
df['missing_products1'] = [[item for item in product_list if item not in row] for row in df['products']]
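On the question of whether arrays change anything: the membership test `item not in row` also works on NumPy object arrays, so the same function should apply unchanged. The following is a minimal sketch under that assumption. The frame `df_arrays` is a hypothetical name, and its rows are modelled on the array sample shown in the question.

```python
import numpy as np
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

# Hypothetical frame whose cells are NumPy object arrays, as in the question.
df_arrays = pd.DataFrame({
    'store': ['ACADEMIE DU GOURMET ACADEMY INC', 'ACTIVE BODY', 'AG VALLEY FOODS'],
    'products': [
        np.array([np.nan], dtype=object),
        np.array(['Chive & Garlic', 'Garlic Tzatziki', 'The Big Smoke'], dtype=object),
        np.array(['Chive & Garlic', 'Spinach & Artichoke'], dtype=object),
    ],
})

def gap(row):
    # Keep every reference product that does not occur in this row's array.
    return [item for item in product_list if item not in row]

df_arrays['missing_products1'] = df_arrays['products'].apply(gap)
print(df_arrays[['store', 'missing_products1']])
```

A row that contains only NaN ends up with every reference product listed as missing, and products that are not in `product_list` (such as 'Garlic Tzatziki') are simply ignored.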
You can also build the data frame as a binary table: one column per product, with 1 if the store has the product and 0 if it does not.
That layout is more generic than storing lists inside the data frame.
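One possible way to sketch the binary layout described above is shown here. The names `indicator` and `missing` are illustrative and not from the original answer, and the frame is cut down to two stores to keep the example short.

```python
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

df = pd.DataFrame({
    'store': ['ACTIVE BODY', 'AG VALLEY FOODS'],
    'products': [['Chive & Garlic', 'The Big Smoke'],
                 ['Chive & Garlic', 'Spinach & Artichoke']],
})

# One row per store, one column per reference product:
# 1 if the store carries the product, 0 if it does not.
indicator = pd.DataFrame(
    [[int(p in row) for p in product_list] for row in df['products']],
    index=df['store'],
    columns=product_list,
)

# The missing products for a store are the columns that hold 0.
missing = {store: [p for p in product_list if flags[p] == 0]
           for store, flags in indicator.iterrows()}

print(indicator)
print(missing)
```

With this layout, questions such as "how many stores lack a given product" reduce to column operations, for example `(indicator == 0).sum()`.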
I would recommend using set operations, which should be the most efficient approach (note that a set does not preserve the order of product_list):
S = set(product_list)
df['missing_products'] = [list(S.difference(x)) for x in df['products']]
Output:
store products
0 ACTIVE BODY [Chive & Garlic, The Big Smoke]
1 AG VALLEY FOODS [Chive & Garlic, Spinach & Artichoke]
2 ALIM MICHEL HALLORAN [The Big Smoke, Chive & Garlic]
3 ALIMENTATION IAN DES [The Big Smoke, Jalapeno & Lemon]
missing_products
0 [Spinach & Artichoke, Jalapeno & Lemon]
1 [Jalapeno & Lemon, The Big Smoke]
2 [Spinach & Artichoke, Jalapeno & Lemon]
3 [Spinach & Artichoke, Chive & Garlic]
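If the order of the missing products matters, a variant that keeps the order of `product_list` while still using a set for the lookup might look like the sketch below. The helper name `gap_ordered` is an assumption made for illustration and does not appear in the original answers.

```python
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

df = pd.DataFrame({
    'store': ['ACTIVE BODY', 'AG VALLEY FOODS',
              'ALIM MICHEL HALLORAN', 'ALIMENTATION IAN DES'],
    'products': [['Chive & Garlic', 'The Big Smoke'],
                 ['Chive & Garlic', 'Spinach & Artichoke'],
                 ['The Big Smoke', 'Chive & Garlic'],
                 ['The Big Smoke', 'Jalapeno & Lemon']],
})

def gap_ordered(row):
    # Build the set once per row so each membership test is a hash lookup,
    # then walk product_list so the result keeps the reference order.
    present = set(row)
    return [item for item in product_list if item not in present]

df['missing_products'] = df['products'].apply(gap_ordered)
print(df['missing_products'].tolist())
```

The result contains the same elements as the set difference, only in a deterministic order.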