Classification function to check more than one column

Question

I use two columns to identify specific products in my data.

Example DataFrame built from foo_data:

	ncm_code	product_description
0	27101932	EXXON S-10
1	27101932	LUBRAX S-10
2	27101932	OIL
3	36101932	OIL
4	84713012	NOTEBOOK IDEAPAD (1162)
5	84713012	WI-FI ACCESS POINT 6
6	84713012	ETHERNET ROUTER MODEL 000123

My Analysis

Sometimes classification can be done by checking only product_description.
Sometimes classification can be done by checking only ncm_code.
Sometimes classification needs both ncm_code and product_description.
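The three cases above (description only, code only, or both) can be sketched as an ordered rule table, where each rule optionally constrains either column. This is a minimal illustration, not the poster's code: `RULES` and `classify` are hypothetical names, and the keyword lists are abbreviated.

```python
# Hypothetical rule table (not from the original post): each rule may constrain
# the NCM code, the description keywords, or both. None means "skip this check".
# Rules are tried in order and the first matching rule wins.
RULES = [
    # (ncm_code, keywords, label)
    ("36101932", ["OIL"], "MINERAL OIL"),                      # code AND description
    (None, ["NOTEBOOK", "ACCESS POINT", "ETHERNET"], "INFO"),  # description only
    (None, ["LUBRAX", "EXXON", "OIL"], "FUEL"),                # description only
    ("27101932", None, "FUEL"),                                # code only
]


def classify(ncm_code, description):
    """Return the label of the first rule whose conditions all hold, else None."""
    for code, keywords, label in RULES:
        if code is not None and ncm_code != code:
            continue
        if keywords is not None and not any(k in description for k in keywords):
            continue
        return label
    return None


print(classify("36101932", "OIL"))                      # MINERAL OIL
print(classify("27101932", "OIL"))                      # FUEL
print(classify("27101932", "UNKNOWN ITEM"))             # FUEL (code-only rule)
print(classify("84713012", "NOTEBOOK IDEAPAD (1162)"))  # INFO
```

Putting the more specific code-plus-keyword rule first is what lets it take priority over the broader keyword-only FUEL rule.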

My approach so far

I am open to more pythonic suggestions.

import pandas as pd

# Define the data before building the DataFrame from it
foo_data = {'ncm_code': ['27101932', '27101932', '27101932', '36101932', '84713012', '84713012', '84713012'],
            'product_description': ['EXXON S-10', 'LUBRAX S-10', 'OIL',
                                    'OIL', 'NOTEBOOK IDEAPAD (1162)', 'WI-FI ACCESS POINT 6',
                                    'ETHERNET ROUTER MODEL 000123']}
df = pd.DataFrame(foo_data)

def foo_func(x):
    """Classify a product description by keyword; returns None if nothing matches."""
    fuel_words = ['LUBRAX', 'EXXON', 'GASOLINE', 'PETROL', 'OIL']
    info_words = ['ACCESS POINT', 'DRIVE CD/DVD', 'DISPLAYPORT', 'NOTEBOOK', 'CPU', 'SWITCH',
                  'MICROCOMP', 'TABLET', 'DDR', 'PATCH PANEL', 'HDMI', 'VGA', 'STORAGE',
                  'ETHERNET', 'FIREWAL', 'CORE I3', 'CORE I5', 'CORE I7', 'COREI3']
    # INFO keywords are checked first, so they win if both lists match
    if any(item in x for item in info_words):
        return 'INFO'
    elif any(item in x for item in fuel_words):
        return 'FUEL'
    return None

df['desired_class'] = df['product_description'].apply(foo_func)
print(df)
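Since the post asks for more pythonic suggestions, here is a sketch of the same single-column check using pandas' vectorized `Series.str.contains` instead of `apply`. It assumes the keyword lists above (with the duplicate entries dropped) and escapes each keyword with `re.escape` so characters such as `(` or `/` are matched literally.

```python
import re

import pandas as pd

foo_data = {'ncm_code': ['27101932', '27101932', '27101932', '36101932', '84713012', '84713012', '84713012'],
            'product_description': ['EXXON S-10', 'LUBRAX S-10', 'OIL',
                                    'OIL', 'NOTEBOOK IDEAPAD (1162)', 'WI-FI ACCESS POINT 6',
                                    'ETHERNET ROUTER MODEL 000123']}
df = pd.DataFrame(foo_data)

fuel_words = ['LUBRAX', 'EXXON', 'GASOLINE', 'PETROL', 'OIL']
info_words = ['ACCESS POINT', 'DRIVE CD/DVD', 'DISPLAYPORT', 'NOTEBOOK', 'CPU', 'SWITCH',
              'MICROCOMP', 'TABLET', 'DDR', 'PATCH PANEL', 'HDMI', 'VGA', 'STORAGE',
              'ETHERNET', 'FIREWAL', 'CORE I3', 'CORE I5', 'CORE I7', 'COREI3']

# One regex alternation per class; re.escape makes every keyword match literally
info_pattern = "|".join(map(re.escape, info_words))
fuel_pattern = "|".join(map(re.escape, fuel_words))

desc = df["product_description"]
is_info = desc.str.contains(info_pattern, regex=True)
is_fuel = desc.str.contains(fuel_pattern, regex=True)

# Start with None, then label FUEL rows; INFO is assigned last so it wins,
# matching the if/elif order of foo_func
df["desired_class"] = None
df.loc[is_fuel, "desired_class"] = "FUEL"
df.loc[is_info, "desired_class"] = "INFO"
print(df)
```

Like the original function, this still labels row 3 as FUEL, because only the description is examined.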

My question

This code only checks one column, product_description.
How can I adapt it so that the function analyzes both columns?
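To look at both columns without a row-wise function, one option is `numpy.select`: it takes a list of boolean conditions and a matching list of labels and, for each row, returns the label of the first condition that is True. This is a sketch; the shortened `info_words` list and the `"OTHER"` fallback label are assumptions for illustration.

```python
import re

import numpy as np
import pandas as pd

foo_data = {'ncm_code': ['27101932', '27101932', '27101932', '36101932', '84713012', '84713012', '84713012'],
            'product_description': ['EXXON S-10', 'LUBRAX S-10', 'OIL',
                                    'OIL', 'NOTEBOOK IDEAPAD (1162)', 'WI-FI ACCESS POINT 6',
                                    'ETHERNET ROUTER MODEL 000123']}
df = pd.DataFrame(foo_data)

fuel_words = ['LUBRAX', 'EXXON', 'GASOLINE', 'PETROL', 'OIL']
info_words = ['ACCESS POINT', 'NOTEBOOK', 'ETHERNET', 'SWITCH']  # shortened for the sketch

desc = df["product_description"]
code = df["ncm_code"]
is_info = desc.str.contains("|".join(map(re.escape, info_words)))
is_fuel = desc.str.contains("|".join(map(re.escape, fuel_words)))

# Conditions are evaluated in order; the first True one decides the label
conditions = [
    is_info,
    is_fuel & (code == "36101932"),  # both columns: fuel keyword AND this NCM code
    is_fuel,
]
choices = ["INFO", "MINERAL OIL", "FUEL"]
df["desired_output"] = np.select(conditions, choices, default="OTHER")
print(df)
```

The order of `conditions` plays the same role as the order of the `if` statements in a row-wise function.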

Expected Output

	ncm_code	product_description	desired_output
0	27101932	EXXON S-10	FUEL
1	27101932	LUBRAX S-10	FUEL
2	27101932	OIL	FUEL
3	36101932	OIL	MINERAL OIL
4	84713012	NOTEBOOK IDEAPAD (1162)	INFO
5	84713012	WI-FI ACCESS POINT 6	INFO
6	84713012	ETHERNET ROUTER MODEL 000123	INFO

The problem

The row at index 3 should be 'MINERAL OIL' because of its ncm_code.

My code does not check ncm_code.

I want to improve my code so that it checks both ncm_code and product_description.
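Another way to express "the NCM code changes the label" is to keep the keyword classification as it is and run a second, code-based pass afterwards. The sketch below assumes a `desired_class` column already produced by the description-only step; `FUEL_NCM_OVERRIDES` is a hypothetical lookup table, not something from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "ncm_code": ["27101932", "36101932", "84713012"],
    "product_description": ["OIL", "OIL", "NOTEBOOK IDEAPAD (1162)"],
    "desired_class": ["FUEL", "FUEL", "INFO"],  # e.g. output of the keyword-only step
})

# Hypothetical lookup: NCM codes that refine a FUEL match into a more specific label
FUEL_NCM_OVERRIDES = {"36101932": "MINERAL OIL"}

override = df["ncm_code"].map(FUEL_NCM_OVERRIDES)  # NaN where a code has no override
is_fuel = df["desired_class"] == "FUEL"

# Replace the label only where the row is FUEL and its code has an override
df["desired_output"] = df["desired_class"].mask(is_fuel & override.notna(), override)
print(df)
```

Keeping the overrides in a dictionary means new codes can be added without touching the classification logic.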

Thanks.

Solution:

Thanks to @Laurent

Asked By: Andre Nevares


Answer 1

Here is one way to modify your function:

def foo_func(x, y):
    """Classify using the description (x) and the NCM code (y)."""
    fuel_words = ['LUBRAX', 'EXXON', 'GASOLINE', 'PETROL', 'OIL']
    info_words = ['ACCESS POINT', 'DRIVE CD/DVD', 'DISPLAYPORT', 'NOTEBOOK', 'CPU', 'SWITCH',
                  'MICROCOMP', 'TABLET', 'DDR', 'PATCH PANEL', 'HDMI', 'VGA', 'STORAGE',
                  'ETHERNET', 'FIREWAL', 'CORE I3', 'CORE I5', 'CORE I7', 'COREI3']
    if any(item in x for item in info_words):
        return "INFO"
    if any(item in x for item in fuel_words):
        # A fuel keyword combined with NCM code 36101932 means mineral oil
        return "MINERAL OIL" if y == "36101932" else "FUEL"
    return None  # no keyword matched

And then:

# axis=1 passes each row to the lambda as a Series, so both columns are available
df["desired_output"] = df.apply(
    lambda df_: foo_func(df_["product_description"], df_["ncm_code"]), axis=1
)

print(df)
# Output
   ncm_code           product_description desired_output
0  27101932                    EXXON S-10           FUEL
1  27101932                   LUBRAX S-10           FUEL
2  27101932                           OIL           FUEL
3  36101932                           OIL    MINERAL OIL
4  84713012       NOTEBOOK IDEAPAD (1162)           INFO
5  84713012          WI-FI ACCESS POINT 6           INFO
6  84713012  ETHERNET ROUTER MODEL 000123           INFO

Answered By: Laurent
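`DataFrame.apply` with `axis=1` builds a Series for every row, which can be slow on large frames. A common alternative is to zip the two columns and call the same function in a list comprehension. This is a sketch using a variant of the answer's function with shortened keyword lists; returning None when nothing matches is an assumption, not part of the answer.

```python
import pandas as pd

foo_data = {'ncm_code': ['27101932', '27101932', '27101932', '36101932', '84713012', '84713012', '84713012'],
            'product_description': ['EXXON S-10', 'LUBRAX S-10', 'OIL',
                                    'OIL', 'NOTEBOOK IDEAPAD (1162)', 'WI-FI ACCESS POINT 6',
                                    'ETHERNET ROUTER MODEL 000123']}
df = pd.DataFrame(foo_data)


def foo_func(x, y):
    # Shortened keyword lists for brevity; use the full lists in practice
    fuel_words = ['LUBRAX', 'EXXON', 'GASOLINE', 'PETROL', 'OIL']
    info_words = ['ACCESS POINT', 'NOTEBOOK', 'ETHERNET']
    if any(item in x for item in info_words):
        return "INFO"
    if any(item in x for item in fuel_words):
        return "MINERAL OIL" if y == "36101932" else "FUEL"
    return None


# zip walks both columns in lockstep and hands plain strings to foo_func,
# so no per-row Series is created
df["desired_output"] = [
    foo_func(desc, code)
    for desc, code in zip(df["product_description"], df["ncm_code"])
]
print(df)
```

The result matches the `apply(axis=1)` version; the difference is only in how rows are fed to the function.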