Classification Function to check more than one Column

Question:

I use two columns to identify specific products in my data.

Example DataFrame built from foo_data

   ncm_code           product_description
0  27101932                    EXXON S-10
1  27101932                   LUBRAX S-10
2  27101932                           OIL
3  36101932                           OIL
4  84713012       NOTEBOOK IDEAPAD (1162)
5  84713012          WI-FI ACCESS POINT 6
6  84713012  ETHERNET ROUTER MODEL 000123

My Analysis
  • Classification could be done checking just product_description.
  • Classification could be done checking just ncm_code
  • Classification could be done checking ncm_code and product_description.
My approach so far

I am open to more pythonic suggestions.

import pandas as pd

# Define the data first, then build the DataFrame from it
foo_data = { 'ncm_code': ['27101932', '27101932', '27101932', '36101932', '84713012', '84713012', '84713012'],
             'product_description': ['EXXON S-10','LUBRAX S-10', 'OIL',
                                     'OIL', 'NOTEBOOK IDEAPAD (1162)', 'WI-FI ACCESS POINT 6', 
                                     'ETHERNET ROUTER MODEL 000123']}
df = pd.DataFrame(foo_data)

def foo_func(x):
    # Keywords are matched as substrings of the product description
    fuel_words = ['LUBRAX', 'EXXON', 'GASOLINE', 'PETROL', 'OIL']
    info_words = ['ACCESS POINT','DRIVE CD/DVD', 'DISPLAYPORT', 'NOTEBOOK', 'CPU', 'SWITCH', 
                 'MICROCOMP','TABLET', 'DDR', 'PATCH PANEL', 'HDMI', 'VGA', 'STORAGE', 
                 'ETHERNET', 'FIREWAL' , 'CORE I3', 'CORE I5', 'CORE I7', 'COREI3','CORE I5', 'CORE I7']
    if any(item in x for item in info_words):
        return 'INFO'
    elif any(item in x for item in fuel_words):
        return 'FUEL'
    # Returns None implicitly when no keyword matches

df['desired_output'] = df['product_description'].apply(foo_func)
df

My question

This code only checks one column, product_description.
How could I adapt the function so that it checks two columns?

Expected Output

   ncm_code           product_description desired_output
0  27101932                    EXXON S-10           FUEL
1  27101932                   LUBRAX S-10           FUEL
2  27101932                           OIL           FUEL
3  36101932                           OIL    MINERAL OIL
4  84713012       NOTEBOOK IDEAPAD (1162)           INFO
5  84713012          WI-FI ACCESS POINT 6           INFO
6  84713012  ETHERNET ROUTER MODEL 000123           INFO

The problem

The row at index 3 should be 'MINERAL OIL' because of its ncm_code.

My code is not checking ncm_code.

I want to improve my code so that it checks both ncm_code and product_description.

Thanks.

Solution:

Thanks to @Laurent

def foo_func(x, y):
    fuel_words = ['LUBRAX', 'EXXON', 'GASOLINE', 'PETROL', 'OIL']
    info_words = ['ACCESS POINT','DRIVE CD/DVD', 'DISPLAYPORT', 'NOTEBOOK', 'CPU', 'SWITCH', 
                 'MICROCOMP','TABLET', 'DDR', 'PATCH PANEL', 'HDMI', 'VGA', 'STORAGE', 
                 'ETHERNET', 'FIREWAL' , 'CORE I3', 'CORE I5', 'CORE I7', 'COREI3','CORE I5', 'CORE I7']
    if any(item in x for item in info_words):
        return "INFO"
    if any(item in x for item in fuel_words) and y == "36101932":
        return "MINERAL OIL"
    return "FUEL"


df["desired_output"] = df.apply(
    lambda df_: foo_func(df_["product_description"], df_["ncm_code"]), axis=1
)
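
One detail worth noting about the function above: a row that matches no keyword at all still reaches the final return and is labeled "FUEL". The following is a minimal sketch of a stricter variant, assuming unmatched rows should stay unlabeled. The names classify, FUEL_WORDS, INFO_WORDS and MINERAL_OIL_CODES, the shortened keyword lists, and the sample row with code 12345678 are all illustrative and not part of the original solution.

```python
# Illustrative names and shortened keyword lists (not from the original post).
FUEL_WORDS = ("LUBRAX", "EXXON", "GASOLINE", "PETROL", "OIL")
INFO_WORDS = ("ACCESS POINT", "NOTEBOOK", "ETHERNET", "SWITCH", "TABLET")
MINERAL_OIL_CODES = {"36101932"}


def classify(description, ncm_code):
    """Classify one row from its description and its NCM code."""
    has_info = any(word in description for word in INFO_WORDS)
    has_fuel = any(word in description for word in FUEL_WORDS)
    if has_info:
        return "INFO"
    # The NCM code refines a fuel match into a more specific class.
    if has_fuel and ncm_code in MINERAL_OIL_CODES:
        return "MINERAL OIL"
    if has_fuel:
        return "FUEL"
    # No keyword matched: return None rather than guessing a class.
    return None


rows = [
    ("27101932", "EXXON S-10"),
    ("36101932", "OIL"),
    ("84713012", "WI-FI ACCESS POINT 6"),
    ("12345678", "WOODEN CHAIR"),  # made-up row that matches nothing
]
labels = [classify(desc, code) for code, desc in rows]
print(labels)
```

It can be applied row by row in the same way as foo_func, for example with df.apply(lambda row: classify(row["product_description"], row["ncm_code"]), axis=1).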

Asked By: Andre Nevares


Answers:

Here is one way to modify your function:

def foo_func(x, y):
    fuel_words = ['LUBRAX', 'EXXON', 'GASOLINE', 'PETROL', 'OIL']
    info_words = ['ACCESS POINT','DRIVE CD/DVD', 'DISPLAYPORT', 'NOTEBOOK', 'CPU', 'SWITCH', 
                 'MICROCOMP','TABLET', 'DDR', 'PATCH PANEL', 'HDMI', 'VGA', 'STORAGE', 
                 'ETHERNET', 'FIREWAL' , 'CORE I3', 'CORE I5', 'CORE I7', 'COREI3','CORE I5', 'CORE I7']
    if any(item in x for item in info_words):
        return "INFO"
    if any(item in x for item in fuel_words) and y == "36101932":
        return "MINERAL OIL"
    return "FUEL"

And then:

df["desired_output"] = df.apply(
    lambda df_: foo_func(df_["product_description"], df_["ncm_code"]), axis=1
)
print(df)
# Output
   ncm_code           product_description desired_output
0  27101932                    EXXON S-10           FUEL
1  27101932                   LUBRAX S-10           FUEL
2  27101932                           OIL           FUEL
3  36101932                           OIL    MINERAL OIL
4  84713012       NOTEBOOK IDEAPAD (1162)           INFO
5  84713012          WI-FI ACCESS POINT 6           INFO
6  84713012  ETHERNET ROUTER MODEL 000123           INFO
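
For larger frames, the same rules could probably be expressed without a row-wise apply. The sketch below is one possible vectorized variant using Series.str.contains and numpy.select, not the method from the answer above. It assumes there are no missing descriptions. The helper contains_any, the shortened keyword lists, and the "UNKNOWN" fallback label are illustrative choices.

```python
import re

import numpy as np
import pandas as pd

# Small sample taken from the question's data.
df = pd.DataFrame({
    "ncm_code": ["27101932", "36101932", "84713012"],
    "product_description": ["EXXON S-10", "OIL", "WI-FI ACCESS POINT 6"],
})

# Shortened keyword lists, for illustration only.
fuel_words = ["LUBRAX", "EXXON", "GASOLINE", "PETROL", "OIL"]
info_words = ["ACCESS POINT", "NOTEBOOK", "ETHERNET"]


def contains_any(series, words):
    """Return a boolean Series: True where any keyword is a substring."""
    pattern = "|".join(re.escape(word) for word in words)
    return series.str.contains(pattern, regex=True)


is_info = contains_any(df["product_description"], info_words)
is_fuel = contains_any(df["product_description"], fuel_words)
is_mineral = is_fuel & (df["ncm_code"] == "36101932")

# np.select takes the first condition that is True for each row,
# so the order mirrors the if-statements in foo_func.
df["desired_output"] = np.select(
    [is_info, is_mineral, is_fuel],
    ["INFO", "MINERAL OIL", "FUEL"],
    default="UNKNOWN",
)
print(df)
```

Unlike foo_func, this variant labels rows that match no keyword as "UNKNOWN" instead of "FUEL".
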
Answered By: Laurent