Classification function that checks more than one column
Question:
I use two columns to identify specific products in my data.
Example DataFrame built from foo_data:
|   | ncm_code | product_description |
|---|---|---|
| 0 | 27101932 | EXXON S-10 |
| 1 | 27101932 | LUBRAX S-10 |
| 2 | 27101932 | OIL |
| 3 | 36101932 | OIL |
| 4 | 84713012 | NOTEBOOK IDEAPAD (1162) |
| 5 | 84713012 | WI-FI ACCESS POINT 6 |
| 6 | 84713012 | ETHERNET ROUTER MODEL 000123 |
My analysis
- For some rows, classification can be done by checking product_description alone.
- For some rows, classification can be done by checking ncm_code alone.
- For some rows, classification requires checking both ncm_code and product_description.
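The three cases in the analysis can be captured in a single ordered rule table, where each rule optionally constrains ncm_code, product_description keywords, or both, and the first matching rule wins. This is a minimal sketch under stated assumptions: the `RULES` list, the `classify` helper, and the "ncm_code 84713012 alone means INFO" rule are hypothetical illustrations, not part of the question, and the keyword lists are shortened from the question's code.

```python
import pandas as pd

# Hypothetical ordered rule table: (required ncm_code or None, keywords or None, label).
# None means "don't check this column". The first rule whose conditions all hold wins.
RULES = [
    ("36101932", ["OIL"], "MINERAL OIL"),                       # needs both columns
    (None, ["ACCESS POINT", "NOTEBOOK", "ETHERNET"], "INFO"),   # description only
    (None, ["LUBRAX", "EXXON", "OIL"], "FUEL"),                 # description only
    ("84713012", None, "INFO"),                                 # ncm_code only (assumed rule)
]


def classify(ncm_code, description):
    for code, keywords, label in RULES:
        if code is not None and ncm_code != code:
            continue  # code constraint not met
        if keywords is not None and not any(k in description for k in keywords):
            continue  # no keyword found in the description
        return label
    return None  # no rule matched


df = pd.DataFrame({
    "ncm_code": ["27101932", "36101932", "84713012", "84713012"],
    "product_description": ["EXXON S-10", "OIL", "WI-FI ACCESS POINT 6", "ROUTER X"],
})
df["desired_output"] = [
    classify(code, desc) for code, desc in zip(df["ncm_code"], df["product_description"])
]
print(df)
```

Keeping the rules as data means new product classes can be added by appending to the list instead of growing an if/elif chain.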
My approach so far
I am open to more pythonic suggestions.
import pandas as pd

foo_data = {'ncm_code': ['27101932', '27101932', '27101932', '36101932', '84713012', '84713012', '84713012'],
            'product_description': ['EXXON S-10', 'LUBRAX S-10', 'OIL',
                                    'OIL', 'NOTEBOOK IDEAPAD (1162)', 'WI-FI ACCESS POINT 6',
                                    'ETHERNET ROUTER MODEL 000123']}
# foo_data must be defined before the DataFrame is built from it
df = pd.DataFrame(foo_data)

def foo_func(x):
    # x is a single product_description string
    fuel_words = ['LUBRAX', 'EXXON', 'GASOLINE', 'PETROL', 'OIL']
    info_words = ['ACCESS POINT', 'DRIVE CD/DVD', 'DISPLAYPORT', 'NOTEBOOK', 'CPU', 'SWITCH',
                  'MICROCOMP', 'TABLET', 'DDR', 'PATCH PANEL', 'HDMI', 'VGA', 'STORAGE',
                  'ETHERNET', 'FIREWAL', 'CORE I3', 'CORE I5', 'CORE I7', 'COREI3']
    if any(item in x for item in info_words):
        return 'INFO'
    elif any(item in x for item in fuel_words):
        return 'FUEL'
    # falls through and returns None when no keyword matches

df['desired_class'] = df['product_description'].apply(foo_func)
df
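Since more pythonic suggestions are welcome: a vectorized variant of this single-column approach, which swaps the row-by-row `apply` for `Series.str.contains` with a regex alternation plus `numpy.select`. A sketch, assuming the keywords should match as literal substrings (hence `re.escape`); the keyword lists are abbreviated, and the `"UNKNOWN"` default is a placeholder label chosen here where `foo_func` would return None.

```python
import re

import numpy as np
import pandas as pd

fuel_words = ["LUBRAX", "EXXON", "GASOLINE", "PETROL", "OIL"]
info_words = ["ACCESS POINT", "DRIVE CD/DVD", "NOTEBOOK", "ETHERNET", "CORE I5"]

# One regex per keyword list; re.escape keeps any special characters literal.
fuel_pat = "|".join(map(re.escape, fuel_words))
info_pat = "|".join(map(re.escape, info_words))

df = pd.DataFrame({
    "product_description": ["EXXON S-10", "OIL", "NOTEBOOK IDEAPAD (1162)", "KEYBOARD"],
})
desc = df["product_description"]

# np.select picks the first true condition per row, so INFO takes priority
# over FUEL, mirroring the if/elif order in foo_func.
df["desired_class"] = np.select(
    [desc.str.contains(info_pat, regex=True), desc.str.contains(fuel_pat, regex=True)],
    ["INFO", "FUEL"],
    default="UNKNOWN",  # placeholder label for rows matching neither list
)
print(df)
```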
My question
This code only checks one column, product_description.
How can I adapt the function so that it analyzes two columns?
Expected Output
|   | ncm_code | product_description | desired_output |
|---|---|---|---|
| 0 | 27101932 | EXXON S-10 | FUEL |
| 1 | 27101932 | LUBRAX S-10 | FUEL |
| 2 | 27101932 | OIL | FUEL |
| 3 | 36101932 | OIL | MINERAL OIL |
| 4 | 84713012 | NOTEBOOK IDEAPAD (1162) | INFO |
| 5 | 84713012 | WI-FI ACCESS POINT 6 | INFO |
| 6 | 84713012 | ETHERNET ROUTER MODEL 000123 | INFO |
The problem
The row at index 3 should be ‘MINERAL OIL’ because of its ncm_code, but my code never checks ncm_code.
I want to improve my code so that it also checks ncm_code when computing desired_output.
Thanks.
Solution:
Thanks to @Laurent
def foo_func(x, y):
    # x is product_description, y is ncm_code
    fuel_words = ['LUBRAX', 'EXXON', 'GASOLINE', 'PETROL', 'OIL']
    info_words = ['ACCESS POINT', 'DRIVE CD/DVD', 'DISPLAYPORT', 'NOTEBOOK', 'CPU', 'SWITCH',
                  'MICROCOMP', 'TABLET', 'DDR', 'PATCH PANEL', 'HDMI', 'VGA', 'STORAGE',
                  'ETHERNET', 'FIREWAL', 'CORE I3', 'CORE I5', 'CORE I7', 'COREI3']
    if any(item in x for item in info_words):
        return "INFO"
    if any(item in x for item in fuel_words):
        # same keywords, but this ncm_code marks mineral oil rather than fuel
        return "MINERAL OIL" if y == "36101932" else "FUEL"
    return None  # neither keyword list matched
df["desired_output"] = df.apply(
lambda df_: foo_func(df_["product_description"], df_["ncm_code"]), axis=1
)
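`df.apply(..., axis=1)` builds a Series object for every row, which can get slow on large frames. A common alternative is to zip the two columns and call the same two-argument function in a list comprehension. A sketch; `foo_func` is redefined here with abbreviated keyword lists so the block runs on its own.

```python
import pandas as pd


def foo_func(x, y):
    # Abbreviated keyword lists; x is product_description, y is ncm_code.
    fuel_words = ["LUBRAX", "EXXON", "OIL"]
    info_words = ["ACCESS POINT", "NOTEBOOK", "ETHERNET"]
    if any(item in x for item in info_words):
        return "INFO"
    if any(item in x for item in fuel_words):
        return "MINERAL OIL" if y == "36101932" else "FUEL"
    return None


df = pd.DataFrame({
    "ncm_code": ["27101932", "36101932", "84713012"],
    "product_description": ["LUBRAX S-10", "OIL", "WI-FI ACCESS POINT 6"],
})

# zip yields (description, code) pairs without building a Series per row.
df["desired_output"] = [
    foo_func(desc, code)
    for desc, code in zip(df["product_description"], df["ncm_code"])
]
print(df)
```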
Answers:
Here is one way to modify your function:
def foo_func(x, y):
    # x is product_description, y is ncm_code
    fuel_words = ['LUBRAX', 'EXXON', 'GASOLINE', 'PETROL', 'OIL']
    info_words = ['ACCESS POINT', 'DRIVE CD/DVD', 'DISPLAYPORT', 'NOTEBOOK', 'CPU', 'SWITCH',
                  'MICROCOMP', 'TABLET', 'DDR', 'PATCH PANEL', 'HDMI', 'VGA', 'STORAGE',
                  'ETHERNET', 'FIREWAL', 'CORE I3', 'CORE I5', 'CORE I7', 'COREI3']
    if any(item in x for item in info_words):
        return "INFO"
    if any(item in x for item in fuel_words):
        # same keywords, but this ncm_code marks mineral oil rather than fuel
        return "MINERAL OIL" if y == "36101932" else "FUEL"
    return None  # neither keyword list matched
And then:
df["desired_output"] = df.apply(
lambda df_: foo_func(df_["product_description"], df_["ncm_code"]), axis=1
)
print(df)
# Output
ncm_code product_description desired_output
0 27101932 EXXON S-10 FUEL
1 27101932 LUBRAX S-10 FUEL
2 27101932 OIL FUEL
3 36101932 OIL MINERAL OIL
4 84713012 NOTEBOOK IDEAPAD (1162) INFO
5 84713012 WI-FI ACCESS POINT 6 INFO
6 84713012 ETHERNET ROUTER MODEL 000123 INFO
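The same two-column logic can also be written without a Python-level loop, by building boolean masks from both columns and combining them in `numpy.select`. A sketch, assuming keywords match as literal substrings; the keyword lists are abbreviated, `"UNKNOWN"` is a placeholder default chosen here, and the `astype(str)` call is a precaution in case ncm_code was read as integers (for example from a CSV).

```python
import re

import numpy as np
import pandas as pd

foo_data = {
    "ncm_code": ["27101932", "27101932", "27101932", "36101932",
                 "84713012", "84713012", "84713012"],
    "product_description": ["EXXON S-10", "LUBRAX S-10", "OIL", "OIL",
                            "NOTEBOOK IDEAPAD (1162)", "WI-FI ACCESS POINT 6",
                            "ETHERNET ROUTER MODEL 000123"],
}
df = pd.DataFrame(foo_data)

fuel_words = ["LUBRAX", "EXXON", "GASOLINE", "PETROL", "OIL"]
info_words = ["ACCESS POINT", "NOTEBOOK", "ETHERNET", "SWITCH", "TABLET"]

desc = df["product_description"]
code = df["ncm_code"].astype(str)  # compare codes as strings either way

is_info = desc.str.contains("|".join(map(re.escape, info_words)))
is_fuel = desc.str.contains("|".join(map(re.escape, fuel_words)))

# Order matters: INFO first, then the code-specific MINERAL OIL rule, then FUEL.
df["desired_output"] = np.select(
    [is_info, is_fuel & (code == "36101932"), is_fuel],
    ["INFO", "MINERAL OIL", "FUEL"],
    default="UNKNOWN",
)
print(df)
```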