Python: Lookup based on 4 conditions using conditional_join
Question:
Hi, I want to look up the Factor value for each row of my dataset based on three conditions.
Below is the lookup table:
import pandas as pd

Lookup_Table = {
    'State_Cd': ['TX','TX','TX','TX','CA','CA','CA','CA'],
    'Deductible': [0,0,1000,1000,0,0,1000,1000],
    'Revenue_1': [-99999999,25000000,-99999999,25000000,-99999999,25000000,-99999999,25000000],
    'Revenue_2': [24999999,99000000,24999999,99000000,24999999,99000000,24999999,99000000],
    'Factor': [0.15,0.25,0.2,0.3,0.11,0.15,0.13,0.45]
}
Lookup_Table = pd.DataFrame(Lookup_Table, columns = ['State_Cd','Deductible','Revenue_1','Revenue_2','Factor'])
Lookup output:
Lookup_Table
  State_Cd  Deductible  Revenue_1  Revenue_2  Factor
0       TX           0  -99999999   24999999    0.15
1       TX           0   25000000   99000000    0.25
2       TX        1000  -99999999   24999999    0.20
3       TX        1000   25000000   99000000    0.30
4       CA           0  -99999999   24999999    0.11
5       CA           0   25000000   99000000    0.15
6       CA        1000  -99999999   24999999    0.13
7       CA        1000   25000000   99000000    0.45
And below is my dataset:
Dataset = {
    'Policy': ['A','B','C'],
    'State': ['CA','TX','TX'],
    'Deductible': [0,1000,0],
    'Revenue': [1500000,30000000,1000000]
}
Dataset = pd.DataFrame(Dataset, columns = ['Policy','State','Deductible','Revenue'])
Dataset output:
Dataset
  Policy State  Deductible   Revenue
0      A    CA           0   1500000
1      B    TX        1000  30000000
2      C    TX           0   1000000
So basically, for the lookup, State must match State_Cd in the lookup table, Deductible must match Deductible in the lookup table, and Revenue must fall between Revenue_1 and Revenue_2, inclusive (Revenue_1 <= Revenue <= Revenue_2). The row that satisfies all three gives the desired Factor value.
Below is my expected output after getting the Factor:
  Policy State  Deductible   Revenue  Factor
0      A    CA           0   1500000    0.11
1      B    TX        1000  30000000    0.30
2      C    TX           0   1000000    0.15
I'm trying conditional_join from the janitor package, but I'm getting an error. Is something missing in my code?
import janitor

Data_Final = (Dataset.conditional_join(
    Lookup_Table,
    # variable arguments
    # tuple is of the form:
    # col_from_left_df, col_from_right_df, comparator
    ('Revenue', 'Revenue_1', '>='),
    ('Revenue', 'Revenue_2', '<='),
    ('State', 'State_Cd', '=='),
    ('Deductible', 'Deductible', '=='),
    how = 'left', sort_by_appearance = False
))
Below is the error:
TypeError: __init__() got an unexpected keyword argument 'copy'
Answers:
This would be one approach:
factors = list()
for index, row in Dataset.iterrows():
    revenue = row["Revenue"]
    # Exact-match conditions: state and deductible
    mask1 = Lookup_Table["State_Cd"] == row["State"]
    mask2 = Lookup_Table["Deductible"] == row["Deductible"]
    selection = Lookup_Table[mask1 & mask2]
    # Range condition: Revenue_1 <= revenue <= Revenue_2
    mask3 = selection["Revenue_1"] <= revenue
    mask4 = revenue <= selection["Revenue_2"]
    result = selection[mask3 & mask4]
    # Takes the first matching row; raises IndexError if nothing matches
    factors.append(result["Factor"].values[0])
Dataset["Factor"] = factors
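The same lookup can probably also be done without a Python-level loop, using only pandas: merge on the two exact-match columns, then keep the rows whose Revenue falls inside the band. This is only a sketch, not a fix for the conditional_join error itself. It assumes Policy uniquely identifies each row of the dataset and that the revenue bands for a given State/Deductible pair do not overlap; the names lookup, dataset, candidates, in_band, matched and result are made up for this example.

```python
import pandas as pd

lookup = pd.DataFrame({
    "State_Cd": ["TX", "TX", "TX", "TX", "CA", "CA", "CA", "CA"],
    "Deductible": [0, 0, 1000, 1000, 0, 0, 1000, 1000],
    "Revenue_1": [-99999999, 25000000] * 4,
    "Revenue_2": [24999999, 99000000] * 4,
    "Factor": [0.15, 0.25, 0.2, 0.3, 0.11, 0.15, 0.13, 0.45],
})
dataset = pd.DataFrame({
    "Policy": ["A", "B", "C"],
    "State": ["CA", "TX", "TX"],
    "Deductible": [0, 1000, 0],
    "Revenue": [1500000, 30000000, 1000000],
})

# Step 1: exact-match conditions. Rename State_Cd so both frames share key names,
# then pair every policy with all revenue bands for its State/Deductible.
candidates = dataset.merge(
    lookup.rename(columns={"State_Cd": "State"}),
    on=["State", "Deductible"],
    how="left",
)

# Step 2: range condition, inclusive on both ends (Revenue_1 <= Revenue <= Revenue_2).
in_band = candidates["Revenue"].between(candidates["Revenue_1"], candidates["Revenue_2"])
matched = candidates.loc[in_band, ["Policy", "Factor"]]

# Step 3: left-merge back so a policy with no matching band gets NaN
# instead of being dropped (assumes Policy is unique).
result = dataset.merge(matched, on="Policy", how="left")
print(result)
```

The intermediate merge holds one row per policy per candidate band, so on a large lookup table it can use noticeably more memory than the final result.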