Python: Lookup based on 4 conditions using conditional_join

Question:

Hi, I want to look up the factor value for my dataset based on three conditions: state, deductible, and revenue range (four comparisons in total, since the range has two bounds).
Below is the lookup table:

import pandas as pd

Lookup_Table = {'State_Cd': ['TX','TX','TX','TX','CA','CA','CA','CA'],
                'Deductible': [0,0,1000,1000,0,0,1000,1000],
                'Revenue_1': [-99999999,25000000,-99999999,25000000,-99999999,25000000,-99999999,25000000],
                'Revenue_2': [24999999,99000000,24999999,99000000,24999999,99000000,24999999,99000000],
                'Factor': [0.15,0.25,0.2,0.3,0.11,0.15,0.13,0.45]
               }
Lookup_Table = pd.DataFrame(Lookup_Table, columns = ['State_Cd','Deductible','Revenue_1','Revenue_2','Factor'])

lookup output:

Lookup_Table
State_Cd    Deductible  Revenue_1   Revenue_2   Factor
0   TX             0    -99999999   24999999    0.15
1   TX             0    25000000    99000000    0.25
2   TX          1000    -99999999   24999999    0.20
3   TX          1000    25000000    99000000    0.30
4   CA             0    -99999999   24999999    0.11
5   CA             0    25000000    99000000    0.15
6   CA          1000    -99999999   24999999    0.13
7   CA          1000    25000000    99000000    0.45
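Every lookup in this thread assumes that, within one (State_Cd, Deductible) pair, at most one revenue band can contain a given Revenue. A minimal sketch for checking that assumption on the table above, using plain pandas (`bands_are_disjoint` is a hypothetical helper name, not part of any library):

```python
import pandas as pd

lookup = pd.DataFrame({
    "State_Cd": ["TX", "TX", "TX", "TX", "CA", "CA", "CA", "CA"],
    "Deductible": [0, 0, 1000, 1000, 0, 0, 1000, 1000],
    "Revenue_1": [-99999999, 25000000] * 4,
    "Revenue_2": [24999999, 99000000] * 4,
    "Factor": [0.15, 0.25, 0.2, 0.3, 0.11, 0.15, 0.13, 0.45],
})


def bands_are_disjoint(table, keys=("State_Cd", "Deductible")):
    """Hypothetical helper: True if, within each key group, every band
    starts strictly after the previous band ends."""
    keys = list(keys)
    ordered = table.sort_values(keys + ["Revenue_1"])
    # Upper bound of the previous band in the same group (NaN for the first band).
    prev_upper = ordered.groupby(keys)["Revenue_2"].shift()
    ok = prev_upper.isna() | (ordered["Revenue_1"] > prev_upper)
    return bool(ok.all())


print(bands_are_disjoint(lookup))  # True
```

If this returns False, a single policy can match several rows, and any approach that takes "the" matching factor has to decide which one wins.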

Below is my dataset:

Dataset = {'Policy': ['A','B','C'],
           'State': ['CA','TX','TX'],
           'Deductible': [0,1000,0],
           'Revenue': [1500000,30000000,1000000]
          }
Dataset = pd.DataFrame(Dataset, columns = ['Policy','State','Deductible','Revenue'])

Dataset output:

Dataset
Policy  State   Deductible  Revenue
0   A   CA         0       1500000
1   B   TX         1000    30000000
2   C   TX         0       1000000

So, for the lookup, State must match State_Cd in the lookup table, Deductible must match Deductible in the lookup table, and Revenue must fall between Revenue_1 and Revenue_2 inclusive (Revenue_1 <= Revenue <= Revenue_2). The row that satisfies all three conditions gives the desired Factor.
Below is my expected output with the Factor:

Policy  State   Deductible  Revenue     Factor
0   A   CA         0       1500000     0.11
1   B   TX         1000    30000000    0.30
2   C   TX         0       1000000     0.15
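For comparison, the same rule can be expressed without pyjanitor. This is a sketch using plain pandas (a swapped-in technique, not the `conditional_join` API): an equality merge on state and deductible, then an inclusive `Series.between` filter on revenue. It rebuilds both tables so it runs on its own, taking policy A's revenue as 1500000 as in the printed output; `lookup_factor` is a hypothetical helper name, and it assumes at most one band matches each policy.

```python
import pandas as pd

lookup = pd.DataFrame({
    "State_Cd": ["TX", "TX", "TX", "TX", "CA", "CA", "CA", "CA"],
    "Deductible": [0, 0, 1000, 1000, 0, 0, 1000, 1000],
    "Revenue_1": [-99999999, 25000000] * 4,
    "Revenue_2": [24999999, 99000000] * 4,
    "Factor": [0.15, 0.25, 0.2, 0.3, 0.11, 0.15, 0.13, 0.45],
})
dataset = pd.DataFrame({
    "Policy": ["A", "B", "C"],
    "State": ["CA", "TX", "TX"],
    "Deductible": [0, 1000, 0],
    "Revenue": [1500000, 30000000, 1000000],
})


def lookup_factor(data, table):
    """Hypothetical helper: attach Factor to each row of `data`."""
    # 1) Equality conditions: one row per candidate band for each policy.
    candidates = data.reset_index().merge(
        table.rename(columns={"State_Cd": "State"}),
        on=["State", "Deductible"],
        how="left",
    )
    # 2) Range condition, inclusive on both ends (Revenue_1 <= Revenue <= Revenue_2).
    in_band = candidates["Revenue"].between(candidates["Revenue_1"], candidates["Revenue_2"])
    # Assumes disjoint bands; duplicate matches would make reindex below raise.
    matches = candidates.loc[in_band].set_index("index")["Factor"]
    # 3) Attach back by original row label; policies with no band get NaN.
    out = data.copy()
    out["Factor"] = matches.reindex(data.index)
    return out


print(lookup_factor(dataset, lookup))
```

The merge produces one row per candidate band, so memory grows with the number of bands per (state, deductible) pair; for a table this size that is negligible.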

I'm trying conditional_join from the janitor (pyjanitor) package, but I'm getting an error. Is something missing from my code?

import janitor  # importing janitor registers conditional_join as a DataFrame method

Data_Final = Dataset.conditional_join(
    Lookup_Table,
    # variable arguments
    # tuple is of the form:
    # col_from_left_df, col_from_right_df, comparator
    ('Revenue', 'Revenue_1', '>='),
    ('Revenue', 'Revenue_2', '<='),
    ('State', 'State_Cd', '=='),
    ('Deductible', 'Deductible', '=='),
    how='left',
    sort_by_appearance=False,
)

Below is the error:

TypeError: __init__() got an unexpected keyword argument 'copy'
Asked By: Bustergun
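A `TypeError` about an unexpected keyword argument raised from inside a library call often means two installed packages (here possibly pandas and pyjanitor) are at versions that disagree on a function signature. That is an assumption, not a confirmed diagnosis of this traceback. A reasonable first step is to record the installed versions; a minimal sketch using only the standard library (`installed_versions` is a hypothetical helper; `pyjanitor` is the distribution name under which the `janitor` module is installed):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Hypothetical helper: map each distribution name to its installed
    version string, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions(["pandas", "pyjanitor"]))
```

Comparing those versions against each package's release notes is usually quicker than reading the traceback line by line.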


Answers:

This would be one approach:

factors = []

for _, row in Dataset.iterrows():
    revenue = row["Revenue"]
    # Keep only the lookup rows for this policy's state and deductible
    mask1 = Lookup_Table["State_Cd"] == row["State"]
    mask2 = Lookup_Table["Deductible"] == row["Deductible"]
    selection = Lookup_Table[mask1 & mask2]
    # Of those, keep the band with Revenue_1 <= revenue <= Revenue_2
    mask3 = selection["Revenue_1"] <= revenue
    mask4 = revenue <= selection["Revenue_2"]
    result = selection[mask3 & mask4]
    # Take the first matching factor (raises IndexError if no band matches)
    factors.append(result["Factor"].iloc[0])

Dataset["Factor"] = factors
Answered By: Eelco van Vliet
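The loop above filters the lookup table once per policy, which is fine for a few rows but slows down as the dataset grows. A vectorized alternative, sketched here with `pandas.merge_asof` (a different technique from both the loop and `conditional_join`), matches each policy to the band with the largest Revenue_1 not above its Revenue within the same state and deductible, then discards matches that exceed Revenue_2. It assumes the bands are disjoint; `asof_factor` is a hypothetical helper name, and the tables are rebuilt so the block runs on its own.

```python
import numpy as np
import pandas as pd

lookup = pd.DataFrame({
    "State_Cd": ["TX", "TX", "TX", "TX", "CA", "CA", "CA", "CA"],
    "Deductible": [0, 0, 1000, 1000, 0, 0, 1000, 1000],
    "Revenue_1": [-99999999, 25000000] * 4,
    "Revenue_2": [24999999, 99000000] * 4,
    "Factor": [0.15, 0.25, 0.2, 0.3, 0.11, 0.15, 0.13, 0.45],
})
dataset = pd.DataFrame({
    "Policy": ["A", "B", "C"],
    "State": ["CA", "TX", "TX"],
    "Deductible": [0, 1000, 0],
    "Revenue": [1500000, 30000000, 1000000],
})


def asof_factor(data, table):
    """Hypothetical helper: band lookup via merge_asof."""
    # merge_asof requires both sides sorted by their "on" keys.
    left = data.reset_index().sort_values("Revenue")
    right = table.rename(columns={"State_Cd": "State"}).sort_values("Revenue_1")
    # For each policy, take the last band in the same (State, Deductible)
    # group whose Revenue_1 is <= Revenue.
    matched = pd.merge_asof(
        left,
        right,
        left_on="Revenue",
        right_on="Revenue_1",
        by=["State", "Deductible"],
        direction="backward",
    )
    # The lower bound alone is not enough: drop factors where Revenue > Revenue_2.
    matched.loc[matched["Revenue"] > matched["Revenue_2"], "Factor"] = np.nan
    # Restore the original row order and labels.
    factors = matched.set_index("index")["Factor"]
    out = data.copy()
    out["Factor"] = factors.reindex(data.index)
    return out


print(asof_factor(dataset, lookup))
```

Policies that fall outside every band get NaN instead of raising, unlike `.iloc[0]` in the loop.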