Comparing Pandas DataFrame rows against two threshold values

Question:

I have two DataFrames shown below. The DataFrames in reality are larger than the sample below.

df1
     route_no    cost_h1    cost_h2    cost_h3   cost_h4   cost_h5   max   min   location
0    0010          20         22         21         23       26       26    20    NY
1    0011          30         25         23         31       33       33    23    CA
2    0012          67         68         68         69       65       69    67    GA
3    0013          34         33         31         30       35       35    31    MO
4    0014          44         42         40         39       50       50    39    WA
df2
    route_no    cost_h1    cost_h2    cost_h3   cost_h4   cost_h5    location 
0    0020          19         27         21         24       20         NY
1    0021          31         22         23         30       33         CA
2    0023          66         67         68         70       65         GA
3    0022          34         33         31         30       35         MO
4    0025          41         42         40         39       50         WA
5    0030          19         26         20         24       20         NY
6    0032          37         31         31         20       35         MO
7    0034          40         41         39         39       50         WA

The idea is to compare each row of df2 against the appropriate max and min value specified in df1. The threshold value to be compared depends on the match in the location column. If any of the row values are outside the range defined by min and max value, they will be put in a separate dataframe. Please note the number of cost segments are vary.

Asked By: S_Scouse

||

Answers:

Solution

# Merge the dataframes on location to append the min/max columns to df2
df3 = df2.merge(df1[['location', 'max', 'min']], on='location', how='left')

# select the cost like columns
cost = df3.filter(like='cost')

# Check whether the cost values satisfy the interval condition
mask = cost.ge(df3['min'], axis=0) & cost.le(df3['max'], axis=0)

# filter the rows where one or more values in row do not satisfy the condition
df4 = df2[~mask.all(axis=1)]

Result

print(df4)

  route_no  cost_h1  cost_h2  cost_h3  cost_h4  cost_h5 location
0     0020       19       27       21       24       20       NY
1     0021       31       22       23       30       33       CA
2     0023       66       67       68       70       65       GA
3     0022       34       33       31       30       35       MO
5     0030       19       26       20       24       20       NY
6     0032       37       31       31       20       35       MO
Answered By: Shubham Sharma
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.