Applying conditions to dataframe based on columns specified in a list

Question:

I have the sample df as follows:

df = pd.DataFrame({'weight':[10,20,30,40,50],
                'speed':[100,120,140,160,180],
                'distance':[1000,1100,1200,1300,1400],
                'cat':['Y','N','N','N','Y']})

And I am applying the conditions to my df based on the values given below in the following way.

speed_margin = 160
weight_margin = 20
distance_margin = 1300
category = 'N'

conditions = np.where((df['speed'] < speed_margin) & (df['cat'] == category)
                    & (df['weight'] > weight_margin) & (df['distance'] < distance_margin))
df1 = df.loc[conditions]

However, the columns for the conditions are not always the same; they are supplied by the user in the form of a list. For example, if:

conditions_list = ['speed', 'distance', 'cat']

I need to automate the above conditions code to include only the columns supplied by the user in conditions_list. In this case, since there are only 3 column names in conditions_list (the weight column is missing), the conditions must look like:

conditions = np.where((df['speed'] < speed_margin) & (df['cat'] == category)
                    & (df['distance'] < distance_margin))

If conditions_list were:

conditions_list = ['speed']

Then, conditions must be:

conditions = np.where((df['speed'] < speed_margin))

How can I make sure the conditions are applied only to the columns that are supplied in the list by the user?

Asked By: serdar_bay


Answers:

One way would be to define each condition as a lambda to be applied to its column, use pd.DataFrame.transform to evaluate each requested condition, and finally aggregate with DataFrame.all:

import pandas as pd

df = pd.DataFrame({'weight':[10,20,30,40,50],
                'speed':[100,120,140,160,180],
                'distance':[1000,1100,1200,1300,1400],
                'cat':['Y','N','N','N','Y']})

speed_margin = 160
weight_margin = 20
distance_margin = 1300
category = 'N'

conditions_list = ['speed', 'distance', 'cat']

funcs = {
    "speed":    lambda x: x < speed_margin,
    "cat":      lambda x: x == category,
    "weight":   lambda x: x > weight_margin,
    "distance": lambda x: x < distance_margin,
}

conditions = df.transform({col: funcs[col] for col in conditions_list}).all(axis=1)

out = df.loc[conditions]

out:

   weight  speed  distance cat
1      20    120      1100   N
2      30    140      1200   N
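An equivalent formulation without transform, for readers who prefer working with plain boolean masks: AND together only the per-column masks for the requested columns using functools.reduce. This is my alternative sketch of the same technique, not part of the original answer:

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({'weight': [10, 20, 30, 40, 50],
                   'speed': [100, 120, 140, 160, 180],
                   'distance': [1000, 1100, 1200, 1300, 1400],
                   'cat': ['Y', 'N', 'N', 'N', 'Y']})

# per-column boolean masks, same conditions as above
funcs = {
    'speed':    lambda x: x < 160,
    'cat':      lambda x: x == 'N',
    'weight':   lambda x: x > 20,
    'distance': lambda x: x < 1300,
}

conditions_list = ['speed', 'distance', 'cat']

# combine only the masks for the user-supplied columns with elementwise AND
mask = reduce(operator.and_, (funcs[col](df[col]) for col in conditions_list))
out = df[mask]
```

This yields the same two rows as the transform approach, and degrades gracefully to a single mask when conditions_list has one entry.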

PS: if you use np.where, which returns integer positions rather than labels, it is safer to use iloc instead of loc; better still, omit np.where entirely and index with the boolean mask directly, as above.
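To illustrate that PS: with a non-integer index, the positional indices returned by np.where only line up with iloc, while the boolean mask works with loc directly. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'speed': [100, 120, 140]}, index=['a', 'b', 'c'])

# np.where on a mask returns *positional* indices, not index labels
pos = np.where(df['speed'] < 140)[0]   # array([0, 1])

# positional indices must go through iloc
# (df.loc[pos] would raise a KeyError here, since the labels are 'a', 'b', 'c')
subset = df.iloc[pos]

# the boolean mask itself works directly with loc (or plain [])
mask = df['speed'] < 140
same = df.loc[mask]
```

Both subset and same select rows 'a' and 'b'; the mask version just skips the detour through integer positions.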

Answered By: Chrysophylaxs

Make a condition mapping which maps columns to the needed sub-queries/conditions. That allows a quick DataFrame query on the requested column list:

cond_map = {'speed': 'speed < 160',
            'weight': 'weight > 20',
            'distance': 'distance < 1300',
            'cat': 'cat == "N"'}

df_ = df.query(' and '.join(cond_map[c] for c in cond_list))

Case #1:

cond_list = ['speed', 'distance', 'cat']
df_ = df.query(' and '.join(cond_map[c] for c in cond_list))
print(df_)

   weight  speed  distance cat
1      20    120      1100   N
2      30    140      1200   N

Case #2:

cond_list = ['speed']
df_ = df.query(' and '.join(cond_map[c] for c in cond_list))
print(df_)

   weight  speed  distance cat
0      10    100      1000   Y
1      20    120      1100   N
2      30    140      1200   N
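If the margins should stay in Python variables rather than being hard-coded into the query strings, the same mapping can reference them with the @ prefix that DataFrame.query resolves from the enclosing scope. A sketch of that variation, building on this answer:

```python
import pandas as pd

df = pd.DataFrame({'weight': [10, 20, 30, 40, 50],
                   'speed': [100, 120, 140, 160, 180],
                   'distance': [1000, 1100, 1200, 1300, 1400],
                   'cat': ['Y', 'N', 'N', 'N', 'Y']})

speed_margin = 160
weight_margin = 20
distance_margin = 1300
category = 'N'

# query() resolves @-prefixed names from the local scope,
# so the margins remain ordinary Python variables
cond_map = {'speed': 'speed < @speed_margin',
            'weight': 'weight > @weight_margin',
            'distance': 'distance < @distance_margin',
            'cat': 'cat == @category'}

cond_list = ['speed', 'distance', 'cat']
df_ = df.query(' and '.join(cond_map[c] for c in cond_list))
```

This way, changing a margin only touches the variable, not the query string.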
Answered By: RomanPerekhrest