How to create a new column in a Pandas DataFrame based on conditions on two existing columns, with multiple and/or operators in each condition?


I am working on improving some existing Python code that uses NumPy and Pandas. I am still learning, so I need some help with this scenario. The existing code is very verbose, and I imagine there is a way to trim down the amount of code while also making the script more efficient.

The dataset I am working with needs a column created to determine if accounting transactions should be "posted" or "not posted" based on combinations of values from columns "condition_code" and "trans_type". Below is a mockup of what the existing code generally looks like.

conditions = [df['trans_type'].eq('D67'), 
  (df['condition_code'].eq('H')) & 
  (df['trans_type'].eq('D4S') | df['trans_type'].eq('D4U') | ... 
  many more .eq statements),
  other conditions in the same format...]

This code works as is, but each condition contains many or statements, resulting in almost 200 lines of code.

My intent was to use the in operator with a list of trans_type values, but the second excerpt of code does not work because I’m returning the whole column of values instead of iterating through each row. Any advice or help will be much appreciated. I would love for this code to be easier to read.

conditions = [df['trans_type'] == 'D67', 
  (df['condition_code'] in ['A', 'H']) & (df['trans_type'] in 
  ['D4S', 'D4U', 'D4V', ...]), 
  more similar conditions...]

I know now why my approach does not work, but have no clue how to tackle this now. Any advice, recommendations, or other methods I should look into?

Asked By: X8bitReignbeaux



pandas has a built-in isin method for this:

conditions = (
    df["condition_code"].eq("H")
    & df["trans_type"].isin(["D4S", "D4U"])
    # & ... other conditions
)

Answered By: Code Different
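
For reference, here is a minimal sketch of how isin could slot into the list-of-conditions layout from the question. It assumes the conditions list is passed to np.select, which the question does not actually state. The sample rows, the "status" column name, and the two rules are hypothetical and only illustrate the pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the codes appear in the question,
# but these rows and the rules below are made up for illustration.
df = pd.DataFrame({
    "condition_code": ["A", "H", "B", "H", "A"],
    "trans_type": ["D67", "D4S", "D4U", "XYZ", "D4V"],
})

# Each entry is a boolean Series; isin replaces a chain of .eq(...) | .eq(...)
conditions = [
    df["trans_type"].eq("D67"),
    df["condition_code"].isin(["A", "H"])
    & df["trans_type"].isin(["D4S", "D4U", "D4V"]),
]

# One label per condition; rows matching no condition get the default.
choices = ["posted", "posted"]

# Assumption: np.select is what consumes the conditions list.
df["status"] = np.select(conditions, choices, default="not posted")
print(df)
```

Unlike the in operator, which tries to reduce the whole column to a single True/False, isin tests each row separately and returns a boolean Series, so it combines with & and | in the same way as the .eq calls.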