How can I count in my pandas dataframe how many times the data column has reached the threshold, with different threshold conditions?

Question:

As described in the title, I want to count, in a new column, how often the data values in each row reach the threshold. There are exactly two possible conditions for meeting the threshold: a value is either greater than or equal to the threshold, or less than or equal to it.

My pandas DataFrame is structured like this:

Number Metric  Threshold   Data1    Data2   Data3
  1     'M1'      10         12       8       11
  2     'M2'      20         20       14       1
  3     'M3'      30         50       22      44
  4     'M4'      40         39       28      41

In this case, the condition for metrics M1 and M2 is that the data is greater than or equal to the threshold (Data >= Threshold), and for metrics M3 and M4 that the data is less than or equal to the threshold (Data <= Threshold).

With these conditions, the result should look like this:

Number Metric  Threshold   Data1    Data2   Data3  Threshold_Met_Count
  1     'M1'     10         12       8       11        2
  2     'M2'     20         20       14       1        1
  3     'M3'     30         50       22      44        1
  4     'M4'     40         39       28      41        2

I put my conditions into a dictionary, then tried to iterate over it and compare each data column with the threshold column, but I could not get it to work.

EDIT:

I used the solution from @mozway:

    mask1 = df['Metric'].isin(['M1', 'M2'])
    mask2 = df['Metric'].isin(['M3', 'M4'])

    df2 = df.filter(like='Data')
    df['Threshold_Met_Count'] = (
        df2[mask1].ge(df['Threshold'], axis=0).sum(axis=1)
        + df2[mask2].le(df['Threshold'], axis=0).sum(axis=1)
    )

I get this error message:

TypeError: '>=' not supported between instances of 'float' and 'str'

The problem was that my metric column contains string data while my threshold column contains float data. I changed the code as follows, which fits my use case:

    mask1 = df['Number'].isin([1, 2])
    mask2 = df['Number'].isin([3, 4])

    df2 = df.filter(like='Data').astype(float)
    df['Threshold_Met_Count'] = (
        df2[mask1].ge(df['Threshold'], axis=0).sum(axis=1)
        + df2[mask2].le(df['Threshold'], axis=0).sum(axis=1)
    )
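
A dtype mismatch like this can usually be spotted by inspecting `df.dtypes` before comparing. The following is only a minimal sketch, assuming the data columns were loaded as strings; the question does not show the real dtypes, so the frame below is hypothetical. It uses `pd.to_numeric` with `errors="coerce"` as one possible alternative to `astype(float)`.

```python
import pandas as pd

# Hypothetical frame: the data columns are assumed to have been read in as strings
df = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "Metric": ["M1", "M2", "M3", "M4"],
    "Threshold": [10.0, 20.0, 30.0, 40.0],
    "Data1": ["12", "20", "50", "39"],
    "Data2": ["8", "14", "22", "28"],
    "Data3": ["11", "1", "44", "41"],
})

# Columns shown as object (or a string dtype) cannot be compared with floats
print(df.dtypes)

# Convert the data columns; values that cannot be parsed become NaN
data = df.filter(like="Data").apply(pd.to_numeric, errors="coerce")

mask_ge = df["Metric"].isin(["M1", "M2"])
mask_le = df["Metric"].isin(["M3", "M4"])
df["Threshold_Met_Count"] = (
    data[mask_ge].ge(df["Threshold"], axis=0).sum(axis=1)
    + data[mask_le].le(df["Threshold"], axis=0).sum(axis=1)
)
```

With `errors="coerce"`, unparseable cells turn into NaN, and NaN never satisfies `>=` or `<=`, so such cells are simply not counted.
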
Asked By: newbie

Answers:

# Define the condition for each metric as a dictionary
conditions = {'M1': '>=', 'M2': '>=', 'M3': '<=', 'M4': '<='}

# The columns whose values are compared against the threshold
data_cols = ['Data1', 'Data2', 'Data3']

# Define a function to count how many data values in a row meet the row's condition
def count_threshold_met(row):
    # Look up the condition by the value in the 'Metric' column
    condition = conditions[row['Metric']]
    count = 0
    for col in data_cols:
        if condition == '>=':
            if row[col] >= row['Threshold']:
                count += 1
        elif condition == '<=':
            if row[col] <= row['Threshold']:
                count += 1
    return count

# Apply the function to each row of the DataFrame to create a new column
df['Threshold_Met_Count'] = df.apply(count_threshold_met, axis=1)
Answered By: Khang Nguyen

You can use multiple conditions (with isin) and sum:

mask1 = df['Metric'].isin(['M1', 'M2'])
mask2 = df['Metric'].isin(['M3', 'M4'])

df2 = df.filter(like='Data')
df['Threshold_Met_Count'] =  (
   df2[mask1].ge(df['Threshold'], axis=0).sum(axis=1)
  +df2[mask2].le(df['Threshold'], axis=0).sum(axis=1)
 )
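
For a self-contained check of the masks approach, the sketch below rebuilds the sample frame from the question and applies the same two masks. The construction code is an assumption, since the question only shows the printed table, and `ge_counts` and `le_counts` are illustrative names. It relies on the flexible comparison methods aligning on the index: rows excluded by a mask come back as NaN, compare as False, and contribute 0 to the sum.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "Metric": ["M1", "M2", "M3", "M4"],
    "Threshold": [10, 20, 30, 40],
    "Data1": [12, 20, 50, 39],
    "Data2": [8, 14, 22, 28],
    "Data3": [11, 1, 44, 41],
})

mask1 = df["Metric"].isin(["M1", "M2"])  # rows using data >= threshold
mask2 = df["Metric"].isin(["M3", "M4"])  # rows using data <= threshold
df2 = df.filter(like="Data")

# Each masked comparison is aligned back to the full index of df['Threshold'],
# so both partial counts cover every row and can be added directly
ge_counts = df2[mask1].ge(df["Threshold"], axis=0).sum(axis=1)
le_counts = df2[mask2].le(df["Threshold"], axis=0).sum(axis=1)

df["Threshold_Met_Count"] = ge_counts + le_counts
print(df)
```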

A hackier approach, which needs only a single comparison, is to use a mapping dictionary and numpy.sign:

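import numpy as np

# flag is the sign of (data - threshold) that fails the condition:
# -1 for metrics that need data ≥ threshold, 1 for metrics that need data ≤ threshold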
flag = {'M1': -1, 'M2': -1, 'M3': 1, 'M4': 1}

df['Threshold_Met_Count'] = (
   np.sign(df.filter(like='Data').sub(df['Threshold'], axis=0))
     .ne(df['Metric'].map(flag), axis=0).sum(axis=1)
)

Output:

   Number Metric  Threshold  Data1  Data2  Data3  Threshold_Met_Count
0       1     M1         10     12      8     11                    2
1       2     M2         20     20     14      1                    1
2       3     M3         30     50     22     44                    1
3       4     M4         40     39     28     41                    2
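
To see why the sign trick counts the right cells, the sketch below splits it into intermediate steps on the sample data. The frame construction and the names `sign`, `failing`, and `met` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "Metric": ["M1", "M2", "M3", "M4"],
    "Threshold": [10, 20, 30, 40],
    "Data1": [12, 20, 50, 39],
    "Data2": [8, 14, 22, 28],
    "Data3": [11, 1, 44, 41],
})

# Sign of (data - threshold): 1 if above, 0 if equal, -1 if below the threshold
sign = np.sign(df.filter(like="Data").sub(df["Threshold"], axis=0))

# The single sign that fails each metric's condition
flag = {"M1": -1, "M2": -1, "M3": 1, "M4": 1}
failing = df["Metric"].map(flag)

# A value meets its condition whenever its sign differs from the failing one;
# a sign of 0 (data equal to threshold) therefore passes under both conditions
met = sign.ne(failing, axis=0)
df["Threshold_Met_Count"] = met.sum(axis=1)
print(met)
```

Because equality yields a sign of 0, the single `ne` comparison covers both the "greater than or equal" and the "less than or equal" case.
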
Answered By: mozway