How can I count, for each row of a pandas DataFrame, how many data columns reach the threshold when the threshold condition differs by row?
Question:
As described in the title, I want to add a new column that counts, for each row, how many of the data values reach the threshold. There are exactly two conditions for meeting the threshold: a value is either greater than or equal to it, or less than or equal to it.
My pandas DataFrame is structured like this:
Number Metric Threshold Data1 Data2 Data3
1 'M1' 10 12 8 11
2 'M2' 20 20 14 1
3 'M3' 30 50 22 44
4 'M4' 40 39 28 41
In this case, the condition for metrics M1 and M2 is that the value is greater than or equal to the threshold (Threshold <= value), and for metrics M3 and M4 that the value is less than or equal to the threshold (Threshold >= value).
With these conditions, the result should look like this:
Number Metric Threshold Data1 Data2 Data3 Threshold_Met_Count
1 'M1' 10 12 8 11 2
2 'M2' 20 20 14 1 1
3 'M3' 30 50 22 44 1
4 'M4' 40 39 28 41 2
I stored my conditions in a dictionary and then tried to iterate over it, comparing each data column with the threshold column, but I could not get it to work.
EDIT:
I tried the solution from @mozway:
mask1 = df['Metric'].isin(['M1', 'M2'])
mask2 = df['Metric'].isin(['M3', 'M4'])
df2 = df.filter(like='Data')
df['Threshold_Met_Count'] = (
    df2[mask1].ge(df['Threshold'], axis=0).sum(axis=1)
    + df2[mask2].le(df['Threshold'], axis=0).sum(axis=1)
)
I got this error message:
TypeError: '>=' not supported between instances of 'float' and 'str'
The problem was that my metric column holds strings while my threshold column holds floats. I changed the code as follows, selecting the rows by Number and casting the data columns to float, which works for my use case:
mask1 = df['Number'].isin([1, 2])
mask2 = df['Number'].isin([3, 4])
df2 = df.filter(like='Data').astype(float)
df['Threshold_Met_Count'] = (
    df2[mask1].ge(df['Threshold'], axis=0).sum(axis=1)
    + df2[mask2].le(df['Threshold'], axis=0).sum(axis=1)
)
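An error of this kind typically means that one side of the comparison holds strings. A minimal sketch of how to check for that and convert, assuming (hypothetically) that the Data columns were read in as text; the sample frame below is made up to mirror the table in the question:

```python
import pandas as pd

# Hypothetical frame: the Data columns arrived as strings, Threshold as float.
df = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "Metric": ["M1", "M2", "M3", "M4"],
    "Threshold": [10.0, 20.0, 30.0, 40.0],
    "Data1": ["12", "20", "50", "39"],
    "Data2": ["8", "14", "22", "28"],
    "Data3": ["11", "1", "44", "41"],
})

data_cols = df.filter(like="Data").columns

# A non-numeric dtype (object or string) here signals text values.
print(df[data_cols].dtypes)

# Convert column by column; errors="coerce" turns unparseable entries into NaN
# instead of raising, so bad cells can be inspected afterwards.
df[data_cols] = df[data_cols].apply(pd.to_numeric, errors="coerce")
print(df[data_cols].dtypes)
```

Compared with `astype(float)`, `pd.to_numeric` with `errors="coerce"` does not fail on a stray non-numeric cell, which may or may not be what you want.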
Answers:
# Map each metric to its comparison operator
conditions = {'M1': '>=', 'M2': '>=', 'M3': '<=', 'M4': '<='}
# Columns holding the values to compare against the threshold
data_cols = df.filter(like='Data').columns
# Count how many data values in a row meet that row's condition
def count_threshold_met(row):
    condition = conditions[row['Metric']]
    if condition == '>=':
        return int((row[data_cols] >= row['Threshold']).sum())
    return int((row[data_cols] <= row['Threshold']).sum())
# Apply the function to each row of the DataFrame to create a new column
df['Threshold_Met_Count'] = df.apply(count_threshold_met, axis=1)
You can use multiple conditions (with isin) and sum:
mask1 = df['Metric'].isin(['M1', 'M2'])
mask2 = df['Metric'].isin(['M3', 'M4'])
df2 = df.filter(like='Data')
df['Threshold_Met_Count'] = (
    df2[mask1].ge(df['Threshold'], axis=0).sum(axis=1)
    + df2[mask2].le(df['Threshold'], axis=0).sum(axis=1)
)
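If the conditions already live in a dictionary, the two masks can be derived from it rather than listed by hand. A self-contained sketch of that variant, not part of the answer above: it computes both counts for every row and picks one per row with `np.where`. The `conditions` dictionary and the sample frame are assumptions taken from the question.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question.
df = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "Metric": ["M1", "M2", "M3", "M4"],
    "Threshold": [10, 20, 30, 40],
    "Data1": [12, 20, 50, 39],
    "Data2": [8, 14, 22, 28],
    "Data3": [11, 1, 44, 41],
})

# Assumed mapping from metric name to comparison operator.
conditions = {"M1": ">=", "M2": ">=", "M3": "<=", "M4": "<="}

# Boolean Series: True where the row's condition is "value >= threshold".
uses_ge = df["Metric"].map(conditions).eq(">=")

data = df.filter(like="Data")
ge_count = data.ge(df["Threshold"], axis=0).sum(axis=1)  # values >= threshold
le_count = data.le(df["Threshold"], axis=0).sum(axis=1)  # values <= threshold

# Pick the count that matches each row's condition.
df["Threshold_Met_Count"] = np.where(uses_ge, ge_count, le_count)
print(df)
```

Note that a metric missing from `conditions` would silently fall into the `<=` branch here, so it may be worth validating the mapping first.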
A hackier approach, which needs only a single comparison, is to use a mapping dictionary and numpy.sign:
import numpy as np
# flag holds the sign that fails the condition:
# -1 for metrics that need ≥ threshold, 1 for metrics that need ≤ threshold
flag = {'M1': -1, 'M2': -1, 'M3': 1, 'M4': 1}
df['Threshold_Met_Count'] = (
    np.sign(df.filter(like='Data').sub(df['Threshold'], axis=0))
    .ne(df['Metric'].map(flag), axis=0).sum(axis=1)
)
Output:
Number Metric Threshold Data1 Data2 Data3 Threshold_Met_Count
0 1 M1 10 12 8 11 2
1 2 M2 20 20 14 1 1
2 3 M3 30 50 22 44 1
3 4 M4 40 39 28 41 2
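To make the numpy.sign approach easier to follow, here is the same idea split into intermediate steps. This is only an illustrative sketch on the sample data from the question; the variable names `signs`, `failing_sign`, and `met` are introduced here for readability.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question.
df = pd.DataFrame({
    "Number": [1, 2, 3, 4],
    "Metric": ["M1", "M2", "M3", "M4"],
    "Threshold": [10, 20, 30, 40],
    "Data1": [12, 20, 50, 39],
    "Data2": [8, 14, 22, 28],
    "Data3": [11, 1, 44, 41],
})

flag = {"M1": -1, "M2": -1, "M3": 1, "M4": 1}

# Sign of (value - threshold): -1 below, 0 equal, 1 above the threshold.
signs = np.sign(df.filter(like="Data").sub(df["Threshold"], axis=0))
print(signs)

# Per row, the one sign that means the condition is NOT met:
# -1 (below) for ">=" metrics, 1 (above) for "<=" metrics.
failing_sign = df["Metric"].map(flag)

# Any other sign, including 0 for equality, counts as meeting the condition.
met = signs.ne(failing_sign, axis=0)
df["Threshold_Met_Count"] = met.sum(axis=1)
print(df)
```

Because equality maps to a sign of 0, which never equals either flag, both the "or equal" cases are handled by the single `ne` comparison.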