check if values are between two values pandas

Question:

I have two values that are found in a for loop like so:

for i in range(df_zones.shape[0]):

   filter_max = df_labels[df_labels['Labels'] == i].sort_values(by='level').iloc[-1]
   filter_min = df_labels[df_labels['Labels'] == i].sort_values(by='level').iloc[0]

I have another dataframe with 4 columns of measurements with a timeseries index, like so:

DateTime meas1 meas2 meas3 meas4
2022-1-1 1.1 1.2 1.3 1.1

There are 1000’s of rows of data.

What I am trying to do is add another column, labeled ‘isZone’, that indicates whether any of the values in the row fall between filter_min and filter_max.

DateTime meas1 meas2 meas3 meas4 isZone
2022-1-1 1.1 1.5 1.5 1.7 0
2022-1-2 2.2 1.4 1.5 1.7 0
2022-1-3 3.1 1.2 1.3 1.1 1
2022-1-4 4.1 1.2 1.3 1.1 1
2022-1-5 5.1 1.2 1.3 1.1 1

I have read about the pandas between function, but I can’t figure out how to make this work. Is there a quicker way to do this in numpy? Any guidance would be appreciated.

Asked By: lostinpython


Answers:

You can solve this with apply and pandas’ between:

df_zones['Flag'] = df_zones.apply(lambda x: 1 if x.between(filter_min,filter_max).any() else 0,axis=1)

Alternatively, transpose with .T and use a list comprehension:

df_zones['Flag'] = [1 if df_zones.T[x].between(filter_min, filter_max).any() else 0 for x in df_zones.T]

Or without transposing:

df_zones['Flag'] = [1 if df_zones.loc[x, :].between(filter_min, filter_max).any() else 0 for x in df_zones.index]
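
Since the question also asks about numpy, here is a minimal sketch of the same row-wise check done with array comparisons instead of apply. The frame df_meas, its values, and the scalar bounds are made up for illustration, and it assumes filter_min and filter_max are single numbers rather than whole rows.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the measurement frame; values are illustrative only.
df_meas = pd.DataFrame(
    {
        "meas1": [1.1, 2.2, 3.1],
        "meas2": [1.5, 1.4, 1.2],
        "meas3": [1.5, 1.5, 1.3],
        "meas4": [1.7, 1.7, 1.1],
    },
    index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"]),
)

# Hypothetical scalar bounds; in the question these come from df_labels.
filter_min, filter_max = 2.0, 3.5

# Compare the whole 2-D array at once (inclusive on both ends, like between),
# then reduce across columns to get one flag per row.
values = df_meas.to_numpy()
in_range = (values >= filter_min) & (values <= filter_max)
df_meas["isZone"] = in_range.any(axis=1).astype(int)
```

This avoids calling a Python function once per row, which is usually where apply with axis=1 spends its time.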
Answered By: Celius Stingher

The method above took 30 minutes to compute; the method below finishes in under 2 seconds.

In the end, the best approach was to append the between results for each measurement to its own list, once per zone, and then use a function that combines those results and checks row-wise whether any of them is 1.

```python
import numpy as np
import pandas as pd

def arrayBoolCheck(arrays):
    # arrays holds one row per zone; transpose so each row is a timestamp,
    # then flag the timestamp if it fell inside any zone.
    df = pd.DataFrame(arrays).T
    return (df == 1).any(axis=1).astype(int)

isZone1, isZone2, isZone3, isZone4 = [], [], [], []
for i in range(df_zones.shape[0]):
    # Assumption: the zone bounds are the lowest and highest 'level' for label i.
    levels = df_labels.loc[df_labels['Labels'] == i, 'level'].sort_values()
    filter_min, filter_max = levels.iloc[0], levels.iloc[-1]

    isZone1.append(df_instrument["meas1"].between(filter_min, filter_max, inclusive='both').astype(int).values)
    isZone2.append(df_instrument["meas2"].between(filter_min, filter_max, inclusive='both').astype(int).values)
    isZone3.append(df_instrument["meas3"].between(filter_min, filter_max, inclusive='both').astype(int).values)
    isZone4.append(df_instrument["meas4"].between(filter_min, filter_max, inclusive='both').astype(int).values)

# Zone labels dataframe: one isZone column per measurement
df = pd.DataFrame(
    data=[
        arrayBoolCheck(np.array(isZone1)),
        arrayBoolCheck(np.array(isZone2)),
        arrayBoolCheck(np.array(isZone3)),
        arrayBoolCheck(np.array(isZone4)),
    ],
    index=["isZone1", "isZone2", "isZone3", "isZone4"],
).T
```

I found that keeping the measurements in their respective columns was better for the analysis, but the same function could be used to combine them all into one column if needed.
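
For anyone who wants the single combined column, here is a rough sketch under some assumptions: the toy df_labels and df_instrument frames and the isZone_<column> names are invented for illustration, and each zone is taken to span the minimum to maximum 'level' of its label. It computes the bounds once with groupby, checks every zone at once with numpy broadcasting, and then collapses the per-measurement flags into one column.

```python
import numpy as np
import pandas as pd

# Hypothetical label table: each label's zone spans its min..max 'level'.
df_labels = pd.DataFrame({"Labels": [0, 0, 1, 1], "level": [2.0, 2.5, 4.0, 5.5]})
# Hypothetical measurements (two columns only, to keep the example short).
df_instrument = pd.DataFrame({"meas1": [1.1, 2.2, 5.1], "meas2": [1.5, 1.4, 1.2]})

# One row per label with its lower and upper bound.
bounds = df_labels.groupby("Labels")["level"].agg(["min", "max"])
lo = bounds["min"].to_numpy()  # shape (n_zones,)
hi = bounds["max"].to_numpy()  # shape (n_zones,)

flags = {}
for col in df_instrument.columns:
    # Shape (n_rows, 1) broadcasts against (n_zones,) to give (n_rows, n_zones).
    v = df_instrument[col].to_numpy()[:, None]
    flags[f"isZone_{col}"] = ((v >= lo) & (v <= hi)).any(axis=1).astype(int)

df_flags = pd.DataFrame(flags, index=df_instrument.index)

# Combine the per-measurement flags: a row is in a zone if any column is flagged.
df_flags["isZone"] = df_flags.any(axis=1).astype(int)
```

The per-measurement columns are still there afterwards, so both views can be kept side by side.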

Answered By: lostinpython