Check if values are between two values in pandas
Question:
I have two values that are found in a for loop, like so:

```python
for i in range(df_zones.shape[0]):
    filter_max = df_labels[df_labels['Labels'] == i].sort_values(by='level')['level'].iloc[-1]
    filter_min = df_labels[df_labels['Labels'] == i].sort_values(by='level')['level'].iloc[0]
```
I have another dataframe with 4 columns of measurements with a timeseries index, like so:
DateTime | meas1 | meas2 | meas3 | meas4 |
---|---|---|---|---|
2022-1-1 | 1.1 | 1.2 | 1.3 | 1.1 |
There are thousands of rows of data.
What I am trying to do is add another column, labeled 'isZone', which indicates whether any of the values in the row fall between filter_min and filter_max.
DateTime | meas1 | meas2 | meas3 | meas4 | isZone |
---|---|---|---|---|---|
2022-1-1 | 1.1 | 1.5 | 1.5 | 1.7 | 0 |
2022-1-2 | 2.2 | 1.4 | 1.5 | 1.7 | 0 |
2022-1-3 | 3.1 | 1.2 | 1.3 | 1.1 | 1 |
2022-1-4 | 4.1 | 1.2 | 1.3 | 1.1 | 1 |
2022-1-5 | 5.1 | 1.2 | 1.3 | 1.1 | 1 |
I have read about the pandas between function, but I can't figure out how to make it work here. Is there a quicker way to do this in NumPy? Any guidance would be appreciated.
Answers:
You can solve this with apply and pandas' between:

```python
df_zones['Flag'] = df_zones.apply(
    lambda x: 1 if x.between(filter_min, filter_max).any() else 0, axis=1
)
```
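Since apply with axis=1 runs a Python function per row, a vectorized comparison is usually much faster on large frames. A minimal sketch with made-up data, assuming filter_min and filter_max are scalars:

```python
import pandas as pd

# Toy measurement frame standing in for df_zones (column names illustrative)
df_zones = pd.DataFrame({
    'meas1': [1.1, 2.2, 3.1],
    'meas2': [1.5, 1.4, 1.2],
    'meas3': [1.5, 1.5, 1.3],
    'meas4': [1.7, 1.7, 1.1],
})
filter_min, filter_max = 1.0, 1.35

# Compare every cell at once, then collapse each row to a single flag
mask = (df_zones >= filter_min) & (df_zones <= filter_max)
df_zones['Flag'] = mask.any(axis=1).astype(int)
```

This produces the same 0/1 column as the apply version, but the comparisons run in C over the whole frame at once.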
How about transposing with .T and using a list comprehension this way?

```python
df_zones['Flag'] = [1 if df_zones.T[x].between(min_, max_).any() else 0 for x in df_zones.T]
```
Or without the transposing:

```python
df_zones['Flag'] = [1 if df_zones.loc[x, :].between(min_, max_).any() else 0 for x in df_zones.index]
```
The methods above took 30 minutes to compute; the method below finishes in under 2 seconds.
In the end, the best approach was to append each set of per-zone results to its own list, then use a function that combines all the pandas between checks and tests row-wise whether any of them produced a one.
```python
def arrayBoolCheck(arrays):
    # Stack the per-zone arrays as columns; flag rows containing any 1
    df = pd.DataFrame(arrays).T
    return (df == 1).any(axis=1).astype(int)

isZone1, isZone2, isZone3, isZone4 = [], [], [], []

for i in range(df_zones.shape[0]):
    filter_max = df_labels[df_labels['Labels'] == i].sort_values(by='level')['level'].iloc[-1]
    filter_min = df_labels[df_labels['Labels'] == i].sort_values(by='level')['level'].iloc[0]
    isZone1.append(df_instrument["meas1"].between(filter_min, filter_max, inclusive='both').astype(int).values)
    isZone2.append(df_instrument["meas2"].between(filter_min, filter_max, inclusive='both').astype(int).values)
    isZone3.append(df_instrument["meas3"].between(filter_min, filter_max, inclusive='both').astype(int).values)
    isZone4.append(df_instrument["meas4"].between(filter_min, filter_max, inclusive='both').astype(int).values)

# Zone labels dataframe
df = pd.DataFrame(
    data=[arrayBoolCheck(np.array(isZone1)),
          arrayBoolCheck(np.array(isZone2)),
          arrayBoolCheck(np.array(isZone3)),
          arrayBoolCheck(np.array(isZone4))],
    index=["isZone1", "isZone2", "isZone3", "isZone4"]).T
```
I found that keeping the measurements in their respective columns was better for the analysis, but the same function could be used to combine them all into one column if needed.
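For completeness, the same any-zone reduction can be expressed in one shot with NumPy broadcasting, without a Python-level loop over zones. A sketch under the assumption that the per-zone bounds have been collected into two 1-D arrays, `mins` and `maxs` (names hypothetical), one entry per zone:

```python
import numpy as np
import pandas as pd

# Toy data: 3 zones, 4 measurement rows (values illustrative)
df_instrument = pd.DataFrame({'meas1': [1.1, 2.2, 3.1, 4.1]})
mins = np.array([1.0, 2.0, 3.0])   # filter_min per zone
maxs = np.array([1.5, 2.5, 3.5])   # filter_max per zone

vals = df_instrument['meas1'].to_numpy()            # shape (n_rows,)
# Broadcast rows against zones: result has shape (n_rows, n_zones)
in_zone = (vals[:, None] >= mins) & (vals[:, None] <= maxs)
# 1 if the value falls inside *any* zone, matching the arrayBoolCheck reduction
is_zone = in_zone.any(axis=1).astype(int)
```

The `(n_rows, n_zones)` boolean matrix replaces the list-of-arrays bookkeeping, and the `any(axis=1)` step plays the role of arrayBoolCheck.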