Finding Value between two numbers in pandas dataframe

Question:

I have two pandas DataFrames, "A" and "B". For each value in "A", I would like to find the row numbers in "B" whose two values it lies between.

Table A

Index 0
0 0.084
1 0.169
2 0.252
3 0.337
4 0.419
5 0.504
6 0.589

Table B

Index 0
0 0.071
1 0.167
2 0.244
3 0.320

Take one example from the tables above. The first number in Table "A" is 0.084, which lies between the values at index 0 and 1 of Table "B", i.e. 0.071 and 0.167. I am looking for the output [0, 1], which is the pair of row numbers of those two values.

Asked By: Abhay Dodiya


Answers:

First, initialize a list to hold one result per value in A:

res = [[]] * len(A.iloc[:, 0])
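One caveat with this initialization: `[[]] * n` repeats a reference to a single list object rather than creating n separate lists. That is harmless here because the answer assigns a new list to each slot, but it would misbehave with in-place operations such as `append`. A small sketch of the difference (the names `shared` and `separate` are illustrative only, not part of the answer):

```python
n = 3

# Multiplying a list repeats a reference to the SAME inner list.
shared = [[]] * n
shared[0].append("x")      # the in-place change shows up in every slot
print(shared)              # [['x'], ['x'], ['x']]

# A list comprehension builds an independent list per slot.
separate = [[] for _ in range(n)]
separate[0].append("x")
print(separate)            # [['x'], [], []]

# Assigning a new list to a slot is safe in both cases.
shared[1] = [0, 1]
print(shared)              # [['x'], [0, 1], ['x']]
```

If the results were built up with `append` instead of assignment, the list-comprehension form would be the safer choice.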

Then loop over A and B with a nested loop, checking whether each value in A lies strictly between two consecutive values of B, and record the indices.

The condition identifies only the lower index j:

(A.iloc[:, 0][i] > B.iloc[:, 0][j]) and (A.iloc[:, 0][i] < B.iloc[:, 0][j+1])

So I take j, add 1 to get the upper index, and store both in the result list:

res[i] = [j, j+1]

The full code:

import pandas as pd

A = [0.084, 0.169, 0.252, 0.337, 0.419, 0.504, 0.589]
B = [0.071, 0.167, 0.244, 0.320]

A = pd.DataFrame(A)
B = pd.DataFrame(B)

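# One result slot per value in A; slots with no match stay empty
res = [[]] * len(A.iloc[:, 0])

for i in range(len(A.iloc[:, 0])):
    for j in range(len(B.iloc[:, 0]) - 1):
        # A[i] must lie strictly between consecutive values B[j] and B[j+1]
        if (A.iloc[:, 0][i] > B.iloc[:, 0][j]) and (A.iloc[:, 0][i] < B.iloc[:, 0][j+1]):
            res[i] = [j, j+1]
            break  # B is sorted, so at most one interval can match

print(res)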

The output:

[[0, 1], [1, 2], [2, 3], [], [], [], []]

Note: I assume that B is always sorted in ascending order
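Because B is assumed to be sorted, the nested loop could also be replaced by a binary search. The following is a minimal sketch using `numpy.searchsorted`, a different technique from the loop above, not part of the original answer. It keeps the same strict "between" semantics and returns an empty list for values outside B's range. The names `a`, `b`, and `pos` are illustrative.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame([0.084, 0.169, 0.252, 0.337, 0.419, 0.504, 0.589])
B = pd.DataFrame([0.071, 0.167, 0.244, 0.320])

a = A.iloc[:, 0].to_numpy()
b = B.iloc[:, 0].to_numpy()   # assumed sorted in ascending order

# With side="left", pos[k] is the number of B values strictly below a[k],
# so b[pos - 1] < a[k] <= b[pos] whenever both neighbours exist.
pos = np.searchsorted(b, a, side="left")

res = []
for value, p in zip(a, pos):
    # A valid pair needs a neighbour on both sides and no exact boundary hit.
    inside = 0 < p < len(b) and value != b[p]
    res.append([int(p) - 1, int(p)] if inside else [])

print(res)  # [[0, 1], [1, 2], [2, 3], [], [], [], []]
```

For large inputs this does one binary search per value of A instead of scanning all of B.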

Answered By: Ola Galal

I assume that B is sorted in ascending order and that the bins are non-overlapping. To bin a value, first take the difference between that value and the boundaries of all available bins. The desired bin is then at the index i where the difference is non-negative at i but becomes negative at i+1.
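To make the sign-change idea concrete, here is a small sketch for a single value, assuming the boundaries from Table B and the second entry of Table A. The names `bounds` and `value` are illustrative only.

```python
import pandas as pd

bounds = pd.Series([0.071, 0.167, 0.244, 0.320])  # B, assumed sorted ascending
value = 0.169                                     # second entry of A

# The difference is positive while the value lies above a boundary.
delta = value - bounds

# True at i where delta[i] >= 0 and delta[i+1] < 0.
# shift(-1) leaves NaN in the last slot, and NaN < 0 evaluates to False.
mask = delta.ge(0) & delta.shift(-1).lt(0)

# Index of the lower boundary, or None if the value falls outside all bins.
i = int(mask.idxmax()) if mask.any() else None
print(mask.tolist())  # [False, True, False, False]
print(i)              # 1, so the pair of row numbers is [1, 2]
```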

import pandas as pd

def apply_fn(x):
    # In the cross-joined frame, column 1 is the value from A and column 3 the boundary from B
    delta = x.iloc[:,1] - x.iloc[:,3]
    mask = delta.ge(0) & delta.shift(-1).lt(0)  # True where the sign changes
    return x[mask]

A = pd.DataFrame([0.084, 0.169, 0.252, 0.337, 0.419, 0.504, 0.589])
B = pd.DataFrame([0.071, 0.167, 0.244, 0.320])
# The cross join pairs every value of A with every boundary of B
C = A.reset_index().join(B.reset_index(), lsuffix='_A', rsuffix='_B', how='cross')
D = C.groupby('index_A').apply(apply_fn).reset_index(drop=True)[['index_A', 'index_B']]
# Turn each lower index i into the pair [i, i+1]
D['index_B'] = [[i, i+1] for i in D['index_B'].tolist()]
print(D)

Output

   index_A index_B
0        0  [0, 1]
1        1  [1, 2]
2        2  [2, 3]

Answered By: bui