How to merge two pandas DataFrames of different sizes/indices on floor(value x, value y)

Question:

Given two pandas DataFrames:

df1 = pd.DataFrame({'col1': [0.5, 0.75, 1.1, 1.6, 2, 3, 5.5, 10, 11.2]})
df2 = pd.DataFrame({'col2': [0, 3, 10, 15]})

Each df1['col1'] value falls within a range formed by two consecutive df2['col2'] values:

df2['col2'].iloc[y] <= df1['col1'].iloc[x] < df2['col2'].iloc[y+1]

How can I merge df1 and df2 so that each value of df1['col1'] is mapped to the lower bound of the df2['col2'] range it falls in? For example, df1['col1'].iloc[1] = 0.75 lies between df2['col2'].iloc[0] and df2['col2'].iloc[1] (0.75 fits the range 0 to 3), so df1['result'].iloc[1] should be df2['col2'].iloc[0].

Expected result:

df1['result'] = [0, 0, 0, 0, 0, 3, 3, 10, 10]
Asked By: AdR

Answers:

Use a custom generator to produce the needed sequence:

def gen_range_bounds(s1, s2):
    ranges = list(zip(s2[:-1], s2[1:]))  # collect consecutive ranges
    for v in s1:
        for low, high in ranges:
            if low <= v < high:
                yield low  # yield min bound of the range
                break

df1['result'] = list(gen_range_bounds(df1['col1'], df2['col2']))
print(df1) 

    col1  result
0   0.50       0
1   0.75       0
2   1.10       0
3   1.60       0
4   2.00       0
5   3.00       3
6   5.50       3
7  10.00      10
8  11.20      10
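
The generator above scans every range for every value. As a rough sketch of the same idea with a binary search per value, the standard-library bisect module can be used, assuming df2['col2'] is sorted in ascending order. The helper name lower_bounds is hypothetical, not part of the original answer. Note one difference in behavior: a value at or above the last bound maps to the last bound here, and a value below the first bound yields None.

```python
from bisect import bisect_right

import pandas as pd

df1 = pd.DataFrame({'col1': [0.5, 0.75, 1.1, 1.6, 2, 3, 5.5, 10, 11.2]})
df2 = pd.DataFrame({'col2': [0, 3, 10, 15]})


def lower_bounds(values, bounds):
    """Hypothetical helper: yield the largest bound <= each value."""
    bounds = list(bounds)  # assumed to be sorted in ascending order
    for v in values:
        pos = bisect_right(bounds, v)  # number of bounds that are <= v
        # pos == 0 means v lies below every bound, so there is no match
        yield bounds[pos - 1] if pos else None


df1['result'] = list(lower_bounds(df1['col1'], df2['col2']))
print(df1['result'].tolist())
```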
Answered By: RomanPerekhrest

Use pd.Series.values.searchsorted(), which returns the indices at which elements should be inserted to maintain order.

For example:

df1 = pd.DataFrame({'col1': [0.5, 0.75, 1.1, 1.6,  2, 3, 5.5, 10, 11.2] })
df2 = pd.DataFrame({'col2': [0, 3, 10,15] })

df2['col2'].values.searchsorted(0.5) # return 1
df2['col2'].values.searchsorted(5.5) # return 2
df2['col2'].values.searchsorted(10)  # return 2

You want the values rather than the indices, so do it like this:

# get indices, return: [1 1 1 1 1 2 2 3 3]
indices = df2['col2'].values.searchsorted(df1['col1'], side='right')

# get values, return: [0, 0, 0, 0, 0, 3, 3, 10, 10]
df1['result'] = [df2['col2'].iloc[i-1] for i in indices]
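
The list comprehension can probably be replaced by NumPy indexing. The following is a minimal sketch, assuming df2['col2'] is sorted in ascending order. With side='right', searchsorted counts how many bounds are less than or equal to each value, so subtracting 1 gives the position of the lower bound. The masking step is an addition of this sketch: an index of -1 (a value below the first bound) would otherwise wrap around to the last bound.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col1': [0.5, 0.75, 1.1, 1.6, 2, 3, 5.5, 10, 11.2]})
df2 = pd.DataFrame({'col2': [0, 3, 10, 15]})

bounds = df2['col2'].to_numpy()  # assumed to be sorted in ascending order
values = df1['col1'].to_numpy()

# side='right' counts the bounds <= value; minus 1 is the lower bound's index
idx = np.searchsorted(bounds, values, side='right') - 1

# idx == -1 marks values below the first bound; turn those into NaN
# instead of letting -1 index the last element of bounds
df1['result'] = pd.Series(bounds[idx], index=df1.index).where(idx >= 0)
print(df1['result'].tolist())
```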
Answered By: luhao

This looks like a form of inequality join. If that is the case, you can use conditional_join from pyjanitor to get your result:

# pip install pyjanitor
import pandas as pd
import janitor

(df1
.conditional_join(
    df2.astype(float).assign(col3 = lambda f: f.col2.shift(-1).fillna(f.col2)), 
    ('col1', 'col2', '>='), ('col1', 'col3', '<'), 
    right_columns='col2')
)
    col1  col2
0   0.50   0.0
1   0.75   0.0
2   1.10   0.0
3   1.60   0.0
4   2.00   0.0
5   3.00   3.0
6   5.50   3.0
7  10.00  10.0
8  11.20  10.0
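
If installing pyjanitor is not an option, pandas' own merge_asof may give the same result for this data. This is a different technique from conditional_join, sketched here under two assumptions: both key columns are sorted in ascending order, and the key dtypes match (hence the cast of df2 to float). Unlike the strict upper bound in the join above, values at or above the last bound are matched to the last bound, and values below the first bound get NaN.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [0.5, 0.75, 1.1, 1.6, 2, 3, 5.5, 10, 11.2]})
df2 = pd.DataFrame({'col2': [0, 3, 10, 15]})

result = pd.merge_asof(
    df1,                   # left keys must be sorted
    df2.astype(float),     # right keys must be sorted and share the left dtype
    left_on='col1',
    right_on='col2',
    direction='backward',  # pick the last col2 that is <= col1
)
print(result)
```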
Answered By: sammywemmy