Adding new data to a Dataframe from another Dataframe based on condition

Question:

How can I add a new column to a dataframe, with values taken from another dataframe based on a condition? It is hard to describe in words, so here is an example.

df1

columns  a   b   c
         0   10  1
         10  15  3
         15  20  5


df2
columns  d     e
         3.3   10
         5.5   20
         14.5  11
         17.2  5
   

What I want to do is add another column f to df2, with its values taken from df1 such that:
if d[i] is between a[j] and b[j], then copy the value c[j] to the new column f[i] in df2.
For example, d[1] = 5.5 and 0 < 5.5 < 10, hence f[1] = c[0] = 1.

The final result should look like this:

df2
columns  d     e   f
         3.3   10  1
         5.5   20  1
         14.5  11  3
         17.2  5   5
   

Any help is greatly appreciated!

Regards,

Steve

Asked By: Steve Xu

||

Answers:

You could use:

result = []
for item in df2['d']:
    val = None  # stays None when no interval in df1 contains item
    for _, row in df1.iterrows():
        # inclusive on both ends; the first matching row wins
        if row['a'] <= item <= row['b']:
            val = row['c']
            break
    result.append(val)

df2['f'] = result

print(df2)
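
For larger frames, the same inclusive comparison can be done without Python-level loops. The following is only a sketch using NumPy broadcasting, which is a different technique from the loop above; it assumes the sample data from the question, and the helper names d, inside and first are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({"a": [0, 10, 15], "b": [10, 15, 20], "c": [1, 3, 5]})
df2 = pd.DataFrame({"d": [3.3, 5.5, 14.5, 17.2], "e": [10, 20, 11, 5]})

# Column vector of d values, shape (len(df2), 1)
d = df2["d"].to_numpy()[:, None]

# inside[i, j] is True when a[j] <= d[i] <= b[j]
inside = (df1["a"].to_numpy() <= d) & (d <= df1["b"].to_numpy())

# Position of the first matching row of df1 for each d (0 if there is none)
first = inside.argmax(axis=1)

# Take c from the first match, or NaN when no interval contains d
df2["f"] = np.where(inside.any(axis=1), df1["c"].to_numpy()[first], np.nan)

print(df2)
```

Like the loop with break, this picks the first matching row when intervals overlap. The boolean matrix has len(df2) x len(df1) entries, so it trades memory for speed.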
Answered By: user19077881

import pandas as pd
df1 = pd.DataFrame({'a':[0,10,15],'b':[10,15,20],'c':[1,3,5]})
df2 = pd.DataFrame({'d':[3.3,5.5,14.5,17.2],'e':[10,20,11,5]})
df2['f'] = 0  # default for rows that fall in no interval
for i in range(df2.shape[0]):
    for j in range(df1.shape[0]):
        if df1.a[j] <= df2.d[i] <= df1.b[j]:
            # .loc avoids chained assignment (df2.f[i] = ...), which may not write through
            df2.loc[i, 'f'] = df1.c[j]
df2
Answered By: Amber

Assuming non-overlapping intervals in df1 a and b, you can use pd.cut with a pd.IntervalIndex:

import pandas as pd

# Your dfs here
df1 = pd.read_clipboard()
df2 = pd.read_clipboard()

idx = pd.IntervalIndex.from_arrays(df1["a"], df1["b"])
mapping = df1["c"].set_axis(idx)

df2["f"] = pd.cut(df2["d"], idx).map(mapping)

df2:

      d   e  f
0   3.3  10  1
1   5.5  20  1
2  14.5  11  3
3  17.2   5  5
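
For readers without the frames on the clipboard, here is a self-contained sketch of the same pd.cut approach. It assumes the sample data from the question, typed in by hand.

```python
import pandas as pd

# Sample data from the question, built explicitly instead of read_clipboard
df1 = pd.DataFrame({"a": [0, 10, 15], "b": [10, 15, 20], "c": [1, 3, 5]})
df2 = pd.DataFrame({"d": [3.3, 5.5, 14.5, 17.2], "e": [10, 20, 11, 5]})

# Intervals are closed on the right by default: (0, 10], (10, 15], (15, 20]
idx = pd.IntervalIndex.from_arrays(df1["a"], df1["b"])

# Series of c values indexed by the interval each one belongs to
mapping = df1["c"].set_axis(idx)

# pd.cut labels each d with its interval; map swaps the interval for c
df2["f"] = pd.cut(df2["d"], idx).map(mapping)

print(df2)
```

With the default right-closed intervals, d equal to 10 lands in (0, 10], while d equal to 0 falls in no interval and comes back as NaN. Passing closed="left" to from_arrays flips that. The new column may come back with a categorical dtype, so convert it if you need plain numbers.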
Answered By: Chrysophylaxs

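What about this option?

# cross-join the two dfs (needs pandas >= 1.2), then keep rows where a <= d <= b
df = pd.merge(df2, df1, how='cross')
df = df[(df['a'] <= df['d']) & (df['d'] <= df['b'])]
# look up c for each d; d values that fall in no interval get NaN
df2['f'] = df.drop_duplicates('d').set_index('d')['c'].reindex(df2['d']).to_numpy()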
Answered By: Lorenzo Bassetti

If you do not have overlapping intervals, the pd.IntervalIndex solution (the accepted answer) is a perfect fit.
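
If you are unsure whether your intervals overlap, one way to check is the is_overlapping attribute of pd.IntervalIndex. A minimal sketch, assuming the df1 from the question; the variable names right_closed and both_closed are just for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 10, 15], "b": [10, 15, 20], "c": [1, 3, 5]})

# Default: closed on the right, i.e. (0, 10], (10, 15], (15, 20]
right_closed = pd.IntervalIndex.from_arrays(df1["a"], df1["b"])

# Closed on both sides: [0, 10], [10, 15], [15, 20] share their endpoints
both_closed = pd.IntervalIndex.from_arrays(df1["a"], df1["b"], closed="both")

print(right_closed.is_overlapping)  # False
print(both_closed.is_overlapping)   # True
```

Intervals that only touch at an endpoint count as overlapping when both sides are closed, so a strictly inclusive a <= d <= b test on this data is already an overlapping case.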

Another option is with conditional_join from pyjanitor, which can also handle overlapping intervals:

# pip install pyjanitor
import pandas as pd
import janitor
(df2
.conditional_join(
    # types have to be same
    # for columns to be compared
    df1.astype({"a":float, "b":float}), 
    ('d', 'a', '>='), 
    ('d', 'b','<='), 
    # depending on the data size,
    # numba may offer more performance
    use_numba=False,
    right_columns = {'c':'f'})
)
      d   e  f
0   3.3  10  1
1   5.5  20  1
2  14.5  11  3
3  17.2   5  5
Answered By: sammywemmy