Merge two pandas dataframes that have slightly different values on the column which is being merged
Question:
How can I merge two data frames when the column has a slight offset than the column I am merging to?
df1 =
col1
col2
1
a
2
b
3
c
df2 =
col1
col3
1.01
d
2
e
2.95
f
so, the merged column would end up like this even though the values in col1 are slightly different.
df_merge =
col1
col2
col3
1
a
d
2
b
e
3
c
f
I have seen scenarios like this where "col1" is a string, but I’m wondering if it’s possible to do this with something like pandas.merge()
in the scenario where there is slight numerical offset (e.g +/- 0.05).
Answers:
Lets do merge_asof
with tolerance
parameter
pd.merge_asof(
df1.astype({'col1': 'float'}).sort_values('col1'),
df2.sort_values('col1'),
on='col1',
direction='nearest',
tolerance=.05
)
col1 col2 col3
0 1.0 a d
1 2.0 b e
2 3.0 c f
PS: if the dataframes are already sorted on col1
then there is no need to sort again.
How can I merge two data frames when the column has a slight offset than the column I am merging to?
df1 =
col1 | col2 |
---|---|
1 | a |
2 | b |
3 | c |
df2 =
col1 | col3 |
---|---|
1.01 | d |
2 | e |
2.95 | f |
so, the merged column would end up like this even though the values in col1 are slightly different.
df_merge =
col1 | col2 | col3 |
---|---|---|
1 | a | d |
2 | b | e |
3 | c | f |
I have seen scenarios like this where "col1" is a string, but I’m wondering if it’s possible to do this with something like pandas.merge()
in the scenario where there is slight numerical offset (e.g +/- 0.05).
Lets do merge_asof
with tolerance
parameter
pd.merge_asof(
df1.astype({'col1': 'float'}).sort_values('col1'),
df2.sort_values('col1'),
on='col1',
direction='nearest',
tolerance=.05
)
col1 col2 col3
0 1.0 a d
1 2.0 b e
2 3.0 c f
PS: if the dataframes are already sorted on col1
then there is no need to sort again.