pandas conditional merge on multiple columns
Question:
I have two dataframes structured similarly to:
import numpy as np
import pandas as pd

conditions = pd.DataFrame({
    'keywords_0': ["a", "c", "e"],
    'keywords_1': ["b", "d", "f"],
    'keywords_2': ["00", "01", "02"],
    'price': [1, 2, 3] })
target = pd.DataFrame({
    'keywords_0': ["a", "c", "e"],
    'keywords_1': ["b", "d", np.nan],
    'keywords_2': ["00", np.nan, np.nan] })
I would like to do an inner merge of these with logic similar to: "first match rows where conditions.keywords_0 == target.keywords_0; then, if target.keywords_1 is NA, accept the match for those rows, but if it is not NA, proceed to compare the next keyword column."
That seems hard to do. Is it possible?
EDIT: Thank you for all the suggestions, but I had to provide more information.
Answers:
I'm not sure of the expected result, but from what you describe I would do this:
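import numpy as np
import pandas as pd

conditions = pd.DataFrame({
    'keywords_0': ["a", "c", "e"],
    'keywords_1': ["b", "d", "f"]
})
target = pd.DataFrame({
    'keywords_0': ["a", "c"],
    'keywords_1': ["b", np.nan]
})
# Inner-merge on the first key; overlapping columns get the default _x / _y suffixes.
merged = pd.merge(conditions, target, on="keywords_0", how="inner")
# Keep rows where target's keywords_1 is missing, or equals conditions' keywords_1.
# (isna() is more robust than `is np.nan`, which relies on object identity.)
mask = merged["keywords_1_y"].isna() | (merged["keywords_1_x"] == merged["keywords_1_y"])
result = merged[mask]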
which gives:
  keywords_0 keywords_1_x keywords_1_y
0          a            b            b
1          c            d          NaN
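The same mask idea can be extended to all three keyword columns from the question. This is a minimal sketch, assuming (my reading of the question, not something it confirms) that a NaN in target at some level means "stop comparing here and accept the match", so later levels for that row are ignored. `hierarchical_merge` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

# Data from the question.
conditions = pd.DataFrame({
    "keywords_0": ["a", "c", "e"],
    "keywords_1": ["b", "d", "f"],
    "keywords_2": ["00", "01", "02"],
    "price": [1, 2, 3],
})
target = pd.DataFrame({
    "keywords_0": ["a", "c", "e"],
    "keywords_1": ["b", "d", np.nan],
    "keywords_2": ["00", np.nan, np.nan],
})


def hierarchical_merge(conditions, target, keys):
    """Hypothetical helper: inner-merge on keys[0], then check keys[1:] in order.

    Assumption: a NaN in target at some level ends the comparison for that
    row (it matches); a non-NaN value must equal the conditions value.
    """
    merged = conditions.merge(target, on=keys[0], how="inner",
                              suffixes=("_cond", "_target"))
    keep = pd.Series(True, index=merged.index)
    still_comparing = pd.Series(True, index=merged.index)
    for key in keys[1:]:
        cond_val = merged[f"{key}_cond"]
        tgt_val = merged[f"{key}_target"]
        # A row fails only if it is still being compared and the values differ.
        mismatch = still_comparing & tgt_val.notna() & (cond_val != tgt_val)
        keep &= ~mismatch
        # Once target has a NaN, later levels are no longer compared.
        still_comparing &= tgt_val.notna()
    return merged[keep]


result = hierarchical_merge(conditions, target,
                            ["keywords_0", "keywords_1", "keywords_2"])
print(result)
```

With the question's data every target row is kept (row c stops comparing at keywords_2, row e at keywords_1); a row whose non-NA keyword differs from conditions would be dropped.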
We can merge on the first-level key, then filter conditionally on the second key:
# keywords_1_y != keywords_1_y is True only for NaN, because NaN never equals itself.
out = conditions.merge(target, on='keywords_0').query('(keywords_1_x == keywords_1_y) or (keywords_1_y != keywords_1_y)')
which gives:
  keywords_0 keywords_1_x keywords_1_y
0          a            b            b
1          c            d          NaN
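# Inner-merge on keywords_0; overlapping columns get the _cond / _target suffixes.
merged_df = conditions.merge(target, how='inner', on='keywords_0', suffixes=('_cond', '_target'))
# Keep only the rows whose keywords_1 is missing in target.
merged_df = merged_df.loc[merged_df['keywords_1_target'].isna()]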
The code above merges the two dataframes on keywords_0 and then keeps only the rows where keywords_1 from the target dataframe is NA.
I ended up splitting the dataframe into parts and merging each part conditionally on the relevant parts of the key.
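A rough sketch of that split-and-merge approach, assuming the same "a NaN ends the key" reading of the question: each target row is grouped by how many leading keyword columns are filled in, and each group is merged with conditions on exactly those columns. The names `depth` and `parts` are illustrative, not taken from the original post:

```python
import numpy as np
import pandas as pd

# Data from the question.
conditions = pd.DataFrame({
    "keywords_0": ["a", "c", "e"],
    "keywords_1": ["b", "d", "f"],
    "keywords_2": ["00", "01", "02"],
    "price": [1, 2, 3],
})
target = pd.DataFrame({
    "keywords_0": ["a", "c", "e"],
    "keywords_1": ["b", "d", np.nan],
    "keywords_2": ["00", np.nan, np.nan],
})

keys = ["keywords_0", "keywords_1", "keywords_2"]

# Hypothetical "depth": how many leading key columns are filled in per target
# row. cumprod zeroes everything after the first NaN, so a row with
# keywords_1 missing gets depth 1 even if keywords_2 were present.
depth = target[keys].notna().astype(int).cumprod(axis=1).sum(axis=1)

parts = []
for d, part in target.groupby(depth):
    if d == 0:
        continue  # assumption: rows without keywords_0 cannot match anything
    on = keys[:d]
    # Merge this part with conditions only on the keys it actually has.
    parts.append(conditions.merge(part[on], on=on, how="inner"))

result = pd.concat(parts, ignore_index=True)
print(result)
```

Because the groups are visited in ascending depth, the rows come back ordered by depth rather than in target's original order; sort afterwards if that matters.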