pandas conditional merge on multiple columns

Question:

I have two dataframes structured similar to:

import numpy as np
import pandas as pd

conditions = pd.DataFrame({
    'keywords_0': ["a", "c", "e"],
    'keywords_1': ["b", "d", "f"],
    'keywords_2': ["00", "01", "02"],
    'price': [1, 2, 3] })
target = pd.DataFrame({
    'keywords_0': ["a", "c", "e"],
    'keywords_1': ["b", "d", np.nan],
    'keywords_2': ["00", np.nan, np.nan] })

conditions:

[image not available]

target:

[image not available]

expected result:

[image not available]

I would like to do an inner merge of the two with logic similar to this: first look for rows where conditions.keywords_0 == target.keywords_0. If target.keywords_1 is NA for such a row, treat it as a match. If it is not NA, go on to compare the next keyword column in the same way.

That seems hard to do. Is it possible?

EDIT: Thank you for all the suggestions, but I had to provide more information.

Asked By: euh


Answers:

Not sure of the expected result but from what you describe I would do this:

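import numpy as np
import pandas as pd

conditions = pd.DataFrame({
    'keywords_0': ["a", "c", "e"],
    'keywords_1': ["b", "d", "f"],
})
target = pd.DataFrame({
    'keywords_0': ["a", "c"],
    'keywords_1': ["b", np.nan],
})

# Inner merge on the first key; the overlapping keywords_1 columns get _x / _y suffixes
merged = pd.merge(conditions, target, on="keywords_0", how="inner")
# Keep a row if the target's keywords_1 is missing (wildcard) or equals the condition's value
mask = merged.apply(lambda x: pd.isna(x["keywords_1_y"]) or x["keywords_1_x"] == x["keywords_1_y"], axis=1)
result = merged[mask]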

which gives

  keywords_0 keywords_1_x keywords_1_y
0          a            b            b
1          c            d          NaN
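
The same idea can be extended to all three keyword columns from the question without a row-wise apply. The following is only a sketch under a few assumptions: the helper name merge_with_wildcards and the _cond / _target suffixes are made up for illustration, and every NA in target is treated as "matches anything", which agrees with the rule in the question as long as NA values only appear after the last filled-in keyword.

```python
import numpy as np
import pandas as pd

conditions = pd.DataFrame({
    "keywords_0": ["a", "c", "e"],
    "keywords_1": ["b", "d", "f"],
    "keywords_2": ["00", "01", "02"],
    "price": [1, 2, 3],
})
target = pd.DataFrame({
    "keywords_0": ["a", "c", "e"],
    "keywords_1": ["b", "d", np.nan],
    "keywords_2": ["00", np.nan, np.nan],
})


def merge_with_wildcards(conditions, target, first_key, other_keys):
    """Hypothetical helper: inner merge on first_key, NA in target acts as a wildcard."""
    merged = conditions.merge(
        target, on=first_key, how="inner", suffixes=("_cond", "_target")
    )
    # Start with every row kept, then narrow down one key column at a time
    mask = pd.Series(True, index=merged.index)
    for col in other_keys:
        target_col = merged[f"{col}_target"]
        # A row survives if the target value is missing or equals the condition value
        mask &= target_col.isna() | (merged[f"{col}_cond"] == target_col)
    return merged[mask]


result = merge_with_wildcards(
    conditions, target, "keywords_0", ["keywords_1", "keywords_2"]
)
```

Because the mask is built from whole-column comparisons, this avoids calling a Python lambda once per row, which tends to matter on larger frames.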

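We can merge on the first-level key, then filter on the second key:

# keywords_1_y != keywords_1_y is True only for NaN, so it picks the rows where the target key is missing
out = conditions.merge(target,on='keywords_0').query('(keywords_1_x == keywords_1_y) or (keywords_1_y != keywords_1_y)')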
Out[41]: 
  keywords_0 keywords_1_x keywords_1_y
0          a            b            b
1          c            d          NaN
Answered By: BENY

merged_df = conditions.merge(target, how='inner', on='keywords_0', suffixes=('_cond', '_target'))
merged_df = merged_df.loc[merged_df['keywords_1_target'].isna()]

The code above merges the two dataframes on keywords_0 and then keeps only the rows where keywords_1 in the target dataframe is NA.

Answered By: Hillygoose

I ended up splitting the dataframe into parts and merging each part on a different subset of the key columns.
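
One possible reading of this approach, as a rough sketch rather than the code actually used: group the target rows by how many keywords they have filled in, then merge each group with conditions on just those columns. The names keys, depth and parts are illustrative, and the sketch assumes NA values in target only appear after the last filled-in keyword.

```python
import numpy as np
import pandas as pd

conditions = pd.DataFrame({
    "keywords_0": ["a", "c", "e"],
    "keywords_1": ["b", "d", "f"],
    "keywords_2": ["00", "01", "02"],
    "price": [1, 2, 3],
})
target = pd.DataFrame({
    "keywords_0": ["a", "c", "e"],
    "keywords_1": ["b", "d", np.nan],
    "keywords_2": ["00", np.nan, np.nan],
})

keys = ["keywords_0", "keywords_1", "keywords_2"]

# Number of filled-in keywords per target row (assumes NA only in trailing positions)
depth = target[keys].notna().sum(axis=1)

parts = []
for n, part in target.groupby(depth):
    n = int(n)
    if n == 0:
        continue  # a row with no keywords at all has nothing to match on
    on = keys[:n]
    # Merge this part on exactly the key columns it has values for
    parts.append(part[on].merge(conditions, on=on, how="inner"))

result = pd.concat(parts, ignore_index=True)
```

Each part is an ordinary equality merge, so the remaining keyword columns and price in the result come from conditions.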

Answered By: euh