Compare two pandas dataframes and create column if condition is met

Question:

Suppose I have these two dataframes:

df1=

ID   index 

087    4
087    5
087    6
df2=

ID   index

087    1
087    2
087    3
...
087    10
087    11
087    12
...

And I would like to compare/join them based on the ID and index columns and then create a ‘pred’ column in the rows that df1 also has. So that the resulting df would look like this:

result_df=

ID   index  pred

087    1     0
087    2     0
087    3     0
087    4     1
087    5     1
087    6     1
087    7     0
...
087    12    0
...

Does anybody have a neat solution to this?

Asked By: John

||

Answers:

If you want to merge them you can use this code:

import pandas as pd

new_df = pd.merge(df1, df2, how='inner', on = 'ID')
new_df
Answered By: Hisham

I would do it following way

import pandas as pd
df1 = pd.DataFrame({"ID":[87,87,87],"index":[4,5,6]})
df2 = pd.DataFrame({"ID":[87,87,87,87,87,87],"index":[1,2,3,10,11,12]})
df1["pred"] = 1 # mark rows of df1
df = df1.merge(df2,"outer",["ID","index"])
df.pred = df.pred.fillna(0.0)
print(df)

output

   ID  index  pred
0  87      4   1.0
1  87      5   1.0
2  87      6   1.0
3  87      1   0.0
4  87      2   0.0
5  87      3   0.0
6  87     10   0.0
7  87     11   0.0
8  87     12   0.0

Explanation: I add pred column to df1 so after OUTER merge rows from df1 will have set pred value, whilst those from df2 will have NaN which I then replace using zeros.

Answered By: Daweo
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.