Python - How to compare columns from two dataframes and create a 3rd with new values?
Question:
I have two dataframes that contain names. What I need to do is check which of the names in the second dataframe are not present in the first dataframe.
For this example
list1 = ['Mark','Sofi','Joh','Leo','Jason']
df1 = pd.DataFrame(list1, columns =['Names'])
and
list2 = ['Mark','Sofi','David','Matt','Jason']
df2 = pd.DataFrame(list2, columns =['Names'])
So in this simple example we can see that David and Matt from the second dataframe do not exist in the first dataframe.
I need to programmatically come up with a 3rd dataframe that will have results like this:
Names
David
Matt
My first thought was to try the pandas merge function, but I am unable to get the unique set of names from df2 that are not in df1.
Any thoughts on how to do this?
Answers:
You can create the 3rd dataframe by filtering the 2nd with a condition like this:
df3 = df2[~df2['Names'].isin(df1['Names'])]
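For reference, here is the same filter as a self-contained sketch that can be pasted and run as is. The `reset_index(drop=True)` call is my own addition (not part of the answer above); it only renumbers the rows from 0, so drop it if you want to keep the original df2 index.

```python
import pandas as pd

df1 = pd.DataFrame({"Names": ["Mark", "Sofi", "Joh", "Leo", "Jason"]})
df2 = pd.DataFrame({"Names": ["Mark", "Sofi", "David", "Matt", "Jason"]})

# Boolean mask: True where the df2 name also occurs in df1.
in_df1 = df2["Names"].isin(df1["Names"])

# Invert the mask with ~ to keep only the names missing from df1.
# reset_index(drop=True) is optional; it renumbers the rows from 0.
df3 = df2[~in_df1].reset_index(drop=True)

print(df3)
```

The filter keeps the rows in the order they appear in df2.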
You can also use merge with indicator:
>>> df1.merge(df2, on='Names', how='outer', indicator='exist')
Names exist
0 Mark both
1 Sofi both
2 Joh left_only
3 Leo left_only
4 Jason both
5 David right_only
6 Matt right_only
>>> (df1.merge(df2, on='Names', how='outer', indicator='exist')
.loc[lambda x: x.pop('exist') == 'right_only'])
Names
5 David
6 Matt
Input dataframes:
list1 = ['Mark','Sofi','Joh','Leo','Jason']
df1 = pd.DataFrame(list1, columns =['Names'])
list2 = ['Mark','Sofi','David','Matt','Jason']
df2 = pd.DataFrame(list2, columns =['Names'])
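A variation on the merge idea, offered as a sketch rather than as the answer's own code: do a left join starting from df2, so that the rows of interest are the ones flagged `left_only`. It relies on the default indicator column name `_merge` that pandas uses when `indicator=True`; the `drop_duplicates()` call is an assumption on my part to stop repeated names in df1 from multiplying rows.

```python
import pandas as pd

df1 = pd.DataFrame({"Names": ["Mark", "Sofi", "Joh", "Leo", "Jason"]})
df2 = pd.DataFrame({"Names": ["Mark", "Sofi", "David", "Matt", "Jason"]})

# Left-join df2 against the de-duplicated names of df1. The indicator
# column (named "_merge" by default) records whether each df2 row
# found a match in df1.
merged = df2.merge(df1.drop_duplicates(), on="Names", how="left", indicator=True)

# Keep the rows that exist only in df2, then drop the helper column.
df3 = (
    merged[merged["_merge"] == "left_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)

print(df3)
```

Starting from df2 with `how="left"` means the rows that exist only in df1 (Joh, Leo) never enter the result, so there is nothing extra to filter out.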
Here is another approach:
key_diff = set(df2.Names).difference(df1.Names)
where_diff = df2.Names.isin(key_diff)
df3 = df2[where_diff]
Refer to this link for more
Using Set Operations
# A set has no defined order, so sort it into a list to get reproducible rows.
df3 = pd.DataFrame(sorted(set(list2) - set(list1)), columns=["Names"])
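One practical difference between the row filter and the pure set difference is how they treat repeated names. The sketch below uses a made-up input in which "David" appears twice (not part of the question's data) to show it: the filter keeps every matching row of df2, while the set difference returns each missing name once.

```python
import pandas as pd

df1 = pd.DataFrame({"Names": ["Mark", "Sofi"]})
# Hypothetical input in which "David" appears twice.
df2 = pd.DataFrame({"Names": ["Mark", "David", "Matt", "David"]})

# Row filter: every df2 row whose name is missing from df1 is kept,
# duplicates included, in df2's original order.
by_filter = df2[~df2["Names"].isin(df1["Names"])]

# Set difference: each missing name appears once; sorted() fixes the order.
missing = sorted(set(df2["Names"]) - set(df1["Names"]))
by_set = pd.DataFrame(missing, columns=["Names"])

print(by_filter)
print(by_set)
```

If you need unique names but prefer the filter, calling `drop_duplicates()` on its result should give the same set of names.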