Create a boolean mask by matching the full rows of two dataframes

Question:

I have two dataframes, each containing two columns of American states and towns. I want to create a new column in the first dataframe holding boolean values that indicate whether each town, paired with its state, appears in the second dataframe.

Example (using countries and capitals instead):

import pandas as pd

df = pd.DataFrame({'countries':['france', 'germany', 'spain', 'uk', 'norway', 'italy'],
                   'capitals':['paris', 'berlin', 'madrid', 'london', 'oslo', 'rome']})

df2 = pd.DataFrame({'countries':['france', 'spain', 'uk', 'italy'],
                    'capitals':['paris', 'madrid', 'london', 'rome']})

df

  countries capitals
0    france    paris
1   germany   berlin
2     spain   madrid
3        uk   london
4    norway     oslo
5     italy     rome

df2

  countries capitals
0    france    paris
1     spain   madrid
2        uk   london
3     italy     rome

What I want to get is:

df> countries  capitals  bool
    france     paris     True
    germany    berlin    False
    spain      madrid    True
    uk         london    True
    norway     oslo      False
    italy      rome      True

Thank you!

Asked By: Qdr

||

Answers:

Perform a FULL OUTER JOIN with an indicator.

u = df.merge(df2, how='outer', indicator='bool')
u['bool'] = u['bool'] == 'both'
u

  countries capitals   bool
0    france    paris   True
1   germany   berlin  False
2     spain   madrid   True
3        uk   london   True
4    norway     oslo  False
5     italy     rome   True

In the intermediate step, we see

df.merge(df2, how='outer', indicator='bool')

  countries capitals       bool
0    france    paris       both
1   germany   berlin  left_only
2     spain   madrid       both
3        uk   london       both
4    norway     oslo  left_only
5     italy     rome       both

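The indicator column records where each row was found: in both frames (“both”), only in the left frame (“left_only”), or only in the right frame (“right_only”). Comparing that column with “both” marks the rows of df that also appear in df2, which gives your intended output.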
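
One caveat with the outer join: if df2 contains a row that is not in df, that row is added to the result as “right_only”, and the row order may not follow df. A minimal sketch of a left-join variant is shown below. It is an alternative, not the code from the answer above; `df2_extra` is a hypothetical version of df2 with one extra row, made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['france', 'germany', 'spain', 'uk', 'norway', 'italy'],
                   'capitals': ['paris', 'berlin', 'madrid', 'london', 'oslo', 'rome']})

# Hypothetical variation of df2 with one extra row ('japan', 'tokyo') that is not in df
df2_extra = pd.DataFrame({'countries': ['france', 'spain', 'uk', 'italy', 'japan'],
                          'capitals': ['paris', 'madrid', 'london', 'rome', 'tokyo']})

# Drop duplicate rows on the right so the merge cannot multiply rows of df
right = df2_extra.drop_duplicates()

# how='left' keeps every row of df, in df's order, and ignores right-only rows
merged = df.merge(right, how='left', on=['countries', 'capitals'], indicator='bool')

# The indicator is 'both' for matched rows and 'left_only' otherwise
merged['bool'] = merged['bool'] == 'both'
print(merged)
```

With a left join the result has exactly as many rows as df, so the boolean column can be assigned back to df directly if needed.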

Answered By: cs95

import pandas as pd

df = pd.DataFrame({'countries':['france', 'germany', 'spain', 'uk', 'norway', 'italy'],
                   'capitals':['paris', 'berlin', 'madrid', 'london', 'oslo', 'rome']})

df2 = pd.DataFrame({'countries':['france', 'spain', 'uk', 'italy'],
                    'capitals':['paris', 'madrid', 'london', 'rome']})

df['bool'] = False

# Collect the (country, capital) pairs of df2 so that whole rows are compared
pairs = set(zip(df2.countries, df2.capitals))

# Loop over the rows of df (simple, but slow on large frames)
for idx, row in df.iterrows():
    if (row.countries, row.capitals) in pairs:
        df.loc[idx, 'bool'] = True

print(df)
  countries capitals   bool
0    france    paris   True
1   germany   berlin  False
2     spain   madrid   True
3        uk   london   True
4    norway     oslo  False
5     italy     rome   True
Answered By: Nathaniel

The isin method will do the trick when matching on the countries column alone is enough:

>>> df['bool'] = df['countries'].isin(df2['countries'].values)
>>> df
  countries capitals   bool
0    france    paris   True
1   germany   berlin  False
2     spain   madrid   True
3        uk   london   True
4    norway     oslo  False
5     italy     rome   True
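
Note that the snippet above compares only the countries column, so a row whose country is in df2 but whose capital differs would still be marked True. One possible way to compare whole rows with isin is sketched below, using a MultiIndex built from both columns. `df2_mismatch` is hypothetical data invented to show the difference; it is not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['france', 'germany', 'spain', 'uk', 'norway', 'italy'],
                   'capitals': ['paris', 'berlin', 'madrid', 'london', 'oslo', 'rome']})

# Hypothetical second frame in which 'germany' is paired with the wrong capital
df2_mismatch = pd.DataFrame({'countries': ['france', 'germany', 'uk'],
                             'capitals': ['paris', 'rome', 'london']})

cols = ['countries', 'capitals']

# (country, capital) tuples taken from the second frame
pairs = list(zip(df2_mismatch['countries'], df2_mismatch['capitals']))

# Compare whole (country, capital) pairs rather than a single column
full_row = pd.MultiIndex.from_frame(df[cols]).isin(pairs)

# Single-column check, kept only for comparison
country_only = df['countries'].isin(df2_mismatch['countries'])

df['bool'] = full_row
print(df)
```

Here 'germany' passes the single-column check but fails the full-row check, because ('germany', 'berlin') is not among the pairs of the second frame.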