Create a boolean mask by matching the full rows of two dataframes
Question:
I have two dataframes, each containing two columns of American states and towns. I want to create a new column in the first dataframe holding boolean values that indicate whether each town, paired with its state, also appears in the second dataframe.
Example (using countries and capitals):
import pandas as pd

df = pd.DataFrame({'countries':['france', 'germany', 'spain', 'uk', 'norway', 'italy'],
                   'capitals':['paris', 'berlin', 'madrid', 'london', 'oslo', 'rome']})
df2 = pd.DataFrame({'countries':['france', 'spain', 'uk', 'italy'],
                    'capitals':['paris', 'madrid', 'london', 'rome']})
df
countries capitals
0 france paris
1 germany berlin
2 spain madrid
3 uk london
4 norway oslo
5 italy rome
df2
countries capitals
0 france paris
1 spain madrid
2 uk london
3 italy rome
What I want to get is:
df
countries capitals bool
france paris True
germany berlin False
spain madrid True
uk london True
norway oslo False
italy rome True
Thank you!
Answers:
Perform a FULL OUTER JOIN with an indicator.
u = df.merge(df2, how='outer', indicator='bool')
u['bool'] = u['bool'] == 'both'
u
countries capitals bool
0 france paris True
1 germany berlin False
2 spain madrid True
3 uk london True
4 norway oslo False
5 italy rome True
In the intermediate step, we see
df.merge(df2, how='outer', indicator='bool')
countries capitals bool
0 france paris both
1 germany berlin left_only
2 spain madrid both
3 uk london both
4 norway oslo left_only
5 italy rome both
indicator specifies where each row is present: in both frames ("both"), or only in the left or right one ("left_only", "right_only"). We now want to mark all the rows where “bool” shows “both” (to get your intended output).
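The outer join above returns a new frame u, and it would also keep any rows that exist only in df2. If the goal is a column on df itself, one possible variant (a sketch, not part of the original answer) is a left join, which keeps the rows of df in their original order. The drop_duplicates call and the explicit on= list are assumptions added here so that a repeated pair in df2 cannot duplicate rows of df.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['france', 'germany', 'spain', 'uk', 'norway', 'italy'],
                   'capitals': ['paris', 'berlin', 'madrid', 'london', 'oslo', 'rome']})
df2 = pd.DataFrame({'countries': ['france', 'spain', 'uk', 'italy'],
                    'capitals': ['paris', 'madrid', 'london', 'rome']})

cols = ['countries', 'capitals']

# Left join: every row of df appears exactly once, in df's order,
# because df2 is de-duplicated on the join columns first.
merged = df.merge(df2[cols].drop_duplicates(), on=cols, how='left', indicator=True)

# '_merge' is the default indicator column name; 'both' means the pair is in df2 too.
# to_numpy() assigns by position, so df's own index does not matter.
df['bool'] = (merged['_merge'] == 'both').to_numpy()
print(df)
```

Because the join is on both columns, a country that appears in df2 with a different capital is marked False.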
df = pd.DataFrame({'countries':['france', 'germany', 'spain', 'uk', 'norway', 'italy'],
'capitals':['paris', 'berlin', 'madrid', 'london', 'oslo', 'rome']})
df2 = pd.DataFrame({'countries':['france', 'spain', 'uk', 'italy'],
'capitals':['paris', 'madrid', 'london', 'rome']})
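df['bool'] = False
# Collect the (country, capital) pairs of df2 once, for fast membership tests
pairs = set(zip(df2['countries'], df2['capitals']))
# iterrows is easy to read but slow on large frames
for idx, row in df.iterrows():
    # Compare the whole pair, not the country alone
    if (row['countries'], row['capitals']) in pairs:
        df.loc[idx, 'bool'] = True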
print(df)
countries capitals bool
0 france paris True
1 germany berlin False
2 spain madrid True
3 uk london True
4 norway oslo False
5 italy rome True
The isin method will do the trick (note that this checks the countries column only):
>>> df['bool'] = df['countries'].isin(df2['countries'])
>>> df
countries capitals bool
0 france paris True
1 germany berlin False
2 spain madrid True
3 uk london True
4 norway oslo False
5 italy rome True
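The isin call above compares countries only, so a country that appears in df2 with a different town would still be marked True. A minimal sketch of matching on the full (country, capital) pair instead, assuming pandas 0.24 or later for MultiIndex.from_frame. The second frame below is hypothetical data, chosen so that the two approaches disagree.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['france', 'germany', 'spain'],
                   'capitals': ['paris', 'berlin', 'madrid']})
# Hypothetical second frame: 'germany' is present, but paired with another town.
df2 = pd.DataFrame({'countries': ['france', 'germany'],
                    'capitals': ['paris', 'munich']})

cols = ['countries', 'capitals']

# Matching on one column marks 'germany' as present.
country_only = df['countries'].isin(df2['countries'])

# Treat each row as a tuple via a MultiIndex and test membership of the whole pair.
pair_match = pd.MultiIndex.from_frame(df[cols]).isin(pd.MultiIndex.from_frame(df2[cols]))

df['bool'] = pair_match
print(df)
```

Here country_only is True for germany, while pair_match is False because (germany, berlin) does not occur in df2.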