Check whether two Koalas dataframes share values in a column using .isin

Question:

I am having a small issue comparing two dataframes, which are shown below.
Both are Koalas dataframes.

import databricks.koalas as ks


mini_team_df_1 = ks.DataFrame(['0000340b'], columns = ['team_code'])

mini_receipt_df_2 = ks.DataFrame(['0000340b'], columns = ['team_code'])

mini_receipt_df_2['match_flag'] = mini_receipt_df_2['team_code'].isin(ks.DataFrame(mini_team_df_1))

mini_receipt_df_2

I am running this code on Databricks and I expect mini_receipt_df_2 to look like this:

    team_code   match_flag
0   0000340b     True

But the code shown above actually produces this output:

    team_code   match_flag
0   0000340b     False

This makes no sense to me: I would expect .isin to return True for team_code = 0000340b, since that value is the same in both dataframes.

Could someone help me understand what is wrong?

Thank you

Asked By: Anna

||

Answers:

Try this:

import numpy as np
mini_receipt_df_2['match_flag'] = np.isin(mini_receipt_df_2['team_code'].to_numpy(), mini_team_df_1['team_code'].to_numpy())

Output:

>>> mini_receipt_df_2
  team_code  match_flag
0  0000340b        True

Answered By: user17242583

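# Flag the team rows first, so that only matched receipts pick up True in the left join
(mini_receipt_df_2
    .merge(mini_team_df_1.assign(match_flag=True), on='team_code', how='left')
    .fillna({'match_flag': False}))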

out:

  team_code  match_flag
0  0000340b        True
Answered By: G.G
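
A note on why the original call returns False, offered as a hedged sketch rather than a definitive diagnosis: iterating a DataFrame yields its column labels, not its cell values, so .isin(some_dataframe) ends up comparing each team_code against ['team_code']. The example below uses plain pandas as a stand-in so it runs without a Spark cluster. That is an assumption: the Koalas API mirrors pandas here, but this has not been run against Koalas itself. The variable names come from the question, and the extra row 'ffff0000' is made up to show a non-matching case.

```python
import pandas as pd  # stand-in for databricks.koalas (assumption: same semantics here)

mini_team_df_1 = pd.DataFrame(['0000340b'], columns=['team_code'])
# 'ffff0000' is a made-up code with no match, added to show a False result
mini_receipt_df_2 = pd.DataFrame(['0000340b', 'ffff0000'], columns=['team_code'])

# Iterating a DataFrame gives its column labels, not the values in the column
labels = list(mini_team_df_1)

# Pass the column's values instead of the whole DataFrame
team_codes = mini_team_df_1['team_code'].tolist()
mini_receipt_df_2['match_flag'] = mini_receipt_df_2['team_code'].isin(team_codes)

print(labels)
print(mini_receipt_df_2)
```

In Koalas the same idea would presumably be mini_receipt_df_2['team_code'].isin(mini_team_df_1['team_code'].to_numpy()). That collects the team codes to the driver, so it suits a small lookup table; for a large one, a join-based approach is likely the better fit.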