Python – count number of elements that are equal between two columns of two dataframes

Question:

I have two dataframes: df1, df2
that contain two columns, col1 and col2. I would like to calculate the number of elements in column col1 of df1 that are equal to col2 of df2. How can I do that?

Asked By: adrCoder


Answers:

I assume you’re using pandas.

One way is to use pd.merge on the columns you want to compare and take the length of the result. Since the columns have different names here, pass left_on and right_on instead of on:

len(pd.merge(df1, df2, left_on="col1", right_on="col2"))

Pandas does an inner merge by default.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
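As a minimal sketch of the merge approach, assuming small made-up sample frames (the data below is illustrative and not from the question), the length of the inner merge gives the number of matches. Keep in mind that a merge produces one row per matching pair, so values repeated on both sides multiply the count.

```python
import pandas as pd

# Illustrative sample data (an assumption, not from the question).
df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({"col2": [1, 3, 5, 7]})

# An inner merge keeps only rows whose col1 value has a match in col2.
merged = pd.merge(df1, df2, left_on="col1", right_on="col2")

match_count = len(merged)
print(match_count)  # 3 (the values 1, 3 and 5)
```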

Answered By: Gabriel

You can use Series.isin and sum the result, i.e. df1.col1.isin(df2.col2).sum():

import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({'col2': [1, 3, 5, 7]})

# isin returns a boolean Series; summing it counts the True values
nb_common_elements = df1.col1.isin(df2.col2).sum()

assert nb_common_elements == 3

Be cautious depending on your use case because:

df1 = pd.DataFrame({'col1': [1, 1, 1, 2, 7]})
df1.col1.isin(df2.col2).sum()

This would return 4 and not 2, because each of the three 1s in df1.col1 is counted separately (both 1 and 7 are present in df2.col2). If that’s not the expected behaviour, you could drop duplicates from df1.col1 before testing the intersection size:

df1.col1.drop_duplicates().isin(df2.col2).sum()

Which in this example would return 2.

To better understand why this is happening, you can have a look at what .isin returns:

df1['isin df2.col2'] = df1.col1.isin(df2.col2)

Which gives:

   col1  isin df2.col2
0     1           True
1     1           True
2     1           True
3     2          False
4     7           True

Now .sum() adds up the booleans from column isin df2.col2 (a total of 4 True).
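The two counting modes described above can be wrapped in a small helper. This is only a sketch: count_matches and its distinct parameter are hypothetical names introduced here for illustration and are not part of pandas.

```python
import pandas as pd


def count_matches(left: pd.Series, right: pd.Series, distinct: bool = False) -> int:
    """Count values of `left` that appear in `right` (hypothetical helper).

    With distinct=True, each distinct value of `left` is counted once.
    """
    if distinct:
        left = left.drop_duplicates()
    return int(left.isin(right).sum())


df1 = pd.DataFrame({"col1": [1, 1, 1, 2, 7]})
df2 = pd.DataFrame({"col2": [1, 3, 5, 7]})

rows_matching = count_matches(df1["col1"], df2["col2"])
values_matching = count_matches(df1["col1"], df2["col2"], distinct=True)
print(rows_matching, values_matching)  # 4 2
```

Which of the two numbers you want depends on whether you are counting rows of df1 or distinct shared values.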

Answered By: cglacet
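  1. You can use the eq() method of a pandas Series to compare two columns row by row; it returns a boolean Series indicating whether the values at each index are equal. You can then use the sum() method to count the number of True values. Note that this only counts values that match at the same position, not values that appear anywhere in the other column.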

For example:

import pandas as pd

# create two sample dataframes
df1 = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]})
df2 = pd.DataFrame({'col1': [1, 3, 5, 7], 'col2': [2, 4, 6, 8]})

# compare the elements of the 'col1' column in df1 and df2
comparison = df1['col1'].eq(df2['col1'])

# count the number of elements that are equal
count = comparison.sum()

print(count)

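This would output 1, not 2: eq() compares the two 'col1' columns row by row, and only the first row holds the same value (1) in both. The value 3 occurs in both columns, but at different positions, so it is not counted.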
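To make the difference between a positional comparison and a membership test concrete, here is a short sketch on the same 'col1' data that runs eq() and isin() side by side. It is meant as an illustration of the two semantics rather than a recommendation of one over the other.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4]})
df2 = pd.DataFrame({"col1": [1, 3, 5, 7]})

# Row-by-row comparison: only index 0 holds the same value (1) in both.
same_position = int(df1["col1"].eq(df2["col1"]).sum())

# Membership test: 1 and 3 both occur somewhere in df2["col1"].
anywhere = int(df1["col1"].isin(df2["col1"]).sum())

print(same_position, anywhere)  # 1 2
```

If the question is "how many values of one column appear in the other", isin() is the closer fit; eq() answers "how many rows agree".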

  2. You can also use the where() method to keep only the rows in which the values of the two columns are equal (the other rows are replaced with NaN), and then use the count() method, which ignores NaN, to count the remaining rows.

For example:

import pandas as pd

# create two sample dataframes
df1 = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]})
df2 = pd.DataFrame({'col1': [1, 3, 5, 7], 'col2': [2, 4, 6, 8]})

# select rows where the elements of the 'col1' column in df1 and df2 are equal
equal_rows = df1.where(df1['col1'].eq(df2['col1']))

# count the number of rows
count = equal_rows.count()['col1']

print(count)

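This would output 1: only the first row has the same value in both 'col1' columns, and where() replaces every other row with NaN, which count() ignores.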
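As a rough sketch of what where() does compared with plain boolean indexing, using the same sample frames (the variable names mask, masked and filtered are chosen here for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8]})
df2 = pd.DataFrame({"col1": [1, 3, 5, 7], "col2": [2, 4, 6, 8]})

# True where both frames hold the same col1 value at the same index.
mask = df1["col1"].eq(df2["col1"])

# where() keeps the frame's shape and fills non-matching rows with NaN.
masked = df1.where(mask)

# Boolean indexing drops the non-matching rows instead.
filtered = df1[mask]

print(len(masked), masked["col1"].count(), len(filtered))  # 4 1 1
```

Both routes give the same count; boolean indexing avoids the intermediate NaN rows.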

Answered By: Ashutosh Dwivedi