Faster way to fetch rows from one dataframe based on values from another

Question:

I have one dataframe (df_lookup) which has 2 columns, another dataframe (df_master) that has over 100 columns.

I want to be able to pull all rows from df_master into a third dataframe if BOTH the values of two columns present in df_lookup match in df_master.

I am currently using the below brute force way of doing this:

for index, row in df_lookup.iterrows():    
    for index1, row1 in df_master.iterrows():
        if ((row['col 1'] == row1['col1']) and (row['col 2'] == row1['col2'])):
            df_new.loc[len(df_new)] = row1

I am hoping if there is a faster way of looking up since df_master is huge.

Asked By: Krishna

||

Answers:

the first way is using from merge method:

df = df_lookup.merge(df_master, how= 'inner', on=['col1', 'col2'])
Answered By: Alireza

You can use merge:

df_new = df_lookup.merge(df_master,left_on=['col 1', 'col 2'],right_on=['col1','col2'])
Answered By: A.Demiraj

if col1 and col2 are strings or numbers or dates or each type that can convert to string format, you can use from the below code and it is faster than merge method:

df = df_master[(df_master.col1.map(str)+df_master.col2.map(str)).isin((df_lookup.col1.map(str)+df_lookup.col2.map(str)).unique())]

if your columns are strings, delete .map method and your code can run very fast

Answered By: Alireza
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.