Faster way to fetch rows from one dataframe based on values from another
Question:
I have one dataframe (df_lookup) which has 2 columns, another dataframe (df_master) that has over 100 columns.
I want to be able to pull all rows from df_master into a third dataframe if BOTH the values of two columns present in df_lookup match in df_master.
I am currently using the below brute force way of doing this:
for index, row in df_lookup.iterrows():
for index1, row1 in df_master.iterrows():
if ((row['col 1'] == row1['col1']) and (row['col 2'] == row1['col2'])):
df_new.loc[len(df_new)] = row1
I am hoping if there is a faster way of looking up since df_master is huge.
Answers:
the first way is using from merge method:
df = df_lookup.merge(df_master, how= 'inner', on=['col1', 'col2'])
You can use merge:
df_new = df_lookup.merge(df_master,left_on=['col 1', 'col 2'],right_on=['col1','col2'])
if col1 and col2 are strings or numbers or dates or each type that can convert to string format, you can use from the below code and it is faster than merge method:
df = df_master[(df_master.col1.map(str)+df_master.col2.map(str)).isin((df_lookup.col1.map(str)+df_lookup.col2.map(str)).unique())]
if your columns are strings, delete .map method and your code can run very fast
I have one dataframe (df_lookup) which has 2 columns, another dataframe (df_master) that has over 100 columns.
I want to be able to pull all rows from df_master into a third dataframe if BOTH the values of two columns present in df_lookup match in df_master.
I am currently using the below brute force way of doing this:
for index, row in df_lookup.iterrows():
for index1, row1 in df_master.iterrows():
if ((row['col 1'] == row1['col1']) and (row['col 2'] == row1['col2'])):
df_new.loc[len(df_new)] = row1
I am hoping if there is a faster way of looking up since df_master is huge.
the first way is using from merge method:
df = df_lookup.merge(df_master, how= 'inner', on=['col1', 'col2'])
You can use merge:
df_new = df_lookup.merge(df_master,left_on=['col 1', 'col 2'],right_on=['col1','col2'])
if col1 and col2 are strings or numbers or dates or each type that can convert to string format, you can use from the below code and it is faster than merge method:
df = df_master[(df_master.col1.map(str)+df_master.col2.map(str)).isin((df_lookup.col1.map(str)+df_lookup.col2.map(str)).unique())]
if your columns are strings, delete .map method and your code can run very fast