Is there a way to do a vlookup using Python?
Question:
Let's say I have two dataframes, df1 and df2, and I need to do a vlookup on name and return the names that match.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'val1': [5, 6, 7, 8],
    'val2': [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    'name': ['B', 'D', 'E', 'F'],
    'abc': [15, 16, 17, 18],
    'def': [11, 21, 31, 41],
})
Expected Output:
name  val1  val2  matched_name
A     5     1     NaN
B     6     2     B
C     7     3     NaN
D     8     4     D
I thought this could be done by:
df1['matched_name'] = df1['name'].map(df2['name'])
But I'm getting all NaNs in the matched_name column. Is there a way to do this?
Answers:
It's not really a vlookup, but you can use where and isin:
df1['matched_name'] = df1['name'].where(df1['name'].isin(df2['name']))
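As for why the map attempt in the question returns all NaN: when Series.map is given a Series, it looks each value up in that Series's index, and df2['name'] is indexed 0 to 3, so none of the names are found. A minimal sketch of a map-based lookup that behaves more like a vlookup, reusing the question's data and assuming the names in df2 are unique (the column name looked_up_abc is made up for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'val1': [5, 6, 7, 8],
    'val2': [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    'name': ['B', 'D', 'E', 'F'],
    'abc': [15, 16, 17, 18],
    'def': [11, 21, 31, 41],
})

# Index df2 by name so that map can use the names as lookup keys.
# Assumption: names in df2 are unique; map raises on a duplicated index.
lookup = df2.set_index('name')

# Keep the name where it also appears in df2, NaN otherwise.
df1['matched_name'] = df1['name'].where(df1['name'].isin(df2['name']))

# vlookup-style: fetch 'abc' from df2 for each name in df1 (NaN when missing).
# 'looked_up_abc' is a hypothetical column name.
df1['looked_up_abc'] = df1['name'].map(lookup['abc'])

print(df1)
```

Rows A and C have no counterpart in df2, so both new columns are NaN there.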
A more convoluted way, using a merge, which allows you to also add other columns if needed:
out = df1.merge(df2[['name']].rename(columns={'name': 'matched_name'}),
                left_on='name', right_on='matched_name', how='left')
Output:
  name  val1  val2 matched_name
0    A     5     1          NaN
1    B     6     2            B
2    C     7     3          NaN
3    D     8     4            D
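To illustrate the "add other columns" point, here is a rough sketch of a left merge that pulls abc across from df2 and derives matched_name from the merge indicator. It reuses the question's data; choosing abc as the extra column is only an example, and if df2 had duplicate names the merge would repeat the matching df1 rows.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'val1': [5, 6, 7, 8],
    'val2': [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    'name': ['B', 'D', 'E', 'F'],
    'abc': [15, 16, 17, 18],
    'def': [11, 21, 31, 41],
})

# A left merge keeps every row of df1; rows without a match get NaN in 'abc'.
# indicator=True adds a '_merge' column: 'both' if the name was found in df2,
# 'left_only' otherwise.
out = df1.merge(df2[['name', 'abc']], on='name', how='left', indicator=True)

# Keep the name only where the merge found a match, then drop the helper column.
out['matched_name'] = out['name'].where(out['_merge'] == 'both')
out = out.drop(columns='_merge')

print(out)
```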
You can use numpy's where, which checks whether each name exists in the other dataframe's name column. Since the names are an exact match, you can take the value from the first dataframe:
df1['matched_name'] = np.where(df1['name'].isin(df2['name']), df1['name'], np.nan)
This gives the following output:
  name  val1  val2 matched_name
0    A     5     1          NaN
1    B     6     2            B
2    C     7     3          NaN
3    D     8     4            D