Map dataFrame values to another DataFrame
Question:
I have these two DataFrames:
import pandas as pd
data1 = [[1, 'A'], [2, 'B'], [3, 'C'], [4, 'D'], [5, 'E']]
data2 = [1, 1, 1, 1, 2, 5, 4, 3]
df1 = pd.DataFrame(data1, columns=['one', 'two'])
df2 = pd.DataFrame(data2, columns=['one'])
I want to map every value in column one of df2 to the matching value in column two of df1. In simple terms, I want to use df1 as a dictionary. I want output like this for df2:
one
0 A
1 A
2 A
3 A
4 B
5 E
6 D
7 C
I am doing this:
df2['one'] = df2['one'].apply(lambda x: df1.two[df1.one == x])
which gives me this output:
one
0 A
1 A
2 A
3 A
4 NaN
5 NaN
6 NaN
7 NaN
The A values are correct, but why are all the later values NaN?
Answers:
Try this. It has much cleaner syntax and better functionality than using apply with a lambda function:
df2['one'].map(df1.set_index('one')['two'])
Output:
0 A
1 A
2 A
3 A
4 B
5 E
6 D
7 C
Name: one, dtype: object
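Note that the expression above returns a new Series and leaves df2 unchanged. Here is a minimal sketch of the full round trip, assuming you want to overwrite df2['one'] as in the question and that the values in df1['one'] are unique (the name `lookup` is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame([[1, 'A'], [2, 'B'], [3, 'C'], [4, 'D'], [5, 'E']],
                   columns=['one', 'two'])
df2 = pd.DataFrame([1, 1, 1, 1, 2, 5, 4, 3], columns=['one'])

# Series indexed by df1['one'] whose values are df1['two'].
# `lookup` is an illustrative name, not part of the original answer.
lookup = df1.set_index('one')['two']

# map() looks up each value of df2['one'] in the index of `lookup`
# and the result is assigned back to the column.
df2['one'] = df2['one'].map(lookup)

print(df2['one'].tolist())
```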
As for why your method doesn't work, look at the output of:
df2['one'].apply(lambda x: df1.two[df1.one == x])
Output:
     0    1    2    3    4
0    A  NaN  NaN  NaN  NaN
1    A  NaN  NaN  NaN  NaN
2    A  NaN  NaN  NaN  NaN
3    A  NaN  NaN  NaN  NaN
4  NaN    B  NaN  NaN  NaN
5  NaN  NaN  NaN  NaN    E
6  NaN  NaN  NaN    D  NaN
7  NaN  NaN    C  NaN  NaN
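The lambda returns a one-element Series rather than a scalar, so pd.Series.apply builds a DataFrame with one column per df1 index label (0 to 4). When you assign that DataFrame back to df2['one'], pandas aligns on the index and only the first column, 0, gets assigned. That is why the first four rows show A and the rest are NaN. Depending on the pandas version, this assignment may instead raise an error.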
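This can be checked directly. The sketch below rebuilds the question's data and shows that the lambda yields a one-element Series for each value, which apply then stacks into a DataFrame. It also shows one possible way to keep the apply approach by extracting the scalar with .iloc[0]. That fix is an assumption of this sketch rather than part of the original answer, and it would raise an IndexError for a value that is missing from df1.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 'A'], [2, 'B'], [3, 'C'], [4, 'D'], [5, 'E']],
                   columns=['one', 'two'])
df2 = pd.DataFrame([1, 1, 1, 1, 2, 5, 4, 3], columns=['one'])

# For a single value, the boolean mask selects a one-element Series
# that keeps the matching row's index label from df1.
single = df1.two[df1.one == 2]
print(type(single).__name__, single.index.tolist())

# Every call returns a Series, so apply() combines them into a DataFrame
# with one column per df1 index label instead of a flat Series.
wide = df2['one'].apply(lambda x: df1.two[df1.one == x])
print(wide.shape)

# Pulling out the scalar with .iloc[0] makes apply() return a Series again.
# Assumes every value of df2['one'] has a match in df1['one'].
fixed = df2['one'].apply(lambda x: df1.loc[df1['one'] == x, 'two'].iloc[0])
print(fixed.tolist())
```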
Build a dict from the df1 columns and map it onto df2:
df2.one = df2.one.map(dict(zip(df1.one, df1.two)))
one
0 A
1 A
2 A
3 A
4 B
5 E
6 D
7 C
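One behaviour worth knowing with a dict-based map: values that have no matching key come back as NaN. A small sketch, using a hypothetical extra value 9 that is not in df1 and a placeholder string 'unknown' chosen only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 'A'], [2, 'B'], [3, 'C'], [4, 'D'], [5, 'E']],
                   columns=['one', 'two'])
# 9 is a hypothetical value with no match in df1['one'].
df2 = pd.DataFrame([1, 2, 9], columns=['one'])

# Plain dict built from the two df1 columns, as in the answer above.
mapping = dict(zip(df1['one'], df1['two']))

# Keys missing from the dict map to NaN.
mapped = df2['one'].map(mapping)

# Optionally replace the missing results with a placeholder.
filled = mapped.fillna('unknown')
print(filled.tolist())
```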
You can also achieve this by performing a join.
import pandas as pd
data1 = [[1,'A'],[2,'B'],[3,'C'],[4,'D'],[5,'E']]
data2 = [1,1,1,1,2,5,4,3]
df1 = pd.DataFrame(data1,columns = ['one','two'])
df2 = pd.DataFrame(data2,columns = ['one'])
print(df1)
print(df2)
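# Merge with df2 on the left so the result keeps df2's row order;
# with df1 on the left, the rows come back in df1's key order (A A A A B C D E).
merge_df = pd.merge(df2, df1, on='one', how='left')[['two']]
print(merge_df)
Output:
two
0 A
1 A
2 A
3 A
4 B
5 E
6 D
7 C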
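A merge result can be compared against the map-based one. This sketch assumes a left merge with df2 as the left frame, since that keeps df2's row order. The names `via_merge` and `via_map` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 'A'], [2, 'B'], [3, 'C'], [4, 'D'], [5, 'E']],
                   columns=['one', 'two'])
df2 = pd.DataFrame([1, 1, 1, 1, 2, 5, 4, 3], columns=['one'])

# Left merge: one output row per df2 row, in df2's original order.
via_merge = pd.merge(df2, df1, on='one', how='left')['two']

# Dict-based map from the earlier answer, for comparison.
via_map = df2['one'].map(dict(zip(df1['one'], df1['two'])))

# True when both approaches produce the same values in the same order.
print(via_merge.tolist() == via_map.tolist())
```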
I have tried this solution and it works for me:
df2.one = df2.one.map(dict(zip(df1.one, df1.two)))