Change Values in Dataframe with Values in Some Other Columns in Other Dataframe
Question:
I want to change values in my dataframe:
import pandas as pd

student = pd.DataFrame({'id': [1,2,3,4,5,6,7,8,9,10],
                        'homeground': ['TOKYO','SOUTH KOREA','RIYADH','JAPAN','TOKYO','OSAKA','SAUDI ARABIA','SEOUL','','BUSAN']})
This is the master homeground table:
hg = pd.DataFrame({'id_country': [1,2,2,3,3,3,3],
                   'country': ['SAUDI ARABIA','SOUTH KOREA','SOUTH KOREA','JAPAN','JAPAN','JAPAN','JAPAN'],
                   'id_city': [1,2,3,4,5,6,7],
                   'city': ['RIYADH','SEOUL','BUSAN','TOKYO','TOKYO','OSAKA','OSAKA']})
I want to replace the homeground values in student with the matching ids, so that the result looks like this:
id  homeground
 1           4
 2           2
 3           1
 4           3
 5           4
 6           6
 7           1
 8           2
 9           0
10           3
Answers:
Use Series.map by city first, then by country (compared in lowercase). Duplicates are removed from both mappings so that only the first id per city and per country is kept, and values with no match are finally replaced by 0:
s1 = student.homeground.map(hg.drop_duplicates(['city']).set_index('city')['id_city'])
s = hg.drop_duplicates(['country']).set_index('country')['id_country'].rename(str.lower)
s2 = student.homeground.str.lower().map(s)
# the downcast argument of fillna is deprecated in recent pandas versions; cast explicitly instead
student['homeground'] = s1.fillna(s2).fillna(0).astype(int)
print (student)
   id  homeground
0   1           4
1   2           2
2   3           1
3   4           3
4   5           4
5   6           6
6   7           1
7   8           2
8   9           0
9  10           3
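The same city-then-country lookup can be sketched as a self-contained script. This is only an illustration of the approach above: the helper name map_homeground is hypothetical, and it assumes the names in student already match the spelling and case used in hg, so the lowercase step is left out.

```python
import pandas as pd

student = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'homeground': ['TOKYO', 'SOUTH KOREA', 'RIYADH', 'JAPAN', 'TOKYO',
                   'OSAKA', 'SAUDI ARABIA', 'SEOUL', '', 'BUSAN'],
})
hg = pd.DataFrame({
    'id_country': [1, 2, 2, 3, 3, 3, 3],
    'country': ['SAUDI ARABIA', 'SOUTH KOREA', 'SOUTH KOREA',
                'JAPAN', 'JAPAN', 'JAPAN', 'JAPAN'],
    'id_city': [1, 2, 3, 4, 5, 6, 7],
    'city': ['RIYADH', 'SEOUL', 'BUSAN', 'TOKYO', 'TOKYO', 'OSAKA', 'OSAKA'],
})


def map_homeground(names, master):
    """Hypothetical helper: city id first, then country id, else 0."""
    # Keep the first id per city / country so each lookup index is unique.
    city_ids = master.drop_duplicates('city').set_index('city')['id_city']
    country_ids = master.drop_duplicates('country').set_index('country')['id_country']
    by_city = names.map(city_ids)
    by_country = names.map(country_ids)
    # Prefer the city match, fall back to the country match, then to 0.
    return by_city.fillna(by_country).fillna(0).astype(int)


result = map_homeground(student['homeground'], hg)
print(result.tolist())
```

Passing the homeground column and the master table as arguments keeps student unchanged until the result is assigned back, for example with student['homeground'] = result.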
EDIT: If you do not want to drop the duplicated names, collect the unique ids for each city and each country into lists instead:
s11 = hg.drop_duplicates(['city','id_city']).groupby('city')['id_city'].agg(list)
s1 = student.homeground.map(s11)
s22 = (hg.drop_duplicates(['country','id_country'])
         .groupby('country')['id_country'].agg(list).rename(str.lower))
s2 = student.homeground.str.lower().map(s22)
# the downcast argument of fillna is deprecated in recent pandas versions and is not needed here
student['homeground'] = s1.fillna(s2).fillna(0)
print (student)
   id homeground
0   1     [4, 5]
1   2        [2]
2   3        [1]
3   4        [3]
4   5     [4, 5]
5   6     [6, 7]
6   7        [1]
7   8        [2]
8   9          0
9  10        [3]
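To check which names are actually affected by duplicated ids, the number of distinct ids per city can be counted first. This is a small sketch, not part of the original answer; the variable names ids_per_city and ambiguous are made up for illustration.

```python
import pandas as pd

hg = pd.DataFrame({
    'id_country': [1, 2, 2, 3, 3, 3, 3],
    'country': ['SAUDI ARABIA', 'SOUTH KOREA', 'SOUTH KOREA',
                'JAPAN', 'JAPAN', 'JAPAN', 'JAPAN'],
    'id_city': [1, 2, 3, 4, 5, 6, 7],
    'city': ['RIYADH', 'SEOUL', 'BUSAN', 'TOKYO', 'TOKYO', 'OSAKA', 'OSAKA'],
})

# Count the distinct city ids recorded for each city name.
ids_per_city = hg.groupby('city')['id_city'].nunique()

# Cities with more than one id are the ones that end up as multi-element lists.
ambiguous = ids_per_city[ids_per_city > 1].index.tolist()
print(ambiguous)  # ['OSAKA', 'TOKYO']
```

The same check works for countries by grouping on 'country' and counting 'id_country'.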
Or join the ids into comma-separated strings:
s11 = (hg.drop_duplicates(['city','id_city'])
         .assign(id_city = lambda x: x['id_city'].astype(str))
         .groupby('city')['id_city'].agg(','.join))
s1 = student.homeground.map(s11)
s22 = (hg.drop_duplicates(['country','id_country'])
         .assign(id_country = lambda x: x['id_country'].astype(str))
         .groupby('country')['id_country']
         .agg(','.join).rename(str.lower))
s2 = student.homeground.str.lower().map(s22)
# the values are strings, so fill with '0'; the deprecated downcast argument is not needed
student['homeground'] = s1.fillna(s2).fillna('0')
print (student)
   id homeground
0   1        4,5
1   2          2
2   3          1
3   4          3
4   5        4,5
5   6        6,7
6   7          1
7   8          2
8   9          0
9  10          3