Update subset of values from one column based on another dataframe
Question:
I have a dataframe df1:

Store_id | fruit | region |
---|---|---|
1 | orange | x |
2 | apple | y |
3 | NotKnown | z |
5 | Notknown | q |
6 | banana | w |

I have a dataframe df2:

Store_id | fruit | region |
---|---|---|
1 | orange | x |
2 | apple | y |
3 | pears | z |
5 | strawberry | q |
6 | banana | w |
8 | mango | i |

Expected df1:

Store_id | fruit | region |
---|---|---|
1 | orange | x |
2 | apple | y |
3 | pears | z |
5 | strawberry | q |
6 | banana | w |
Store_id is the primary key.
How do I update the fruit column of df1 from the fruit column of df2, but only for the rows where df1's fruit is NotKnown?
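For anyone who wants to reproduce the problem, the two frames described above can be built as follows. This is only a sketch of the sample data from the question; the column dtypes (integer ids, string fruit and region) are an assumption, since the question shows the values but not the dtypes.

```python
import pandas as pd

# Sample data copied from the question. Note the two spellings of the
# placeholder ("NotKnown" and "Notknown") in df1.
df1 = pd.DataFrame({
    "Store_id": [1, 2, 3, 5, 6],
    "fruit": ["orange", "apple", "NotKnown", "Notknown", "banana"],
    "region": ["x", "y", "z", "q", "w"],
})

# df2 holds the known fruit for each store, plus one store (8) absent from df1.
df2 = pd.DataFrame({
    "Store_id": [1, 2, 3, 5, 6, 8],
    "fruit": ["orange", "apple", "pears", "strawberry", "banana", "mango"],
    "region": ["x", "y", "z", "q", "w", "i"],
})

print(df1)
print(df2)
```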
Answers:
reg_to_fru = df2.set_index("region")["fruit"]
df1.fruit = df1.region.map(reg_to_fru)
You can build a mapper (a Series) from df2 that goes region -> fruit, then map the region column of df1 through it:
In [39]: reg_to_fru = df2.set_index("region")["fruit"]

In [40]: reg_to_fru
Out[40]:
region
x        orange
y         apple
z         pears
q    strawberry
w        banana
i         mango
Name: fruit, dtype: object

In [41]: df1.fruit = df1.region.map(reg_to_fru)

In [42]: df1
Out[42]:
   Store_id       fruit region
0         1      orange      x
1         2       apple      y
2         3       pears      z
3         5  strawberry      q
4         6      banana      w
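One thing to be aware of: the mapping above rewrites every row of the fruit column and looks values up by region, whereas the question asks to touch only the NotKnown rows and names Store_id as the primary key. A minimal sketch of that narrower update might look like the following. It assumes Store_id is unique in df2 and that the placeholder should be matched case-insensitively (the sample has both "NotKnown" and "Notknown"); the names id_to_fruit and unknown are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Store_id": [1, 2, 3, 5, 6],
    "fruit": ["orange", "apple", "NotKnown", "Notknown", "banana"],
    "region": ["x", "y", "z", "q", "w"],
})
df2 = pd.DataFrame({
    "Store_id": [1, 2, 3, 5, 6, 8],
    "fruit": ["orange", "apple", "pears", "strawberry", "banana", "mango"],
    "region": ["x", "y", "z", "q", "w", "i"],
})

# Lookup Series keyed on the primary key (assumed unique in df2).
id_to_fruit = df2.set_index("Store_id")["fruit"]

# Boolean mask of placeholder rows, compared case-insensitively.
unknown = df1["fruit"].str.lower().eq("notknown")

# Overwrite only the masked rows; all other fruit values are left untouched.
# A Store_id missing from df2 would end up as NaN here.
df1.loc[unknown, "fruit"] = df1.loc[unknown, "Store_id"].map(id_to_fruit)

print(df1)
```

For the sample data this gives the same frame as the region-based mapping, but known values in df1 are never overwritten, even if df2 disagrees with them.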
Let’s try:
df1['fruit'] = (df1['fruit'].str.lower().replace('notknown', pd.NA)
                .fillna(df1.merge(df2, on=['Store_id', 'region'], how='left')['fruit_y']))
print(df1)
   Store_id       fruit region
0         1      orange      x
1         2       apple      y
2         3       pears      z
3         5  strawberry      q
4         6      banana      w
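Two side effects of the merge one-liner are worth knowing: str.lower() lowercases every fruit value, not only the placeholders, and fillna aligns on the index, so it relies on df1 having the default 0..n-1 index that merge produces. The sketch below is one possible variant that avoids both. It assumes Store_id is unique in df2 (validate="m:1" raises otherwise); the letter index on df1 and the "_df2" suffix are hypothetical, added only to show that the result does not depend on index labels.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Store_id": [1, 2, 3, 5, 6],
    "fruit": ["orange", "apple", "NotKnown", "Notknown", "banana"],
    "region": ["x", "y", "z", "q", "w"],
})
df2 = pd.DataFrame({
    "Store_id": [1, 2, 3, 5, 6, 8],
    "fruit": ["orange", "apple", "pears", "strawberry", "banana", "mango"],
    "region": ["x", "y", "z", "q", "w", "i"],
})

# Hypothetical non-default index, to show the update is positional.
df1.index = list("abcde")

# Left merge on the primary key: one output row per df1 row, in df1's order.
# df1's column stays "fruit"; df2's column becomes "fruit_df2".
merged = df1.merge(
    df2[["Store_id", "fruit"]],
    on="Store_id",
    how="left",
    suffixes=("", "_df2"),
    validate="m:1",
)

# Placeholder rows, matched case-insensitively without altering the column.
unknown = df1["fruit"].str.lower().eq("notknown")

# merge() returns a fresh integer index, so copy the values by position
# (plain NumPy arrays) rather than relying on index alignment.
df1.loc[unknown, "fruit"] = merged.loc[unknown.to_numpy(), "fruit_df2"].to_numpy()

print(df1)
```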