Replace multiple values in a column for a subset of the dataframe in python
Question:
I’d like to replace multiple values of a column but only for selected rows (that is a subset of the data) defined by a condition on a different column in python. For example for the table below I’d like to replace 22000 with twentytwo and 23000 with twentythree but only if Courses is equal to Spark as shown below.
Here is code for the initial dataframe.
df = pd.DataFrame({
'Courses': ["Spark", "PySpark", "Spark", "Spark", "PySpark", "PySpark"],
'Fee': [22000, 25000, 23000, 24000, 22000, 23000],
})
Courses
Fee
0
Spark
22000
1
PySpark
25000
2
Spark
23000
3
Spark
24000
4
PySpark
22000
5
PySpark
23000
Below is the desired outcome.
Courses
Fee
0
Spark
twentytwo
1
PySpark
25000
2
Spark
twentythree
3
Spark
24000
4
PySpark
22000
5
PySpark
23000
I have tried using df["Fee"].replace({22000:'twentytwo', 23000:'twentythree'}, inplace=True)
but this results in replacement of all the specified values in the column and not just where Courses is equal to Spark.
Answers:
First, we need a more stringent selector in the form of df[x, y]
where x
selects rows (where Courses equals Spark) and y
selects columns ("Fee"):
df[df["Courses"]=="Spark", "Fee"]
We can then try using replace()
, but pandas will tell us that we are operating on a "slice" and not the real data frame. From this post, df.update()
may be useful.
Solution:
replacements = {22000:'twentytwo', 23000:'twentythree'}
df.update(df.loc[df["Courses"]=="Spark", "Fee"].replace(replacements))
Finally, since none of the values to be replaced are in "Courses", we can safely remove the column selector "Fee".
...df.loc[df["Courses"]=="Spark"]...
I’d like to replace multiple values of a column but only for selected rows (that is a subset of the data) defined by a condition on a different column in python. For example for the table below I’d like to replace 22000 with twentytwo and 23000 with twentythree but only if Courses is equal to Spark as shown below.
Here is code for the initial dataframe.
df = pd.DataFrame({
'Courses': ["Spark", "PySpark", "Spark", "Spark", "PySpark", "PySpark"],
'Fee': [22000, 25000, 23000, 24000, 22000, 23000],
})
Courses | Fee | |
---|---|---|
0 | Spark | 22000 |
1 | PySpark | 25000 |
2 | Spark | 23000 |
3 | Spark | 24000 |
4 | PySpark | 22000 |
5 | PySpark | 23000 |
Below is the desired outcome.
Courses | Fee | |
---|---|---|
0 | Spark | twentytwo |
1 | PySpark | 25000 |
2 | Spark | twentythree |
3 | Spark | 24000 |
4 | PySpark | 22000 |
5 | PySpark | 23000 |
I have tried using df["Fee"].replace({22000:'twentytwo', 23000:'twentythree'}, inplace=True)
but this results in replacement of all the specified values in the column and not just where Courses is equal to Spark.
First, we need a more stringent selector in the form of df[x, y]
where x
selects rows (where Courses equals Spark) and y
selects columns ("Fee"):
df[df["Courses"]=="Spark", "Fee"]
We can then try using replace()
, but pandas will tell us that we are operating on a "slice" and not the real data frame. From this post, df.update()
may be useful.
Solution:
replacements = {22000:'twentytwo', 23000:'twentythree'}
df.update(df.loc[df["Courses"]=="Spark", "Fee"].replace(replacements))
Finally, since none of the values to be replaced are in "Courses", we can safely remove the column selector "Fee".
...df.loc[df["Courses"]=="Spark"]...