Replace multiple values in a column for a subset of the dataframe in python

Question

I’d like to replace multiple values in a column, but only for selected rows (a subset of the data) defined by a condition on a different column, in Python. For example, for the table below I’d like to replace 22000 with twentytwo and 23000 with twentythree, but only where Courses is equal to Spark, as shown below.

Here is code for the initial dataframe.

import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Spark", "PySpark", "PySpark"],
    'Fee': [22000, 25000, 23000, 24000, 22000, 23000],
})

	Courses	Fee
0	Spark	22000
1	PySpark	25000
2	Spark	23000
3	Spark	24000
4	PySpark	22000
5	PySpark	23000

Below is the desired outcome.

	Courses	Fee
0	Spark	twentytwo
1	PySpark	25000
2	Spark	twentythree
3	Spark	24000
4	PySpark	22000
5	PySpark	23000

I have tried using df["Fee"].replace({22000:'twentytwo', 23000:'twentythree'}, inplace=True) but this results in replacement of all the specified values in the column and not just where Courses is equal to Spark.

Asked By: Ngan NL

Answer 1

First, we need a more specific selector of the form df.loc[x, y], where x selects rows (where Courses equals Spark) and y selects columns ("Fee"):

df.loc[df["Courses"]=="Spark", "Fee"]
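To make the selector concrete, here is a minimal sketch of what a row-and-column selection with .loc returns for the question's data. The names mask and subset are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Spark", "PySpark", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 22000, 23000],
})

# Boolean mask: True for the rows where Courses equals "Spark".
mask = df["Courses"] == "Spark"

# Row selector first, column label second; the result is a Series
# that keeps the original index labels of the selected rows.
subset = df.loc[mask, "Fee"]
print(subset)
```

Because the selection keeps the original index labels, values computed from it can later be aligned back onto the full data frame.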

We can then try calling replace() on this selection, but pandas will tell us that we are operating on a "slice" and not on the real data frame, so the original df is left unchanged. From this post, df.update() may be useful for writing the replaced values back.

Solution:

replacements = {22000:'twentytwo', 23000:'twentythree'}
df.update(df.loc[df["Courses"]=="Spark", "Fee"].replace(replacements))

Finally, since none of the values to be replaced appear in "Courses", we can safely drop the column selector "Fee":

...df.loc[df["Courses"]=="Spark"]...
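As an alternative to update(), the replaced values can be assigned straight back through the same .loc selector. This is only a sketch, not the answer's method: the name mask and the up-front cast of Fee to object dtype are assumptions made here so that integers and strings can share the column without dtype complaints.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Spark", "PySpark", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 22000, 23000],
})

replacements = {22000: "twentytwo", 23000: "twentythree"}

# Assumption: cast to object first so the column can hold ints and strings.
df["Fee"] = df["Fee"].astype(object)

# Rows to touch: only those where Courses equals "Spark".
mask = df["Courses"] == "Spark"

# Replace within the subset and assign back through the same selector,
# so the write goes to the original data frame rather than to a copy.
df.loc[mask, "Fee"] = df.loc[mask, "Fee"].replace(replacements)

print(df)
```

Rows where Courses is PySpark keep their numeric Fee values, because the mask excludes them from both the read and the write.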

Answered By: Tony Liu
