Using Pandas to dynamically replace values found in other columns

Question:

I have a dataset looks like this:

Car Make Model Engine
Toyota Rav 4 8cyl6L Toyota 8cyl6L
Mitsubishi Eclipse 2.1T Mitsubishi 2.1T
Monster Gravedigger 25Lsc Monster 25Lsc

The data was clearly concatenated from Make + Model + Engine at some point but the car Model was not provided to me.

I’ve been trying to use Pandas to say that if we take Car, replace instances of Make with a nothing, replace instances of Engine with nothing, then trim the spaces around the result, we will have Model.

Car Make Model Engine
Toyota Rav 4 8cyl6L Toyota Rav 4 8cyl6L
Mitsubishi Eclipse 2.1T Mitsubishi Eclipse 2.1T
Monster Gravedigger 25Lsc Monster Gravedigger 25Lsc

There’s something I’m doing wrong when I’m trying to reference another column in this manner.

df['Model'] = df['Car'].str.replace(df['Make'],'')

gives me an error of "unhashable type: ‘Series’". I’m guessing I’m accidentally inputting the entire ‘Make’ column.

At every row I want to make a different substitution using data from other columns in that row. How would I accomplish this?

Asked By: bobh2os

||

Answers:

you can use:

df['Model']=df.apply(lambda x: x['Car'].replace(x['Make'],"").replace(x['Engine'],""),axis=1)
print(df)
'''
                         Car        Make          Model  Engine
0        Toyota Rav 4 8cyl6L      Toyota         Rav 4   8cyl6L
1    Mitsubishi Eclipse 2.1T  Mitsubishi       Eclipse     2.1T
2  Monster Gravedigger 25Lsc     Monster   Gravedigger    25Lsc
'''
Answered By: Clegane

A regex proposition using re.sub :

import re

df['Model'] = [re.sub(f'{b}|{c}', '', a) for a,b,c in zip(df['Car'], df['Make'], df["Engine"])]

# Output :

print(df)

                         Car        Make          Model  Engine
0        Toyota Rav 4 8cyl6L      Toyota         Rav 4   8cyl6L
1    Mitsubishi Eclipse 2.1T  Mitsubishi       Eclipse     2.1T
2  Monster Gravedigger 25Lsc     Monster   Gravedigger    25Lsc
Answered By: abokey
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.