Using Pandas to dynamically replace values found in other columns
Question:
I have a dataset looks like this:
Car
Make
Model
Engine
Toyota Rav 4 8cyl6L
Toyota
8cyl6L
Mitsubishi Eclipse 2.1T
Mitsubishi
2.1T
Monster Gravedigger 25Lsc
Monster
25Lsc
The data was clearly concatenated from Make + Model + Engine at some point but the car Model was not provided to me.
I’ve been trying to use Pandas to say that if we take Car, replace instances of Make with a nothing, replace instances of Engine with nothing, then trim the spaces around the result, we will have Model.
Car
Make
Model
Engine
Toyota Rav 4 8cyl6L
Toyota
Rav 4
8cyl6L
Mitsubishi Eclipse 2.1T
Mitsubishi
Eclipse
2.1T
Monster Gravedigger 25Lsc
Monster
Gravedigger
25Lsc
There’s something I’m doing wrong when I’m trying to reference another column in this manner.
df['Model'] = df['Car'].str.replace(df['Make'],'')
gives me an error of "unhashable type: ‘Series’". I’m guessing I’m accidentally inputting the entire ‘Make’ column.
At every row I want to make a different substitution using data from other columns in that row. How would I accomplish this?
Answers:
you can use:
df['Model']=df.apply(lambda x: x['Car'].replace(x['Make'],"").replace(x['Engine'],""),axis=1)
print(df)
'''
Car Make Model Engine
0 Toyota Rav 4 8cyl6L Toyota Rav 4 8cyl6L
1 Mitsubishi Eclipse 2.1T Mitsubishi Eclipse 2.1T
2 Monster Gravedigger 25Lsc Monster Gravedigger 25Lsc
'''
A regex
proposition using re.sub
:
import re
df['Model'] = [re.sub(f'{b}|{c}', '', a) for a,b,c in zip(df['Car'], df['Make'], df["Engine"])]
# Output :
print(df)
Car Make Model Engine
0 Toyota Rav 4 8cyl6L Toyota Rav 4 8cyl6L
1 Mitsubishi Eclipse 2.1T Mitsubishi Eclipse 2.1T
2 Monster Gravedigger 25Lsc Monster Gravedigger 25Lsc
I have a dataset looks like this:
Car | Make | Model | Engine |
---|---|---|---|
Toyota Rav 4 8cyl6L | Toyota | 8cyl6L | |
Mitsubishi Eclipse 2.1T | Mitsubishi | 2.1T | |
Monster Gravedigger 25Lsc | Monster | 25Lsc |
The data was clearly concatenated from Make + Model + Engine at some point but the car Model was not provided to me.
I’ve been trying to use Pandas to say that if we take Car, replace instances of Make with a nothing, replace instances of Engine with nothing, then trim the spaces around the result, we will have Model.
Car | Make | Model | Engine |
---|---|---|---|
Toyota Rav 4 8cyl6L | Toyota | Rav 4 | 8cyl6L |
Mitsubishi Eclipse 2.1T | Mitsubishi | Eclipse | 2.1T |
Monster Gravedigger 25Lsc | Monster | Gravedigger | 25Lsc |
There’s something I’m doing wrong when I’m trying to reference another column in this manner.
df['Model'] = df['Car'].str.replace(df['Make'],'')
gives me an error of "unhashable type: ‘Series’". I’m guessing I’m accidentally inputting the entire ‘Make’ column.
At every row I want to make a different substitution using data from other columns in that row. How would I accomplish this?
you can use:
df['Model']=df.apply(lambda x: x['Car'].replace(x['Make'],"").replace(x['Engine'],""),axis=1)
print(df)
'''
Car Make Model Engine
0 Toyota Rav 4 8cyl6L Toyota Rav 4 8cyl6L
1 Mitsubishi Eclipse 2.1T Mitsubishi Eclipse 2.1T
2 Monster Gravedigger 25Lsc Monster Gravedigger 25Lsc
'''
A regex
proposition using re.sub
:
import re
df['Model'] = [re.sub(f'{b}|{c}', '', a) for a,b,c in zip(df['Car'], df['Make'], df["Engine"])]
# Output :
print(df)
Car Make Model Engine
0 Toyota Rav 4 8cyl6L Toyota Rav 4 8cyl6L
1 Mitsubishi Eclipse 2.1T Mitsubishi Eclipse 2.1T
2 Monster Gravedigger 25Lsc Monster Gravedigger 25Lsc