Pandas – replace/remove everything after a specified string
Question:
My data is formatted like this:
product_name
HP Ryzen 5 Hexa Core 5500U – (16 GB/512 GB SSD/Windows 11 Home) 15s- eq2182AU Thin and Light Laptop
DELL Inspiron Athlon Dual Core 3050U – (8 GB/256 GB SSD/Windows 11 Home) Inspiron 3525 Notebook
These names are too long, and I would like to shorten them. A common theme with all rows of my data is that all the text before the first occurrence of - ( is what I want to preserve for the product name.
How do I remove all the text that comes after - (, including - ( itself?
Answers:
Try this:
string.split(" - (")[0]
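To apply the same idea to a whole column rather than a single string, one option is Series.str.partition, a close relative of split that cuts once at the first literal separator. This is only a sketch: the rows below are abridged, hypothetical versions of the names in the question, and short_name is an illustrative column name.

```python
import pandas as pd

# Hypothetical, abridged sample rows using the literal " - (" separator.
df = pd.DataFrame({"product_name": [
    "HP Ryzen 5 Hexa Core 5500U - (16 GB/512 GB SSD) 15s- eq2182AU Laptop",
    "DELL Inspiron Athlon Dual Core 3050U - (8 GB/256 GB SSD) Inspiron 3525",
    "No separator here",
]})

# Plain Python on one string: keep what precedes the first " - (".
first = df["product_name"].iloc[0].split(" - (")[0]

# Column-wise: partition treats the separator as literal text (no regex
# escaping needed) and returns three columns; column 0 is the text before it.
df["short_name"] = df["product_name"].str.partition(" - (")[0]
print(df["short_name"])
```

Rows that do not contain the separator come back unchanged, because partition puts the whole string in column 0 when nothing matches.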
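Assuming it's a pandas DataFrame df, something like this uses a regex to do the replacement for all items under 'product_name'. The ( has to be escaped, and the trailing .* is what matches the text after the separator so that it gets removed too:
df['product_name'] = df['product_name'].str.replace(r' - \(.*', '', regex=True)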
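The sample rows as displayed in the question appear to use an en dash (–) rather than a plain hyphen, which may just be a formatting artifact. If the real data does mix the two, a slightly more tolerant pattern might help. The following is a sketch under that assumption; the names and pattern variables are illustrative.

```python
import pandas as pd

# Sample rows from the question; the second is altered to use a plain hyphen
# so that both dash variants are exercised.
names = pd.Series([
    "HP Ryzen 5 Hexa Core 5500U – (16 GB/512 GB SSD/Windows 11 Home) 15s- eq2182AU Thin and Light Laptop",
    "DELL Inspiron Athlon Dual Core 3050U - (8 GB/256 GB SSD/Windows 11 Home) Inspiron 3525 Notebook",
])

# Optional whitespace, a hyphen or en dash (\u2013), optional whitespace,
# an escaped "(", then everything up to the end of the string.
pattern = r"\s*[-\u2013]\s*\(.*$"
short = names.str.replace(pattern, "", regex=True)
print(short.tolist())
```

The hyphen in "15s- eq2182AU" is left alone because it is not followed by an opening parenthesis.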
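pandas's applymap should do it (note that applymap is deprecated since pandas 2.1 in favor of DataFrame.map, which is called the same way):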
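import pandas as pd

def shorten(s):
    # Keep only the text before the first ' - ('
    return s.split(' - (')[0]

df = pd.DataFrame(['abc - (123)', 'def - (456)'])
print(df)
# applymap calls shorten on every cell of the DataFrame
df = df.applymap(shorten)
print(df)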
Output:
             0
0  abc - (123)
1  def - (456)
     0
0  abc
1  def
If you want to only modify a specific column, e.g. "product_name", use apply instead:
import pandas as pd

def shorten(s):
    # Keep only the text before the first ' - ('
    return s.split(' - (')[0]

df = pd.DataFrame([['abc - (123)'], ['def - (456)']], columns=['product_name'])
print(df)
# apply calls shorten on each value of the selected column only
df['product_name'] = df['product_name'].apply(shorten)
print(df)
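One thing to watch with apply: a missing value in the column is not a string, so calling s.split on it raises an AttributeError. A minimal sketch of a guarded version, assuming the column may contain missing names (the None row is made up for illustration):

```python
import pandas as pd

def shorten(s):
    # Leave non-string values (e.g. None/NaN) untouched.
    if not isinstance(s, str):
        return s
    # Keep only the text before the first ' - ('
    return s.split(' - (')[0]

# Hypothetical data with one missing product name.
df = pd.DataFrame({'product_name': ['abc - (123)', None, 'def - (456)']})
df['product_name'] = df['product_name'].apply(shorten)
print(df)
```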