Pandas – replace/remove everything after a specified string

Question:

My data is formatted like this:

product_name
HP Ryzen 5 Hexa Core 5500U - (16 GB/512 GB SSD/Windows 11 Home) 15s- eq2182AU Thin and Light Laptop
DELL Inspiron Athlon Dual Core 3050U - (8 GB/256 GB SSD/Windows 11 Home) Inspiron 3525 Notebook

These names are too long, and I would like to shorten them. A common theme with all rows of my data is that all the text before the first occurrence of - ( is what I want to preserve for the product name.

How do I remove all the text that comes after - (, including - ( itself?

Asked By: KLG


Answers:

Try this:
string.split(" - (")[0]

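Assuming it’s a pandas DataFrame df, something like the following would use a regex to perform the replacement for all items under 'product_name'. The opening parenthesis has to be escaped (an unescaped ( is an unterminated group and raises an error), and the trailing .* is what removes everything after the separator rather than just the separator itself.

df['product_name'] = df['product_name'].str.replace(r'\s*- \(.*', '', regex=True)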
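
For anyone who wants to try this end to end, here is a minimal sketch using the two sample rows from the question. It assumes the separator is always the literal text " - (" with a plain hyphen. The names by_regex and by_partition are only illustrative. The second variant uses str.partition, which treats the separator as literal text, so nothing needs escaping.

```python
import pandas as pd

# Sample rows copied from the question (assumes a plain hyphen in " - (").
df = pd.DataFrame({
    "product_name": [
        "HP Ryzen 5 Hexa Core 5500U - (16 GB/512 GB SSD/Windows 11 Home) 15s- eq2182AU Thin and Light Laptop",
        "DELL Inspiron Athlon Dual Core 3050U - (8 GB/256 GB SSD/Windows 11 Home) Inspiron 3525 Notebook",
    ]
})

# Variant 1: regex replacement.
# "\(" escapes the parenthesis, ".*" consumes the rest of the string.
by_regex = df["product_name"].str.replace(r"\s*- \(.*", "", regex=True)

# Variant 2: literal partition at the first " - (".
# partition returns three columns (before, separator, after); keep column 0.
by_partition = df["product_name"].str.partition(" - (")[0]

print(by_regex.tolist())
print(by_partition.tolist())
```

Both variants leave a row unchanged if it does not contain the separator at all.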

Answered By: neveratdennys

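pandas’s applymap, which applies a function to every cell of a DataFrame, should do it (in pandas 2.1 and later applymap is deprecated in favor of DataFrame.map, which is used the same way):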

import pandas as pd

def shorten(s):
    return s.split(' - (')[0]
    
df = pd.DataFrame(['abc - (123)', 'def - (456)'])
print(df)
df = df.applymap(shorten)
print(df)

Output:

             0
0  abc - (123)
1  def - (456)
     0
0  abc
1  def


If you only want to modify a specific column, e.g. "product_name", use apply on that column instead:

import pandas as pd

def shorten(s):
    return s.split(' - (')[0]
    
df = pd.DataFrame([['abc - (123)'], ['def - (456)']], columns = ['product_name'])
print(df)
df['product_name'] = df['product_name'].apply(shorten)
print(df)
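
One caveat with the apply approach: if the column contains missing values, s.split fails because a missing value is not a string. A hedged sketch of a guarded version follows; the isinstance check and the sample rows (including the row without a separator) are my own assumptions, not part of the question's data.

```python
import pandas as pd

def shorten(s):
    # Leave non-string values (e.g. missing names) untouched.
    if not isinstance(s, str):
        return s
    # Split at most once and keep the part before " - (".
    return s.split(" - (", 1)[0]

# Hypothetical rows: one normal, one missing, one without the separator.
df = pd.DataFrame({"product_name": ["abc - (123)", None, "no separator"]})
df["product_name"] = df["product_name"].apply(shorten)
print(df)
```

Rows without the separator come back unchanged, since split returns the whole string as its first element.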

Answered By: Tanner Firl