How to get a specific part of the string in a pandas column?

Question:

What I want to do is delete certain parts of a string, take the rest, and insert it into a new column.

Example:

df = pd.read_excel("sdAll.xlsx")
print(df)

output =

0      asin="ASF23KJSA"
1      asin="SAFSAF3324S"
2      asin="ASFAS213434"
3      asin="1SF23AF2342S"
4      asin="ASF23KJSA"
             ...
424    asin="ASF23KJSA"
425    asin="1SF23AF2342S"
426    asin="ASF23KJSA"
427    asin="BSAFSAF3324S"
428    asin="B095437HDM"

I want to delete the asin="" part and insert the remaining part into another column.

df.head()

 Timeframe Ad Type Start Date   End Date                           Portfolio name Currency  ...    Spend 14 Day Total Sales Total Advertising Cost of Sales (ACOS)  Total Return on Advertising Spend (ROAS)  14 Day Total Orders (#)  14 Day Total Units (#)
0      L30D      SD 2022-11-08 2022-11-08                                        -      USD  ...  0.00000                  0                                    NaN                                       NaN                        0                       0
1      L30D      SD 2022-11-11 2022-12-03                                        -      USD  ...  0.00530                  0                                    NaN                                       0.0                        0                       0
2      L30D      SD 2022-11-09 2022-11-22                                        -      USD  ...  0.00000                  0                                    NaN                                       NaN                        0                       0
3      L30D      SD 2022-11-25 2022-12-04                                        -      USD  ...  0.09434                  0                                    NaN                                       0.0                        0                       0
4      L30D      SD 2022-11-09 2022-11-23                                        -      USD  ...  0.00000                  0                                    NaN                                       NaN                        0                       0
Asked By: Shamna Sama

||

Answers:

Why don't you try this:

df.insert_your_col_name.str.split('=').str[-1].str.replace('"', '').str.strip()

This returns the string Series you want. I usually also add a strip at the end for good measure.
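
To see what each step of that chain does, here is a minimal runnable sketch. The column names raw and asin and the sample rows are made up for illustration; the second row has extra whitespace to show why the final strip can help.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's column
df = pd.DataFrame({"raw": ['asin="ASF23KJSA"', ' asin="B095437HDM" ']})

df["asin"] = (
    df["raw"]
    .str.split("=")                     # 'asin="X"' -> ['asin', '"X"']
    .str[-1]                            # keep the last piece: '"X"'
    .str.replace('"', "", regex=False)  # drop the double quotes
    .str.strip()                        # remove stray surrounding whitespace
)

print(df["asin"].tolist())
```

Note that splitting on "=" and taking the last piece assumes the value itself never contains an equals sign.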

You can also try str.extract with the following capture group:

df.your_col.str.extract(r'"(.*)"')
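
One detail worth knowing when assigning the result to a new column: with a single capture group, str.extract returns a DataFrame by default, and passing expand=False returns a Series instead. A small sketch, using made-up column names raw and asin:

```python
import pandas as pd

# Hypothetical sample data; "raw" and "asin" are illustrative names
df = pd.DataFrame({"raw": ['asin="ASF23KJSA"', 'asin="SAFSAF3324S"']})

# Default (expand=True): a DataFrame with one column per capture group
as_frame = df["raw"].str.extract(r'"(.*)"')

# expand=False with a single group: a Series, convenient for assignment
as_series = df["raw"].str.extract(r'"(.*)"', expand=False)

df["asin"] = as_series
print(type(as_frame).__name__, type(as_series).__name__)
print(df["asin"].tolist())
```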
Answered By: INGl0R1AM0R1

You can use str.replace with a regex and a capturing group.

import pandas as pd
df = pd.DataFrame({'old_column' : ['asin="ASF23KJSA"' , 'asin="SAFSAF3324S"', 'asin="ASFAS213434"' , 'asin="1SF23AF2342S"' , 'asin="ASF23KJSA"']})
df['new_column'] = df['old_column'].str.replace(r'asin="(.*)"', r'\1', regex=True)
print(df)

Output:

            old_column    new_column
0     asin="ASF23KJSA"     ASF23KJSA
1   asin="SAFSAF3324S"   SAFSAF3324S
2   asin="ASFAS213434"   ASFAS213434
3  asin="1SF23AF2342S"  1SF23AF2342S
4     asin="ASF23KJSA"     ASF23KJSA

Explanation:

  • ( : opens the capturing group
  • .* : means "0 or more of any character"
  • ) : closes the capturing group
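
The same idea can be tried outside pandas with Python's re module. In this sketch, group 1 is whatever the parentheses captured, and the backreference \1 in the replacement string stands for that captured text. The sample string is taken from the question; the variable names are illustrative.

```python
import re

pattern = r'asin="(.*)"'
text = 'asin="ASF23KJSA"'

m = re.fullmatch(pattern, text)
print(m.group(0))  # the whole match, including asin= and the quotes
print(m.group(1))  # only the text captured by ( ... )

# Replacing the whole match with \1 keeps just the captured part
result = re.sub(pattern, r"\1", text)
print(result)
```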

Answered By: I'mahdi

Replace the asin= part with an empty string, strip leading and trailing whitespace and the surrounding double quotes, and write the result to a new column.

df["new_column_name"] = df["asin_column_name"].str.replace("asin=", "", regex=False).str.strip().str.strip('"')
Answered By: Gandhi

You can use pandas.Series.str.extract:

df["new_col"] = df["original_col"].str.extract('"([A-Z0-9]+)"', expand=False) #or pat = '"(.+)"'

# Output :

print(df)
            original_col       new_col
0       asin="ASF23KJSA"     ASF23KJSA
1     asin="SAFSAF3324S"   SAFSAF3324S
2     asin="ASFAS213434"   ASFAS213434
3    asin="1SF23AF2342S"  1SF23AF2342S
4       asin="ASF23KJSA"     ASF23KJSA
424     asin="ASF23KJSA"     ASF23KJSA
425  asin="1SF23AF2342S"  1SF23AF2342S
426     asin="ASF23KJSA"     ASF23KJSA
427  asin="BSAFSAF3324S"  BSAFSAF3324S
428    asin="B095437HDM"    B095437HDM
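
If some rows do not match the pattern (or are missing), str.extract gives NaN for them rather than raising an error, so it can be worth checking for unmatched rows afterwards. A minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical values: one match, one non-matching string, one missing value
s = pd.Series(['asin="B095437HDM"', "no asin here", None])

out = s.str.extract(r'"([A-Z0-9]+)"', expand=False)

# Rows where nothing was extracted show up as missing
print(out.isna().tolist())
```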
Answered By: abokey