pandas/regex: Remove everything from the hyphen or parenthesis onward (inclusive) in each comma-separated string of a pandas DataFrame column

Question:

I have a dataframe with one column that holds multiple strings separated by commas. In each of these strings I want to remove everything after the hyphen (including the hyphen itself). In some cases there is no hyphen but there is an opening parenthesis, and I want to remove everything from that point as well, while keeping the strings that come after the comma. How can I do it? You can see this case in the last row.

import pandas as pd

dd = pd.DataFrame()
dd['sin'] = ['U147(BCM), U35(BCM)','P01-00(ECM), P02-00(ECM)', 'P3-00(ECM), P032-00(ECM)','P034-00(ECM)', 'P23F5(PCM), P04-00(ECM)']

Expected output

dd['sin']
# output 
U147 U35
P01 P02
P3 P032
P034
P23F5 P04

I want to keep only the part of each string that comes before the hyphen, parenthesis, or any other special character.

Asked By: Sushil Kokil


Answers:

The following code seems to reproduce your desired result:

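# Split each row on ", " and give every piece its own row
dd['sin'] = dd['sin'].str.split(", ")
dd = dd.explode('sin').reset_index()
# \W matches the first non-word character; drop it and everything after it
dd['sin'] = dd['sin'].str.replace(r'\W.*', '', regex=True)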

Which gives dd['sin'] as:

0     U147
1      U35
2      P01
3      P02
4       P3
5     P032
6     P034
7    P23F5
8      P04
Name: sin, dtype: object

The .reset_index() call in the second line is optional. With it, the rows are renumbered 0 to 8 and the original row labels are moved into an 'index' column; without it, each piece keeps the label of the row it came from as its index.
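
If you want one space-separated string per original row, as in the expected output of the question, the exploded pieces can be joined back together. The following is only a sketch building on the approach above; the names parts and codes are illustrative, and it relies on explode() keeping the original row labels so that groupby(level=0) can regroup them.

```python
import pandas as pd

dd = pd.DataFrame({'sin': ['U147(BCM), U35(BCM)', 'P01-00(ECM), P02-00(ECM)',
                           'P3-00(ECM), P032-00(ECM)', 'P034-00(ECM)',
                           'P23F5(PCM), P04-00(ECM)']})

# One piece per row; the index still carries the original row label
parts = dd['sin'].str.split(', ').explode()

# Drop the first non-word character and everything after it
codes = parts.str.replace(r'\W.*', '', regex=True)

# Regroup the pieces by their original row label and join them with spaces
dd['sin'] = codes.groupby(level=0).agg(' '.join)
print(dd)
```
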

Answered By: Frodnar

You can use the following regex:

r"-\d{2}|\([EBP]CM\)|\s"

Here is the code:

import pandas as pd

sin = ['U147(BCM), U35(BCM)','P01-00(ECM), P02-00(ECM)', 'P3-00(ECM), P032-00(ECM)','P034-00(ECM)', 'P23F5(PCM), P04-00(ECM)']

dd = pd.DataFrame()
dd['sin'] = sin
# Remove "-NN" suffixes, "(ECM)"/"(BCM)"/"(PCM)" tags, and whitespace
dd['sin'] = dd['sin'].str.replace(r'-\d{2}|\([EBP]CM\)|\s', '', regex=True)
print(dd)

OUTPUT:

         sin
0   U147,U35
1    P01,P02
2    P3,P032
3       P034
4  P23F5,P04

EDIT

Or use this line to remove the comma:

dd['sin'] = dd['sin'].str.replace(r'-\d{2}|\([EBP]CM\)|\s', '', regex=True).str.replace(',',' ')

OUTPUT:

         sin
0   U147 U35
1    P01 P02
2    P3 P032
3       P034
4  P23F5 P04
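
The pattern above is tied to the exact suffixes in the sample data (two digits after the hyphen and the ECM/BCM/PCM tags). If the suffixes vary, a more general pattern might work. This is a sketch that is not part of the original answer, and it assumes the parts to keep contain only word characters: it removes everything from the first special character up to the next comma, then turns the commas into spaces.

```python
import pandas as pd

dd = pd.DataFrame({'sin': ['U147(BCM), U35(BCM)', 'P01-00(ECM), P02-00(ECM)',
                           'P3-00(ECM), P032-00(ECM)', 'P034-00(ECM)',
                           'P23F5(PCM), P04-00(ECM)']})

# From the first character that is not a word character, comma, or
# whitespace, drop everything up to (but not including) the next comma
cleaned = dd['sin'].str.replace(r'[^\w,\s][^,]*', '', regex=True)

# Replace each comma and any whitespace after it with a single space
dd['sin'] = cleaned.str.replace(r',\s*', ' ', regex=True)
print(dd)
```
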
Answered By: ScottC