pandas – how can I remove some characters after finding a specific character

Question

I have a data frame like this:

document_group
A12J3/381
A02J3/40
B12P4/2536
C10P234/3569

and I would like to get this:

 document_group
    A12J3/38
    A02J3/40
    B12P4/25
    C10P234/35

I have tried to adapt a function that works on a single string, like this:

def remove_str_start(s, start):
    return s[:start] + s[start]

and it works with this sample:

s='H02J3/381'
s.find('/')
remove_str_start(s,s.find('/')+2)

This returns ‘H02J3/38’, which is what I want. I would like to do the same where s is the data frame column and start is the position at which to cut the string.

But when I tried it with the data frame:

remove_str_start(df['document_group'],df['document_group'].str.find('/')+2)

the result is an error.

Could anyone help me with this kind of situation?

Asked By: Hook Im


Answer 1

We can use str.replace here:

df["document_group"] = df["document_group"].str.replace(r'/(\d{2})\d+$', r'/\1', regex=True)

Here is a Python regex demo showing that the replacement logic is working.
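
For anyone who wants to try the regex approach locally, here is a minimal sketch using the sample values from the question. It assumes the part after the slash consists of digits only; the pattern keeps the slash and the first two digits and drops any further digits at the end of the string.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"document_group": ["A12J3/381", "A02J3/40", "B12P4/2536", "C10P234/3569"]}
)

# Match "/", capture the next two digits, and match any remaining trailing digits.
# The replacement puts back the slash plus the captured two digits.
df["document_group"] = df["document_group"].str.replace(
    r"/(\d{2})\d+$", r"/\1", regex=True
)

print(df["document_group"].tolist())
# ['A12J3/38', 'A02J3/40', 'B12P4/25', 'C10P234/35']
```

Values that already have only two digits after the slash (such as A02J3/40) do not match the pattern and are left unchanged.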

Answer 2

You can also use str.split to remove the unwanted parts and put the pieces back together:

s = df.document_group.str.split('/')
df['document_group'] = s.str[0] + "/" + s.str[1].str[:2]

prints:

  document_group
0       A12J3/38
1       A02J3/40
2       B12P4/25
3     C10P234/35
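
A possible variant of the same idea, as a sketch only: splitting once with n=1 and expand=True gives the prefix and suffix as two separate columns, which can be easier to inspect. The name parts is just an illustrative variable, and the sample data is taken from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"document_group": ["A12J3/381", "A02J3/40", "B12P4/2536", "C10P234/3569"]}
)

# Split only on the first "/" and expand into two columns:
# column 0 holds the prefix, column 1 holds everything after the slash.
parts = df["document_group"].str.split("/", n=1, expand=True)

# Rebuild the value from the prefix, the slash, and the first two suffix characters.
df["document_group"] = parts[0] + "/" + parts[1].str[:2]

print(df["document_group"].tolist())
# ['A12J3/38', 'A02J3/40', 'B12P4/25', 'C10P234/35']
```

This assumes every value contains a "/"; rows without one would most likely come out as missing values and would need separate handling.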

Answered By: sophocles

Answer 3

You are trying too hard, just:

Create the column you want: for each value, take everything up to the position where you find "/" plus 3 (because you want the / and the next 2 characters).

df['new_column'] = [e[:e.find('/') + 3] for e in df['your_initial_column']]
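
One thing to watch with this approach: str.find returns -1 when there is no "/", so the slice would become e[:2] and silently truncate the value. A minimal sketch that guards against this, using the question's sample data (keep_two_after_slash is a hypothetical helper name, not part of pandas):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"document_group": ["A12J3/381", "A02J3/40", "B12P4/2536", "C10P234/3569"]}
)


def keep_two_after_slash(value: str) -> str:
    """Hypothetical helper: keep everything up to "/" plus the next two characters."""
    pos = value.find("/")
    # str.find returns -1 when "/" is absent; leave such values unchanged.
    return value if pos == -1 else value[: pos + 3]


df["new_column"] = [keep_two_after_slash(e) for e in df["document_group"]]

print(df["new_column"].tolist())
# ['A12J3/38', 'A02J3/40', 'B12P4/25', 'C10P234/35']
```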

Regards,
