Regex string parsing: pattern starts with ; but can end with [;,)%&@]


I am attempting to parse strings using Regex. The strings look like:


I want to parse it to:


So whenever we see a ;, remove it and everything after it up to (but not including) one of the following characters: [;,)%&@] (that is, replace that span with the empty string "").

I am using the re package in Python:

string = re.sub('^[^-].*[)/]$', '', string)

This is what I have right now:


As I understand it, this says: starting at a ;, match everything between that ; and one of the [;,)%&@] characters.

But the result is wrong and looks like:


Demo here.

What am I missing?

EDIT: @InSync pointed out that there is a discrepancy because ; is one of the end characters as well. As worded above, the result should be Stack&verflow%s;best! but what I actually want to see is Stack&verflow%sbest!. Perhaps two regex passes are appropriate here, I am not sure; if you can get to Stack&verflow%s;best! then the rest is just a simple replacement of all the remaining ;.

EDIT2: The code I found that works was

import re

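def remove_semicolons(name):
    # Pass 1: drop each ';' and the text after it, up to (not including) the next terminator.
    name = re.sub(r';.*?(?=[;,)%&@])', '', name)
    # Pass 2: drop any ';' that is left over, e.g. one with no terminator after it.
    name = re.sub(';', '', name)
    return name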
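
As a quick check of the two-pass idea, here is a minimal sketch that prints the intermediate result after each pass on the sample string from this post. The variable names are only illustrative.

```python
import re

sample = 'Stack;O&verflow;i%s;the;best!'

# Pass 1: remove each ';' and what follows it, stopping before the next terminator.
after_pass_1 = re.sub(r';.*?(?=[;,)%&@])', '', sample)

# Pass 2: remove the semicolons that pass 1 could not match.
after_pass_2 = after_pass_1.replace(';', '')

print(after_pass_1)  # Stack&verflow%s;best!
print(after_pass_2)  # Stack&verflow%sbest!
```

The last ; survives pass 1 because none of the terminator characters appears after it, which is why the second pass is needed.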


Or if you feel like causing a headache to the next programmer who looks at your code:

import re

semicolon_string = 'Stack;O&verflow;i%s;the;best!'

cleaned_string = re.sub(';','',re.sub(';.*?(?=[;,)%&@])', '', semicolon_string))
Asked By: Pleasant Gopher



Alright, in my answer I assume you have a typo in your expected output. The rule is to remove everything starting at a ; up to (but not including) the next of [;,)%&@], and so

Stack ;O &verflow ;i %s ;the ;best!

would become


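For the regex, you want to start with ; followed by any character zero or more times, .* (change it to .+ if you require at least one character), followed by your ending characters [;,)%&@]. To stop at the first ending character, make the quantifier lazy by adding ?. To exclude the ending character from the match, wrap it in a positive lookahead, (?=[;,)%&@]). As the name suggests, this looks ahead one character and checks that it is one of your ending characters without consuming it.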
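
To see why both the lazy quantifier and the lookahead matter, here is a small sketch that compares three variants on the sample string above. The variable names are only for illustration.

```python
import re

sample = 'Stack ;O &verflow ;i %s ;the ;best!'

# Lazy match plus lookahead: the terminator is checked but not removed.
lazy_lookahead = re.sub(r';.*?(?=[;,)%&@])', '', sample)

# No lookahead: the terminator is part of the match, so it is removed too.
consuming = re.sub(r';.*?[;,)%&@]', '', sample)

# Greedy .* with lookahead: the match runs to the last terminator it can reach.
greedy = re.sub(r';.*(?=[;,)%&@])', '', sample)

print(repr(lazy_lookahead))  # 'Stack &verflow %s ;best!'
print(repr(consuming))       # 'Stack verflow s best!'
print(repr(greedy))          # 'Stack ;best!'
```

Note that the final ; is kept by the lazy lookahead variant, because no ending character follows it.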

For a final regex:

;.*?(?=[;,)%&@])


or if you require characters in between:

;.+?(?=[;,)%&@])
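
The two forms only differ when a ; is immediately followed by an ending character. A minimal sketch of that edge case, using a made-up input string:

```python
import re

edge_case = 'a;,b'  # hypothetical input: ';' directly followed by a terminator

# .*? allows zero characters between ';' and the terminator, so the ';' is removed.
with_star = re.sub(r';.*?(?=[;,)%&@])', '', edge_case)

# .+? needs at least one character in between, so nothing matches here.
with_plus = re.sub(r';.+?(?=[;,)%&@])', '', edge_case)

print(with_star)  # a,b
print(with_plus)  # a;,b
```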

Answered By: OmO Walker