Find character selection after specified delimiter

Question:

I’m trying to use a regular expression to find EVERYTHING specified within my character set after a delimiter (which is a colon).

Example:

Test3131:PythonBoolJava!Python
Overflow:PythonBoolFAKE!Python@021!
Overflo!w2:PythonBoolUnix-Python;?
Over3_flow:PythonBoolUnix^Python%

Desired output:

Test3131:PythonBoolJavaPython
Overflow:PythonBoolFAKEPython021
Overflo!w2:PythonBoolUnixPython
Over3_flow:PythonBoolUnixPython

So:
Ignore all data before and including the delimiter :
Search for all characters, regardless of line position, using the regex [$&+,:;=?@#|'<>.^*()%!-]

Once they are matched, I would mark them in my dataset manually.

What I have tried:

.*:.*[$&+,:;=?@#|'<>.^*()%!-]

However, this was to no avail.

Asked By: StackStackAndStack


Answers:

If supported, for example by the regex module from PyPI for Python, maybe:

(?::|\G(?!^)).*?\K[!#-.:-@^|]+

Notice how I condensed your character list using the ASCII table to [!#-.:-@^|]. It still captures all the characters you have given.
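The claim that the condensed class covers the same characters can be checked mechanically. The following is a minimal sketch using only the standard library re module; the variable names are illustrative and not part of the original answer.

```python
import re

# Character class exactly as written in the question.
original = re.compile(r"[$&+,:;=?@#|'<>.^*()%!-]")
# Condensed class from the answer, built from the ASCII ranges #-. and :-@.
condensed = re.compile(r"[!#-.:-@^|]")

# Collect every ASCII character that each class accepts.
ascii_chars = [chr(code) for code in range(128)]
matched_by_original = {c for c in ascii_chars if original.fullmatch(c)}
matched_by_condensed = {c for c in ascii_chars if condensed.fullmatch(c)}

# Both sets should be identical if the condensation lost or added nothing.
print(matched_by_original == matched_by_condensed)
print(sorted(matched_by_condensed))
```

The comparison is limited to ASCII, which is enough here because both classes contain only ASCII characters and ranges.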

  • (?: – Open non-capture group;
    • : – Match the first colon;
    • | – Or;
    • \G(?!^) – Assert position at end of previous match but exclude start-line;
    • ) – Close non-capture group;
  • .*?\K – 0+ (lazy) characters, up to the point where we reset the starting point of the reported match;
  • [!#-.:-@^|]+ – Any 1+ of given characters.

Another option, where supported (for example in JavaScript or in Python’s regex module from PyPI), is a variable-length lookbehind:

(?<=^[^:]*:.*?)[!#-.:-@^|]+


  • (?<=^[^:]*:.*?) – Positive lookbehind to check if there is a colon after start-line anchor and 0+ non-colon characters and any 0+ (lazy) characters right after that;
  • [!#-.:-@^|]+ – Any 1+ of given characters.
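Since the goal in the question is to locate the characters and mark them manually, the condition the lookbehind expresses (a colon occurs somewhere earlier on the line) can also be checked outside the pattern. Below is a rough sketch with the standard library re module, which does not accept a variable-length lookbehind; it swaps the lookbehind for a scan that starts just past the first colon. The helper name find_after_colon is hypothetical.

```python
import re

SPECIALS = re.compile(r"[!#-.:-@^|]+")

def find_after_colon(line):
    """Return (start, end, text) for each run of special characters after the first colon."""
    colon = line.find(":")
    if colon == -1:
        return []  # no delimiter on this line, so nothing qualifies
    # Begin scanning one position past the colon so the delimiter itself is skipped.
    return [(m.start(), m.end(), m.group()) for m in SPECIALS.finditer(line, colon + 1)]

for line in ["Test3131:PythonBoolJava!Python", "Overflo!w2:PythonBoolUnix-Python;?"]:
    print(line, find_after_colon(line))
```

The offsets refer to positions in the full line, so they can be used directly to highlight the matches; the ! before the colon in the second line is not reported.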

Code sample for Python:

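import regex as re  # third-party module from PyPI, not the standard library re

l_in = ["Test3131:PythonBoolJava!Python", "Overflow:PythonBoolFAKE!Python@021!", "Overflo!w2:PythonBoolUnix-Python;?", "Over3_flow:PythonBoolUnix^Python%"]
# Variant 1: \G continues from the previous match, \K resets the reported match start
l_out1 = [re.sub(r"(?::|\G(?!^)).*?\K[!#-.:-@^|]+", '', el) for el in l_in]
# Variant 2: variable-length lookbehind requiring a colon earlier on the line
l_out2 = [re.sub(r"(?<=^[^:]*:.*?)[!#-.:-@^|]+", '', el) for el in l_in]

print(l_out1, l_out2)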

Prints:

['Test3131:PythonBoolJavaPython',
 'Overflow:PythonBoolFAKEPython021',
 'Overflo!w2:PythonBoolUnixPython',
 'Over3_flow:PythonBoolUnixPython']
['Test3131:PythonBoolJavaPython',
 'Overflow:PythonBoolFAKEPython021',
 'Overflo!w2:PythonBoolUnixPython',
 'Over3_flow:PythonBoolUnixPython']
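
If installing the third-party regex module is not an option, the same result can be approximated with the standard library alone. This is a sketch under that assumption, and it replaces the \G/\K and lookbehind techniques with a plain split on the first colon; the helper name strip_after_colon is hypothetical.

```python
import re

SPECIALS = re.compile(r"[!#-.:-@^|]+")

def strip_after_colon(line):
    """Remove the listed special characters, but only after the first colon."""
    # partition splits on the first colon only; without a colon, tail is empty
    # and the line comes back unchanged.
    head, sep, tail = line.partition(":")
    return head + sep + SPECIALS.sub("", tail)

l_in = [
    "Test3131:PythonBoolJava!Python",
    "Overflow:PythonBoolFAKE!Python@021!",
    "Overflo!w2:PythonBoolUnix-Python;?",
    "Over3_flow:PythonBoolUnix^Python%",
]
l_out = [strip_after_colon(el) for el in l_in]
print(l_out)
```

Doing the split in Python keeps the pattern itself trivial, at the cost of no longer being a single regular expression that could be pasted into an editor's search box.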
Answered By: JvdV