Python regex lookbehind to remove _sublabel1 from a string like "__label__label1_sublabel1"

Question:

I have a dataset prepared for training with fastText, and I want to remove the sublabels from it.
For example:

__label__label1_sublabel1 __label__label2_sublabel1 __label__label3 __label__label1_sublabel4 sometext some sentce som data.

Any help is much appreciated.
Thanks.

I tried this:

r'(?<=__label__[^_]+)\w+'

It isn't working.
Exact code:

ptrn = r'(?<=__label__[^_]+)\w+'

re.sub(ptrn, '', test_String)

and this error occurred:

error Traceback (most recent call last)
c:\Users\THoseini\Desktop\projects\ensani_classification\tes4t.ipynb Cell 3 in <cell line: 3>()
1 ptrn = r'(?<=__label__[^_]+)\w+'
----> 3 re.sub(ptrn, '', test_String)

File c:\Users\THoseini\AppData\Local\Programs\Python\Python310\lib\re.py:209, in sub(pattern, repl, string, count, flags)
202 def sub(pattern, repl, string, count=0, flags=0):
203 """Return the string obtained by replacing the leftmost
204 non-overlapping occurrences of the pattern in string by the
205 replacement repl. repl can be either a string or a callable;
206 if a string, backslash escapes in it are processed. If it is
207 a callable, it's passed the Match object and must return
208 a replacement string to be used."""
--> 209 return _compile(pattern, flags).sub(repl, string, count)

File c:\Users\THoseini\AppData\Local\Programs\Python\Python310\lib\re.py:303, in _compile(pattern, flags)
301 if not sre_compile.isstring(pattern):
302 raise TypeError("first argument must be string or compiled pattern")
--> 303 p = sre_compile.compile(pattern, flags)
304 if not (flags & DEBUG):
305 if len(_cache) >= _MAXCACHE:
306 # Drop the oldest item

File c:\Users\THoseini\AppData\Local\Programs\Python\Python310\lib\sre_compile.py:792, in compile(p, flags)
--> 198 raise error("look-behind requires fixed-width pattern")
199 emit(lo) # look behind
200 _compile(code, av[1], flags)

error: look-behind requires fixed-width pattern

Asked By: Taha Hosseiny

||

Answers:

Try this regex:

(__label__[^_\s]+)\w+

And sample code in Python:

import re
test_string = """__label__label1_sublabel1 __label__label2_sublabel1 __label__label3 __label__label1_sublabel4 sometext some sentce som data."""

ptrn = r'(__label__[^_\s]+)\w+'
re.sub(ptrn, r'\1', test_string)

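The re.sub() function (short for "substitute") returns a copy of the string in which every non-overlapping match of the pattern is replaced. Here the replacement \1 is the text captured by the first group, so each match is reduced to its __label__... prefix and the rest is dropped.
[^character_group] is a negated character class: it matches any single character that is not in character_group. \w matches any word character (letters, digits, and the underscore), and \s matches any whitespace character.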
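
For context on the error in the question: Python's built-in re module only accepts look-behind patterns that match a fixed number of characters, so a look-behind containing [^_]+ is rejected when the pattern is compiled. That is why a capture group is used here instead. A minimal sketch of the difference (the helper name compiles is hypothetical, not part of re):

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the standard re module accepts `pattern`."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Variable-width look-behind: [^_]+ can match any number of characters.
variable_width = r'(?<=__label__[^_]+)\w+'
# Fixed-width look-behind: the literal prefix is always 9 characters long.
fixed_width = r'(?<=__label__)\w+'

print(compiles(variable_width))  # False
print(compiles(fixed_width))     # True
```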

The output is:

__label__label1 __label__label2 __label__label __label__label1 sometext some sentce som data.
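
Note that in the output above, __label__label3 has no sublabel and still loses its trailing 3, because \w+ must consume at least one character. One possible variant, offered as a sketch and not as part of the original answer, requires a literal underscore before the sublabel so that labels without one are left alone. It assumes that main label names never contain an underscore themselves; the names SUBLABEL and strip_sublabels are hypothetical.

```python
import re

# Group 1 keeps "__label__<name>"; "_\w+" matches only a real "_sublabel" suffix.
# Assumption: the main label name contains no underscore.
SUBLABEL = re.compile(r'(__label__[^_\s]+)_\w+')

def strip_sublabels(line: str) -> str:
    """Remove the "_sublabel" suffix from every fastText label in `line`."""
    return SUBLABEL.sub(r'\1', line)

sample = ("__label__label1_sublabel1 __label__label2_sublabel1 "
          "__label__label3 __label__label1_sublabel4 "
          "sometext some sentce som data.")

print(strip_sublabels(sample))
# __label__label1 __label__label2 __label__label3 __label__label1 sometext some sentce som data.
```

For a whole training file, the same function could be applied line by line before the data is passed to fastText.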
Answered By: Ria