How to match a string that doesn't start with <del> but ends with ######## with regex

Question

In each row of df['Description'], there is a user field containing 8-digit numbers that I need to grab. However, I do not want to grab the ones with <del'> in front of them. The numbers that should be retrieved are 11111113 and 11111114.
The data looks something like this (without the single quotation marks):

<del'>11111111 Random text here </del'><br>
<br'><del'>11111112 Random text here </del'></br'><br>
<p'>11111113 Random text here </p'><br>
<br'>11111114 Random text here </br'>

I have tried variations of this:

df['SN_Fixed_List']=[re.findall(r'\b(?!<del>)\s*[0-9]{8}\b',x) for x in df['Description']]

Asked By: JarvisButler290

Answer 1

You can use

df['SN_Fixed_List'] = df['Description'].str.extract(r"^(?!.*<del'>).*\b(\d{8})\b", expand=False)

Details:

^ – start of string
(?!.*<del'>) – no <del'> allowed anywhere in the string
.* – any zero or more chars other than line break chars, as many as possible
\b(\d{8})\b – eight digits as a whole word (captured into Group 1, the value of which is output by Series.str.extract).
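
As a quick check of the pattern described above, here is a minimal sketch that applies it row by row with the standard re module instead of pandas. The rows list and the extract_id helper are hypothetical names introduced for illustration. The sample strings are copied from the question with the stray quote kept as shown, so drop the quote from the pattern if the real data uses plain <del>.

```python
import re

# Sample rows copied from the question (the stray quote in the tags is kept as shown).
rows = [
    "<del'>11111111 Random text here </del'><br>",
    "<br'><del'>11111112 Random text here </del'></br'><br>",
    "<p'>11111113 Random text here </p'><br>",
    "<br'>11111114 Random text here </br'>",
]

# Same pattern as in the answer. The string is double-quoted because the
# pattern itself contains a single quote.
PATTERN = re.compile(r"^(?!.*<del'>).*\b(\d{8})\b")


def extract_id(text):
    """Return the captured 8-digit number, or None when the row contains <del'>."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


results = [extract_id(row) for row in rows]
print(results)  # [None, None, '11111113', '11111114']
```

With Series.str.extract(..., expand=False), rows that do not match should come back as NaN rather than None. Because .* is greedy, a row containing several 8-digit numbers would yield the last one.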

Answered By: Wiktor Stribiżew
