How to match a string that doesn't start with <del> but ends with ######## with regex

Question:

In each row of df['Description'], there is a user field containing an 8-digit number that I need to grab, but I do not want the ones with <del> in front of them. The numbers that should be retrieved are 11111113 and 11111114.
The data looks something like this (without the single quotation marks):

<del'>11111111 Random text here </del'><br>
<br'><del'>11111112 Random text here </del'></br'><br>
<p'>11111113 Random text here </p'><br>
<br'>11111114 Random text here </br'>

I have tried variations of this:

df['SN_Fixed_List'] = [re.findall(r'\b(?!<del>)\s*[0-9]{8}\b', x) for x in df['Description']]
Asked By: JarvisButler290


Answers:

You can use

df['SN_Fixed_List'] = df['Description'].str.extract(r'^(?!.*<del>).*\b(\d{8})\b', expand=False)

See the regex demo.

Details:

  • ^ – start of string
  • (?!.*<del>) – a negative lookahead that fails the match if <del> appears anywhere in the string
  • .* – any zero or more chars other than line break chars, as many as possible
  • \b(\d{8})\b – eight digits as a whole word (captured into Group 1, the value of which is output by Series.str.extract).
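
As a rough check of the pattern above, here is a minimal sketch that runs it over the four sample rows from the question. It uses the standard-library re module rather than pandas so it runs on its own. The rows list and the extract_serial helper are hypothetical names introduced only for this illustration, and the sample rows assume the stray single quotes are dropped, as the question says.

```python
import re

# Sample rows from the question, with the single quotes removed
# (assumption: the real data contains <del>, not <del'>).
rows = [
    "<del>11111111 Random text here </del><br>",
    "<br><del>11111112 Random text here </del></br><br>",
    "<p>11111113 Random text here </p><br>",
    "<br>11111114 Random text here </br>",
]

# Same pattern as in the answer: reject any string containing <del>,
# then capture an 8-digit run delimited by word boundaries.
PATTERN = re.compile(r'^(?!.*<del>).*\b(\d{8})\b')

def extract_serial(text):
    """Hypothetical helper: return the captured 8-digit number, or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

results = [extract_serial(row) for row in rows]
print(results)  # [None, None, '11111113', '11111114']
```

Two things follow from how the pattern is written: a row is rejected entirely if <del> appears anywhere in it, even after the number, and because .* is greedy, a row with several 8-digit numbers yields the last one.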
Answered By: Wiktor Stribiżew