How to match a string that doesn't start with <del> but ends with ######## with regex
Question:
In each row of df['Description'], there is a user field containing 8-digit numbers that I need to grab. But I do not want to grab the ones with <del'> in front of them. The numbers that should be retrieved are 11111113 and 11111114.
The data looks something like this (without the single quotation):
<del'>11111111 Random text here </del'><br>
<br'><del'>11111112 Random text here </del'></br'><br>
<p'>11111113 Random text here </p'><br>
<br'>11111114 Random text here </br'>
I have tried variations of this:
df['SN_Fixed_List'] = [re.findall(r'\b(?!<del>)\s*[0-9]{8}\b', x) for x in df['Description']]
Answers:
You can use
df['SN_Fixed_List'] = df['Description'].str.extract(r'^(?!.*<del>).*\b(\d{8})\b', expand=False)
Details:
^ – start of string
(?!.*<del>) – negative lookahead: no <del> allowed anywhere in the string
.* – any zero or more chars other than line break chars, as many as possible
\b(\d{8})\b – eight digits as a whole word (captured into Group 1, the value of which is output with Series.str.extract).
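To check the pattern without a DataFrame, here is a minimal sketch using only the standard re module. It assumes the rows look like the sample data in the question with the stray single quotes removed; the names rows and extract_number are made up for illustration. Series.str.extract applies the pattern to each row in much the same way, returning NaN where this sketch returns None.

```python
import re

# Sample rows from the question, with the stray single quotes removed (assumed shape).
rows = [
    "<del>11111111 Random text here </del>",
    "<br><del>11111112 Random text here </del></br>",
    "<p>11111113 Random text here </p>",
    "<br>11111114 Random text here </br>",
]

# Same pattern as in the answer: reject any row that contains <del>,
# otherwise capture an eight-digit whole word.
pattern = re.compile(r'^(?!.*<del>).*\b(\d{8})\b')

def extract_number(text):
    """Return the captured eight-digit number, or None if the row is rejected."""
    match = pattern.search(text)
    return match.group(1) if match else None

results = [extract_number(row) for row in rows]
print(results)  # [None, None, '11111113', '11111114']
```

Note that the lookahead rejects the whole row as soon as <del> appears anywhere in it, and because .* is greedy, a row holding several eight-digit numbers yields only the last one.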