Python Regular Expression Match All 5 Digit Numbers but None Larger

Question:

I’m attempting to match 5-digit coupon codes spread throughout an HTML web page, for example 53232, 21032, 40021, etc. I can handle the simpler case of any string of 5 digits with [0-9]{5}, though this also matches inside 6, 7, 8… n-digit numbers. Can someone please suggest how I would modify this regular expression to match only 5-digit numbers?

Asked By: Bryce Thomas

Answers:

Without padding the string to handle the special cases of the start and end of the string, as in John La Rooy's answer, one can use negative lookahead and lookbehind to handle both cases with a single regular expression:

>>> import re
>>> s = "88888 999999 3333 aaa 12345 hfsjkq 98765"
>>> re.findall(r"(?<!\d)\d{5}(?!\d)", s)
['88888', '12345', '98765']
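
As a rough sketch of how the lookaround pattern might be applied to page markup, the snippet below wraps it in a helper. The name `find_codes` and the HTML fragment are made up for illustration; they are not from the question. Note that any other five-digit number in the markup (an attribute value, say) would be picked up too.

```python
import re

# Pattern: five digits not preceded and not followed by another digit.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")

def find_codes(text):
    """Return every run of exactly five digits found in text (hypothetical helper)."""
    return FIVE_DIGITS.findall(text)

# Made-up HTML fragment, for illustration only.
html = "<td>53232</td><td>210321</td><p>Use 40021,12345 today</p>"
print(find_codes(html))  # ['53232', '40021', '12345']
```

The six-digit 210321 is skipped, and the comma-separated pair is found because lookarounds do not consume the neighbouring characters.
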
Answered By: Xavier Combelle

full string: ^[0-9]{5}$

within a string: [^0-9][0-9]{5}[^0-9]
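
A small sketch of how these two patterns behave in Python, assuming Python 3.4 or later for `re.fullmatch`. The helper name `is_five_digit_code` is hypothetical.

```python
import re

def is_five_digit_code(candidate):
    """True only if the whole string is exactly five ASCII digits."""
    return re.fullmatch(r"[0-9]{5}", candidate) is not None

print(is_five_digit_code("12345"))    # True
print(is_five_digit_code("123456"))   # False
print(is_five_digit_code("12345\n"))  # False

# ^...$ with re.match tolerates one trailing newline, because $ also
# matches just before a newline at the end of the string.
anchored = re.match(r"^[0-9]{5}$", "12345\n") is not None
print(anchored)  # True

# The "within a string" pattern returns the neighbouring characters too and
# needs a character on both sides, so a code at the very end is skipped.
within = re.findall(r"[^0-9][0-9]{5}[^0-9]", "a12345b 67890")
print(within)  # ['a12345b']
```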

Answered By: jdonley

A very simple way would be to match all groups of digits, like with r'\d+', and then skip every match that isn’t five characters long when you process the results.
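
The match-then-filter idea could look something like this. The function name `five_digit_runs` is an assumption for the example, and the sample string is borrowed from another answer on this page.

```python
import re

def five_digit_runs(text):
    """Collect every maximal run of digits, then keep only those of length 5."""
    return [run for run in re.findall(r"\d+", text) if len(run) == 5]

sample = "88888 999999 3333 aaa 12345 hfsjkq 98765"
print(five_digit_runs(sample))  # ['88888', '12345', '98765']
```

Because `\d+` is greedy, each match is a whole digit run, so a five-digit slice of a longer number never slips through.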

Answered By: sth

You probably want to match a non-digit before and after your string of 5 digits, like [^0-9]([0-9]{5})[^0-9]. Then you can capture the inner group (the actual string you want).

Answered By: Bob

>>> import re
>>> s = "four digits 1234 five digits 56789 six digits 012345"
>>> re.findall(r"\D(\d{5})\D", s)
['56789']

If they can occur at the very beginning or the very end, it’s easier to pad the string than to mess with special cases:

>>> re.findall(r"\D(\d{5})\D", " " + s + " ")
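
To make the effect of the padding visible, here is a minimal sketch with a string whose codes sit at both edges. The helper name `find_padded` and the test string are invented for the example.

```python
import re

PATTERN = re.compile(r"\D(\d{5})\D")

def find_padded(text):
    """Pad with one space per side so codes at the edges get a non-digit neighbour."""
    return PATTERN.findall(" " + text + " ")

edge = "12345 abc 67890"
print(PATTERN.findall(edge))  # []
print(find_padded(edge))      # ['12345', '67890']
```
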
Answered By: John La Rooy

You could try

\D\d{5}\D

or maybe

\b\d{5}\b

I’m not sure how Python treats line endings and whitespace there, though.

I believe ^\d{5}$ would not work for you, as you likely want to get numbers that are somewhere within other text.
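
On the line-ending question, a quick experiment suggests that `\b` treats newlines, tabs and carriage returns like any other non-word character. The sample text below is made up for the check.

```python
import re

text = "12345\n67890\tabc 1234 123456\r\n24680"

bounded = re.findall(r"\b\d{5}\b", text)
print(bounded)  # ['12345', '67890', '24680']

# The fully anchored pattern finds nothing here, since the text as a
# whole is not just five digits.
anchored = re.search(r"^\d{5}$", text)
print(anchored)  # None
```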

Answered By: Zaki

Note: There is a problem with using \D: it matches, and consumes, any character that is not a digit. Use \b instead.
\b is important here because it matches at a word boundary, that is, only at the beginning or end of a word.

import re

input = "four digits 1234 five digits 56789 six digits 01234,56789,01234"

re.findall(r"\b\d{5}\b", input)

result: ['56789', '01234', '56789', '01234']

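But if one uses
re.findall(r"\D(\d{5})\D", input)
output: ['56789', '01234']
\D is unable to handle comma-separated or otherwise consecutively entered numbers, because the non-digit it consumes after one number cannot also be matched before the next.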

\b is the important part here: it matches the empty string, but only at the beginning or end of a word.
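
One caveat worth checking for your own data: letters and the underscore count as word characters, so `\b` does not fire between them and a digit. The sketch below compares `\b` with digit-only lookarounds on a made-up string; which behaviour is right depends on whether codes glued to letters should count.

```python
import re

text = "ref12345 id_67890 #24680 13579px"

with_boundary = re.findall(r"\b\d{5}\b", text)
print(with_boundary)  # ['24680']

with_lookaround = re.findall(r"(?<!\d)\d{5}(?!\d)", text)
print(with_lookaround)  # ['12345', '67890', '24680', '13579']
```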

More documentation: https://docs.python.org/2/library/re.html

More clarification on the usage of \D vs \b:

This example uses \D, but it doesn’t capture all the five-digit numbers.

This example uses \b and captures all the five-digit numbers.

Cheers

Answered By: igauravsehrawat

I use a regex with a simpler expression:

re.findall(r"\d{5}", mystring)

It searches for 5 consecutive digits. But you have to be sure the string does not contain other sequences of 5 digits, for example inside longer numbers.
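
A short illustration of that limitation, using an invented input string: the bare pattern also slices five-digit chunks out of longer numbers.

```python
import re

matches = re.findall(r"\d{5}", "code 12345 and 1234567890")
print(matches)  # ['12345', '12345', '67890']
```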

Answered By: Audrey M