How do you find all instances of ISBN number using Python Regex

Question:

I would really appreciate some assistance…

I’m trying to retrieve an ISBN number (13 digits) from pages, but the number is set in so many different formats that I can’t retrieve all the different instances:

ISBN-13: 978 1 4310 0862 9
ISBN: 9781431008629
ISBN9781431008629
ISBN 9-78-1431-008-629
ISBN: 9781431008629 more text of the number
isbn : 9781431008629 

My output should be: ISBN: 9781431008629

myISBN = re.findall("ISBN" + r'[\w\W]{1,17}',text)
myISBN = myISBN[0]
print (myISBN)

I appreciate your time

Asked By: user1835437

||

Answers:

You can use

(?i)ISBN(?:-13)?\D*(\d(?:\W*\d){12})

See the regex demo. Then remove all non-digit characters from the Group 1 value.

Regex details:

  • (?i) – case insensitive modifier, same as re.I
  • ISBN – an ISBN string
  • (?:-13)? – an optional -13 string
  • \D* – zero or more non-digits
  • (\d(?:\W*\d){12}) – Group 1: a digit and then twelve occurrences of zero or more non-word chars followed by a digit.
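
The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch of the same expression; the name ISBN13_RX is made up for illustration.

```python
import re

# Same pattern as above, spelled out with inline comments (re.X == re.VERBOSE).
# ISBN13_RX is a hypothetical name chosen for this sketch.
ISBN13_RX = re.compile(r"""
    ISBN              # the literal prefix (case-insensitive via re.I)
    (?:-13)?          # an optional -13 suffix
    \D*               # zero or more non-digits (spaces, colon, ...)
    (                 # Group 1 starts here
        \d            # the first digit
        (?:\W*\d){12} # twelve more digits, each after optional non-word chars
    )
""", re.I | re.X)

m = ISBN13_RX.search("ISBN 9-78-1431-008-629")
if m:
    print(m.group(1))  # 9-78-1431-008-629
```

Group 1 still contains the separators at this point, so the non-digits have to be stripped afterwards.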

See the Python demo:

import re
texts = ['ISBN-13: 978 1 4310 0862 9',
    'ISBN: 9781431008629',
    'ISBN9781431008629',
    'ISBN 9-78-1431-008-629',
    'ISBN: 9781431008629 more text of the number',
    'isbn : 9781431008629']
rx = re.compile(r'ISBN(?:-13)?\D*(\d(?:\W*\d){12})', re.I)
for text in texts:
    m = rx.search(text)
    if m:
        print(text, '=> ISBN:', ''.join([d for d in m.group(1) if d.isdigit()]))

Output:

ISBN-13: 978 1 4310 0862 9 => ISBN: 9781431008629
ISBN: 9781431008629 => ISBN: 9781431008629
ISBN9781431008629 => ISBN: 9781431008629
ISBN 9-78-1431-008-629 => ISBN: 9781431008629
ISBN: 9781431008629 more text of the number => ISBN: 9781431008629
isbn : 9781431008629 => ISBN: 9781431008629
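
Since the question asks for all instances on a page, the same pattern could presumably be run over the whole page text with finditer instead of calling search on one string at a time. A minimal sketch, assuming the page is already available as a single string; find_isbn13s and page are hypothetical names.

```python
import re

ISBN_RX = re.compile(r"ISBN(?:-13)?\D*(\d(?:\W*\d){12})", re.I)

def find_isbn13s(page_text):
    """Return every ISBN-13 candidate in page_text as a bare 13-digit string."""
    # Strip everything that is not a digit from each Group 1 match.
    return [re.sub(r"\D", "", m.group(1)) for m in ISBN_RX.finditer(page_text)]

# Hypothetical page content, reusing formats from the question.
page = """ISBN-13: 978 1 4310 0862 9
ISBN 9-78-1431-008-629
isbn : 9781431008629"""

for isbn in find_isbn13s(page):
    print("ISBN:", isbn)
```

Note that \D* also matches newlines and ordinary text, so on a real page an "ISBN" label may pair up with digits that appear much later; restricting the separator (for example to [\s:]*) may be safer if that becomes a problem.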

I’d split the problem into two steps: first extract the potential ISBN, then check that it is well-formed (13 digits):

import re

text = """
ISBN-13: 978 1 4310 0862 9
ISBN: 9781431008629
ISBN9781431008629
ISBN 9-78-1431-008-629
ISBN: 9781431008629 more text of the number
isbn : 9781431008629"""

pat1 = re.compile(r"(?i)ISBN(?:-13)?\s*:?([ \d-]+)")
pat2 = re.compile(r"\d+")

for m in pat1.findall(text):
    numbers = "".join(pat2.findall(m))
    if len(numbers) == 13:
        print("ISBN:", numbers)

Prints:

ISBN: 9781431008629
ISBN: 9781431008629
ISBN: 9781431008629
ISBN: 9781431008629
ISBN: 9781431008629
ISBN: 9781431008629
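
If the length check in the second step is not strict enough, it could be extended with the standard ISBN-13 check digit rule: weight the digits alternately by 1 and 3, and the sum must be divisible by 10. This is an optional sketch and the helper name is_valid_isbn13 is hypothetical.

```python
import re

def is_valid_isbn13(digits):
    """Return True if digits is 13 ASCII digits with a correct ISBN-13 check digit."""
    if not re.fullmatch(r"[0-9]{13}", digits):
        return False
    # Weights alternate 1, 3, 1, 3, ... starting from the first digit.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(digits))
    return total % 10 == 0

print(is_valid_isbn13("9781431008629"))  # True
print(is_valid_isbn13("9781431008620"))  # False (last digit changed)
```

This would reject 13-digit runs that merely follow the word "ISBN" but are not real ISBNs.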
Answered By: Andrej Kesely