Regex: AttributeError: 'NoneType' object has no attribute 'groups'

Question:

I have a string which I want to extract a subset of. This is part of a larger Python script.

This is the string:

import re

htmlString = '</dd><dt> Fine, thank you.&#160;</dt><dd> Molt bé, gràcies. (<i>mohl behh, GRAH-syuhs</i>)'

From it I want to pull out “Molt bé, gràcies. mohl behh, GRAH-syuhs”. For that I use a regular expression with re.search:

SearchStr = r'(</dd><dt>)+ ([\w+,.\s]+)([&#\d;]+)(</dt><dd>)+ ([\w,\s\w\s\w?!.]+) (\(<i>)([\w\s,-]+)(</i>\))'

Result = re.search(SearchStr, htmlString)

print Result.groups()
AttributeError: 'NoneType' object has no attribute 'groups'

Since Result.groups() doesn’t work, neither do the extractions I want to make (i.e. Result.group(5) and Result.group(7)).
But I don’t understand why I get this error. The regular expression works in TextWrangler, so why not in Python? I’m a beginner in Python.

Asked By: jO.


Answers:

You are getting AttributeError because you’re calling groups on None, which has no such method.

re.search returning None means the regex couldn’t find anything matching the pattern in the supplied string.

When using regex, it is good practice to check whether a match was made before using it:

Result = re.search(SearchStr, htmlString)

if Result:
    print Result.groups()
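
The same check can be wrapped in a small helper. This is only a sketch in Python 3 syntax; the name extract_groups and the sample inputs are hypothetical and not part of the original script:

```python
import re

def extract_groups(pattern, text):
    """Return the captured groups as a tuple, or None when nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        # re.search signals "no match" with None, so .groups() must not be called on it.
        return None
    return match.groups()

# Hypothetical inputs, chosen only to show both outcomes.
print(extract_groups(r'(\w+), (\w+)', 'Fine, thanks'))  # ('Fine', 'thanks')
print(extract_groups(r'(\d+)', 'no digits here'))       # None
```

Comparing with `is None` makes the intent explicit. A plain `if match:` behaves the same way, because a match object is always truthy.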

Answered By: thkang

import re

htmlString = '</dd><dt> Fine, thank you.&#160;</dt><dd> Molt bé, gràcies. (<i>mohl behh, GRAH-syuhs</i>)'

SearchStr = r'(</dd><dt>)+ ([\w+,.\s]+)([&#\d;]+)(</dt><dd>)+ ([\w,\s\w\s\w?!.]+) (\(<i>)([\w\s,-]+)(</i>\))'

Result = re.search(SearchStr.decode('utf-8'), htmlString.decode('utf-8'), re.I | re.U)

print Result.groups()

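It works that way. The string contains non-ASCII characters (é, à), and in Python 2 \w does not match them in a plain byte string, so the search fails. You’ve got to decode both the pattern and the string into Unicode and use the re.U (Unicode) flag.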
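
For comparison, here is a rough Python 3 sketch of the same effect. It rests on two assumptions: the backslashes in the pattern are reconstructed as shown, and the re.ASCII flag is used only to approximate how Python 2 treats \w on byte strings. The names html_string and pattern are illustrative.

```python
import re

html_string = ('</dd><dt> Fine, thank you.&#160;</dt><dd> '
               'Molt bé, gràcies. (<i>mohl behh, GRAH-syuhs</i>)')

# Assumed reconstruction of the pattern, with the literal parentheses escaped.
pattern = (r'(</dd><dt>)+ ([\w+,.\s]+)([&#\d;]+)(</dt><dd>)+ '
           r'([\w,\s\w\s\w?!.]+) (\(<i>)([\w\s,-]+)(</i>\))')

# With re.ASCII, \w only covers [a-zA-Z0-9_], so "é" and "à" stop the match.
ascii_result = re.search(pattern, html_string, re.ASCII)
print(ascii_result)  # None

# Python 3 str patterns are Unicode-aware by default, so \w also matches "é" and "à".
unicode_result = re.search(pattern, html_string)
if unicode_result is not None:
    print(unicode_result.group(5))  # Molt bé, gràcies.
    print(unicode_result.group(7))  # mohl behh, GRAH-syuhs
```

In Python 3 no decode call is needed, because str is already Unicode. The explicit decode plus re.U in the answer above is what Python 2 requires to get the same behaviour.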

I’m a beginner too and I faced that issue a couple of times myself.

Answered By: antonavy