Python web-scraping error – TypeError: can't use a string pattern on a bytes-like object

Question:

I'm learning Python and want to build a web scraper. This is a very basic first attempt:

Python Code

import urllib.request
import re

htmlfile = urllib.request.urlopen("http://basketball.realgm.com/")

htmltext = htmlfile.read()
title = re.findall('<title>(.*)</title>', htmltext)

print (htmltext)

Error:

  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
Asked By: Jtwa


Answers:

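urlopen(...).read() returns bytes, not str, so the pattern and the data must be of the same type. Either use a bytes literal as the pattern: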

title = re.findall(b'<title>(.*)</title>', htmltext)

or decode the retrieved data to a string:

title = re.findall('<title>(.*)</title>', htmltext.decode('utf-8'))

(replace utf-8 with the actual encoding of the document)
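
To compare the two options side by side without hitting the network, here is a minimal sketch. The hard-coded bytes value is an assumption standing in for what htmlfile.read() would return; it is not the real page.

```python
import re

# Stand-in for htmlfile.read(): urlopen(...).read() returns bytes, not str.
# (Sample markup is made up for illustration.)
htmltext = b"<html><head><title>Example Page</title></head></html>"

# Option 1: bytes pattern against bytes data; matches come back as bytes.
titles_bytes = re.findall(b"<title>(.*)</title>", htmltext)

# Option 2: decode first, then use a str pattern; matches come back as str.
titles_str = re.findall("<title>(.*)</title>", htmltext.decode("utf-8"))

print(titles_bytes)  # [b'Example Page']
print(titles_str)    # ['Example Page']
```

Decoding first is usually more convenient if you plan to print or further process the text, since everything downstream then works with ordinary strings.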

Answered By: falsetru

You have to decode your data. Since the website in question says

charset=iso-8859-1

use that. utf-8 won’t work in this case.

htmltext = htmlfile.read().decode('iso-8859-1')
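
If you would rather not hard-code the encoding, one possible approach is to read the charset from the Content-Type response header. The sketch below assumes the server actually sends that header; decode_body and the sample data are hypothetical names made up for illustration. With a real response, htmlfile.headers is a message object that offers the same get_content_charset() method.

```python
import re
from email.message import Message


def decode_body(raw, headers, fallback="utf-8"):
    """Decode raw bytes using the charset from Content-Type, if present.

    decode_body is a hypothetical helper, not part of urllib.
    """
    charset = headers.get_content_charset() or fallback
    return raw.decode(charset, errors="replace")


# Offline stand-ins for htmlfile.headers and htmlfile.read().
headers = Message()
headers["Content-Type"] = "text/html; charset=iso-8859-1"
raw = "<title>Caf\xe9 Basketball</title>".encode("iso-8859-1")

htmltext = decode_body(raw, headers)
title = re.findall("<title>(.*)</title>", htmltext)
print(title)
```

With a live response this would be called as decode_body(htmlfile.read(), htmlfile.headers). Note that the sample bytes above are not valid UTF-8, which is why a plain .decode('utf-8') would raise UnicodeDecodeError on them.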
Answered By: timgeb