HTML Scraping with Python, document_fromstring is empty

Question:

I am trying to extract some data from a website using Python. I found a document that fits my problem exactly.

But when I run the provided code:

from lxml import html
import requests


page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.content)

#This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
#This will create a list of prices
prices = tree.xpath('//span[@class="item-price"]/text()')


print 'Buyers: ', buyers
print 'Prices: ', prices

I get an error:

  File "C:\Python27\lib\site-packages\lxml\html\__init__.py", line 617, in document_fromstring
    "Document is empty")

ParserError: Document is empty

Does anyone have an idea what the problem could be?

Asked By: WitheShadow

Answers:

Your script works fine for me. I’m getting output:

Buyers:  ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes', 'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff', 'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup', 'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire', 'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell']
Prices:  ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25', '$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11', '$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68', '$15.00', '$114.07', '$10.09']

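I recommend trying the latest lxml package. Also check that the web page is actually reachable from your machine when you run the script: lxml raises "Document is empty" when the content it is given to parse is empty.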
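One way to check availability is to look at the response before handing it to lxml. The following is only a minimal sketch: the helper names `response_problem` and `fetch_tree` are made up for illustration and are not part of the original script, and the 10-second timeout is an arbitrary assumption.

```python
def response_problem(status_code, body):
    """Return a short description of what is wrong with a response,
    or None if it looks safe to pass to lxml.

    Hypothetical helper for illustration; not part of lxml or requests.
    """
    if status_code != 200:
        return 'unexpected HTTP status %d' % status_code
    # An empty (or whitespace-only) body leaves lxml with nothing to parse.
    if not body or not body.strip():
        return 'empty response body'
    return None


def fetch_tree(url):
    """Download url and parse it, failing with a readable message."""
    # Imported here so response_problem() can be used without them installed.
    import requests
    from lxml import html

    page = requests.get(url, timeout=10)  # timeout value is an assumption
    problem = response_problem(page.status_code, page.content)
    if problem is not None:
        raise RuntimeError('Cannot parse %s: %s' % (url, problem))
    return html.fromstring(page.content)


# Usage (needs network access):
# tree = fetch_tree('http://econpy.pythonanywhere.com/ex/001.html')
# buyers = tree.xpath('//div[@title="buyer-name"]/text()')
```

If `fetch_tree` raises here, the problem is with the download (a proxy, a firewall, or the site being down) rather than with the XPath expressions.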

Answered By: Alderven