HTML Scraping with Python, document_fromstring is empty
Question:
I am trying to extract some data from a website using python. I found a (document that exactly fits to my problem.
But when I run the provided code
from lxml import html
import requests
page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.content)
#This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
#This will create a list of prices
prices = tree.xpath('//span[@class="item-price"]/text()')
print 'Buyers: ', buyers
print 'Prices: ', prices
I get an error:
File "C:Python27libsite-packageslxmlhtml__init__.py", line 617, in document_fromstring
"Document is empty")
ParserError: Document is empty
Anyone an idea what the problem could be?
Answers:
Your script works fine for me. I’m getting output:
Buyers: ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes', 'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff', 'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup', 'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire', 'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell']
Prices: ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25', '$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11', '$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68', '$15.00', '$114.07', '$10.09']
I recommend you to try latest lxml package. And check that desired webpage is available for you at that time.
I am trying to extract some data from a website using python. I found a (document that exactly fits to my problem.
But when I run the provided code
from lxml import html
import requests
page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.content)
#This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
#This will create a list of prices
prices = tree.xpath('//span[@class="item-price"]/text()')
print 'Buyers: ', buyers
print 'Prices: ', prices
I get an error:
File "C:Python27libsite-packageslxmlhtml__init__.py", line 617, in document_fromstring
"Document is empty")
ParserError: Document is empty
Anyone an idea what the problem could be?
Your script works fine for me. I’m getting output:
Buyers: ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes', 'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff', 'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup', 'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire', 'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell']
Prices: ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25', '$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11', '$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68', '$15.00', '$114.07', '$10.09']
I recommend you to try latest lxml package. And check that desired webpage is available for you at that time.