XPath returns empty array – lxml
Question:
I’m trying to write a program that scrapes https://www.tcgplayer.com/ to get Pokemon TCG prices for a specified list of cards:
from lxml import etree, html
import requests
import string

def clean_text(element):
    all_text = element.text_content()
    cleaned = ' '.join(all_text.split())
    return cleaned

page = requests.get("http://www.tcgplayer.com/product/231462/pokemon-first-partner-pack-pikachu?xid=pi731833d1-f2cc-4043-9551-4ca08506b43a&page=1&Language=English")
tree = html.fromstring(page.content)
price = tree.xpath("/html/body/div[2]/div/div/section[2]/section/div/div[2]/section[3]/div/section[1]/ul/li[1]/span[2]")
print(price)
However, when I run this code, the output is just an empty list: "[]".
I have tried selenium and its browser function, but I would rather not have to open a browser for 100+ cards to get the price data. The same approach works with another website's URL and XPath (https://www.pricecharting.com/game/pokemon-promo/jolteon-v-swsh183, /html/body/div[1]/div[2]/div/div/table/tbody[1]/tr[1]/td[4]/span[1]), so I wonder if it is just how https://www.tcgplayer.com/ is built.
The expected return value is around $5.
Answers:
Question answered above by @Grismar:
When you test the XPath on a site, you probably do this in the Developer Console in the browser, after the page has loaded. By that point, any JavaScript has already run, and the page may have been updated or even built from scratch by it. requests, by contrast, just loads the basic page and executes no scripts. To get the same result you’ll need something that can execute JavaScript, like selenium.
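To make the difference concrete, here is a minimal sketch rather than a tested scraper for tcgplayer.com. The two HTML strings, the `app` id, the `price` class, and the `$5.00` value are made-up placeholders, not the site's real markup. `find_prices` shows that the same XPath matches nothing in the unrendered markup and matches once the DOM is filled in. `fetch_rendered_html` is a hypothetical helper that assumes Selenium 4 and a recent Chrome are installed; it runs the browser headless, so no window opens.

```python
from lxml import html

# Hypothetical stand-in for what requests receives: an empty container
# that a script would only fill in inside a real browser.
RAW_HTML = """
<html><body>
  <div id="app"></div>
  <script>/* a browser would run this and insert the price */</script>
</body></html>
"""

# Hypothetical stand-in for the DOM after the browser has run the script.
RENDERED_HTML = """
<html><body>
  <div id="app"><ul><li><span>Market Price</span>
  <span class="price"> $5.00 </span></li></ul></div>
</body></html>
"""

# Placeholder XPath for the made-up markup above, not tcgplayer's real layout.
PRICE_XPATH = "//div[@id='app']//span[@class='price']"


def find_prices(markup, xpath=PRICE_XPATH):
    """Parse markup with lxml and return the cleaned text of each match."""
    tree = html.fromstring(markup)
    return [" ".join(el.text_content().split()) for el in tree.xpath(xpath)]


def fetch_rendered_html(url, xpath, wait_seconds=10):
    """Load url in headless Chrome and return the HTML once xpath matches."""
    # Imported lazily so the static demonstration runs without selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # assumption: recent Chrome
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait for the script-inserted element instead of sleeping blindly.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.XPATH, xpath))
        )
        return driver.page_source
    finally:
        driver.quit()


print(find_prices(RAW_HTML))       # no match: the span was never inserted
print(find_prices(RENDERED_HTML))  # the price text, once the DOM is filled in
```

With a real page, the idea would be to pass the result of `fetch_rendered_html(url, xpath)` to `find_prices` in place of `page.content`. For 100+ cards it would probably be worth restructuring the helper to create one driver and reuse it across URLs instead of starting a new browser each time.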