XPath returns empty array – lxml

Question:

I’m trying to write a program that scrapes https://www.tcgplayer.com/ to get the prices of a specified list of Pokemon TCG cards:

from lxml import etree, html
import requests
import string

def clean_text(element):
    all_text = element.text_content()
    cleaned = ' '.join(all_text.split())
    return cleaned


page = requests.get("http://www.tcgplayer.com/product/231462/pokemon-first-partner-pack-pikachu?xid=pi731833d1-f2cc-4043-9551-4ca08506b43a&page=1&Language=English")

tree = html.fromstring(page.content)

price = tree.xpath("/html/body/div[2]/div/div/section[2]/section/div/div[2]/section[3]/div/section[1]/ul/li[1]/span[2]")

print(price)

However, when I run this code, the output is just an empty list: []

I have tried using Selenium and its browser automation, but I would rather not have to open a browser for 100+ cards just to get the price data. I have tested this code on another site’s URL and XPath (https://www.pricecharting.com/game/pokemon-promo/jolteon-v-swsh183 with /html/body/div[1]/div[2]/div/div/table/tbody[1]/tr[1]/td[4]/span[1]) and it works there, so I wonder if it is just how https://www.tcgplayer.com/ is built.

The expected return value is around $5.

Asked By: Mitchell Prior


Answers:

Question answered in the comments by @Grismar:

When you test the XPath on a site, you probably do this in the Developer Console in the browser, after the page has loaded. At that point, any JavaScript has already executed and completed, and the page may have been updated or even constructed from scratch by it. When using requests, only the basic page is loaded and no scripts get executed – you’ll need something that can execute JavaScript to get the same result, like Selenium.
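
As a minimal sketch of that approach: Selenium can run Chrome in headless mode, so no browser window has to appear even when looping over 100+ cards. This assumes Selenium 4 and a local Chrome install; the XPath is the one from the question and the 15-second wait is an arbitrary choice, so both may need adjusting once the rendered markup is inspected.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless=new")  # render pages without opening a visible window

driver = webdriver.Chrome(options=options)
try:
    driver.get(
        "https://www.tcgplayer.com/product/231462/"
        "pokemon-first-partner-pack-pikachu?Language=English"
    )
    # Wait until the JavaScript-rendered price element exists in the DOM before reading it.
    price_xpath = ("/html/body/div[2]/div/div/section[2]/section/div/div[2]"
                   "/section[3]/div/section[1]/ul/li[1]/span[2]")
    price_element = WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.XPATH, price_xpath))
    )
    print(price_element.text)
finally:
    driver.quit()

For many cards, the same driver instance can be reused for each URL in a loop, which avoids the cost of starting a new browser process per card.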

Related: BeautifulSoup scraping returns no data

Answered By: Mitchell Prior