Get request returns unfriendly response Python

Question:

I am trying to perform a GET request on TCG Player using the Requests library in Python. I checked the site's robots.txt, which specifies:

User-agent: *
Crawl-Delay: 10
Allow: /
Sitemap: https://www.tcgplayer.com/sitemap/index.xml

This is my first time seeing a robots.txt file.
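
As a rough illustration of what those directives mean, the standard library's urllib.robotparser can read them. This is only a sketch: the rules are pasted in as a string rather than downloaded, and the variable names are made up for the example.

```python
import urllib.robotparser

# The robots.txt rules quoted above, pasted in as a string (no network access).
ROBOTS_TXT = """\
User-agent: *
Crawl-Delay: 10
Allow: /
Sitemap: https://www.tcgplayer.com/sitemap/index.xml
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds a crawler is asked to wait between requests, for any user agent.
delay = parser.crawl_delay("*")
# Whether a generic crawler may fetch the home page.
allowed = parser.can_fetch("*", "https://www.tcgplayer.com/")
# Sitemap URLs listed in the file (site_maps() needs Python 3.8 or newer).
sitemaps = parser.site_maps()

print(delay, allowed, sitemaps)
```

In short, the file allows crawling of every path but asks for a pause between requests, so a scraper would typically sleep for that many seconds between calls.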

My code is as follows:

import requests
url = "http://www.tcgplayer.com"
r = requests.get(url)
print(r.text)

I cannot include r.text in my post because the character limit would be exceeded.

I expected to receive the HTML content of the webpage, but I got an ‘unfriendly’ response instead. What does the returned text mean? Is there a way to get the HTML so I can scrape the site?

By ‘unfriendly’ I mean:

The HTML that is returned does not match the HTML that is produced by typing the URL into my web browser.

Asked By: karafar

Answers:

This is probably because the page is rendered client-side with JavaScript, as indicated by the empty <div id="app"></div> block in the scraped result. Requests only downloads the initial HTML and does not execute scripts, so the content your browser shows never appears in r.text. To handle such content, you will need a tool that drives a real browser, like Selenium. I’d recommend this tutorial to get started.
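
A minimal sketch of that approach, assuming Selenium 4 and a local Chrome install. The function names, the regular expression, and the choice to wait until the app div has at least one child are all assumptions made for illustration, not anything confirmed about TCG Player's markup.

```python
import re

# Matches an empty mount point such as <div id="app"></div>.
EMPTY_APP_DIV = re.compile(
    r'<div[^>]*\bid=["\']app["\'][^>]*>\s*</div>', re.IGNORECASE
)


def looks_client_rendered(html):
    """Return True if the HTML contains an empty <div id="app"> placeholder."""
    return EMPTY_APP_DIV.search(html) is not None


def fetch_rendered_html(url, wait_seconds=10):
    """Load the page in headless Chrome and return the DOM after scripts run."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the page counts as rendered once #app has a child element.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "#app > *"))
        )
        return driver.page_source
    finally:
        driver.quit()


if __name__ == "__main__":
    import requests

    url = "https://www.tcgplayer.com"
    html = requests.get(url).text
    if looks_client_rendered(html):
        # Plain Requests only saw the empty shell, so fall back to a browser.
        html = fetch_rendered_html(url)
    print(html[:500])
```

The plain request is tried first because it is much cheaper; the browser is only started when the response looks like an empty shell. If you fetch more than one page, remember the Crawl-Delay from robots.txt and pause between requests.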

Answered By: Krishnan Shankar