Web Scraping using Python (BeautifulSoup)

Question:

I have just started learning web scraping in Python with the BeautifulSoup and requests libraries, using PyCharm.

import requests
from bs4 import BeautifulSoup
    
result1 = requests.get("https://www.grainger.com/")
print('result1 is '+ str(result1.status_code))

With this website, the request keeps loading and never returns. If I use google.com instead, I get output.

Why don’t I get any output for the website above?

Answers:

Hmm… there are a few possibilities:

  1. The website might not exist, or the URL might be mistyped.
  2. You might be using http where the site requires https.
  3. The site might block scrapers. Try sending a User-Agent header.
  4. It might be a problem with requests. Try using a different library.
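
One way to tell these cases apart is to give the request a timeout and look at which exception comes back, instead of waiting on a call that never returns. The following is only a rough sketch: the helper name probe and its get parameter are made up for illustration (get exists so the function can be exercised without a network), and the 5-second timeout is an arbitrary choice.

```python
import requests


def probe(url, headers=None, timeout=5.0, get=requests.get):
    """Fetch url once and return a short description of what happened.

    `probe` and the injectable `get` argument are hypothetical helpers,
    not part of the requests library.
    """
    try:
        # Without a timeout, requests may wait indefinitely for a reply.
        response = get(url, headers=headers, timeout=timeout)
    except requests.exceptions.Timeout:
        # The server accepted no connection or sent no data in time.
        return "timed out"
    except requests.exceptions.ConnectionError:
        # DNS failure, refused connection, TLS problem, and similar.
        return "connection failed"
    except requests.exceptions.RequestException as exc:
        # Anything else requests can raise, e.g. a malformed URL.
        return f"request error: {type(exc).__name__}"
    return f"status {response.status_code}"
```

Calling probe("https://www.grainger.com/") with and without a headers argument should then show whether the User-Agent is what makes the difference.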
Answered By: wyatt-stanke

To get status 200 from this site, specify a User-Agent HTTP header:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'}

result1 = requests.get("https://www.grainger.com/", headers=headers)

print('result1 is '+ str(result1.status_code))

Prints:

result1 is 200

This works because some sites ignore requests that don’t appear to come from a web browser. By default, requests sends a User-Agent of the form python-requests/<version>, so the server can tell the request was not made by a browser. Your request likely hangs and eventually times out because their server is ignoring it.
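
To see the difference for yourself, you can print the User-Agent that requests would send by default and compare it with the one set on a session. This is a small sketch rather than the only way to do it: make_session and BROWSER_UA are names chosen here for illustration, and the User-Agent string is the one from the snippet above. Nothing in it touches the network.

```python
import requests

# What requests sends when no User-Agent is supplied ("python-requests/<version>").
default_ua = requests.utils.default_user_agent()

# Browser-like value reused from the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) "
    "Gecko/20100101 Firefox/81.0"
)


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends user_agent on every request (hypothetical helper)."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = make_session()
print("default:", default_ua)
print("session:", session.headers["User-Agent"])
```

After that, session.get("https://www.grainger.com/", timeout=10) sends the browser-like header, and the timeout keeps the call from hanging if the server still does not answer.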

Answered By: Andrej Kesely