Scrape websites with Python

Question:

I have just started with Python. I am trying to scrape a website to fetch the price and title from it. I have gone through multiple tutorials and blogs; the most common libraries are Beautiful Soup and Scrapy. My question is: is there any way to scrape a website without using any third-party library like Beautiful Soup or Scrapy? It can use built-in libraries.

Please suggest a blog, article, or tutorial so that I can learn.

Asked By: user13683097


Answers:

Instead of using Scrapy you can use urllib.

Instead of Beautiful Soup you can use regex.

But Scrapy and Beautiful Soup make your life easier.

Scrapy is not an easy library, so you can use requests or urllib instead.
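As a rough idea of what the question asks for, here is a minimal sketch that uses only the standard library (urllib plus regex). The URL and the regular expressions are placeholders; real pages need patterns matched to their actual markup, and regex parsing of HTML is fragile.

```python
# Minimal sketch using only the standard library (no third-party packages).
# The URL and the regex patterns below are hypothetical placeholders.
import re
import urllib.request

url = "https://example.com/product"  # placeholder product page

with urllib.request.urlopen(url) as response:
    html = response.read().decode("utf-8", errors="replace")

# Very rough extraction with regex; this breaks easily on real-world HTML.
title_match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
price_match = re.search(r'class="price"[^>]*>([^<]+)<', html)

print("Title:", title_match.group(1).strip() if title_match else "not found")
print("Price:", price_match.group(1).strip() if price_match else "not found")
```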

I think the best, most popular, and easiest-to-learn libraries for web scraping in Python are requests, lxml, and BeautifulSoup (the latest version is bs4). In short, Requests lets us make HTTP requests to the website's server to retrieve the data on its page. Getting the HTML content of a web page is the first and foremost step of web scraping.
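A minimal sketch of that first step, fetching a page's raw HTML with requests (install it with `pip install requests`); the URL is a placeholder.

```python
# Fetch the raw HTML of a page with requests.
import requests

url = "https://example.com"  # hypothetical target page

response = requests.get(url, timeout=10)
response.raise_for_status()      # fail loudly on HTTP errors
print(response.status_code)      # e.g. 200
print(response.text[:500])       # first 500 characters of the raw HTML
```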

Let’s take a look at the advantages and disadvantages of the Requests Python library

Advantages:

  • Simple
  • Basic/Digest Authentication
  • International Domains and URLs
  • Chunked Requests
  • HTTP(S) Proxy Support

Disadvantages:

  • Retrieves only static content of a page
  • Can’t be used for parsing HTML
  • Can’t handle websites made purely with JavaScript

We know the requests library cannot parse the HTML retrieved from a web page. Therefore, we need lxml, a high-performance, blazingly fast, production-quality HTML and XML parsing library for Python.
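A minimal sketch of combining the two, assuming `pip install requests lxml`; the URL and the XPath selectors are hypothetical and would need to match the real page's structure.

```python
# Parse HTML fetched with requests using lxml's element tree and XPath.
import requests
from lxml import html

page = requests.get("https://example.com/product", timeout=10)
tree = html.fromstring(page.content)

# XPath queries against the element tree; these selectors are placeholders.
titles = tree.xpath("//h1/text()")
prices = tree.xpath('//span[@class="price"]/text()')

print("Title:", titles[0].strip() if titles else "not found")
print("Price:", prices[0].strip() if prices else "not found")
```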

Let’s take a look at the advantages and disadvantages of the lxml Python library.

Advantages:

  • Faster than most other parsers out there
  • Light-weight
  • Uses element trees
  • Pythonic API

Disadvantages:

  • Does not work well with poorly designed HTML
  • The official documentation is not very beginner-friendly

BeautifulSoup is perhaps the most widely used Python library for web scraping. It creates a parse tree for parsing HTML and XML documents. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8.

One major advantage of the Beautiful Soup library is that it works very well with poorly designed HTML and has a lot of functions. The combination of Beautiful Soup and Requests is quite common in the industry.
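A minimal sketch of that common requests + Beautiful Soup combination, assuming `pip install requests beautifulsoup4`; the URL and the tag/class selectors are placeholders for whatever the target page actually uses.

```python
# Fetch a page with requests and parse it with Beautiful Soup.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com/product", timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

title = soup.find("h1")                      # hypothetical title element
price = soup.find("span", class_="price")    # hypothetical price element

print("Title:", title.get_text(strip=True) if title else "not found")
print("Price:", price.get_text(strip=True) if price else "not found")
```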

Advantages:

  • Requires a few lines of code
  • Great documentation
  • Easy to learn for beginners
  • Robust
  • Automatic encoding detection

Disadvantages:

  • Slower than lxml

If you want to learn how to scrape web pages using Beautiful Soup, this tutorial is for you:

tutorial

By the way, there are many other libraries you can try, such as Scrapy, Selenium (for web scraping), regex, and urllib.

Answered By: Umutambyi Gad

I think it’s not possible to scrape websites without using any library. You can refer to the blog below to learn more about web scraping using Python. It explains how anyone can scrape websites with Python in an easy manner.

https://spurqlabs.com/how-to-do-web-scraping-crawling-using-python-with-selenium/

Answered By: jyotsna jadhav