Beginner – trying to scrape link and export to excel in Python and BS4

Question:

I am trying to scrape a demo site from Webscraper.io that lists laptops. For each laptop I want the title, the price and the link, but I am finding it difficult to work out how to scrape all of this information and export it to Excel. In particular, how do I add the link to the data I already have?

Here is what I have done so far:

import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = "https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"

r = requests.get(url)

html = r.text

soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid a warning

css_selector = {"class": "col-sm-4 col-lg-4 col-md-4"}

laptops = soup.find_all("div", attrs=css_selector)

for laptop in laptops:
    text = laptop.get_text()
    print(text)

But I still need some way to add the link for each laptop as well, and some way to export the scraped data to Excel.

I have tried to export the current data to Excel:

import pandas as pd

df = pd.DataFrame(laptop)

df.to_excel("laptop_.xlsx", encoding="utf-8")

But I'm just getting an Excel file that looks like this:

[Image: screenshot of the resulting Excel file]

Asked By: The_N00b


Answers:

Try printing the laptop variable. You will see that its output is the same information that ended up in the Excel file:

<div class="col-sm-4 col-lg-4 col-md-4">
<div class="thumbnail">
<img alt="item" class="img-responsive" src="/images/test-sites/e-commerce/items/cart2.png"/>
<div class="caption">
<h4 class="pull-right price">$1799.00</h4>
<h4>
<a class="title" href="/test-sites/e-commerce/allinone/product/544" title="Asus ROG Strix SCAR Edition GL503VM-ED115T">Asus ROG Strix S...</a>
</h4>
<p class="description">Asus ROG Strix SCAR Edition GL503VM-ED115T, 15.6" FHD 120Hz, Core i7-7700HQ, 16GB, 256GB SSD + 1TB SSHD, GeForce GTX 1060 6GB, Windows 10 Home</p>
</div>
<div class="ratings">
<p class="pull-right">8 reviews</p>
<p data-rating="3">
<span class="glyphicon glyphicon-star"></span>
<span class="glyphicon glyphicon-star"></span>
<span class="glyphicon glyphicon-star"></span>
</p>
</div>
</div>
</div>

The part you say you want to extract is the link, which is found here:

<a class="title" href="/test-sites/e-commerce/allinone/product/544" title="Asus ROG Strix SCAR Edition GL503VM-ED115T">Asus ROG Strix S...</a>

One way to get the link is to find this tag inside the div tag that contains it:

for laptop in laptops:
    laptop_link = laptop.find('a') # Find the title link
    text = laptop_link.get_text()
    print(text)

Then, to get the hyperlink itself rather than the text inside the tag, read the tag's href attribute, like this:

for laptop in laptops:
    laptop_link = laptop.find('a') # Find the title link
    href = laptop_link['href'] # Get the link (href) attribute
    print(href)
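
To tie this back to the Excel part of the question, here is a minimal sketch of one way to collect the title, price and link into rows and write them out with pandas. It assumes each product sits in a div with class "thumbnail", with the price in an h4 with class "price" and the link in an a tag with class "title", as in the HTML printed above; the live page may differ. The function names parse_laptops and export_to_excel and the file name laptops.xlsx are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

URL = "https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"


def parse_laptops(html, base_url=URL):
    """Return one dict per product card: title, price and absolute link."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Assumption: every product card is a <div class="thumbnail">.
    for card in soup.find_all("div", class_="thumbnail"):
        link_tag = card.find("a", class_="title")
        price_tag = card.find("h4", class_="price")
        if link_tag is None or price_tag is None:
            continue  # skip cards that do not match the expected layout
        rows.append({
            # The title attribute holds the full name; the tag text is truncated.
            "title": link_tag.get("title") or link_tag.get_text(strip=True),
            "price": price_tag.get_text(strip=True),
            # The href is relative, so join it with the page URL.
            "link": urljoin(base_url, link_tag["href"]),
        })
    return rows


def export_to_excel(rows, path="laptops.xlsx"):
    """Write the rows to an Excel file, one column per dict key."""
    # Imported here so that parsing works without pandas installed;
    # writing .xlsx also needs an Excel engine such as openpyxl.
    import pandas as pd

    pd.DataFrame(rows).to_excel(path, index=False)
```

With the page fetched as in the question (html = r.text), calling export_to_excel(parse_laptops(html)) should produce a sheet with title, price and link columns. Building a list of plain dicts first, rather than passing a BeautifulSoup tag to pd.DataFrame, is what keeps the raw HTML out of the spreadsheet.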
Answered By: Xiddoc