Scraping <span> "text" </span> with BeautifulSoup

Question

I am working on scraping the data from a website using BeautifulSoup. For whatever reason, I cannot seem to find a way to get the text between span elements to print. Here is what I am running.


import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
url = 'https://www.amazon.com/GymCope-Anti-Tear-Cushioning-Non-Slip-Exercise/dp/B0921F1T2P/ref=sr_1_3_sspa?brr=1&pd_rd_r=4b40f0a8-f2d8-44dc-9a98-413c64d3fa34&pd_rd_w=P9ZJI&pd_rd_wg=RS7zW&pf_rd_p=9875e817-188b-48a2-986d-8146749644ac&pf_rd_r=AGWBT5KT04TYKGPZASKA&qid=1642452438&rd=1&rnid=3407731&s=sporting-goods&sr=1-3-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExVjhWTk0xQU5WWldPJmVuY3J5cHRlZElkPUEwODE0MzYwMTdMTDZSNDVST08yMiZlbmNyeXB0ZWRBZElkPUEwODQ4MDM0MlE4WEtVUjFKMUdLMiZ3aWRnZXROYW1lPXNwX2F0Zl9icm93c2UmYWN0aW9uPWNsaWNrUmVkaXJlY3QmZG9Ob3RMb2dDbGljaz10cnVl'
response = requests.get(url, headers=headers)
html = response.text
soup = BeautifulSoup(html)
bsr = soup.find("div", class_="a-section table-padding").text

and seeing this,

>>> bsr
'    ASIN   B0921F1T2P    Customer Reviews  nn n  4.6 out of 5 stars    n    41 ratings   nnn 4.6 out of 5 stars     Best Sellers Rank    #69,660 in Sports & Outdoors (See Top 100 in Sports & Outdoors)  #234 in Yoga Mats       Date First Available   April 8, 2021    '

I tried


bsra = soup.find("div", class_="a-section table-padding").find_next('span').get_text()

but it comes out


> > > bsr
> > > '\n  4.6 out of 5 stars    '

I want only to scrape "Best Sellers Rank" as in the picture. Thanks.

Asked By: Bünyamin Erkaya

||

Source

Answer 1

Referenced picture in your question is missing, but you can get rank by selecting your elements more specific:

soup.select_one('th:-soup-contains("Best Sellers Rank") + td').text.split()[0]

Example

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
url = 'https://www.amazon.com/GymCope-Anti-Tear-Cushioning-Non-Slip-Exercise/dp/B0921F1T2P/ref=sr_1_3_sspa?brr=1&pd_rd_r=4b40f0a8-f2d8-44dc-9a98-413c64d3fa34&pd_rd_w=P9ZJI&pd_rd_wg=RS7zW&pf_rd_p=9875e817-188b-48a2-986d-8146749644ac&pf_rd_r=AGWBT5KT04TYKGPZASKA&qid=1642452438&rd=1&rnid=3407731&s=sporting-goods&sr=1-3-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExVjhWTk0xQU5WWldPJmVuY3J5cHRlZElkPUEwODE0MzYwMTdMTDZSNDVST08yMiZlbmNyeXB0ZWRBZElkPUEwODQ4MDM0MlE4WEtVUjFKMUdLMiZ3aWRnZXROYW1lPXNwX2F0Zl9icm93c2UmYWN0aW9uPWNsaWNrUmVkaXJlY3QmZG9Ob3RMb2dDbGljaz10cnVl'
response = requests.get(url, headers=headers)
html = response.text
soup = BeautifulSoup(html)

soup.select_one('th:-soup-contains("Best Sellers Rank") + td').text.split()[0]

Output

#84,712

Answered By: HedgeHog

Scraping <span> "text" </span> with BeautifulSoup

Question:

Answers:

Example

Output