html-parsing

How can I scrape a specific URL from a webpage using BeautifulSoup?

Question: I’m writing a Python script that parses HTML (a classifieds website) and sends me email notifications on specific products and price points. Everything works here except for the "listing_url" capture, which I want displayed in the email so I can click on …

Total answers: 1

Parsing multi-line table cells from space-aligned table data

Question: I have a bit of a messy file generated that just dumps everything into HTML <pre> tags and decides to separate the headers into 2 lines. I am a Python and regex newb and having trouble figuring out a way to merge those 2 lines into …

Total answers: 1

Using BeautifulSoup to parse html, I am getting unwanted prints. Why is that?

Question: I am using BeautifulSoup to parse an HTML document in a Jupyter Notebook. This is a sample from the file. Please note that this same HTML sample is repeated multiple times. The below table tags are siblings and are surrounded by …

Total answers: 1

How to extract a specific part of HTML using BeautifulSoup?

Question: I am trying to extract the value of the 'title' attribute from the following HTML, but so far I haven't managed to. <div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00"> This is my code: from bs4 import BeautifulSoup with open("messages.html") as fp: soup = BeautifulSoup(fp, 'html.parser') …

Total answers: 1
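
As an illustration of the question above (not the posted answer), here is a minimal sketch of reading an attribute value with BeautifulSoup. The `<div>` is taken from the excerpt; its inner text `01:49` and the closing tag are assumptions added only to make the snippet self-contained.

```python
from bs4 import BeautifulSoup

# The div comes from the question; the inner text "01:49" is a made-up placeholder.
html = '<div class="pull_right date details" title="22.12.2022 01:49:03 UTC-03:00">01:49</div>'

soup = BeautifulSoup(html, "html.parser")

# class_ matches when any one of the element's classes equals "date".
div = soup.find("div", class_="date")

# Attributes are read like dictionary keys on the Tag object.
timestamp = div["title"]
print(timestamp)
```

If the attribute might be missing, `div.get("title")` returns `None` instead of raising `KeyError`.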

How to dynamically find the nearest specific parent of a selected element?

Question: I want to parse many HTML pages and remove a div that contains the text "Message", using BeautifulSoup's html.parser and Python. The div has no name or id, so pointing to it is not possible. I am able to do this for …

Total answers: 1
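
One possible approach to the question above, sketched under assumptions: the excerpt does not show the real markup, so the HTML below (including the `keep` id and the `<span>` wrapper) is hypothetical. The sketch locates the text node first and then walks up to its nearest enclosing `<div>` with `find_parent`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real pages from the question are not shown in the excerpt.
html = """
<div id="keep"><p>Hello</p></div>
<div><span>Message</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the text node whose content is exactly "Message".
target = soup.find(string="Message")

if target is not None:
    # Walk up the tree to the closest ancestor that is a <div>.
    parent_div = target.find_parent("div")
    if parent_div is not None:
        # Remove that div and everything inside it from the tree.
        parent_div.decompose()

print(soup)
```

Matching on `string="Message"` requires an exact match; for partial matches a compiled regular expression can be passed instead.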

bs4 fails when trying to get the next URL

Question: Here is my code: def parser(): flag = True url = 'https://quotes.toscrape.com' while flag: responce = requests.get(url) soup = BeautifulSoup(responce.text, 'html.parser') quote_l = soup.find_all('span', {'class': 'text'}) q_count = 0 for i in range(len(quote_l)): if q_count >= 5: flag = False break quote = soup.find_all('span', {'class': 'text'})[i] …

Total answers: 1

BeautifulSoup not returning links

Question: For my python bootcamp I am trying to create a log of the articles from this site, and return the highest upvoted. The rest of the code works, but I cannot get it to return the href properly. I get "none." I have tried everything I know to do… can …

Total answers: 2

Unable to open a local HTML page for scraping using BS4 in Python

Question: I have written the following code to open a local HTML file saved on my Desktop: However, while running this code I get the following error: I have no prior experience of handling this in Python or BS4. I tried various solutions online but couldn’t …

Total answers: 1

How to export specific div from webpage to dataframe?

Question: I want to export a specific div from the webpage. In this case, I want to export the div with id "producer-votes-wrapper"; this part of the page has all the numbers (data) I want to get. Using previous examples and questions, I tried to do it by …

Total answers: 1

Python – BeautifulSoup – parse multiple span elements

Question: I am trying to extract title from ‘span’. Using the below code as an example, the output I am looking for is 6536 and 9319, which are part of ‘title’. Seen below: span aria-label="6536 users starred this repository" class="Counter js-social-count" data-plural-suffix="users starred this repository" data-singular-suffix="user starred …

Total answers: 1