beautifulsoup

Get ALL strings from html

Question: I send GET requests to different sites and receive HTML pages in response. How can I get only the strings from an HTML page? I mean all strings in general (the ones colored white in my screenshot). I understand how to get "div", "code", "a", and other tags. But …

Total answers: 1

How to print text and certain specified tags of XML file using BeautifulSoup

Question: I’m parsing the XML of a Microsoft Word .docx file with BeautifulSoup. I’d like to be able to extract the text of the XML file while still printing certain tags that I choose. I can get the text of the file …

Total answers: 1

Is there a really simple method for printing scraped output to a csv file?

Question: Python: Python 3.11.2; Python Editor: PyCharm 2022.3.3 (Community Edition) – Build PC-223.8836.43; OS: Windows 11 Pro, 22H2, 22621.1413; Browser: Chrome 111.0.5563.65 (Official Build) (64-bit). I have a URL (e.g., https://dockets.justia.com/docket/puerto-rico/prdce/3:2023cv01127/175963) from which I’m scraping nine items. I’m looking to have …

Total answers: 3

Scraping content from what appear to be identical HTML elements

Question: Python: Python 3.11.2; Python Editor: PyCharm 2022.3.3 (Community Edition) – Build PC-223.8836.43; OS: Windows 11 Pro, 22H2, 22621.1413; Browser: Chrome 111.0.5563.65 (Official Build) (64-bit). I’m looking at the following URL (https://dockets.justia.com/docket/puerto-rico/prdce/3:2023cv01127/175963), from which I’m attempting to scrape data from class elements that …

Total answers: 1

Python: Scrape href from td – can't get it to work correctly

Question: I’m very new to Python and have gone through previous questions on SO but could not solve it. Here is my code:
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    from urllib.parse import urlparse
    url = "https://en.wikipedia.org/wiki/List_of_curling_clubs_in_the_United_States"
    data = requests.get(url).text …

Total answers: 1

Parse a table from wikipedia that is hidden

Question: I’m pretty new here. I want to parse a table from Wikipedia at the following link: https://en.wikipedia.org/wiki/MIUI I was able to parse the first table, but I can’t figure out how to get the information from the second table there, the information that contains "version history" of …

Total answers: 1

Retrieving all hrefs in anchor tags

Question:
    import warnings
    import numpy as np
    from datetime import datetime
    import json
    from bs4 import BeautifulSoup
    warnings.filterwarnings('ignore')
    url = "https://understat.com/league/EPL/2022"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    for link in soup.find_all("a", class_="match-info"):
        href = link.get("href")
        print(href)
Unfortunately, this code does not find any results. The desired results are …

Total answers: 2

Beautiful Soup: 'NoneType' object has no attribute 'text'

Question: I got this code to work to scrape a table on a webpage, which I’m very happy with. However, on a rare occasion, a title might miss a ‘genre’ or an ‘image URL’ field. As soon as the scraper hits an item in the list that …

Total answers: 1

Why is beautifulsoup not returning data elements?

Question: I’ve tried many things to return the data on this page: https://www.hebban.nl/rank. For some reason it’s not returning any data points, even after many tries. Can someone point me in the right direction and tell me what I’m doing wrong? I’m learning but I seem to be …

Total answers: 1

Multiple span tag under one parent DIV id always returns first record

Question: I have multiple span tags with the same class name under one parent div id. But the BeautifulSoup item loop always returns only the first attribute; the rest of the attributes are not printed. Note: All of my span class names are the same as …

Total answers: 1