html-parsing

How to decode strange symbols from parser (bs4) into Cyrillic?

Question: I tried importing 'lxml' and working out which encoding this is, but with no success. Websites with decoding functions can't convert it back to Cyrillic. Only Windows-1250 and ISO-8859-1 can encode SOME symbols in the text. import os import requests from bs4 import …

Total answers: 1
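The excerpt does not show the actual output, so the cause can only be guessed at. One common source of symbols like these is UTF-8 bytes being decoded as ISO-8859-1. The sketch below assumes that cause and uses a made-up sample string to show how the garbling happens and how it is reversed.

```python
# Hypothetical sample; the page text from the question is not shown in the excerpt.
original = "Привет"

# Simulate the suspected mistake: UTF-8 bytes decoded as ISO-8859-1 (Latin-1).
garbled = original.encode("utf-8").decode("latin-1")

# Undo it by reversing the two steps.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repaired)
```

With requests, the analogous step is usually to hand the raw bytes in response.content to BeautifulSoup, or to set response.encoding before reading response.text.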

parse a website with beautiful soup – attempting to parse value unsuccessfully

Question: Hi everyone, I am parsing an HTML document with BeautifulSoup. However, there is one piece of information I can't seem to parse. The HTML: <small> <span class="label label-primary">CVE-2019-11198</span> <span class="label label-warning">6.1 – Medium</span> – August 05, 2019 </small> I am parsing this whole block, …

Total answers: 2
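The excerpt is cut off before it says which value is the problem. Assuming it is the date, which sits as bare text after the last <span> rather than inside a tag of its own, a minimal sketch of pulling out all three pieces might look like this. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<small>
  <span class="label label-primary">CVE-2019-11198</span>
  <span class="label label-warning">6.1 – Medium</span>
  – August 05, 2019
</small>
"""

soup = BeautifulSoup(html, "html.parser")
small = soup.find("small")

# Both spans carry the "label" class, so one class filter finds them.
spans = small.find_all("span", class_="label")
cve_id = spans[0].get_text(strip=True)
severity = spans[1].get_text(strip=True)

# The date is a text node following the last <span>, so reach it as a sibling.
trailing = spans[-1].next_sibling
date_text = str(trailing).strip().lstrip("–").strip()

print(cve_id, severity, date_text)
```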

Scraping webpage using BeautifulSoup

Question: I am attempting to scrape this site: https://www.senate.gov/general/contact_information/senators_cfm.cfm My code: import requests from bs4 import BeautifulSoup URL = 'https://www.senate.gov/general/contact_information/senators_cfm.cfm' page = requests.get(URL) soup = BeautifulSoup(page.content, 'html.parser') print(soup) The issue is that it's not actually going to the site. The HTML that I get in my soup var is not at …

Total answers: 2

Using regex to parse string python3

Question: I am trying to access gSecureToken from the following string: $("#ejectButton").on("click", function(e) { $("#ejectButton").prop("disabled", true); $.ajax({ url : "/apps_home/eject/", type : "POST", data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" }, dataType : "json", success : function(data, textStatus, xhr) { $("#smbStatus").html(''); $("#smbEnable").removeClass('greenColor').html('OFF'); showPopup("MiFi Share", "<p>Eject completed. It is now safe …

Total answers: 3
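For a string shaped like the one in this question, a regular expression anchored on the key name is one way to pull out the token. The sketch below uses a shortened stand-in for the JavaScript and assumes the token is a hexadecimal string in double quotes. The names script and TOKEN_RE are illustrative.

```python
import re

# Shortened, hypothetical stand-in for the JavaScript in the question.
script = '''
$.ajax({
    url : "/apps_home/eject/",
    type : "POST",
    data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },
    dataType : "json"
});
'''

# Key name, optional whitespace around the colon, then the quoted hex value.
TOKEN_RE = re.compile(r'gSecureToken\s*:\s*"([0-9a-fA-F]+)"')

match = TOKEN_RE.search(script)
token = match.group(1) if match else None
print(token)
```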

Differences between .text and .get_text()

Question: In BeautifulSoup, is there any difference between .text and .get_text()? Which one should be preferred for getting an element's text? >>> from bs4 import BeautifulSoup >>> >>> html = "<div>text1 <span>text2</span><div>" >>> soup = BeautifulSoup(html, "html.parser") >>> div = soup.div >>> div.text 'text1 text2' >>> div.get_text() 'text1 text2' Asked By: …

Total answers: 1
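As a rough illustration of the comparison asked about here: .text is a property that returns the same thing as get_text() called with no arguments, while the method form also accepts options such as separator and strip. The snippet closes the outer <div> properly, unlike the excerpt.

```python
from bs4 import BeautifulSoup

html = "<div>text1 <span>text2</span></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# Property and zero-argument method call give the same string.
plain = div.text
same = div.get_text()

# Only the method form takes arguments.
joined = div.get_text(separator="|", strip=True)

print(plain, same, joined)
```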

How to get HTML from a beautiful soup object

Question: I have the following bs4 object listing: >>> listing <div class="listingHeader"> <h2> …. >>> type(listing) <class 'bs4.element.Tag'> I want to extract the raw HTML as a string. I've tried: >>> a = listing.contents >>> type(a) <type 'list'> So this does not work. How can I …

Total answers: 1
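A minimal sketch of turning a Tag back into markup: str() gives the element including its own tags, and decode_contents() gives only what is inside it. The markup below is a made-up stand-in, since the original listing is truncated.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the truncated listing in the question.
soup = BeautifulSoup('<div class="listingHeader"><h2>Title</h2></div>', "html.parser")
listing = soup.div

outer_html = str(listing)               # the tag itself plus its children
inner_html = listing.decode_contents()  # children only

print(outer_html)
print(inner_html)
```

listing.prettify() is another option when indented output is wanted.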

beautiful soup getting tag.id

Question: I'm attempting to get a list of div ids from a page. When I print out the attributes, I get the ids listed. for tag in soup.find_all(class_="bookmark blurb group"): print(tag.attrs) results in: {'id': 'bookmark_8199633', 'role': 'article', 'class': ['bookmark', 'blurb', 'group']} {'id': 'bookmark_7744613', 'role': 'article', 'class': ['bookmark', 'blurb', 'group']} {'id': …

Total answers: 2
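Attribute values live in the tag's attrs dictionary, so they are read with tag["id"] or tag.get("id"). Dotted access such as tag.id looks for a child element named <id> instead, and returns None when there is none. The sketch below rebuilds two divs from the attributes printed in the question. The surrounding markup is assumed.

```python
from bs4 import BeautifulSoup

# Markup reconstructed from the printed attrs; the real page is not shown.
html = """
<div id="bookmark_8199633" role="article" class="bookmark blurb group"></div>
<div id="bookmark_7744613" role="article" class="bookmark blurb group"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# .get() returns None instead of raising if a tag has no id attribute.
ids = [tag.get("id") for tag in soup.find_all(class_="bookmark blurb group")]

# Dotted access searches for a child <id> tag, which does not exist here.
dotted = soup.div.id

print(ids, dotted)
```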

BeautifulSoup parent tag

Question: I have some HTML that I want to extract text from. Here's an example of the HTML: <p>TEXT I WANT <i> &#8211; </i></p> Now, there are, obviously, lots of <p> tags in this document. So, find('p') is not a good way to get at the text I want to extract. However, …

Total answers: 5

Inserting into a html file using python

Question: I have an HTML file where I would like to insert a <meta> tag between the <head> and </head> tags using Python. If I open the file in append mode, how do I get to the relevant position where the <meta> tag is to be inserted? Asked …

Total answers: 2
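Append mode only ever writes at the end of a file, so it cannot place text in the middle. The usual pattern is to read the whole file, edit the string, and write it back. The sketch below shows the string-editing step using only the standard library. The helper insert_meta and the sample page are hypothetical.

```python
import re


def insert_meta(html_text: str, meta_tag: str) -> str:
    """Return html_text with meta_tag placed just before the closing </head>."""
    match = re.search(r"</head\s*>", html_text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no </head> tag found")
    pos = match.start()
    return html_text[:pos] + meta_tag + "\n" + html_text[pos:]


# Hypothetical page contents; in practice this would be read from the file.
page = "<html>\n<head>\n<title>Demo</title>\n</head>\n<body></body>\n</html>\n"
updated = insert_meta(page, '<meta name="description" content="demo page">')
print(updated)
```

For messier markup, an HTML parser such as BeautifulSoup is likely a more robust way to locate the <head> element than a regular expression.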

Python BeautifulSoup scrape tables

Question: I am trying to create a table scrape with BeautifulSoup. I wrote this Python code: import urllib2 from bs4 import BeautifulSoup url = "http://dofollow.netsons.org/table1.htm" # change to whatever your url is page = urllib2.urlopen(url).read() soup = BeautifulSoup(page) for i in soup.find_all('form'): print i.attrs['class'] I need to scrape Nome, Cognome, Email. …

Total answers: 3
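The structure of the page in this question is not shown. Assuming the three fields sit in an ordinary <table> with a header row, a sketch of reading each row into a dictionary could look like the following. The table contents are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical table; the real page layout is not shown in the excerpt.
html = """
<table>
  <tr><th>Nome</th><th>Cognome</th><th>Email</th></tr>
  <tr><td>Mario</td><td>Rossi</td><td>mario@example.com</td></tr>
  <tr><td>Anna</td><td>Bianchi</td><td>anna@example.com</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table").find_all("tr")

# First row supplies the column names; the rest are data rows.
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = [
    dict(zip(headers, (td.get_text(strip=True) for td in row.find_all("td"))))
    for row in rows[1:]
]

print(records)
```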