urllib

Strange character added when decoding with urllib

Question: I'm trying to parse a query string like this: filename=logo.txt\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01x&filename=.hidden.txt Since it mixes bytes and text, I tried to alter it so that it produces the desired escaped URL output, like so: extended = 'filename=logo.txt\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01x&filename=.hidden.txt' fixbytes = bytes(extended, 'utf-8') fixbytes = fixbytes.decode("unicode_escape") algoext = '?' + …

Total answers: 3
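
A minimal sketch of one way to percent-encode a value that mixes text and raw bytes, offered as an assumption about what the asker needs rather than as the accepted answer. It keeps the value as `bytes` and passes it to `urllib.parse.quote_from_bytes`. Quoting the same characters as a `str` goes through UTF-8 first, which turns `\x80` into `%C2%80` and may be where the extra character comes from. The shortened sample value is illustrative only.

```python
from urllib.parse import quote, quote_from_bytes

# Shortened, illustrative version of the value from the question.
raw_value = b"logo.txt\x80\x00\x01x"

# Quoting bytes percent-encodes each non-safe byte exactly once.
encoded_from_bytes = quote_from_bytes(raw_value)

# Quoting a str encodes it as UTF-8 first, so U+0080 becomes two bytes.
encoded_from_str = quote("logo.txt\x80\x00\x01x")

print(encoded_from_bytes)  # logo.txt%80%00%01x
print(encoded_from_str)    # logo.txt%C2%80%00%01x
```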

JSON from webpage into Python script: urllib.error.HTTPError: HTTP Error 403: Forbidden

Question: I made a program and it works with a local JSON (data) file. Code block: def datas(self): with open("C:\Users\Messi\Desktop\Python\\tek.json", "r") as dosya: dataApi = json.load(dosya) return dataApi I uploaded this JSON data to a website running on a LEMP server: https://bestpurpleshampoo.com/tek.json I …

Total answers: 1
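
A frequent cause of `HTTP Error 403: Forbidden` with `urllib` is a server that rejects the default `Python-urllib` User-Agent. Whether that applies to this particular server is an assumption, since the excerpt is truncated. The sketch below sends a browser-like header; the helper names `build_request` and `load_json` and the header value are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request carrying an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def load_json(url):
    """Fetch a URL and parse the response body as JSON."""
    with urlopen(build_request(url)) as response:
        return json.load(response)
```

Calling `load_json("https://bestpurpleshampoo.com/tek.json")` would then replace the local `open(...)` call, provided the server accepts the header.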

Web scraping gives different output every time

Question: from urllib import request from bs4 import BeautifulSoup page_url = "http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=GTX&bop=And&Page=1&PageSize=36&order=BESTMATCH" uclient = request.urlopen(page_url) # open a web client html_page = uclient.read() page_soup = BeautifulSoup(html_page, "html.parser") uclient.close() containers = page_soup.find_all("div", {"class": "item-cell"}) title_list = [] for contain in containers: title = contain.select("img")[0]["title"] print(title) # for troubleshooting print(len(title_list)) # for troubleshooting title_list.append(title) print(title_list) Can …

Total answers: 1

How to get file from url in python?

Question: I want to download text files using Python. How can I do so? I used the requests module's urlopen(url).read(), but it gives me the bytes representation of the file. Asked by: Yaver Javid. Answers: When downloading text files with Python I like to use the wget …

Total answers: 3
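
`urlopen(url).read()` returns `bytes`, so getting text means decoding it. A minimal standard-library sketch, assuming the server either declares a charset or the file is UTF-8; the function name `fetch_text` is hypothetical:

```python
from urllib.request import urlopen


def fetch_text(url, default_encoding="utf-8"):
    """Download a URL and return its body decoded to str."""
    with urlopen(url) as response:
        # Use the charset from the Content-Type header when present.
        charset = response.headers.get_content_charset() or default_encoding
        return response.read().decode(charset)
```

The fallback encoding is a guess; a file in another encoding needs `default_encoding` set accordingly.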

how to avoid square brackets from url after unquote

Question: I've decided to add a query string to a URL like this: import urllib import urllib.parse from urllib.parse import urlencode url = "https://datausa.io/api/data?Geography=04000US06&drilldowns=Race,Ethnicity&measures=Hispanic%20Population,Hispanic%20Population%20Moe" parts = urllib.parse.urlparse(url) query_dict = urllib.parse.parse_qs(parts.query) query_dict['Geography'] = '04000US24' new_parts = list(parts) new_parts[4] = urlencode(query_dict) print(urllib.parse.urlunparse(new_parts)) and I got this result: https://datausa.io/api/data?Geography=04000US24&drilldowns=%5B%27Race%2CEthnicity%27%5D&measures=%5B%27Hispanic+Population%2CHispanic+Population+Moe%27%5D and so …

Total answers: 1
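
The `%5B%27...%27%5D` sequences in that result are the encoded text of Python lists (`['...']`): `parse_qs` returns a list for every key, and `urlencode` stringifies each list unless `doseq=True` is passed. A minimal sketch of that fix, where the helper name `replace_query_param` is hypothetical:

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def replace_query_param(url, key, value):
    """Return url with the query parameter `key` set to `value`."""
    parts = urlparse(url)
    query = parse_qs(parts.query)  # every value is a list of strings
    query[key] = [value]
    # doseq=True emits one key=value pair per list item
    # instead of encoding the list's repr.
    new_query = urlencode(query, doseq=True)
    return urlunparse(parts._replace(query=new_query))


url = (
    "https://datausa.io/api/data?Geography=04000US06"
    "&drilldowns=Race,Ethnicity"
    "&measures=Hispanic%20Population,Hispanic%20Population%20Moe"
)
print(replace_query_param(url, "Geography", "04000US24"))
```

Commas and spaces are still percent-encoded (`%2C`, `+`), which is equivalent for most servers; pass `safe=","` to `urlencode` to keep literal commas.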

urllib.error.HTTPError: HTTP Error 403: Forbidden with urllib.requests

Question: I am trying to read an image URL from the internet and get the image onto my machine via Python. I used the example from this blog post, https://www.geeksforgeeks.org/how-to-open-an-image-from-the-url-in-pil/, which was https://media.geeksforgeeks.org/wp-content/uploads/20210318103632/gfg-300x300.png. However, when I try my own example it just doesn't seem to …

Total answers: 2

BeautifulSoup findAll not returning results

Question: I want to get the product names and prices from this page. For the price, I pretty much repeated exactly what I did for the product name, but I'm not getting anything. from urllib.request import Request, urlopen from bs4 import BeautifulSoup as bSoup header = {'User-Agent': 'Mozilla/5.0 (Windows …

Total answers: 1

Scraping a specific GTAG value from a website

Question: I am trying to scrape websites and return their GTM container IDs. I found a solution, but it only works for a single specific website (https://www.observepoint.com/): import urllib3 import re from bs4 import BeautifulSoup http = urllib3.PoolManager() response = http.request('GET', …

Total answers: 1
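
A site-independent approach is to search the raw HTML for the ID pattern instead of relying on one page's markup. The sketch below assumes that container IDs look like `GTM-` followed by uppercase letters and digits; that format, the helper name `find_gtm_ids`, and the sample HTML are assumptions, not details taken from the question.

```python
import re

# Assumed ID shape: "GTM-" followed by uppercase letters and digits.
GTM_ID_PATTERN = re.compile(r"\bGTM-[A-Z0-9]+\b")


def find_gtm_ids(html):
    """Return the distinct GTM-style IDs found in html, in order of appearance."""
    found = []
    for match in GTM_ID_PATTERN.findall(html):
        if match not in found:
            found.append(match)
    return found


# Illustrative HTML, not taken from any real site.
sample_html = (
    "<script src='https://www.googletagmanager.com/gtm.js?id=GTM-ABC123'></script>"
    "<noscript>GTM-ABC123</noscript><p>GTM-XYZ9</p>"
)
print(find_gtm_ids(sample_html))  # ['GTM-ABC123', 'GTM-XYZ9']
```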

Creating URLs in a loop

Question: I am trying to create a list of URLs using a for loop. It prints all the correct URLs, but is not saving them in a list. Ultimately I want to download multiple files using urlretrieve. for i, j in zip(range(0, 17), range(1, 18)): if i < 8 or …

Total answers: 2
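
Printing inside a loop does not store anything; each URL has to be appended to a list created before the loop. A minimal sketch of that pattern, where the URL template and the helper name `build_urls` are hypothetical because the question's loop body is truncated:

```python
def build_urls(template, start=0, stop=17):
    """Collect one URL per (i, j) pair, where j is always i + 1."""
    urls = []
    for i, j in zip(range(start, stop), range(start + 1, stop + 1)):
        urls.append(template.format(i=i, j=j))
    return urls


# Hypothetical template; substitute the real URL pattern.
urls = build_urls("https://example.com/data_{i:02d}_{j:02d}.txt")
print(len(urls))  # 17
```

Each entry in `urls` can then be passed to `urllib.request.urlretrieve` in a second loop.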

Error occurred when getting the data file through URL using python

Question: I tried to load data from a URL: url = 'http://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv' chipo = pd.read_csv(url, sep = '\t') and there is an error: URLError: <urlopen error [Errno 11004] getaddrinfo failed> I've checked this answer, but it does not help. I've also tried fetching data …

Total answers: 1