How to send requests to a downloaded html file

Question:

I have a .html file downloaded and want to send a request to this file to grab it’s content.

However, if I do the following:

import requests
html_file  = "/user/some_html.html"
r = requests.get(html_file)

Gives the following error:

Invalid URL ‘some_html.html’: No schema supplied.

If I add a schema I get the following error:

HTTPConnectionPool(host=’some_html.html’, port=80): Max retries exceeded with url:

I want to know how to specifically send a request to a html file when it’s downloaded.

Asked By: tesla john

||

Answers:

You are accessing html file from local directory. get() method uses HTTPConnection and port 80 to access data from website not a local directory. To access file from local directory using get() method use Xampp or Wampp.
for accessing file from local directory you can use open() while requests.get() is for accessing file from Port 80 using http Connection in simple word from internet not local directory

import requests
html_file  = "/user/some_html.html"
t=open(html_file, "r")
for v in t.readlines():
  print(v)

Output:
enter image description here

Answered By: M. Twarog

You don’t "send a request to a html file". Instead, you can send a request to a HTTP server on the internet which will return a response with the contents of a html file.

The file itself knows nothing about "requests". If you have the file stored locally and want to do something with it, then you can open it just like any other file.

If you are interested in learning more about the request and response model, I suggest you try a something like

response = requests.get("http://stackoverflow.com")

You should also read about HTTP and requests and responses to better understand how this works.

Answered By: Code-Apprentice

You can do it by setting up a local server to your html file.
If you use Visual Studio Code, you can install Live Server by Ritwick Dey.

Then you do as follows:

1 – Make the first request and save the html content into a .html file:

my_req.py

import requests

file_path = './'
file_name = 'my_file'

url = "https://www.qwant.com/"

response = requests.request("GET", url)

w = open(file_path + file_name + '.html', 'w')
w.write(response.text)

2 – With Live Server installed on Visual Studio Code, click on my_file.html and then click on Go Live.

Go Live

and

3 – Now you can make a request to your local http schema:

second request

import requests

url = "http://127.0.0.1:5500/my_file.html"

response = requests.request("GET", url)

print(response.text)

And, tcharan!! do what you need to do.

On a crawler work, I had one situation where there was a difference between the content displayed on the website and the content retrieved with the response.text so the xpaths did not were the same as on the website, so I needed to download the content, making a local html file, and get the new ones xpaths to get the info that I needed.

Answered By: Rogerio Kurek

You can try this:

from requests_html import HTML
with open("htmlfile.html") as htmlfile:
    sourcecode = htmlfile.read()
    parsedHtml = HTML(html=sourcecode)
print(parsedHtml)
Answered By: memr404
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.