Python – Send HTTP GET string – Receive 301 Moved Permanently – What's next?

Question:

I’m trying to use Python 2 to send my own HTTP GET message to a web server, retrieve html text, and write it to an html file (no urllib, urllib2, httplib, requests, etc. allowed).

import socket 
tcpSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpSocket.connect(('python.org', 80))

http_get = """GET / HTTP/1.1\r
Host: www.python.org/\r
Connection: keep-alive\r
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r
Upgrade-Insecure-Requests: 1\r
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36\r
Accept-Encoding: gzip, deflate, sdch\r
Accept-Language: en-US,en;q=0.8\r\n\r\n"""

tcpSocket.send(http_get)
m = tcpSocket.recv(4096)
tcpSocket.close()

print m

Output:

HTTP/1.1 301 Moved Permanently
Location: https://www.python.org//
Connection: Keep-Alive
Content-length: 0

Why does it return 301 when the location is apparently still the same? What message should I send next, and to where, to get the HTML content?

Thank you very much!

Asked By: Myath

Answers:

Your problem is that the URL you are requesting isn’t served over http://; the server redirects to https://, so the Location header differs from your request in its scheme. To show that your code fundamentally works with a proper target, I have changed your GET request to

import socket
tcpSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpSocket.connect(('www.cnn.com', 80))

http_get = """GET / HTTP/1.1\r
Host: www.cnn.com/\r
Connection: keep-alive\r
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r
Upgrade-Insecure-Requests: 1\r
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36\r
Accept-Encoding: gzip, deflate, sdch\r
Accept-Language: en-US,en;q=0.8\r\n\r\n"""

http_get_minimum = """GET / HTTP/1.1\r\nHost: www.cnn.com\r\nConnection: close\r\n\r\n"""

tcpSocket.send(http_get_minimum)
m = tcpSocket.recv(4096)
tcpSocket.close()

and received

HTTP/1.1 200 OK
x-servedByHost: prd-10-60-168-42.nodes.56m.dmtio.net
Cache-Control: max-age=60
X-XSS-Protection: 1; mode=block
Content-Security-Policy: default-src 'self' http://.cnn.com: https://.cnn.com: .cnn.net: .turner.com: .ugdturner.com: .vgtf.net:; script-src 'unsafe-inline' 'unsafe-eval' 'self' *; style-src 'unsafe-inline' 'self' *; frame-src 'self' *; object-src 'self' *; img-src 'self' * data:; media-src 'self' *; font-src 'self' *; connect-src 'self' *;
Content-Type: text/html; charset=utf-8
Via: 1.1 varnish
Content-Length: 74864
Accept-Ranges: bytes
Date: Mon, 05 Oct 2015 00:39:54 GMT
Via: 1.1 varnish
Age: 170
Connection: close
X-Served-By: cache-iad2144-IAD, cache-sjc3129-SJC
X-Cache: HIT, HIT
X-Cache-Hits: 2, 95
X-Timer: S1444005594.675567,VS0,VE0
Vary: Accept-Encoding
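
Note that a single recv(4096) returns at most 4096 bytes, while the response above advertises Content-Length: 74864, so m holds only the beginning of the page. A minimal sketch of reading the rest, assuming Python 3 and a request sent with Connection: close so that the server ends the response by closing the socket (the helper names recv_until_closed and split_response are made up for illustration):

```python
import socket

def recv_until_closed(sock, bufsize=4096):
    """Collect bytes until the peer closes the connection (recv returns b'')."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # an empty result means the peer closed its side
            break
        chunks.append(chunk)
    return b"".join(chunks)

def split_response(raw):
    """Split a raw HTTP response into (head, body) at the first blank line."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body
```

With the code above, this would replace the single recv call: raw = recv_until_closed(tcpSocket), then head, body = split_response(raw), and body is what gets written to the html file. With Connection: keep-alive the server does not close the socket, so this loop would block; in that case the Content-Length header has to be used to know when to stop.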

UPDATE: Yes, requesting over HTTPS needs more than what you have presented. The first difference is the default port: 80 for http and 443 for https. HTTPS carries ordinary HTTP messages through an encrypted channel so that, in theory, no party other than the client and the end server can read them. The encryption layer is Transport Layer Security (TLS), the successor to the now-deprecated Secure Sockets Layer (SSL); both encrypt the data records being exchanged.
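
To decide where the next request should go, the 301 response itself can be parsed. The following is a rough sketch, assuming Python 3 and an absolute URL in the Location header without an explicit port; parse_head, redirect_target and DEFAULT_PORTS are illustrative names, not part of any library, and no urllib is used:

```python
DEFAULT_PORTS = {"http": 80, "https": 443}

def parse_head(raw):
    """Return (status_code, headers) from the head of a raw HTTP response."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    # The status line looks like "HTTP/1.1 301 Moved Permanently".
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers

def redirect_target(location):
    """Split an absolute Location URL into (scheme, host, port, path).

    Assumption: no explicit port and no relative URLs.
    """
    scheme, _, rest = location.partition("://")
    host, slash, path = rest.partition("/")
    full_path = slash + path if slash else "/"
    return scheme, host, DEFAULT_PORTS[scheme], full_path
```

For the response in the question, parse_head gives status 301 and redirect_target reports scheme https and port 443, which is the signal that the follow-up request needs a TLS connection rather than another plain socket on port 80.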

When opening an https connection, the client starts the handshake by offering a list of the protocol versions and cipher suites it supports. In response, the server selects one and sends its certificate so the client can authenticate it; a client certificate is optional and rarely requested on the public web. Once both sides have derived the same session keys, they exchange the encrypted HTTP messages until the connection is closed. To host https connections, a server must have a public key certificate, which binds a public key to the verified identity of the key’s owner. Most certificates are signed by a third party, a certificate authority, so that clients can trust that the key really belongs to the server.
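
In Python, the standard ssl module performs this handshake when a socket is wrapped. Here is a small sketch, assuming Python 3; client_context, open_tls and describe are names invented for this example:

```python
import socket
import ssl

def client_context():
    """Default client settings: system CA certificates, verification required."""
    return ssl.create_default_context()

def open_tls(host, port=443, timeout=10):
    """Connect to host:port and return a TLS-wrapped socket."""
    raw_sock = socket.create_connection((host, port), timeout=timeout)
    try:
        # server_hostname enables SNI and checks the certificate against host.
        return client_context().wrap_socket(raw_sock, server_hostname=host)
    except Exception:
        raw_sock.close()
        raise

def describe(tls_sock):
    """Summarize what the handshake negotiated."""
    return {
        "protocol": tls_sock.version(),    # negotiated protocol version
        "cipher": tls_sock.cipher()[0],    # name of the agreed cipher suite
        "subject": tls_sock.getpeercert().get("subject"),
    }
```

Calling describe(open_tls("www.python.org")) should show the protocol version and cipher suite the server selected, plus the subject of the certificate it presented. The returned socket is then used with send and recv exactly like the plain one, with the HTTP text unchanged.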

Answered By: Shawn Mehan

I had the same problem and changing port from 80 to 443 solved it.

Answered By: ahmadkarimi12

I know this is a very old question, but maybe someone else will have the same problem.
So I knew that the problem was requesting on port 80 instead of port 443, but just switching the port didn’t do the trick. The response I was getting was: b'\x15\x03\x03\x00\x02\x022\x15\x03\x03\x00\x02\x01\x00'. If you decode that as text, it’s just gibberish. What it means is that the server expected a TLS handshake on that port and answered my plaintext request with TLS alert records. After wrapping the socket with TLS, I got the response I wanted.

import socket
import ssl

from fake_useragent import UserAgent  # third-party package

h = "www.google.com"
ua = UserAgent().chrome
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # TCP

target = socket.gethostbyname(h)
t_port = 443
sock.connect((target, t_port))

# Wrap the TCP socket in TLS; server_hostname is used for SNI and certificate checks
context = ssl.create_default_context()
sock = context.wrap_socket(sock, server_hostname=h)

request = f"GET / HTTP/1.1\r\nHost: {h}\r\nUser-Agent: {ua}\r\nConnection: keep-alive\r\n\r\n"
sock.sendall(request.encode())
ret = sock.recv(4096)
# The first 4096 bytes may end mid-character, so decode leniently
print('[+]' + ret.decode(errors="replace"))
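
For the curious, the "gibberish" reply to the plaintext request, b'\x15\x03\x03\x00\x02\x022\x15\x03\x03\x00\x02\x01\x00', can be decoded by hand. Each TLS record starts with a 5-byte header (content type, two version bytes, two length bytes), and an alert payload is a level byte plus a description byte. A rough sketch, with lookup tables that cover only the values needed here (the function names are made up for this example):

```python
import struct

CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert",
                 22: "handshake", 23: "application_data"}
ALERT_LEVELS = {1: "warning", 2: "fatal"}
# Only the two descriptions seen in this reply; the TLS RFCs list the rest.
ALERT_DESCRIPTIONS = {0: "close_notify", 50: "decode_error"}

def parse_tls_records(data):
    """Yield (content_type, version, payload) for each TLS record in data."""
    offset = 0
    while offset + 5 <= len(data):
        ctype, major, minor, length = struct.unpack_from("!BBBH", data, offset)
        payload = data[offset + 5:offset + 5 + length]
        yield CONTENT_TYPES.get(ctype, ctype), (major, minor), payload
        offset += 5 + length

def describe_alerts(data):
    """Return (level, description) for every alert record found in data."""
    alerts = []
    for ctype, _version, payload in parse_tls_records(data):
        if ctype == "alert" and len(payload) == 2:
            level, desc = payload[0], payload[1]
            alerts.append((ALERT_LEVELS.get(level, level),
                           ALERT_DESCRIPTIONS.get(desc, desc)))
    return alerts
```

Applied to the bytes above, describe_alerts returns a fatal decode_error followed by a close_notify warning: the server could not parse the HTTP text as a TLS handshake message and then shut the connection down.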

Answered By: Jan Zajc