How to parse raw HTTP request in Python 3?

Question

I am looking for a native way to parse an http request in Python 3.

This question shows a way to do it in Python 2, but uses now deprecated modules, (and Python 2) and I am looking for a way to do it in Python 3.

I would mainly like to just figure out what resource is requested and parse the headers and from a simple request. (i.e):

GET /index.html HTTP/1.1
Host: localhost
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8

Can someone show me a basic way to parse this request?

Asked By: Startec

||

Source

Answer 1

Each one of those field names should be delimited by carriage return then newline, and then the field name and value are delimited by a colon. So assuming you already have the response as a string, it should be as easy as:

fields = resp.split("rn")
fields = fields[1:] #ignore the GET / HTTP/1.1
output = {}
for field in fields:
    key,value = field.split(':', 1)#split each line by http field name and value
    output[key] = value

Update 4/13

Using the example http resp in the linked to post:

resp = 'GET /search?sourceid=chrome&ie=UTF-8&q=ergterst HTTP/1.1rnHost: www.google.comrnConnection: keep-alivernA
ccept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5rnUser-Agent: Mozill
a/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.45 Safari/534.
13rnAccept-Encoding: gzip,deflate,sdchrnAvail-Dictionary: GeNLY2f-rnAccept-Language: en-US,en;q=0.8rn'


fields = resp.split("rn")
fields = fields[1:] #ignore the GET / HTTP/1.1
output = {}
for field in fields:
    if not field:
        continue
    key,value = field.split(':', 1)
    output[key] = value    
print(output)

An additional check to make sure field is not empty is needed. OUtput:

{'Host': ' www.google.com', 'Connection': ' keep-alive', 'Accept': ' application/xml,application/xhtml+xml,text/html;q=
0.9,text/plain;q=0.8,image/png,*/*;q=0.5', 'User-Agent': ' Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) App
leWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.45 Safari/534.13', 'Accept-Encoding': ' gzip,deflate,sdch', 'Avail-D
ictionary': ' GeNLY2f-', 'Accept-Language': ' en-US,en;q=0.8'}

Answered By: Liam Kelly

Answer 2

You could use the email.message.Message class from the email module in the standard library.

By modifying the answer from the question you linked, below is a Python3 example of parsing HTTP headers.

Suppose you wanted to create a dictionary containing all of your header fields:

import email
import pprint

request_string = 'GET / HTTP/1.1rnHost: localhostrnConnection: keep-alivernCache-Control: max-age=0rnUpgrade-Insecure-Requests: 1rnUser-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36rnAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8rnAccept-Encoding: gzip, deflate, sdchrnAccept-Language: en-US,en;q=0.8'

# pop the first line so we only process headers
_, headers = request_string.split('rn', 1)

# construct a message from the request string. note: the return is already a dict-like object.
message = email.message_from_string(headers)

# construct a dictionary containing the headers
headers = dict(message.items())

# pretty-print the dictionary of headers
pprint.pprint(headers, width=160)

if you ran this at a python prompt, the result would look like:

{'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
 'Accept-Encoding': 'gzip, deflate, sdch',
 'Accept-Language': 'en-US,en;q=0.8',
 'Cache-Control': 'max-age=0',
 'Connection': 'keep-alive',
 'Host': 'localhost',
 'Upgrade-Insecure-Requests': '1',
 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}

Answered By: Corey Goldberg

Answer 3

Here are some Python packages aimed at proper HTTP protocol parsing:

https://dpkt.readthedocs.io/en/latest/api/api_auto.html#module-dpkt.http
https://h11.readthedocs.io/en/latest/
https://github.com/benoitc/http-parser/ (C backend)
https://github.com/MagicStack/httptools (based on NodeJS’s C backend)
https://github.com/silentsignal/netlib-offline (shameless plug)

Answered By: buherator

How to parse raw HTTP request in Python 3?

Question:

Answers: