How to get the raw content of a response in requests with Python?
Question:
I am trying to get the raw data of the HTTP response content in requests in Python. I am interested in forwarding the response through another channel, which means that ideally the content should be as pristine as possible.
What would be a good way to do this?
Answers:
If you are using a requests.get call to obtain your HTTP response, you can use the raw attribute of the response. Here is the code from the requests docs. The stream=True parameter in the requests.get call is required for this to work.
>>> r = requests.get('https://github.com/timeline.json', stream=True)
>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
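Since r.raw is a file-like object, forwarding it can be sketched as a plain read loop. The following is a minimal sketch, not code from the requests docs: forward_raw is a hypothetical helper name, and an io.BytesIO object stands in for r.raw so the example runs without a network connection.

```python
import io


def forward_raw(raw, sink, chunk_size=8192):
    """Copy every byte from a file-like `raw` into `sink`, unchanged.

    `raw` only needs a read(n) method, which is what r.raw offers
    when the request was made with stream=True.
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = raw.read(chunk_size)
        if not chunk:  # empty bytes means the stream is exhausted
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# Stand-in for r.raw (assumption: any object with read(n) behaves the same).
fake_raw = io.BytesIO(b"\x1f\x8b\x08\x00" + b"payload")
sink = io.BytesIO()
copied = forward_raw(fake_raw, sink, chunk_size=4)
```

With a real response, raw would be r.raw and sink could be an open file or a socket wrapper; the loop itself would stay the same.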
After requests.get(), you can use r.content to get the response body as a bytes object.
r = requests.get('https://yourweb.com', stream=True)
r.content
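One caveat if the goal is pristine bytes: requests automatically decodes gzip and deflate encodings when it fills r.content, so r.content may not match what was sent on the wire. The sketch below does not call requests at all. It is a standard-library simulation, assuming a server that gzip-compresses the body, meant only to show the difference between the two views of the data; all variable names are made up for illustration.

```python
import gzip

# Simulated server side: the body is compressed before it goes on the wire.
body = b'{"status": "ok"}'
wire_bytes = gzip.compress(body)

# What an undecoded read would be expected to look like: still compressed,
# starting with the gzip magic number 0x1f 0x8b.
looks_gzipped = wire_bytes[:2] == b"\x1f\x8b"

# What r.content would be expected to hold in this scenario: the decoded body.
decoded = gzip.decompress(wire_bytes)
```

If the receiving end expects the original compressed bytes together with the original Content-Encoding header, the undecoded stream is probably the better thing to forward; if it expects the plain body, r.content is likely enough.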
To add to @brien's answer, as stated in the docs:
In general, however, you should use a pattern like this to save what is being streamed to a file:
r = requests.get('https://yourweb.com', stream=True)
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)
Using Response.iter_content will handle a lot of what you would otherwise have to handle when using Response.raw directly. When streaming a download, the above is the preferred and recommended way to retrieve the content. Note that chunk_size can be freely adjusted to a number that may better fit your use cases.
That pattern not only has the advantages described above, but is also a good way to fetch data in environments with limited memory, since only one chunk is held in memory at a time.
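The iter_content pattern can be wrapped in a small reusable function. This is a minimal sketch under stated assumptions: save_stream and FakeResponse are hypothetical names introduced here, and FakeResponse only imitates the iter_content(chunk_size=...) method of a real response so the example can run offline.

```python
import os
import tempfile


def save_stream(response, path, chunk_size=8192):
    """Write a streamed response body to `path` chunk by chunk.

    `response` is expected to behave like a requests.Response obtained
    with stream=True, i.e. to offer iter_content(chunk_size=...).
    Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as fd:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                fd.write(chunk)
                written += len(chunk)
    return written


class FakeResponse:
    """Hypothetical stand-in for requests.Response, for offline use only."""

    def __init__(self, data):
        self._data = data

    def iter_content(self, chunk_size=1):
        # Yield the data in slices of at most chunk_size bytes.
        for start in range(0, len(self._data), chunk_size):
            yield self._data[start:start + chunk_size]


# Round-trip 768 bytes through a temporary file in 100-byte chunks.
data = bytes(range(256)) * 3
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.bin")
    written = save_stream(FakeResponse(data), target, chunk_size=100)
    with open(target, "rb") as fh:
        round_trip = fh.read()
```

With a real download, the first argument would be the object returned by requests.get(url, stream=True). Memory use stays bounded by chunk_size regardless of how large the body is.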