Why am I only getting an empty file when I try to scrape an XML file from a site with Requests in Python?

Question:

I’m trying to use Python to download XML files from this site:

https://media.waec.wa.gov.au/

But both of the following examples leave me with just an empty XML file. The first avoids the "InsecureRequestWarning" message, but the outcome of both is the same.

r = requests.get('https://media.waec.wa.gov.au/2022%20North%20West%20Central%20By-Election%20-%20LA%20VERBOSE%20RESULTS.xml', verify='~ file path for locally saved site certificate PEM file ~')
r.raw.decode_content = True
with open('~ file path for saved file ~', 'wb') as f:
    shutil.copyfileobj(r.raw, f)

r = requests.get('https://media.waec.wa.gov.au/2022%20North%20West%20Central%20By-Election%20-%20LA%20VERBOSE%20RESULTS.xml', verify=False)
r.raw.decode_content = True
with open('~ file path for saved file ~', 'wb') as f:
    shutil.copyfileobj(r.raw, f)
Asked By: Sergei Walankov


Answers:

You get an empty file because the server never sent you the document. When I tried your snippet I received an HTTP 403 status code. Requests sends a default User-Agent of the form python-requests/<version>, and the site most likely rejects requests that carry it.

The code below sets a custom User-Agent header, which let me save the result to an XML file.

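import requests

# A custom User-Agent; the value itself is arbitrary
headers = {'User-Agent': 'Python User Agent'}
url = 'http://media.waec.wa.gov.au/2022%20North%20West%20Central%20By-Election%20-%20LA%20VERBOSE%20RESULTS.xml'
res = requests.get(url, headers=headers)
res.raise_for_status()  # fail loudly on 403 and other HTTP errors

# Write the raw bytes so the file keeps the encoding the server sent
with open('my_file.xml', 'wb') as file:
    file.write(res.content)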
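
If you prefer to stream the download to disk, as the question's snippets tried to do, the following is a minimal sketch rather than a tested recipe for this particular site. The helper name download_file, its parameters, and the 30-second timeout are assumptions made for illustration. It passes stream=True (without it, Requests has already consumed the body by the time you touch r.raw, so copying from r.raw writes nothing) and checks the status code before creating the file, so a 403 raises an error instead of silently producing an empty file.

```python
def download_file(url, dest_path, session=None, user_agent="Python User Agent"):
    """Stream `url` to `dest_path`, raising on HTTP errors (hypothetical helper)."""
    if session is None:
        # Imported lazily so a stand-in session object can be passed instead.
        import requests
        session = requests.Session()

    headers = {"User-Agent": user_agent}

    # stream=True defers reading the body until we iterate over it.
    with session.get(url, headers=headers, stream=True, timeout=30) as res:
        # Raises for 4xx/5xx responses, before any file is created.
        res.raise_for_status()
        with open(dest_path, "wb") as f:
            for chunk in res.iter_content(chunk_size=8192):
                f.write(chunk)
    return dest_path
```

Calling download_file(url, 'my_file.xml') with the url from the code above should then either write the XML or raise an exception that tells you which status code the server returned.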
Answered By: JacekK