How to resume file download in Python 3.5?

Question:

I am using the Python 3.5 requests module to download a file with the following code. How can I make this code “auto-resume” the download from a partially downloaded file?

response = requests.get(url, stream=True)

total_size = int(response.headers.get('content-length'))  

with open(file_path + file_name, "wb") as file:
    for data in tqdm(iterable = response.iter_content(chunk_size = 1024), total = total_size//1024, unit = 'KB'):
        file.write(data)

I would prefer to use only requests module to achieve this if possible.

Asked By: ibrahimcetin


Answers:

I don’t think requests has this built in—but you can do it manually pretty easily (as long as the server supports it).

The key is Range requests. To fetch part of a resource starting at byte 12345, you add this header:

Range: bytes=12345-

And then you can just append the results onto your file.
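In requests terms, that is just one extra entry in the headers dict. A minimal sketch (the helper name is mine, not part of requests):

```python
def range_header(start_byte):
    # Ask the server for everything from start_byte to the end of the resource.
    return {'Range': 'bytes={}-'.format(start_byte)}

# Usage would look something like:
# response = requests.get(url, headers=range_header(12345), stream=True)
# ...then append the response body onto the partial file.
```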


Ideally, you should verify that you get back a 206 Partial Content instead of a 200, and that the headers include the range you wanted:

Content-Range: bytes 12345-123455/123456
Content-Length: 111111

(The end index in Content-Range is inclusive, so for a 123456-byte resource the last byte is 123455, and the remaining length is 111111.)
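That validation could be sketched like this (the function name and error messages are mine; it works on anything with status_code and headers attributes, such as a requests Response):

```python
def validate_partial(response, expected_start):
    # A 200 means the server ignored the Range header and sent the whole file.
    if response.status_code != 206:
        raise IOError('server did not honor the Range header '
                      '(got {})'.format(response.status_code))
    # Content-Range looks like "bytes <start>-<end>/<total>".
    content_range = response.headers.get('Content-Range', '')
    if not content_range.startswith('bytes {}-'.format(expected_start)):
        raise IOError('unexpected Content-Range: {!r}'.format(content_range))
```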

You may also want to pre-validate that the server handles ranges at all. You can do this by looking at the headers of your initial response, or by sending a HEAD request and checking for this:

Accept-Ranges: bytes

If the header is missing entirely, has none as its value, or lists values that don’t include bytes, the server doesn’t support resuming.
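Here is one way to sketch that check (both function names are mine; the parsing is split out so it can be tested without a network):

```python
import requests

def accepts_byte_ranges(accept_ranges):
    # A missing header or a value of "none" means no resume support;
    # the header may also carry a comma-separated list of units.
    values = [v.strip().lower() for v in (accept_ranges or '').split(',')]
    return 'bytes' in values

def server_supports_ranges(url):
    head = requests.head(url, allow_redirects=True)
    return accepts_byte_ranges(head.headers.get('Accept-Ranges'))
```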

You may also want to compare the Content-Length against your file size first, to verify that you didn’t already finish the whole download right before getting interrupted.


So, the code would look something like this:

import requests
from tqdm import tqdm

def fetch_or_resume(url, filename):
    with open(filename, 'ab') as f:
        headers = {}
        pos = f.tell()  # in append mode, this is the size of what we already have
        if pos:
            headers['Range'] = 'bytes={}-'.format(pos)  # str.format, since 3.5 has no f-strings
        response = requests.get(url, headers=headers, stream=True)
        if pos:
            validate_as_you_want(pos, response)  # e.g., check for 206 and the Content-Range
        total_size = int(response.headers.get('content-length', 0))
        for data in tqdm(iterable=response.iter_content(chunk_size=1024),
                         total=total_size // 1024, unit='KB'):
            f.write(data)

One common bug in download-manager-type software is trying to keep track of how much has been read in previous requests. Don’t do that; just use the file itself to tell you how much you have. After all, if you read 23456 bytes but only flushed 12345 to the file, that 12345 is where you want to start.

Answered By: abarnert