PycURL Attachments and Progress Functions

Question:

I'm working on a small project utilising an API: you send it a request and it gives you back a response with a zip file attached, which you then download. My first pass at automating this download used setopt(curl.WRITEDATA, fp), but this crashed my Python script every time I tried. I then changed tack and used WRITEFUNCTION instead to write the data to a buffer and then write that out to a file, which works consistently.

This was all fine, but then I wanted to add a progress bar to show how much of the file had been downloaded and give some user feedback. This is where things started to get strange: the progress bar now gets to 100% within a second, before the zip file has finished downloading. When I altered my progress function to just print the size of the file it was downloading, it reported a number on the order of a few hundred bytes (much smaller than the zip file). Is there any way to use the functions in pycurl (and curl underneath) to track the progress of the attachment download as opposed to the request itself?

Also, if anyone could help with the WRITEDATA problem, that would be useful as well. I guess the two problems might be connected.

Asked By: Jonathan Rainer


Answers:

The following code will download a file using pycurl and display the current progress (as text):

import pycurl
# for displaying the output text
from sys import stderr as STREAM

# replace with your own url and path variables
url = "http://speedtest.tele2.net/100MB.zip"
path = 'test_file.dat'

# use kiB's
kb = 1024

# callback function for c.XFERINFOFUNCTION
def status(download_t, download_d, upload_t, upload_d):
    STREAM.write('Downloading: {}/{} kiB ({}%)\r'.format(
        str(int(download_d/kb)),
        str(int(download_t/kb)),
        str(int(download_d/download_t*100) if download_t > 0 else 0)
    ))
    STREAM.flush()

# download file using pycurl
with open(path, 'wb') as f:
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEDATA, f)
    # display progress
    c.setopt(c.NOPROGRESS, False)
    c.setopt(c.XFERINFOFUNCTION, status)
    c.perform()
    c.close()

# keeps progress on screen after download completes
print()

The output should look something like this:

Downloading: 43563/122070 kiB (35%)

If you want to use an actual progress bar, that can be done, too. But it takes more work.

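The following code uses the tqdm package to generate a progress bar. It updates in real time as the file is downloading and also shows the download speed and the estimated time remaining. tqdm works best when it knows the total size at the moment the bar is created, so the requests package is used to fetch the Content-Length header up front. tqdm's update() also expects an increment rather than a running total, which is why the total_dl_d variable is a one-element list and not an integer: the callback needs a mutable object to remember the previous total.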

import pycurl
# needed to predict total file size
import requests
# progress bar
from tqdm import tqdm

# replace with your own url and path variables
url = 'http://speedtest.tele2.net/100MB.zip'
path = 'test_file.dat'

# ask the server for the file size up front (following redirects, as pycurl does below)
r = requests.head(url, allow_redirects=True)
total_size = int(r.headers.get('content-length', 0))
block_size = 1024

# create a progress bar and update it manually
with tqdm(total=total_size, unit='iB', unit_scale=True) as pbar:
    # store the running total in a list (lists are mutable, so the callback can update it)
    total_dl_d = [0]
    def status(download_t, download_d, upload_t, upload_d, total=total_dl_d):
        # increment the progress bar
        pbar.update(download_d - total[0])
        # update the total dl'd amount
        total[0] = download_d

    # download file using pycurl
    with open(path, 'wb') as f:
        c = pycurl.Curl()
        c.setopt(c.URL, url)
        c.setopt(c.WRITEDATA, f)
        # follow redirects:
        c.setopt(c.FOLLOWLOCATION, True)
        # custom progress bar
        c.setopt(c.NOPROGRESS, False)
        c.setopt(c.XFERINFOFUNCTION, status)
        c.perform()
        c.close()

Explanation of possible causes of the issues described:

(There was no code provided in the question, so I’ll have to guess a little bit about what exactly was causing the mentioned issues…)

Based on the variable name (fp i.e. file_path)…

The file-write (WRITEDATA) issue was likely due to a file path (str) being provided instead of a file object (io.BufferedWriter).
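To illustrate that guess, here is a minimal sketch of a helper that turns either a path or an open file into something with a write() method, which is what WRITEDATA expects. The name as_binary_sink is hypothetical (it is not part of pycurl), and the sketch assumes the original problem really was a str being passed in.

```python
import io
import os


def as_binary_sink(target):
    """Return an object with a write() method, suitable for WRITEDATA.

    Hypothetical helper: accepts either a path or an already-open
    binary file-like object.
    """
    if isinstance(target, (str, os.PathLike)):
        # a path is not writable by itself; open it in binary mode
        return open(target, 'wb')
    if hasattr(target, 'write'):
        # already a file-like object (open file, io.BytesIO, ...)
        return target
    raise TypeError('expected a path or a writable binary object')


# example: an in-memory buffer is passed through unchanged
buffer = io.BytesIO()
sink = as_binary_sink(buffer)
sink.write(b'zip bytes would arrive here')
```

With pycurl, the returned object would then be handed over as c.setopt(c.WRITEDATA, sink), and closed by the caller once c.perform() has returned.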

Based on my own experience…

The XFERINFOFUNCTION callback is called repeatedly during the download. Its parameters are cumulative: the expected total size and the number of bytes downloaded so far. It does not report the delta (difference) since the previous call. The issue described with the progress bar ("the progress bar gets to 100% within a second and the zip file has not completed its download") is likely due to that cumulative amount being passed as the update amount when the progress bar expects an increment. A bar that is incremented by the running total on every call overstates the amount downloaded, so it reaches and then exceeds 100% long before the file is complete.
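The conversion from running totals to increments can be sketched on its own, without pycurl or tqdm. The factory name make_delta_callback and the on_increment parameter are made up for this illustration; the four callback parameters follow the same order used in the code above.

```python
def make_delta_callback(on_increment):
    """Build a progress callback that reports increments, not totals.

    on_increment is any callable taking a byte count, for example
    the update() method of a progress bar.
    """
    last = 0  # cumulative bytes seen on the previous call

    def status(download_t, download_d, upload_t, upload_d):
        nonlocal last
        delta = download_d - last
        last = download_d
        if delta > 0:
            # forward only the bytes received since the last call
            on_increment(delta)

    return status


# simulate a few callback invocations with cumulative byte counts
increments = []
callback = make_delta_callback(increments.append)
for downloaded in (0, 100, 100, 250, 1000):
    callback(1000, downloaded, 0, 0)
```

After the loop, increments holds only the per-call differences and their sum equals the final cumulative value. In the tqdm example this would be wired up as c.setopt(c.XFERINFOFUNCTION, make_delta_callback(pbar.update)).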


Answered By: Elliot G.