PyCurl request hangs infinitely on perform

Question:

I have written a script to fetch scan results from Qualys to be run each week for the purpose of metrics gathering.

The first part of this script involves fetching a list of references for each of the scans that were run in the past week for further processing.

The problem is that, while this will work perfectly sometimes, other times the script will hang on the c.perform() line. This is manageable when running the script manually as it can just be re-run until it works. However, I am looking to run this as a scheduled task each week without any manual interaction.

Is there a foolproof way that I can detect if a hang has occurred and resend the PyCurl request until it works?

I have tried setting the c.TIMEOUT and c.CONNECTTIMEOUT options but these don’t seem to be effective. Also, as no exception is thrown, simply putting it in a try-except block also won’t fly.

The function in question is below:

import datetime as DT
from io import BytesIO

import certifi
import pycurl

# Retrieve a list of all scans conducted in the past week
# Save this to refs_raw.txt
def getScanRefs(usr, pwd):

    print("getting scan references...")

    with open('refs_raw.txt','wb') as refsraw: 
        today = DT.date.today()
        week_ago = today - DT.timedelta(days=7)
        strtoday = str(today)
        strweek_ago = str(week_ago)

        c = pycurl.Curl()

        c.setopt(c.URL, 'https://qualysapi.qualys.eu/api/2.0/fo/scan/?action=list&launched_after_datetime=' + strweek_ago + '&launched_before_datetime=' + strtoday)
        c.setopt(c.HTTPHEADER, ['X-Requested-With: pycurl', 'Content-Type: text/xml'])
        c.setopt(c.USERPWD, usr + ':' + pwd)
        c.setopt(c.POST, 1)
        c.setopt(c.PROXY, 'companyproxy.net:8080')
        c.setopt(c.CAINFO, certifi.where())
        c.setopt(c.SSL_VERIFYPEER, 0)
        c.setopt(c.SSL_VERIFYHOST, 0)
        c.setopt(c.CONNECTTIMEOUT, 3)
        c.setopt(c.TIMEOUT, 3)

        refsbuffer = BytesIO()
        c.setopt(c.WRITEDATA, refsbuffer)
        c.perform()

        body = refsbuffer.getvalue()
        refsraw.write(body)
        c.close()

    print("Got em!")
Asked By: I_GNU_it_all_along


Answers:

I fixed the issue myself by using multiprocessing to launch the API call in a separate process, killing and relaunching it if it runs for longer than 5 seconds. It's not very pretty, but it is cross-platform. For those looking for a solution that is more elegant but only works on *nix, look into the signal library, specifically SIGALRM (a rough sketch of that approach follows the code below).

Code below:

import datetime as DT
import multiprocessing
import time
from io import BytesIO

import certifi
import pycurl

# As this request for scan references sometimes hangs, it is run in a separate process here
# The process is terminated and relaunched if no response is received within 5 seconds
def performRequest(usr, pwd):
    today = DT.date.today()
    week_ago = today - DT.timedelta(days=7)
    strtoday = str(today)
    strweek_ago = str(week_ago)

    c = pycurl.Curl()

    c.setopt(c.URL, 'https://qualysapi.qualys.eu/api/2.0/fo/scan/?action=list&launched_after_datetime=' + strweek_ago + '&launched_before_datetime=' + strtoday)
    c.setopt(c.HTTPHEADER, ['X-Requested-With: pycurl', 'Content-Type: text/xml'])
    c.setopt(c.USERPWD, usr + ':' + pwd)
    c.setopt(c.POST, 1)
    c.setopt(c.PROXY, 'companyproxy.net:8080')
    c.setopt(c.CAINFO, certifi.where())
    c.setopt(c.SSL_VERIFYPEER, 0)
    c.setopt(c.SSL_VERIFYHOST, 0)

    refsBuffer = BytesIO()
    c.setopt(c.WRITEDATA, refsBuffer)
    c.perform()
    c.close()
    body = refsBuffer.getvalue()
    refsraw = open('refs_raw.txt', 'wb')
    refsraw.write(body)
    refsraw.close()

# Retrieve a list of all scans conducted in the past week
# Save this to refs_raw.txt
def getScanRefs(usr, pwd):

    print("Getting scan references...") 

    # Occasionally the request will hang indefinitely. Launch it in a separate process and retry if no response in 5 seconds
    success = False
    while not success:
        sendRequest = multiprocessing.Process(target=performRequest, args=(usr, pwd))
        sendRequest.start()

        for seconds in range(5):
            print("...")
            time.sleep(1)

        if sendRequest.is_alive():
            print("Maximum allocated time reached... Resending request")
            sendRequest.terminate()
            del sendRequest
        else:
            success = True

    print("Got em!")
Answered By: I_GNU_it_all_along

The question is old, but I will add this answer; it might help someone.

The only way to terminate a running transfer after calling perform() is by using callbacks:

1. Using CURLOPT_WRITEFUNCTION
As stated in the docs:

Your callback should return the number of bytes actually taken care of. If that amount differs from the amount passed to your callback function, it’ll signal an error condition to the library. This will cause the transfer to get aborted and the libcurl function used will return CURLE_WRITE_ERROR.

The drawback of this method is that curl calls the write function only when it receives new data from the server, so if the server stops sending data, curl will just keep waiting and will never receive your kill signal.

2. The alternative, and the best option so far, is using a progress callback.

The beauty of the progress callback is that curl calls it about once per second even if no data is coming from the server, which gives you the opportunity to return a non-zero value as a kill switch.

Use the option CURLOPT_XFERINFOFUNCTION; note that it is preferred over CURLOPT_PROGRESSFUNCTION, as quoted in the docs:

We encourage users to use the newer CURLOPT_XFERINFOFUNCTION instead, if you can.

You also need to set the option CURLOPT_NOPROGRESS:

CURLOPT_NOPROGRESS must be set to 0 to make this function actually get called.

This is an example showing both the write and progress function implementations in Python:

# example of using write and progress function to terminate curl
import pycurl

f = open('mynewfile', 'wb')  # file used to save downloaded data (binary mode, since curl passes bytes)
counter = 0

# define callback functions which will be used by curl
def my_write_func(data):
    """write to file"""
    global counter  # counter is assigned below, so it must be declared global here
    f.write(data)
    counter += len(data)

    # an example to terminate curl: tell curl to abort if the downloaded data exceeded 1024 byte by returning -1 or any number 
    # not equal to len(data) 
    if counter >= 1024:
        return -1

def progress(*data):
    """it receives progress from curl and can be used as a kill switch
    Returning a non-zero value from this callback will cause curl to abort the transfer
    """
    d_size, downloaded, u_size, uploaded = data

    # an example to terminate curl: tell curl to abort if the downloaded data exceeded 1024 byte by returning non zero value 
    if downloaded >= 1024:
        return -1


# initialize curl object and options
c = pycurl.Curl()

# callback options
c.setopt(pycurl.WRITEFUNCTION, my_write_func)

c.setopt(pycurl.NOPROGRESS, 0)  # must be 0 so the progress function actually gets called
c.setopt(pycurl.XFERINFOFUNCTION, progress)
# c.setopt(pycurl.PROGRESSFUNCTION, progress)  # also works, but pycurl.XFERINFOFUNCTION is recommended
# put other curl options (URL, proxy, credentials, ...) as required

# executing curl
c.perform()
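
One practical note for the original question: when either callback aborts the transfer, perform() does not fail silently: it raises pycurl.error (E_ABORTED_BY_CALLBACK for the progress callback, E_WRITE_ERROR for the write callback), so the abort can be caught and the request resent. A minimal sketch, assuming a hypothetical build_curl() helper that returns a fully configured pycurl.Curl object:

# Sketch only: retry when a callback aborts a stalled transfer
import pycurl

def fetch_with_retries(build_curl, attempts=3):
    for attempt in range(1, attempts + 1):
        c = build_curl()                      # assumed helper: returns a configured pycurl.Curl
        try:
            c.perform()
            return True                       # transfer finished normally
        except pycurl.error as exc:
            errno = exc.args[0]
            # 42 = E_ABORTED_BY_CALLBACK (progress callback returned non-zero)
            # 23 = E_WRITE_ERROR (write callback did not consume all the data)
            if errno in (pycurl.E_ABORTED_BY_CALLBACK, pycurl.E_WRITE_ERROR):
                print("Transfer aborted by callback, retrying ({}/{})...".format(attempt, attempts))
            else:
                raise                         # some other curl error; do not hide it
        finally:
            c.close()
    return False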
Answered By: Mahmoud Elshahat