Raise Airflow Exception to Fail Task from CURL request

Question:

I am using airflow to schedule and automate Python scripts housed on a Ubuntu server. The DAG triggers a CURL request that hits a Flask API on the same machine which actually runs the script. Here is a high level overview of the flow:

Airflow –> Curl request –> Flask API –> Python Script

DAG Task:

    t2 = BashOperator (
    task_id='extract_pcty_data',
    bash_command=f"""curl -d '{dataset}' -H 'Content-Type: application/json' -X POST {base_url}{endpoint}""",
)

Endpoint Registration:

api.add_resource(paylocity, "/api/v1/application/paylocity")

Resource Object:

class paylocity(Resource):
def __init__(self):
    self.reqparse = reqparse.RequestParser()

def get(self):
    return 200

def post(self):
    try:
        if request.json:
            data = request.json
            query = data['dataset']

        pcty = PaylocityAPI()
        pcty.auth()
        pcty.get_employees()
        pcty.get_paystatements()
        pcty.load_dataset()
        pcty.clean_up()
        return 200


    except Exception as e:
        print(traceback.print_exc(e))
        raise ValueError(e)

The issue I am running into, is that the script will fail for some reason which gets caught by the try/catch block and then raises the value error – but it does not cause the script to fail because the HTTP request response returned is 500 - Internal Server Error. What I am looking for is a simple and elegant way to interpret an HTTP response that is not 200 - OK as a "failure" and raising something like a ValueError or AirflowException to cause the task to fail. Any guidance or support would be greatly appreciated!

Asked By: rroline

||

Answers:

To those of you from Google looking for a simple and elegant answer to this or a similar question. Curl has a few flags that allow you to specify how you want the fail behavior of a request to act. For my specific scenario: --fail was the most appropriate. There is also --fail-with-body that allows you to get the content of the fail response rather than just the non-zero exit code. From their docs:

-f, --fail

(HTTP) Fail fast with no output at all on server errors. This is useful to enable scripts and users to better deal with failed attempts. In normal cases when an HTTP server fails to deliver a document, it returns an HTML document stating so (which often also describes why and more). This flag will prevent curl from outputting that and return error 22.

This method is not fail-safe and there are occasions where non-successful response codes will slip through, especially when authentication is involved (response codes 401 and 407).

Example:

 curl --fail https://example.com
Answered By: rroline