Running Job On Airflow Based On Webrequest
Question:
I wanted to know if airflow tasks can be executed upon getting a request over HTTP. I am not interested in the scheduling part of Airflow. I just want to use it as a substitute for Celery.
So an example operation would be something like this.
- User submits a form requesting a report.
- Backend receives the request and sends the user a notification that the request has been received.
- The backend then schedules a job using Airflow to run immediately.
- Airflow then executes a series of tasks associated with a DAG. For example, pull data from redshift first, pull data from MySQL, make some operations on the two result sets, combine them and then upload the results to Amazon S3, send an email.
From what I have read online, you can run Airflow jobs by executing airflow ... on the command line. I was wondering if there is a Python API which can do the same thing.
Thanks.
Answers:
You should look at Airflow HTTP Sensor for your needs. You can use this to trigger a dag.
The Airflow REST API Plugin would help you out here. Once you have followed the instructions for installing the plugin, you just need to hit the following URL:
    http://{HOST}:{PORT}/admin/rest_api/api/v1.0/trigger_dag?dag_id={dag_id}&run_id={run_id}&conf={url_encoded_json_parameters}
Replace dag_id with the id of your DAG, either omit run_id or specify a unique id, and pass URL-encoded JSON for conf (with any of the parameters you need in the triggered DAG).
Here is an example JavaScript function that uses jQuery to call the Airflow API:
    // dagParameters is expected to be a JSON string, e.g. from JSON.stringify(...)
    function triggerDag(dagId, dagParameters) {
        var urlEncodedParameters = encodeURIComponent(dagParameters);
        var dagRunUrl = "http://airflow:8080/admin/rest_api/api/v1.0/trigger_dag?dag_id=" +
            encodeURIComponent(dagId) + "&conf=" + urlEncodedParameters;
        $.ajax({
            url: dagRunUrl,
            dataType: "json",
            success: function (msg) {
                console.log('Successfully started the dag');
            },
            error: function (e) {
                console.log('Failed to start the dag');
            }
        });
    }
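Since the question asks about Python, the same URL can be put together with the standard library. This is only a rough sketch: the helper name build_trigger_url, the DAG id "process_data" and the "report" parameter are made up for illustration, and the path is simply the plugin URL quoted above.

```python
import json
from urllib.parse import urlencode


def build_trigger_url(host, port, dag_id, conf=None, run_id=None):
    """Build the plugin's trigger_dag URL (hypothetical helper, placeholder values)."""
    params = {"dag_id": dag_id}
    if run_id is not None:
        params["run_id"] = run_id
    if conf is not None:
        # Serialize the parameters to JSON; urlencode percent-encodes the result.
        params["conf"] = json.dumps(conf)
    query = urlencode(params)
    return "http://{}:{}/admin/rest_api/api/v1.0/trigger_dag?{}".format(host, port, query)


url = build_trigger_url("airflow", 8080, "process_data", conf={"report": "daily"})
print(url)
```

Fetching that URL (for example with urllib.request.urlopen) against a server that has the plugin installed should then start the run.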
A newer option is Airflow's experimental, but built-in, API endpoint, available in the more recent builds of 1.7 and 1.8. It lets you run a REST service on your Airflow server that listens on a port and accepts the same kind of jobs as the CLI.
I only have limited experience myself, but I have run test dags with success. Per the docs:
/api/experimental/dags/<DAG_ID>/dag_runs
creates a dag_run for a given dag id (POST).
That will schedule an immediate run of whatever dag you want to run. It does still use the scheduler, though, waiting for a heartbeat to see that dag is running and pass tasks to the worker. This is exactly the same behavior as the CLI, though, so I still believe it fits your use-case.
Documentation on how to configure it is available here: https://airflow.apache.org/api.html
There are some simple example clients in the GitHub repository, too, under airflow/api/client
Airflow’s experimental REST API can be used for this purpose.
The following request will trigger a DAG:
    curl -X POST \
      http://<HOST>:8080/api/experimental/dags/process_data/dag_runs \
      -H 'Cache-Control: no-cache' \
      -H 'Content-Type: application/json' \
      -d '{"conf":"{\"START_DATE\":\"2018-06-01 03:00:00\", \"STOP_DATE\":\"2018-06-01 23:00:00\"}"}'
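The quoting in that request body is easy to get wrong by hand, because "conf" is sent as a JSON string nested inside the JSON payload, so its inner quotes have to be escaped. A small sketch of letting Python produce the body instead (the variable names are only illustrative, and whether conf must be a string or may be a plain object can depend on the Airflow version):

```python
import json

conf = {"START_DATE": "2018-06-01 03:00:00", "STOP_DATE": "2018-06-01 23:00:00"}

# The inner dumps turns the parameters into a string;
# the outer dumps escapes that string's quotes inside the payload.
body = json.dumps({"conf": json.dumps(conf)})
print(body)
```

The printed text can be passed to curl with -d, or sent directly from Python as the POST body.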
The following request retrieves a list of DAG runs for a specific DAG ID:
    curl -i -H "Accept: application/json" -H "Content-Type: application/json" -X GET http://<HOST>:8080/api/experimental/dags/process_data/dag_runs
For the GET API to work, set the rbac flag to True in airflow.cfg.
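UPDATE: a stable Airflow REST API has been released:
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html
Almost everything stays the same, except that the API URL has changed.
Also, "conf" is now required to be an object, so I added an extra layer of wrapping:
    import json
    from urllib.parse import urljoin

    # Method of a client class that provides self._api_base_url and self._request.
    def trigger_dag_v2(self, dag_id, run_id=None, conf=None, execution_date=None):
        endpoint = '/api/v1/dags/{}/dagRuns'.format(dag_id)
        url = urljoin(self._api_base_url, endpoint)
        data = self._request(url, method='POST',
                             json={
                                 # The stable API names this field "dag_run_id".
                                 "dag_run_id": run_id,
                                 # Wrap the serialized parameters in an object.
                                 "conf": {'conf': json.dumps(conf)},
                                 "execution_date": execution_date,
                             })
        # The stable API replies with the DAG run object, not a 'message' field.
        return data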
OLD ANSWER:
Airflow has a REST API (currently experimental), available here:
https://airflow.apache.org/api.html#endpoints
If you do not want to install plugins as suggested in other answers, here is how you can do it directly with the API:
    def trigger_dag(self, dag_id, run_id=None, conf=None, execution_date=None):
        endpoint = '/api/experimental/dags/{}/dag_runs'.format(dag_id)
        url = urljoin(self._api_base_url, endpoint)
        data = self._request(url, method='POST',
                             json={
                                 "run_id": run_id,
                                 "conf": conf,
                                 "execution_date": execution_date,
                             })
        return data['message']
More examples of working with the Airflow API in Python are available here:
https://github.com/apache/airflow/blob/master/airflow/api/client/json_client.py
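The trigger_dag methods above are written as methods of a class that supplies self._api_base_url and self._request. For anyone who wants something runnable without pulling in that client, here is a rough stand-in built only on the standard library. The class name MiniAirflowClient, the payload argument and the injectable opener are my own assumptions, not part of Airflow, and the fake opener with its canned reply exists only so the sketch runs without a server.

```python
import io
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class MiniAirflowClient:
    """Hypothetical host class for methods like trigger_dag above."""

    def __init__(self, api_base_url, opener=urlopen):
        self._api_base_url = api_base_url
        self._opener = opener  # injectable so the sketch can run offline

    def _request(self, url, method="GET", payload=None):
        # Send an optional JSON payload and decode the JSON response.
        data = None if payload is None else json.dumps(payload).encode("utf-8")
        req = Request(url, data=data, method=method,
                      headers={"Content-Type": "application/json"})
        with self._opener(req) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def trigger_dag(self, dag_id, conf=None):
        endpoint = "/api/experimental/dags/{}/dag_runs".format(dag_id)
        url = urljoin(self._api_base_url, endpoint)
        return self._request(url, method="POST", payload={"conf": conf})


sent = []


def fake_opener(req):
    # Stand-in for a live webserver; the reply below is made up for the demo.
    sent.append(req)
    return io.BytesIO(b'{"message": "fake reply"}')


client = MiniAirflowClient("http://localhost:8080", opener=fake_opener)
reply = client.trigger_dag("process_data", conf={"key": "value"})
print(sent[0].get_method(), sent[0].full_url, reply)
```

Dropping the opener argument makes the client use urlopen, so the same call would go to a real webserver at the given base URL.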
I found this post while trying to do the same. After further investigation, I switched to Argo Events. It is basically the same, but it is built around event-driven flows, so it is much more suitable for this use case.
Link:
https://argoproj.github.io/argo
Airflow now supports a stable REST API. Using it, you can trigger a DAG as follows:
    curl --location --request POST 'localhost:8080/api/v1/dags/unpublished/dagRuns' \
      --header 'Content-Type: application/json' \
      --header 'Authorization: Basic YWRtaW46YWRtaW4=' \
      --data-raw '{
        "dag_run_id": "dag_run_1",
        "conf": {
          "key": "value"
        }
      }'
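For a Python backend, roughly the same request can be prepared with the standard library. Treat this as a sketch under a few assumptions: build_stable_trigger is a made-up helper name, basic authentication is assumed to be enabled for the API, and the admin/admin credentials are just what the Authorization header in the curl example decodes to.

```python
import base64
import json
from urllib.request import Request


def build_stable_trigger(base_url, dag_id, dag_run_id, conf, user, password):
    """Prepare (without sending) a stable-API request that creates a DAG run."""
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    body = json.dumps({"dag_run_id": dag_run_id, "conf": conf}).encode("utf-8")
    return Request(
        "{}/api/v1/dags/{}/dagRuns".format(base_url.rstrip("/"), dag_id),
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + token},
    )


req = build_stable_trigger("http://localhost:8080", "unpublished", "dag_run_1",
                           {"key": "value"}, "admin", "admin")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing req to urllib.request.urlopen sends it. Keeping the build step separate makes it easy to inspect the URL, headers and body before anything goes over the wire.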