Airflow: How to SSH and run BashOperator from a different server

Question:

Is there a way to SSH to a different server and run a BashOperator using Airbnb’s Airflow?
I am trying to run a Hive SQL command with Airflow, but I need to SSH to a different box in order to run the Hive shell.
My tasks should look like this:

  1. SSH to server1
  2. start Hive shell
  3. run Hive command

Thanks!

Asked By: CMPE


Answers:

Note: this approach is NOT available in Airflow 2.x; see the last answer below for an Airflow 2 example.

I think that I just figured it out:

  1. Create an SSH connection in the UI under Admin > Connections. Note: the connection will be deleted if you reset the database

  2. In the Python file add the following

     from airflow.contrib.hooks import SSHHook
     sshHook = SSHHook(conn_id=<YOUR CONNECTION ID FROM THE UI>)
    
  3. Add the SSH operator task (a complete end-to-end sketch follows after this list)

     from airflow.contrib.operators.ssh_execute_operator import SSHExecuteOperator

     t1 = SSHExecuteOperator(
         task_id="task1",
         bash_command=<YOUR COMMAND>,
         ssh_hook=sshHook,
         dag=dag)
    

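Putting the three steps together, a minimal end-to-end sketch for the original Hive use case could look like the following. It follows the pre-1.10 contrib API shown above; the connection ID my_ssh_server and the SHOW TABLES query are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.contrib.hooks import SSHHook
from airflow.contrib.operators.ssh_execute_operator import SSHExecuteOperator

# "my_ssh_server" is a placeholder for the connection ID created in the UI
sshHook = SSHHook(conn_id="my_ssh_server")

dag = DAG(
    dag_id="run_hive_over_ssh",
    start_date=datetime(2016, 1, 1),
    schedule_interval=None)

# SSH to the remote box and run the Hive CLI there; the query is only an example
run_hive = SSHExecuteOperator(
    task_id="run_hive_query",
    bash_command='hive -e "SHOW TABLES;"',
    ssh_hook=sshHook,
    dag=dag)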
Thanks!

Answered By: CMPE

One thing to note with the answer above is that, for the SSHOperator object, the argument is actually ssh_conn_id, not conn_id, at least in version 1.10.

A quick example would look like this:

from datetime import timedelta, datetime
from airflow import DAG
from airflow.contrib.operators.ssh_operator import SSHOperator
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'start_date': datetime.now() - timedelta(minutes=20),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}
dag = DAG(dag_id='testing_stuff',
          default_args=default_args,
          schedule_interval='0,10,20,30,40,50 * * * *',
          dagrun_timeout=timedelta(seconds=120))
# Task 1 - Run a simple command on the remote host over SSH
t1_bash = """
echo 'Hello World'
"""
t1 = SSHOperator(
    ssh_conn_id='ssh_default',
    task_id='test_ssh_operator',
    command=t1_bash,
    dag=dag)
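
To adapt this to the Hive use case from the question, the same SSHOperator command can invoke the Hive CLI on the remote host; the query below is only an illustration:

# replace the echo with a Hive invocation (the query is a placeholder)
t1_bash = """
hive -e "SHOW TABLES;"
"""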
Answered By: politeauthority

Here is a working example with the ssh operator in Airflow 2:

[BEWARE: the output this operator pushes to XCom is base64-encoded]

from airflow.providers.ssh.operators.ssh import SSHOperator
from airflow.providers.ssh.hooks.ssh import SSHHook
sshHook = SSHHook(ssh_conn_id="conn-id", key_file='/opt/airflow/keys/ssh.key')
# a hook can also be defined directly in the code:
# sshHook = SSHHook(remote_host='server.com', username='admin', key_file='/opt/airflow/keys/ssh.key')

ls = SSHOperator(
        task_id="ls",
        command="ls -l",
        ssh_hook=sshHook,
        dag=dag)  # assumes a DAG object named "dag" is defined elsewhere

The conn-id is the connection ID set under Admin -> Connections.
The key_file is the path to the private SSH key.
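
Because the value pushed to XCom is base64-encoded, a downstream task can decode it before use. A minimal sketch, reusing the ls task and dag from above (the task and function names are just illustrative):

import base64

from airflow.operators.python import PythonOperator

def print_ls_output(ti):
    # the SSHOperator pushed its stdout to XCom base64-encoded
    encoded = ti.xcom_pull(task_ids="ls")
    print(base64.b64decode(encoded).decode("utf-8"))

show_output = PythonOperator(
    task_id="show_output",
    python_callable=print_ls_output,
    dag=dag)

ls >> show_output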

Answered By: artBCode