Airflow Impersonation with 'run_as_user' Not Working

Question:

I am trying to get impersonation working without success. I am following the instructions here – https://airflow.apache.org/security.html#impersonation

I launched the Airflow webserver, scheduler, and worker with sudo, running under the ‘airflow’ user. This user is set up in the sudoers file to allow passwordless sudo.

I created a BashOperator and a PythonOperator with the run_as_user parameter set to an existing user named ‘linus’ on the server. When I am logged in as ‘airflow’, I am able to switch users by running sudo -u linus without it prompting me for a password.

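import getpass

# Airflow 1.x import paths
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

# args is defined elsewhere in the DAG file (not shown)
dag = DAG('test_impersonation', default_args=args)

def print_user(**kwargs):
    # Report which user the callable is running as
    print('USER:', getpass.getuser())

t1 = BashOperator(task_id='bash_task',
                  bash_command='touch /home/linus/test.x',
                  run_as_user='linus',
                  dag=dag)

t2 = PythonOperator(task_id='py_task',
                    python_callable=print_user,  # pass the function itself, not its name as a string
                    run_as_user='linus',
                    dag=dag)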

I am testing these tasks with the following commands in the terminal:

airflow test test_impersonation bash_task 2018-03-30
airflow test test_impersonation py_task 2018-03-30

The first command (BashOperator task) fails with a permission denied error telling me it’s still running as the ‘airflow’ user.

The second command (PythonOperator task) prints the following:

USER: airflow

I expect this to print USER: linus

Is there anything I am missing? Any help would be greatly appreciated.

Thanks for reading!

Asked By: Linus


Answers:

I’m not exactly certain, but it looks like the sudo -u prefix is applied in a task_runner, which is set up by the executor and probably runs the cli run command. The cli test command, by contrast, only calls run on the task_instance in test mode, and this doesn’t prepend sudo -u.
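To make the idea concrete, here is a minimal sketch of how a runner could prefix a task command with sudo -u when run_as_user is set. The function name build_task_command is hypothetical and this is not Airflow's actual code; it only illustrates why a code path that skips the runner (as the test command appears to) would never see the prefix.

```python
def build_task_command(base_command, run_as_user=None):
    """Return the argv list, prefixed with sudo -u when impersonation is requested.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    prefix = ["sudo", "-u", run_as_user] if run_as_user else []
    return prefix + list(base_command)


if __name__ == "__main__":
    cmd = ["airflow", "run", "test_impersonation", "bash_task", "2018-03-30"]
    # With impersonation: the command is wrapped in sudo -u linus
    print(build_task_command(cmd, run_as_user="linus"))
    # Without it: the command is returned unchanged
    print(build_task_command(cmd))
```

If this reading is right, triggering the DAG through the scheduler rather than airflow test would be the way to exercise the impersonation path.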

Answered By: dlamblin

I have the same issue, and it seems that os.getlogin() called in a TaskFunction() of the dag.py module returns the wrong information. I have similar code and was confused by this. In reality, however, the function’s code runs under the proper run_as_user id. I confirmed this through folder access permissions and the accessible environment variables; the task also created some folders whose owner was the specified user, not the airflow user (which is still what os.getlogin() reports, by the way; I just rechecked).

So please check the user’s id another way, such as accessing folders that have exclusive access and/or creating some folders.

Update: Use getpass.getuser() instead and you will get a different result.
As per the os.getlogin() doc:

"os.getlogin() Returns the name of the user logged in on the controlling terminal of
the process… Recommend using getpass.getuser() … which uses the
environment variables LOGNAME or USERNAME to find out who the user is…"

My test returns:

[svc.airflow]$> sudo -u myusername test.py
  ...  os.getlogin=svc.airflow, getpass.getuser=myusername
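A small script along these lines can be used to compare the different ways of asking "who am I" from inside a task. This is a sketch, assuming a Unix host; the function name describe_user is made up for illustration. The effective uid looked up in the password database is included because it does not depend on the terminal or on environment variables.

```python
import getpass
import os
import pwd  # Unix only


def describe_user():
    """Collect the user name as reported by three different mechanisms."""
    try:
        # User logged in on the controlling terminal; unaffected by sudo -u
        login = os.getlogin()
    except OSError:
        # No controlling terminal (common for daemons and workers)
        login = None
    try:
        # Name belonging to the effective uid of this process
        euid_name = pwd.getpwuid(os.geteuid()).pw_name
    except KeyError:
        euid_name = None
    return {
        "os.getlogin": login,
        # Checks LOGNAME, USER, LNAME, USERNAME in the environment first
        "getpass.getuser": getpass.getuser(),
        "euid_name": euid_name,
    }


if __name__ == "__main__":
    for source, name in describe_user().items():
        print(source, "=", name)
```

Run under sudo -u, os.getlogin would be expected to keep reporting the original login user, while the other two should reflect the target user.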

Answered By: Fedor