Multiple inheritance using `BaseBranchOperator` in Airflow
Question:
Can one use multiple inheritance with BaseBranchOperator in Airflow?
I want to define an operator like:
    from airflow.models import BaseOperator
    from airflow.operators.branch import BaseBranchOperator

    class MyOperator(BaseOperator, BaseBranchOperator):
        def execute(self, context):
            print('hi')

        def choose_branch(self, context):
            if True:
                return 'task_A'
            else:
                return 'task_B'
In that case, is it accurate to think that the execute method will run before the choose_branch method?
Answers:
This won’t work.
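One reason shows up in plain Python, before any Airflow logic runs: BaseBranchOperator is itself a subclass of BaseOperator, and Python refuses to build a class that lists a base before that base's own subclass. A minimal sketch, assuming two stand-in classes that only mirror that inheritance relationship (they are hypothetical placeholders, not Airflow's real classes, and try_define is an illustrative helper):

```python
# Stand-in classes that only mirror the inheritance relationship;
# they are NOT Airflow's real classes.
class BaseOperator:
    def execute(self, context):
        raise NotImplementedError


class BaseBranchOperator(BaseOperator):
    def choose_branch(self, context):
        raise NotImplementedError


def try_define():
    """Attempt the class definition from the question.

    Returns the error text if Python rejects it, otherwise None.
    """
    try:
        # Same base order as in the question: parent first, then its subclass.
        class MyOperator(BaseOperator, BaseBranchOperator):
            pass
    except TypeError as err:
        return str(err)
    return None


mro_error = try_define()
print(mro_error)
```

Listing only BaseBranchOperator as the base avoids the problem, since BaseOperator is already inherited through it.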
In Airflow, each operator has an execute function that defines the operator's logic. In the case of BaseBranchOperator, the execute function calls choose_branch and handles the logic of how to skip tasks, so all the user has to do is say which task(s) should run, and this is done in choose_branch:
    def choose_branch(self, context: Context) -> str | Iterable[str]:
        """
        Abstract method to choose which branch to run.

        Subclasses should implement this, running whatever logic is
        necessary to choose a branch and returning a task_id or list of
        task_ids.

        :param context: Context dictionary as passed to execute()
        """
        raise NotImplementedError
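To make the relationship between execute and choose_branch concrete, here is a minimal sketch using a simplified stand-in base class (an assumption for illustration, not Airflow's source; skip_all_except here only records the choice, and GoodOperator and OverridingOperator are hypothetical names). It shows that choose_branch does not run on its own: the inherited execute is what calls it, so overriding execute silently drops the branching.

```python
class BaseBranchOperator:
    """Simplified stand-in, NOT Airflow's implementation."""

    def choose_branch(self, context):
        raise NotImplementedError

    def skip_all_except(self, ti, branch_task_ids):
        # Stand-in for the real skipping logic: just record what was chosen.
        self.followed = branch_task_ids

    def execute(self, context):
        # execute() is the method that calls choose_branch().
        branches = self.choose_branch(context)
        self.skip_all_except(context.get("ti"), branches)
        return branches


class GoodOperator(BaseBranchOperator):
    # Only choose_branch() is overridden; the inherited execute() uses it.
    def choose_branch(self, context):
        return "task_A"


class OverridingOperator(BaseBranchOperator):
    def execute(self, context):
        # Replaces the inherited execute(), so choose_branch() is never called.
        return None

    def choose_branch(self, context):
        return "task_A"


good_result = GoodOperator().execute({"ti": None})
overridden_result = OverridingOperator().execute({"ti": None})
print(good_result, overridden_result)
```

So the question of whether execute runs "before" choose_branch does not quite apply: one calls the other, rather than the two being run in sequence.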
So when you want to implement your own branch operator, all you need to do is inherit from BaseBranchOperator and override the choose_branch function.
You can decide that you don’t want this mechanism and build your own branching logic instead, which means you will need to implement the task-skipping logic yourself. In that case you would implement MyBaseBranchOperator, and then your actual branch operator (in your case MyOperator) would be:
    class MyOperator(MyBaseBranchOperator):
        ...
I think what you are really after is pre_execute(), which is triggered right before execute() is called.
So what you probably want is:
    class MyOperator(BaseBranchOperator):
        def pre_execute(self, context):
            print('hi')

        def choose_branch(self, context):
            if True:
                return 'task_A'
            else:
                return 'task_B'
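The intended call order can be sketched without Airflow. Everything below is an assumption for illustration: the base class is a stripped-down stand-in, and run_task is a hypothetical helper that imitates a runner calling the pre_execute hook right before execute, as described above.

```python
calls = []  # records the order in which the methods are invoked


class BaseBranchOperator:
    """Stripped-down stand-in, NOT Airflow's implementation."""

    def pre_execute(self, context):
        pass

    def execute(self, context):
        # The inherited execute() delegates the decision to choose_branch().
        return self.choose_branch(context)

    def choose_branch(self, context):
        raise NotImplementedError


class MyOperator(BaseBranchOperator):
    def pre_execute(self, context):
        calls.append("pre_execute")

    def choose_branch(self, context):
        calls.append("choose_branch")
        return "task_A"


def run_task(operator, context):
    """Hypothetical runner: call the hook first, then execute()."""
    operator.pre_execute(context)
    return operator.execute(context)


chosen = run_task(MyOperator(), {})
print(calls, chosen)
```

The side effect lives in the hook and the branch decision stays in choose_branch, so the inherited execute does not need to be touched.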
Thank you @Elad Kalif for your answer. That was inspiring.
However, I’ve learned that pre_execute does not have access to the context variable.
Current context is accessible only during the task execution. The context is not accessible during pre_execute or post_execute. Calling this method outside execution context will raise an error.
Another solution I’ve found was to make it like this:
    from airflow.operators.branch import BaseBranchOperator
    from myoperator import MyOperator

    class MyBranchOperator(MyOperator, BaseBranchOperator):
        def __init__(self, next_tasks=None, *args, **kwargs):
            # Initialize through MyOperator explicitly.
            MyOperator.__init__(self, *args, **kwargs)
            # Build the default here to avoid a shared mutable default argument.
            self.next_tasks = next_tasks or {"success": None, "failure": None}

        def execute(self, context):
            if something_wrong:  # placeholder: plug your own failure check here
                err_msg = 'Oops'
                raise ValueError(err_msg)
            print('hi')
            # Same steps the inherited branch execute() would perform.
            branches_to_execute = self.choose_branch(context)
            self.skip_all_except(context["ti"], branches_to_execute)
            return branches_to_execute

        def choose_branch(self, context):
            if True:  # plug condition here
                return self.next_tasks["success"]
            else:
                return self.next_tasks["failure"]