Multiple inheritance using `BaseBranchOperator` in Airflow

Question:

Can one use multiple inheritance using BaseBranchOperator in Airflow?

I want to define an operator like:

from airflow.models import BaseOperator
from airflow.operators.branch import BaseBranchOperator


class MyOperator(BaseOperator, BaseBranchOperator):

    def execute(self, context):
        print('hi')

    def choose_branch(self, context):
        if True:
            return 'task_A'
        else:
            return 'task_B'

In that case, is it accurate to think that the execute method will run before the choose_branch method?

Asked By: cavalcantelucas

||

Answers:

This won’t work.
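
One concrete reason, sketched with plain Python stand-ins rather than the real Airflow classes: BaseBranchOperator is itself a subclass of BaseOperator, so listing BaseOperator first among the bases asks Python for a method resolution order it cannot build. The `Fake...` names and `try_define` below are hypothetical; only the inheritance shape is meant to mirror Airflow.

```python
# Stand-ins for the Airflow classes (assumption: only the inheritance
# relationship matters here, i.e. the branch base subclasses the operator base).
class FakeBaseOperator:
    def execute(self, context):
        raise NotImplementedError


class FakeBaseBranchOperator(FakeBaseOperator):
    def execute(self, context):
        return self.choose_branch(context)

    def choose_branch(self, context):
        raise NotImplementedError


def try_define():
    """Try to define a class that lists a parent before its own child."""
    try:
        class MyOperator(FakeBaseOperator, FakeBaseBranchOperator):
            pass
    except TypeError as exc:
        # Python refuses to create the class at definition time.
        return str(exc)
    return None


mro_error = try_define()
print(mro_error)
```

With these stand-ins the class statement itself fails, before any execute or choose_branch could run.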

In Airflow, each operator has an execute function that defines the operator's logic. In the case of BaseBranchOperator, the execute function calls choose_branch and handles the logic of skipping tasks, so all the user has to do is say which task(s) to run, and this is done in choose_branch:

def choose_branch(self, context: Context) -> str | Iterable[str]:
    """
    Abstract method to choose which branch to run.

    Subclasses should implement this, running whatever logic is
    necessary to choose a branch and returning a task_id or list of
    task_ids.

    :param context: Context dictionary as passed to execute()
    """
    raise NotImplementedError

So when you want to implement your own branch operator, all you need to do is inherit from BaseBranchOperator and override the choose_branch function.
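
The division of labour described above can be sketched without Airflow. This is a minimal stand-in, not Airflow's actual implementation: `FakeBaseBranchOperator`, `downstream_task_ids`, `skipped`, and the `"ok"` context key are all hypothetical names used for illustration.

```python
class FakeBaseBranchOperator:
    """Stand-in for a branch base class; not Airflow's actual code."""

    def __init__(self, downstream_task_ids):
        self.downstream_task_ids = list(downstream_task_ids)
        self.skipped = []

    def execute(self, context):
        # The base class owns execute(): it asks the subclass which
        # branch to follow, then marks every other downstream task as skipped.
        chosen = self.choose_branch(context)
        chosen_ids = {chosen} if isinstance(chosen, str) else set(chosen)
        self.skipped = [t for t in self.downstream_task_ids if t not in chosen_ids]
        return chosen

    def choose_branch(self, context):
        raise NotImplementedError


class MyBranch(FakeBaseBranchOperator):
    def choose_branch(self, context):
        # Hypothetical condition read from the context dictionary.
        return "task_A" if context.get("ok") else "task_B"


op = MyBranch(["task_A", "task_B"])
result = op.execute({"ok": True})
print(result, op.skipped)
```

The subclass only decides; the skipping bookkeeping stays in the base class's execute, which is why overriding execute in a subclass bypasses it unless the subclass redoes that work.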

You can decide that you don't want this mechanism and would rather build your own branch logic, which means you will need to implement the logic of how to skip tasks yourself. In that case you would implement MyBaseBranchOperator, and your actual branch operator (in your case MyOperator) would be:

class MyOperator(MyBaseBranchOperator):
    ...

I think what you are really after is pre_execute(), which is triggered right before execute() is called.

So probably what you want is:

class MyOperator(BaseBranchOperator):

    def pre_execute(self, context):
        print('hi')

    def choose_branch(self, context):
        if True:
            return 'task_A'
        else:
            return 'task_B'

Answered By: Elad Kalif

Thank you @Elad Kalif for your answer. That was inspiring.

However, I've learned that pre_execute does not have access to the context variable.

Current context is accessible only during the task execution. The context is not accessible during pre_execute or post_execute. Calling this method outside execution context will raise an error.

Another solution I found is to do it like this:

from airflow.operators.branch import BaseBranchOperator
from myoperator import MyOperator


class MyBranchOperator(MyOperator, BaseBranchOperator):

    def __init__(
        self,
        next_tasks=None,
        *args,
        **kwargs,
    ):
        MyOperator.__init__(
            self,
            *args,
            **kwargs,
        )
        # Avoid a mutable default argument; fall back to an empty mapping.
        self.next_tasks = next_tasks or {"success": None, "failure": None}

    def execute(self, context):
        if something_wrong:  # placeholder: plug your own failure check here
            err_msg = 'Ops'
            raise ValueError(err_msg)

        print('hi')

        branches_to_execute = self.choose_branch(context)
        self.skip_all_except(context["ti"], branches_to_execute)
        return branches_to_execute

    def choose_branch(self, context):
        if True:  # plug condition here
            return self.next_tasks["success"]
        else:
            return self.next_tasks["failure"]
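
Why execute has to be redefined in the combined class can be seen with plain Python stand-ins (the `Fake...`, `WithoutOverride`, and `WithOverride` names are hypothetical and only mimic the inheritance shape). With MyOperator listed first, its execute comes before the branch base's execute in the method resolution order, so choose_branch would never be called unless the subclass calls it explicitly.

```python
class FakeBaseOperator:
    def execute(self, context):
        raise NotImplementedError


class FakeMyOperator(FakeBaseOperator):
    def execute(self, context):
        return "my-operator-logic"


class FakeBaseBranchOperator(FakeBaseOperator):
    def execute(self, context):
        return self.choose_branch(context)

    def choose_branch(self, context):
        raise NotImplementedError


class WithoutOverride(FakeMyOperator, FakeBaseBranchOperator):
    # No execute() here: FakeMyOperator.execute wins in the MRO.
    def choose_branch(self, context):
        return "task_A"


class WithOverride(FakeMyOperator, FakeBaseBranchOperator):
    def execute(self, context):
        # Own logic would go here, followed by the explicit branch call.
        return self.choose_branch(context)

    def choose_branch(self, context):
        return "task_A"


mro_names = [c.__name__ for c in WithOverride.__mro__]
without_result = WithoutOverride().execute({})
with_result = WithOverride().execute({})
print(mro_names, without_result, with_result)
```

In this sketch the version without its own execute silently runs only the first parent's logic, which matches the need to call choose_branch and the skip logic by hand in the combined operator.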

Answered By: cavalcantelucas