Airflow webserver showing next run as start of data interval

Question:

I have a DAG like this:

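import pendulum
from datetime import timedelta
from airflow.decorators import dag

@dag(
    dag_id="data-sync",
    schedule_interval="*/30 * * * *",
    start_date=pendulum.datetime(2023, 3, 9, tz="Asia/Hong_Kong"),
    catchup=False,
    dagrun_timeout=timedelta(minutes=20),
)
def data_sync():
    ...  # tasks go here
data_sync()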

So it runs every 30 minutes, starting today in my timezone, with no catchup.
In the webserver UI I see these fields:

[Screenshot of the DAG's fields in the webserver UI]

What I find strange is the Next Run time. I was looking at it between 21:01 and 21:29, and it still showed the next run as 21:00; in other words, the "next" run is in the past.

Does "Next Run" mean the logical date in Airflow, that is, the start of the data interval? It is quite unintuitive to look at it and see a time in the past.

Asked By: moth


Answers:

What you see is the logical date, which is the start of the data interval.

"It is quite non-intuitive to look at it and see a time in the past."

It becomes intuitive once you think in terms of data pipelines.
Let's explain this with a daily run for simplicity.
With a daily interval, the data for 2023-02-01 is only complete on 2023-02-02: at 2023-02-02 00:00 you have the full data for the interval from 2023-02-01 00:00 to 2023-02-02 00:00, so only then can you start the run for 2023-02-01. The same goes for hourly jobs.
Normally you care about which date's data is ready, and less about the timestamp at which the run actually starts.
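As a rough illustration of the daily case, the sketch below models the interval with plain `datetime` arithmetic. The helper `daily_interval` is hypothetical, not an Airflow API; it only encodes the rule described above: the logical date is the start of the interval, and the run can be triggered once the interval ends.

```python
from datetime import datetime, timedelta
from typing import Tuple


# Hypothetical helper (not part of Airflow): models a daily data interval
# whose logical date is the interval start.
def daily_interval(logical_date: datetime) -> Tuple[datetime, datetime]:
    start = logical_date
    end = logical_date + timedelta(days=1)
    return start, end


start, end = daily_interval(datetime(2023, 2, 1))
# The run for logical date 2023-02-01 covers [start, end) and can only
# be triggered once `end` is reached, i.e. at 2023-02-02 00:00.
print(f"logical date: {start:%Y-%m-%d %H:%M}")
print(f"data covered: {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M}")
print(f"run starts:   {end:%Y-%m-%d %H:%M}")
```

The label shown for the run is `start`, while the wall-clock time at which it executes is `end`.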

If you want to know when the run will actually start, hover over the Next Run indicator in the Graph View:

[Screenshot of the Next Run tooltip in the Graph View]

In this case (using your code example), the run with logical date 2023-03-09 14:30 will start at 2023-03-09 15:00, because that is when its 30-minute interval ends. This will happen in 7 minutes (note that the current time is 14:53 UTC, as shown in the yellow bar).
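The same arithmetic accounts for both the tooltip and what the question reports. The sketch below is a simplified model, not Airflow's scheduler: the function `next_run` is hypothetical, and it assumes a fixed interval that divides a day evenly (true for `*/30 * * * *`), a steady state with `catchup=False`, and that the interval currently in progress is the next one to run.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Matches schedule_interval='*/30 * * * *' from the question.
INTERVAL = timedelta(minutes=30)


def next_run(now: datetime, interval: timedelta = INTERVAL) -> Tuple[datetime, datetime]:
    """Hypothetical helper: return (logical_date, expected_start) of the next run.

    Assumes the schedule is aligned to multiples of `interval` since midnight
    and that the run for the previous interval has already been triggered.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Number of whole intervals elapsed today (timedelta // timedelta -> int).
    completed = (now - midnight) // interval
    # Start of the interval currently in progress: this is the logical date
    # shown as "Next Run", even though it already lies in the past.
    logical_date = midnight + completed * interval
    # The run is only triggered once that interval has ended.
    expected_start = logical_date + interval
    return logical_date, expected_start


# Answer's example: current time 14:53 UTC.
now = datetime(2023, 3, 9, 14, 53, tzinfo=timezone.utc)
logical_date, starts_at = next_run(now)
print(logical_date, starts_at, starts_at - now)

# Question's example: looking between 21:01 and 21:29 (naive local times).
for minute in (1, 29):
    print(next_run(datetime(2023, 3, 9, 21, minute)))
```

For 14:53 UTC this gives a logical date of 14:30 and a start time of 15:00, seven minutes later. For any time from 21:01 to 21:29 it gives a logical date of 21:00, with the run starting at 21:30.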

Answered By: Elad Kalif