Airflow webserver showing next run as start of data interval
Question:
I have a DAG like this:
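from datetime import timedelta
import pendulum
from airflow.decorators import dag

@dag(
    dag_id="data-sync",
    schedule_interval="*/30 * * * *",  # every 30 minutes
    start_date=pendulum.datetime(2023, 3, 9, tz="Asia/Hong_Kong"),
    catchup=False,
    dagrun_timeout=timedelta(minutes=20),
)
def data_sync():
    ...  # task definitions were not shown in the question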
So it runs every 30 minutes, starting today in my timezone. No catchup.
In the webserver UI I see these fields:
What I find strange among these fields is the next run time. I was looking at it between 21:01 and 21:29, and it still showed the next run as 21:00. In other words, the next run is in the past.
Does the next run mean the logical date in Airflow, that is, the start time of the interval? It is quite non-intuitive to look at it and see a time in the past.
Answers:
What you see is the logical date.
"it is quite non intuitive to look at it and see a time in the past…"
If you consider how data pipelines flow, it is intuitive.
Let's explain this with a daily run for simplicity.
With a daily interval, the data for 2023-02-01 is ready on 2023-02-02, meaning that at 2023-02-02 00:00 you have the full data for the interval 2023-02-01 00:00 to 2023-02-02 00:00. Thus only on 2023-02-02 can you start running the workflow for 2023-02-01. The same goes for hourly jobs.
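The relationship described above can be sketched in plain Python. This is only an illustration of the arithmetic, not Airflow's scheduler code; the helper name `data_interval` is made up for this example, and UTC is assumed for simplicity.

```python
from datetime import datetime, timedelta, timezone


def data_interval(logical_date, period):
    """Return (start, end) of the data interval for a run.

    Hypothetical helper: the logical date marks the START of the
    interval, and the run can only be triggered once the interval
    has ended, i.e. at `end`.
    """
    start = logical_date
    end = logical_date + period
    return start, end


# Daily run for the data of 2023-02-01
logical_date = datetime(2023, 2, 1, tzinfo=timezone.utc)
start, end = data_interval(logical_date, timedelta(days=1))

print("logical date :", logical_date.isoformat())
print("data interval:", start.isoformat(), "to", end.isoformat())
print("earliest run :", end.isoformat())
```

Inside a task, Airflow 2.2 and later expose the same two boundaries as the `data_interval_start` and `data_interval_end` template variables.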
Normally you care about which date's data is ready, and less about the timestamp at which the run actually executes.
If you want to know when the process is actually going to run, you can see that in the Graph View when you hover over the Next Run indicator:
In this case (using your code example) the run with logical date 2023-03-09 14:30 will start at 2023-03-09 15:00, as this is when the 30-minute interval of the run ends. This will happen in 7 minutes (note that the current time is 14:53 UTC, as shown in the yellow bar).
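The numbers in this example can be reproduced with a small sketch. It assumes a fixed 30-minute period aligned to the hour, as with `*/30 * * * *`, and no catchup; `next_run` is a hypothetical helper written for this illustration, not an Airflow API.

```python
from datetime import datetime, timedelta, timezone


def next_run(now, period=timedelta(minutes=30)):
    """Return (logical_date, run_after) for the next scheduled run.

    Hypothetical helper: floors `now` to the latest period boundary
    (counted from midnight). That boundary is the logical date shown
    as "Next Run"; the run itself starts one period later.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = now - midnight
    logical_date = midnight + (elapsed // period) * period
    run_after = logical_date + period
    return logical_date, run_after


# Current time from the answer: 14:53 UTC
now = datetime(2023, 3, 9, 14, 53, tzinfo=timezone.utc)
logical_date, run_after = next_run(now)

print("next run (logical date):", logical_date.isoformat())
print("run starts at          :", run_after.isoformat())
print("time until start       :", run_after - now)
```

The same arithmetic explains the question: at any moment between 21:01 and 21:29 the logical date of the next run is 21:00, while the run itself starts at 21:30.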