Apache Airflow / AIRFLOW-7039

Specific DAG Schedule & DST Results in Skipped DAG Run


    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.10.7
    • Fix Version/s: None
    • Component/s: scheduler
    • Environment:
      Amazon Linux 2 AMI

      Description

Scenario:
The EC2 instance running Airflow is in Eastern Time (America/New_York), and
airflow.cfg > [core] > default_timezone = America/New_York (the local UTC offset tracks DST changes automatically and correctly).
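
For reference, the relevant airflow.cfg fragment (values exactly as described above):

```ini
[core]
default_timezone = America/New_York
```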

On the Monday morning after Daylight Saving Time took effect, a handful of DAG runs were not executed as expected. The strange part is that these were the only DAGs that misbehaved; all other DAGs ran normally. Additionally, only the first expected run after the DST transition was skipped; subsequent runs later that day were scheduled successfully.

      Here is the pattern observed:

DAG schedule that skipped its first run: 0 <hour_a>,<hour_b> * * 1,2,3,4,5
i.e., a schedule that runs Monday–Friday, with two distinct runs per day.
DAGs that run at a single time M–F, and DAGs that run at two times but on a different day-of-week pattern, did not experience this issue.

       

Based on the logs, the missed run appears not to have been seen by the scheduler at all (see below):

2020-03-06 6:30 AM ET (BEFORE DST, EXPECTED BEHAVIOR):
      [2020-03-06 06:31:01,220] {logging_mixin.py:112} INFO - [2020-03-06 06:31:01,220] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=697
      [2020-03-06 06:31:01,222] {scheduler_job.py:153} INFO - Started process (PID=697) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
      [2020-03-06 06:31:01,228] {scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
      [2020-03-06 06:31:01,228] {logging_mixin.py:112} INFO - [2020-03-06 06:31:01,228] {dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
      [2020-03-06 06:31:01,238] {scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
      [2020-03-06 06:31:01,305] {scheduler_job.py:1262} INFO - Processing <DAG_NAME>
[2020-03-06 06:31:01,348] {logging_mixin.py:112} WARNING - /home/ec2-user/venv/lib64/python3.7/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 character string: '80047D'")
result = self._query(query)
      [2020-03-06 06:31:01,362] {scheduler_job.py:1272} INFO - Created <DagRun <DAG_NAME> @ 2020-03-05T15:30:00+00:00: scheduled__2020-03-05T15:30:00+00:00, externally triggered: False>
      [2020-03-06 06:31:01,366] {scheduler_job.py:740} INFO - Examining DAG run <DagRun <DAG_NAME> @ 2020-03-05 15:30:00+00:00: scheduled__2020-03-05T15:30:00+00:00, externally triggered: False>
      [2020-03-06 06:31:01,389] {scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
      [2020-03-06 06:31:01,395] {scheduler_job.py:1613} INFO - Creating / updating <TaskInstance: <DAG_NAME>.<TASK_NAME> 2020-03-05 15:30:00+00:00 [scheduled]> in ORM
      [2020-03-06 06:31:01,414] {scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.192 seconds
2020-03-06 10:30 AM ET (BEFORE DST, EXPECTED BEHAVIOR):
      [2020-03-06 10:30:00,083] {logging_mixin.py:112} INFO - [2020-03-06 10:30:00,082] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=16194
      [2020-03-06 10:30:00,085] {scheduler_job.py:153} INFO - Started process (PID=16194) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
      [2020-03-06 10:30:00,090] {scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
      [2020-03-06 10:30:00,090] {logging_mixin.py:112} INFO - [2020-03-06 10:30:00,090] {dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
      [2020-03-06 10:30:00,099] {scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
      [2020-03-06 10:30:00,159] {scheduler_job.py:1262} INFO - Processing <DAG_NAME>
      [2020-03-06 10:30:00,193] {logging_mixin.py:112} WARNING - /home/ec2-user/venv/lib64/python3.7/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 character string: '80047D'")
      result = self._query(query)
      [2020-03-06 10:30:00,207] {scheduler_job.py:1272} INFO - Created <DagRun <DAG_NAME> @ 2020-03-06T11:30:00+00:00: scheduled__2020-03-06T11:30:00+00:00, externally triggered: False>
      [2020-03-06 10:30:00,212] {scheduler_job.py:740} INFO - Examining DAG run <DagRun <DAG_NAME> @ 2020-03-06 11:30:00+00:00: scheduled__2020-03-06T11:30:00+00:00, externally triggered: False>
      [2020-03-06 10:30:00,232] {scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
      [2020-03-06 10:30:00,236] {scheduler_job.py:1613} INFO - Creating / updating <TaskInstance: <DAG_NAME>.<TASK_NAME> 2020-03-06 11:30:00+00:00 [scheduled]> in ORM
      [2020-03-06 10:30:00,251] {scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.166 seconds
       

2020-03-09 6:30 AM ET (AFTER DST, UNEXPECTED BEHAVIOR):
      [2020-03-09 06:31:55,240] {logging_mixin.py:112} INFO - [2020-03-09 06:31:55,240] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=23336
      [2020-03-09 06:31:55,242] {scheduler_job.py:153} INFO - Started process (PID=23336) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
      [2020-03-09 06:31:55,247] {scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
      [2020-03-09 06:31:55,247] {logging_mixin.py:112} INFO - [2020-03-09 06:31:55,247] {dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
      [2020-03-09 06:31:55,257] {scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
      [2020-03-09 06:31:55,312] {scheduler_job.py:1262} INFO - Processing <DAG_NAME>
      [2020-03-09 06:31:55,338] {scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
      [2020-03-09 06:31:55,343] {scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.101 seconds
2020-03-09 10:30 AM ET (AFTER DST, EXPECTED BEHAVIOR):
      [2020-03-09 10:30:54,487] {logging_mixin.py:112} INFO - [2020-03-09 10:30:54,487] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=6421
      [2020-03-09 10:30:54,490] {scheduler_job.py:153} INFO - Started process (PID=6421) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
      [2020-03-09 10:30:54,498] {scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
      [2020-03-09 10:30:54,499] {logging_mixin.py:112} INFO - [2020-03-09 10:30:54,499] {dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
      [2020-03-09 10:30:54,508] {scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
      [2020-03-09 10:30:54,585] {scheduler_job.py:1262} INFO - Processing <DAG_NAME>
      [2020-03-09 10:30:54,621] {logging_mixin.py:112} WARNING - /home/ec2-user/venv/lib64/python3.7/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 character string: '80047D'")
      result = self._query(query)
      [2020-03-09 10:30:54,634] {scheduler_job.py:1272} INFO - Created <DagRun <DAG_NAME> @ 2020-03-09T10:30:00+00:00: scheduled__2020-03-09T10:30:00+00:00, externally triggered: False>
      [2020-03-09 10:30:54,638] {scheduler_job.py:740} INFO - Examining DAG run <DagRun <DAG_NAME> @ 2020-03-09 10:30:00+00:00: scheduled__2020-03-09T10:30:00+00:00, externally triggered: False>
      [2020-03-09 10:30:54,660] {scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
      [2020-03-09 10:30:54,665] {scheduler_job.py:1613} INFO - Creating / updating <TaskInstance: <DAG_NAME>.<TASK_NAME> 2020-03-09 10:30:00+00:00 [scheduled]> in ORM
      [2020-03-09 10:30:54,683] {scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.193 seconds
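
The UTC execution dates in the logs are consistent with the offset change across the transition; here is a stdlib-only sketch of the conversions (no Airflow involved):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

ny = ZoneInfo("America/New_York")

# Before the transition (EST, UTC-5): 6:30 AM ET is 11:30 UTC,
# matching scheduled__2020-03-06T11:30:00+00:00 in the logs above.
before = datetime(2020, 3, 6, 6, 30, tzinfo=ny).astimezone(timezone.utc)
print(before)  # 2020-03-06 11:30:00+00:00

# After the transition (EDT, UTC-4): the same local slot is 10:30 UTC,
# matching scheduled__2020-03-09T10:30:00+00:00.
after = datetime(2020, 3, 9, 6, 30, tzinfo=ny).astimezone(timezone.utc)
print(after)  # 2020-03-09 10:30:00+00:00
```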


People

• Assignee: Unassigned
• Reporter: Peter Kim
• Votes: 0
• Watchers: 3
