  1. Apache Airflow
  2. AIRFLOW-6194

Task instances aren't running after meeting dependencies


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.10.6
    • Fix Version/s: None
    • Component/s: DagRun, executors, scheduler, worker
    • Labels:
      None

      Description

      We recently had an issue with our Airflow instance in which the scheduler entered what looked like a deadlocked state in the middle of operation. In this state, all DAG runs were listed as 'scheduled' and nothing at all appeared to be happening.

      Initially, I thought this might be an issue with our configuration, but I couldn't work out why the problem wouldn't have appeared earlier. Looking at the logs, I've also been seeing some strange behavior that I can't explain.

      The most notable thing is that the Executor Class listed under all of our jobs is now 'NoneType'; previously it was 'LocalExecutor'. According to our logs, this change first appeared when we updated our instance, two days before the initial deadlock. I have since cleared the database altogether, and even starting from scratch, 'NoneType' still appears.
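
      For anyone triaging a similar symptom, the following is a minimal sketch of how the recorded executor class could be tallied directly from the metadata database. It assumes the Airflow 1.10.x metadata schema exposes a `job` table with an `executor_class` column; the helper names and the in-memory SQLite demo data are hypothetical and only stand in for a real metadata database.

```python
import sqlite3
from collections import Counter


def executor_class_counts(conn):
    """Tally rows in the (assumed) `job` table by their executor_class value.

    `conn` is any DB-API connection to the Airflow metadata database.
    SQL NULLs are reported under the key "NULL".
    """
    cur = conn.cursor()
    cur.execute("SELECT executor_class FROM job")
    return Counter("NULL" if row[0] is None else row[0] for row in cur.fetchall())


def build_demo_job_db():
    """Create a throwaway in-memory database with made-up job rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        "id INTEGER PRIMARY KEY, job_type TEXT, state TEXT, executor_class TEXT)"
    )
    conn.executemany(
        "INSERT INTO job (job_type, state, executor_class) VALUES (?, ?, ?)",
        [
            ("SchedulerJob", "running", "LocalExecutor"),
            ("LocalTaskJob", "success", "NoneType"),
            ("LocalTaskJob", "success", "NoneType"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    # Print how many job rows were recorded per executor class.
    print(dict(executor_class_counts(build_demo_job_db())))
```

      Breaking the tally down by job type as well would show whether 'NoneType' is recorded for every kind of job or only for some of them.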

      In these same logs, I can see jobs continuously running for this DAG run, but their start and end times are less than a second apart. At the same time, all task instances are listed as either 'success' or 'scheduled', so I'm not sure what the running jobs actually are.
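
      A quick way to confirm the split between 'success' and 'scheduled' task instances is to count states per DAG in the metadata database. The sketch below assumes a `task_instance` table with `dag_id` and `state` columns, as in the Airflow 1.10.x schema; the helper names, the DAG id `test_dag`, and the demo rows are hypothetical. The `?` placeholder is SQLite's parameter style; other database drivers may expect `%s`.

```python
import sqlite3
from collections import Counter


def task_state_counts(conn, dag_id):
    """Count task instances of one DAG by state (SQL NULL is reported as "NULL")."""
    cur = conn.cursor()
    cur.execute("SELECT state FROM task_instance WHERE dag_id = ?", (dag_id,))
    return Counter("NULL" if state is None else state for (state,) in cur.fetchall())


def build_demo_task_db():
    """Create a throwaway in-memory database with made-up task rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        "task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
        [
            ("extract", "test_dag", "2019-12-01", "success"),
            ("transform", "test_dag", "2019-12-01", "scheduled"),
            ("load", "test_dag", "2019-12-01", "scheduled"),
            ("other", "another_dag", "2019-12-01", "success"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    # Show the state breakdown for the hypothetical DAG.
    print(dict(task_state_counts(build_demo_task_db(), "test_dag")))
```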

      If I look at the Task Instance Details for any of these scheduled tasks, I see:

      All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
      - The scheduler is down or under heavy load
      
      If this task instance does not start soon please contact your Airflow administrator for assistance.

      Viewing the scheduler logs in Airflow, nothing seems awry.

      To summarize, the scheduler seems to be doing its job, as DAG runs are properly scheduled and set as 'running'; however, the task instances themselves are not completing properly. Because the jobs list 'NoneType' instead of 'LocalExecutor', my theory is that there is some issue with the LocalExecutor that is causing it not to execute jobs properly. Again, clearing the database didn't seem to help, and I now run into this deadlock almost immediately with a test DAG I'm running.
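
      One way to rule out a configuration problem is to check which executor Airflow is actually configured to use. The sketch below reads the `executor` key from the `[core]` section of an `airflow.cfg` file's text and honors the `AIRFLOW__CORE__EXECUTOR` environment variable override. The function name is hypothetical, and the sketch only inspects configuration; it does not show what the scheduler loaded at runtime.

```python
import configparser
import os


def configured_executor(cfg_text, environ=None):
    """Return the executor name from airflow.cfg text, or None if it is not set.

    The AIRFLOW__CORE__EXECUTOR environment variable, when present and
    non-empty, takes precedence over the value in the file.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("AIRFLOW__CORE__EXECUTOR")
    if override:
        return override
    # Disable interpolation so '%' characters elsewhere in the file are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get("core", "executor", fallback=None)


if __name__ == "__main__":
    sample_cfg = "[core]\nexecutor = LocalExecutor\n"
    # Pass an empty mapping so the real process environment is ignored in this demo.
    print(configured_executor(sample_cfg, environ={}))
```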

      If I can provide any additional information, please let me know. I'd love to get this resolved or figured out, as we're currently unable to use Airflow because of this.

      Thanks!


              People

              • Assignee:
                Unassigned
                Reporter:
                charlie_plenty Charlie