Apache Airflow / AIRFLOW-6965

The get_task_instances method is executed three times during a single DagRun creation.

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.10.9
    • Fix Version/s: None
    • Component/s: scheduler
    • Labels: None

      Description

      Hello,

      The get_task_instances query is executed three times for a single DagRun. This is redundant; reducing the number of these queries should improve scheduler performance.

      First query:

      perform_file: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792

      process_dags: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853

      create_dag_run: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L726

      create_dagrun: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L638

      verify_integrity: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dag.py#L1454

      get_task_instances: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L436

      Second query:

      perform_file: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792

      process_dags: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853

      _process_task_instances: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L738

      update_state: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L685

      get_task_instances: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L292

      Third query:

      perform_file: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792

      process_dags: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853

      _process_task_instances: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L738

      verify_integrity: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L684

      get_task_instances: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L436
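
The call chains above could, in principle, share one result instead of each issuing its own query. The following is a minimal sketch of that idea, not Airflow's actual code: `FakeDagRun`, the `query_count` counter, and the optional `task_instances` parameter are all hypothetical names introduced for illustration. It also assumes nothing modifies the task instance rows between the calls; if `verify_integrity` adds or removes rows, the shared list would have to be refreshed.

```python
from typing import List, Optional


class FakeDagRun:
    """Hypothetical stand-in for a DagRun; not Airflow's real class."""

    def __init__(self, task_ids: List[str]) -> None:
        self._task_ids = list(task_ids)
        self.query_count = 0  # counts simulated database round trips

    def get_task_instances(self) -> List[str]:
        # Stands in for the database query made by the real method.
        self.query_count += 1
        return list(self._task_ids)

    def verify_integrity(self, task_instances: Optional[List[str]] = None) -> None:
        # Assumed optional parameter: reuse rows the caller already fetched.
        if task_instances is None:
            task_instances = self.get_task_instances()
        _ = len(task_instances)  # placeholder for the consistency check

    def update_state(self, task_instances: Optional[List[str]] = None) -> str:
        if task_instances is None:
            task_instances = self.get_task_instances()
        # Placeholder state logic, only to give the method a result.
        return "running" if task_instances else "success"


def process_naive(run: FakeDagRun) -> None:
    """Mirrors the three call chains: each one queries on its own."""
    run.verify_integrity()  # via create_dag_run
    run.update_state()      # via _process_task_instances
    run.verify_integrity()  # via _process_task_instances


def process_shared(run: FakeDagRun) -> None:
    """Fetches the task instances once and passes them along."""
    task_instances = run.get_task_instances()
    run.verify_integrity(task_instances)
    run.update_state(task_instances)
    run.verify_integrity(task_instances)


naive_run = FakeDagRun(["extract", "load"])
process_naive(naive_run)
print(naive_run.query_count)  # 3

shared_run = FakeDagRun(["extract", "load"])
process_shared(shared_run)
print(shared_run.query_count)  # 1
```

Passing the list explicitly keeps the reuse visible at each call site; caching it on the object would be an alternative, at the cost of having to invalidate the cache whenever rows change.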


            People

            • Assignee: Unassigned
            • Reporter: Kamil Bregula (kamil.bregula)