- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 1.10.9
- Fix Version/s: None
- Component/s: scheduler
- Labels: None
- Epic Link:

Hello,
The scheduler runs the query for a DAG run's task instances (get_task_instances) three times. This is redundant. Reducing the number of these queries should improve scheduler performance. The call path leading to each query is listed below.
First query:
perform_file: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792
process_dags: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853
create_dag_run: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L726
create_dagrun: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L638
verify_integrity: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dag.py#L1454
get_task_instances: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L436
Third query:
perform_file: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792
process_dags: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853
_process_task_instances: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L738
update_state: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L685
get_task_instances: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L292
Second query:
perform_file: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792
process_dags: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853
_process_task_instances: https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L738
verify_integrity: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L684
get_task_instances: https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L436
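
One possible direction, shown only as a rough sketch: fetch the task instances once and pass the result down to the callers that need it. The code below does not use Airflow. `FakeDagRun`, the optional `tis` parameter, and the `process_naive` / `process_shared` helpers are hypothetical names invented for illustration, and the real `verify_integrity` and `update_state` methods do considerably more work than these stand-ins.

```python
from typing import Callable, List, Optional


class FakeDagRun:
    """Toy stand-in for a DAG run. This is NOT Airflow's DagRun class."""

    def __init__(self, fetch: Callable[[], List[str]]) -> None:
        self._fetch = fetch      # hypothetical: stands in for the DB query
        self.query_count = 0     # how many times the "query" was executed

    def get_task_instances(self) -> List[str]:
        self.query_count += 1
        return self._fetch()

    def verify_integrity(self, tis: Optional[List[str]] = None) -> List[str]:
        # Query only if the caller did not supply the task instances.
        if tis is None:
            tis = self.get_task_instances()
        return tis

    def update_state(self, tis: Optional[List[str]] = None) -> str:
        # Query only if the caller did not supply the task instances.
        if tis is None:
            tis = self.get_task_instances()
        return "running" if tis else "success"


def process_naive(run: FakeDagRun) -> str:
    """Each step issues its own query, as in the call paths listed above."""
    run.verify_integrity()
    return run.update_state()


def process_shared(run: FakeDagRun) -> str:
    """Query once, then hand the same result to both steps."""
    tis = run.get_task_instances()
    run.verify_integrity(tis)
    return run.update_state(tis)


if __name__ == "__main__":
    naive = FakeDagRun(lambda: ["task_a", "task_b"])
    process_naive(naive)

    shared = FakeDagRun(lambda: ["task_a", "task_b"])
    process_shared(shared)

    print("naive queries:", naive.query_count)
    print("shared queries:", shared.query_count)
```

A caveat for any real change: if verify_integrity adds or removes task instances, a list fetched before it runs may be stale by the time update_state uses it, so the shared result would have to be refreshed or updated in place.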