  1. Apache Airflow
  2. AIRFLOW-6532

Fetch Celery states using a batch method instead of a Pool


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.10.7
    • Fix Version/s: None
    • Component/s: executors
    • Labels:
      None

      Description

      One aspect that is worth checking is how much time Celery takes to receive task statuses.
      https://github.com/apache/airflow/blob/77099b876814ec0008fd8da18f35de70deccbe03/airflow/executors/celery_executor.py#L246-L259
      My clients use MySQL as the result backend, so Celery sends one query per task to the database, i.e. 100 queries for 100 tasks.
      https://github.com/celery/celery/blob/77099b876814ec0008fd8da18f35de70deccbe03/airflow/backends/database/__init__.py#L149-L164
      In my opinion, we can speed this up by replacing our code with a call to the Celery method celery.backends.base:BaseKeyValueStoreBackend.get_many.
      https://github.com/celery/celery/blob/77099b876814ec0008fd8da18f35de70deccbe03/celery/backends/base.py#L711-L747
      Unfortunately, this method works only with key-value store backends such as Redis, so we will have to implement mget / get_many in the DatabaseBackend class for it to work with a database result backend.
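A minimal sketch of the difference between per-task lookups and a batched lookup, under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for MySQL, assumes a result table named celery_taskmeta with task_id and status columns, and the helper names (make_demo_db, fetch_states_one_by_one, fetch_states_batched) are hypothetical. It is not the Airflow or Celery implementation, only an illustration of the idea described above.

```python
import sqlite3


def make_demo_db(n_tasks):
    """Build an in-memory table that stands in for the result backend (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE celery_taskmeta (task_id TEXT PRIMARY KEY, status TEXT)")
    rows = [(f"task-{i}", "SUCCESS" if i % 2 == 0 else "FAILURE") for i in range(n_tasks)]
    conn.executemany("INSERT INTO celery_taskmeta (task_id, status) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def fetch_states_one_by_one(conn, task_ids):
    """Issue one SELECT per task id: N tasks cost N round trips."""
    states = {}
    for task_id in task_ids:
        row = conn.execute(
            "SELECT status FROM celery_taskmeta WHERE task_id = ?", (task_id,)
        ).fetchone()
        # A task with no stored result is reported as PENDING.
        states[task_id] = row[0] if row else "PENDING"
    return states


def fetch_states_batched(conn, task_ids, chunk_size=500):
    """Issue one SELECT ... IN (...) per chunk of task ids."""
    task_ids = list(task_ids)
    # Start from PENDING so ids missing from the table still get a state.
    states = {task_id: "PENDING" for task_id in task_ids}
    for start in range(0, len(task_ids), chunk_size):
        chunk = task_ids[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        query = (
            "SELECT task_id, status FROM celery_taskmeta "
            f"WHERE task_id IN ({placeholders})"
        )
        for task_id, status in conn.execute(query, chunk):
            states[task_id] = status
    return states


if __name__ == "__main__":
    conn = make_demo_db(100)
    ids = [f"task-{i}" for i in range(100)]
    assert fetch_states_one_by_one(conn, ids) == fetch_states_batched(conn, ids)
```

The chunk size is there because databases limit the number of bound parameters in a single statement; 500 is an arbitrary illustrative value, not a recommendation for MySQL.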


              People

              • Assignee:
                Unassigned
              • Reporter:
                Kamil Bregula (kamil.bregula)
