Uploaded image for project: 'Apache Airflow'
  1. Apache Airflow
  2. AIRFLOW-5866

Task_instance table too large causing issues?

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.10.3, 1.10.6
    • Fix Version/s: None
    • Component/s: database, scheduler
    • Labels:
      None

      Description

      mysql rds metastore - db.m5.large instance, 5.7.26 version

       

      task_instance table has 2,848,160 rows

      dag_run table has 22768 rows

      dag table has 23 rows

      log table has 17,916,891 rows

       

      airflow 1.10.3, using LocalExecutor, python 2.7, single ec2 m5.4xlarge, parallelism set to 45 (ie max 45 tasks at once). Just using externally triggered dags, no SLAs. No subdags/backfills. 4 gunicorn workers. Using dynamic dags

       

      Everything was fine until yesterday, around 300 dag runs every day. Now today these issues appear all of a sudden (no code change, environment change.etc). I suspect the task_instance table has gotten too big and causing scheduler and mysql issues.

       

      1.

      'Recent tasks' are showing blank on the web ui home page. admin/airflow/task_stats fails to display with 504 error after few mins but dag_stats endpoint shows dags are in running state

       

      2.

      dag_runs are stuck in running state > 20 hrs, seems no new tasks are being run (they are stuck in scheduled/queued state)

       

      I then tried terminating the EC2 and getting a new one, the dagruns and tasks would then start finishing but then after few hours got into same situation as points 1/2 above. I believe certain dag ids (with many tasks) are hitting the issue, will know m

       

       

      You can see notable change in the graphs just after 4th november midnight (that is when the issue started). Around 30 dagruns (yes there are diff execution_dates running for same dagid at same time) start around 11pm each night.

       

      Scheduler/webserver pids have remained up the entire time, no ec2 autoheals happened

       

       

      scheduler log shows:

      [2019-11-07 12:25:18,287] {jobs.py:1185} INFO - DAG daga has 29/45 running and queued tasks
      [2019-11-07 12:25:18,287] {jobs.py:1185} INFO - DAG daga has 30/45 running and queued tasks
      [2019-11-07 12:25:18,287] {jobs.py:1185} INFO - DAG daga has 31/45 running and queued tasks
      [2019-11-07 12:25:18,287] {jobs.py:1185} INFO - DAG daga has 32/45 running and queued tasks
      [2019-11-07 12:25:18,287] {jobs.py:1185} INFO - DAG daga has 33/45 running and queued tasks
      [2019-11-07 12:25:18,287] {jobs.py:1185} INFO - DAG daga has 34/45 running and queued tasks
      [2019-11-07 12:25:18,287] {jobs.py:1185} INFO - DAG daga has 35/45 running and queued tasks
      [2019-11-07 12:25:21,435] {jobs.py:1185} INFO - DAG dage has 0/45 running and queued tasks
      [2019-11-07 12:25:21,435] {jobs.py:1185} INFO - DAG dage has 0/45 running and queued tasks
      [2019-11-07 12:25:21,435] {jobs.py:1185} INFO - DAG daga has 36/45 running and queued tasks
      [2019-11-07 12:25:21,435] {jobs.py:1185} INFO - DAG daga has 36/45 running and queued tasks
      [2019-11-07 12:25:21,435] {jobs.py:1185} INFO - DAG daga has 36/45 running and queued tasks
      [2019-11-07 12:25:21,435] {jobs.py:1185} INFO - DAG daga has 36/45 running and queued tasks
      [2019-11-07 12:25:21,435] {jobs.py:1185} INFO - DAG daga has 36/45 running and queued tasks
      [2019-11-07 12:25:21,435] {jobs.py:1185} INFO - DAG daga has 36/45 running and queued tasks
      [2019-11-07 12:25:21,435] {jobs.py:1185} INFO - DAG daga has 36/45 running and queued tasks
      [2019-11-07 12:25:21,435] {jobs.py:1185} INFO - DAG daga has 36/45 running and queued tasks
      [2019-11-07 12:25:21,435] {jobs.py:1185} INFO - DAG daga has 36/45 running and queued tasks
      [2019-11-07 12:25:21,435] {jobs.py:1185} INFO - DAG daga has 36/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 36/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 36/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 37/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 38/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 39/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 40/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 41/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 42/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 43/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 44/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:21,436] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:21,437] {jobs.py:1185} INFO - DAG dage has 0/45 running and queued tasks
      [2019-11-07 12:25:42,193] {jobs.py:1185} INFO - DAG dagb has 0/45 running and queued tasks
      [2019-11-07 12:25:42,194] {jobs.py:1185} INFO - DAG dagb has 0/45 running and queued tasks
      [2019-11-07 12:25:42,194] {jobs.py:1185} INFO - DAG dagb has 0/45 running and queued tasks
      [2019-11-07 12:25:42,194] {jobs.py:1185} INFO - DAG dagb has 0/45 running and queued tasks
      [2019-11-07 12:25:42,194] {jobs.py:1185} INFO - DAG dagb has 0/45 running and queued tasks
      [2019-11-07 12:25:42,194] {jobs.py:1185} INFO - DAG dagb has 0/45 running and queued tasks
      [2019-11-07 12:25:42,194] {jobs.py:1185} INFO - DAG dagb has 0/45 running and queued tasks
      [2019-11-07 12:25:42,194] {jobs.py:1185} INFO - DAG dagb has 0/45 running and queued tasks
      [2019-11-07 12:25:42,194] {jobs.py:1185} INFO - DAG dagb has 0/45 running and queued tasks
      [2019-11-07 12:25:42,194] {jobs.py:1185} INFO - DAG dagb has 0/45 running and queued tasks
      [2019-11-07 12:25:49,493] {jobs.py:1185} INFO - DAG dagc has 0/45 running and queued tasks
      [2019-11-07 12:25:49,493] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,494] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,494] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,494] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,494] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,494] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,494] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,494] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,494] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,495] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,495] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,495] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,495] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,495] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,495] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,495] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,495] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,495] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,496] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,496] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,496] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,496] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,496] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,496] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,496] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,496] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-07 12:25:49,496] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks

      [2019-11-06 11:53:08,813] {jobs.py:1185} INFO - DAG daga has 45/45 running and queued tasks
      [2019-11-06 11:53:08,813] {jobs.py:1191} INFO - Not executing <TaskInstance: task1 2019-11-03 00:00:00+00:00 [scheduled]> since the number of tasks running or queued from DAG daga is >= to the DAG's task concurrency limit of 45

       

       

      select state,count(1) from airflowdb.task_instance --not mysql syntax (querying via Presto)

      where dag_id = 'daga' and execution_date in ( timestamp'2019-11-03 00:00:00',timestamp'2019-11-04 00:00:00',timestamp'2019-11-06 00:00:00',timestamp'2019-11-07 00:00:00') --these are the dates in the dag_run table with state of running

      group by state

      success 871

      failed 34

      upstream_failed 68

      skipped 568

      scheduled 486

      <null> 573

       

       

       

        Attachments

        1. EC2's `ps -ef pipe wc -l` count.png
          61 kB
          t oo
        2. mysql cpu_util.png
          97 kB
          t oo
        3. mysql dbconnections.png
          80 kB
          t oo
        4. Mysql queuedepth.png
          99 kB
          t oo
        5. mysql read latency.png
          82 kB
          t oo
        6. mysql write IOPS.png
          108 kB
          t oo
        7. mysql write latency.png
          97 kB
          t oo

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              toopt4 t oo
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: