Apache Airflow / AIRFLOW-5214

Airflow leaves too many TIME_WAIT TCP connections

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.10.2, 1.10.4
    • Fix Version/s: None
    • Component/s: DagRun, database
    • Labels: None
    • Environment: CentOS 7, Airflow 1.10.4, MariaDB

      Description

      Dear experts,

      In Airflow versions 1.10.2 and 1.10.4, we experience a severe limitation on the number of concurrent tasks.

      We observe that when more than 8 tasks are started and executed in parallel, the majority of them fail with the error "Can't connect to MySQL server" and error code 2006(99). This error code boils down to "Cannot bind socket to resource", which is why we started looking into the TCP connections of our Airflow host (a single node that hosts the webserver, scheduler and worker).

      When the 8 tasks are running simultaneously, we observe more than 15,000 connections in TIME_WAIT, while fewer than 50 are established. Given that the number of available ports is somewhat smaller than 30,000, this large number of blocked but unused TCP connections would explain why further task executions fail.
      Can anyone explain how so many open connections blocking ports/sockets come about? Given that we have connection pooling enabled, we do not see an explanation yet.
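
      The counts quoted above can be reproduced with a short script. The following is a minimal sketch, assuming a Linux host that exposes /proc/net/tcp and /proc/sys/net/ipv4/ip_local_port_range. The function names, and the choice of port 3306 as the database port, are illustrative assumptions and not part of Airflow. IPv6 sockets are listed separately in /proc/net/tcp6 and are not covered here.

```python
from collections import Counter

# State codes used in the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED",
    "02": "SYN_SENT",
    "03": "SYN_RECV",
    "04": "FIN_WAIT1",
    "05": "FIN_WAIT2",
    "06": "TIME_WAIT",
    "07": "CLOSE",
    "08": "CLOSE_WAIT",
    "09": "LAST_ACK",
    "0A": "LISTEN",
    "0B": "CLOSING",
}


def count_tcp_states(lines, remote_port=None):
    """Count sockets per TCP state from /proc/net/tcp-style lines.

    If remote_port is given (assumed here: 3306 for MySQL/MariaDB),
    only sockets whose remote port matches are counted.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header row and anything that is not a socket entry.
        if len(fields) < 4 or not fields[0].rstrip(":").isdigit():
            continue
        # Addresses look like "0100007F:0CEA" (hex IP, hex port).
        port = int(fields[2].rsplit(":", 1)[1], 16)
        if remote_port is not None and port != remote_port:
            continue
        state_code = fields[3].upper()
        counts[TCP_STATES.get(state_code, state_code)] += 1
    return counts


def ephemeral_port_count(range_text):
    """Number of local ports in a range string such as '32768 60999'."""
    low, high = (int(part) for part in range_text.split())
    return high - low + 1


def report(db_port=3306):
    """Read the live kernel tables (Linux only) and print a summary."""
    with open("/proc/net/tcp") as handle:
        states = count_tcp_states(handle, remote_port=db_port)
    with open("/proc/sys/net/ipv4/ip_local_port_range") as handle:
        ports = ephemeral_port_count(handle.read())
    print("socket states towards port", db_port, dict(states))
    print("local ports in ephemeral range:", ports)
```

      Calling report() on the Airflow host while the tasks are running shows how many sockets towards the database sit in TIME_WAIT compared with the size of the local port range. If the TIME_WAIT count approaches that size, new outgoing connections to the same database address can no longer obtain a local port.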

      Your help is very much appreciated; this issue strongly limits our current performance!

      Cheers

      Oliver

            People

            • Assignee: Unassigned
            • Reporter: Oliver Ricken (oricken)