IMPALA / IMPALA-4317

Single Overloaded Impalad Causes Entire Cluster to Hang


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels:
      None
    • Environment:
      Enterprise CDH 5.7.0, Parcels
      impalad version 2.5.0-cdh5.7.0 RELEASE (build ad3f5adabedf56fe6bd9eea39147c067cc552703)

      Description

      Occasionally we experience heavy load on a single impalad host. This causes the entire cluster to hang and prevents any Impala queries from executing.

      Here's what we observe:
      - Load increases on a single impalad.
      - Query throughput across the entire Impala cluster drops, and we cannot get any queries to execute.
      - The number of running threads continues to increase until we restart the Impala service.
      - The impalad logs show errors connecting to the unhealthy host. Example: Couldn't open transport for ux-reporting-engine-worker-23-prod-us-east-1a:22000 (connect() failed: Connection timed out)
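
      The connection errors above suggest one way to confirm which hosts are unreachable on the backend port during an incident. What follows is a minimal diagnostic sketch, not part of Impala: it assumes the backend port is 22000 (taken from the log line above), and the function names probe_backend and find_unreachable, as well as the 3-second timeout, are hypothetical choices for illustration.

```python
import socket


def probe_backend(host, port=22000, timeout_s=3.0):
    """Attempt a TCP connection and return (reachable, detail).

    Port 22000 is assumed from the log line in this report; adjust it
    if your impalad backend port is configured differently.
    """
    try:
        # create_connection raises OSError (including timeouts and DNS
        # failures) when the connection cannot be established in time.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, "connected"
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"


def find_unreachable(hosts, port=22000, timeout_s=3.0):
    """Return a dict mapping each unreachable host to its error detail."""
    unreachable = {}
    for host in hosts:
        ok, detail = probe_backend(host, port, timeout_s)
        if not ok:
            unreachable[host] = detail
    return unreachable
```

      Calling find_unreachable with the list of impalad hostnames returns only the hosts that failed the probe, each with the error text. This only tests TCP reachability; a host that accepts connections but is too loaded to respond would still pass.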

      Questions:
      Why does the entire Impala service become unstable due to the health of a single impalad?
      Theoretically, shouldn't the Impala statestore prevent the unhealthy impalad host from being used and allow queries to be processed by healthy nodes?

        Attachments

        1. worker23.png
          60 kB
          Scott Wallace
        2. threads.png
          207 kB
          Scott Wallace
        3. queries.png
          185 kB
          Scott Wallace
        4. load.png
          108 kB
          Scott Wallace
        5. health.png
          146 kB
          Scott Wallace
        6. cached_clients.png
          55 kB
          Scott Wallace


              People

              • Assignee:
                Unassigned
                Reporter:
                swallace Scott Wallace
              • Votes:
                0
                Watchers:
                6
