IMPALA-4426

Hang in ImpalaServer

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: None
    • Component/s: Distributed Exec

    Description

      We saw this hang in testing: the Impalad was unresponsive (no debug webpage, fragments not starting, etc.). The attached *-threads files contain stack traces from two separate occurrences of the hang on the same cluster. The attached lock-order graph (.dot and .pdf) is a partial analysis of the lock ordering, but I haven't found the actual deadlock from the current stacks. The only cycle in the graph is IMPALA-4409, and there is no evidence of it in the stacks.

      I don't think any of these related issues apply:

      • IMPALA-4037 involves ChildQuery::Cancel, which we don't see in the stacks
      • IMPALA-4038 involves a query holding a coordinator lock while doing an RPC, which we don't see in the stacks
      • IMPALA-4409 involves CancelInternal(), but would require a thread to be blocked in CancelInternal() itself after GetQueryExecState() returns. Instead, all the stacks we see are sitting in GetQueryExecState().

      It looks like a classic deadlock, but I haven't been able to figure out what the cycle of dependencies is.
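
      For readers unfamiliar with lock-order analysis, the kind of cycle search described above can be sketched roughly as follows. This is only an illustration and not the tooling used to produce the attached graph. The function name find_lock_cycle and the lock names lock_a, lock_b, lock_c are hypothetical placeholders, not real Impala locks. Each edge (a, b) is assumed to mean "some thread acquired b while holding a"; a cycle in that graph indicates a possible deadlock.

```python
from collections import defaultdict


def find_lock_cycle(edges):
    """Return one cycle in a lock-order graph as a list of names, or None.

    Each edge (held, wanted) means some thread acquired `wanted`
    while already holding `held`. All names are illustrative.
    """
    graph = defaultdict(list)
    for held, wanted in edges:
        graph[held].append(wanted)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = defaultdict(int)      # every node starts WHITE (0)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge to a node on the current path: a cycle.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical lock names, purely for illustration.
cyclic = [("lock_a", "lock_b"), ("lock_b", "lock_c"), ("lock_c", "lock_a")]
acyclic = [("lock_a", "lock_b"), ("lock_b", "lock_c")]
print(find_lock_cycle(cyclic))
print(find_lock_cycle(acyclic))
```

      A cycle reported this way is only a candidate: as with IMPALA-4409 here, it still has to be matched against threads in the stack traces that actually hold and wait on the locks involved.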

      Attachments

        1. deadlock-1-threads
          3.44 MB
          Tim Armstrong
        2. deadlock-2-threads
          1.07 MB
          Tim Armstrong
        3. impala-server-lock-order.dot
          1.0 kB
          Tim Armstrong
        4. impala-server-lock-order.pdf
          19 kB
          Tim Armstrong

          People

            Assignee: Sailesh Mukil
            Reporter: Tim Armstrong