IMPALA-412

Impala might hang when an impalad dies during query execution

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.0
    • Fix Version/s: Impala 1.1
    • Component/s: None
    • Labels:
      None

      Description

      Here's the detailed description of what could lead to a hang.

      A query starts and the client begins fetching. The fetch blocks because the Impala coordinator is waiting (in DataStreamManager) for data to arrive from its child fragments. While it is blocked, the fetch call holds the lock on exec_state.

      The wait for data arrival cannot detect whether the child fragment instances are still running and healthy. It waits until either it is cancelled or some data arrives.
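
      To make the gap concrete, here is a minimal Python sketch of a receive loop that wakes up periodically and checks whether the sender is still alive, instead of waiting only for data or cancellation. The names Receiver, SenderDied, sender_is_alive and poll_interval_s are hypothetical and chosen for illustration; this is not Impala's DataStreamManager, and it is not necessarily how the issue was fixed in Impala 1.1.

```python
import threading
from collections import deque


class SenderDied(Exception):
    """Raised when the receiver gives up because the sender looks dead."""


class Receiver:
    """Hypothetical stand-in for a stream receiver (illustration only)."""

    def __init__(self, sender_is_alive, poll_interval_s=0.05):
        self._cond = threading.Condition()
        self._batches = deque()
        self._cancelled = False
        self._sender_is_alive = sender_is_alive  # callable returning bool (assumed)
        self._poll_interval_s = poll_interval_s

    def add_batch(self, batch):
        with self._cond:
            self._batches.append(batch)
            self._cond.notify_all()

    def cancel(self):
        with self._cond:
            self._cancelled = True
            self._cond.notify_all()

    def get_batch(self):
        """Block until data arrives, the wait is cancelled, or the sender dies."""
        with self._cond:
            while True:
                if self._batches:
                    # Deliver data that was queued before any failure.
                    return self._batches.popleft()
                if self._cancelled:
                    return None
                if not self._sender_is_alive():
                    raise SenderDied("sender is no longer alive")
                # Wake up periodically instead of sleeping until notified.
                self._cond.wait(timeout=self._poll_interval_s)
```

      With a bounded wait like this, a dead sender turns into an error on the waiting side after at most one poll interval, so the waiter does not depend on an external cancellation getting through.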

      Now suppose all the child fragment instances are dead because their nodes have died. The coordinator node is still running and waiting for data. The statestore detects the node failure and tries to issue a query cancellation. However, the cancellation cannot go through because the fetch call (FetchInternal) is holding the exec_state lock: CancelInternal() can't proceed because GetQueryExecState() can't acquire the exec_state lock.
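
      The interaction between the blocked fetch and the cancellation can be reproduced in a few lines. The Python sketch below is a simplified model, not Impala code: ExecStateSketch, fetch_holding_lock and try_cancel are hypothetical names, and the cancel path uses a timeout only so that the demonstration terminates instead of hanging.

```python
import threading


class ExecStateSketch:
    """Hypothetical model of per-query state guarded by one lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.data_arrived = threading.Event()
        self.cancelled = threading.Event()


def fetch_holding_lock(state, started):
    """Models a fetch that blocks on data arrival while holding the lock."""
    with state.lock:
        started.set()
        # If every sender is dead, nothing ever sets this event.
        state.data_arrived.wait()


def try_cancel(state, timeout_s):
    """Models a cancel path that needs the same lock before it can act."""
    if not state.lock.acquire(timeout=timeout_s):
        return False  # the fetch still holds the lock, so cancel cannot run
    try:
        state.cancelled.set()
        state.data_arrived.set()  # would wake the waiter, if we ever got here
        return True
    finally:
        state.lock.release()


if __name__ == "__main__":
    state = ExecStateSketch()
    started = threading.Event()
    worker = threading.Thread(
        target=fetch_holding_lock, args=(state, started), daemon=True
    )
    worker.start()
    started.wait()
    print("cancel succeeded:", try_cancel(state, timeout_s=0.1))
    state.data_arrived.set()  # unblock the worker so the demo can exit
    worker.join()
```

      The only thing that can wake the fetch is the cancellation, and the cancellation needs the lock that the fetch is holding. Without the timeout used in this sketch, the cancel path would wait indefinitely.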

      GetQueryExecState() is blocked on the exec_state lock while holding query_exec_state_map_lock_. This causes the webserver to hang as well, because the webserver needs query_exec_state_map_lock_ to see which queries are still alive.
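
      One common way to keep a stuck query from blocking unrelated readers of the map is to hold the map lock only for the lookup and to take the per-query lock after the map lock has been released. The Python sketch below illustrates that ordering under stated assumptions: Registry, register, lookup and list_queries are hypothetical names, and this is not a description of the actual fix.

```python
import threading


class Registry:
    """Hypothetical query registry: one map lock plus one lock per query."""

    def __init__(self):
        self._map_lock = threading.Lock()
        self._states = {}  # query_id -> per-query lock

    def register(self, query_id):
        with self._map_lock:
            self._states[query_id] = threading.Lock()

    def lookup(self, query_id):
        """Hold the map lock only long enough to find the entry."""
        with self._map_lock:
            return self._states.get(query_id)

    def list_queries(self, timeout_s):
        """Snapshot of known query ids, as a status page might need."""
        if not self._map_lock.acquire(timeout=timeout_s):
            return None  # map lock unavailable
        try:
            return sorted(self._states)
        finally:
            self._map_lock.release()


if __name__ == "__main__":
    registry = Registry()
    registry.register("q1")

    # Simulate a fetch that is stuck while holding the per-query lock.
    registry.lookup("q1").acquire()

    # A cancel attempt finds the entry, then waits on the per-query lock
    # WITHOUT holding the map lock.
    per_query_lock = registry.lookup("q1")
    print("cancel got lock:", per_query_lock.acquire(timeout=0.05))

    # The listing is unaffected by the stuck query.
    print("queries:", registry.list_queries(timeout_s=0.05))
```

      In a language without garbage collection, the entry returned by the lookup would also need shared ownership (for example a reference-counted pointer) so that it stays valid after the map lock is released.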

              People

              • Assignee:
                skye Skye Wanderman-Milne
                Reporter:
                alan@cloudera.com Alan Choi
