IMPALA-412

Impala might hang when an impalad dies during query execution


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 1.0
    • Impala 1.1
    • None
    • None

    Description

      Here's the detailed description of what could lead to a hang.

      A query starts and the client begins fetching. The fetch blocks because the Impala coordinator is waiting (in DataStreamManager) for data to arrive from its child fragments. While it is blocked, the fetch call holds the exec_state lock.

      The wait for data arrival cannot detect whether the child fragment instances are still running and healthy. It waits until either it is cancelled or some data arrives.
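
      To illustrate the gap described above, here is a minimal sketch of a wait that also gives up when the sender is known to be dead. It is not Impala's DataStreamManager code: StreamReceiver, WaitResult, WaitForData and the sender_alive callback are hypothetical names chosen for this example, and the polling interval is an assumed parameter.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>

// Hypothetical outcome of a wait; not Impala's actual status type.
enum class WaitResult { kDataArrived, kCancelled, kSenderDead };

// Hypothetical receiver, loosely modelled on the description above.
class StreamReceiver {
 public:
  // Blocks until data arrives, the wait is cancelled, or sender_alive()
  // reports false. The liveness check is re-evaluated at least once per
  // poll_interval, so a dead sender cannot block the caller forever.
  WaitResult WaitForData(const std::function<bool()>& sender_alive,
                         std::chrono::milliseconds poll_interval) {
    assert(poll_interval.count() > 0);
    std::unique_lock<std::mutex> l(lock_);
    while (true) {
      if (has_data_) return WaitResult::kDataArrived;
      if (cancelled_) return WaitResult::kCancelled;
      if (!sender_alive()) return WaitResult::kSenderDead;
      // Wakes up on notify_all() or when the interval elapses.
      cv_.wait_for(l, poll_interval);
    }
  }

  // Called by the (hypothetical) sender side when a batch arrives.
  void AddData() {
    {
      std::lock_guard<std::mutex> l(lock_);
      has_data_ = true;
    }
    cv_.notify_all();
  }

  // Called by the cancellation path.
  void Cancel() {
    {
      std::lock_guard<std::mutex> l(lock_);
      cancelled_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  bool has_data_ = false;   // guarded by lock_
  bool cancelled_ = false;  // guarded by lock_
};
```

      In this sketch sender_alive() is called while lock_ is held, so it should be cheap and must not take locks that could be waiting on the receiver. How liveness is actually determined (for example, from statestore membership) is left open here.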

      Now suppose all the child fragment instances are dead because their nodes died. The coordinator node is still running and waiting for data. The statestore detects the node failure and tries to issue a query cancellation. However, the cancellation cannot go through because the fetch call (FetchInternal) is holding the exec_state lock: CancelInternal() can't proceed because GetQueryExecState() can't acquire that lock.

      GetQueryExecState() blocks on the exec_state lock while holding query_exec_state_map_lock_. This in turn causes the webserver to hang, because the webserver needs query_exec_state_map_lock_ to see which queries are still alive.
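
      The lock chain described above (map lock held while waiting for a per-query lock) can be avoided in general by holding the map lock only for the lookup and by making cancellation independent of the per-query lock. The following is a minimal sketch under those assumptions, not the change that resolved this issue: QueryRegistry, QueryExecState, Get, Cancel and NumQueries are hypothetical names, and the atomic cancelled flag is an assumed design.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a per-query exec state.
struct QueryExecState {
  std::mutex lock;                     // held by the fetch path while it waits
  std::atomic<bool> cancelled{false};  // can be set without taking `lock`
};

// Hypothetical registry playing the role of the query exec state map.
class QueryRegistry {
 public:
  void Register(const std::string& id) {
    assert(!id.empty());
    std::lock_guard<std::mutex> l(map_lock_);
    map_[id] = std::make_shared<QueryExecState>();
  }

  // Holds map_lock_ only for the lookup and never waits on the per-query
  // lock, so a blocked fetch cannot stall other users of the map.
  std::shared_ptr<QueryExecState> Get(const std::string& id) {
    std::lock_guard<std::mutex> l(map_lock_);
    auto it = map_.find(id);
    if (it == map_.end()) return nullptr;
    return it->second;
  }

  // Marks the query cancelled without acquiring the per-query lock.
  // Returns false if the query is unknown.
  bool Cancel(const std::string& id) {
    std::shared_ptr<QueryExecState> state = Get(id);  // map lock released on return
    if (state == nullptr) return false;
    state->cancelled.store(true);
    return true;
  }

  // What a status page would need: only the map lock, briefly.
  std::size_t NumQueries() {
    std::lock_guard<std::mutex> l(map_lock_);
    return map_.size();
  }

 private:
  std::mutex map_lock_;
  std::map<std::string, std::shared_ptr<QueryExecState>> map_;
};
```

      With this arrangement, a thread that holds a query's lock for a long time blocks neither Cancel() nor NumQueries(). A blocked waiter would still need to observe the cancelled flag, for example by polling it or by being signalled, which this sketch does not show.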

            People

              skye Skye Wanderman-Milne
              alan@cloudera.com Alan Choi