  1. IMPALA
  2. IMPALA-2990

Coordinator should time out and cancel queries with unresponsive / stuck executors

Details

    Description

      The coordinator currently waits indefinitely if it does not hear back from a backend. As a result, a query can hang forever after a network error or a similar failure.

      We should add logic to determine when a backend is unresponsive and cancel the query. The logic should mostly live in Coordinator::Wait() and Coordinator::UpdateFragmentExecStatus(), and should be based on whether the coordinator receives periodic updates from a backend (sent via FragmentExecState::ReportStatusCb()).
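
One possible shape for the detection logic is sketched below. It is only an illustration under stated assumptions, not Impala code: `BackendLivenessTracker`, `RecordReport`, `MarkDone`, and `FindUnresponsive` are hypothetical names, and the timeout value would have to come from some configuration that this issue does not specify. The idea is that the status-report handler records a timestamp per backend, and the waiting side periodically asks which backends have been silent for longer than the timeout.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of Impala): remembers when each backend last
// reported status so that a coordinator could spot unresponsive ones.
// Not thread-safe; a real coordinator would need to guard it with a lock.
class BackendLivenessTracker {
 public:
  using Clock = std::chrono::steady_clock;

  explicit BackendLivenessTracker(std::chrono::milliseconds timeout)
      : timeout_(timeout) {
    assert(timeout.count() > 0);
  }

  // Intended to be called when a fragment is started on a backend and again
  // each time a status report from that backend arrives.
  void RecordReport(const std::string& backend, Clock::time_point now) {
    last_report_[backend] = now;
  }

  // Stops tracking a backend that has finished, so that its silence is not
  // mistaken for a hang.
  void MarkDone(const std::string& backend) { last_report_.erase(backend); }

  // Returns, in sorted order, the backends whose last report is older than
  // the timeout as of `now`.
  std::vector<std::string> FindUnresponsive(Clock::time_point now) const {
    std::vector<std::string> stuck;
    for (const auto& entry : last_report_) {
      if (now - entry.second > timeout_) stuck.push_back(entry.first);
    }
    std::sort(stuck.begin(), stuck.end());
    return stuck;
  }

 private:
  std::chrono::milliseconds timeout_;
  std::unordered_map<std::string, Clock::time_point> last_report_;
};
```

In this sketch the caller passes the current time in explicitly, which keeps the check deterministic in tests. A wait loop could call `FindUnresponsive(Clock::now())` at a fixed interval and cancel the query with an error naming the returned backends whenever the result is non-empty.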

          People

            twmarshall Thomas Tauber-Marshall
            sailesh Sailesh Mukil
            Votes: 2
            Watchers: 20

            Dates

              Created:
              Updated:
              Resolved:
