  1. IMPALA
  2. IMPALA-9299 Node Blacklisting: Coordinators should blacklist unhealthy nodes
  3. IMPALA-9300

Add a limit on the number of nodes that can be blacklisted per query


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: Backend

    Description

      We currently have no limit on the number of nodes that can be blacklisted if an Exec() RPC fails.

      For data transfer (TransmitData()) RPC failures, we blacklist at most one node per status update (so typically one node per query).

      It would be nice to have an overall limit on the number of nodes a single query can blacklist, to prevent one query from blacklisting a large part of the cluster. This can help guard against intermittent, cluster-wide hardware issues that last only a few seconds. Ideally, the maximum number of blacklistable nodes would be a function of the cluster size (e.g. a query cannot blacklist more than a third of the nodes in the cluster).
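      A minimal C++ sketch of how such a per-query cap could be tracked, assuming the limit is computed once from the cluster size when the query starts and that every failure path (Exec() and TransmitData()) asks the same tracker before blacklisting a node. QueryBlacklistLimiter, TryBlacklist(), and the divisor parameter are hypothetical names for illustration, not existing Impala classes or flags.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical per-query tracker (not Impala's actual API). Caps the number of
// distinct nodes one query may blacklist at cluster_size / divisor, so a
// divisor of 3 means "at most a third of the cluster".
class QueryBlacklistLimiter {
 public:
  QueryBlacklistLimiter(std::size_t cluster_size, std::size_t divisor)
      : max_nodes_(0) {
    assert(divisor > 0);
    // Integer division floors the limit, e.g. 10 nodes / 3 -> 3 nodes.
    max_nodes_ = cluster_size / divisor;
  }

  // Returns true if `node` may be (or already is) blacklisted by this query,
  // false if blacklisting it would exceed the per-query limit.
  bool TryBlacklist(const std::string& node) {
    if (blacklisted_.count(node) > 0) return true;  // Already counted.
    if (blacklisted_.size() >= max_nodes_) return false;
    blacklisted_.insert(node);
    return true;
  }

  std::size_t max_nodes() const { return max_nodes_; }
  std::size_t num_blacklisted() const { return blacklisted_.size(); }

 private:
  std::size_t max_nodes_;
  std::unordered_set<std::string> blacklisted_;
};
```

      With a 10-node cluster and a divisor of 3, the first three distinct failing nodes are blacklisted and a fourth is refused. Note that flooring gives clusters with fewer than three nodes a limit of 0; whether small clusters should get a minimum of one blacklistable node is a design choice this sketch leaves open.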

      It is TBD whether this limit should be configurable.


          People

            Assignee: Unassigned
            Reporter: Sahil Takiar (stakiar)
