IMPALA-9253 (sub-task of IMPALA-9299, Node Blacklisting: Coordinators should blacklist unhealthy nodes)

Blacklist additional posix error codes for failed DataStreamService RPCs


    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Distributed Exec
    • Labels:
      None

      Description

      Filing as a follow-up to IMPALA-9137. IMPALA-9137 blacklists a node if an RPC fails with one of the following POSIX error codes:

      • 107 = ENOTCONN: Transport endpoint is not connected
      • 108 = ESHUTDOWN: Cannot send after transport endpoint shutdown
      • 111 = ECONNREFUSED: Connection refused
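
The check described above can be sketched as a simple lookup. This is only an illustration, not Impala's actual implementation: the names `BLACKLISTABLE_POSIX_CODES`, `should_blacklist`, and `describe` are hypothetical, and the numeric values are the Linux errno values listed in this issue.

```python
# Hypothetical sketch: the three codes come from the list in this issue
# (Linux errno values); the names below are illustrative, not Impala's.
BLACKLISTABLE_POSIX_CODES = {
    107: "ENOTCONN",      # Transport endpoint is not connected
    108: "ESHUTDOWN",     # Cannot send after transport endpoint shutdown
    111: "ECONNREFUSED",  # Connection refused
}


def should_blacklist(posix_code):
    """Return True if an RPC that failed with this code should blacklist the node."""
    return posix_code in BLACKLISTABLE_POSIX_CODES


def describe(posix_code):
    """Return a short, human-readable decision for a failed RPC's error code."""
    name = BLACKLISTABLE_POSIX_CODES.get(posix_code)
    if name is None:
        return f"{posix_code}: do not blacklist"
    return f"{posix_code} ({name}): blacklist"


if __name__ == "__main__":
    for code in (107, 108, 111, 2):
        print(describe(code))
```

Keeping the codes in a single table means that any additional code found later only needs one new entry.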

      These codes were identified by running a query, killing a node that was executing it, and recording the error codes the query failed with.

      There may be other error codes that are worth using for node blacklisting as well. One way to come up with more error codes is to use iptables to introduce network faults between Impala processes and see how RPCs fail.
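
One possible way to set up such an experiment is sketched below. The helper only builds iptables command lines and prints them; it does not run anything. The function name `fault_rule`, the peer address, and the port are all placeholders chosen for illustration, so substitute the address and RPC port of the Impala process under test. Running the printed commands requires root.

```python
# Hypothetical sketch: builds (but does not run) iptables commands that
# inject a network fault for TCP traffic from one peer to one local port.
def fault_rule(action, peer_ip, port, fault):
    """Return an iptables argv list.

    action: "add" appends the rule (-A), "remove" deletes it (-D).
    fault:  "drop" silently discards packets (-j DROP),
            "reset" answers with a TCP RST (-j REJECT --reject-with tcp-reset).
    """
    flag = {"add": "-A", "remove": "-D"}[action]
    target = {
        "drop": ["-j", "DROP"],
        "reset": ["-j", "REJECT", "--reject-with", "tcp-reset"],
    }[fault]
    match = ["-p", "tcp", "-s", peer_ip, "--dport", str(port)]
    return ["iptables", flag, "INPUT"] + match + target


if __name__ == "__main__":
    # Placeholder peer address and port, not values taken from this issue.
    for fault in ("drop", "reset"):
        print(" ".join(fault_rule("add", "10.0.0.2", 27000, fault)))
        print(" ".join(fault_rule("remove", "10.0.0.2", 27000, fault)))
```

Dropping packets and resetting connections are likely to surface different error codes in the failed RPCs, so it may be worth trying each fault separately and noting the code the query fails with.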

            People

            • Assignee: Unassigned
            • Reporter: Sahil Takiar (stakiar)
