IMPALA-9253 (sub-task of IMPALA-9299 Node Blacklisting: Coordinators should blacklist unhealthy nodes)

Blacklist additional posix error codes for failed DataStreamService RPCs

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: Distributed Exec

    Description

      Filing as a follow-up to IMPALA-9137. IMPALA-9137 blacklists a node if an RPC fails with one of the following POSIX error codes:

      • 107 = ENOTCONN: Transport endpoint is not connected
      • 108 = ESHUTDOWN: Cannot send after transport endpoint shutdown
      • 111 = ECONNREFUSED: Connection refused
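
      The check described above amounts to a set-membership test on the error code. A minimal sketch, assuming the Linux errno values quoted in this ticket; the names `BLACKLIST_ERRNO` and `should_blacklist` are hypothetical and are not taken from the Impala code base:

```python
# Linux errno values listed in this ticket (IMPALA-9137).
# The names below are illustrative only, not Impala identifiers.
BLACKLIST_ERRNO = {
    107: "ENOTCONN",      # Transport endpoint is not connected
    108: "ESHUTDOWN",     # Cannot send after transport endpoint shutdown
    111: "ECONNREFUSED",  # Connection refused
}


def should_blacklist(posix_code):
    """Return True if a failed RPC with this POSIX code should blacklist the node."""
    return posix_code in BLACKLIST_ERRNO


print(should_blacklist(111))  # True
print(should_blacklist(110))  # False: 110 is not in the current set
```

      Extending the blacklist under this sketch would only mean adding entries to the table.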

      These codes were produced by running a query, killing a node running that query, and then seeing what error codes the query failed with.

      There may be other error codes that are worth using for node blacklisting as well. One way to come up with more error codes is to use iptables to introduce network faults between Impala processes and see how RPCs fail.
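
      The iptables experiment suggested above could be organized roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, port 27000 is assumed to be the KRPC port of the target impalad, the rules are only built (not executed), and the observed codes are made-up placeholder input rather than measured results.

```python
from collections import Counter

# Codes already handled by IMPALA-9137 (Linux errno values from this ticket).
KNOWN_BLACKLIST_CODES = {107, 108, 111}


def fault_rules(port):
    """Build iptables argument lists that drop or reset inbound TCP traffic to `port`.

    The commands are returned, not run; applying them requires root.
    """
    base = ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", str(port)]
    return {
        "drop": base + ["-j", "DROP"],
        "reset": base + ["-j", "REJECT", "--reject-with", "tcp-reset"],
    }


def candidate_codes(observed):
    """Count observed errno values that are not yet in the blacklist set."""
    return Counter(code for code in observed if code not in KNOWN_BLACKLIST_CODES)


rules = fault_rules(27000)  # assumed KRPC port
print(" ".join(rules["drop"]))

# Placeholder input: errno values one might collect from failed RPCs.
observed = [111, 110, 110, 104]
candidates = candidate_codes(observed)
print(candidates.most_common())
```

      Codes that show up repeatedly in `candidates` across different fault types would be the ones worth evaluating for blacklisting.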

            People

              Assignee: Unassigned
              Reporter: Sahil Takiar (stakiar)
              Votes: 0
              Watchers: 3