Hadoop Common / HADOOP-9229

IPC: Retry on connection reset or socket timeout during SASL negotiation

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.3-alpha, 0.23.7, 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: ipc
    • Labels:
      None

      Description

      When an RPC server is overloaded, incoming connections may not be accepted in time, causing the listen queue to overflow. The impact on the client varies by operating system. On Linux, connections in this state look fully established to the client, but the server has no buffers for them, so any data sent to the server is dropped.

      This is not a problem for protocols in which the client first waits for the server's greeting. Even for client-speaks-first protocols, it is fine if the overload is transient and the connection is accepted before the retransmissions of the dropped packets arrive. Otherwise, the client can hit a socket timeout after several retransmissions. In certain situations, the connection is reset while the client is still waiting for an ACK.

      We have seen this happen to IPC clients during SASL negotiation. Since no call has been sent at that point, we should allow a retry when a connection reset or socket timeout occurs at this stage.
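
      The proposed behavior can be sketched roughly as follows. This is not the Hadoop IPC client code; `NegotiationRetry`, `Handshake`, `isRetriable`, and `negotiateWithRetry` are hypothetical names used only for illustration, and matching on the "Connection reset" message text is a heuristic assumption about how the JDK reports a reset. The sketch retries the connect-and-negotiate step only on a socket timeout or a connection reset, which is safe under the premise above that no call has been sent yet.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: retries a negotiation step
// that fails before any RPC call has been sent.
final class NegotiationRetry {

    /** Stand-in for "connect and run SASL negotiation" (assumed shape). */
    interface Handshake {
        void run() throws IOException;
    }

    /** Only socket timeouts and connection resets are treated as retriable. */
    static boolean isRetriable(IOException e) {
        if (e instanceof SocketTimeoutException) {
            return true;
        }
        if (e instanceof SocketException) {
            // Heuristic: the JDK usually reports a reset via the message text.
            String msg = e.getMessage();
            return msg != null
                && msg.toLowerCase(Locale.ROOT).contains("connection reset");
        }
        return false;
    }

    /** Runs the handshake; returns the number of attempts that were needed. */
    static int negotiateWithRetry(Handshake h, int maxAttempts, long backoffMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                h.run();
                return attempt;
            } catch (IOException e) {
                // Give up on non-retriable errors or when attempts are exhausted.
                if (!isRetriable(e) || attempt == maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new InterruptedIOException("interrupted while backing off");
                }
            }
        }
        throw new AssertionError("unreachable");
    }
}
```

      In a real client, each retry would have to open a fresh connection before negotiating again, and other failures (for example an authentication error) would still propagate immediately, as they do in this sketch.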

            People

            • Assignee:
              Unassigned
              Reporter:
              Kihwal Lee (kihwal)