Hadoop Common
HADOOP-2864

Improve the Scalability and Robustness of IPC


    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.16.0
    • Fix Version/s: None
    • Component/s: ipc
    • Labels:


      This jira is intended to enhance IPC's scalability and robustness.

      Currently an IPC server can easily hang because of a disk failure or garbage collection, during which it cannot respond to clients promptly. This has caused many dropped calls and delayed responses, so many running applications fail on timeout. Conversely, if busy clients send a lot of requests to the server in a short period of time, or too many clients communicate with the server simultaneously, the server may be swamped by requests and become unresponsive.

      The proposed changes aim to:

      1. Provide better client/server coordination.
        • The server should be able to throttle clients during a burst of requests.
        • A slow client should not prevent the server from serving other clients.
        • A temporarily hung server should not cause catastrophic failures on clients.
      2. Let the client and server detect remote-side failures. Examples of failures include: (1) the remote host has crashed; (2) the remote host has crashed and then rebooted; (3) the remote process has crashed or been shut down by an operator.
      3. Ensure fairness. Each client should be able to make progress.
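
      As a rough illustration of goals 1 and 3 only, the sketch below keeps a bounded queue of pending calls per client and serves clients round-robin. It is not taken from the attached design document and is not part of Hadoop's IPC API: the class name PerClientCallQueue, its methods, and the use of plain strings for client ids and calls are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch; names and types are assumptions, not Hadoop IPC code.
class PerClientCallQueue {
    private final int maxPendingPerClient;
    // Insertion-ordered map: client id -> that client's pending calls.
    // Invariant: every queue stored in the map is non-empty.
    private final Map<String, Queue<String>> pending = new LinkedHashMap<>();

    PerClientCallQueue(int maxPendingPerClient) {
        if (maxPendingPerClient < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        this.maxPendingPerClient = maxPendingPerClient;
    }

    // Returns false when the client already has too many pending calls,
    // which is the signal that this client should be throttled.
    synchronized boolean offer(String clientId, String call) {
        Queue<String> q = pending.computeIfAbsent(clientId, k -> new ArrayDeque<>());
        if (q.size() >= maxPendingPerClient) {
            return false;
        }
        q.add(call);
        return true;
    }

    // Takes one call from the client at the head of the rotation, then moves
    // that client to the tail so every client with pending work makes progress.
    // Returns null when nothing is pending.
    synchronized String poll() {
        Iterator<Map.Entry<String, Queue<String>>> it = pending.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        Map.Entry<String, Queue<String>> head = it.next();
        String clientId = head.getKey();
        Queue<String> q = head.getValue();
        it.remove();
        String call = q.poll();
        if (!q.isEmpty()) {
            pending.put(clientId, q); // re-append at the tail of the rotation
        }
        return call;
    }

    public static void main(String[] args) {
        PerClientCallQueue queue = new PerClientCallQueue(2);
        queue.offer("busyClient", "a1");
        queue.offer("busyClient", "a2");
        System.out.println(queue.offer("busyClient", "a3")); // over the limit
        queue.offer("quietClient", "b1");
        for (String c = queue.poll(); c != null; c = queue.poll()) {
            System.out.println(c);
        }
    }
}
```

      With a limit of two pending calls per client, the third call from the busy client is rejected, and the quiet client's call is served between the busy client's two calls instead of waiting behind them. A real server would also need the failure detection described in goal 2, which this sketch does not attempt.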



          Hairong Kuang added a comment -

          Design document is attached.



            • Assignee:
              Hairong Kuang
            • Votes:
              0
            • Watchers:
              4


              • Created: