Hadoop Common: HADOOP-2864

Improve the Scalability and Robustness of IPC


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: None
    • Component/s: ipc
    • Labels: None

    Description

      This JIRA issue aims to improve the scalability and robustness of IPC.

      Currently an IPC server can easily hang because of a disk failure or a long garbage-collection pause, during which it cannot respond to clients promptly. This has caused many dropped calls and delayed responses, so many running applications fail on timeouts. Conversely, if busy clients send many requests to the server in a short period of time, or too many clients communicate with the server simultaneously, the server may be swamped by requests and become unresponsive.

      The proposed changes aim to:

      1. provide better client/server coordination:
        • The server should be able to throttle clients during bursts of requests.
        • A slow client should not prevent the server from serving other clients.
        • A temporarily hung server should not cause catastrophic failures on the clients.
      2. let the client and the server each detect remote-side failures, for example: (1) the remote host crashes; (2) the remote host crashes and then reboots; (3) the remote process crashes or is shut down by an operator.
      3. ensure fairness: every client should be able to make progress.
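      The throttling and fairness goals above could, for example, be approached with a bounded queue between the threads that read calls off the network and the handler threads that execute them, plus a cap on how many calls each client may have outstanding. The Java sketch below is hypothetical: ThrottlingCallQueue, Call, maxPerClient and the timeout parameter are illustrative names, not Hadoop's IPC API, and the per-client cap is an assumption rather than something this proposal specifies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not Hadoop's IPC code): a bounded call queue that
 * throttles bursts globally and caps how many calls each client may have
 * outstanding, so one busy client cannot fill the queue by itself.
 */
final class ThrottlingCallQueue {
    /** A received call tagged with the id of the client that sent it. */
    record Call(String clientId, String payload) {}

    private final BlockingQueue<Call> calls;
    private final int maxPerClient;
    // Calls queued or being handled, per client; guarded by synchronizing on the map.
    private final Map<String, Integer> outstanding = new HashMap<>();

    ThrottlingCallQueue(int capacity, int maxPerClient) {
        this.calls = new ArrayBlockingQueue<>(capacity);
        this.maxPerClient = maxPerClient;
    }

    /**
     * Called by a reader thread. Returns false if the client is over its
     * cap or the queue stays full for timeoutMs; the caller should then
     * back off from that client instead of buffering the call.
     */
    boolean offer(Call call, long timeoutMs) throws InterruptedException {
        synchronized (outstanding) {
            int n = outstanding.getOrDefault(call.clientId(), 0);
            if (n >= maxPerClient) {
                return false; // this client must wait for earlier calls
            }
            outstanding.put(call.clientId(), n + 1);
        }
        boolean queued = calls.offer(call, timeoutMs, TimeUnit.MILLISECONDS);
        if (!queued) {
            release(call.clientId()); // undo the reservation made above
        }
        return queued;
    }

    /** Called by a handler thread; blocks until a call is available. */
    Call take() throws InterruptedException {
        return calls.take();
    }

    /** Called by a handler thread after it has finished processing a call. */
    void complete(Call call) {
        release(call.clientId());
    }

    private void release(String clientId) {
        synchronized (outstanding) {
            // Decrement the count, removing the entry when it reaches zero.
            outstanding.computeIfPresent(clientId, (k, v) -> v > 1 ? v - 1 : null);
        }
    }
}
```

      In this sketch a false return from offer is the throttling signal: a reader might pause reading from that client's connection for a while, so the burst is held back on the client side while calls from other clients continue to be admitted.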

      Attachments

        1. RPCScalabilityDesignWeb.pdf
          85 kB
          Hairong Kuang


            People

              Assignee: Hairong Kuang
              Reporter: Hairong Kuang
              Votes: 0
              Watchers: 5
