  Hadoop Common / HADOOP-3109

RPC should accept connections even when the rpc queue is full (i.e. undo part of HADOOP-2910)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None

      Description

      HADOOP-2910 changed HDFS to stop accepting new connections when the rpc queue is full. It should continue to accept connections and let the OS deal with limiting connections.

      HADOOP-2910's decision to not read from open sockets when the queue is full is exactly right: requests back up
      in the client sockets and the clients will simply wait (especially with HADOOP-2188, which removes client timeouts).
      However, we should continue to accept connections:

      The OS refuses new connections only after a large number of connections are open (this is a configurable parameter). With the HADOOP-2910 patch, we have a new, lower limit on the number of open connections whenever the RPC queue is full.
      The problem is that when there is a surge of requests, we would stop
      accepting connections and clients will get a connection failure (a change from the old behavior).
      If we instead continue to accept connections, it is likely that the surge will be over shortly and the
      clients will get served. Of course, if the surge lasts a long time, the OS will stop accepting connections,
      clients will fail, and there is not much one can do (except raise the OS limit).

      I propose that we continue accepting connections but not read from
      connections when the RPC queue is full (i.e. undo part of the HADOOP-2910 work, back to the old behavior).
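The proposal maps naturally onto NIO interest ops: keep OP_ACCEPT registered unconditionally, and only toggle OP_READ interest according to the state of the call queue. A minimal sketch of that policy (the helper below is invented for illustration and is not the actual Hadoop server code):

```java
import java.nio.channels.SelectionKey;

public class BackPressureSketch {
    // When the RPC call queue is full, drop OP_READ interest so pending
    // requests back up in the client sockets; when it drains, restore it.
    // OP_ACCEPT stays registered either way, so new connections still land.
    static int adjustInterest(int ops, boolean callQueueFull) {
        return callQueueFull ? (ops & ~SelectionKey.OP_READ)
                             : (ops | SelectionKey.OP_READ);
    }

    public static void main(String[] args) {
        int ops = SelectionKey.OP_READ;
        System.out.println(adjustInterest(ops, true));  // read interest cleared
        System.out.println(adjustInterest(0, false));   // read interest restored
    }
}
```

The point of the sketch is that back-pressure lives entirely in the read path; the accept path never refuses a client outright.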

        Issue Links

          Activity

          cutting Doug Cutting added a comment -

          Wouldn't it be easier to increase the socket's backlog size and remove the connect timeout?

          rangadi Raghu Angadi added a comment -

          > It should continue to accept connections and let the OS deal with limiting connections.
          How can the OS limit connections properly if the application keeps accepting them?

          There could be some global limit in the OS, but isn't that very harsh on everything else on the machine? Which parameter is this?

          > The problem is that when there is a surge of requests, we would stop
          > accepting connection and clients will get a connection failed (a change from old behavior).
          The timeout is removed in HADOOP-2188; if it is good for 2188, it is good here too. Ideally we should just have 2188.

          chansler Robert Chansler added a comment -

          0.17? Yes!

          hairong Hairong Kuang added a comment -

          > Wouldn't it be easier to increase the sockets backlog size and remove the connect timeout?
          Increasing the socket backlog size might be a good solution. What would be a good backlog size? The connect timeout was already removed in HADOOP-2910.

          cutting Doug Cutting added a comment -

          > What should be a good backlog size?

          Perhaps this should be proportional to the call queue length? Currently we queue 100 calls per handler with 10 handlers, or 1000 by default. The backlog is currently 128. So setting the backlog to the call queue length would make it 1000 by default. Folks with large clusters increase the number of handlers to 50 or so, so they'd get a backlog of 5000. Does that sound like enough, or should we use a multiple of this?
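The sizing rule in this comment can be sketched as a tiny helper; the constant and method names below are invented for illustration and are not Hadoop's actual configuration API:

```java
public class BacklogSizing {
    // 100 queued calls per handler is the default cited in the comment.
    static final int CALLS_PER_HANDLER = 100;

    // Proposed rule: make the listen backlog equal to the call queue
    // length, i.e. proportional to the number of handler threads.
    static int backlogFor(int handlerCount) {
        return handlerCount * CALLS_PER_HANDLER;
    }

    public static void main(String[] args) {
        System.out.println(backlogFor(10)); // default handler count
        System.out.println(backlogFor(50)); // large-cluster handler count
    }
}
```

With the defaults, this yields a backlog of 1000; with 50 handlers, 5000, matching the figures in the comment.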

          sameerp Sameer Paranjpye added a comment -

          Managing the backlog in a portable way is not easy. Many Linux versions will truncate the backlog to 128 (silently) if it is set higher, for example.

          hairong Hairong Kuang added a comment -

          That explains why more than 7 connect requests got served when I set the backlog length to 1. It looks like Linux does not honor the backlog parameter as given.

          hairong Hairong Kuang added a comment -

          So what I plan to do is to have a thread that only accepts new connections and a thread that reads from accepted connections. Each thread has its own selector. The accepting thread notifies the reading thread of newly accepted connections through a pipe.
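That design — an accept thread and a reader thread with separate selectors, hand-off through a pipe — can be sketched roughly as follows. The class, queue, and method names are invented for illustration; this is not the actual patch:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class AcceptReaderDemo {

    // Accept one connection in a dedicated thread, hand it to the reading
    // side through a queue, and wake the reader's selector via a pipe.
    static String relay() throws Exception {
        Queue<SocketChannel> handoff = new ConcurrentLinkedQueue<>();
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);

        // The reader has its own selector; only the pipe is registered here.
        Selector readSelector = Selector.open();
        pipe.source().register(readSelector, SelectionKey.OP_READ);

        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));

        // Accept thread: always accepts, then notifies the reader.
        Thread acceptor = new Thread(() -> {
            try {
                handoff.add(server.accept());
                pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        acceptor.start();

        // A client connects and sends a message.
        SocketChannel client = SocketChannel.open(server.getLocalAddress());
        client.write(ByteBuffer.wrap("hello".getBytes()));
        client.close();

        // Reader side: the selector wakes when the pipe becomes readable.
        readSelector.select();
        pipe.source().read(ByteBuffer.allocate(1)); // drain the wakeup byte
        SocketChannel accepted = handoff.poll();
        ByteBuffer buf = ByteBuffer.allocate(16);
        while (accepted.read(buf) >= 0) { } // read until the client closes
        acceptor.join();
        buf.flip();
        return new String(buf.array(), 0, buf.limit());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(relay());
    }
}
```

The pipe is what lets the two selectors stay independent: the accept loop never blocks on the reader, and the reader never has to poll the hand-off queue.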

          hairong Hairong Kuang added a comment -

          I am marking this jira to be resolved in 0.18. I will revert the patch to HADOOP-2910.

          cutting Doug Cutting added a comment -

          > Many Linux versions will truncate the backlog to 128

          The default max is 128, but it's easy to increase this:

          sudo sysctl -w net.core.somaxconn=2048

          That doesn't seem too onerous.

          We do need to limit the number of accepted connections to substantially less than the file handle limit. Increasing the listen queue length is a cheap way to get headroom beyond this.

          chansler Robert Chansler added a comment -

          This was incorporated into the ultimate resolution for 2910.
          There is no independent patch or change.


            People

            • Assignee:
              hairong Hairong Kuang
            • Reporter:
              sanjay.radia Sanjay Radia
            • Votes:
              0
            • Watchers:
              1
