Hadoop Common
  1. Hadoop Common
  2. HADOOP-9107

Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0, 2.0.2-alpha
    • Fix Version/s: None
    • Component/s: ipc
    • Labels:
      None

      Description

      This code in Client.java looks fishy:

        public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest,
            ConnectionId remoteId) throws InterruptedException, IOException {
          Call call = new Call(rpcKind, rpcRequest);
          Connection connection = getConnection(remoteId, call);
          connection.sendParam(call);                 // send the parameter
          boolean interrupted = false;
          synchronized (call) {
            while (!call.done) {
              try {
                call.wait();                           // wait for the result
              } catch (InterruptedException ie) {
                // save the fact that we were interrupted
                interrupted = true;
              }
            }
      
            if (interrupted) {
              // set the interrupt flag now that we are done waiting
              Thread.currentThread().interrupt();
            }
      
            if (call.error != null) {
              if (call.error instanceof RemoteException) {
                call.error.fillInStackTrace();
                throw call.error;
              } else { // local exception
                InetSocketAddress address = connection.getRemoteAddress();
                throw NetUtils.wrapException(address.getHostName(),
                        address.getPort(),
                        NetUtils.getHostname(),
                        0,
                        call.error);
              }
            } else {
              return call.getRpcResult();
            }
          }
        }
      

      Blocking calls are expected to throw InterruptedException if that is interrupted. Also it seems like this method waits on the call objects even if it is interrupted. Currently, this method does not throw an InterruptedException, nor is it documented that this method interrupts the thread calling it. If it is interrupted, this method should still throw InterruptedException, it should not matter if the call was successful or not.

      This is a major issue for clients which do not call this directly, but call HDFS client API methods to write to HDFS, which may be interrupted by the client due to timeouts, but does not throw InterruptedException. Any HDFS client calls can interrupt the thread but it is not documented anywhere.

        Issue Links

          Activity

          Gavin made changes -
          Link This issue is related to HADOOP-6762 [ HADOOP-6762 ]
          Gavin made changes -
          Link This issue is related to HADOOP-6762 [ HADOOP-6762 ]
          Eli Collins made changes -
          Link This issue is related too HADOOP-6762 [ HADOOP-6762 ]
          Hide
          Hari Shreedharan added a comment -

          Karthik, Steve - makes complete sense.

          Show
          Hari Shreedharan added a comment - Karthik, Steve - makes complete sense.
          Hide
          Steve Loughran added a comment -

          given the RPC stack is still around, +1 to making it interruptible -it hurts external clients the most.

          and +1 to both fixes -they should all go together

          Show
          Steve Loughran added a comment - given the RPC stack is still around, +1 to making it interruptible -it hurts external clients the most. and +1 to both fixes -they should all go together
          Hide
          Karthik Kambatla added a comment -

          From HADOOP-6221:

          I think a good tactic would be rather than trying to make the old RPC stack interruptible, focus on making Avro something that you can interrupt, so that going forward you can interrupt client programs trying to talk to unresponsive servers.

          Steve, is there a reason for not making the old RPC stack interruptible?

          I feel we should do both - what Hari is proposing here, and what HADOOP-6221 addresses.

          Show
          Karthik Kambatla added a comment - From HADOOP-6221 : I think a good tactic would be rather than trying to make the old RPC stack interruptible, focus on making Avro something that you can interrupt, so that going forward you can interrupt client programs trying to talk to unresponsive servers. Steve, is there a reason for not making the old RPC stack interruptible? I feel we should do both - what Hari is proposing here, and what HADOOP-6221 addresses.
          Hide
          Hari Shreedharan added a comment -

          I agree that both are pretty similar, but I think we still need to do the cleanup I am proposing here right?

          Show
          Hari Shreedharan added a comment - I agree that both are pretty similar, but I think we still need to do the cleanup I am proposing here right?
          Steve Loughran made changes -
          Link This issue relates to HADOOP-6221 [ HADOOP-6221 ]
          Hide
          Steve Loughran added a comment -

          This is similar to HADOOP-6221, though you are proposing more cleanup.

          Could you use that patch and test as a starting point?

          Show
          Steve Loughran added a comment - This is similar to HADOOP-6221 , though you are proposing more cleanup. Could you use that patch and test as a starting point?
          Hide
          Hari Shreedharan added a comment -

          My take on what should really happen in the catch block:

          • call.setException()
          • Remove call from the calls table.
          • In the receiveResponse method, check if calls.get(callId) returns null before proceeding.
          • throw the InterruptedException (or wrap it and then throw), so client code can know something went wrong and the call failed.
          Show
          Hari Shreedharan added a comment - My take on what should really happen in the catch block: call.setException() Remove call from the calls table. In the receiveResponse method, check if calls.get(callId) returns null before proceeding. throw the InterruptedException (or wrap it and then throw), so client code can know something went wrong and the call failed.
          Hide
          Hari Shreedharan added a comment -

          To ensure that the real client that calls this should know that the call was interrupted, rather than forcing it to check the thread's interrupt flag.

          Show
          Hari Shreedharan added a comment - To ensure that the real client that calls this should know that the call was interrupted, rather than forcing it to check the thread's interrupt flag.
          Hide
          Hari Shreedharan added a comment -

          (1) is insufficient since clients often do not directly call this method. I believe that if this method gets interrupted:

          • Clean up the call object - seems like some clean up is required in the Connection object.
          • throw InterruptedException, regardless of whether the calls complete successfully or not.
          Show
          Hari Shreedharan added a comment - (1) is insufficient since clients often do not directly call this method. I believe that if this method gets interrupted: Clean up the call object - seems like some clean up is required in the Connection object. throw InterruptedException, regardless of whether the calls complete successfully or not.
          Hide
          Karthik Kambatla added a comment -

          The things to fix look like:

          1. document that the method eats up InterruptedException
          2. break after setting interrupted to true in the catch block
          3. throw appropriate exception in the else branch of if (call.error != null)
          Show
          Karthik Kambatla added a comment - The things to fix look like: document that the method eats up InterruptedException break after setting interrupted to true in the catch block throw appropriate exception in the else branch of if (call.error != null)
          Hari Shreedharan made changes -
          Link This issue relates to FLUME-1748 [ FLUME-1748 ]
          Hari Shreedharan made changes -
          Field Original Value New Value
          Affects Version/s 1.1.0 [ 12316501 ]
          Hari Shreedharan created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Hari Shreedharan
            • Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:

                Development