Hadoop Common / HADOOP-10412

First call from Client fails after Server restart


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: ipc
    • Labels: None
    • Environment: Linux (host centos62-2, kernel 2.6.32-220.el6.x86_64), JDK 1.7.0_15

    Description

      This seems to happen only for ProtobufRpc-based services; I could not reproduce it using the simple WritableRpc engine.

      Steps to reproduce:
      Consider a NameNode HA failover. nn1 and nn2 are both NameNodes; nn1 is 'active' and nn2 is 'standby'.
      1) Bring down the nn1 process. nn2 is now active.
      2) Bring the nn1 process back up. nn1 is now standby and nn2 is active.
      3) Manually issue a failover using the command:

      $ hdfs haadmin -failover nn2 nn1

      The first call always fails with the following exception:

      Operation failed: Failed to become active. Couldn't make NameNode at centos62-2/192.168.2.202:8020 active
      java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "centos62-2/192.168.2.202"; destination host is: "centos62-2":8020;
      at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
      at org.apache.hadoop.ipc.Client.call(Client.java:1351)
      at org.apache.hadoop.ipc.Client.call(Client.java:1300)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
      at com.sun.proxy.$Proxy8.transitionToActive(Unknown Source)
      at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToActive(HAServiceProtocolClientSideTranslatorPB.java:100)
      at org.apache.hadoop.ha.HAServiceProtocolHelper.transitionToActive(HAServiceProtocolHelper.java:48)
      at org.apache.hadoop.ha.ZKFailoverController.becomeActive(ZKFailoverController.java:373)
      at org.apache.hadoop.ha.ZKFailoverController.access$900(ZKFailoverController.java:59)
      at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.becomeActive(ZKFailoverController.java:818)
      at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:803)
      at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
      at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
      at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
      Caused by: java.io.EOFException
      at java.io.DataInputStream.readInt(DataInputStream.java:392)
      at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
      at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)

      at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:673)
      at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:59)
      at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:592)
      at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:589)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
      at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:589)
      at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94)
      at org.apache.hadoop.ha.protocolPB.ZKFCProtocolServerSideTranslatorPB.gracefulFailover(ZKFCProtocolServerSideTranslatorPB.java:61)
      at org.apache.hadoop.ha.proto.ZKFCProtocolProtos$ZKFCProtocolService$2.callBlockingMethod(ZKFCProtocolProtos.java:1548)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
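The root cause in the trace, java.io.EOFException thrown from DataInputStream.readInt inside Client$Connection.receiveRpcResponse, means the client's read reached end-of-stream before four bytes arrived; on a TCP socket this typically means the peer had already closed its end of the connection. The minimal Java sketch below reproduces only that symptom on loopback. It assumes a server that accepts a connection and immediately closes it is a fair stand-in for the restarted process, and the class ClosedPeerEofDemo is hypothetical, not part of Hadoop.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of Hadoop.
class ClosedPeerEofDemo {

    /**
     * Connects to a loopback server that accepts the connection and closes it
     * without writing anything (standing in for a restarted server), then
     * calls readInt() the way receiveRpcResponse does in the trace.
     * Returns true if readInt() failed with EOFException.
     */
    static boolean readIntFromClosedPeerThrowsEof() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            // The server closes its side; the client then sees end-of-stream.
            server.accept().close();
            DataInputStream in = new DataInputStream(client.getInputStream());
            try {
                in.readInt(); // needs 4 bytes, but the stream ends first
                return false;
            } catch (EOFException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("EOFException from readInt: " + readIntFromClosedPeerThrowsEof());
    }
}
```

This only shows why the exception type is EOFException; it does not establish why the first failover call lands on a closed connection.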

      The call succeeds if I issue the same command again.
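Because repeating the command succeeds, one client-side mitigation is to retry a call exactly once when it fails with an IOException whose cause chain contains an EOFException. The sketch below is a hypothetical helper (RetryOnceOnEof is not a Hadoop API); it assumes the retried operation is safe to repeat and that the second attempt gets a usable connection, and it does not address the underlying bug.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side workaround, not a Hadoop API.
final class RetryOnceOnEof {

    private RetryOnceOnEof() {}

    /** Returns true if t or any of its causes is an EOFException. */
    static boolean causedByEof(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof EOFException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the call; if it fails with an IOException caused by EOFException
     * (as in "Failed on local exception: java.io.EOFException"), runs it exactly
     * once more. Any other failure, or a failure on the retry, propagates.
     */
    static <T> T call(Callable<T> rpc) throws Exception {
        try {
            return rpc.call();
        } catch (IOException e) {
            if (!causedByEof(e)) {
                throw e;
            }
            return rpc.call(); // single retry, assumed to reach a live connection
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        // Simulated RPC: the first attempt fails like the trace above, the second succeeds.
        String result = call(() -> {
            if (attempts.incrementAndGet() == 1) {
                throw new IOException("Failed on local exception", new EOFException());
            }
            return "transitioned";
        });
        System.out.println(result + " after " + attempts.get() + " attempts");
    }
}
```

Running main prints "transitioned after 2 attempts". Limiting the retry to the EOF case keeps unrelated failures, such as a refused connection, from being silently repeated.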

    People

      Assignee: Unassigned
      Reporter: Arun Suresh (asuresh)
      Votes: 0
      Watchers: 4
