Hadoop HDFS / HDFS-13891 HDFS RBF stabilization phase I / HDFS-14161

RBF: Throw StandbyException instead of IOException so that client can retry when can not get connection


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1, 2.9.2, 3.0.3
    • Fix Version/s: 3.3.0, HDFS-13891
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The Hive client may hang when it receives an IOException from the Router. The stack trace follows:

      Exception in thread "Thread-150" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a connection to bigdata-nn20.g01:8020
      	at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
      	at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
      	at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
      	at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
      
      	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:554)
      	at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:74)
      Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Cannot get a connection to bigdata-nn20.g01:8020
      	at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.getConnection(RouterRpcClient.java:262)
      	at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:380)
      	at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:752)
      	at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
      
      	at org.apache.hadoop.ipc.Client.call(Client.java:1503)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1441)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
      	at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:775)
      	at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:253)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
      	at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
      	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2111)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1390)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1386)
      	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1402)
      	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1494)
      	at org.apache.hadoop.hive.ql.session.SessionState.createPath(SessionState.java:719)
      	at org.apache.hadoop.hive.ql.session.SessionState.createTmpTableSpaceDir(SessionState.java:635)
      	at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:613)
      	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:540)
      	... 1 more
      

      If the Router throws a RetriableException when it cannot get a connection, and the client sets dfs.client.retry.policy.enabled to true, this problem can be resolved.
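      The idea can be illustrated with a small, dependency-free Java sketch. It is not the RouterRpcClient code: RetriableIOException is a hypothetical stand-in for Hadoop's org.apache.hadoop.ipc.RetriableException (or StandbyException), and getConnection and callWithRetry are hypothetical helpers. In real HDFS the client-side retry is performed by Hadoop's RetryInvocationHandler (visible in the stack trace above) when dfs.client.retry.policy.enabled is true; the loop below only mimics that behaviour.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: the Router reports "cannot get a connection" as a
 * retriable failure instead of a plain IOException, and a client-side loop
 * retries only that kind of failure.
 */
public class RouterRetrySketch {

  /** Hypothetical stand-in for org.apache.hadoop.ipc.RetriableException / StandbyException. */
  static class RetriableIOException extends IOException {
    RetriableIOException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical remote call that may fail with an IOException. */
  interface Call<T> {
    T run() throws IOException;
  }

  /** Router side (hypothetical): report a connection failure as retriable. */
  static String getConnection(String namenode, boolean poolExhausted) throws IOException {
    if (poolExhausted) {
      // Previously a plain IOException was thrown here, which the client does not retry.
      throw new RetriableIOException("Cannot get a connection to " + namenode);
    }
    return "connection-to-" + namenode;
  }

  /** Client side (hypothetical): retry retriable failures up to maxRetries, rethrow anything else. */
  static <T> T callWithRetry(Call<T> call, int maxRetries) throws IOException {
    int attempt = 0;
    while (true) {
      try {
        return call.run();
      } catch (RetriableIOException e) {
        if (++attempt > maxRetries) {
          throw e; // give up after maxRetries retries
        }
        // A real retry policy would sleep with backoff before the next attempt.
      }
    }
  }

  public static void main(String[] args) throws IOException {
    AtomicInteger calls = new AtomicInteger();
    // The first two attempts find the connection pool exhausted; the third succeeds.
    String conn = callWithRetry(
        () -> getConnection("bigdata-nn20.g01:8020", calls.incrementAndGet() < 3), 5);
    System.out.println(conn + " after " + calls.get() + " attempts");
  }
}
```

      Running main prints "connection-to-bigdata-nn20.g01:8020 after 3 attempts". Only the retriable exception type is retried; a plain IOException propagates to the caller on the first attempt, which is why the exception type thrown by the Router matters.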
