Project: Hadoop YARN
Issue: YARN-2184

ResourceManager may fail due to name node in safe mode


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

    Description

      If the history service is enabled in the ResourceManager, it tries to create its directory (mkdir) when the service is initialized. At that point the NameNode may still be in safe mode, which causes the history service to fail and, in turn, the ResourceManager to fail. This is very likely when the whole cluster is restarted, because the NameNode can then stay in safe mode for a long time.

      Here are the error logs:

      Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /Users/jzhang/Java/lib/hadoop-2.4.0/logs/yarn/system/history/ApplicationHistoryDataRoot. Name node is in safe mode.
      The reported blocks 85 has reached the threshold 0.9990 of total blocks 85. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 19 seconds.
          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1195)
          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3564)
          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3540)
          at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:754)
          at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
          at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
          at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
          at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:415)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
      
          at org.apache.hadoop.ipc.Client.call(Client.java:1410)
          at org.apache.hadoop.ipc.Client.call(Client.java:1363)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
          at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:606)
          at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
          at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
          at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
          at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:500)
          at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2553)
          at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2524)
          at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827)
          at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:823)
          at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
          at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:823)
          at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:816)
          at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815)
          at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.serviceInit(FileSystemApplicationHistoryStore.java:120)
          at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
          ... 10 more
      2014-06-20 11:06:25,220 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
      /************************************************************
      SHUTDOWN_MSG: Shutting down ResourceManager at jzhangMBPr.local/192.168.100.152
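
      One possible mitigation, sketched under assumptions: retry the directory creation while the NameNode reports safe mode instead of failing service init on the first attempt. This is not the change made for this issue (it was resolved as a duplicate) and it is not existing YARN code. `SafeModeRetrySketch`, `DirCreator`, `SafeModeLikeException`, and `createWithRetry` are hypothetical names, and the NameNode is simulated so the example runs without a cluster. In real code the HDFS `SafeModeException` arrives wrapped in a `RemoteException`, as in the trace above, so the catch clause would need to unwrap it.

      ```java
      import java.io.IOException;
      import java.util.concurrent.atomic.AtomicInteger;

      public class SafeModeRetrySketch {

          /** Hypothetical stand-in for the history store's mkdirs step. */
          @FunctionalInterface
          public interface DirCreator {
              void mkdirs() throws IOException;
          }

          /** Hypothetical stand-in for the safe mode error reported by the NameNode. */
          public static class SafeModeLikeException extends IOException {
              public SafeModeLikeException(String message) {
                  super(message);
              }
          }

          /**
           * Calls creator.mkdirs() until it succeeds or maxAttempts is reached.
           * Only safe mode errors are retried; any other IOException propagates at once.
           * Returns the number of attempts that were needed.
           */
          public static int createWithRetry(DirCreator creator, int maxAttempts, long sleepMillis)
                  throws IOException, InterruptedException {
              if (maxAttempts < 1) {
                  throw new IllegalArgumentException("maxAttempts must be at least 1");
              }
              for (int attempt = 1; ; attempt++) {
                  try {
                      creator.mkdirs();
                      return attempt;
                  } catch (SafeModeLikeException e) {
                      if (attempt >= maxAttempts) {
                          throw e; // give up and let service init fail as before
                      }
                      Thread.sleep(sleepMillis); // wait for safe mode to end
                  }
              }
          }

          public static void main(String[] args) throws Exception {
              AtomicInteger calls = new AtomicInteger();
              // Simulated NameNode: in safe mode for the first two calls only.
              DirCreator simulated = () -> {
                  if (calls.incrementAndGet() <= 2) {
                      throw new SafeModeLikeException("Name node is in safe mode.");
                  }
              };
              int attempts = createWithRetry(simulated, 5, 10L);
              System.out.println("directory created after " + attempts + " attempt(s)");
          }
      }
      ```

      A bounded number of attempts keeps a NameNode that never leaves safe mode from hanging ResourceManager startup indefinitely; the attempt count and sleep interval here are illustrative values, not tuned recommendations.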
      

          People

            Assignee: Jeff Zhang (zjffdu)
            Reporter: Jeff Zhang (zjffdu)
            Votes: 0
            Watchers: 5