Hadoop YARN / YARN-9521

RM failed to start due to system services


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.2
    • Fix Version/s: 3.3.0
    • Component/s: None
    • Labels:

      Description

      When starting the ResourceManager (RM), listing the system services directory failed as follows:

      2019-04-30 17:18:25,441 INFO  client.SystemServiceManagerImpl (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory is configured to /services
      2019-04-30 17:18:25,467 INFO  client.SystemServiceManagerImpl (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation initialized to yarn (auth:SIMPLE)
      2019-04-30 17:18:25,467 INFO  service.AbstractService (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in state STARTED
      org.apache.hadoop.service.ServiceStateException: java.io.IOException: Filesystem closed
              at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
              at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
              at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869)
              at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316)
              at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501)
      Caused by: java.io.IOException: Filesystem closed
              at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473)
              at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639)
              at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1217)
              at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1233)
              at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1200)
              at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1179)
              at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1175)
              at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
              at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1187)
              at org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.list(SystemServiceManagerImpl.java:375)
              at org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.scanForUserServices(SystemServiceManagerImpl.java:282)
              at org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.serviceStart(SystemServiceManagerImpl.java:126)
              at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
              ... 13 more
      

      This appears to be caused by the use of the FileSystem cache.
      The issue does not occur when I add "fs.hdfs.impl.disable.cache=true" to yarn-site.xml.
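The suspected mechanism can be shown with a small model. Hadoop's FileSystem.get() returns one shared, cached instance per scheme, authority and user. If any component calls close() on that shared instance, every other holder gets "Filesystem closed" on its next call. Setting fs.hdfs.impl.disable.cache=true makes each get() return a fresh instance instead. The sketch below is a stdlib-only Java model of that behavior under those assumptions. FsCacheModel, Client and scenario are hypothetical names, not Hadoop APIs, and the sketch does not identify which RM component closed the shared instance.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a per-(scheme, authority, user) client cache; not Hadoop's FileSystem. */
public class FsCacheModel {

    /** Stand-in for a filesystem client that rejects calls once it has been closed. */
    static final class Client implements Closeable {
        private boolean closed = false;

        // Loosely mirrors the check seen in the stack trace (DFSClient.checkOpen).
        String[] list(String dir) throws IOException {
            if (closed) {
                throw new IOException("Filesystem closed");
            }
            return new String[] {dir + "/example"};
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    private final Map<String, Client> cache = new HashMap<>();
    private final boolean disableCache; // models fs.hdfs.impl.disable.cache

    FsCacheModel(boolean disableCache) {
        this.disableCache = disableCache;
    }

    /** Returns the shared instance for this key, or a fresh one when caching is disabled. */
    Client get(String scheme, String authority, String user) {
        if (disableCache) {
            return new Client();
        }
        String key = scheme + "://" + authority + "#" + user;
        return cache.computeIfAbsent(key, k -> new Client());
    }

    /** Component A closes "its" client, then component B lists a directory. */
    static String scenario(boolean disableCache) {
        FsCacheModel fs = new FsCacheModel(disableCache);
        Client a = fs.get("hdfs", "nn:8020", "yarn");
        Client b = fs.get("hdfs", "nn:8020", "yarn");
        a.close();
        try {
            b.list("/services");
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("cache enabled:  " + scenario(false)); // Filesystem closed
        System.out.println("cache disabled: " + scenario(true));  // ok
    }
}
```

With the cache enabled, both callers share one client, so closing it in one place breaks the other. With the cache disabled, each caller owns its client. In real Hadoop code, the comparable options are to obtain a private instance with FileSystem.newInstance() or to avoid closing an instance obtained from FileSystem.get(). This report does not say which approach the attached patches take.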

        Attachments

        1. YARN-9521.001.patch (2 kB, kyungwan nam)
        2. YARN-9521.002.patch (3 kB, kyungwan nam)
        3. YARN-9521.003.patch (6 kB, kyungwan nam)
        4. YARN-9521.004.patch (6 kB, kyungwan nam)


              People

              • Assignee: kyungwan nam
              • Reporter: kyungwan nam
              • Votes: 0
              • Watchers: 7
