Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-128 [Umbrella] RM Restart Phase 1: State storage and non-work-preserving recovery
  3. YARN-1405

RM hangs on shutdown if calling system.exit in serviceInit or serviceStart

    XMLWordPrintableJSON

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Enable yarn.resourcemanager.recovery.enabled=true and Pass a local path to yarn.resourcemanager.fs.state-store.uri. such as "file:///tmp/MYTMP"

      if the directory /tmp/MYTMP is not readable or writable, RM should crash and should print "Permission denied Error"

      Currently, RM throws "java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist" Error. RM returns Exiting status 1 but RM process does not shutdown.

      Snapshot of Resource manager log:

      2013-09-27 18:31:36,621 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for nm-tokens
      2013-09-27 18:31:36,694 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(640)) - Failed to load/recover state
      java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist
      at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:379)
      at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1478)
      at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1518)
      at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564)
      at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:188)
      at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:112)
      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:635)
      at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855)
      2013-09-27 18:31:36,697 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1

        Attachments

        1. YARN-1405.1.patch
          5 kB
          Jian He
        2. rm-threaddump.out
          17 kB
          Jian He

          Activity

            People

            • Assignee:
              jianhe Jian He
              Reporter:
              yeshavora Yesha Vora
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: