Hadoop YARN / YARN-3934

Application with large ApplicationSubmissionContext can cause RM to exit when ZK store is used

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: resourcemanager

    Description

      To reproduce:

      1. Configure ZK as the RM HA state store.
      2. Submit a job that references many distributed cache files with long HDFS paths, so that the serialized app state exceeds ZK's maximum object size.
      3. The RM fails to write to ZK and exits with the following exception.

      2015-07-10 22:21:13,002 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
      org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
              at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
              at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
              at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
              at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:944)
              at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:941)
              at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1083)
      

      Instead of exiting, the RM could reject the app during the submitApplication RPC when the ApplicationSubmissionContext is too large.
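
      A minimal sketch of such an early size check follows. It is not the attached patch and not existing YARN code: the class `SubmissionSizeGuard`, its methods, and the stand-in serializer are hypothetical names used for illustration only. The limit assumes ZooKeeper's default `jute.maxbuffer` of 0xfffff bytes; a real check would need to read the configured value and measure the actual serialized app state.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical guard illustrating an early size check; not part of YARN. */
final class SubmissionSizeGuard {
    // Assumption: ZooKeeper's default jute.maxbuffer (0xfffff bytes, just under 1 MB).
    static final int DEFAULT_ZK_MAX_BYTES = 0xfffff;

    private SubmissionSizeGuard() {}

    /** Returns true when a payload of the given size fits under the limit. */
    static boolean fitsInStore(int serializedSize, int maxBytes) {
        return serializedSize <= maxBytes;
    }

    /** Throws before any store write is attempted if the payload is too large. */
    static void checkSize(byte[] serializedContext, int maxBytes) {
        if (!fitsInStore(serializedContext.length, maxBytes)) {
            throw new IllegalArgumentException(
                "Serialized submission context is " + serializedContext.length
                + " bytes, which exceeds the state store limit of " + maxBytes + " bytes");
        }
    }

    /** Stand-in for real serialization: joins the cache paths into one byte array. */
    static byte[] fakeSerialize(List<String> cachePaths) {
        return String.join("\n", cachePaths).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate a job that refers to many distributed cache files with long paths.
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            paths.add("hdfs://nn/user/example/some/long/directory/name/cache-file-" + i + ".jar");
        }
        byte[] state = fakeSerialize(paths);
        try {
            checkSize(state, DEFAULT_ZK_MAX_BYTES);
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            // The submitter gets an error; the process keeps running.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      Failing the single submission with an exception, rather than letting the store write fail later, keeps the error local to the offending app instead of raising a fatal STATE_STORE_OP_FAILED event.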

      Attachments

        1. YARN-3934-1.patch
          6 kB
          Dustin Cote

          People

            Assignee: cotedm Dustin Cote
            Reporter: mingma Ming Ma
            Votes: 0
            Watchers: 10

            Dates

              Created:
              Updated: