Hadoop YARN / YARN-5355 YARN Timeline Service v.2: alpha 2 / YARN-6555

Store application flow context in NM state store for work-preserving restart

Details

    Description

      If timeline service v2 is enabled and the NM is restarted with recovery enabled, the NM fails to start with the error "flow context cannot be null".

      This happens because the flow context did not exist before, but now that timeline service v2 is enabled, ApplicationImpl expects it to exist.

      The same failure would occur even if the flow context had existed before: because it is neither persisted nor read back during ContainerManagerImpl#recoverApplication, it is never passed to ApplicationImpl.
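The failure mode described above can be illustrated with a small, self-contained sketch. It is not the NodeManager code and not the attached patch: `FlowContext`, `StateStore`, `Application`, and `recoverApplication` are hypothetical stand-ins, and the field names are assumptions. It only shows why recovery needs the flow context to be written to the state store and read back before the application object is rebuilt.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these types are the real NodeManager classes. */
class FlowContextRecoverySketch {

    /** Hypothetical stand-in for an application's flow context (field names assumed). */
    static final class FlowContext {
        final String flowName;
        final String flowVersion;
        final long flowRunId;

        FlowContext(String flowName, String flowVersion, long flowRunId) {
            this.flowName = flowName;
            this.flowVersion = flowVersion;
            this.flowRunId = flowRunId;
        }
    }

    /** Hypothetical in-memory stand-in for the NM state store. */
    static final class StateStore {
        private final Map<String, FlowContext> flowContexts = new HashMap<>();

        /** Persists the flow context with the application; null models a record without one. */
        void storeApplication(String appId, FlowContext flowContext) {
            flowContexts.put(appId, flowContext);
        }

        /** Returns the persisted flow context, or null if none was stored. */
        FlowContext loadFlowContext(String appId) {
            return flowContexts.get(appId);
        }
    }

    /** Mirrors the check from the report: with timeline service v2 on, a flow context is required. */
    static final class Application {
        final String appId;
        final FlowContext flowContext;

        Application(String appId, FlowContext flowContext, boolean timelineV2Enabled) {
            if (timelineV2Enabled && flowContext == null) {
                throw new IllegalArgumentException("flow context cannot be null");
            }
            this.appId = appId;
            this.flowContext = flowContext;
        }
    }

    /** Rebuilds an application on restart, passing along whatever the store holds. */
    static Application recoverApplication(StateStore store, String appId, boolean timelineV2Enabled) {
        return new Application(appId, store.loadFlowContext(appId), timelineV2Enabled);
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        // Application whose flow context was persisted before the restart.
        store.storeApplication("app_1", new FlowContext("flow", "1", 1L));
        // Application whose record carries no flow context.
        store.storeApplication("app_2", null);

        Application recovered = recoverApplication(store, "app_1", true);
        System.out.println("recovered " + recovered.appId + " with flow " + recovered.flowContext.flowName);

        try {
            recoverApplication(store, "app_2", true);
        } catch (IllegalArgumentException e) {
            System.out.println("recovery failed: " + e.getMessage());
        }
    }
}
```

In this sketch the second application fails in the same way as the reported NodeManager startup, because nothing was stored for it to read back. How a real fix should treat applications whose records predate the change is a separate design question that the sketch does not answer.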

      Full stack trace:

      2017-05-03 21:51:52,178 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
      java.lang.IllegalArgumentException: flow context cannot be null
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:104)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:90)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
              at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
      

      Attachments

        1. YARN-6555.001.patch
          7 kB
          Rohith Sharma K S
        2. YARN-6555.002.patch
          16 kB
          Rohith Sharma K S
        3. YARN-6555.003.patch
          16 kB
          Rohith Sharma K S

            People

              rohithsharma Rohith Sharma K S
              vrushalic Vrushali C
              Votes: 1
              Watchers: 6
