YARN-6798

Fix NM startup failure with old state store due to version mismatch


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha4
    • Fix Version/s: 2.9.0, 3.0.0-beta1
    • Component/s: nodemanager
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:

      This fixes the LevelDB state store for the NodeManager. As of this patch, the state store versions now correspond to the following table.

      * Previous Patch: YARN-5049
        * LevelDB Key: queued
        * Hadoop Versions: 2.9.0, 3.0.0-alpha1
        * Corresponding LevelDB Version: 1.2
      * Previous Patch: YARN-6127
        * LevelDB Key: AMRMProxy/NextMasterKey
        * Hadoop Versions: 2.9.0, 3.0.0-alpha4
        * Corresponding LevelDB Version: 1.1

    Description

      YARN-6703 rolled back the state store version number for the RM from 2.0 to 1.4.

      YARN-6127 bumped the version for the NM to 3.0:

      private static final Version CURRENT_VERSION_INFO = Version.newInstance(3, 0);

      YARN-5049 bumped the version for the NM to 2.0:

      private static final Version CURRENT_VERSION_INFO = Version.newInstance(2, 0);

      All NMs died on startup after a C6 cluster was upgraded from alpha2 to alpha4:

      2017-07-07 15:48:17,259 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
      org.apache.hadoop.service.ServiceStateException: java.io.IOException: Incompatible version for NM state: expecting NM state version 3.0, but loading version 2.0
              at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
              at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
      Caused by: java.io.IOException: Incompatible version for NM state: expecting NM state version 3.0, but loading version 2.0
              at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
              at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
              at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
              ... 5 more
      2017-07-07 15:48:17,277 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
      /************************************************************
      SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
      ************************************************************/
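The failure in the log above can be reproduced in miniature. The following is a minimal sketch, not the NodeManager's actual code: `NmVersionCheckSketch` and its nested `StateVersion` class are hypothetical stand-ins, and the rule that two versions are compatible only when their major numbers match is an assumption modeled on the error message. Under that assumption, a store written at 2.0 is rejected by code expecting 3.0, while a change in the minor number alone would be accepted.

```java
import java.io.IOException;

// Hypothetical sketch; these names are illustrative and not part of Hadoop.
final class NmVersionCheckSketch {

    /** Stand-in for a (major, minor) state store version. */
    static final class StateVersion {
        final int major;
        final int minor;

        StateVersion(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        /** Assumed rule: compatible when the major numbers are equal. */
        boolean isCompatibleTo(StateVersion other) {
            return this.major == other.major;
        }

        @Override
        public String toString() {
            return major + "." + minor;
        }
    }

    /** Throws if the version found on disk cannot be loaded by the current code. */
    static void checkVersion(StateVersion loaded, StateVersion current) throws IOException {
        if (loaded.isCompatibleTo(current)) {
            return; // same major number: a minor difference does not block startup
        }
        throw new IOException("Incompatible version for NM state: expecting NM state version "
            + current + ", but loading version " + loaded);
    }

    public static void main(String[] args) {
        StateVersion onDisk = new StateVersion(2, 0);   // written before the upgrade
        StateVersion expected = new StateVersion(3, 0); // value compiled into the upgraded NM
        try {
            checkVersion(onDisk, expected);
            System.out.println("state store accepted");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check of this shape, bumping the major number on every schema addition forces operators to discard NM state on upgrade, whereas a minor bump keeps old stores loadable.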
      

      Attachments

        1. YARN-6798.v1.patch
          1 kB
          Botong Huang
        2. YARN-6798.v2.patch
          1 kB
          Ray Chiang


            People

              Assignee: Botong Huang (botong)
              Reporter: Ray Chiang (rchiang)