Hadoop HDFS / HDFS-1623 High Availability Framework for HDFS NN / HDFS-2577

HA: NN fails to start since it tries to start secret manager in safemode

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: HA branch (HDFS-1623)
    • Component/s: ha, namenode
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      After HDFS-2301, the NN fails to start with the following:

      Caused by: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot log master key update in safe mode. Name node is in safe mode.
      The reported blocks 0 needs additional 5 blocks to reach the threshold 1.0000 of total blocks 4. Safe mode will be turned off automatically.
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logUpdateMasterKey(FSNamesystem.java:4259)
      at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logUpdateMasterKey(DelegationTokenSecretManager.java:285)
      at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.updateCurrentKey(AbstractDelegationTokenSecretManager.java:143)
      at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.startThreads(AbstractDelegationTokenSecretManager.java:98)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startSecretManager(FSNamesystem.java:386)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:440)
      at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:937)
      at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:57)

      1. hdfs-2577.txt (7 kB, Todd Lipcon)

          Activity

          Todd Lipcon added a comment -

          The LM changes don't need to be done on trunk, because on the non-HA branch, FSNamesystem.isRunning is an accurate gauge of whether the LM should run. On the HA branch that's no longer the case: in standby mode the LM doesn't run, even though the FSN is "running". I'll check this into the HA branch, thanks!

          Eli Collins added a comment -

          +1 looks great. Need to do the LM change on trunk too right?

          Todd Lipcon added a comment -

          Attached patch changes back to the trunk behavior: only start the DT manager if security is enabled.
          Also fixed another related bug I noticed while testing this fix – with HDFS-2301 the lease manager stop was always timing out, since it was looping on namesystem.isRunning() which continues to be true even in standby mode. Added a flag to LeaseManager to decide whether the monitor should run.

          Tests seem to be passing now.
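
A minimal sketch of the lease manager fix described above, assuming a monitor thread that previously looped only on the namesystem's running state. The class names `SketchNamesystem` and `SketchLeaseManager`, and the `shouldRunMonitor` flag, are illustrative and are not taken from the attached patch. The point is that the monitor now checks its own flag as well, so it can be stopped while the namesystem still reports itself as running (the standby case).

```java
// Hypothetical stand-in for the namesystem: it keeps reporting "running"
// while the NameNode is in standby, as described in the comment above.
class SketchNamesystem {
    private volatile boolean running = true;

    boolean isRunning() { return running; }

    void shutdown() { running = false; }
}

// Hypothetical lease manager with its own flag for the monitor thread.
class SketchLeaseManager {
    private final SketchNamesystem fsn;
    private volatile boolean shouldRunMonitor = false;
    private Thread monitor;

    SketchLeaseManager(SketchNamesystem fsn) { this.fsn = fsn; }

    void startMonitor() {
        shouldRunMonitor = true;
        monitor = new Thread(() -> {
            // Exit when the manager is told to stop OR the namesystem shuts down.
            while (shouldRunMonitor && fsn.isRunning()) {
                try {
                    Thread.sleep(50); // placeholder for the periodic lease check
                } catch (InterruptedException e) {
                    // Woken by stopMonitor(); the loop condition decides whether to exit.
                }
            }
        }, "sketch-lease-monitor");
        monitor.setDaemon(true);
        monitor.start();
    }

    // Returns true if the monitor thread exited within the timeout.
    boolean stopMonitor(long timeoutMs) {
        if (monitor == null) {
            return true;
        }
        shouldRunMonitor = false;
        monitor.interrupt();
        try {
            monitor.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean stopped = !monitor.isAlive();
        if (stopped) {
            monitor = null;
        }
        return stopped;
    }
}
```

If the loop condition were only `fsn.isRunning()`, `stopMonitor` in this sketch would wait out its full timeout whenever the namesystem stays up, which matches the timeout symptom reported above.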

          Todd Lipcon added a comment -

          The issue is that in the HDFS-2301 patch, behavior was changed to always start the DT manager, whereas in trunk it only tries to start it if security is enabled. I filed HDFS-2579 to fix the general problem in trunk.
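
A minimal sketch of the guard described above, assuming the secret manager refuses to start in safe mode because starting it requires logging a master key update (as the stack trace in the description suggests). All names here (`SketchSecretManager`, `SketchActiveServices`, the `securityEnabled` parameter) are hypothetical and simplified; this is not the code from the patch.

```java
// Hypothetical exception mirroring the SafeModeException in the stack trace.
class SketchSafeModeException extends RuntimeException {
    SketchSafeModeException(String msg) { super(msg); }
}

// Hypothetical delegation token (DT) secret manager.
class SketchSecretManager {
    private final boolean inSafeMode;
    private boolean running = false;

    SketchSecretManager(boolean inSafeMode) { this.inSafeMode = inSafeMode; }

    void startThreads() {
        // Starting generates a master key whose update must be logged,
        // which is assumed here to be rejected while in safe mode.
        if (inSafeMode) {
            throw new SketchSafeModeException("Cannot log master key update in safe mode");
        }
        running = true;
    }

    boolean isRunning() { return running; }
}

class SketchActiveServices {
    // Trunk-style behavior: only start the DT manager if security is enabled.
    // Returns true if the manager was started.
    static boolean startSecretManager(boolean securityEnabled, SketchSecretManager sm) {
        if (!securityEnabled) {
            return false; // nothing to start, so safe mode is irrelevant
        }
        sm.startThreads();
        return true;
    }
}
```

With security disabled the guard skips the start entirely, so the safe mode check is never reached. In this sketch a security-enabled start during safe mode still throws, which appears to be the general problem deferred to HDFS-2579.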


            People

            • Assignee: Todd Lipcon
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 2
