Hadoop HDFS / HDFS-3399 BookKeeper option support for NN HA / HDFS-3452

BKJM: Switch from standby to active fails and NN gets shut down due to delay in clearing of lock

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      A normal switch from standby to active fails.
      The BookKeeperJournalManager (BKJM) ZK session timeout is 3000 ms and the ZKFC session timeout is 5000 ms. By the time the NN tries to acquire the lock, the previous lock has not yet been released, so lock acquisition fails and the NN shuts down. Ideally the previous lock should have been released by then.
      =============================================================================
      2012-05-09 20:15:29,732 ERROR org.apache.hadoop.contrib.bkjournal.WriteLock: Failed to acquire lock with /ledgers/lock/lock-0000000007, lock-0000000006 already has it
      2012-05-09 20:15:29,732 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager@412beeec, stream=null))
      java.io.IOException: Could not acquire lock
      at org.apache.hadoop.contrib.bkjournal.WriteLock.acquire(WriteLock.java:107)
      at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.recoverUnfinalizedSegments(BookKeeperJournalManager.java:406)
      at org.apache.hadoop.hdfs.server.namenode.JournalSet$6.apply(JournalSet.java:551)
      at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:322)
      at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:548)
      at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1134)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:598)
      at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1287)
      at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
      at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
      at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1219)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:978)
      at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
      at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
      2012-05-09 20:15:29,736 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
      /************************************************************
      SHUTDOWN_MSG: Shutting down NameNode at HOST-XX-XX-XX-XX/XX.XX.XX.XX

      Scenario:
      Start the ZKFCs and the NNs.
      NN1 is active and NN2 is standby.
      Stop NN1. NN2 tries to transition to active and gets shut down.
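
      The race described above can be illustrated with a small, self-contained sketch. It is a toy model only: `LockRetryDemo`, `SimulatedLock` and `acquireWithRetry` are hypothetical names, no ZooKeeper or BookKeeper API is used, and the timings are scaled down from the 3000 quoted in the description. It shows one generic mitigation (polling until the stale lock disappears instead of failing on the first attempt) and is not necessarily what the attached patches do.

```java
/**
 * Toy model of the failure: the previous holder's lock only disappears
 * once its session has expired, so a single acquisition attempt made too
 * early fails, while a bounded retry can succeed.
 * All names here are hypothetical; this is not the BKJM WriteLock.
 */
class LockRetryDemo {

    /** Stand-in for an ephemeral lock node that stays held until a given time. */
    static final class SimulatedLock {
        private final long releaseAtMillis;

        SimulatedLock(long releaseAtMillis) {
            this.releaseAtMillis = releaseAtMillis;
        }

        /** Returns true only once the previous holder's lock has gone away. */
        boolean tryAcquire() {
            return System.currentTimeMillis() >= releaseAtMillis;
        }
    }

    /**
     * Polls the lock every pollMillis until it is acquired or
     * maxWaitMillis has elapsed. Returns true if the lock was acquired.
     */
    static boolean acquireWithRetry(SimulatedLock lock, long maxWaitMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        while (true) {
            if (lock.tryAcquire()) {
                return true;
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false; // give up: the stale lock outlived our budget
            }
            Thread.sleep(Math.min(pollMillis, remaining));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed, scaled-down stand-in for the old session's remaining lifetime.
        long staleLockLifetimeMillis = 300;
        SimulatedLock lock =
                new SimulatedLock(System.currentTimeMillis() + staleLockLifetimeMillis);

        // What the log above corresponds to: one attempt, made too early.
        System.out.println("single attempt acquired: " + lock.tryAcquire());

        // Waiting longer than the stale lock's lifetime lets the acquisition succeed.
        System.out.println("retry acquired: " + acquireWithRetry(lock, 1000, 50));
    }
}
```

      In this model the wait budget passed to `acquireWithRetry` has to exceed the time the stale lock can survive, which in the reported setup would be bounded by the old session's timeout.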

      1. HDFS-3452-2.patch
        28 kB
        Uma Maheswara Rao G
      2. HDFS-3452-1.patch
        28 kB
        Uma Maheswara Rao G
      3. HDFS-3452.patch
        25 kB
        Uma Maheswara Rao G
      4. HDFS-3452.patch
        25 kB
        Uma Maheswara Rao G
      5. BK-253-BKJM.patch
        17 kB
        Uma Maheswara Rao G


            People

            • Assignee:
              Uma Maheswara Rao G
              Reporter:
              suja s
            • Votes:
              0
              Watchers:
              11
