  1. Hadoop HDFS
  2. HDFS-8011

Standby NameNode can't be started

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: ha
    • Labels: None
    • Environment: CentOS 6.2 64-bit

    Description

      We have seen the standby NameNode crash with fatal errors on startup. Any solutions, workarounds, or ideas would be helpful.
      1. Here is the context:
      At the beginning we had two NameNodes: A as active and B as standby. For some reason NameNode A died, so NameNode B took over as active.
      When we tried to restart A after a minute, it would not start. During this time a lot of files were written to HDFS and a lot of files were renamed.
      NameNode A crashed each time while "awaiting reported blocks in safemode".

      2. We see the following errors in the log:
      1) 2015-03-30 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/xxx/_temporary/xxx/part-r-00074.bz2, replication=3, mtime=1427699913947, atime=1427699081161, blockSize=268435456, blocks=[blk_2103131025_1100889495739], permissions=dm:dm:rw-r-r-, clientName=, clientMachine=, opCode=OP_CLOSE, txid=7632753612]
      java.lang.NullPointerException
      at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.setGenerationStampAndVerifyReplicas(BlockInfoUnderConstruction.java:247)
      at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.commitBlock(BlockInfoUnderConstruction.java:267)
      at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:639)
      at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:813)
      at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:383)
      at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:209)
      at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:122)
      at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:737)
      at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
      at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
      at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$0(EditLogTailer.java:302)
      at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:356)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
      at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
      at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)

      2) 2015-03-30 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN.
      java.io.IOException: Failed to apply edit log operation AddBlockOp [path=/xxx/_temporary/xxx/part-m-00121, penultimateBlock=blk_2102331803_1100888911441, lastBlock=blk_2102661068_1100889009168, RpcClientId=, RpcCallId=-2]: error null
      at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
      at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
      at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:122)
      at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:737)
      at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
      at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
      at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$0(EditLogTailer.java:302)
      at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:356)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
      at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
      at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
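
      For anyone triaging similar logs, it may help to pull the operation name, path, opCode, and txid out of the FSEditLogLoader error line, since the txid identifies which transaction failed to replay. The following is only a minimal sketch: `parse_failed_op` is a hypothetical helper, not part of Hadoop, and the sample line is error 1) above shortened to the fields used here.

```python
import re

# Sample taken from error 1) above, shortened to the fields used here.
SAMPLE = (
    "2015-03-30 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: "
    "Encountered exception on operation CloseOp [length=0, inodeId=0, "
    "path=/xxx/_temporary/xxx/part-r-00074.bz2, replication=3, "
    "opCode=OP_CLOSE, txid=7632753612]"
)

# Matches the operation name and the bracketed key=value list that follows it.
OP_RE = re.compile(r"Encountered exception on operation (\w+) \[(.*)\]")


def parse_failed_op(line):
    """Hypothetical helper (not a Hadoop API): return the op name plus the
    path, opCode and txid fields of a failed edit log operation, or None
    if the line is not an FSEditLogLoader operation error."""
    match = OP_RE.search(line)
    if match is None:
        return None
    result = {"op": match.group(1)}
    body = match.group(2)
    for key in ("path", "opCode", "txid"):
        # A field starts the list or follows ", " and runs to the next "," or "]".
        field = re.search(r"(?:^|, )" + key + r"=([^,\]]*)", body)
        if field:
            result[key] = field.group(1)
    return result


if __name__ == "__main__":
    print(parse_failed_op(SAMPLE))
```

      The extracted txid could then be compared against the transaction ranges in the edit log segment file names to find the segment holding the failing operation; that follow-up step is a suggestion, not something verified against this cluster.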

          People

            Assignee: Unassigned
            Reporter: fujie (fj1002817)
            Votes: 0
            Watchers: 7