Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Cannot Reproduce
-
2.3.0
-
None
-
None
-
centeros 6.2 64bit
Description
We have seen crash when starting the standby namenode, with fatal errors. Any solutions, workarouds, or ideas would be helpful for us.
1. Here is the context:
At begining we have 2 namenodes, take A as active and B as standby. For some resons, namenode A was dead, so namenode B is working as active.
When we try to restart A after a minute, it can't work. During this time a lot of files were put to HDFS, and a lot of files were renamed.
Nodenode A crashed when "awaiting reported blocks in safemode" each time.
2. We can see error log below:
1)2015-03-30 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/xxx/_temporary/xxx/part-r-00074.bz2, replication=3, mtime=1427699913947, atime=1427699081161, blockSize=268435456, blocks=[blk_2103131025_1100889495739], permissions=dm:dm:rw-r-r-, clientName=, clientMachine=, opCode=OP_CLOSE, txid=7632753612]
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.setGenerationStampAndVerifyReplicas(BlockInfoUnderConstruction.java:247)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.commitBlock(BlockInfoUnderConstruction.java:267)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:639)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:813)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:383)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:209)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:122)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:737)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$0(EditLogTailer.java:302)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:356)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
2)2015-03-30 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby N
N.
java.io.IOException: Failed to apply edit log operation AddBlockOp [path=/xxx/_temporary/xxx/part-m-00121, penultimateBlock=blk_2102331803_1100888911441, lastBlock=blk_2102661068_1100889009168, RpcClientId=, RpcCallId=-2]: error
null
at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:122)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:737)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$0(EditLogTailer.java:302)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:356)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)