HDFS-14787: NameNode error (Hadoop HDFS)


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

    Description

      Hi committers,

      We encountered a NameNode (NN) error, shown below.

      The primary NN shut down last Thursday, and we recovered it by removing some operations from the edit log. But the standby NN shut down again yesterday with the same error.

      Could you please help identify the possible root cause?

      Some of the error log is quoted below; for the full log and the NameNode configuration, please refer to the attachments.

      Besides, I have attached some Java code that can trigger the error (see the sketch after this list):

      1. We perform append operations in a Spark Streaming program (rt-Append.txt), which caused the primary NN shutdown last Thursday.
      2. We perform move & concat operations in a data-conversion program (move&concat.java), which caused the standby NN shutdown yesterday.
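
      For reference, here is an illustrative reconstruction of the two access patterns described above, since the attachments themselves are not inlined in this description. It is a sketch under assumed paths and file layout, not the contents of rt-Append.txt or move&concat.java:

      import java.nio.charset.StandardCharsets;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      // Illustrative reconstruction of the two access patterns described above;
      // NOT the attached rt-Append.txt / move&concat.java. All paths are hypothetical.
      public class AccessPatternSketch {

          // Pattern 1: repeated append from a streaming micro-batch. Each batch
          // re-opens the file for append; re-opening a partial last block (and
          // any lease recovery) bumps that block's generation stamp.
          static void appendBatch(FileSystem fs, Path file, byte[] payload)
                  throws Exception {
              try (FSDataOutputStream out = fs.append(file)) {
                  out.write(payload);
                  out.hflush(); // expose the data to readers without closing the file
              }
          }

          // Pattern 2: move staged part files next to the target, then merge them
          // into the target with a single metadata-only concat.
          static void moveAndConcat(FileSystem fs, Path target, Path[] parts)
                  throws Exception {
              Path[] moved = new Path[parts.length];
              for (int i = 0; i < parts.length; i++) {
                  moved[i] = new Path(target.getParent(), parts[i].getName());
                  fs.rename(parts[i], moved[i]);
              }
              fs.concat(target, moved); // implemented by DistributedFileSystem
          }

          public static void main(String[] args) throws Exception {
              FileSystem fs = FileSystem.get(new Configuration());
              Path target = new Path("/data/v2-data-20190826.mayfly.data");
              appendBatch(fs, target, "one record\n".getBytes(StandardCharsets.UTF_8));
              moveAndConcat(fs, target, new Path[] {
                      new Path("/staging/part-00000"),
                      new Path("/staging/part-00001") });
          }
      }

      Both patterns rewrite the same file's block list through edit-log operations, which is why they are plausible triggers for the replay mismatch in the log below.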

2019-08-27 09:51:12,409 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 766146/953617 transactions completed. (80%)
2019-08-27 09:51:12,858 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/_spark_libs2381992047634476351.zip
2019-08-27 09:51:12,870 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/oozietest2-0.0.1-SNAPSHOT.jar
2019-08-27 09:51:12,898 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/spark_conf.zip
2019-08-27 09:51:12,910 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20424/spark_libs8875310030853528804.zip
2019-08-27 09:51:12,927 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20424/spark_conf.zip
2019-08-27 09:51:13,777 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 857745/953617 transactions completed. (90%)
2019-08-27 09:51:14,035 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smc_ss/.sparkStaging/application_1561429828507_20425/spark_libs7422229681005558653.zip
2019-08-27 09:51:14,067 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smc_ss/.sparkStaging/application_1561429828507_20426/spark_libs7479542421029947753.zip
2019-08-27 09:51:14,070 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20428/spark_libs_7647933078788028649.zip
2019-08-27 09:51:14,075 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/*****/v2-data-20190826.mayfly.data, replication=2, mtime=1566870616821, atime=1566870359230, blockSize=134217728, blocks=[blk_1270599798_758966421, blk_1270599852_758967928, blk_1270601282_759026903, blk_1270602443_759027052, blk_1270602446_759061086, blk_1270603081_759050235], permissions=smc_ss:smc_ss:rw-r-r-, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=4359520942]
java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /****/v2-data-20190826.mayfly.data
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
2019-08-27 09:51:14,077 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 11714 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:218)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1630)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:309)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 11714
2019-08-27 09:51:14,077 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block report queue is full
2019-08-27 09:51:14,077 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN.
java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /*****/v2-data-20190826.mayfly.data
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
2019-08-27 09:51:14,105 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /*****/v2-data-20190826.mayfly.data
2019-08-27 09:51:14,118 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at xxx-nn02.jq/10.129.148.13
************************************************************/
2019-08-27 10:43:15,713 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = xxx-nn02.jq/10.129.148.13
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 3.0.0-cdh6.0.1
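
      For context, the standby NN aborts because replaying this OP_CLOSE finds an in-memory block whose generation stamp does not match the one recorded in the op. Below is a minimal, self-contained model of that check, paraphrased and condensed from FSEditLogLoader#updateBlocks (the line that throws in the stack trace above); the real code operates on BlockInfo/Block objects, and the isGenStampUpdate flag here is a simplified stand-in:

      import java.io.IOException;

      public class GenStampCheckSketch {

          // Plain stand-in for an HDFS block: block ID plus generation stamp.
          static final class Blk {
              final long id, genStamp;
              Blk(long id, long genStamp) { this.id = id; this.genStamp = genStamp; }
              @Override public String toString() { return "blk_" + id + "_" + genStamp; }
          }

          // Paraphrase of the consistency check behind the error above: block IDs
          // must match position by position, and a generation-stamp change is only
          // tolerated on the last block of a genstamp-updating op. A mismatch
          // anywhere else is treated as corruption and aborts edit-log replay.
          static void checkBlocks(Blk[] oldBlocks, Blk[] newBlocks,
                                  boolean isGenStampUpdate) throws IOException {
              for (int i = 0; i < oldBlocks.length && i < newBlocks.length; i++) {
                  boolean isLastBlock = (i == newBlocks.length - 1);
                  boolean idMismatch = oldBlocks[i].id != newBlocks[i].id;
                  boolean gsMismatch = oldBlocks[i].genStamp != newBlocks[i].genStamp;
                  if (idMismatch || (gsMismatch && !(isGenStampUpdate && isLastBlock))) {
                      throw new IOException("Mismatched block IDs or generation stamps,"
                              + " attempting to replace block " + oldBlocks[i]
                              + " with " + newBlocks[i]
                              + " as block # " + i + "/" + newBlocks.length);
                  }
              }
          }

          public static void main(String[] args) throws IOException {
              // Mirrors the failure above: same block ID (1270602446), but the
              // generation stamp in memory (759027503) differs from the one in
              // the replayed OP_CLOSE (759061086), and it is not the file's last
              // block, so the check throws.
              Blk[] inMemory = { new Blk(1270602446L, 759027503L),
                                 new Blk(1270603081L, 759050235L) };
              Blk[] fromOp   = { new Blk(1270602446L, 759061086L),
                                 new Blk(1270603081L, 759050235L) };
              checkBlocks(inMemory, fromOp, true);
          }
      }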

      Attachments

        1. core-site.xml
          4 kB
          Cao, Lionel
        2. hadoop-cmf-hdfs-NAMENODE-smc-nn02.jq.log.out.20190827
          469 kB
          Cao, Lionel
        3. hdfs-site.xml
          9 kB
          Cao, Lionel
        4. move&concat.java
          2 kB
          Cao, Lionel
        5. rt-Append.txt
          2 kB
          Cao, Lionel

        Activity

          People

            Assignee: Unassigned
            Reporter: Cao, Lionel (lucao)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated: