Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-13236

Standby NN down with error encountered while tailing edits

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: journal-node, namenode
    • Labels:
      None

      Description

      After update Hadoop from 2.7.3 to 3.0.0 standby NN down with error encountered while tailing edits from JN:

      Feb 28 01:58:31 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:31,594 INFO [FSImageSaver for /one/hadoop-data/dfs of type IMAGE_AND_EDITS] FSImageFormatProtobuf - Image file /one/hadoop-data/dfs/current/fsimage.ckpt_00000000012748979
      98 of size 4595971949 bytes saved in 93 seconds.
      Feb 28 01:58:33 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:33,445 INFO [Standby State Checkpointer] NNStorageRetentionManager - Going to retain 2 images with txid >= 1274897935
      Feb 28 01:58:33 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:33,445 INFO [Standby State Checkpointer] NNStorageRetentionManager - Purging old image FSImageFile(file=/one/hadoop-data/dfs/current/fsimage_0000000001274897875, cpktTxId
      =0000000001274897875)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,660 INFO [Edit log tailer] FSImage - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@6a168e6f expecting start txid #1274897999
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,660 INFO [Edit log tailer] FSImage - Start loading edits file http://srvd87.local:8480/getJournal?jid=datalab-hadoop-backup&segmentTxId=1274897999&storageInfo=-64%3A10
      56233980%3A0%3ACID-1fba08aa-c8bd-4217-aef5-6ed206893848&inProgressOk=true, http://srve2916.local:8480/getJournal?jid=datalab-hadoop-backup&segmentTxId=1274897999&storageInfo=-64%3A1056233980%3A0%3ACID-1fba08aa-c8bd-4217-aef5-6ed206893848&
      inProgressOk=true
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,661 INFO [Edit log tailer] RedundantEditLogInputStream - Fast-forwarding stream 'http://srvd87.local:8480/getJournal?jid=datalab-hadoop-backup&segmentTxId=1274897999&s
      torageInfo=-64%3A1056233980%3A0%3ACID-1fba08aa-c8bd-4217-aef5-6ed206893848&inProgressOk=true, http://srve2916.local:8480/getJournal?jid=datalab-hadoop-backup&segmentTxId=1274897999&storageInfo=-64%3A1056233980%3A0%3ACID-1fba08aa-c8bd-4217
      -aef5-6ed206893848&inProgressOk=true' to transaction ID 1274897999
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,661 INFO [Edit log tailer] RedundantEditLogInputStream - Fast-forwarding stream 'http://srvd87.local:8480/getJournal?jid=datalab-hadoop-backup&segmentTxId=1274897999&storageInfo=-64%3A1056233980%3A0%3ACID-1fba08aa-c8bd-4217-aef5-6ed206893848&inProgressOk=true' to transaction ID 1274897999
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,680 ERROR [Edit log tailer] FSEditLogLoader - Encountered exception on operation AddOp [length=0, inodeId=145550319, path=/kafka/parquet/infrastructureGrace/date=2018-02-28/_temporary/1/_temporary/attempt_1516181147167_20856_r_000098_0/part-r-00098.gz.parquet, replication=3, mtime=1519772206615, atime=1519772206615, blockSize=134217728, blocks=[], permissions=root:supergroup:rw-r--r--, aclEntries=null, clientName=DFSClient_attempt_1516181147167_20856_r_000098_0_1523538799_1, clientMachine=10.137.2.142, overwrite=false, RpcClientId=, RpcCallId=271996603, storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_ADD, txid=1274898002]
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected length 16
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:74)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.ipc.RetryCache$CacheEntry.<init>(RetryCache.java:86)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.<init>(RetryCache.java:163)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:946)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,688 FATAL [Edit log tailer] EditLogTailer - Unknown error encountered while tailing edits. Shutting down standby NN.
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: java.io.IOException: java.lang.IllegalStateException: Cannot skip to less than the current value (=145550319), where newValue=145550318
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1941)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: Caused by: java.lang.IllegalStateException: Cannot skip to less than the current value (=145550319), where newValue=145550318
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.util.SequentialNumber.skipTo(SequentialNumber.java:58)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: at org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1939)
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: ... 10 more
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,708  INFO [Edit log tailer] ExitUtil - Exiting with status 1: java.io.IOException: java.lang.IllegalStateException: Cannot skip to less than the current value (=145550319), where newValue=145550318
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: 2018-02-28 01:58:34,722 INFO [pool-1-thread-1] NameNode - SHUTDOWN_MSG:
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: /************************************************************
      Feb 28 01:58:34 srvd2135 datalab-namenode[15566]: SHUTDOWN_MSG: Shutting down NameNode at srvd2135.local/10.137.2.39
      

       
      In JN logs: 

      Feb 28 01:57:51 srvd87 datalab-journalnode[29960]: 2018-02-28 01:57:51,552 INFO [IPC Server handler 4 on 8485] FileJournalManager - Finalizing edits file /one/hadoop-data/journal/datalab-hadoop-backup/current/edits_inprogress_00000
      00001274897999 -> /one/hadoop-data/journal/datalab-hadoop-backup/current/edits_0000000001274897999-0000000001274898515
      Feb 28 01:58:34 srvd87 datalab-journalnode[29960]: 2018-02-28 01:58:34,671 INFO [qtp414690789-164] TransferFsImage - Sending fileName: /one/hadoop-data/journal/datalab-hadoop-backup/current/edits_0000000001274897999-0000000001274898515, fileSize: 80772. Sent total: 80772 bytes. Size of last segment intended to send: -1 bytes.
      
      $ zgrep -c 'Size of last segment intended to send' archive/datalab-journalnode.log-20180{22*,3*}.gz
      archive/datalab-journalnode.log-20180227.gz:0    #hadoop 2.7.3
      archive/datalab-journalnode.log-20180301.gz:109  #hadoop 3.0.0
      archive/datalab-journalnode.log-20180302.gz:111  #hadoop 3.0.0
      archive/datalab-journalnode.log-20180304.gz:0    #hadoop 2.7.3
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                _ph Yuriy Malygin
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated: