Hadoop HDFS / HDFS-16349

FSEditLog checkForGaps breaks HDFS RollingUpgrade Rollback

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 3.2.2, 3.3.1, 3.2.3, 3.3.2
    • Fix Version/s: None
    • Component/s: hdfs
    • Labels: None

    Description

      2021-11-22 20:36:44,440 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Using longest log: 10.65.57.133:8485=segmentState {
        startTxId: 3906965
        endTxId: 3906965
        isInProgress: false
      }
      lastWriterEpoch: 5
      lastCommittedTxId: 3906964
      2021-11-22 20:36:44,457 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /data12/data/flashHadoopU/namenode/current
      2021-11-22 20:36:44,495 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /data12/data/flashHadoopU/namenode/current/edits_inprogress_0000000000003898378 -> /data12/data/flashHadoopU/namenode/current/edits_0000000000003898378-0000000000003898412
      2021-11-22 20:36:44,657 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
      java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 2510934 but unable to find any edit logs containing txid 2510933
          at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1578)
          at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1536)
          at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:652)
          at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:976)
          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
          at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
          at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
          at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
          at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
          at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
          at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
      2021-11-22 20:36:44,660 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@pro-hadoop-dc01-057133.vm.dc01.hellocloud.tech:50070
      2021-11-22 20:36:44,760 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
      2021-11-22 20:36:44,761 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
      2021-11-22 20:36:44,761 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
      2021-11-22 20:36:44,761 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
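
The exception message indicates that the NameNode wanted to read edits starting at txid 2510933 and reaching at least txid 2510934, while the only segments named in the log above begin much later (3898378 and 3906965). The following Python sketch models that kind of contiguity check. It is a simplified illustration, not Hadoop's Java implementation of FSEditLog.checkForGaps; the names find_gap, from_txid and to_at_least_txid are hypothetical.

```python
from typing import List, Optional, Tuple

# An edit log segment as (first_txid, last_txid), both inclusive.
Segment = Tuple[int, int]


def find_gap(segments: List[Segment], from_txid: int,
             to_at_least_txid: int) -> Optional[int]:
    """Return the first txid not covered by any segment, or None if the
    range [from_txid, to_at_least_txid] is covered without a hole."""
    txid = from_txid
    for first, last in sorted(segments):
        if txid > to_at_least_txid:
            return None          # already read far enough
        if first > txid:
            break                # the next segment starts after the txid we need
        txid = max(txid, last + 1)
    return None if txid > to_at_least_txid else txid


# Segment boundaries taken from the log lines above.
available = [(3898378, 3898412), (3906965, 3906965)]
missing = find_gap(available, from_txid=2510933, to_at_least_txid=2510934)
print(missing)  # 2510933
```

With these inputs the first available segment starts at 3898378, well past the requested txid, so the check reports 2510933 as missing, which matches the txid named in the exception.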

      Old version: 2.7.3

      New version: 3.2.2

      Steps to Reproduce

      Step 1: Start NN1 as active and NN2 as standby.
      Step 2: Run "hdfs dfsadmin -rollingUpgrade prepare".
      Step 3: Start NN2 as active and NN1 as standby with the rolling upgrade "started" option.
      Step 4: Restart the DNs in upgrade mode as well.
      Step 5: Restart the journalnodes with the new Hadoop version.
      Step 6: Wait a few days.
      Step 7: Bring down both NNs, the journalnodes, and the DNs.
      Step 8: Start the JNs with the old version.
      Step 9: Start NN1 with the rolling upgrade rollback option. The NN fails to start with the error above, because the edit log segment containing txid 2510933 has already been deleted by the checkpoint mechanism.
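
Before attempting the rollback in Step 9, it may help to confirm whether the NameNode directory still holds a finalized segment covering the txid the rollback needs. The sketch below parses file names of the form seen in the log (edits_<first>-<last>) and tests coverage. It is only an illustration: the helper names finalized_segments and covers are hypothetical, and in practice the names would come from listing the NameNode's current/ directory rather than from a literal list.

```python
import re
from typing import Iterable, List, Tuple

# Finalized segments are named edits_<firstTxId>-<lastTxId>;
# in-progress files (edits_inprogress_<firstTxId>) are deliberately skipped.
FINALIZED = re.compile(r"^edits_(\d+)-(\d+)$")


def finalized_segments(names: Iterable[str]) -> List[Tuple[int, int]]:
    """Extract (first_txid, last_txid) pairs from finalized edit file names."""
    segments = []
    for name in names:
        match = FINALIZED.match(name)
        if match:
            segments.append((int(match.group(1)), int(match.group(2))))
    return sorted(segments)


def covers(segments: List[Tuple[int, int]], txid: int) -> bool:
    """True if some finalized segment contains the given txid."""
    return any(first <= txid <= last for first, last in segments)


# File names taken from the FileJournalManager log line in the description.
names = [
    "edits_inprogress_0000000000003898378",
    "edits_0000000000003898378-0000000000003898412",
]
segments = finalized_segments(names)
print(segments)                   # [(3898378, 3898412)]
print(covers(segments, 2510933))  # False
```

A False result for the txid named in the exception means the rollback cannot replay from the local edits alone, which is the situation this issue describes.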

       

      Attachments

        1. HDFS-16349-branch-3.2.3.patch (0.9 kB, chuanjie.duan)

            People

              Assignee: Unassigned
              Reporter: chuanjie.duan