Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-12836

startTxId could be greater than endTxId when tailing in-progress edit log

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.0.0-alpha1
    • 3.1.0, 3.0.1
    • hdfs
    • None
    • Reviewed

    Description

      When dfs.ha.tail-edits.in-progress is true, edit log tailer will also tail those in progress edit log segments. However, in the following code:

              if (onlyDurableTxns && inProgressOk) {
                endTxId = Math.min(endTxId, committedTxnId);
              }
      
              EditLogInputStream elis = EditLogFileInputStream.fromUrl(
                  connectionFactory, url, remoteLog.getStartTxId(),
                  endTxId, remoteLog.isInProgress());
      

      it is possible that remoteLog.getStartTxId() could be greater than endTxId, and therefore will cause the following error:

      2017-11-17 19:55:41,165 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: Error replaying edit log at offset 1048576.  Expected transaction ID was 87
      Recent opcode offsets: 1048576
      org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: got premature end-of-file at txid 86; expected file to go up to 85
              at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197)
              at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
              at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:189)
              at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
              at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:205)
              at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
              at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
              at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
              at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
              at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
              at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
              at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
              at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481)
              at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
      2017-11-17 19:55:41,165 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Error while reading edits from disk. Will try again.
      org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 1048576.  Expected transaction ID was 87
      Recent opcode offsets: 1048576
              at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:218)
              at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
              at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
              at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
              at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
              at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
              at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
              at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
              at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481)
              at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
      Caused by: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: got premature end-of-file at txid 86; expected file to go up to 85
              at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197)
              at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
              at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:189)
              at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
              at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:205)
              ... 9 more
      

      Attachments

        1. HDFS-12836.1.patch
          2 kB
          Chao Sun
        2. HDFS-12836.2.patch
          3 kB
          Chao Sun
        3. HDFS-12836.3.patch
          3 kB
          Chao Sun

        Issue Links

          Activity

            People

              csun Chao Sun
              csun Chao Sun
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: