
Garbage snapshot records lingering forever

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.2
    • Fix Version/s: 2.6.5, 2.7.4, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      We have a cluster where the snapshot feature might have been tested years ago. HDFS currently does not have any snapshots, but I see filediff records persisted in its fsimage. Since the cluster has been restarted many times and checkpointed over 100 times since then, these records must have been persisted and carried over ever since.

      1. HDFS-9696.branch-2.6.patch
        3 kB
        Kihwal Lee
      2. HDFS-9696.patch
        3 kB
        Kihwal Lee
      3. HDFS-9696.v2.patch
        3 kB
        Kihwal Lee
      4. HDFS-9696-branch-2.7.patch
        4 kB
        Brahma Reddy Battula

        Issue Links

          Activity

          kihwal Kihwal Lee added a comment -
          <INodeReferenceSection></INodeReferenceSection><SnapshotSection><snapshotCounter>0</snapshotCounter></SnapshotSection>
          ...
          <SnapshotDiffSection><diff><inodeid>16385</inodeid></diff><diff><inodeid>43008443</inodeid><filediff><snapshotId>-1</snapshotId><size>0</size><name>action-data.seq</name></filediff>
          </diff><diff><inodeid>43108392</inodeid><filediff><snapshotId>-1</snapshotId><size>302</size><name>some_random_file</name></filediff>
          ...
          </SnapshotDiffSection>
          

          The file with inode number 43008443 exists. As shown, there is no snapshot that SnapshotManager is aware of, and the snapshot ID of every filediff entry is -1.
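
A minimal sketch of how such entries could be spotted in an XML dump of the fsimage. It uses only the element names visible in the excerpt above (`diff`, `inodeid`, `filediff`, `snapshotId`); the class name, the enclosing `<fsimage>` root element, and the in-memory sample string are assumptions made for illustration, not part of any Hadoop tool.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reports inodes whose filediff entries carry snapshotId -1. */
class GarbageFileDiffScanner {

  /** Returns the trimmed text of the first descendant with the given tag, or null. */
  private static String firstText(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
  }

  /** Returns the inode IDs of all diff elements holding a filediff with snapshotId -1. */
  static List<Long> findGarbageInodes(String xml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("input is not well-formed XML", e);
    }
    List<Long> garbage = new ArrayList<>();
    NodeList diffs = doc.getElementsByTagName("diff");
    for (int i = 0; i < diffs.getLength(); i++) {
      Element diff = (Element) diffs.item(i);
      NodeList fileDiffs = diff.getElementsByTagName("filediff");
      for (int j = 0; j < fileDiffs.getLength(); j++) {
        String snapshotId = firstText((Element) fileDiffs.item(j), "snapshotId");
        String inodeId = firstText(diff, "inodeid");
        if ("-1".equals(snapshotId) && inodeId != null) {
          garbage.add(Long.parseLong(inodeId));
          break; // one report per inode is enough
        }
      }
    }
    return garbage;
  }

  public static void main(String[] args) {
    // Sample modeled on the excerpt above; the <fsimage> wrapper is an assumption.
    String xml = "<fsimage><SnapshotDiffSection>"
        + "<diff><inodeid>16385</inodeid></diff>"
        + "<diff><inodeid>43008443</inodeid><filediff><snapshotId>-1</snapshotId>"
        + "<size>0</size><name>action-data.seq</name></filediff></diff>"
        + "<diff><inodeid>43108392</inodeid><filediff><snapshotId>-1</snapshotId>"
        + "<size>302</size><name>some_random_file</name></filediff></diff>"
        + "</SnapshotDiffSection></fsimage>";
    System.out.println(findGarbageInodes(xml));
  }
}
```

The directory entry 16385 has no filediff and is therefore not reported; only the two file inodes with `snapshotId` -1 are.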

          yzhangal Yongjun Zhang added a comment -

          Hi Kihwal Lee,

          Thanks much for reporting this issue. I have been looking into HDFS-9406 and observed the same thing. I have made progress on HDFS-9406 and am still working on it.

          yzhangal Yongjun Zhang added a comment -

          Hi Kihwal,

          Since I am working on it, do you mind if I assign it to myself?

          Thanks.

          kihwal Kihwal Lee added a comment -

          do you mind if I assign it to myself?

          I don't. But I noticed that none of the original snapshot feature developers are watching HDFS-9406. At some point, we should call them out.

          yzhangal Yongjun Zhang added a comment -

          Thanks Kihwal. Yes, agree. While I have been investigating, I indeed planned to ask the snapshot developers for help at some point.

          yzhangal Yongjun Zhang added a comment -

          Ah, I intended to write the request message in my prior comment before reassigning, just found that I accidentally reassigned together with the request message. Sorry about that.

          jingzhao Jing Zhao added a comment - - edited

          Currently I think HDFS-9406 and HDFS-9697 may both be caused by some lingering INode in the diff list. Both failed when loading an INode from the inode map. Compared with the logic for removing inodes from the inode map, cleaning the diff list is more complicated and thus has a higher chance of containing a bug.

          yzhangal Yongjun Zhang added a comment -

          Yes Jing Zhao, your analysis looks correct to me, based on my study in HDFS-9406. Thanks.

          yzhangal Yongjun Zhang added a comment -

          And I have a solution for HDFS-9697 for the case I created. I have yet to prove that it works in all situations.

          yzhangal Yongjun Zhang added a comment -

          Hi Kihwal Lee,

          Thanks a lot for reporting the issue here. Upon investigation, we think it is likely a duplicate of HDFS-9406, which is now resolved. I'm closing this jira for now. If you still see the problem after applying the fix for HDFS-9406, please feel free to reopen.

          Thanks.

          kihwal Kihwal Lee added a comment -

          Thanks, Yongjun Zhang!

          yzhangal Yongjun Zhang added a comment -

          Very welcome Kihwal Lee!

          kihwal Kihwal Lee added a comment -

          It turns out that HDFS-9406 is not related to this issue.

          The garbage snapshot filediffs with snapshotId=-1 were being generated by a bug fixed in HDFS-7056 by Plamen Jeliazkov.

             /** Is this inode in the latest snapshot? */
             public final boolean isInLatestSnapshot(final int latestSnapshotId) {
          -    if (latestSnapshotId == Snapshot.CURRENT_STATE_ID) {
          +    if (latestSnapshotId == Snapshot.CURRENT_STATE_ID ||
          +        latestSnapshotId == Snapshot.NO_SNAPSHOT_ID) {
                 return false;
               }
          

          Konstantin Shvachko explained,

          (7) Plamen says this is because Snapshot.findLatestSnapshot() may return NO_SNAPSHOT_ID, which breaks recordModification() if you don't have that additional check. We see it when commitBlockSynchronization() is called for truncated block.

          We have actually traced the generation of these filediff entries to commitBlockSynchronization() activities when the NN was running 2.5. This stops in 2.7 thanks to HDFS-7056. However, the garbage lives on until those files are deleted. Can we have a sanity check during snapshot diff loading so that these entries can be discarded?
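
The effect of the extra check can be illustrated with a toy model. This is only a sketch of the early-return guard shown in the diff above: the constant values are assumptions (the -1 is chosen to match the `snapshotId` seen in the fsimage excerpt), the class and method names are hypothetical, and the further checks performed by the real `isInLatestSnapshot()` are not shown in this thread and are omitted.

```java
/** Toy model of the early-return guard in isInLatestSnapshot(); not Hadoop code. */
class LatestSnapshotGuardSketch {
  // Assumed values for illustration only.
  static final int CURRENT_STATE_ID = Integer.MAX_VALUE - 1;
  static final int NO_SNAPSHOT_ID = -1;

  /** Guard before HDFS-7056: only CURRENT_STATE_ID returns false early. */
  static boolean passesGuardBeforeFix(int latestSnapshotId) {
    return latestSnapshotId != CURRENT_STATE_ID;
  }

  /** Guard after HDFS-7056: NO_SNAPSHOT_ID also returns false early. */
  static boolean passesGuardAfterFix(int latestSnapshotId) {
    return latestSnapshotId != CURRENT_STATE_ID
        && latestSnapshotId != NO_SNAPSHOT_ID;
  }

  public static void main(String[] args) {
    // Before the fix, NO_SNAPSHOT_ID slips past the guard, so a caller could
    // go on to record a diff against snapshot ID -1.
    System.out.println("before fix: " + passesGuardBeforeFix(NO_SNAPSHOT_ID));
    System.out.println("after fix:  " + passesGuardAfterFix(NO_SNAPSHOT_ID));
  }
}
```

In this model, a caller that records a modification whenever the guard passes would create a filediff with snapshot ID -1 before the fix and none after it, which is consistent with the garbage entries described above.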

          yzhangal Yongjun Zhang added a comment -

          Thanks for the info Kihwal Lee!

          kihwal Kihwal Lee added a comment -

          One basic sanity check can be done for cases where there is no snapshot. When saving the snapshot diff section, we can call getNumSnapshots() to check whether there is any snapshot. If there is none, saving the diff section can be skipped.

          kihwal Kihwal Lee added a comment -

          Does something like this make sense? Saving a diff section involves iterating over the entire inode map. When there is no snapshot, we can potentially cut down fsimage saving time and reduce Java object generation.

          --- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
          +++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
          @@ -496,7 +496,10 @@ private void saveInternal(FileOutputStream fout,
                 Step step = new Step(StepType.INODES, filePath);
                 prog.beginStep(Phase.SAVING_CHECKPOINT, step);
                 saveInodes(b);
          -      saveSnapshots(b);
          +      if (context.getSourceNamesystem().getSnapshotManager()
          +          .getNumSnapshots() > 0) {
          +        saveSnapshots(b);
          +      }
                 prog.endStep(Phase.SAVING_CHECKPOINT, step);
          

          If no one objects, I will add a test case and submit a patch.

          szetszwo Tsz Wo Nicholas Sze added a comment -

          Kihwal Lee, the idea is simple and great! Please submit a patch. Thanks!

          kihwal Kihwal Lee added a comment -

          Attaching a patch containing a unit test.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem   Runtime  Comment
          0     reexec      0m 14s   Docker mode activated.
          +1    @author     0m 0s    The patch does not contain any @author tags.
          +1    test4tests  0m 0s    The patch appears to include 1 new or modified test files.
          +1    mvninstall  7m 0s    trunk passed
          +1    compile     0m 43s   trunk passed
          +1    checkstyle  0m 25s   trunk passed
          +1    mvnsite     0m 51s   trunk passed
          +1    mvneclipse  0m 12s   trunk passed
          +1    findbugs    1m 42s   trunk passed
          +1    javadoc     0m 56s   trunk passed
          +1    mvninstall  0m 47s   the patch passed
          +1    compile     0m 43s   the patch passed
          +1    javac       0m 43s   the patch passed
          +1    checkstyle  0m 23s   the patch passed
          +1    mvnsite     0m 58s   the patch passed
          +1    mvneclipse  0m 10s   the patch passed
          +1    whitespace  0m 0s    The patch has no whitespace issues.
          +1    findbugs    1m 52s   the patch passed
          +1    javadoc     0m 51s   the patch passed
          -1    unit        56m 57s  hadoop-hdfs in the patch failed.
          +1    asflicense  0m 18s   The patch does not generate ASF License warnings.
                            76m 14s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.snapshot.TestCheckpointsWithSnapshots
            hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12823518/HDFS-9696.patch
          JIRA Issue HDFS-9696
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 44f817c68730 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 23c6e3c
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16416/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16416/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16416/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          szetszwo Tsz Wo Nicholas Sze added a comment -

          The test failures seem related. Please take a look.

          kihwal Kihwal Lee added a comment -

          It looks like these tests failed because the snapshot section wasn't present. When the existing namenode reloads such an image, the snapshot manager state may not be properly reset. I made it skip only the diff section and they seem to pass.
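
The revised approach can be sketched as a toy model: the snapshot section is always written so that a reloading NameNode can reset its snapshot manager state, and only the diff section is made conditional on the snapshot count. The class and method names are hypothetical, and the section names are taken from the XML excerpt earlier in this thread; this is not the code from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "skip only the diff section when there are no snapshots". */
class SkipDiffSectionSketch {

  /** Returns the snapshot-related sections that would be written to the image. */
  static List<String> snapshotSectionsToSave(int numSnapshots) {
    List<String> sections = new ArrayList<>();
    // Always written, even with zero snapshots, so the loader sees a
    // snapshot section and can reset its state.
    sections.add("SnapshotSection");
    // Written only when at least one snapshot exists; with no snapshots,
    // any filediff entries are garbage and are dropped at save time.
    if (numSnapshots > 0) {
      sections.add("SnapshotDiffSection");
    }
    return sections;
  }

  public static void main(String[] args) {
    System.out.println(snapshotSectionsToSave(0));
    System.out.println(snapshotSectionsToSave(2));
  }
}
```

Under this model the garbage entries disappear at the next checkpoint of a snapshot-free namespace, without waiting for the affected files to be deleted.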

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem   Runtime  Comment
          0     reexec      0m 17s   Docker mode activated.
          +1    @author     0m 0s    The patch does not contain any @author tags.
          +1    test4tests  0m 0s    The patch appears to include 1 new or modified test files.
          +1    mvninstall  7m 19s   trunk passed
          +1    compile     0m 50s   trunk passed
          +1    checkstyle  0m 28s   trunk passed
          +1    mvnsite     0m 59s   trunk passed
          +1    mvneclipse  0m 17s   trunk passed
          +1    findbugs    1m 46s   trunk passed
          +1    javadoc     1m 0s    trunk passed
          +1    mvninstall  0m 49s   the patch passed
          +1    compile     0m 47s   the patch passed
          +1    javac       0m 47s   the patch passed
          +1    checkstyle  0m 26s   the patch passed
          +1    mvnsite     0m 53s   the patch passed
          +1    mvneclipse  0m 10s   the patch passed
          +1    whitespace  0m 0s    The patch has no whitespace issues.
          +1    findbugs    1m 51s   the patch passed
          +1    javadoc     0m 55s   the patch passed
          -1    unit        90m 43s  hadoop-hdfs in the patch failed.
          +1    asflicense  0m 19s   The patch does not generate ASF License warnings.
                            111m 12s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestPersistBlocks
          Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12823548/HDFS-9696.v2.patch
          JIRA Issue HDFS-9696
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 6fa8fece0684 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 23c6e3c
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16417/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16417/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16417/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          kihwal Kihwal Lee added a comment -
          -------------------------------------------------------
           T E S T S
          -------------------------------------------------------
          Running org.apache.hadoop.hdfs.TestLeaseRecovery2
          Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 88.065 sec - in org.apache.hadoop.hdfs.TestLeaseRecovery2
          Running org.apache.hadoop.hdfs.TestPersistBlocks
          Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.3 sec - in org.apache.hadoop.hdfs.TestPersistBlocks
          

          The failed test cases all pass when rerun. TestPersistBlocks was reported in HDFS-5770.

          szetszwo Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good. Thanks Kihwal!

          kihwal Kihwal Lee added a comment -

          Thanks for the review, Tsz Wo Nicholas Sze. I will commit it shortly.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10272 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10272/)
          HDFS-9696. Garbage snapshot records linger forever. Contributed by (kihwal: rev 83e57e083f2cf6c0de8a46966c5492faeabd8f2a)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSaveNamespace.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
          kihwal Kihwal Lee added a comment -

          Committed all the way to branch-2.7. The same fix applies to branch-2.6, but the unit test needs to be rewritten. Chris Trezzo, do you want this for 2.6?

          zhz Zhe Zhang added a comment -

          This seems to break TestOpenFilesWithSnapshot in branch-2.7.

          kihwal Kihwal Lee added a comment -

          Zhe Zhang I will take a look.

          kihwal Kihwal Lee added a comment - - edited

          I think it is broken by HDFS-10763 and is a real bug. I will fix it soon. It is not an issue for trunk through branch-2.8, as the lease there is inode ID based.

          zhz Zhe Zhang added a comment -

          Thanks for looking into this Kihwal Lee! Sorry about the false alarm. I merely verified the commit before this one and it passed.

          ctrezzo Chris Trezzo added a comment -

          Kihwal Lee Do you think this is worth backporting to branch-2.6? If the unit test rewrite is small, it seems to make sense to me.

          kihwal Kihwal Lee added a comment -

          I think it is worth having in branch-2.6. We would if we were still on 2.6. Attaching a patch for 2.6.

          brahmareddy Brahma Reddy Battula added a comment -

          FYR: uploading the branch-2.7 patch that was committed, since there was a conflict.

          brahmareddy Brahma Reddy Battula added a comment -

          Oh, the branch-2.6 patch is the same as the branch-2.7 one.

          sjlee0 Sangjin Lee added a comment -

          Committed it to 2.6.5.


            People

            • Assignee:
              kihwal Kihwal Lee
              Reporter:
              kihwal Kihwal Lee
            • Votes:
              0
              Watchers:
              17
