Details

    • Release Note:
      Public service notice:
      * Every restart of a 2.6.x or 2.7.0 DN incurs a risk of unwanted block deletion.
      * Apply this patch if you are running a pre-2.7.1 release.

      Description

      A race condition between block pool initialization and the directory scanner may cause a mass deletion of blocks in multiple storages.

      If block pool initialization finds a block on disk that is already in the replica map, it deletes one of the two based on size, generation stamp (GS), etc. Unfortunately, it always deletes one of the blocks even when they are identical, so the replica map must be empty when the pool is initialized.

      The directory scanner starts at a random time within its periodic interval (default 6h). If the scanner starts very early it races to populate the replica map, causing the block pool init to erroneously delete blocks.
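As a rough illustration of why every restart carries some risk, the arithmetic can be sketched as below. This assumes the start offset is uniformly distributed over the interval, which the description does not state; the class name, method names, and the 60-second initialization window are hypothetical and are not taken from Hadoop.

```java
/** Back-of-the-envelope estimate only; assumes a uniformly random start offset. */
final class ScannerStartWindow {

    /** Chance that the scanner's first run lands inside the block pool init window. */
    static double earlyStartProbability(long intervalSeconds, long initWindowSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        // Clamp the window to [0, interval] so the result is a valid probability.
        long window = Math.max(0L, Math.min(initWindowSeconds, intervalSeconds));
        return (double) window / (double) intervalSeconds;
    }

    /** Chance that at least one of several independent restarts hits the window. */
    static double atLeastOnce(double perRestart, int restarts) {
        return 1.0 - Math.pow(1.0 - perRestart, restarts);
    }

    public static void main(String[] args) {
        long sixHours = 6L * 60 * 60;                    // default interval from the description
        double p = earlyStartProbability(sixHours, 60);  // 60 s init window is a made-up figure
        System.out.printf("per restart: %.4f, over 100 restarts: %.3f%n",
                p, atLeastOnce(p, 100));
    }
}
```

The per-restart chance is small under these assumptions, but it compounds across many DNs and repeated restarts, which is consistent with the notice that every restart incurs a risk.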

      1. HDFS-8486.patch
        6 kB
        Daryn Sharp
      2. HDFS-8486.patch
        6 kB
        Daryn Sharp
      3. HDFS-8486-branch-2.6.02.patch
        6 kB
        Arpit Agarwal
      4. HDFS-8486-branch-2.6.addendum.patch
        0.7 kB
        Arpit Agarwal
      5. HDFS-8486-branch-2.6.patch
        6 kB
        Arpit Agarwal

          Activity

          daryn Daryn Sharp added a comment -

          A subtle reordering of method invocation appears to be the source of the bug.

          daryn Daryn Sharp added a comment -

          What you'll notice is a spike in corrupt blocks that tapers down. What's going on is that the DN's block report still included all the blocks it had deleted. Over the next 6 hours, the slice scanner slowly detects the missing blocks and reports them as corrupt. After 6 hours, the directory scanner detects and mass-removes all the missing blocks.

          In the 6 hour window, the NN does not know the block is under-replicated and it continues to send clients to the DN. Will file a separate bug for the DN not informing the NN when it's missing a block it thought it had.

          daryn Daryn Sharp added a comment -

          After multiple iterations, this is the simplest low-risk patch. The crucial part is that the BlockPoolSlice now realizes when it has discovered an on-disk block that has the same path as the replica already in memory. In that case it updates the replica map with the one just found.

          The other part is avoiding the race altogether. The directory scan should not occur until after the block pools are initialized. Although both should be able to "work" simultaneously, until the block pools are initialized for the first time the directory scanner logs a warning that there's no block scanner for every new block it finds.

          Note I found writing a unit test to be extremely difficult. The BlockPoolSlice ctor has numerous side-effects. I instead split out part of duplicate resolution into a static method (sigh, makes future mocking impossible).
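The same-path guard described in this comment can be sketched roughly as follows. This is a simplified stand-in, not the committed patch: the `Replica` type and its fields are hypothetical, and the tie-break order (generation stamp first, then size) is an assumption based on the description's "size, GS, etc.".

```java
import java.util.Objects;

/** Illustrative sketch only; not the code from HDFS-8486.patch. */
final class ReplicaSelectSketch {

    /** Hypothetical minimal replica descriptor. */
    static final class Replica {
        final String blockFilePath;
        final long generationStamp;
        final long numBytes;

        Replica(String blockFilePath, long generationStamp, long numBytes) {
            this.blockFilePath = blockFilePath;
            this.generationStamp = generationStamp;
            this.numBytes = numBytes;
        }
    }

    /**
     * Returns the replica that should be deleted, or null when both entries
     * point at the same file and nothing may be deleted.
     */
    static Replica selectReplicaToDelete(Replica inMemory, Replica onDisk) {
        // Same path means the same block file: deleting either copy would
        // delete the only replica, so the caller keeps the one just found.
        if (Objects.equals(inMemory.blockFilePath, onDisk.blockFilePath)) {
            return null;
        }
        Replica keep;
        if (inMemory.generationStamp != onDisk.generationStamp) {
            keep = inMemory.generationStamp > onDisk.generationStamp ? inMemory : onDisk;
        } else if (inMemory.numBytes != onDisk.numBytes) {
            keep = inMemory.numBytes > onDisk.numBytes ? inMemory : onDisk;
        } else {
            keep = inMemory; // arbitrary choice for true duplicates at different paths
        }
        return keep == inMemory ? onDisk : inMemory;
    }
}
```

Because the selection is a static method with no side effects, it can be exercised directly with two hand-built replicas, without constructing a BlockPoolSlice.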

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 18m 11s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 41s There were no new javac warning messages.
          +1 javadoc 9m 50s There were no new javadoc warning messages.
          +1 release audit 0m 25s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 2m 13s There were no new checkstyle issues.
          -1 whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 34s mvn install still works.
          +1 eclipse:eclipse 0m 36s The patch built with eclipse:eclipse.
          +1 findbugs 3m 20s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 native 3m 18s Pre-build of native portion
          +1 hdfs tests 161m 56s Tests passed in hadoop-hdfs.
              209m 11s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12735927/HDFS-8486.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 7ebe80e
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/11153/artifact/patchprocess/whitespace.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11153/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11153/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11153/console

          This message was automatically generated.

          daryn Daryn Sharp added a comment -

          removed whitespace...

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 15m 10s Findbugs (version ) appears to be broken on trunk.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 28s There were no new javac warning messages.
          +1 javadoc 9m 30s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 50s There were no new checkstyle issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 35s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 3m 12s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 native 3m 16s Pre-build of native portion
          -1 hdfs tests 162m 13s Tests failed in hadoop-hdfs.
              204m 13s  



          Reason Tests
          Failed unit tests hadoop.hdfs.server.namenode.TestFileTruncate



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12736613/HDFS-8486.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 63e3fee
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11190/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11190/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11190/console

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment -

          Great find, Daryn Sharp. And nice work fixing it... as usual.

          It sounds like this change:

          --- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          +++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          @@ -1370,9 +1370,9 @@ void initBlockPool(BPOfferService bpos) throws IOException {
               // failures.
               checkDiskError();
           
          -    initDirectoryScanner(conf);
               data.addBlockPool(nsInfo.getBlockPoolID(), conf);
               blockScanner.enableBlockPoolId(bpos.getBlockPoolId());
          +    initDirectoryScanner(conf);
          

          should be sufficient to avoid the problem for the non-federation case, since the FsDatasetSpi#addBlockPool code path will do the initial scan even before the DirectoryScanner is created.

          The change to selectReplicaToDelete should guard against the problem in the federation case, by never deleting a replica just because we already have a replica with the same path in the set. It's a nice robustness improvement.

          "Note I found writing a unit test to be extremely difficult. The BlockPoolSlice ctor has numerous side-effects. I instead split out part of duplicate resolution into a static method (sigh, makes future mocking impossible)."

          Hmm... it seems like you could create a mock for BlockPoolSlice#resolveDuplicateReplicas, which is the only caller of the static method. For that reason, perhaps we should add @VisibleForTesting to selectReplicaToDelete?

          +1 pending that change. Great work, Daryn.
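To make the effect of the reordering in the diff concrete, here is a toy model of the two call orders. It is not Hadoop code: the method names only echo the ones in the diff, the bodies are invented for illustration, and the block paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

/** Toy model of DN startup ordering; the bodies are illustrative assumptions. */
final class InitOrderSketch {
    /** Block files on the simulated disk, keyed by block id. */
    private final Map<Long, String> disk = new HashMap<>();
    /** In-memory replica map: block id to block file path. */
    private final Map<Long, String> replicaMap = new HashMap<>();

    InitOrderSketch(int blocks) {
        for (long id = 0; id < blocks; id++) {
            disk.put(id, "/data/current/blk_" + id); // hypothetical path
        }
    }

    /** Stand-in for an early directory scan: records every on-disk block. */
    void directoryScan() {
        replicaMap.putAll(disk);
    }

    /**
     * Stand-in for pre-fix block pool init: a block already present in the
     * replica map is treated as a duplicate and one "copy" is deleted.
     */
    void addBlockPoolPreFix() {
        for (Map.Entry<Long, String> onDisk : new ArrayList<>(disk.entrySet())) {
            String alreadyKnown = replicaMap.put(onDisk.getKey(), onDisk.getValue());
            if (alreadyKnown != null) {
                // Both entries name the same file, so this removes the only replica.
                disk.remove(onDisk.getKey());
            }
        }
    }

    /** Runs the two steps in the given order and reports how many blocks remain on disk. */
    static int survivingBlocks(boolean scannerFirst, int blocks) {
        InitOrderSketch dn = new InitOrderSketch(blocks);
        if (scannerFirst) {
            dn.directoryScan();
            dn.addBlockPoolPreFix();
        } else {
            dn.addBlockPoolPreFix();
            dn.directoryScan();
        }
        return dn.disk.size();
    }
}
```

In this model, running the scan first leaves no blocks on disk, while initializing the block pool first leaves all of them, and the replica map still lists the deleted blocks afterwards, matching the block report symptom described earlier in the thread.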

          cmccabe Colin P. McCabe added a comment -

          Since the only change I was requesting was adding the @VisibleForTesting annotation, and since this fix is so critical, I'm going to commit it now and file a follow-on to add the annotation.

          cmccabe Colin P. McCabe added a comment -

          committed to 2.7.1

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7944 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7944/)
          HDFS-8486. DN startup may cause severe data loss (Daryn Sharp via Colin P. McCabe) (cmccabe: rev 03fb5c642589dec4e663479771d0ae1782038b63)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #217 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/217/)
          HDFS-8486. DN startup may cause severe data loss (Daryn Sharp via Colin P. McCabe) (cmccabe: rev 03fb5c642589dec4e663479771d0ae1782038b63)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #947 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/947/)
          HDFS-8486. DN startup may cause severe data loss (Daryn Sharp via Colin P. McCabe) (cmccabe: rev 03fb5c642589dec4e663479771d0ae1782038b63)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2163 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2163/)
          HDFS-8486. DN startup may cause severe data loss (Daryn Sharp via Colin P. McCabe) (cmccabe: rev 03fb5c642589dec4e663479771d0ae1782038b63)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #215 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/215/)
          HDFS-8486. DN startup may cause severe data loss (Daryn Sharp via Colin P. McCabe) (cmccabe: rev 03fb5c642589dec4e663479771d0ae1782038b63)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2145 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2145/)
          HDFS-8486. DN startup may cause severe data loss (Daryn Sharp via Colin P. McCabe) (cmccabe: rev 03fb5c642589dec4e663479771d0ae1782038b63)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #206 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/206/)
          HDFS-8486. DN startup may cause severe data loss (Daryn Sharp via Colin P. McCabe) (cmccabe: rev 03fb5c642589dec4e663479771d0ae1782038b63)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          daryn Daryn Sharp added a comment -

          Public service notice:

          • Every restart of a 2.6.x or 2.7.0 DN incurs a risk of unwanted block deletion.
          • Apply this patch if you are running a pre-2.7.1 release.

          I previously described this as an ancient bug, but it's new to 2.6. HDFS-2560 did start the scanner too early, but the race only caused a benign log warning. In 2.6, HDFS-6931 made an unrelated change that introduced the faulty (mass) deletion logic.

          qwertymaniac Harsh J added a comment -

          (Promoting the comment into the release-notes area of the JIRA so that it's more visible)

          arpitagarwal Arpit Agarwal added a comment -

          Patch for branch-2.6. Could someone familiar with the original change review it?

          xyao Xiaoyu Yao added a comment -

          Thanks Arpit. The branch-2.6 patch LGTM, +1.

          arpitagarwal Arpit Agarwal added a comment -

          Thanks Xiaoyu Yao, will hold off committing for a couple of days in case there are additional comments.

          arpitagarwal Arpit Agarwal added a comment -

          Merged for 2.6.1.

          arpitagarwal Arpit Agarwal added a comment -

          Addendum patch to fix an issue introduced by conflict resolution with HDFS-7430.

          cnauroth Chris Nauroth added a comment -

          +1 for the addendum patch. Thank you, Arpit.

          arpitagarwal Arpit Agarwal added a comment -

          Thanks Chris, pushed to branch-2.6.

          dlmarion Dave Marion added a comment -

          Does this also affect 2.5.0? If so, can someone provide a patch for it? The branch-2.6 patches don't apply cleanly and the code is different.

          arpitagarwal Arpit Agarwal added a comment -

          2.5.0 is not affected.

          dlmarion Dave Marion added a comment -

          Thanks for the quick response!


            People

            • Assignee:
              daryn Daryn Sharp
              Reporter:
              daryn Daryn Sharp
            • Votes:
              0
              Watchers:
              28
