HDFS-4015: Safemode should count and report orphaned blocks

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The safemode status currently reports the number of unique reported blocks compared to the total number of blocks referenced by the namespace. However, it does not report the inverse: blocks which are reported by datanodes but not referenced by the namespace.

      In the case that an admin accidentally starts up from an old image, this can be confusing: safemode and fsck will show "corrupt files", which are the files which actually have been deleted but got resurrected by restarting from the old image. This will convince them that they can safely force leave safemode and remove these files – after all, they know that those files should really have been deleted. However, they're not aware that leaving safemode will also unrecoverably delete a bunch of other block files which have been orphaned due to the namespace rollback.

      I'd like to consider reporting something like: "900000 of expected 1000000 blocks have been reported. Additionally, 10000 blocks have been reported which do not correspond to any file in the namespace. Forcing exit of safemode will unrecoverably remove those data blocks"

      Whether this statistic is also used for some kind of "inverse safe mode" is the logical next step, but just reporting it as a warning seems easy enough to accomplish and worth doing.

      Attachments

      1. HDFS-4015.001.patch
        34 kB
        Anu Engineer
      2. HDFS-4015.002.patch
        35 kB
        Anu Engineer
      3. HDFS-4015.003.patch
        31 kB
        Anu Engineer
      4. HDFS-4015.004.patch
        34 kB
        Anu Engineer
      5. HDFS-4015.005.patch
        36 kB
        Anu Engineer
      6. HDFS-4015.006.patch
        36 kB
        Anu Engineer
      7. HDFS-4015.007.patch
        36 kB
        Arpit Agarwal

        Activity

        anu Anu Engineer added a comment -

        Changes in this patch are:

        NameNode Changes:

        1. Today we ignore blocks that do not belong to any file. Instead of just ignoring those blocks, the NN now checks whether any block has a generation stamp in the future and keeps track of those.
        2. While leaving safe mode, the NN will refuse to exit if HDFS has blocks that are in the future.
        3. Exposed BytesInFuture as a JMX value in case Hadoop management tools want to look for this.
        4. Added a new mode to exit safe mode called forceExit.

        Changes in DfsAdmin:

        1. Changed -report so that it not only detects that we are in safe mode but, if we have bytes in future, also prints an appropriate warning.
        2. Added a new -safemode command extension called forceExit to indicate that the user is OK with losing data and to allow the NameNode to exit safe mode.

        Changes in DfsHealth.html:

        1. Shows a modified message relating to blocks with future generation stamps.

        Test Changes:

        1. Created a test that simulates the NameNode metadata being replaced and DataNodes reporting blocks with generation stamps in the future.

        Also attached screenshots of how this change will appear to users.
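        As a rough, self-contained sketch of the behavior described above (class, field, and method names are illustrative only, not the actual HDFS code):

          // Track bytes belonging to blocks whose generation stamp is newer than
          // anything the NameNode knows about, and refuse a normal safe-mode exit
          // while any such bytes exist.
          class SafeModeGuardSketch {
            private long bytesInFuture;           // bytes at risk of deletion
            private final long currentGenerationStamp;  // highest stamp known to the NN

            SafeModeGuardSketch(long currentGenerationStamp) {
              this.currentGenerationStamp = currentGenerationStamp;
            }

            // Called for each block reported by a DataNode during startup.
            void onBlockReport(long blockGenerationStamp, long blockSizeBytes) {
              if (blockGenerationStamp > currentGenerationStamp) {
                bytesInFuture += blockSizeBytes;
              }
            }

            // Called when the NameNode considers leaving startup safe mode.
            boolean canLeaveSafeMode(boolean forceExit) {
              if (bytesInFuture > 0 && !forceExit) {
                System.err.println("Refusing to leave safe mode: " + bytesInFuture
                    + " byte(s) would be lost. Use -safemode forceExit to override.");
                return false;
              }
              return true;
            }
          }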

        anu Anu Engineer added a comment -

        Re-attaching the patch since the build failed in Jenkins; the PNG files confused Jenkins, which tried to compile them.

        arpitagarwal Arpit Agarwal added a comment -

        Hi Anu Engineer, thanks for this improvement. A few comments below; I haven't reviewed the test case yet.

        1. ClientProtocol.java:729: Perhaps we can describe it as "bytes that are at risk for deletion."?
        2. DFSAdmin.java:474: This can happen even without blocks with future generation stamps, e.g. a DN is restarted after a long downtime and reports blocks for deleted files.
        3. FSNamesystem.java:4438: For the turn-off tip, should we check getBytesInFuture after the threshold of reported blocks is reached? One potential issue is that the administrator may see this message and immediately run -forceExit even before block thresholds are reached.
        4. FSNamesystem.java:4445: "you are ok with data loss." might also be confusing. Perhaps we can say "if you are certain that the NameNode was started with the correct FsImage and edit logs."
        5. FSNamesystem.java:4631: Not sure how this works. leaveSafeMode will just return if (isInStartupSafeMode() && (blockManager.getBytesInFuture() > 0))

        Comments also posted at https://github.com/arp7/hadoop/commit/f16f4525a9a814f0945e76af55ad06b5fc18ecb7

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 22m 20s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 3 new or modified test files.
        +1 javac 7m 55s There were no new javac warning messages.
        +1 javadoc 10m 4s There were no new javadoc warning messages.
        +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings.
        +1 checkstyle 3m 43s There were no new checkstyle issues.
        +1 whitespace 0m 2s The patch has no lines that end in whitespace.
        +1 install 1m 37s mvn install still works.
        +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
        -1 findbugs 7m 40s The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings.
        -1 common tests 7m 13s Tests failed in hadoop-common.
        -1 yarn tests 8m 14s Tests failed in hadoop-yarn-server-nodemanager.
        -1 hdfs tests 0m 29s Tests failed in hadoop-hdfs.
        -1 hdfs tests 0m 23s Tests failed in hadoop-hdfs-client.
            70m 43s  



        Reason Tests
        FindBugs module:hadoop-hdfs
        Failed unit tests hadoop.net.TestClusterTopology
          hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor
        Timed out tests org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot
          org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels
        Failed build hadoop-hdfs
          hadoop-hdfs-client



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12764329/HDFS-4015.001.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / 6f335e4
        Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/12746/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/12746/artifact/patchprocess/testrun_hadoop-common.txt
        hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-HDFS-Build/12746/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
        hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/12746/artifact/patchprocess/testrun_hadoop-hdfs.txt
        hadoop-hdfs-client test log https://builds.apache.org/job/PreCommit-HDFS-Build/12746/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12746/testReport/
        Java 1.7.0_55
        uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12746/console

        This message was automatically generated.

        anu Anu Engineer added a comment -

        Hi Arpit Agarwal, thanks for the review and comments. I will wait for the rest of the review comments and post a new patch.

        ClientProtocol.java:729: Perhaps we can describe it as "bytes that are at risk for deletion."?

        Makes sense, I will modify this.

        DFSAdmin.java:474: This can happen even without blocks with future generation stamps e.g. DN is restarted after a long downtime and reports blocks for deleted files.

        In this patch we track blocks with a generation stamp greater than the current highest generation stamp known to the NN. I have made the assumption that if a DN comes back online and reports blocks for files that have been deleted, the generation stamps of those blocks will be less than the current generation stamp of the NN. Please let me know if you think this assumption is not valid or breaks down in special cases. Could this happen with V1 vs. V2 generation stamps?
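        To make the assumption concrete, a small illustrative example (all numbers are purely hypothetical):

          class GenerationStampExample {
            public static void main(String[] args) {
              long nnCurrentGenerationStamp = 5000;  // highest stamp known to the NN
              long staleDeletedFileBlockGs  = 1200;  // old block from a long-down DN
              long orphanedAfterRollbackGs  = 7300;  // block created after the old fsimage was written

              // The stale block is NOT counted as "in future"; only the block whose
              // stamp exceeds the NN's current stamp is tracked as at-risk bytes.
              System.out.println(staleDeletedFileBlockGs > nnCurrentGenerationStamp); // false: ignored
              System.out.println(orphanedAfterRollbackGs > nnCurrentGenerationStamp); // true: tracked
            }
          }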

        FSNamesystem.java:4438: For turn-off tip, should we check getBytesInFuture after the threshold of reported blocks isreached? One potential issue is that the administrator may see this message and immediately run -forceExit even before block thresholds are reached.

        With this patch we are slightly changing the behavior of safe mode. Even if the block threshold is reached, we will not exit if we find blocks with future generation stamps, under the assumption that the NN metadata has been modified.

        FSNamesystem.java:4445: "you are ok with data loss." might also be confusing. Perhaps we can say "if you are certain that the NameNode was started with the correct FsImage and edit logs."

        Agreed, I will modify this warning. But we also have the case where someone is actually replacing the NN metadata and is OK with data loss.

        FSNamesystem.java:4631: Not sure how this works. leaveSafeMode will just return if (isInStartupSafeMode() && (blockManager.getBytesInFuture() > 0))

        As the error message says, we are refusing to leave safe mode – we want the user to either issue -forceExit or restart the NN with the right metadata files before we move out of safe mode.

        liuml07 Mingliang Liu added a comment -
        1. This patch looks good overall to me. The first assumption you made (that the generation stamp of blocks reported by a rejoining DN will be less than the current highest generation stamp known to the NN) makes sense to me.
        2. I agree with Arpit Agarwal that this tip should not show up until the thresholds are reached. Since it takes precedence over the threshold message that follows it, once the administrator sees this warning he/she may think that it is the right time to run forceExit even before the block thresholds are reached. Or we may need to combine this warning with the threshold message.
          FSNamesystem.java
          +      if(blockManager.getBytesInFuture() > 0) {
          +        String msg = "Name node detected blocks with generation stamps " ...
          +        return msg;
          +      }
          +
          
        3. I suppose reached should be 0 when we enter safe mode, which means safe mode is on and the threshold is not reached yet.
          FSNamesystem.java
          +  @VisibleForTesting
          +  synchronized void enableSafeModeForTesting(Configuration conf) {
          +    SafeModeInfo newSafemode = new SafeModeInfo(conf);
          +    newSafemode.reached = 1;
          +    this.safeMode =  newSafemode;
          +  }
          
        arpitagarwal Arpit Agarwal added a comment -

        Please let me know if you think this assumption is not valid or breaks down in special cases, Could this happen with V1 vs V2 generation stamps ?

        Hi Anu Engineer, your assumption is correct. I was just referring to this statement "This means blocks have been reported which do not correspond to any file in the namespace". It's a minor point.

        With this patch we are slightly changing the behavior of SafeMode. Even if we find the threshold blocks we will not exit if we find blocks with future generation stamps, under the assumption that NN meta-data has been modified.

        Agreed it's the right behavior. I meant the timing of displaying the new safe mode tip. It would be better displayed after thresholds are checked, so we know that it is a safe time to run the -forceExit command, assuming the correct metadata is being used. I also like Mingliang Liu's suggestion of combining the two messages if it is feasible. So the message explains both problems if applicable (e.g. there are X missing blocks and there are Y blocks with generation stamps in the future...).
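        For illustration, a combined tip along those lines might be assembled roughly like this (message wording and method shape are assumptions, not the committed code):

          class SafeModeTipSketch {
            // Build one tip covering both missing blocks and blocks with future
            // generation stamps, instead of two competing messages.
            static String buildSafeModeTip(long reported, long expected, long bytesInFuture) {
              StringBuilder tip = new StringBuilder();
              tip.append(reported).append(" of expected ").append(expected)
                 .append(" blocks have been reported.");
              if (bytesInFuture > 0) {
                tip.append(" Additionally, ").append(bytesInFuture)
                   .append(" byte(s) belong to blocks with generation stamps in the future;")
                   .append(" forcing exit of safe mode will remove them unrecoverably.");
              }
              return tip.toString();
            }
          }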

        As the error message says , we are refusing to leave the safe mode – we want the users to send up -forceExit to restart NN with right Metadata files before we will move out of safe mode.

        +      case SAFEMODE_FORCE_EXIT:
        +        if (isInStartupSafeMode() && (blockManager.getBytesInFuture() > 0)) {
        +          LOG.warn("Leaving safe mode due to forceExit. This will cause a data " +
        +              "loss of " + blockManager.getBytesInFuture() + " byte(s).");
        +          // we should leave safe mode before clearing bytes, otherwise
        +          // there is a race condition where bytes in future may not be zero.
        +          leaveSafeMode();
        +          blockManager.clearBytesInFuture();
        

        So it looks like this call to leaveSafeMode is guaranteed to fail and we can remove it. The next iteration of SafeModeMonitor will bring us out of safe mode.

        anu Anu Engineer added a comment -

        Hi Mingliang Liu, Arpit Agarwal, thanks for your reviews. I have fixed all issues mentioned by both of you in this new patch. Please take a look when you get a chance.

        anu Anu Engineer added a comment -

        Rebased the patch to the top of the tree; used the same patch number.

        arpitagarwal Arpit Agarwal added a comment - edited

        Hi Anu Engineer, thanks for addressing the earlier feedback. Feedback on the v2 patch.

        1. We will likely see blocks with future generation stamps during intentional HDFS rollback. We should disable this check if NN has been restarted with a rollback option (either regular or rolling upgrade rollback).
        2. I apologize for not noticing this earlier. FsStatus is tagged as public and stable, so changing the constructor signature is incompatible. Instead we could add a new constructor that initializes bytesInFuture. This will also avoid changes to FileSystem, ViewFS, RawLocalFileSystem.
        3. fsck should also print this new counter. We can do it in a separate Jira.
        4. Don't consider this binding, but I would really like it if bytesInFuture could be renamed, especially where it is exposed via public interfaces/metrics. It sounds confusing/ominous. bytesWithFutureGenerationStamps would be more precise.

        Still reviewing the test cases.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        -1 pre-patch 22m 47s Pre-patch trunk has 8 extant Findbugs (version 3.0.0) warnings.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 3 new or modified test files.
        +1 javac 7m 53s There were no new javac warning messages.
        +1 javadoc 10m 20s There were no new javadoc warning messages.
        -1 release audit 0m 14s The applied patch generated 1 release audit warnings.
        +1 checkstyle 3m 44s There were no new checkstyle issues.
        +1 whitespace 0m 2s The patch has no lines that end in whitespace.
        +1 install 1m 39s mvn install still works.
        +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
        +1 findbugs 7m 35s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        -1 common tests 7m 33s Tests failed in hadoop-common.
        +1 yarn tests 8m 54s Tests passed in hadoop-yarn-server-nodemanager.
        -1 hdfs tests 216m 1s Tests failed in hadoop-hdfs.
        -1 hdfs tests 0m 23s Tests failed in hadoop-hdfs-client.
            287m 42s  



        Reason Tests
        Failed unit tests hadoop.metrics2.impl.TestGangliaMetrics
          hadoop.hdfs.tools.TestGetGroups
          hadoop.hdfs.TestSafeModeWithStripedFile
          hadoop.hdfs.web.TestWebHdfsTimeouts
          hadoop.hdfs.web.TestWebHdfsContentLength
          hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForContentSummary
          hadoop.hdfs.server.namenode.TestAddStripedBlocks
          hadoop.cli.TestHDFSCLI
          hadoop.hdfs.server.namenode.TestDecommissioningStatus
          hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
          hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
          hadoop.hdfs.security.TestDelegationTokenForProxyUser
        Timed out tests org.apache.hadoop.hdfs.web.TestWebHDFSAcl
        Failed build hadoop-hdfs-client



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12765220/HDFS-4015.002.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / a8b4d0f
        Pre-patch Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs-client.html
        Pre-patch Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-nodemanager.html
        Release Audit https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/patchReleaseAuditProblems.txt
        hadoop-common test log https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/testrun_hadoop-common.txt
        hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
        hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/testrun_hadoop-hdfs.txt
        hadoop-hdfs-client test log https://builds.apache.org/job/PreCommit-HDFS-Build/12814/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12814/testReport/
        Java 1.7.0_55
        uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12814/console

        This message was automatically generated.

        anu Anu Engineer added a comment -

        Arpit Agarwal Thanks for the review. I have fixed all issues flagged by you.

        We will likely see blocks with future generation stamps during intentional HDFS rollback. We should disable this check if NN has been restarted with a rollback option (either regular or rolling upgrade rollback).

        Fixed this by setting shouldPostponeBlocksFromFuture in rollback path.

        I apologize for not noticing this earlier. FsStatus is tagged as public and stable, so changing the constructor signature is incompatible. Instead we could add a new constructor that initializes bytesInFuture. This will also avoid changes to FileSystem, ViewFS, RawLocalFileSystem.

        Thanks for catching this, I really appreciate it. I added a function in DistributedFileSystem that returns this value instead of modifying FsStatus.
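        A client-side usage sketch of that approach (the getter name below is an assumption for illustration, not necessarily the exact method added by the patch):

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.hdfs.DistributedFileSystem;

          public class OrphanedBytesCheck {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
              // Hypothetical accessor exposing the bytes at risk of deletion.
              long atRiskBytes = dfs.getBytesWithFutureGenerationStamps();
              System.out.println("Bytes with future generation stamps: " + atRiskBytes);
            }
          }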

        fsck should also print this new counter. We can do it in a separate Jira.

        Sure, as soon as this JIRA is committed I will follow up with a JIRA and patch for that.

        Don't consider this a binding but I would really like it if bytesInFuture can be renamed especially where it is exposed via public interfaces/metrics. It sounds confusing/ominous. bytesWithFutureGenerationStamps would be more precise.

        Fixed. The counter now looks like this via JMX: "BytesWithFutureGenerationStamps" : 1174853312

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        -1 pre-patch 21m 46s Pre-patch trunk has 748 extant Findbugs (version 3.0.0) warnings.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
        +1 javac 8m 9s There were no new javac warning messages.
        +1 javadoc 10m 30s There were no new javadoc warning messages.
        -1 release audit 0m 19s The applied patch generated 1 release audit warnings.
        -1 checkstyle 2m 46s The applied patch generated 3 new checkstyle issues (total was 138, now 138).
        +1 whitespace 0m 1s The patch has no lines that end in whitespace.
        +1 install 1m 38s mvn install still works.
        +1 eclipse:eclipse 0m 36s The patch built with eclipse:eclipse.
        +1 findbugs 4m 38s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 native 3m 11s Pre-build of native portion
        -1 hdfs tests 154m 54s Tests failed in hadoop-hdfs.
        +1 hdfs tests 0m 34s Tests passed in hadoop-hdfs-client.
            209m 6s  



        Reason Tests
        Failed unit tests hadoop.hdfs.server.namenode.TestAddStripedBlocks
          hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
          hadoop.hdfs.server.datanode.TestDirectoryScanner
          hadoop.hdfs.server.balancer.TestBalancer
        Timed out tests org.apache.hadoop.hdfs.TestWriteReadStripedFile
          org.apache.hadoop.hdfs.server.namenode.TestFileContextXAttr



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12765483/HDFS-4015.003.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / fde729f
        Pre-patch Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/12847/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs-client.html
        Release Audit https://builds.apache.org/job/PreCommit-HDFS-Build/12847/artifact/patchprocess/patchReleaseAuditProblems.txt
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/12847/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
        hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/12847/artifact/patchprocess/testrun_hadoop-hdfs.txt
        hadoop-hdfs-client test log https://builds.apache.org/job/PreCommit-HDFS-Build/12847/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12847/testReport/
        Java 1.7.0_55
        uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12847/console

        This message was automatically generated.

        anu Anu Engineer added a comment -

        Updated documentation.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 23m 22s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
        +1 javac 8m 3s There were no new javac warning messages.
        +1 javadoc 10m 23s There were no new javadoc warning messages.
        -1 release audit 0m 20s The applied patch generated 1 release audit warnings.
        +1 site 3m 8s Site still builds.
        -1 checkstyle 2m 31s The applied patch generated 3 new checkstyle issues (total was 138, now 138).
        +1 whitespace 0m 2s The patch has no lines that end in whitespace.
        +1 install 1m 36s mvn install still works.
        +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
        +1 findbugs 4m 30s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 native 3m 8s Pre-build of native portion
        -1 hdfs tests 192m 25s Tests failed in hadoop-hdfs.
        +1 hdfs tests 0m 31s Tests passed in hadoop-hdfs-client.
            250m 38s  



        Reason Tests
        Failed unit tests hadoop.cli.TestHDFSCLI
          hadoop.hdfs.server.datanode.TestDirectoryScanner
          hadoop.hdfs.TestRenameWhileOpen



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12765926/HDFS-4015.004.patch
        Optional Tests javadoc javac unit findbugs checkstyle site
        git revision trunk / def374e
        Release Audit https://builds.apache.org/job/PreCommit-HDFS-Build/12902/artifact/patchprocess/patchReleaseAuditProblems.txt
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/12902/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
        hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/12902/artifact/patchprocess/testrun_hadoop-hdfs.txt
        hadoop-hdfs-client test log https://builds.apache.org/job/PreCommit-HDFS-Build/12902/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12902/testReport/
        Java 1.7.0_55
        uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12902/console

        This message was automatically generated.

        anu Anu Engineer added a comment -

        The release audit, checkstyle, and test failures are not related to this patch.

        arpitagarwal Arpit Agarwal added a comment - edited

        Hi Anu Engineer, the v004 patch looks great. We should handle the case of blocksInFuture not being present in PBHelperClient#convert(GetFsStatsResponseProto res) (newer client + older NameNode). e.g.

            result[ClientProtocol.GET_STATS_BYTES_IN_FUTURE_BLOCKS_IDX] =
                res.hasBlocksInFuture() ? res.getBlocksInFuture() : 0;
        

        Edit: I need to take a closer look at the timing of setting shouldPostponeBlocksFromFuture.

        anu Anu Engineer added a comment -

        Hi Arpit Agarwal, thanks for the review. Good catch on (newer client + older NameNode). The new patch fixes that and also updates how rollback is detected, based on offline comments.

        arpitagarwal Arpit Agarwal added a comment -

        Thanks Anu Engineer. +1 pending Jenkins. Will hold off committing until tomorrow in case Mingliang Liu wants to take another look.

        The new patch fixes that and also updates how RollBack is detected based on off-line comments.

        FTR we felt detecting rollback from the startup option was safer than overloading the meaning of shouldPostponeBlocksFromFuture.
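        A minimal sketch of that design choice (the types and names here are simplified stand-ins, not the actual HDFS startup-option classes):

          class RollbackCheckSketch {
            enum StartupOption { REGULAR, ROLLBACK, ROLLINGUPGRADE_ROLLBACK }

            // Skip the future-generation-stamp accounting only when the NameNode
            // was explicitly started with a rollback option; otherwise track it.
            static boolean shouldTrackBytesInFuture(StartupOption option) {
              return option != StartupOption.ROLLBACK
                  && option != StartupOption.ROLLINGUPGRADE_ROLLBACK;
            }
          }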

        liuml07 Mingliang Liu added a comment -

        One quick question:
        Consider the case where the NameNode is in the extension period (startup safe mode) and the operator then sets safe mode manually. When the operator makes the NameNode leave safe mode manually, the -force option is not checked, even if there are orphaned blocks. Is this possible? If true, is it expected?

        Another minor comment is that the following code may be re-used:

        +          LOG.error("Refusing to leave safe mode without a force flag. " +
        +              "Exiting safe mode will cause a deletion of " + blockManager
        +              .getBytesInFuture() + " byte(s). Please use " +
        +              "-forceExit flag to exit safe mode forcefully and data loss is " +
        +              "acceptable.");
        
        anu Anu Engineer added a comment -

        When the operator makes the name node leave safe mode manually, the -force option is not checked, even if there are orphaned blocks. Is this possible? If true, is it expected?

        If there are orphaned blocks that we discovered during startup safe mode, the operator cannot exit without -forceExit.

        liuml07 Mingliang Liu added a comment -

        Thanks for the confirmation (and for the patch). I think leaveSafeMode depends on isInStartupSafeMode(), which is false in this case.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        -1 pre-patch 26m 48s Findbugs (version 3.0.0) appears to be broken on trunk.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
        +1 javac 9m 57s There were no new javac warning messages.
        +1 javadoc 12m 26s There were no new javadoc warning messages.
        -1 release audit 0m 21s The applied patch generated 1 release audit warnings.
        +1 site 3m 33s Site still builds.
        -1 checkstyle 2m 38s The applied patch generated 3 new checkstyle issues (total was 138, now 138).
        +1 whitespace 0m 2s The patch has no lines that end in whitespace.
        +1 install 1m 55s mvn install still works.
        +1 eclipse:eclipse 0m 38s The patch built with eclipse:eclipse.
        -1 findbugs 5m 20s The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings.
        +1 native 3m 43s Pre-build of native portion
        -1 hdfs tests 65m 50s Tests failed in hadoop-hdfs.
        +1 hdfs tests 0m 42s Tests passed in hadoop-hdfs-client.
            133m 58s  



        Reason Tests
        FindBugs module:hadoop-hdfs
        Failed unit tests hadoop.hdfs.TestDataTransferKeepalive
          hadoop.hdfs.TestRecoverStripedFile
          hadoop.hdfs.TestHFlush
          hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage
          hadoop.cli.TestHDFSCLI
          hadoop.hdfs.server.blockmanagement.TestNodeCount
          hadoop.fs.TestGlobPaths



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12766677/HDFS-4015.005.patch
        Optional Tests javadoc javac unit findbugs checkstyle site
        git revision trunk / be7a0ad
        Release Audit https://builds.apache.org/job/PreCommit-HDFS-Build/12996/artifact/patchprocess/patchReleaseAuditProblems.txt
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/12996/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
        Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/12996/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/12996/artifact/patchprocess/testrun_hadoop-hdfs.txt
        hadoop-hdfs-client test log https://builds.apache.org/job/PreCommit-HDFS-Build/12996/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12996/testReport/
        Java 1.7.0_55
        uname Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12996/console

        This message was automatically generated.

        liuml07 Mingliang Liu added a comment -

        The patch looks good to me overall. More discussion about the safe mode question is welcome.

        The latest release audit warning is unrelated. The Findbugs warning may be caused by the existing one tracked by HDFS-9242. The checkstyle warnings are caused by existing code and may be addressed separately. The failing unit tests seem unrelated, but we may need to double-check.

        anu Anu Engineer added a comment -

        Mingliang Liu Thanks for looking at the Hadoop QA results. I did look at the test results just to double-check.
        Two of them are globbing failures from a change that has already been reverted. The other failures are mostly timing related and not related to this patch.

        arpitagarwal Arpit Agarwal added a comment -

        When the operator makes the name node leave safe mode manually, the -force option is not checked, even if there are orphaned blocks. Is this possible? If true, is it expected?

        Mingliang Liu, you are right. It's admittedly odd for an administrator to enter safe mode manually during startup, but we should guard against the sequence of steps you described.

        I need to think about this some more, but we should be able to remove the isInStartupSafeMode() check from the clause below, i.e. never exit safe mode without the force flag if there are bytes with future generation stamps. (The rollback exception is already handled elsewhere.)

            private synchronized void leave(boolean force) {
        ...
              if (!force && isInStartupSafeMode() && (blockManager.getBytesInFuture() >
                  0)) {
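
        A minimal sketch of that clause with the check dropped (everything outside the quoted lines is unchanged and only hinted at here); the force flag would then be required whenever bytes with future generation stamps exist, whether or not this is startup safe mode:

            private synchronized void leave(boolean force) {
        ...
              // isInStartupSafeMode() removed: manual safe mode is covered too
              if (!force && (blockManager.getBytesInFuture() > 0)) {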
        
        anu Anu Engineer added a comment -

        Arpit Agarwal Mingliang Liu

        This patch fixes the issue where the administrator manually enters or leaves safe mode. In the exit path we no longer check whether we are in startup safe mode.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        -1 pre-patch 28m 33s Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
        +1 javac 9m 55s There were no new javac warning messages.
        +1 javadoc 12m 16s There were no new javadoc warning messages.
        +1 release audit 0m 27s The applied patch does not increase the total number of release audit warnings.
        +1 site 4m 2s Site still builds.
        -1 checkstyle 3m 10s The applied patch generated 3 new checkstyle issues (total was 138, now 138).
        +1 whitespace 0m 2s The patch has no lines that end in whitespace.
        +1 install 2m 13s mvn install still works.
        +1 eclipse:eclipse 0m 39s The patch built with eclipse:eclipse.
        +1 findbugs 5m 11s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 native 3m 32s Pre-build of native portion
        -1 hdfs tests 53m 57s Tests failed in hadoop-hdfs.
        +1 hdfs tests 0m 37s Tests passed in hadoop-hdfs-client.
            124m 39s  



        Reason Tests
        Failed unit tests hadoop.hdfs.TestReplaceDatanodeOnFailure
          hadoop.cli.TestHDFSCLI
          hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12768132/HDFS-4015.006.patch
        Optional Tests javadoc javac unit findbugs checkstyle site
        git revision trunk / 124a412
        Pre-patch Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/13140/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/13140/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
        hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/13140/artifact/patchprocess/testrun_hadoop-hdfs.txt
        hadoop-hdfs-client test log https://builds.apache.org/job/PreCommit-HDFS-Build/13140/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13140/testReport/
        Java 1.7.0_55
        uname Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13140/console

        This message was automatically generated.

        anu Anu Engineer added a comment -

        None of the test failures seem to be related to this patch.

        arpitagarwal Arpit Agarwal added a comment -

        The TestHDFSCLI failure looks related; the following change fixes it. There is a missing trailing space in the -safemode help string in DFSAdmin.

        -    String safemode = "-safemode <enter|leave|get|wait|forceExit>:  Safe mode" +
        +    String safemode = "-safemode <enter|leave|get|wait|forceExit>:  Safe mode " +
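
        To make the effect concrete (the continuation text below is assumed for illustration, not quoted from DFSAdmin): the help text is built by concatenating string literals, so a missing trailing space runs two words together, which is what the expected-output comparison in TestHDFSCLI trips over.

            // Hypothetical continuation, for illustration only:
            String bad  = "-safemode <enter|leave|get|wait|forceExit>:  Safe mode" +
                "is maintained by the NameNode.";   // renders as "...Safe modeis maintained..."
            String good = "-safemode <enter|leave|get|wait|forceExit>:  Safe mode " +
                "is maintained by the NameNode.";   // renders as "...Safe mode is maintained..."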
        
        arpitagarwal Arpit Agarwal added a comment -

        Attached v7 patch with the trivial edit to fix TestHDFSCLI.

        The change looks good otherwise. +1 with the fix for the test case, pending Jenkins.

        anu Anu Engineer added a comment -

        Arpit Agarwal Thanks for catching it and quickly fixing it.

        jnp Jitendra Nath Pandey added a comment -

        +1 for the latest patch.

        liuml07 Mingliang Liu added a comment -

        I reviewed the safe mode part and it looks good to me.

        +1 for the latest patch.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        -1 pre-patch 24m 37s Findbugs (version 3.0.0) appears to be broken on trunk.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
        +1 javac 8m 57s There were no new javac warning messages.
        +1 javadoc 11m 43s There were no new javadoc warning messages.
        +1 release audit 0m 26s The applied patch does not increase the total number of release audit warnings.
        +1 site 3m 51s Site still builds.
        -1 checkstyle 2m 42s The applied patch generated 3 new checkstyle issues (total was 138, now 138).
        +1 whitespace 0m 3s The patch has no lines that end in whitespace.
        +1 install 2m 5s mvn install still works.
        +1 eclipse:eclipse 0m 42s The patch built with eclipse:eclipse.
        -1 findbugs 5m 38s The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings.
        +1 native 3m 56s Pre-build of native portion
        +1 hdfs tests 57m 32s Tests passed in hadoop-hdfs.
        -1 hdfs tests 0m 36s Tests failed in hadoop-hdfs-client.
            122m 52s  



        Reason Tests
        FindBugs module:hadoop-hdfs
        Failed build hadoop-hdfs-client



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12768388/HDFS-4015.007.patch
        Optional Tests javadoc javac unit findbugs checkstyle site
        git revision trunk / 15eb84b
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/13168/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
        Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/13168/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/13168/artifact/patchprocess/testrun_hadoop-hdfs.txt
        hadoop-hdfs-client test log https://builds.apache.org/job/PreCommit-HDFS-Build/13168/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13168/testReport/
        Java 1.7.0_55
        uname Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13168/console

        This message was automatically generated.

        arpitagarwal Arpit Agarwal added a comment -

        The build failure looks unrelated to the patch. Since the only Jenkins failure flagged against the v6 patch was fixed by a trivial whitespace change, I will commit the v7 patch shortly.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #8699 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8699/)
        HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
        • hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #532 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/532/)
        HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe)

        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #1313 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1313/)
        HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #2523 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2523/)
        HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe)

        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
        arpitagarwal Arpit Agarwal added a comment -

        Committed to trunk. Keeping the Jira open for the branch-2 commit.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #591 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/591/)
        HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
        • hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
        arpitagarwal Arpit Agarwal added a comment -

        Committed to branch-2 for 2.8.0.

        Thanks for contributing this improvement, Anu Engineer, and thanks for the reviews, Mingliang Liu and Jitendra Nath Pandey.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #578 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/578/)
        HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2468 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2468/)
        HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe)

        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml
        • hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
        • hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
        kihwal Kihwal Lee added a comment -

        Anu Engineer, I have seen intermittent failures of this test in precommit builds. Do you think it is a test issue?

        java.lang.AssertionError: expected:<18> but was:<0>
        	at org.junit.Assert.fail(Assert.java:88)
        	at org.junit.Assert.failNotEquals(Assert.java:743)
        	at org.junit.Assert.assertEquals(Assert.java:118)
        	at org.junit.Assert.assertEquals(Assert.java:555)
        	at org.junit.Assert.assertEquals(Assert.java:542)
        	at org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:125)
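
        One possible hardening, assuming the mismatch is a timing issue (the assertion runs before the datanode block report has been processed): poll for the expected value instead of asserting once, using org.apache.hadoop.test.GenericTestUtils and com.google.common.base.Supplier, which are already used widely in HDFS tests. Note that expectedBytes and reportedBytesInFuture() below are hypothetical stand-ins for whatever the test asserts at line 125.

            GenericTestUtils.waitFor(new Supplier<Boolean>() {
              @Override
              public Boolean get() {
                // re-evaluate until the block report has been processed or we time out
                return reportedBytesInFuture() == expectedBytes;
              }
            }, 500, 30000);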
        
        anu Anu Engineer added a comment -

        Could be, let me take a look. Thanks for letting me know.

        vinayrpet Vinayakumar B added a comment -

        Should this be marked as an incompatible change?
        Earlier, "dfsadmin -safemode leave" would leave safe mode.
        Now "-safemode forceExit" is required instead if any bytes with future generation stamps are detected.
        Please correct me if I am missing something here.
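
        For illustration, a minimal sketch of what this means for a client (the SAFEMODE_FORCE_EXIT enum value is my reading of the committed HdfsConstants change; treat the exact name as an assumption):

            // Requires org.apache.hadoop.hdfs.DistributedFileSystem and
            // org.apache.hadoop.hdfs.protocol.HdfsConstants on the classpath.
            DistributedFileSystem dfs =
                (DistributedFileSystem) FileSystem.get(new Configuration());
            // Plain SAFEMODE_LEAVE is refused while bytes with future generation stamps
            // exist; an operator who accepts the data loss must force the exit instead,
            // the programmatic equivalent of "hdfs dfsadmin -safemode forceExit":
            dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_FORCE_EXIT);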

        anu Anu Engineer added a comment - - edited

        Should this be marked as an incompatible change?

        I think we can argue both sides, so I am fine with either call you make after reading through both points of view.

        Earlier, "dfsadmin -safemode leave" would leave safe mode.

        The behaviour in 99.99999% of cases is exactly the same. So it is a rare case of incompatibility, even if we end up defining this as an incompatibility.

        Now "-safemode forceExit" is required instead if any bytes with future generation stamps are detected.

        Future bytes are an error condition that should have been flagged by HDFS. This was a missing error check; if we call this an incompatibility, it would mean that copying an old fsImage or old NN metadata was a supported operation. I would argue that it never was, in the sense that NN metadata is sacrosanct and you are not supposed to roll it back.

        So from that point of view, this change just confirms what we always knew and avoids booting up with incorrect metadata. But we both know the reason this JIRA is being fixed is that people do this and lose data.

        With this change, copying or restoring NN metadata has become a supported operation (that is, HDFS is aware users are going to do this), and we explicitly warn the user of the harm that this action can cause. If we were to argue that the old behavior was a feature, then we would be saying that changing NN metadata and losing data was a supported feature.

        While it is still possible to copy an older version of NN metadata, HDFS will now warn the end user about data loss. The question you are asking is whether we should classify that as an incompatibility or as enforcement of the core axioms of HDFS.

        My personal view is that it is not an incompatible change, since HDFS has never officially encouraged people to copy older versions of NN metadata. If you agree with that, then this change merely formalizes the assumption that NN metadata is sacrosanct, and if you roll it back, we are in an error state that needs explicit user intervention.

        But I also see that, from an end user's point of view (especially someone with a lot of HDFS experience), this enforcement of NN metadata integrity takes away some of the old, dangerous behavior. Since we have added detection of an error condition that requires explicit action from the user, you can syntactically argue that it is an incompatible change, though semantically I would assume it is obvious to any HDFS user that copying old versions of NN metadata is a bad idea.

        danielpol Daniel Pol added a comment -

        RE:"In this patch we track blocks with generation stamp greater than the current highest generation stamp that is known to NN. I have made the assumption that if DN comes back on-line and reports blocks for files that have been deleted, those Generation IDs for those blocks will be lesser than the current Generation Stamp of NN. Please let me know if you think this assumption is not valid or breaks down in special cases, Could this happen with V1 vs V2 generation stamps ?"

        I'm hitting the case with the same Generation ID quite often during testing. The test scenario is: run Teragen, and for various reasons (mostly Hadoop settings) the datanode service on some nodes dies abruptly (think power failures too). While the bad nodes are down, you delete the Teragen output folder (to free up space on the remaining good nodes, which are now trying to maintain the replication factor with fewer nodes). Once all nodes are up and running again, the bad nodes have orphaned blocks with the same Generation IDs. Right now it's pretty painful to get rid of those manually.
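
        A sketch of the tracking rule being discussed (the field names are illustrative, not the actual BlockManager members): a reported replica only counts as "in the future" if its generation stamp exceeds the highest stamp the NN has ever issued, so stale replicas of deleted files with the same or a lower stamp, as in the scenario above, would not be counted by this patch.

            if (reportedBlock.getGenerationStamp() > highestKnownGenerationStamp) {
              // orphaned bytes that would be lost on a forced safe mode exit
              bytesInFuture += reportedBlock.getNumBytes();
            }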

        anu Anu Engineer added a comment - - edited

        Daniel Pol Thanks for reporting this. I will try to repro the case you have described. But just to make sure that we are on the same page, this patch addresses the issue of orphaned blocks when the NN is in safe mode. So in your case, is a datanode down, you delete the directory, and then you reboot the datanodes and the namenode?

        Can you please explain the steps to repro this issue? Thanks in advance.

        danielpol Daniel Pol added a comment - - edited

        Anu Engineer I've seen it before in other cases, but here's my current one. I have an issue on my cluster where datanodes start failing when hit with heavy write activity (even to the point where I can't ssh to the system anymore and it needs to be power cycled), mostly during the reduce phase of Terasort. So I start Teragen, and during the run some datanodes crash, and the remaining nodes get more data to store than expected. At that point I stop the whole cluster and reboot all nodes (even some of the nodes that are not bad still take a long time to respond). Once the cluster is up, I delete the Teragen folder (with skipTrash, and I don't use snapshots) because some nodes are now close to their space capacity. However, not all space is freed up, and upon investigation I see the bad nodes have orphaned blocks. A few runs like this quickly take up all my available space, at which point I have to manually clean the orphaned blocks or reformat HDFS. Steps to reproduce in your environment:
        1. Start Teragen (make sure it's big enough to run for more than 5 minutes, for example).
        2. While Teragen is running (say halfway in), kill the datanode process on some node (do not shut the node down).
        3. Once Teragen has finished, delete the Teragen folder.
        4. Restart the whole HDFS cluster, including the killed nodes.
        5. You should now be able to find orphaned blocks on the killed node that are not getting deleted. Fsck will say something like "Block blk_1075307258 does not exist", even in safe mode.

        Generally speaking, I think it would be better to be able to detect (and delete) all orphaned blocks, regardless of their source.

        danielpol Daniel Pol added a comment -

        Anu Engineer I managed to get my space back by triggering a full block report from the bad nodes.


          People

          • Assignee: anu Anu Engineer
          • Reporter: tlipcon Todd Lipcon
          • Votes: 0
          • Watchers: 21
