Hadoop HDFS / HDFS-10279

Improve validation of the configured number of tolerated failed volumes

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Target Version/s:

      Description

      Currently, a misconfiguration of dfs.datanode.failed.volumes.tolerated is detected too late and is not easy to find. We can move the validation logic for tolerated volumes to an earlier point, before the DataNode registers with the NameNode. This will let us detect the misconfiguration sooner and more easily.
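The early check described above might look roughly like the following standalone sketch. It is not the code from the attached patches: the class name, the method name, and the use of IllegalArgumentException are assumptions for illustration (the review thread mentions a DiskErrorException), and the rule it enforces (a value of at least 0 and strictly less than the number of configured volumes) is assumed rather than quoted from the patch. Only the configuration key comes from the description.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Standalone sketch (not the Hadoop source): fail fast when the tolerated
 * failed-volume count is inconsistent with the number of configured volumes.
 */
class ToleratedVolumesCheck {

  // Configuration key named in the issue description.
  static final String KEY = "dfs.datanode.failed.volumes.tolerated";

  /**
   * Assumed rule: 0 <= tolerated < volumesConfigured.
   * IllegalArgumentException is a stand-in for the real exception type.
   */
  static void validate(int tolerated, int volumesConfigured) {
    if (tolerated < 0 || tolerated >= volumesConfigured) {
      throw new IllegalArgumentException("Invalid value configured for " + KEY
          + " - " + tolerated + ". It must be >= 0 and less than the number"
          + " of configured volumes (" + volumesConfigured + ").");
    }
  }

  public static void main(String[] args) {
    // Hypothetical data directories; only their count matters for the check.
    List<String> dataDirs = Arrays.asList("/data/1", "/data/2", "/data/3");

    validate(1, dataDirs.size()); // accepted: two volumes could still remain

    try {
      validate(3, dataDirs.size()); // rejected: no volume could remain
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected at startup: " + e.getMessage());
    }
  }
}
```

Running such a check during startup, before any registration with the NameNode, means a bad value surfaces as an immediate and clearly worded failure.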

      1. HDFS-10279.001.patch
        7 kB
        Yiqun Lin
      2. HDFS-10279.002.patch
        7 kB
        Yiqun Lin

          Activity

          linyiqun Yiqun Lin added a comment -

          Thanks, Andrew Wang, for the commit!

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #9606 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9606/)
          HDFS-10279. Improve validation of the configured number of tolerated (wang: rev 314aa21a89134fac68ac3cb95efdeb56bd3d7b05)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailureToleration.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
          andrew.wang Andrew Wang added a comment -

          Thanks again Yiqun Lin, committed to trunk, branch-2, branch-2.8.

          linyiqun Yiqun Lin added a comment -

          The failed unit test TestFsDatasetImpl is caused by TestFsDatasetImpl.testCleanShutdownOfVolume, which is tracked by HDFS-10260; the other failed tests do not seem to be related.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 10s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 38s trunk passed
          +1 compile 0m 40s trunk passed with JDK v1.8.0_77
          +1 compile 0m 41s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 51s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 55s trunk passed
          +1 javadoc 1m 5s trunk passed with JDK v1.8.0_77
          +1 javadoc 1m 45s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.8.0_77
          +1 javac 0m 38s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.7.0_95
          +1 javac 0m 38s the patch passed
          +1 checkstyle 0m 23s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 330 unchanged - 1 fixed = 330 total (was 331)
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 8s the patch passed
          +1 javadoc 1m 3s the patch passed with JDK v1.8.0_77
          +1 javadoc 1m 41s the patch passed with JDK v1.7.0_95
          -1 unit 73m 36s hadoop-hdfs in the patch failed with JDK v1.8.0_77.
          -1 unit 71m 7s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 20s Patch does not generate ASF License warnings.
          169m 42s



          Reason Tests
          JDK v1.8.0_77 Failed junit tests hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
            hadoop.hdfs.TestReadStripedFileWithMissingBlocks
            hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
            hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
            hadoop.hdfs.TestFileAppend
            hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
          JDK v1.8.0_77 Timed out junit tests org.apache.hadoop.hdfs.TestWriteReadStripedFile
            org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeUUID
            hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
            hadoop.hdfs.TestReadStripedFileWithMissingBlocks
            hadoop.hdfs.server.namenode.TestEditLog
            hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
            hadoop.hdfs.server.datanode.TestCachingStrategy
            hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
            hadoop.hdfs.TestDFSClientFailover
            hadoop.hdfs.server.namenode.ha.TestDNFencing
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
          JDK v1.7.0_95 Timed out junit tests org.apache.hadoop.hdfs.TestWriteReadStripedFile
            org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798426/HDFS-10279.002.patch
          JIRA Issue HDFS-10279
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 20322a1af371 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 35f0770
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15146/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15146/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15146/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15146/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15146/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15146/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          linyiqun Yiqun Lin added a comment -

          Thanks, Andrew Wang, for the review. I have updated the patch to address the comments; pending Jenkins.

          andrew.wang Andrew Wang added a comment -

          Looks good, thanks for the patch! Only a few comments:

          • Looks like we don't need the data dirs themselves, just the #. Thus can we store a count instead?
          • Let's use GenericTestUtils.assertExceptionContains to validate the DiskErrorException in the test.

          Also do you mind setting the affects and target version for this JIRA? It's good practice when filing a new JIRA.
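The second review point, checking the exception message in the test, can be sketched in a self-contained way as follows. The helper below is a hypothetical stand-in that mirrors the idea of GenericTestUtils.assertExceptionContains rather than calling the Hadoop class, and the startup method, exception type, and message text are likewise assumptions made only so the example runs on its own.

```java
/**
 * Standalone sketch of the test pattern from the review; all names here are
 * hypothetical stand-ins, not Hadoop classes.
 */
class ExceptionMessageCheck {

  /** Stand-in for a helper in the spirit of assertExceptionContains. */
  static void assertMessageContains(String expected, Throwable t) {
    String msg = t.getMessage();
    if (msg == null || !msg.contains(expected)) {
      throw new AssertionError(
          "Expected to find '" + expected + "' but got: " + msg, t);
    }
  }

  /** Hypothetical startup check that rejects a bad tolerated count. */
  static void startWith(int tolerated, int volumeCount) {
    if (tolerated < 0 || tolerated >= volumeCount) {
      throw new IllegalStateException(
          "Invalid value configured for dfs.datanode.failed.volumes.tolerated - "
              + tolerated);
    }
  }

  public static void main(String[] args) {
    try {
      startWith(5, 2);
      // Reaching this line means the misconfiguration was not detected.
      throw new AssertionError("Expected startup to fail");
    } catch (IllegalStateException e) {
      // Check the message, not just the exception type.
      assertMessageContains(
          "Invalid value configured for dfs.datanode.failed.volumes.tolerated", e);
    }
    System.out.println("ok");
  }
}
```

Asserting on the message text as well as the type guards against the test passing because of an unrelated failure that happens to throw the same exception class.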

          linyiqun Yiqun Lin added a comment -

          Attached an initial patch. Thanks, Brahma Reddy Battula, for the great idea. Andrew Wang, could you take a look at this JIRA and review my patch?


            People

            • Assignee:
              linyiqun Yiqun Lin
              Reporter:
              linyiqun Yiqun Lin
            • Votes:
              0
              Watchers:
              4
