Hadoop HDFS / HDFS-10279

Improve validation of the configured number of tolerated failed volumes

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Target Version/s:

      Description

      Currently, a misconfigured dfs.datanode.failed.volumes.tolerated is detected too late and is not easy to spot. We can move the validation logic for the tolerated volumes to an earlier point, before the DataNode registers with the NameNode. This lets us detect the misconfiguration sooner and more easily.
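
      The kind of early check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the patch itself: the class name ToleratedVolumesCheck, the method validate, and the use of IllegalArgumentException are assumptions made for the example. The rule that the value must be non-negative and smaller than the number of configured data directories is also an assumption of this sketch.

```java
/** Hypothetical sketch of an early check for dfs.datanode.failed.volumes.tolerated. */
final class ToleratedVolumesCheck {
    static final String KEY = "dfs.datanode.failed.volumes.tolerated";

    private ToleratedVolumesCheck() {}

    /**
     * Fails fast when the tolerated count is negative or not smaller than the
     * number of configured data directories (assumed rule for this sketch).
     */
    static void validate(int volFailuresTolerated, int volsConfigured) {
        if (volFailuresTolerated < 0 || volFailuresTolerated >= volsConfigured) {
            throw new IllegalArgumentException("Invalid value configured for " + KEY
                + " - " + volFailuresTolerated + ". Value must be >= 0 and less than"
                + " the number of configured volumes (" + volsConfigured + ").");
        }
    }

    public static void main(String[] args) {
        validate(1, 3); // accepted: one of three volumes may fail
        try {
            validate(3, 3); // rejected: every volume would be allowed to fail
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      Running such a check during startup, before any registration step, means a bad value stops the process immediately with a message that names the offending key.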

      1. HDFS-10279.001.patch
        7 kB
        Yiqun Lin
      2. HDFS-10279.002.patch
        7 kB
        Yiqun Lin

        Activity

        linyiqun Yiqun Lin added a comment -

        Attached an initial patch. Thanks Brahma Reddy Battula for the great idea. Andrew Wang, could you take a look at this JIRA and review my patch?

        andrew.wang Andrew Wang added a comment -

        Looks good, thanks for the patch! Only a few comments:

        • Looks like we don't need the data dirs themselves, just the #. Thus can we store a count instead?
        • Let's use GenericTestUtils.assertExceptionContains to validate the DiskErrorException in the test.

        Also do you mind setting the affects and target version for this JIRA? It's good practice when filing a new JIRA.
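
        The two review points could look roughly like this in practice. The sketch is self-contained and hedged: VolumeConf and its fields are hypothetical names, the local assertExceptionContains helper only stands in for GenericTestUtils.assertExceptionContains so that the example runs without Hadoop on the classpath, and IllegalStateException stands in for DiskErrorException.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical config holder that keeps only the number of data dirs. */
final class VolumeConf {
    final int volFailuresTolerated;
    final int volsConfigured; // a count, not the directory list itself

    VolumeConf(int volFailuresTolerated, List<String> dataDirs) {
        this.volFailuresTolerated = volFailuresTolerated;
        this.volsConfigured = dataDirs.size();
    }

    /** Throws if the tolerated count is out of range (assumed rule). */
    void check() {
        if (volFailuresTolerated < 0 || volFailuresTolerated >= volsConfigured) {
            throw new IllegalStateException(
                "Invalid value configured for dfs.datanode.failed.volumes.tolerated");
        }
    }
}

final class VolumeConfTestSketch {
    /** Local stand-in for a "message contains" exception assertion. */
    static void assertExceptionContains(String expected, Throwable t) {
        String msg = t.getMessage();
        if (msg == null || !msg.contains(expected)) {
            throw new AssertionError(
                "Expected to find '" + expected + "' but got: " + msg, t);
        }
    }

    /** Returns true when the misconfiguration is rejected with the expected message. */
    static boolean rejectsTooManyToleratedFailures() {
        VolumeConf conf = new VolumeConf(2, Arrays.asList("/data/1", "/data/2"));
        try {
            conf.check();
            return false; // no exception: the bad value slipped through
        } catch (IllegalStateException e) {
            assertExceptionContains("Invalid value configured for", e);
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsTooManyToleratedFailures());
    }
}
```

        Checking the message fragment, rather than only the exception type, guards against the test passing on an unrelated failure.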

        linyiqun Yiqun Lin added a comment -

        Thanks Andrew Wang for the review. Updated the latest patch to address the comments; pending Jenkins.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      0m 10s   Docker mode activated.
        +1    @author     0m 0s    The patch does not contain any @author tags.
        +1    test4tests  0m 0s    The patch appears to include 1 new or modified test files.
        +1    mvninstall  6m 38s   trunk passed
        +1    compile     0m 40s   trunk passed with JDK v1.8.0_77
        +1    compile     0m 41s   trunk passed with JDK v1.7.0_95
        +1    checkstyle  0m 24s   trunk passed
        +1    mvnsite     0m 51s   trunk passed
        +1    mvneclipse  0m 13s   trunk passed
        +1    findbugs    1m 55s   trunk passed
        +1    javadoc     1m 5s    trunk passed with JDK v1.8.0_77
        +1    javadoc     1m 45s   trunk passed with JDK v1.7.0_95
        +1    mvninstall  0m 46s   the patch passed
        +1    compile     0m 38s   the patch passed with JDK v1.8.0_77
        +1    javac       0m 38s   the patch passed
        +1    compile     0m 38s   the patch passed with JDK v1.7.0_95
        +1    javac       0m 38s   the patch passed
        +1    checkstyle  0m 23s   hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 330 unchanged - 1 fixed = 330 total (was 331)
        +1    mvnsite     0m 49s   the patch passed
        +1    mvneclipse  0m 12s   the patch passed
        +1    whitespace  0m 0s    Patch has no whitespace issues.
        +1    findbugs    2m 8s    the patch passed
        +1    javadoc     1m 3s    the patch passed with JDK v1.8.0_77
        +1    javadoc     1m 41s   the patch passed with JDK v1.7.0_95
        -1    unit        73m 36s  hadoop-hdfs in the patch failed with JDK v1.8.0_77.
        -1    unit        71m 7s   hadoop-hdfs in the patch failed with JDK v1.7.0_95.
        +1    asflicense  0m 20s   Patch does not generate ASF License warnings.
                          169m 42s



        Reason Tests
        JDK v1.8.0_77 Failed junit tests hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
          hadoop.hdfs.TestReadStripedFileWithMissingBlocks
          hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
          hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
          hadoop.hdfs.TestFileAppend
          hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
        JDK v1.8.0_77 Timed out junit tests org.apache.hadoop.hdfs.TestWriteReadStripedFile
          org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding
        JDK v1.7.0_95 Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeUUID
          hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
          hadoop.hdfs.TestReadStripedFileWithMissingBlocks
          hadoop.hdfs.server.namenode.TestEditLog
          hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
          hadoop.hdfs.server.datanode.TestCachingStrategy
          hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
          hadoop.hdfs.TestDFSClientFailover
          hadoop.hdfs.server.namenode.ha.TestDNFencing
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
        JDK v1.7.0_95 Timed out junit tests org.apache.hadoop.hdfs.TestWriteReadStripedFile
          org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:fbe3e86
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798426/HDFS-10279.002.patch
        JIRA Issue HDFS-10279
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 20322a1af371 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 35f0770
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/15146/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/15146/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15146/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15146/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15146/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15146/console
        Powered by Apache Yetus 0.2.0 http://yetus.apache.org

        This message was automatically generated.

        linyiqun Yiqun Lin added a comment -

        The TestFsDatasetImpl failure is caused by TestFsDatasetImpl.testCleanShutdownOfVolume, which is tracked by HDFS-10260. The other failed tests do not appear to be related.

        andrew.wang Andrew Wang added a comment -

        Thanks again Yiqun Lin, committed to trunk, branch-2, branch-2.8.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #9606 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9606/)
        HDFS-10279. Improve validation of the configured number of tolerated (wang: rev 314aa21a89134fac68ac3cb95efdeb56bd3d7b05)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailureToleration.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
        linyiqun Yiqun Lin added a comment -

        Thanks Andrew Wang for the commit!


          People

          • Assignee: linyiqun Yiqun Lin
          • Reporter: linyiqun Yiqun Lin
          • Votes: 0
          • Watchers: 4
