  1. Hadoop HDFS
  2. HDFS-10275

TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
    • Component/s: test
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The unit test TestDataNodeMetrics fails intermittently. The failure output is shown below:

      Results :
      
      Failed tests: 
        TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 expected:<false> but was:<true>
      
      Tests in error: 
        TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for Min...
        TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting for ...
        TestHFlush.testHFlushInterrupted ? IO The stream is closed
      

      The timeout occurs at line 279 of TestDataNodeMetrics. Looking into the code, I found that the real cause is that the TotalWriteTime metric frequently stays at 0 in each file-creation iteration, which makes the test retry until it times out.
      I debugged the test locally. The most likely reason that TotalWriteTime always stays at 0 is that the test uses SimulatedFSDataset for the time-spent check. With SimulatedFSDataset, the write time is measured around the inner-class method SimulatedOutputStream#write, which only updates the length and throws its data away.

          @Override
          public void write(byte[] b,
                    int off,
                    int len) throws IOException {
            // Only the length is tracked; the bytes themselves are discarded.
            length += len;
          }
      

      So the write operation takes almost no time, and we should create the files through a real dataset instead of the simulated one. In my local testing, the test passes on the first attempt once the simulated dataset is removed, whereas with the old approach it retries many times before a nonzero write time is recorded.
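      The effect described above can be reproduced outside Hadoop with a small sketch. It assumes the metric adds up whole milliseconds per write call; DiscardingOutputStream and totalWriteMillis are hypothetical names made up for this illustration and are not Hadoop classes or methods.

```java
import java.io.IOException;
import java.io.OutputStream;

public class DiscardingWriteDemo {

    /** Hypothetical stand-in for a simulated stream: counts bytes, stores nothing. */
    static class DiscardingOutputStream extends OutputStream {
        private long length = 0;

        @Override
        public void write(int b) {
            length++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            length += len; // the data itself is thrown away
        }

        long getLength() {
            return length;
        }
    }

    /**
     * Sums the elapsed time of each write, truncated to whole milliseconds
     * (an assumption about how a millisecond-granularity metric accumulates).
     */
    static long totalWriteMillis(OutputStream out, byte[] buf, int writes)
            throws IOException {
        long total = 0;
        for (int i = 0; i < writes; i++) {
            long start = System.nanoTime();
            out.write(buf, 0, buf.length);
            total += (System.nanoTime() - start) / 1_000_000L;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        DiscardingOutputStream out = new DiscardingOutputStream();
        long millis = totalWriteMillis(out, new byte[4096], 100);
        System.out.println("bytes=" + out.getLength()
            + " totalWriteMillis=" + millis);
    }
}
```

      Each discarded write usually finishes in well under a millisecond, so the accumulated total tends to stay at 0 even after many writes. That is consistent with a test that waits for a positive write time and keeps retrying.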

        Activity

        linyiqun Yiqun Lin added a comment -

        Attached a simple patch. I also bumped the timeout in the patch so that the test does not time out when it runs on a busy Jenkins slave. Kindly review.
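        The retry-until-timeout behaviour discussed in this issue follows a common polling pattern. Below is a minimal sketch of that pattern, assuming a fixed polling interval; pollUntil and its parameters are hypothetical names for this example and are not the helper the test actually calls.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class PollUntilDemo {

    /**
     * Re-evaluates the check every intervalMillis until it returns true.
     * Returns the number of attempts made, or throws TimeoutException once
     * timeoutMillis has elapsed without success.
     */
    static int pollUntil(BooleanSupplier check, long intervalMillis,
            long timeoutMillis) throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int attempts = 0;
        while (true) {
            attempts++;
            if (check.getAsBoolean()) {
                return attempts;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                    "Timed out after " + attempts + " attempts");
            }
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // The condition becomes true on the third evaluation.
        int attempts = pollUntil(() -> ++calls[0] >= 3, 10, 1000);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

        A larger timeout only gives a slow machine more attempts. If the checked value can never become positive, as with a write time that always rounds to 0, the loop still ends in a timeout.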

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime   Comment
        0     reexec      16m 54s   Docker mode activated.
        +1    @author     0m 0s     The patch does not contain any @author tags.
        +1    test4tests  0m 0s     The patch appears to include 1 new or modified test files.
        +1    mvninstall  8m 3s     trunk passed
        +1    compile     1m 12s    trunk passed with JDK v1.8.0_77
        +1    compile     0m 56s    trunk passed with JDK v1.7.0_95
        +1    checkstyle  0m 24s    trunk passed
        +1    mvnsite     1m 7s     trunk passed
        +1    mvneclipse  0m 15s    trunk passed
        +1    findbugs    2m 32s    trunk passed
        +1    javadoc     1m 37s    trunk passed with JDK v1.8.0_77
        +1    javadoc     2m 25s    trunk passed with JDK v1.7.0_95
        +1    mvninstall  1m 4s     the patch passed
        +1    compile     1m 6s     the patch passed with JDK v1.8.0_77
        +1    javac       1m 6s     the patch passed
        +1    compile     0m 53s    the patch passed with JDK v1.7.0_95
        +1    javac       0m 53s    the patch passed
        +1    checkstyle  0m 21s    the patch passed
        +1    mvnsite     1m 1s     the patch passed
        +1    mvneclipse  0m 13s    the patch passed
        +1    whitespace  0m 0s     Patch has no whitespace issues.
        +1    findbugs    2m 46s    the patch passed
        +1    javadoc     1m 28s    the patch passed with JDK v1.8.0_77
        +1    javadoc     2m 17s    the patch passed with JDK v1.7.0_95
        -1    unit        106m 38s  hadoop-hdfs in the patch failed with JDK v1.8.0_77.
        -1    unit        115m 55s  hadoop-hdfs in the patch failed with JDK v1.7.0_95.
        +1    asflicense  0m 36s    Patch does not generate ASF License warnings.
                          272m 38s



        Reason Tests
        JDK v1.8.0_77 Failed junit tests hadoop.hdfs.TestDFSUpgradeFromImage
          hadoop.hdfs.security.TestDelegationTokenForProxyUser
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
          hadoop.hdfs.server.namenode.TestEditLog
          hadoop.hdfs.TestSafeModeWithStripedFile
          hadoop.hdfs.server.mover.TestStorageMover
          hadoop.hdfs.qjournal.TestSecureNNWithQJM
          hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
          hadoop.hdfs.server.namenode.ha.TestRequestHedgingProxyProvider
          hadoop.hdfs.server.datanode.TestDirectoryScanner
          hadoop.fs.contract.hdfs.TestHDFSContractSeek
        JDK v1.7.0_95 Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeUUID
          hadoop.hdfs.web.TestWebHdfsTimeouts
          hadoop.hdfs.TestDFSUpgradeFromImage
          hadoop.hdfs.server.namenode.ha.TestEditLogTailer
          hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
          hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
          hadoop.hdfs.TestReconstructStripedFile
          hadoop.hdfs.server.namenode.ha.TestHAAppend
          hadoop.fs.TestSymlinkHdfsFileContext
          hadoop.hdfs.server.namenode.TestFileTruncate
          hadoop.hdfs.server.datanode.TestDirectoryScanner
          hadoop.fs.contract.hdfs.TestHDFSContractSeek



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:fbe3e86
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12797954/HDFS-10275.001.patch
        JIRA Issue HDFS-10275
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux a6d9f4e3c447 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 1b78b2b
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/15129/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/15129/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15129/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15129/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15129/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15129/console
        Powered by Apache Yetus 0.2.0 http://yetus.apache.org

        This message was automatically generated.

        walter.k.su Walter Su added a comment -

        Good analysis! I think a better way to do this is to use a real FSDataset: just remove the line SimulatedFSDataset.setFactory(conf);. What do you think?

        linyiqun Yiqun Lin added a comment -

        Hi Walter Su, I have already removed SimulatedFSDataset.setFactory(conf); in my patch. Do you mean there is no need to bump the timeout as well?

        walter.k.su Walter Su added a comment -

        Sorry, I didn't see that. The patch LGTM. +1.

        walter.k.su Walter Su added a comment -

        Committed to trunk, branch-2, branch-2.8, branch-2.7. Thanks Yiqun Lin for the contribution!

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9626 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9626/)
        HDFS-10275. TestDataNodeMetrics failing intermittently due to (waltersu4549: rev ab903029a9d353677184ff5602966b11ffb408b9)

        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
        linyiqun Yiqun Lin added a comment -

        Thanks Walter Su for the commit!

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Closing the JIRA as part of 2.7.3 release.


          People

          • Assignee: linyiqun Yiqun Lin
          • Reporter: linyiqun Yiqun Lin
          • Votes: 0
          • Watchers: 5
