Hadoop HDFS / HDFS-10275

TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
    • Component/s: test
    • Labels: None
    • Hadoop Flags: Reviewed


      The unit test TestDataNodeMetrics fails intermittently. The failure output is:

      Results :
      Failed tests: 
        TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232 expected:<false> but was:<true>
      Tests in error: 
        TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for Min...
        TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting for ...
        TestHFlush.testHFlushInterrupted ? IO The stream is closed

      The timeout occurs at line 279 of TestDataNodeMetrics. Looking into the code, I found that the real cause is that the TotalWriteTime metric frequently stays at 0 after each file-creation iteration, so the test keeps retrying until it times out.
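      The retry-until-timeout behavior can be sketched with a generic polling helper. This is only an illustration: WaitForSketch and waitFor are hypothetical names, and the real test relies on Hadoop's own test utilities, which are not reproduced here. The sketch assumes the test polls a condition such as "TotalWriteTime > 0" at a fixed interval.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitForSketch {

    /**
     * Polls check every intervalMs until it returns true, or throws
     * TimeoutException once timeoutMs has elapsed. Hypothetical helper,
     * not the Hadoop implementation.
     */
    static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Timed out waiting for condition");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        long totalWriteTime = 0; // stand-in for a metric that never advances
        try {
            waitFor(() -> totalWriteTime > 0, 10, 100);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

      If the polled metric never rises above 0, every poll fails and the helper can only end by throwing, which matches the "Timed out waiting for ..." error in the test output.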
      I debugged the test locally. The most likely reason TotalWriteTime always stays at 0 is that the test uses SimulatedFSDataset to measure the time spent. With SimulatedFSDataset, the write time is measured around the inner class method SimulatedOutputStream#write, and that method only updates the length and throws the data away:

          public void write(byte[] b,
                    int off,
                    int len) throws IOException  {
            // Only the length is tracked; the data itself is discarded.
            length += len;
          }

      So the write operation takes almost no time. We should therefore create the file through the real dataset instead of the simulated one. In my local testing, the test passes on the first attempt once the simulated dataset is removed, whereas with the old approach it retries many times before any write time is counted.
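      The effect can be reproduced outside Hadoop with a minimal sketch. DiscardingOutputStream and timedWrites are hypothetical names, not Hadoop classes, and the sketch assumes the metric is accumulated as elapsed whole milliseconds per write.

```java
import java.io.IOException;
import java.io.OutputStream;

public class WriteTimeSketch {

    /** Hypothetical stand-in for a simulated stream: counts bytes, stores nothing. */
    static class DiscardingOutputStream extends OutputStream {
        long length = 0;

        @Override
        public void write(int b) throws IOException {
            length++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            length += len; // data is dropped, so there is no I/O cost
        }
    }

    /**
     * Performs the given number of timed writes and returns the summed
     * elapsed time in whole milliseconds (assumed metric granularity).
     */
    static long timedWrites(OutputStream out, byte[] buf, int writes) throws IOException {
        long totalMs = 0;
        for (int i = 0; i < writes; i++) {
            long begin = System.nanoTime();
            out.write(buf, 0, buf.length);
            // Integer division truncates any sub-millisecond write to 0.
            totalMs += (System.nanoTime() - begin) / 1_000_000;
        }
        return totalMs;
    }

    public static void main(String[] args) throws IOException {
        DiscardingOutputStream out = new DiscardingOutputStream();
        long totalMs = timedWrites(out, new byte[4096], 100);
        System.out.println("bytes counted: " + out.length);
        System.out.println("total write time (ms): " + totalMs);
    }
}
```

      Because each discarded write normally finishes well within a millisecond, the accumulated total tends to stay at or near 0 no matter how many writes are issued, while a stream that performs real disk I/O is far more likely to record a nonzero time.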


        1. HDFS-10275.001.patch
          2 kB
          Yiqun Lin



            Assignee: Yiqun Lin (linyiqun)
            Reporter: Yiqun Lin (linyiqun)