The unit test TestDataNodeMetrics fails intermittently. The failure output shows the following:
The test times out at line 279 of TestDataNodeMetrics. Looking into the code, I found that the real cause is that the TotalWriteTime metric frequently stays at 0 in each file-creation iteration, which makes the test keep retrying until it times out.
I debugged the test locally. The most likely reason the TotalWriteTime metric always stays at 0 is that the test uses SimulatedFSDataset even though it measures elapsed time. In SimulatedFSDataset, writes go through the inner class's method SimulatedOutputStream#write, which is the call whose duration is counted as write time, and this method just updates the length and throws its data away.
So the write operation takes hardly any time. We should therefore create the file through the real dataset instead of the simulated one. In my local testing, the test passed on the first attempt once I removed the simulated dataset, whereas with the old approach it retried many times before a nonzero write time was counted.
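
To illustrate why a stream that only updates a length tends to report zero write time, here is a minimal standalone sketch. It does not use any Hadoop classes: DiscardingOutputStream and timedWriteMillis are hypothetical names made up for this example, and the sketch assumes the write time is accumulated in whole milliseconds around each write call.

```java
import java.io.OutputStream;

public class DiscardingWriteDemo {

    /** Hypothetical stand-in for a simulated stream: it tracks the length and drops the data. */
    static class DiscardingOutputStream extends OutputStream {
        private long length = 0;

        @Override
        public void write(int b) {
            length++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            // No disk or network I/O happens here; only a counter is updated.
            length += len;
        }

        long getLength() {
            return length;
        }
    }

    /** Times a single write and returns the elapsed time truncated to whole milliseconds (assumed granularity). */
    static long timedWriteMillis(DiscardingOutputStream out, byte[] packet) {
        long begin = System.nanoTime();
        out.write(packet, 0, packet.length);
        return (System.nanoTime() - begin) / 1_000_000L;
    }

    public static void main(String[] args) {
        DiscardingOutputStream out = new DiscardingOutputStream();
        byte[] packet = new byte[64 * 1024];
        long totalMillis = 0;
        for (int i = 0; i < 100; i++) {
            totalMillis += timedWriteMillis(out, packet);
        }
        // Each write usually finishes well under one millisecond, so the sum tends to stay at 0.
        System.out.println("bytes written = " + out.getLength());
        System.out.println("total write time (ms) = " + totalMillis);
    }
}
```

Under these assumptions every individual write rounds down to 0 ms, so a test that waits for the accumulated value to become positive can only pass when a write happens to be delayed, which matches the intermittent failures described above.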