Currently, many tests whose names start with TestDataNodeVolumeFailure* frequently time out or fail. I found one such failure in a recent Jenkins build. The stack info:
The related code:
Here the code waits for the datanode to fail all of its volumes and then become dead, but the wait timed out. It would be better to first check that all the volumes have failed, and only then wait for the datanode to become dead.
In addition, we can use the method checkDiskErrorSync to do the disk error check instead of creating files. In this JIRA, I would like to extract this logic and define it in DataNodeTestUtils, so that we can reuse it for datanode volume failure testing in the future.
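A minimal sketch of the proposed two-phase wait, only to illustrate the idea and not the actual patch. The NodeView interface, the class name, and the method names waitFor and waitForDeadAfterAllVolumesFail are hypothetical stand-ins, not Hadoop types. The signature of checkDiskErrorSync here is also an assumption, and a real helper in DataNodeTestUtils would operate on a DataNode instead.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class VolumeFailureWaitSketch {

  /** Hypothetical view of the datanode state a test observes; not a Hadoop type. */
  public interface NodeView {
    void checkDiskErrorSync(); // assumed: runs the disk error check synchronously
    int failedVolumes();
    int totalVolumes();
    boolean isDead();
  }

  /** Polls the condition every intervalMs; throws TimeoutException after timeoutMs. */
  public static void waitFor(String what, BooleanSupplier check,
      long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException("Timed out waiting for: " + what);
      }
      Thread.sleep(intervalMs);
    }
  }

  /**
   * Triggers the disk check, waits until every volume is reported failed,
   * and only then waits for the node to be considered dead. Each phase has
   * its own timeout, so a timeout message names the phase that stalled.
   */
  public static void waitForDeadAfterAllVolumesFail(NodeView node,
      long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    node.checkDiskErrorSync();
    waitFor("all volumes failed",
        () -> node.failedVolumes() == node.totalVolumes(),
        intervalMs, timeoutMs);
    waitFor("datanode dead", node::isDead, intervalMs, timeoutMs);
  }
}
```

Splitting the wait into two named phases means a timeout reports whether the volumes never failed or the datanode never became dead, which a single combined wait cannot distinguish.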