Details
Description
In HDFS-11015 there is an iteration timeout introduced in Dispatcher.Source class, that is checked during dispatching the moves that the Balancer and the Mover does. This timeout is hardwired to 20 minutes.
In the Balancer we have iterations, and even if an iteration is timing out the Balancer runs further and does an other iteration before it fails if there were no moves happened in a few iterations.
The Mover on the other hand does not have iterations, so if moving a path runs for more than 20 minutes, and there are moves decided and enqueued between two DataNode, after 20 minutes Mover will stop with the following exception reported to the console (lines might differ as this exception came from a CDH5.12.1 installation).
java.io.IOException: Block move timed out
at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Note that this issue is not coming up if all blocks can be moved inside the DataNodes without having to move the block to an other DataNode.
Attachments
Attachments
Issue Links
- causes
-
HDFS-13975 TestBalancer#testMaxIterationTime fails sporadically
- Resolved
- is caused by
-
HDFS-11015 Enforce timeout in balancer
- Resolved