Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 2.7.3
- None
- Reviewed
Description
When the Balancer runs on a large cluster with more than 3000 DataNodes, it can hang while logging "No mover threads available".
The stack trace shows the main thread waiting indefinitely:
"main" #1 prio=5 os_prio=0 tid=0x00007ff6cc014800 nid=0x6b2c waiting on condition [0x00007ff6d1bad000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1043)
        at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1017)
        at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:981)
        at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:611)
        at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:663)
        at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:776)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:905)
The log contains many WARN messages reporting "No mover threads available":
2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13700554102_1112815018180 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_4009558842_1103118359883 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13881956058_1112996460026 with size=133509566 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.36:50010
The cause: when no mover threads are available, DDatanode.isPendingQEmpty() keeps returning false, so the main thread never leaves Dispatcher.waitForMoveCompletion() and the Balancer hangs.
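The hang can be reproduced in miniature outside Hadoop. The sketch below is not the Dispatcher code: `PendingQueueHangSketch`, `PendingQueue`, and `queueDrains` are hypothetical names. It assumes that a move skipped because the thread pool rejected it stays registered in the pending queue. Under that assumption, a completion check equivalent to isPendingQEmpty() can never succeed. Removing the entry on rejection is shown as one possible remedy, not necessarily the fix that was committed.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PendingQueueHangSketch {

    /** Hypothetical stand-in for a per-DataNode queue of pending block moves. */
    static final class PendingQueue {
        private final Queue<String> pending = new ArrayDeque<>();
        synchronized void add(String block) { pending.add(block); }
        synchronized void remove(String block) { pending.remove(block); }
        synchronized boolean isEmpty() { return pending.isEmpty(); }
    }

    /**
     * Registers three moves, submits them to a pool with one busy worker and
     * no backlog, and reports whether the pending queue is empty afterwards.
     */
    static boolean queueDrains(boolean removeOnReject) {
        PendingQueue queue = new PendingQueue();
        // One thread and a SynchronousQueue: any submit made while the worker
        // is busy is rejected instead of being queued.
        ThreadPoolExecutor movers = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (String block : new String[] {"blk_1", "blk_2", "blk_3"}) {
                queue.add(block); // the move is registered before submission
                try {
                    movers.execute(() -> {
                        try {
                            release.await(); // keep the only worker busy
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                        queue.remove(block); // a completed move leaves the queue
                    });
                } catch (RejectedExecutionException e) {
                    // Corresponds to "No mover threads available: skip moving ...".
                    if (removeOnReject) {
                        queue.remove(block); // assumed remedy: undo the registration
                    }
                }
            }
            release.countDown();
            movers.shutdown();
            movers.awaitTermination(5, TimeUnit.SECONDS);
            return queue.isEmpty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            movers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("skipped moves left in queue, drains: " + queueDrains(false));
        System.out.println("skipped moves removed, drains: " + queueDrains(true));
    }
}
```

With `removeOnReject` set to false, the two rejected moves stay in the queue after the only accepted move finishes, so a loop that sleeps until the queue is empty would wait forever, which matches the TIMED_WAITING state in the stack trace.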
Attachments
Issue Links
- is depended upon by: HDFS-11742 Improve balancer usability after HDFS-8818 (Resolved)
- is related to: HDFS-8818 Allow Balancer to run faster (Resolved)