Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
None
None
Description
One of our clusters sometimes couldn't allocate blocks on any DN. BlockPlacementPolicyDefault logs the following message for every DN:
the node is too busy (load:x > y)
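For context, here is a minimal sketch of why a wrong in-service node count can make every DN look too busy. It assumes the load check compares each DN's xceiver count with a multiple of the average xceiver count over in-service nodes. The factor 2.0 and all function names are assumptions for illustration, not the BlockPlacementPolicyDefault source.

```python
# Hypothetical model of the placement load check. Function names and the
# 2.0 load factor are illustrative assumptions, not HDFS code.


def in_service_xceiver_average(in_service_xceivers: int, nodes_in_service: int) -> float:
    """Average xceivers per in-service DN; 0.0 when the count is 0."""
    if nodes_in_service == 0:
        return 0.0
    return in_service_xceivers / nodes_in_service


def is_too_busy(node_load: int, in_service_xceivers: int, nodes_in_service: int,
                load_factor: float = 2.0) -> bool:
    """Reject the DN if its load exceeds load_factor times the average."""
    max_load = load_factor * in_service_xceiver_average(in_service_xceivers, nodes_in_service)
    return node_load > max_load


if __name__ == "__main__":
    loads = [4, 5, 6]  # three live DNs, 15 xceivers in total
    total = sum(loads)
    # Correct count: average 5.0, threshold 10.0, no DN is rejected.
    print([is_too_busy(load, total, 3) for load in loads])   # [False, False, False]
    # Counter drifted to -1: threshold is negative, every DN is rejected.
    print([is_too_busy(load, total, -1) for load in loads])  # [True, True, True]
```

In this model, once the in-service count reaches 0 or goes negative, the threshold becomes 0 or negative, so every DN carrying any load is reported as too busy.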
It turns out the HeartbeatManager's nodesInService was computed incorrectly when admins decomm or recomm dead nodes. Here are two scenarios.
- Decomm dead nodes. It turns out HDFS-7374 has already fixed this case, though I'm not sure whether that was intentional. cc zhz, andrew.wang, atm. Here is the sequence of events without HDFS-7374:
  - Cluster has one live node. nodesInService == 1
  - The node becomes dead. nodesInService == 0
  - Decomm the node. nodesInService == -1
- However, HDFS-7374 introduces another inconsistency when recomm is involved:
  - Cluster has one live node. nodesInService == 1
  - The node becomes dead. nodesInService == 0
  - Decomm the node. nodesInService == 0
  - Recomm the node. nodesInService == 1, even though the node is still dead
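The two event sequences above can be replayed against a toy counter. This is a hypothetical model: `Node`, `Stats`, `counts`, `replay` and the mode names are illustrative, not HDFS classes, and the "pre-7374" and "post-7374" modes only reproduce the numbers listed above rather than the real code paths (the replay also adds a recomm step to the first sequence). The "derived" mode shows one way to keep the counter consistent, subtracting a node's old contribution and adding its new one around each state change; it is a sketch, not necessarily what the committed fix does.

```python
from dataclasses import dataclass


@dataclass
class Node:
    alive: bool = True
    decommissioned: bool = False


def counts(node: Node) -> int:
    # A node is "in service" only while it is alive and not decommissioned.
    return int(node.alive and not node.decommissioned)


class Stats:
    """Hypothetical model of HeartbeatManager's nodesInService counter.

    mode="pre-7374":  decomm always does -1, recomm always does +1.
    mode="post-7374": decomm of a dead node skips the -1; recomm still does +1.
    mode="derived":   subtract the node's old contribution, change its
                      state, then add its new contribution.
    """

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.nodes_in_service = 0

    def _update(self, node: Node, field: str, value: bool, naive_delta: int) -> None:
        if self.mode == "derived":
            self.nodes_in_service -= counts(node)
            setattr(node, field, value)
            self.nodes_in_service += counts(node)
        else:
            # Event-based bookkeeping: the delta ignores the node's liveness.
            setattr(node, field, value)
            self.nodes_in_service += naive_delta

    def add_live(self, node: Node) -> None:
        self.nodes_in_service += counts(node)

    def mark_dead(self, node: Node) -> None:
        self._update(node, "alive", False, -1)

    def decommission(self, node: Node) -> None:
        skip = self.mode == "post-7374" and not node.alive
        self._update(node, "decommissioned", True, 0 if skip else -1)

    def recommission(self, node: Node) -> None:
        self._update(node, "decommissioned", False, +1)


def replay(mode: str) -> list[int]:
    """Run live -> dead -> decomm -> recomm; return the counter after each step."""
    stats, node = Stats(mode), Node()
    history = []
    for step in (stats.add_live, stats.mark_dead, stats.decommission, stats.recommission):
        step(node)
        history.append(stats.nodes_in_service)
    return history


if __name__ == "__main__":
    print(replay("pre-7374"))   # [1, 0, -1, 0]
    print(replay("post-7374"))  # [1, 0, 0, 1]
    print(replay("derived"))    # [1, 0, 0, 0]
```

Because the derived mode computes each delta from the node's state rather than from the kind of event, a dead node contributes 0 whether it is decommissioned or recommissioned, so neither the -1 nor the spurious 1 can appear.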
Attachments
Issue Links
- is duplicated by
  - HDFS-9287 Block placement completely fails if too many nodes are decommissioning (Resolved)
- is related to
  - HDFS-15761 Dead NORMAL DN shouldn't transit to DECOMMISSIONED immediately (Patch Available)