Details
Description
On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks each. The other 1500 nodes were almost empty.
When decommissioning started, the namenode's queue overflowed every 6 minutes.
CPU usage showed that every 5 minutes the org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor thread took 100% of the CPU for 1 minute, causing the queue to overflow.
public synchronized void decommissionedDatanodeCheck() {
  for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator(); it.hasNext();) {
    DatanodeDescriptor node = it.next();
    checkDecommissionStateInternal(node);
  }
}
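The method above holds the lock while it visits every datanode in datanodeMap, including the 1500 nodes that are not being decommissioned. One possible way to shorten the time the lock is held is sketched below. It is only an illustration and is not taken from the change made under HDFS-283. The class DecommissionMonitorSketch, the Node type, the blocksLeft field and the checkBatch method are all hypothetical names invented for this example; none of them exist in Hadoop. The sketch tracks only the nodes being decommissioned and checks a bounded number of them per lock acquisition.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: bounded, queue-based decommission checks. */
public class DecommissionMonitorSketch {

    /** Hypothetical stand-in for DatanodeDescriptor. */
    public static final class Node {
        final String name;
        int blocksLeft; // assumed: blocks still waiting to be replicated elsewhere

        public Node(String name, int blocksLeft) {
            this.name = name;
            this.blocksLeft = blocksLeft;
        }
    }

    // Only nodes that are being decommissioned are tracked here,
    // so idle nodes never cost any time under the lock.
    private final Deque<Node> decommissioning = new ArrayDeque<>();

    public synchronized void startDecommission(Node node) {
        decommissioning.addLast(node);
    }

    /**
     * Checks at most maxNodes nodes while holding the lock and returns
     * how many were checked. Unfinished nodes go back to the tail of
     * the queue and are rechecked on a later call.
     */
    public synchronized int checkBatch(int maxNodes) {
        int n = Math.min(maxNodes, decommissioning.size());
        for (int i = 0; i < n; i++) {
            Node node = decommissioning.pollFirst();
            if (!isDone(node)) {
                decommissioning.addLast(node);
            }
        }
        return n;
    }

    /** Number of nodes still being decommissioned. */
    public synchronized int pending() {
        return decommissioning.size();
    }

    // Assumed completion rule for this sketch: no blocks left to replicate.
    private static boolean isDone(Node node) {
        return node.blocksLeft == 0;
    }
}
```

A monitor thread could call checkBatch with a small limit and pause between calls, so other threads waiting on the same lock get a chance to run between batches. Whether that fits the namenode's actual locking would need to be checked against the real code.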
Attachments
Issue Links
- is related to HDFS-283 Improve datanode decommission monitoring performance (Resolved)