[HADOOP-4061] Large number of decommission freezes the Namenode - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.17.2
Fix Version/s: 0.18.3, 0.19.1
Component/s: None
Labels:
None

Hadoop Flags: Incompatible change, Reviewed
Release Note:

Added a new conf property dfs.namenode.decommission.nodes.per.interval so that the NameNode checks the decommission status of x nodes every y seconds, where x is the value of dfs.namenode.decommission.nodes.per.interval and y is the value of dfs.namenode.decommission.interval.
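
As a rough illustration of what the two properties imply together: if x nodes are checked every y seconds, one full pass over N nodes takes about ceil(N / x) * y seconds. The sketch below only does that arithmetic. The class and method names are hypothetical, and the values passed in main are illustrative, not Hadoop defaults.

```java
final class DecommissionSweepEstimate {

    /**
     * Hypothetical helper (not part of Hadoop): estimates how many seconds
     * one full pass over all nodes takes when nodesPerInterval nodes are
     * checked every intervalSeconds seconds.
     */
    static long fullSweepSeconds(int totalNodes, int nodesPerInterval, int intervalSeconds) {
        if (totalNodes < 0 || nodesPerInterval <= 0 || intervalSeconds <= 0) {
            throw new IllegalArgumentException("arguments must be positive (totalNodes may be 0)");
        }
        // Ceiling division: number of intervals needed to visit every node once.
        long rounds = ((long) totalNodes + nodesPerInterval - 1) / nodesPerInterval;
        return rounds * intervalSeconds;
    }

    public static void main(String[] args) {
        // Illustrative values only: 1900 nodes, x = 5, y = 30.
        System.out.println(fullSweepSeconds(1900, 5, 30)); // prints 11400
    }
}
```

A smaller x or larger y lowers the cost of each check at the price of a longer delay before a finished decommission is noticed.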

Description

On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks each. The other 1500 nodes were almost empty.

When decommissioning started, the namenode's queue overflowed every 6 minutes.

CPU usage showed that every 5 minutes the org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor thread took 100% of the CPU for 1 minute, causing the queue to overflow.

  public synchronized void decommissionedDatanodeCheck() {
    for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator();
         it.hasNext();) {
      DatanodeDescriptor node = it.next();
      checkDecommissionStateInternal(node);
    }
  }
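
The loop above visits every entry in datanodeMap in a single synchronized call. The release note describes checking only x nodes per interval instead. A minimal sketch of that idea, assuming a simple round-robin cursor over a fixed node list, is shown below. The class, its fields, and the use of plain strings as node names are all hypothetical; this is not the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BatchedDecommissionCheck {
    private final List<String> nodes;    // stand-in for the datanode collection
    private final int nodesPerInterval;  // plays the role of "x" in the release note
    private int cursor = 0;              // index of the next node to check

    BatchedDecommissionCheck(List<String> nodes, int nodesPerInterval) {
        if (nodesPerInterval <= 0) {
            throw new IllegalArgumentException("nodesPerInterval must be positive");
        }
        this.nodes = new ArrayList<>(nodes);
        this.nodesPerInterval = nodesPerInterval;
    }

    /**
     * Returns the nodes to check in this interval and advances the cursor,
     * wrapping around at the end of the list. The lock is held only for
     * one batch rather than for a pass over every node.
     */
    synchronized List<String> nextBatch() {
        List<String> batch = new ArrayList<>();
        int count = Math.min(nodesPerInterval, nodes.size());
        for (int i = 0; i < count; i++) {
            batch.add(nodes.get(cursor));
            cursor = (cursor + 1) % nodes.size();
        }
        return batch;
    }

    public static void main(String[] args) {
        BatchedDecommissionCheck check = new BatchedDecommissionCheck(
                Arrays.asList("dn1", "dn2", "dn3", "dn4", "dn5"), 2);
        // Each call represents one monitor wake-up.
        for (int i = 0; i < 3; i++) {
            System.out.println(check.nextBatch());
        }
    }
}
```

In a real monitor, each returned node would be passed to the per-node check (checkDecommissionStateInternal in the snippet above), and the caller would sleep for the configured interval between batches.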

Attachments

- 4061_20081119.patch, 19/Nov/08 22:36, 12 kB, Tsz-wo Sze
- 4061_20081120.patch, 20/Nov/08 22:45, 13 kB, Tsz-wo Sze
- 4061_20081120b.patch, 21/Nov/08 01:04, 13 kB, Tsz-wo Sze
- 4061_20081123.patch, 23/Nov/08 09:10, 17 kB, Tsz-wo Sze
- 4061_20081124.patch, 24/Nov/08 19:51, 20 kB, Tsz-wo Sze
- 4061_20081124b.patch, 24/Nov/08 22:13, 19 kB, Tsz-wo Sze
- 4061_20081124c_0.18.patch, 25/Nov/08 17:49, 18 kB, Tsz-wo Sze
- 4061_20081124c.patch, 24/Nov/08 23:37, 20 kB, Tsz-wo Sze
- HADOOP-4061.patch, 24/Nov/08 22:24, 13 kB, Raghu Angadi

Issue Links

is related to

HDFS-283 Improve datanode decommission monitoring performance

Resolved

People

Assignee: Tsz-wo Sze

Reporter: Koji Noguchi

Votes: 0

Watchers: 1

Dates

Created: 03/Sep/08 18:03

Updated: 08/Jul/09 16:43

Resolved: 25/Nov/08 18:06