
Key: HADOOP-4061
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Tsz Wo (Nicholas), SZE
Reporter: Koji Noguchi
Votes: 0
Watchers: 2
Hadoop Common

Large number of decommission freezes the Namenode

Created: 03/Sep/08 06:03 PM   Updated: 08/Jul/09 04:43 PM
Component/s: None
Affects Version/s: 0.17.2
Fix Version/s: 0.18.3, 0.19.1

Time Tracking:
Not Specified

File Attachments (all text files, licensed for inclusion in ASF works):
  File                          Uploaded             Uploader                 Size
  4061_20081119.patch           2008-11-19 10:36 PM  Tsz Wo (Nicholas), SZE   12 kB
  4061_20081120.patch           2008-11-20 10:45 PM  Tsz Wo (Nicholas), SZE   13 kB
  4061_20081120b.patch          2008-11-21 01:04 AM  Tsz Wo (Nicholas), SZE   13 kB
  4061_20081123.patch           2008-11-23 09:10 AM  Tsz Wo (Nicholas), SZE   17 kB
  4061_20081124.patch           2008-11-24 07:51 PM  Tsz Wo (Nicholas), SZE   20 kB
  4061_20081124b.patch          2008-11-24 10:13 PM  Tsz Wo (Nicholas), SZE   19 kB
  4061_20081124c.patch          2008-11-24 11:37 PM  Tsz Wo (Nicholas), SZE   20 kB
  4061_20081124c_0.18.patch     2008-11-25 05:49 PM  Tsz Wo (Nicholas), SZE   18 kB
  HADOOP-4061.patch             2008-11-24 10:24 PM  Raghu Angadi             13 kB
Issue Links:
Reference
 

Hadoop Flags: Incompatible change, Reviewed
Release Note:
Added a new configuration property, dfs.namenode.decommission.nodes.per.interval. The NameNode now checks the decommission status of x nodes every y seconds, where x is the value of dfs.namenode.decommission.nodes.per.interval and y is the value of dfs.namenode.decommission.interval.
Resolution Date: 25/Nov/08 06:06 PM
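
To make the "x nodes every y seconds" rule in the release note concrete, here is a rough sketch of how long one full pass over a set of nodes would take. The class and method names are hypothetical and not part of Hadoop, the values x = 5 and y = 30 are illustrative rather than confirmed defaults, and the sketch assumes every checked node counts toward the per-interval budget.

```java
/** Hypothetical helper, not part of Hadoop: time for one full pass of decommission checks. */
final class DecommissionSweep {
  private DecommissionSweep() {}

  /**
   * @param nodesToCheck     number of nodes whose decommission status must be checked
   * @param nodesPerInterval x, the value of dfs.namenode.decommission.nodes.per.interval
   * @param intervalSeconds  y, the value of dfs.namenode.decommission.interval
   * @return seconds needed to check every node once
   */
  static long fullSweepSeconds(int nodesToCheck, int nodesPerInterval, int intervalSeconds) {
    if (nodesToCheck < 0 || nodesPerInterval <= 0 || intervalSeconds <= 0) {
      throw new IllegalArgumentException("arguments must be positive (nodesToCheck may be 0)");
    }
    // Ceiling division: a partially filled last batch still costs a whole interval.
    long intervals = ((long) nodesToCheck + nodesPerInterval - 1) / nodesPerInterval;
    return intervals * intervalSeconds;
  }
}
```

Under these assumptions, DecommissionSweep.fullSweepSeconds(400, 5, 30) gives 2400 seconds, i.e. 80 intervals or 40 minutes to check all 400 decommissioning nodes once, with each individual check burst kept small.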


Description
On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks each. The other 1500 nodes were almost empty.

When the decommission started, the namenode's queue overflowed every 6 minutes.

CPU usage showed that every 5 minutes the org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor thread took 100% of the CPU for 1 minute, causing the queue to overflow.

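  // Synchronized on the FSNamesystem, so the lock is held while every
  // datanode in datanodeMap is checked in a single pass.
  public synchronized void decommissionedDatanodeCheck() {
    for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator();
         it.hasNext();) {
      DatanodeDescriptor node = it.next();
      checkDecommissionStateInternal(node);
    }
  }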
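
One way to avoid holding the lock for a whole pass is to check only a bounded batch of nodes per invocation and resume from where the previous batch stopped. The following is a minimal sketch of that idea, not the code from the attached patches. BatchedDecommissionCheck, checkNextBatch, and the Consumer callback are hypothetical names, the sorted map stands in for datanodeMap, and the callback stands in for checkDecommissionStateInternal. It also assumes every visited node counts toward the batch size.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.function.Consumer;

/** Hypothetical sketch: checks at most nodesPerInterval nodes per call, cycling through the map. */
final class BatchedDecommissionCheck<V> {
  private final NavigableMap<String, V> nodes; // stand-in for datanodeMap
  private final int nodesPerInterval;          // upper bound on work done per call
  private final Consumer<V> checker;           // stand-in for checkDecommissionStateInternal
  private String lastKey = null;               // key of the last node checked, null before the first call

  BatchedDecommissionCheck(NavigableMap<String, V> nodes, int nodesPerInterval, Consumer<V> checker) {
    if (nodesPerInterval <= 0) {
      throw new IllegalArgumentException("nodesPerInterval must be positive");
    }
    this.nodes = nodes;
    this.nodesPerInterval = nodesPerInterval;
    this.checker = checker;
  }

  /** Checks the next batch and returns how many nodes were checked. Call once per interval. */
  synchronized int checkNextBatch() {
    // Never check the same node twice within one batch.
    int budget = Math.min(nodesPerInterval, nodes.size());
    int checked = 0;
    while (checked < budget) {
      Map.Entry<String, V> next =
          (lastKey == null) ? nodes.firstEntry() : nodes.higherEntry(lastKey);
      if (next == null) {
        next = nodes.firstEntry(); // reached the end of the map, wrap around
      }
      checker.accept(next.getValue());
      lastKey = next.getKey();
      checked++;
    }
    return checked;
  }
}
```

Because the cursor is a key rather than an iterator, the batch still resumes at the right place if nodes are added to or removed from the map between calls. The lock is held only for the duration of one batch, so other requests can be served between batches.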

