Hadoop Common / HADOOP-4061

Large number of decommission freezes the Namenode

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.2
    • Fix Version/s: 0.18.3, 0.19.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Added a new configuration property, dfs.namenode.decommission.nodes.per.interval, so that the NameNode checks the decommission status of x nodes every y seconds, where x is the value of dfs.namenode.decommission.nodes.per.interval and y is the value of dfs.namenode.decommission.interval.
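
The behaviour described in the release note (check x nodes every y seconds instead of all nodes at once) can be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the class `BatchedDecommissionCheck`, its fields, and the use of plain strings in place of `DatanodeDescriptor` are all hypothetical, and the round-robin cursor is an assumption about how a bounded batch might resume where the previous one stopped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: check at most nodesPerInterval nodes per wake-up. */
public class BatchedDecommissionCheck {
    private final List<String> nodes;    // stand-in for datanodeMap.values()
    private final int nodesPerInterval;  // "x" in the release note
    private int cursor = 0;              // where the next batch resumes (assumed round-robin)

    public BatchedDecommissionCheck(List<String> nodes, int nodesPerInterval) {
        if (nodesPerInterval <= 0) {
            throw new IllegalArgumentException("nodesPerInterval must be positive");
        }
        this.nodes = new ArrayList<>(nodes);
        this.nodesPerInterval = nodesPerInterval;
    }

    /** One wake-up of the monitor: returns the nodes examined in this batch. */
    public synchronized List<String> checkNextBatch() {
        List<String> checked = new ArrayList<>();
        int count = Math.min(nodesPerInterval, nodes.size());
        for (int i = 0; i < count; i++) {
            // A real monitor would call checkDecommissionStateInternal(node) here.
            checked.add(nodes.get(cursor));
            cursor = (cursor + 1) % nodes.size();
        }
        return checked;
    }

    public static void main(String[] args) {
        BatchedDecommissionCheck monitor = new BatchedDecommissionCheck(
                List.of("dn1", "dn2", "dn3", "dn4", "dn5"), 2);
        // A real monitor would sleep "y" seconds between rounds.
        for (int round = 1; round <= 3; round++) {
            System.out.println("round " + round + ": " + monitor.checkNextBatch());
        }
    }
}
```

With five nodes and a batch size of two, the three rounds visit dn1 and dn2, then dn3 and dn4, then dn5 and dn1, so the lock is held only for a bounded amount of work per wake-up.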

      Description

      On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks each. The other 1500 nodes were almost empty.

      When the decommission started, the NameNode's queue overflowed every 6 minutes.

      CPU usage showed that every 5 minutes the org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor thread took 100% of the CPU for 1 minute, causing the queue to overflow.

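        // synchronized: the enclosing object's lock is held for the whole pass.
        // Every datanode in datanodeMap is visited on each pass.
        public synchronized void decommissionedDatanodeCheck() {
          for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator();
               it.hasNext();) {
            DatanodeDescriptor node = it.next();
            checkDecommissionStateInternal(node);
          }
        }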
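
To give a rough sense of scale for the numbers above, the following back-of-the-envelope sketch multiplies out the reported figures. It rests on two assumptions the report does not state: that each pass examines every block on each decommissioning node, and that the 5-minute wait begins after the 1-minute pass ends (which would match the 6-minute overflow period). The class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical estimate of the monitor's work per pass, using the reported figures. */
public class DecommissionScanEstimate {

    /** Blocks examined per pass, assuming every block on each decommissioning node is checked. */
    static long blocksPerPass(long decommissioningNodes, long blocksPerNode) {
        return decommissioningNodes * blocksPerNode;
    }

    /** Fraction of wall-clock time the monitor is busy, given busy and idle minutes per cycle. */
    static double busyFraction(double busyMinutes, double idleMinutes) {
        return busyMinutes / (busyMinutes + idleMinutes);
    }

    public static void main(String[] args) {
        long blocks = blocksPerPass(400, 30_000);  // 400 nodes x 30k blocks each
        double busy = busyFraction(1.0, 5.0);      // 1 minute busy, 5 minutes idle (assumed)
        System.out.println("blocks examined per pass (assumed): " + blocks);
        System.out.println(String.format(Locale.ROOT,
                "monitor busy fraction: %.1f%%", busy * 100));
    }
}
```

Under those assumptions a single pass touches 12,000,000 blocks while the synchronized method holds its lock, which is consistent with the thread saturating a CPU for about a minute.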
      

        Attachments

        1. 4061_20081119.patch (12 kB, Tsz Wo Nicholas Sze)
        2. 4061_20081120.patch (13 kB, Tsz Wo Nicholas Sze)
        3. 4061_20081120b.patch (13 kB, Tsz Wo Nicholas Sze)
        4. 4061_20081123.patch (17 kB, Tsz Wo Nicholas Sze)
        5. 4061_20081124.patch (20 kB, Tsz Wo Nicholas Sze)
        6. 4061_20081124b.patch (19 kB, Tsz Wo Nicholas Sze)
        7. 4061_20081124c_0.18.patch (18 kB, Tsz Wo Nicholas Sze)
        8. 4061_20081124c.patch (20 kB, Tsz Wo Nicholas Sze)
        9. HADOOP-4061.patch (13 kB, Raghu Angadi)

              People

              • Assignee: Tsz Wo Nicholas Sze (szetszwo)
              • Reporter: Koji Noguchi (knoguchi)
              • Votes: 0
              • Watchers: 1
