Hadoop HDFS / HDFS-4475

OutOfMemory by BPServiceActor.offerService() takes down DataNode

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.0.3-alpha, 3.0.0-alpha1

    Description

      In DataNode, there are catch blocks around the BPServiceActor.offerService() call, but none of them catches OutOfMemoryError, unlike the handling in DataXceiver that was introduced in 0.22.0.

      The issue can be replicated like this:
      1) Create a cluster of X DataNodes and 1 NameNode with low memory settings (-Xmx128M or something similar).
      2) Flood HDFS with small file creations (any kind of file creation should work).
      3) The DataNodes will hit an OutOfMemoryError, stop the block pool service, and shut down.

      The proposed resolution is to catch OutOfMemoryError and handle it properly when calling BPServiceActor.offerService() in DataNode.java, as was done in Hadoop 0.22.0. DataNodes should not shut down or crash but should remain in a sort of frozen state until GC resolves the memory pressure.
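
      The proposed handling can be illustrated with a minimal, standalone sketch. This is not the actual DataNode code: the Service interface, the runWithOomHandling method, and the retry and backoff parameters are hypothetical names introduced only for illustration, and the OutOfMemoryError is simulated rather than provoked. Whether a JVM can safely continue after a real OutOfMemoryError depends on what state was left behind, so treat this as a sketch of the idea under those assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OomTolerantActor {

    /** Hypothetical stand-in for BPServiceActor.offerService(). */
    public interface Service {
        void offerService() throws Exception;
    }

    /**
     * Calls the service in a loop. On OutOfMemoryError it logs, backs off,
     * and retries instead of ending the service. Returns the number of
     * OutOfMemoryErrors that were caught.
     */
    public static int runWithOomHandling(Service service, int maxAttempts, long backoffMillis)
            throws InterruptedException {
        int oomCount = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                service.offerService();
                return oomCount; // service finished normally
            } catch (OutOfMemoryError oom) {
                // Assumption: waiting gives GC a chance to reclaim memory.
                oomCount++;
                System.err.println("Out of memory in service loop, retrying: " + oom.getMessage());
                Thread.sleep(backoffMillis);
            } catch (Exception e) {
                // Any other exception still ends the service in this sketch.
                System.err.println("Unexpected exception, ending service: " + e);
                return oomCount;
            }
        }
        return oomCount; // gave up after maxAttempts
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        // Simulated service: fails with OutOfMemoryError on the first two calls.
        Service flaky = () -> {
            if (calls.incrementAndGet() <= 2) {
                throw new OutOfMemoryError("simulated");
            }
        };
        int ooms = runWithOomHandling(flaky, 5, 10L);
        System.out.println("OOMs survived: " + ooms + ", calls: " + calls.get());
    }
}
```

      In this sketch the loop is bounded by maxAttempts so that a persistent memory problem cannot spin forever; a real fix would need to decide how long a DataNode may stay in the "frozen" state described above.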

      LOG ERROR:
      2013-02-04 11:46:01,854 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected exception in block pool Block pool BP-1105714849-10.10.10.110-1360005776467 (storage id DS-1952316202-10.10.10.112-50010-1360005820993) service to vmhost2-vm0/10.10.10.110:8020
      java.lang.OutOfMemoryError: GC overhead limit exceeded
      2013-02-04 11:46:01,854 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1105714849-10.10.10.110-1360005776467 (storage id DS-1952316202-10.10.10.112-50010-1360005820993) service to vmhost2-vm0/10.10.10.110:8020

            People

              Assignee: Plamen Jeliazkov (zero45)
              Reporter: Plamen Jeliazkov (zero45)