Hadoop Common / HADOOP-1187

DFS Scalability: avoid scanning entire list of datanodes in getAdditionalBlocks

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      Each new block allocation for a file scans the list of all known datanodes to determine whether the client is also a cluster node. If so, it tries to allocate a replica locally. This check consumes a lot of CPU, especially when the cluster has a large number of datanodes.
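
The cost comes from a linear search over every datanode on each allocation. A minimal Java sketch of that pattern follows. `NodeEntry`, `LinearScanLookup` and `findClientNode` are hypothetical, simplified names (not Hadoop's `DatanodeDescriptor` or FSNamesystem code), and matching on hostname alone is an assumption made for illustration.

```java
import java.util.List;

// Hypothetical, simplified stand-in for a datanode record (not Hadoop's DatanodeDescriptor).
final class NodeEntry {
    final String host;

    NodeEntry(String host) {
        this.host = host;
    }
}

final class LinearScanLookup {
    // Returns the datanode whose hostname matches the client's, or null when the
    // client is not a cluster node. In the worst case every call walks the whole
    // list, so the cost grows linearly with the number of datanodes and is paid
    // again on every block allocation.
    static NodeEntry findClientNode(List<NodeEntry> datanodes, String clientHost) {
        for (NodeEntry node : datanodes) {
            if (node.host.equals(clientHost)) {
                return node;
            }
        }
        return null;
    }
}
```

With 1,000 entries, a client on `host999` is found only after 1,000 comparisons, and a client outside the cluster (say `gateway`) also costs 1,000 comparisons before `null` is returned.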

      An optimization: if the client is also a cluster node, cache a reference to the corresponding DatanodeDescriptor in the file's entry in pendingCreates. The method getAdditionalBlock() then uses the cached DatanodeDescriptor and avoids scanning the entire list of datanodes.
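
A minimal sketch of the caching idea, under stated assumptions: `CachedNode`, `PendingFile`, `CachedLookupNamesystem`, `startFile` and `getAdditionalBlockTarget` are hypothetical names, not Hadoop's API; only the `pendingCreates` map is modeled on the field named in this issue. The sketch assumes the scan runs once when the pending-create entry is made, after which each block allocation just reads the cached reference.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a datanode record (not Hadoop's DatanodeDescriptor).
final class CachedNode {
    final String host;

    CachedNode(String host) {
        this.host = host;
    }
}

// One entry per file being written, modeled loosely on an entry in pendingCreates.
final class PendingFile {
    final String clientHost;
    final CachedNode clientNode; // resolved once; null if the client is not a cluster node

    PendingFile(String clientHost, CachedNode clientNode) {
        this.clientHost = clientHost;
        this.clientNode = clientNode;
    }
}

final class CachedLookupNamesystem {
    private final List<CachedNode> datanodes;
    private final Map<String, PendingFile> pendingCreates = new HashMap<>();
    int scans = 0; // counts full datanode scans, for illustration only

    CachedLookupNamesystem(List<CachedNode> datanodes) {
        this.datanodes = datanodes;
    }

    // File creation: pay the linear scan once and cache the result in the pending entry.
    void startFile(String path, String clientHost) {
        scans++;
        CachedNode local = null;
        for (CachedNode node : datanodes) {
            if (node.host.equals(clientHost)) {
                local = node;
                break;
            }
        }
        pendingCreates.put(path, new PendingFile(clientHost, local));
    }

    // Block allocation: reuse the cached reference instead of rescanning the datanodes.
    // Returns the preferred node for a local replica, or null to let placement pick any node.
    CachedNode getAdditionalBlockTarget(String path) {
        PendingFile pending = pendingCreates.get(path);
        if (pending == null) {
            throw new IllegalStateException("no pending create for " + path);
        }
        return pending.clientNode;
    }
}
```

In this sketch, creating two files and then allocating any number of blocks for them performs exactly two datanode scans, one per `startFile` call.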

        Activity

        Hadoop QA added a comment -

        Integrated in Hadoop-Nightly #48 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/48/ )

        Tom White added a comment -

        I've just committed this. Thanks Dhruba!

        Hadoop QA added a comment -

        +1, because http://issues.apache.org/jira/secure/attachment/12354888/clientPendingCreate2.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/525290 . Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

        dhruba borthakur added a comment -

        Merged with latest code.

        Hairong Kuang added a comment -

        +1 The code looks good.

        dhruba borthakur added a comment -

        A reference to the client's datanode descriptor is stored in pendingCreates. This avoids a hash lookup in getAdditionalBlock.

        dhruba borthakur added a comment -

        It could help. But if we store it in pendingCreates, we avoid this lookup completely.

        Raghu Angadi added a comment -

        Hairong is adding a hostname-to-datanode map. Would that help?
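
For comparison with the cached-reference approach, the hostname-to-datanode map raised in this comment might look roughly like the sketch below. `HostNode` and `HostIndex` are hypothetical names, and the sketch assumes at most one datanode per hostname, which a real cluster need not satisfy. It turns the linear scan into an expected constant-time hash lookup per allocation, while storing the reference in pendingCreates skips even that lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a datanode record (not Hadoop's DatanodeDescriptor).
final class HostNode {
    final String host;

    HostNode(String host) {
        this.host = host;
    }
}

// Index kept alongside the datanode list. Assumes one datanode per hostname.
final class HostIndex {
    private final Map<String, HostNode> byHost = new HashMap<>();

    // Call when a datanode registers.
    void add(HostNode node) {
        byHost.put(node.host, node);
    }

    // Call when a datanode is removed; only removes the mapping if it still points at this node.
    void remove(HostNode node) {
        byHost.remove(node.host, node);
    }

    // Expected O(1) lookup instead of scanning every datanode; null if the client is not a datanode.
    HostNode lookup(String clientHost) {
        return byHost.get(clientHost);
    }
}
```

The trade-off is that the index must be kept in sync with datanode registration and removal, whereas a reference cached per file needs no extra bookkeeping.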


          People

          • Assignee: dhruba borthakur
          • Reporter: dhruba borthakur
          • Votes: 0
          • Watchers: 0
