Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-1187

DFS Scalability: avoid scanning entire list of datanodes in getAdditionalBlocks

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      A new block allocations for a file scans the list of all known datanodes to find if the client that is a also a cluster node. If so, then it tries to allocate a replica locally. This check consumes plenty of CPU, especially if the number of datanodes in a cluster is large.

      An optimization: if the client is also a cluster node, then cache a reference to the corresponding DatanodeDescriptor from the entry in pendingCreate. The method getAdditionalBlock() uses the cached DatanodeDescriptor and thus avoids scanning the entire list of datanodes.

        Attachments

          Activity

            People

            • Assignee:
              dhruba dhruba borthakur
              Reporter:
              dhruba dhruba borthakur
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: