Hadoop Common
  1. Hadoop Common
  2. HADOOP-1187

DFS Scalability: avoid scanning entire list of datanodes in getAdditionalBlocks

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      A new block allocations for a file scans the list of all known datanodes to find if the client that is a also a cluster node. If so, then it tries to allocate a replica locally. This check consumes plenty of CPU, especially if the number of datanodes in a cluster is large.

      An optimization: if the client is also a cluster node, then cache a reference to the corresponding DatanodeDescriptor from the entry in pendingCreate. The method getAdditionalBlock() uses the cached DatanodeDescriptor and thus avoids scanning the entire list of datanodes.

        Activity

          People

          • Assignee:
            dhruba borthakur
            Reporter:
            dhruba borthakur
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development