  1. Hadoop Common
  2. HADOOP-1187

DFS Scalability: avoid scanning entire list of datanodes in getAdditionalBlocks

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

    Description

      Each new block allocation for a file scans the list of all known datanodes to determine whether the client is also a cluster node. If it is, the allocation tries to place a replica locally. This check consumes a lot of CPU, especially when the cluster has a large number of datanodes.

      An optimization: if the client is also a cluster node, cache a reference to the corresponding DatanodeDescriptor in the file's entry in pendingCreate. The method getAdditionalBlock() then uses the cached DatanodeDescriptor and thus avoids scanning the entire list of datanodes.
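
      The idea can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: the classes Node and PendingFile, the methods register() and startFile(), and the scans counter are hypothetical stand-ins, and only the names pendingCreate and getAdditionalBlock come from the description above. The sketch assumes the client's host is looked up once, when the pendingCreate entry is made.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: simplified stand-ins for the namenode structures, not the real Hadoop classes.
class PendingCreateSketch {

    /** Hypothetical stand-in for a DatanodeDescriptor. */
    static final class Node {
        final String host;
        Node(String host) { this.host = host; }
    }

    /** Hypothetical stand-in for an entry in pendingCreate. */
    static final class PendingFile {
        // Cached descriptor of the client's datanode; null if the client is not a cluster node.
        final Node clientNode;
        PendingFile(Node clientNode) { this.clientNode = clientNode; }
    }

    private final List<Node> datanodes = new ArrayList<>();
    private final Map<String, PendingFile> pendingCreate = new HashMap<>();

    // Counts full scans of the datanode list, for illustration only.
    int scans = 0;

    void register(String host) {
        datanodes.add(new Node(host));
    }

    /** The expensive linear scan that the issue wants to avoid repeating. */
    private Node findByHost(String host) {
        scans++;
        for (Node n : datanodes) {
            if (n.host.equals(host)) {
                return n;
            }
        }
        return null;
    }

    /** Assumed hook: scan once when the file is created and cache the result in the entry. */
    void startFile(String path, String clientHost) {
        pendingCreate.put(path, new PendingFile(findByHost(clientHost)));
    }

    /** Each block allocation reads the cached descriptor instead of rescanning. */
    Node getAdditionalBlock(String path) {
        PendingFile file = pendingCreate.get(path);
        return file == null ? null : file.clientNode;
    }
}
```

      In this sketch the list is scanned once per file rather than once per block, so a file with many blocks pays the lookup cost a single time. A null cached reference means the client is not a cluster node and no local replica is attempted.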

      Attachments

        1. clientPendingCreate2.patch
          4 kB
          Dhruba Borthakur

          People

            Assignee: Dhruba Borthakur (dhruba)
            Reporter: Dhruba Borthakur (dhruba)
