Hadoop Common / HADOOP-972

Improve the rack-aware replica placement performance

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None

      Description

      This issue aims to improve the performance of rack-aware replica placement. The main idea is to avoid constructing lists of possible targets for random selection in chooseTarget, which currently requires iterating over all DatanodeDescriptors. I plan to change the NetworkTopology data structure as follows:
      1. each InnerNode stores its children as a list;
      2. each InnerNode adds a new field, numberOfLeaves, which holds the total number of leaves (i.e. data nodes) in its subtree.
      NetworkTopology will support two new methods:
      1. DatanodeDescriptor chooseRandom(String scope): it randomly chooses one leaf from scope.
      2. DatanodeDescriptor chooseRandomExclude(String excludedScope): it randomly chooses one leaf from ~excludedScope, i.e. from outside excludedScope.
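
A minimal sketch of the proposed structure, assuming a simplified tree rather than the real NetworkTopology classes. The names TopologySketch, Node, add and getLeaf are hypothetical and are not taken from the attached patches. Each node keeps its children in a list and a numberOfLeaves count, so a random leaf can be picked by index without first building a list of all data nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch; not the code from the attached patches. */
final class TopologySketch {

    /** A tree node: an inner node (e.g. a rack) or a leaf (a data node). */
    static final class Node {
        final String name;
        final boolean leaf;
        final List<Node> children = new ArrayList<>();
        Node parent;
        int numberOfLeaves; // number of leaves in this node's subtree

        Node(String name, boolean leaf) {
            this.name = name;
            this.leaf = leaf;
            this.numberOfLeaves = leaf ? 1 : 0;
        }

        /** Adds a child and updates numberOfLeaves on this node and all ancestors. */
        void add(Node child) {
            child.parent = this;
            children.add(child);
            for (Node n = this; n != null; n = n.parent) {
                n.numberOfLeaves += child.numberOfLeaves;
            }
        }

        /** Returns the index-th leaf of this subtree, or null if index is out of range. */
        Node getLeaf(int index) {
            if (index < 0 || index >= numberOfLeaves) {
                return null;
            }
            if (leaf) {
                return this;
            }
            for (Node child : children) {
                if (index < child.numberOfLeaves) {
                    return child.getLeaf(index);
                }
                index -= child.numberOfLeaves; // skip this child's leaves
            }
            return null; // not reached while the counts are consistent
        }
    }

    /** Picks a uniformly random leaf under scope, or null if scope has no leaves. */
    static Node chooseRandom(Node scope, Random rng) {
        if (scope.numberOfLeaves == 0) {
            return null;
        }
        return scope.getLeaf(rng.nextInt(scope.numberOfLeaves));
    }

    public static void main(String[] args) {
        Node root = new Node("root", false);
        Node rack1 = new Node("rack1", false);
        Node rack2 = new Node("rack2", false);
        root.add(rack1);
        root.add(rack2);
        rack1.add(new Node("dn1", true));
        rack1.add(new Node("dn2", true));
        rack2.add(new Node("dn3", true));
        System.out.println("leaves under root: " + root.numberOfLeaves);
        System.out.println("random pick: " + chooseRandom(root, new Random()).name);
    }
}
```

In this sketch the cost of one selection depends on the tree depth and the number of children scanned per level, not on the total number of data nodes.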

      In addition, Issue 971 will also help improve the performance of the rack-aware DFS patch.

      1. rack_performance3.patch
        37 kB
        Hairong Kuang
      2. rack_performance2.patch
        36 kB
        Hairong Kuang
      3. rack_performance.patch
        35 kB
        Hairong Kuang

          Activity

          Hairong Kuang added a comment -

          The attached patch makes a minor modification to the proposal. NetworkTopology adds a single chooseRandom method. When the scope starts with ~, a random datanode is chosen from outside the scope that follows the ~; otherwise, a random datanode is chosen from within scope.
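
The ~ convention can be illustrated with a small sketch. It assumes scopes and node locations are slash-separated paths such as /rack1 and /rack1/dn1 with no trailing slash. ScopeSketch and its methods are hypothetical names used only for illustration and do not come from the patch.

```java
/** Hypothetical sketch of the "~scope" convention; not the patch's code. */
final class ScopeSketch {

    /** True when the scope string asks for nodes outside the named subtree. */
    static boolean isExcluding(String scope) {
        return scope.startsWith("~");
    }

    /** The path part of the scope, with a leading "~" removed. */
    static String path(String scope) {
        return isExcluding(scope) ? scope.substring(1) : scope;
    }

    /** True when location is the path itself or lies below it. */
    static boolean contains(String path, String location) {
        return location.equals(path) || location.startsWith(path + "/");
    }

    /** True when a node at location may be chosen for the given scope. */
    static boolean isCandidate(String scope, String location) {
        boolean inside = contains(path(scope), location);
        return isExcluding(scope) ? !inside : inside;
    }

    public static void main(String[] args) {
        System.out.println(isCandidate("/rack1", "/rack1/dn1"));
        System.out.println(isCandidate("~/rack1", "/rack1/dn1"));
        System.out.println(isCandidate("~/rack1", "/rack2/dn3"));
    }
}
```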

          Hairong Kuang added a comment -

          When I explained my patch to Dhruba this morning, he suggested that the number of retries in NetworkTopology.chooseRandom could be reduced by generating a random number in the range of [0, #datanodes-#excluded_datanodes) instead of in the range of [0, #datanodes). This patch incorporates his suggestion.
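
One possible reading of this suggestion, sketched under simplifying assumptions: racks are represented only by their data node counts, and exactly one rack is excluded. The class ExcludedRangeSketch and its methods are hypothetical. The index is drawn from [0, #datanodes - #excluded_datanodes) and mapped onto the remaining racks by skipping the excluded one, so a draw never lands on an excluded node.

```java
import java.util.Random;

/** Hypothetical sketch of drawing from the reduced range; not the patch's code. */
final class ExcludedRangeSketch {

    /**
     * Maps k in [0, total - rackSizes[excludedRack]) to {rack, offsetInRack},
     * skipping the excluded rack while counting.
     */
    static int[] locate(int[] rackSizes, int excludedRack, int k) {
        if (k < 0) {
            throw new IndexOutOfBoundsException("negative index");
        }
        for (int rack = 0; rack < rackSizes.length; rack++) {
            if (rack == excludedRack) {
                continue; // excluded nodes take up no part of the range
            }
            if (k < rackSizes[rack]) {
                return new int[] {rack, k};
            }
            k -= rackSizes[rack];
        }
        throw new IndexOutOfBoundsException("index beyond the non-excluded data nodes");
    }

    /** Returns {rack, offsetInRack} of a random non-excluded node, or null if none exists. */
    static int[] chooseRandomExcluding(int[] rackSizes, int excludedRack, Random rng) {
        int total = 0;
        for (int size : rackSizes) {
            total += size;
        }
        int candidates = total - rackSizes[excludedRack];
        if (candidates <= 0) {
            return null;
        }
        return locate(rackSizes, excludedRack, rng.nextInt(candidates));
    }

    public static void main(String[] args) {
        int[] rackSizes = {2, 3, 1};
        int[] pick = chooseRandomExcluding(rackSizes, 1, new Random());
        System.out.println("rack " + pick[0] + ", node " + pick[1]);
    }
}
```

By contrast, drawing from [0, #datanodes) can land on an excluded node and force another draw, which is the retry cost the comment refers to.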

          Milind Bhandarkar added a comment -

          A few comments:

          The Replicator class needs to move out of FSNamesystem, with the clusterMap supplied to it at construction. It adds about 450 lines of code to FSNamesystem.java, which is already quite big, and this affects readability.

          We should rename that class ReplicaChooser or some such thing. The name Replicator creates a false impression.

          Hairong Kuang added a comment -

          I agree that ReplicaChooser is a better name. But I am not sure if the class should be moved out of FSNamesystem. In addition to the clusterMap, it also needs to calculate the avgLoad per data node, and FSNamesystem provides the total load of the cluster.

          dhruba borthakur added a comment -

          +1, looks good.

          1. It may be possible to further optimize getLeave() by making it non-recursive. But in the current case, the network topology map is only two levels deep and this optimization might not give us any immediate performance gain.

          2. In this implementation, if we have a large number of racks, the time that chooseRandom() takes to pick a node increases when the selected node index lies towards the end of the range of datanode indices. Again, this probably will have some material impact only when the topology tree is deep and there are thousands of racks.
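
Both points can be seen in a small sketch of a non-recursive leaf lookup. It assumes a tree whose nodes carry a numberOfLeaves count. IterativeLeafSketch, leaf, inner and getLeaf are hypothetical names and are not the getLeave() from the patch. The loop descends one level per pass and scans the children from left to right, so an index near the end of the range passes over more children before it is found.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of an iterative leaf lookup; not the patch's getLeave(). */
final class IterativeLeafSketch {

    static final class Node {
        final String name;
        final List<Node> children;
        final int numberOfLeaves; // leaves in this subtree (1 for a leaf)

        private Node(String name, List<Node> children, int numberOfLeaves) {
            this.name = name;
            this.children = children;
            this.numberOfLeaves = numberOfLeaves;
        }
    }

    static Node leaf(String name) {
        return new Node(name, Collections.<Node>emptyList(), 1);
    }

    static Node inner(String name, Node... children) {
        int leaves = 0;
        for (Node child : children) {
            leaves += child.numberOfLeaves;
        }
        return new Node(name, Arrays.asList(children), leaves);
    }

    /** Returns the index-th leaf under root without recursion, or null if out of range. */
    static Node getLeaf(Node root, int index) {
        if (index < 0 || index >= root.numberOfLeaves) {
            return null;
        }
        Node current = root;
        while (!current.children.isEmpty()) {
            // Linear scan: later indices skip more siblings before descending.
            for (Node child : current.children) {
                if (index < child.numberOfLeaves) {
                    current = child;
                    break;
                }
                index -= child.numberOfLeaves;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Node root = inner("root",
                inner("rack1", leaf("dn1"), leaf("dn2")),
                inner("rack2", leaf("dn3")));
        System.out.println(getLeaf(root, 2).name);
    }
}
```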

          Hairong Kuang added a comment -

          I did not mark the patch as available, and it became outdated. So here comes an updated one.

          Hadoop QA added a comment -

          -1, because the javadoc command appears to have generated warning messages when testing the latest attachment (http://issues.apache.org/jira/secure/attachment/12352074/rack_performance2.patch) against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/512006. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

          Hairong Kuang added a comment -

          The patch is free of javadoc warnings.

          Hairong Kuang added a comment -

          The rack-aware patch also fixed HADOOP-1013.

          Hadoop QA added a comment -

          -1, because javac generated 768 warnings (more than the acceptable 766 warnings) when testing the latest attachment (http://issues.apache.org/jira/secure/attachment/12352174/rack_performance3.patch) against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/512461. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

          Hadoop QA added a comment -

          +1, because http://issues.apache.org/jira/secure/attachment/12352174/rack_performance3.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/512499. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

          Doug Cutting added a comment -

          I just committed this. Thanks, Hairong!


            People

            • Assignee: Hairong Kuang
            • Reporter: Hairong Kuang