Hadoop Common / HADOOP-972

Improve the rack-aware replica placement performance

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None

      Description

      This issue aims to improve the performance of rack-aware replica placement. The main idea is to avoid constructing lists of possible targets for random selection in chooseTarget, which currently requires iterating over all DatanodeDescriptors. I plan to change the NetworkTopology data structure as follows:
      1. each InnerNode stores its children as a list;
      2. each InnerNode adds a new field, numberOfLeaves, holding the total number of leaves (i.e. data nodes) in its subtree.
      NetworkTopology will support two new methods:
      1. DatanodeDescriptor chooseRandom(String scope): randomly chooses one leaf from scope.
      2. DatanodeDescriptor chooseRandomExclude(String excludedScope): randomly chooses one leaf from ~excludedScope, i.e. from outside excludedScope.
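
The proposed structure can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the Node, Leaf and InnerNode classes, the add and getLeaf helpers, and the use of a subtree root instead of a scope string are all simplifications assumed here. The point it tries to show is that picking a random leaf needs only one random index and a walk down the tree guided by numberOfLeaves, with no candidate list being built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical, simplified stand-ins for the NetworkTopology node classes.
abstract class Node {
    final String name;
    Node(String name) { this.name = name; }
    abstract int getNumOfLeaves();
}

// Plays the role of a DatanodeDescriptor in this sketch.
class Leaf extends Node {
    Leaf(String name) { super(name); }
    int getNumOfLeaves() { return 1; }
}

class InnerNode extends Node {
    final List<Node> children = new ArrayList<>(); // children kept as a list
    int numberOfLeaves = 0;                        // leaves in this subtree

    InnerNode(String name) { super(name); }
    int getNumOfLeaves() { return numberOfLeaves; }

    // Attach a fully built child and update this node's leaf count.
    // Assumption: this node is not yet attached to a parent; a real
    // implementation would also have to update the ancestors' counts.
    void add(Node child) {
        children.add(child);
        numberOfLeaves += child.getNumOfLeaves();
    }

    // Return the leaf at the given 0-based position within this subtree.
    Leaf getLeaf(int index) {
        if (index < 0 || index >= numberOfLeaves) {
            throw new IndexOutOfBoundsException("leaf index " + index);
        }
        for (Node child : children) {
            int n = child.getNumOfLeaves();
            if (index < n) {
                return (child instanceof Leaf)
                        ? (Leaf) child
                        : ((InnerNode) child).getLeaf(index);
            }
            index -= n; // skip this child's whole subtree
        }
        throw new IllegalStateException("leaf counts out of sync");
    }

    // Pick one leaf uniformly at random from this subtree.
    // Random.nextInt throws IllegalArgumentException if the subtree is empty.
    Leaf chooseRandom(Random random) {
        return getLeaf(random.nextInt(numberOfLeaves));
    }
}
```

Calling chooseRandom on the root selects from the whole cluster, while calling it on a rack's InnerNode restricts the choice to that rack.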

      In addition, Issue 971 will also help improve the performance of the rack-aware DFS patch.

      1. rack_performance3.patch
        37 kB
        Hairong Kuang
      2. rack_performance2.patch
        36 kB
        Hairong Kuang
      3. rack_performance.patch
        35 kB
        Hairong Kuang

        Issue Links

          Activity

          Hairong Kuang created issue -
          Hairong Kuang made changes -
          Field Original Value New Value
          Attachment rack_performance.patch [ 12351201 ]
          Hairong Kuang added a comment -

          The attached patch makes a minor modification to the proposal: NetworkTopology adds a single chooseRandom method. When the scope starts with ~, it chooses a random datanode from outside the scope; otherwise, it chooses a random datanode from within the scope.
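
The ~ convention amounts to a small piece of argument parsing. A minimal sketch, assuming a hypothetical ScopeSpec helper that is not part of the patch:

```java
// Hypothetical helper: splits a chooseRandom scope argument into the
// topology path and a flag saying whether that path is to be excluded.
final class ScopeSpec {
    final String path;
    final boolean excluded;

    private ScopeSpec(String path, boolean excluded) {
        this.path = path;
        this.excluded = excluded;
    }

    static ScopeSpec parse(String scope) {
        if (scope.startsWith("~")) {
            // "~/rackA" means: any datanode that is NOT under /rackA
            return new ScopeSpec(scope.substring(1), true);
        }
        // "/rackA" means: any datanode under /rackA
        return new ScopeSpec(scope, false);
    }
}
```

Folding both cases into one method keeps the API small, at the cost of encoding the exclusion in the string itself.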

          Hairong Kuang made changes -
          Attachment rack_performance.patch [ 12351201 ]
          Hairong Kuang added a comment -

          When I explained my patch to Dhruba this morning, he suggested that the number of retries in NetworkTopology.chooseRandom could be reduced by generating a random number in the range of [0, #datanodes-#excluded_datanodes) instead of in the range of [0, #datanodes). This patch incorporates his suggestion.
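
The difference between the two ranges can be illustrated on plain leaf indices. This sketch is an assumption-laden simplification rather than the patch's code: it supposes the leaves are numbered 0..total-1 in tree order and that the excluded scope covers one contiguous block of exCount indices starting at exStart. The ExcludingPicker class and its method names are hypothetical.

```java
import java.util.Random;

// Hypothetical illustration of drawing a leaf index outside an excluded block.
final class ExcludingPicker {

    // Draw from [0, total) and redraw whenever the excluded block is hit.
    static int pickWithRetries(Random random, int total, int exStart, int exCount) {
        if (exCount >= total) {
            throw new IllegalArgumentException("nothing left to choose from");
        }
        while (true) {
            int i = random.nextInt(total);
            if (i < exStart || i >= exStart + exCount) {
                return i;
            }
        }
    }

    // Draw from [0, total - exCount) and shift past the excluded block,
    // so a draw can never land on an excluded index.
    static int pickNarrowed(Random random, int total, int exStart, int exCount) {
        if (exCount >= total) {
            throw new IllegalArgumentException("nothing left to choose from");
        }
        return mapIndex(random.nextInt(total - exCount), exStart, exCount);
    }

    // Map an index in the narrowed range back to a real leaf index.
    static int mapIndex(int i, int exStart, int exCount) {
        return i < exStart ? i : i + exCount;
    }
}
```

Both methods choose uniformly among the allowed indices; the narrowed version simply avoids the wasted draws on the excluded block.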

          Hairong Kuang made changes -
          Attachment rack_performance.patch [ 12351302 ]
          Hairong Kuang made changes -
          Attachment rack_performance.patch [ 12351302 ]
          Hairong Kuang made changes -
          Attachment rack_performance.patch [ 12351303 ]
          Milind Bhandarkar added a comment -

          A few comments:

          The Replicator class needs to move out of FSNamesystem, with the clusterMap supplied to it at construction. It adds about 450 lines of code to FSNamesystem.java, which is already quite big, hurting readability.

          We should rename that class ReplicaChooser or some such thing. The name Replicator creates a false impression.

          Hairong Kuang added a comment -

          I agree that ReplicaChooser is a better name, but I am not sure the class should be moved out of FSNamesystem. In addition to clusterMap, it also needs to calculate the avgLoad per data node, and FSNamesystem provides the total load of the cluster.

          dhruba borthakur added a comment -

          +1, looks good.

          1. It may be possible to further optimize getLeave() by making it non-recursive. But in the current case, the network topology map is only two levels deep and this optimization might not give us any immediate performance gain.

          2. In this implementation, if we have a large number of racks, the time that chooseRandom() takes to pick a node increases when the selected node index lies towards the end of the range of datanode indices. Again, this probably will have some material impact only when the topology tree is deep and there are thousands of racks.
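
Both points can be seen in a non-recursive lookup. The following is only a sketch under assumptions: TreeNode, leafCount and this getLeaf signature are hypothetical simplifications, not the classes or the getLeave() method from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified topology node; a node without children is a leaf.
final class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();
    int leafCount = 1; // a childless node counts as one leaf (a datanode)

    // Create a leaf.
    TreeNode(String name) { this.name = name; }

    // Create an inner node from fully built children.
    TreeNode(String name, List<TreeNode> kids) {
        this.name = name;
        children.addAll(kids);
        leafCount = 0;
        for (TreeNode kid : kids) {
            leafCount += kid.leafCount;
        }
    }

    boolean isLeaf() { return children.isEmpty(); }

    // Iterative lookup of the leaf at a 0-based index under root.
    static TreeNode getLeaf(TreeNode root, int index) {
        if (index < 0 || index >= root.leafCount) {
            throw new IndexOutOfBoundsException("leaf index " + index);
        }
        TreeNode current = root;
        while (!current.isLeaf()) {
            // Linear scan over the children: the later the target index,
            // the more siblings (e.g. racks) are skipped before descending.
            for (TreeNode child : current.children) {
                if (index < child.leafCount) {
                    current = child;
                    break;
                }
                index -= child.leafCount;
            }
        }
        return current;
    }
}
```

The outer loop runs once per tree level, so with a two-level topology the saving over recursion is small, while the inner scan is where a large number of racks would show up.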

          Hairong Kuang added a comment -

          I did not mark the patch available, and it became outdated. Here is an updated one.

          Hairong Kuang made changes -
          Attachment rack_performance2.patch [ 12352074 ]
          Hairong Kuang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1, because the javadoc command appears to have generated warning messages when testing the latest attachment (http://issues.apache.org/jira/secure/attachment/12352074/rack_performance2.patch) against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/512006. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

          Hairong Kuang added a comment -

          The patch is free of javadoc warnings.

          Hairong Kuang made changes -
          Attachment rack_performance3.patch [ 12352174 ]
          Hairong Kuang made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hairong Kuang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hairong Kuang made changes -
          Link This issue incorporates HADOOP-1013 [ HADOOP-1013 ]
          Hairong Kuang added a comment -

          The rack-aware patch also fixed HADOOP-1013.

          Hadoop QA added a comment -

          -1, because javac generated 768 warnings (more than the acceptable 766 warnings) when testing the latest attachment (http://issues.apache.org/jira/secure/attachment/12352174/rack_performance3.patch) against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/512461. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

          Hadoop QA added a comment -

          +1, because http://issues.apache.org/jira/secure/attachment/12352174/rack_performance3.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/512499. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

          Doug Cutting added a comment -

          I just committed this. Thanks, Hairong!

          Doug Cutting made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Doug Cutting made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]
          Transition                    Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Patch Available -> Open       20h 15m                 1                 Hairong Kuang   27/Feb/07 21:39
          Open -> Patch Available       21d 23h 2m              2                 Hairong Kuang   27/Feb/07 21:39
          Patch Available -> Resolved   22h 4m                  1                 Doug Cutting    28/Feb/07 19:43
          Resolved -> Closed            2d 3h 18m               1                 Doug Cutting    02/Mar/07 23:02

            People

            • Assignee: Hairong Kuang
            • Reporter: Hairong Kuang
            • Votes: 0
            • Watchers: 0
