Hadoop Common
  1. Hadoop Common
  2. HADOOP-5638

More improvement on block placement performance

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.

      I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.

      1. excludedList.patch
        10 kB
        Hairong Kuang
      2. excludedList1.patch
        12 kB
        Hairong Kuang
      3. excludedList2.patch
        13 kB
        Hairong Kuang
      4. excludedList3.patch
        12 kB
        Hairong Kuang
      5. excludedList4.patch
        15 kB
        Hairong Kuang

        Issue Links

          Activity

          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #811 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/811/ )
          Hairong Kuang made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Hide
          Hairong Kuang added a comment -

          I've just committed this!

          Show
          Hairong Kuang added a comment - I've just committed this!
          Hide
          Hairong Kuang added a comment -

          Ant patch passed:
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Ant test-core passed except for crashed TestBackupNode reported at HADOOP-5573.

          Show
          Hairong Kuang added a comment - Ant patch passed: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Ant test-core passed except for crashed TestBackupNode reported at HADOOP-5573 .
          Hairong Kuang made changes -
          Attachment excludedList4.patch [ 12405706 ]
          Hide
          Hairong Kuang added a comment -

          This patch additionally fixed the typos that Suresh pointed out.

          Show
          Hairong Kuang added a comment - This patch additionally fixed the typos that Suresh pointed out.
          Hide
          Suresh Srinivas added a comment -

          +1

          Nits - There are several typos replca, datanodesthat, choosen, availabe, tranverses, relicated

          Show
          Suresh Srinivas added a comment - +1 Nits - There are several typos replca , datanodesthat , choosen , availabe , tranverses , relicated
          Hairong Kuang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hairong Kuang made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hairong Kuang made changes -
          Attachment excludedList3.patch [ 12405508 ]
          Hide
          Hairong Kuang added a comment -

          This patch incorporates Suresh's review comment.

          Show
          Hairong Kuang added a comment - This patch incorporates Suresh's review comment.
          Hide
          Suresh Srinivas added a comment -

          ReplicationTargetChooser.java line 205, there is no need to check for containsKey(), you can just overwrite the the existing entry (if there is any) by calling just put(). This change can be done in other places as well.

          Show
          Suresh Srinivas added a comment - ReplicationTargetChooser.java line 205, there is no need to check for containsKey() , you can just overwrite the the existing entry (if there is any) by calling just put() . This change can be done in other places as well.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12405099/excludedList2.patch
          against trunk revision 763728.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/178/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/178/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/178/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/178/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12405099/excludedList2.patch against trunk revision 763728. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/178/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/178/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/178/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/178/console This message is automatically generated.
          Hairong Kuang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hairong Kuang made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hairong Kuang made changes -
          Attachment excludedList2.patch [ 12405099 ]
          Hide
          Hairong Kuang added a comment -

          This failed test seemed to be caused by port collision. Maybe there was a concurrent running of another junit test. The new patch fixed this.

          Show
          Hairong Kuang added a comment - This failed test seemed to be caused by port collision. Maybe there was a concurrent running of another junit test. The new patch fixed this.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12405025/excludedList1.patch
          against trunk revision 763502.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/170/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/170/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/170/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/170/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12405025/excludedList1.patch against trunk revision 763502. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/170/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/170/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/170/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/170/console This message is automatically generated.
          Hairong Kuang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hairong Kuang made changes -
          Attachment excludedList1.patch [ 12405025 ]
          Hide
          Hairong Kuang added a comment -

          This patch incorporates Nicholas' comment plus fixes a few javadoc typos.

          Show
          Hairong Kuang added a comment - This patch incorporates Nicholas' comment plus fixes a few javadoc typos.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          The type of excludedNodes in countNumOfAvailableNodes(..) should be Collection<Node>. Everything else looks good.

          Show
          Tsz Wo Nicholas Sze added a comment - The type of excludedNodes in countNumOfAvailableNodes(..) should be Collection<Node>. Everything else looks good.
          Hide
          Hairong Kuang added a comment -

          I did the same experiment described in HADOOP-5603. On a full cluster with 3150 nodes, the patch reduced the amount of time on placing a block on two datanodes from 2.1s to less than 0.1s.

          Show
          Hairong Kuang added a comment - I did the same experiment described in HADOOP-5603 . On a full cluster with 3150 nodes, the patch reduced the amount of time on placing a block on two datanodes from 2.1s to less than 0.1s.
          Hairong Kuang made changes -
          Link This issue is related to HADOOP-5603 [ HADOOP-5603 ]
          Hairong Kuang made changes -
          Attachment excludedList.patch [ 12404902 ]
          Hide
          Hairong Kuang added a comment -

          This patch changed the data type of the excluded list from ArrayList to HashMap.

          Show
          Hairong Kuang added a comment - This patch changed the data type of the excluded list from ArrayList to HashMap.
          Hairong Kuang made changes -
          Field Original Value New Value
          Description Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O(n), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.

          I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O(n) in average.
          Block placement algorithm currently has an excluded node list, which contains all datanodes that have been visited. This list is implemented as an array list, whose cost of inserting is O(1) but the cost of query "contains" is O( n ), where n is the number of datanodes. This makes the cost of block placement to be O(n*n) when a cluster is full.

          I propose to change the data structure of the excluded node list as a HashMap. So in average, the cost of insertion is O(1) and the cost of query is O(1). This makes the block placement algorithm to be O( n ) in average.
          Hairong Kuang created issue -

            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Hairong Kuang
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development