Hadoop Common / HADOOP-6616

Improve documentation for rack awareness

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: documentation
    • Hadoop Flags:
      Reviewed

      Description

      The current documentation for rack awareness (http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html#Hadoop+Rack+Awareness) should be augmented to include a sample script.
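As an illustration of the kind of sample script the request asks for, here is a minimal sketch of a topology script. The `/24`-per-rack assumption and the `resolve_rack` name are mine, not from the patch; the invocation contract (IPs as arguments in, one rack id per line out) follows the behavior discussed in the comments below.

```shell
#!/usr/bin/env bash
# Minimal topology script sketch (assumption: one /24 subnet per rack).
# Hadoop runs the script named by topology.script.file.name with one or
# more IP addresses as arguments and reads one rack id per line on stdout.
resolve_rack() {
  for ip in "$@"; do
    # Use the first three octets as the rack name, e.g. 10.1.2.3 -> /rack-10.1.2
    echo "/rack-$(echo "$ip" | cut -d. -f1-3)"
  done
}

resolve_rack "$@"
```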

        Attachments

      1. hadoop-6616.patch.4
        10 kB
        Adam Faris
      2. hadoop-6616.patch.3
        10 kB
        Adam Faris
      3. hadoop-6616.patch.2
        14 kB
        Adam Faris
      4. hadoop-6616.patch
        13 kB
        Adam Faris

        Activity

        Adam Faris added a comment -

        Here's a documentation update for cluster_setup.xml. Inside the update one will find several topology script examples, a link to the NetworkTopology.java file in Apache's subversion tree, and an expanded explanation of how rack awareness works.

        Joep Rottinghuis added a comment -

        Nice Adam. Nit in the second-last paragraph:

        If neither <code>topology.script.file.name</code> or <code>topology.script.file.name</code> is 
        not set, the rack id '/default-rack' is returned for any passed IP address. 
        

        "Neither ... not" is a double negative.
        You also mention the property topology.script.file.name twice. Did you mean the following?

        If neither <code>topology.script.file.name</code> nor <code>topology.node.switch.mapping.impl</code> is set, the rack id '/default-rack' is returned for any passed IP address. 
        
        Joep Rottinghuis added a comment -

        Perhaps you can add a few words about <code>topology.script.number.args</code>, which IIRC defaults to 100 and drives the number of host IPs passed to the script in one go, to allow the script to do some internal caching. When set to 1, a process will be spawned to invoke the script for each host.

        You describe how the rack awareness works. Would it be useful to add a sentence or two about why it is used (i.e. block placement uses rack awareness for fault tolerance)?
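For reference, the two properties under discussion would be set in the cluster configuration roughly like this. This is a sketch using the 1.x-era property names as they appear in this thread; the file path and values are illustrative, not from the patch.

```xml
<!-- core-site.xml (1.x-era property names as used in this discussion) -->
<property>
  <name>topology.script.file.name</name>
  <value>/etc/hadoop/topology.sh</value>
</property>
<property>
  <name>topology.script.number.args</name>
  <!-- Max number of IPs passed to the script per invocation; defaults to
       100 per the comment above. Setting it to 1 spawns a process per host. -->
  <value>100</value>
</property>
```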

        Adam Faris added a comment -

        Hi Joep,

        Thanks for reviewing the patch and providing feedback. That double negative ... uggh! I wasn't aware of 'topology.script.number.args' so thanks for mentioning it. Here's an updated patch which includes updates to sample scripts for looping over STDIN, topology.script.number.args, and why hadoop should care about rack awareness.

        – Adam

        Joep Rottinghuis added a comment -

        Update looks good. I ran the perl and python example scripts. The first two ran fine. See comment below on the BASH script.

        Nit: (line 18 of the patch)
        The <code>NameNode</code> and the <code>JobTracker</code> obtains
        should read "obtain" — NN and JT form a plural subject.
        I think this same error already exists in http://hadoop.apache.org/docs/r1.0.3/cluster_setup.html#Hadoop+Rack+Awareness

        One refinement to

        The jobtracker uses rack awareness to reduce network transfers of HDFS data blocks, as it will schedule tasks on nodes located within the same rack containing the needed HDFS data blocks.
        

        If the tasks cannot be scheduled on the DNs containing the needed HDFS blocks, then the tasks will be scheduled on the same rack to reduce network transfers if possible.

        Line 41: again "obtain" instead of "obtains" (NN and JT are a plural subject).

        Line 65:

        Hadoop will send multiple IP addresses on STDIN when forking the topology script.
        

        I think IP addresses are passed as arguments, not on STDIN. The first Perl script reads this correctly from ARGV but the comment in the script reads that it gets it from STDIN.

        The BASH example for flat network always returns /rack-unkown
        I think that is due to

        if [ -n $# ];
        

        Something like

        if [ "$1" == "" ]; then
        

        Once that is fixed, the script errors out missing closing brace in the for statement. This should be:

        for host in ${BASH_ARGV[*]}; do
        

        I had some trouble with mismatched single quotes (which is strange, as they occurred only in the comments).

        Same STDIN comment in other scripts.

        I could not get the last python script (the one that makes assumptions about the physical environment) to work because I do not have hosts that are called "dn" something.
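Putting the fixes above together, the corrected flat-network example would look roughly like this. This is a reconstruction from the review comments, not the patch's actual text; I use `"$@"` rather than `BASH_ARGV`, since `BASH_ARGV` requires the `extdebug` option and yields arguments in reverse order.

```shell
#!/usr/bin/env bash
# Reconstructed flat-network topology script based on the fixes suggested
# in the review; structure and names are assumptions, not the patch's text.
flat_topology() {
  if [ "$1" == "" ]; then
    # No arguments given: emit the default rack once.
    echo "/default-rack"
  else
    # Flat network: every host maps to the same rack.
    for host in "$@"; do
      echo "/default-rack"
    done
  fi
}

flat_topology "$@"
```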

        Adam Faris added a comment -

        Sorry for the delay, but I wanted to rethink the examples to explain how this works. I updated the bash script to show how simple topology scripts can be. I removed the perl example, as both the perl and bash scripts were doing the same thing of splitting the IP on dots. Finally, the python script has been updated to print the network instead of relying on matching host names in a contrived example.

        – Thanks, Adam

        Jakob Homan added a comment -

        # 1) each rack is it's own layer 3 network with a /24 subnet, which could be typical where each rack has it's own

        # can create it's 'off-rack' block copy.

        s/it's/its/g
        Otherwise looks good and ready for commit.

        Joep Rottinghuis added a comment -

        LGTM

        Joep Rottinghuis added a comment -

        Adam, when you're happy with the patch as it is (or with Jakob's last suggestion for the "it's -> its" typo), you can attach the last patch, edit the jira, and set the status to "patch available" to mark it as ready to be committed. You probably also want to indicate which branch you think this applies to (you'll probably have noticed that the project structure has changed significantly between the 0.20/1.0, 0.21/0.22, and 0.23/0.24/2.0 families of branches).

        Adam Faris added a comment -

        Submitting final patch

        Adam Faris added a comment -

        Submitting final patch as 4th attachment.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12549208/hadoop-6616.patch.4
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1629//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1629//console

        This message is automatically generated.

        Jakob Homan added a comment -

        +1. Looks good. Verified generated html (working around HADOOP-8810). Committing.

        Jakob Homan added a comment -

        I've committed this. Resolving as fixed. Thanks, Adam!

        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3047 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3047/)
        HADOOP-6616. Improve documentation for rack awareness. Contributed by Adam Faris. (Revision 1411359)

        Result = SUCCESS
        jghoman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1411359
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/src/documentation/content/xdocs/cluster_setup.xml
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #42 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/42/)
        HADOOP-6616. Improve documentation for rack awareness. Contributed by Adam Faris. (Revision 1411359)

        Result = SUCCESS
        jghoman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1411359
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/src/documentation/content/xdocs/cluster_setup.xml
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1232 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1232/)
        HADOOP-6616. Improve documentation for rack awareness. Contributed by Adam Faris. (Revision 1411359)

        Result = SUCCESS
        jghoman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1411359
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/src/documentation/content/xdocs/cluster_setup.xml
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1263 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1263/)
        HADOOP-6616. Improve documentation for rack awareness. Contributed by Adam Faris. (Revision 1411359)

        Result = FAILURE
        jghoman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1411359
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/src/documentation/content/xdocs/cluster_setup.xml
        Allen Wittenauer added a comment -

        Somewhere along the way, this change got dropped. At least, I can't find a record of it in branch-2 or trunk.

        Adam Faris added a comment -

        It looks like all the topology information regarding rack awareness was removed as 'cruft' in HADOOP-8427, during an effort to convert the Forrest docs to APT. See the patch numbered 5: the diff shows everything related to rack awareness has been removed. The removal is unfortunate, as the documentation is still relevant for current versions of Hadoop.

        Allen Wittenauer added a comment -

        It's more than unfortunate: it's downright hostile.


          People

          • Assignee: Adam Faris
          • Reporter: Jeff Hammerbacher
          • Votes: 0
          • Watchers: 8
