Hadoop Common / HADOOP-3695

[HOD] Have an ability to run multiple slaves per node

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: contrib/hod
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added an ability in HOD to start multiple workers (TaskTrackers and/or DataNodes) per node to assist testing and simulation of scale. A configuration variable ringmaster.workers_per_ring was added to specify the number of workers to start.
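A minimal sketch of how the new option might be read, assuming an INI-style configuration in which a ringmaster section holds workers_per_ring. The sample text and the helper name read_workers_per_ring are hypothetical; only the option name and its default of 1 come from this issue.

```python
import configparser

# Hypothetical configuration fragment; only the option name is from the issue.
SAMPLE = """
[ringmaster]
workers_per_ring = 4
"""


def read_workers_per_ring(text, default=1):
    """Return ringmaster.workers_per_ring from INI text, or the default."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # fallback covers both a missing section and a missing option
    return parser.getint("ringmaster", "workers_per_ring", fallback=default)


if __name__ == "__main__":
    print(read_workers_per_ring(SAMPLE))
    print(read_workers_per_ring("[ringmaster]\n"))
```

When the option is absent, the helper falls back to 1, matching the default stated in the first patch comment.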

      Description

      Currently, HOD launches at most one slave per node. To test a large number of slaves on far fewer resources (for example, when testing the scalability of clusters), it would be useful if HOD could provision multiple slaves per node.

      1. patch_multiple_workers_1.txt
        7 kB
        Vinod Kumar Vavilapalli
      2. patch_multiple_workers_2.txt
        7 kB
        Hemanth Yamijala
      3. patch_multiple_workers_3.txt
        15 kB
        Vinod Kumar Vavilapalli
      4. patch_multiple_workers_4.txt
        15 kB
        Hemanth Yamijala

        Activity

        Transition                     Time In Source Status   Execution Times   Last Executer      Last Execution Date
        Open -> Patch Available        4d 21h 9m               1                 Hemanth Yamijala   09/Jul/08 10:03
        Patch Available -> Resolved    5h 45m                  1                 Hemanth Yamijala   09/Jul/08 15:49
        Resolved -> Closed             134d 8h 49m             1                 Nigel Daley        20/Nov/08 23:38
        Nigel Daley made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hudson added a comment - Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/ )
        Hemanth Yamijala made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Release Note Added an ability in HOD to start multiple workers (TaskTrackers and/or DataNodes) per node to assist testing and simulation of scale. A configuration variable ringmaster.workers_per_ring was added to specify the number of workers to start.
        Hemanth Yamijala added a comment -

        I just committed this to trunk. Thanks, Vinod!

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12385603/patch_multiple_workers_4.txt
        against trunk revision 675078.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2829/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2829/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2829/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2829/console

        This message is automatically generated.

        Hemanth Yamijala made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hemanth Yamijala made changes -
        Attachment patch_multiple_workers_4.txt [ 12385603 ]
        Hemanth Yamijala added a comment -

        This patch modifies documentation in Vinod's last patch to add that this feature is basically for test/simulation purposes, and the workers need to be configured to use a proportional fraction of the resources on the node.

        Hemanth Yamijala added a comment -

        +1 on the new patch, except for one small addition I would like to make to the documentation added for the configuration variable. Otherwise, it is good.

        Vinod Kumar Vavilapalli made changes -
        Attachment patch_multiple_workers_3.txt [ 12385468 ]
        Vinod Kumar Vavilapalli added a comment -

        Attaching a new patch.

        mapred.system.dir is used only for pulling job files, and TTs obtain it from the JT via RPC (from Hadoop 0.18 on; see HADOOP-3135). Any value in the configuration files isn't used by TTs, so it need not be different for multiple TTs running on the same host. In fact, it doesn't need to be generated on TTs at all. I also verified this with Devraj. We will have to generate a unique value for this too if we wish to support this feature for versions earlier than 0.18.

        Incorporated the remaining three comments.

        Hemanth Yamijala made changes -
        Attachment patch_multiple_workers_2.txt [ 12385335 ]
        Hemanth Yamijala added a comment -

        New patch that fixes the error check in bin/hod and indentation.
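The error check in question guards ringmaster.workers_per_ring; per the review in this thread, the first patch treated any value <= 1 as an error instead of any value < 1. A minimal sketch of the corrected condition follows. The function name validate_workers_per_ring and the error message are hypothetical and are not taken from bin/hod.

```python
def validate_workers_per_ring(value):
    """Return the value as an int, or raise ValueError if it is below 1."""
    workers = int(value)
    # The faulty variant described in the review tested `workers <= 1`,
    # which would also reject the legitimate (and default) value 1.
    if workers < 1:
        raise ValueError("ringmaster.workers_per_ring must be at least 1")
    return workers


if __name__ == "__main__":
    print(validate_workers_per_ring("1"))
    print(validate_workers_per_ring(4))
```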

        Hemanth Yamijala added a comment -

        Looks good. Minor points:

        • In bin/hod, there is a check that tries to make sure ringmaster.workers_per_ring is at least 1. But the check is incorrect: it treats a value <= 1 as an error, rather than a value < 1.
        • There are small formatting errors where workers_per_ring is defined in the bin/hod and bin/ringmaster files.
        • While most of the local file system directories are generated with different names for the different workers, the mapred system directory is the same. We just need to make sure (with someone from the Map/Reduce team) that there are no problems if tasktrackers on the same host use this system directory.
        • logcondense.py needs to be enhanced. It currently assumes that log file names will only have patterns like 0-datanode, 0-tasktracker or 1-tasktracker. With this patch, these can in general be [number]-datanode and [number]-tasktracker.
        • We can enhance the test cases in testRingmasterRPCs.
        • We should enhance the documentation in the config guide.
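The generalized log file name pattern from the logcondense.py point can be sketched with a regular expression. This is only an illustration of the [number]-datanode / [number]-tasktracker naming, assuming the worker index is a decimal prefix; the names WORKER_LOG and parse_worker_log are hypothetical and this is not the code in logcondense.py.

```python
import re

# Matches names that begin with "<number>-datanode" or "<number>-tasktracker".
WORKER_LOG = re.compile(r"^(\d+)-(datanode|tasktracker)\b")


def parse_worker_log(name):
    """Return (worker_index, service) for a worker log name, else None."""
    match = WORKER_LOG.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    for name in ("0-datanode", "12-tasktracker.log", "jobtracker"):
        print(name, parse_worker_log(name))
```

Accepting any run of digits, rather than only 0 or 1, is what lets the pattern cover more than one worker per node.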
        Vinod Kumar Vavilapalli made changes -
        Field Original Value New Value
        Attachment patch_multiple_workers_1.txt [ 12385303 ]
        Vinod Kumar Vavilapalli added a comment -

        Attaching patch for HOD to run multiple workers per hodring. The config parameter ringmaster.workers_per_ring specifies the number of workers (DNs and TTs) that each hodring should run. It defaults to 1.

        Tested the patch on a 4-node cluster with ringmaster.workers_per_ring set to 4 (a total of 2 masters and 8 workers per Hadoop service), and it works fine.
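The worker count in this test can be checked with a little arithmetic. The sketch below assumes that each of the 2 masters occupies a node of its own and that master nodes run no workers; that assumption is not stated in the comment, and the function name workers_per_service is hypothetical.

```python
def workers_per_service(total_nodes, master_nodes, workers_per_ring):
    """Workers started for one Hadoop service (DataNodes or TaskTrackers)."""
    # Assumption: nodes hosting masters do not also host workers.
    worker_nodes = total_nodes - master_nodes
    if worker_nodes < 0:
        raise ValueError("more master nodes than total nodes")
    return worker_nodes * workers_per_ring


if __name__ == "__main__":
    # 4 nodes, 2 of them masters, 4 workers per hodring
    print(workers_per_service(4, 2, 4))  # prints 8
```

Under that assumption, 2 worker nodes with 4 workers each give the 8 workers per service reported above.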

        Hemanth Yamijala created issue -

          People

          • Assignee:
            Vinod Kumar Vavilapalli
            Reporter:
            Hemanth Yamijala