HADOOP-5441 (Hadoop Common)

HOD refactoring to ease integration with schedulers/resource managers other than Torque


Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Won't Fix
    • 0.19.1
    • None
    • contrib/hod
    • None
    • All

    • Allows dynamic loading of nodePool objects and moves remote-start (pbsdsh) functionality out of Scheduler objects

    Description

      Situation: HOD currently starts processes on all nodes in a job with pbsdsh, a distributed shell that launches remote processes through Torque's TM interface. This call is provided by a torqueInterface class that is meant to abstract interactions with the Torque resource manager (RM). However, other RMs do not typically provide this functionality; remote starts are usually performed instead by a distributed launcher available on the HPC system, such as mpiexec, ssh, or site-specific scripts. Because pbsdsh is specific to Torque, writing HOD interfaces to other RMs is harder than it needs to be: the implementer is forced to choose the remote start method on a per-RM basis, which is a somewhat flawed basis for that choice.
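      The launchers named above are interchangeable ways of performing the same step. As a minimal sketch (all class names and the site-script path are hypothetical, not HOD's actual code), each one only has to turn "run this command on these nodes" into one or more command lines, and only the pbsdsh variant depends on running inside a Torque job.

```python
import shlex
from typing import List

# Hypothetical launcher classes (not part of HOD): each turns one command into
# the argv lists needed to run it on every node of an allocation.


class RemoteStart:
    def commands(self, nodes: List[str], cmd: List[str]) -> List[List[str]]:
        raise NotImplementedError


class PbsdshStart(RemoteStart):
    # pbsdsh only works inside a Torque job (it relies on the TM interface);
    # -u runs one copy per unique node, so one invocation covers all nodes.
    def commands(self, nodes, cmd):
        return [["pbsdsh", "-u"] + cmd]


class MpiexecStart(RemoteStart):
    # Uses the MPI-standard "-n <procs>" option; whether this places exactly
    # one process per node depends on the MPI implementation and site setup.
    def commands(self, nodes, cmd):
        return [["mpiexec", "-n", str(len(nodes))] + cmd]


class SshStart(RemoteStart):
    # One ssh invocation per node; ssh hands the remote side a single shell
    # string, so each argument is quoted.
    def commands(self, nodes, cmd):
        remote = " ".join(shlex.quote(arg) for arg in cmd)
        return [["ssh", node, remote] for node in nodes]


class ScriptStart(RemoteStart):
    # Site-specific wrapper; the calling convention (comma-separated node
    # list, then the command) is an assumption for illustration.
    def __init__(self, script: str):
        self.script = script

    def commands(self, nodes, cmd):
        return [[self.script, ",".join(nodes)] + cmd]


if __name__ == "__main__":
    nodes = ["node01", "node02"]
    for launcher in (PbsdshStart(), MpiexecStart(), SshStart(),
                     ScriptStart("/opt/site/bin/remote-start")):
        print(type(launcher).__name__, launcher.commands(nodes, ["hostname"]))
```

      Note that pbsdsh and mpiexec fan out to all nodes in a single call, while ssh needs one call per node; a common interface hides that difference from the code that manages the node pool.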

      Proposal: Refactor the torqueInterface and nodePool classes so that the remote start method can be chosen through a configuration option in hodrc. This requires fairly simple changes: removing the pbsdsh command from the Scheduler class and adding a configuration step that starts the appropriate remote start wrapper. Selection of the nodePool class will be changed to load classes dynamically, so that new interfaces can be added without altering HOD code. Provide remote start classes for pbsdsh, mpiexec, and ssh, as well as for custom scripts (sites often provide mpiexec wrappers that ensure proper selection of network interfaces, etc.). Provide interface classes for SGE and Moab, as well as an updated Torque class.
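      Dynamic loading of the nodePool class could work roughly as follows. This is a minimal sketch, assuming hypothetical hodrc options `nodepool_class` (a dotted `module.ClassName` path) and `remote_start` (a launcher name); neither option exists in HOD, and the `site_plugins.SgePool` class is a stand-in registered in memory so the example runs on its own.

```python
import configparser
import importlib
import sys
import types

# Hypothetical hodrc fragment: neither option name exists in HOD's hodrc.
HODRC = """
[resource_manager]
nodepool_class = site_plugins.SgePool
remote_start = ssh
"""


def load_class(dotted_path: str) -> type:
    """Import "package.module.ClassName" and return the named class."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected module.ClassName, got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for a site-provided plugin. In practice this would be a module on
# PYTHONPATH; registering it in sys.modules lets the sketch run on its own.
class SgePool:
    def __init__(self, remote_start: str):
        self.remote_start = remote_start


plugin = types.ModuleType("site_plugins")
plugin.SgePool = SgePool
sys.modules["site_plugins"] = plugin

config = configparser.ConfigParser()
config.read_string(HODRC)
section = config["resource_manager"]

# The class is resolved from configuration, so HOD code never names SgePool.
pool_cls = load_class(section["nodepool_class"])
pool = pool_cls(remote_start=section["remote_start"])
print(type(pool).__name__, pool.remote_start)  # SgePool ssh
```

      Using a dotted path rather than a fixed table of RM names is what would let SGE, Moab, or site-specific pools be added without modifying HOD; the trade-off is that a misspelled path is only detected when the class is loaded.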


          People

            Assignee: Unassigned
            Reporter: Nate Woody (fischer)
            Votes: 0
            Watchers: 2
