  1. Singa
  2. SINGA-36

Refactor job configuration, driver program and scripts



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed


      Currently, we use Google Protocol Buffers to generate the ClusterProto and ModelProto classes for cluster configuration (i.e., worker, server, group, etc.) and model configuration (i.e., neuralnet, updater, etc.) respectively. These classes provide functions to load and parse plain-text configuration files.

      To make the naming more representative and simplify the configuration process, this ticket will:

      • merge cluster configuration and model configuration into a single job configuration (JobProto).
      • move the ZooKeeper configuration into conf/singa.conf, which is global to all jobs. conf/hostfile stores all available nodes.
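The split described above, one per-job configuration plus settings shared by all jobs, can be pictured with plain C++ structs. This is only a sketch: the real classes are generated by Protocol Buffers, and every struct, field name, and the one-host-per-line hostfile format used here are assumptions made for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two sections that used to be separate files.
struct ClusterConf { int nworkers = 1; int nservers = 1; };
struct ModelConf { std::string neuralnet; std::string updater; };

// A single per-job configuration holding both sections (assumed layout).
struct JobConf {
  std::string name;
  ClusterConf cluster;
  ModelConf model;
};

// Settings shared by all jobs, as conf/singa.conf is in the ticket.
// The field name is an assumption.
struct GlobalConf { std::string zookeeper_host; };

// Parses hostfile text, assuming one host name per line.
// Blank lines are skipped and surrounding whitespace is trimmed.
std::vector<std::string> ParseHostfile(const std::string& text) {
  std::vector<std::string> hosts;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const auto begin = line.find_first_not_of(" \t\r");
    if (begin == std::string::npos) continue;  // blank line
    const auto end = line.find_last_not_of(" \t\r");
    hosts.push_back(line.substr(begin, end - begin + 1));
  }
  return hosts;
}
```

With this layout a job only carries what is specific to it, while the ZooKeeper address and the node list are read once per installation.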

      The driver program is updated so that users can register customized Layer implementations in it.
      Once the job configuration is ready, the user submits the job via the singa::SubmitJob() function.
      Header files are merged into singa.h to simplify the driver program.
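Registering customized layers in a driver is commonly done with a factory registry that maps a type name to a creator function. The sketch below illustrates that general pattern only. It does not use singa.h, and Layer, MyLayer, LayerRegistry, Register, Create, and the key "kMyLayer" are hypothetical names, not the API introduced by this ticket.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical base class standing in for a framework's Layer.
class Layer {
 public:
  virtual ~Layer() = default;
  virtual std::string Name() const = 0;
};

// Hypothetical user-defined layer.
class MyLayer : public Layer {
 public:
  std::string Name() const override { return "MyLayer"; }
};

// Maps a layer type name to a function that creates an instance of it.
class LayerRegistry {
 public:
  using Creator = std::function<std::unique_ptr<Layer>()>;

  // Registers layer class T under the given type name.
  template <typename T>
  void Register(const std::string& type) {
    creators_[type] = [] { return std::unique_ptr<Layer>(new T()); };
  }

  // Creates a layer of a registered type; throws if the type is unknown.
  std::unique_ptr<Layer> Create(const std::string& type) const {
    auto it = creators_.find(type);
    if (it == creators_.end())
      throw std::out_of_range("unregistered layer type: " + type);
    return it->second();
  }

 private:
  std::map<std::string, Creator> creators_;
};
```

In a driver written this way, registration would happen before the job is submitted, so that the type names appearing in the job configuration can be resolved to user classes when the network is built.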

      The arguments of singa-run.sh are updated so that users pass the workspace (and the resume option) to it. singa-run.sh uses the default driver executable (i.e., SINGA_ROOT/singa).
      TODO: enable users to pass their own driver executable to the script.

      Some layers are put into optional_layer.h (.cc) because they depend on external libraries (e.g., LMDB and OpenCV). TODO: update the GNU make files, e.g., use with-feature=huge for a full compilation that checks all dependencies, and otherwise check only the mandatory libraries.
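One common way to keep layers with external dependencies out of a minimal build is to guard them with preprocessor macros that the build system defines only when the library is found. The sketch below assumes hypothetical macros USE_LMDB and USE_OPENCV and hypothetical layer names; the ticket does not say how optional_layer.h is actually guarded.

```cpp
#include <cassert>
#include <string>
#include <vector>

// USE_LMDB and USE_OPENCV are hypothetical macros that a build script could
// define (e.g. with -DUSE_LMDB) after it has located the library.
// The layer names below are placeholders as well.
std::vector<std::string> AvailableOptionalLayers() {
  std::vector<std::string> names;
#ifdef USE_LMDB
  names.push_back("LMDBDataLayer");  // compiled in only when LMDB is present
#endif
#ifdef USE_OPENCV
  names.push_back("ImageLayer");  // compiled in only when OpenCV is present
#endif
  return names;
}

// Returns true if the named optional layer was compiled into this build.
bool HasOptionalLayer(const std::string& name) {
  for (const std::string& n : AvailableOptionalLayers()) {
    if (n == name) return true;
  }
  return false;
}
```

A build that defines neither macro still compiles and simply reports no optional layers, which matches the idea of checking only the mandatory libraries by default.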

      Scripts for job management have minor changes, such as cleaning up the log output.




            zhongle Xie Zhongle
            wangwei.cs wangwei