Details

    Description

      1. PyTorch jobs should be supported in both stand-alone and distributed modes.

      2. A PyTorch example should be made available in the Submarine GitHub repository.

      3. Following the PyTorch case in the documentation, I ran the script below, but it did not run successfully.

      CLASSPATH=`hadoop classpath --glob`:/home/bin/hadoop/share/hadoop/yarn/submarine-all-0.3.0-SNAPSHOT-hadoop-3.2.jar \
      java org.apache.submarine.client.cli.Cli \
      job run --name ${APP_NAME} \
      --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/ \
      --env DOCKER_HADOOP_HDFS_HOME=/app/hadoop-3.2.1 \
      --env HADOOP_HOME=/hadoop-3.2.1 \
      --env HADOOP_YARN_HOME=/hadoop-3.2.1 \
      --env HADOOP_COMMON_HOME=/hadoop-3.2.1 \
      --env HADOOP_HDFS_HOME=/hadoop-3.2.1 \
      --env HADOOP_CONF_DIR=/hadoop-3.2.1/etc/hadoop \
      --env PYTHONUNBUFFERED="0" \
      --env TZ="Asia/Shanghai" \
      --env YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=${LOCAL_PATH}/:/home/test \
      --queue dev \
      --input_path hdfs://cluster/user/work/tensorflow/data/ \
      --docker_image jx-bd-hadoop13.zeus.lianjia.com:801/runonce/tf-1.13.1-pytorch-0.4-gpu-base:0.0.1 \
      --num_workers 1 \
      --worker_resources memory=16G,vcores=2,gpu=1 \
      --worker_launch_cmd "export CLASSPATH=\$(/app/hadoop-3.2.1/bin/hadoop classpath --glob) && cd /home/test/pth && python ../pth/train_pth.py" \
      --localization /home/local/test/cifar10_estimator:./submarine_algorithm \
      --verbose \
      --conf tony.containers.resources=/home/bin/hadoop/share/hadoop/yarn/submarine-all-0.3.0-SNAPSHOT-hadoop-3.2.jar \
      --conf tony.application.framework=pytorch
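
      The script relies on two shell variables, APP_NAME and LOCAL_PATH, that are not defined in the snippet. A minimal sketch of setting them before the command, assuming placeholder values (the job name and host path below are hypothetical, not the ones actually used):

```shell
# Hypothetical values: neither variable is defined in the script above.
APP_NAME="pytorch-job-001"       # assumed job name
LOCAL_PATH="/home/local/test"    # assumed host directory to mount into the container

# Abort early with a clear message if either variable is empty or unset.
: "${APP_NAME:?APP_NAME must be set}"
: "${LOCAL_PATH:?LOCAL_PATH must be set}"

# The value passed via YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS is built like this:
MOUNTS="${LOCAL_PATH}/:/home/test"
echo "${MOUNTS}"
```

      If either variable were empty, the job name or the Docker mount source would expand to an empty string, so checking them up front rules out one possible cause of the failure.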

      Attachments

        1. localization-error.jpg
          172 kB
          huiyangjian
        2. no-pytorch.jpg
          65 kB
          huiyangjian
        3. yarn-mount-error.jpg
          80 kB
          huiyangjian

            People

              pingsutw Kevin Su
              jason_jane huiyangjian
              Votes: 0
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m