Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Duplicate
Description
yarn jar $HADOOP_BASE_DIR/home/share/hadoop/yarn/hadoop-yarn-submarine-3.2.0-SNAPSHOT.jar job run \
  -verbose \
  -wait_job_finish \
  -keep_staging_dir \
  --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-oracle \
  --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.2.0-SNAPSHOT \
  --name tf-job-001 \
  --docker_image tangzhankun/tensorflow \
  --input_path hdfs://default/user/yarn/cifar-10-data \
  --worker_resources memory=4G,vcores=2 \
  --worker_launch_cmd "cd /cifar10_estimator && python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --num-gpus=0 --train-steps=5"
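The command above should work, but in my testing the job failed because an invalid path was passed to "--job-dir". As the worker command logged below shows, %checkpoint_path% was expanded to the relative path "submarine/jobs/tf-job-001/staging/checkpoint_path". It should be a fully qualified URI starting with "hdfs://".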
2018-09-19 23:19:34,729 INFO yarnservice.YarnServiceJobSubmitter: Worker command =[cd /cifar10_estimator && python cifar10_main.py --data-dir=hdfs://default/user/yarn/cifar-10-data --job-dir=submarine/jobs/tf-job-001/staging/checkpoint_path --num-gpus=0 --train-steps=2]
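To illustrate the expected behaviour, here is a minimal shell sketch that turns a relative path like the one in the log into a fully qualified URI. It is not Submarine's actual fix. The function name qualify_path is hypothetical, and the default filesystem hdfs://default and home directory /user/yarn are assumptions taken from the --input_path above (HDFS resolves relative paths against /user/<user> by default).

```shell
#!/usr/bin/env bash
# Hypothetical helper: qualify_path DEFAULT_FS HOME_DIR PATH
# Prints PATH as a fully qualified URI on DEFAULT_FS (assumed setup, for illustration only).
qualify_path() {
  local default_fs="$1" home_dir="$2" path="$3"
  case "$path" in
    *://*)
      # Already has a scheme (e.g. hdfs://...): leave it unchanged.
      printf '%s\n' "$path" ;;
    /*)
      # Absolute path: prefix it with the default filesystem URI.
      printf '%s%s\n' "${default_fs%/}" "$path" ;;
    *)
      # Relative path: resolve it against the user's home directory first.
      printf '%s%s/%s\n' "${default_fs%/}" "${home_dir%/}" "$path" ;;
  esac
}
```

For example, `qualify_path hdfs://default /user/yarn submarine/jobs/tf-job-001/staging/checkpoint_path` prints `hdfs://default/user/yarn/submarine/jobs/tf-job-001/staging/checkpoint_path`. Passing a value like that to "--job-dir" gives the training script an explicit hdfs:// location rather than a bare relative path.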