Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Description
When a standalone Submarine TensorFlow job is submitted, the following error occurs:
INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
This error may be related to the Hadoop classpath.
The Hadoop environment variables in launch_container.sh are as follows:
export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
The run-PRIMARY_WORKER.sh script contains:
export HADOOP_YARN_HOME=
export HADOOP_HDFS_HOME=/hadoop-3.1.0
export HADOOP_CONF_DIR=$WORK_DIR
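The two scripts disagree: launch_container.sh points every Hadoop variable at /home/hadoop/yarn-submarine, while run-PRIMARY_WORKER.sh sets HADOOP_HDFS_HOME to /hadoop-3.1.0 and leaves HADOOP_YARN_HOME empty. A minimal diagnostic sketch follows, assuming (not confirmed in this issue) that the NoClassDefFoundError comes from the JVM that libhdfs embeds not finding the Hadoop jars on CLASSPATH. The function names `check_hadoop_env` and `export_hadoop_classpath` are hypothetical; the only real command used is `hadoop classpath --glob`, which prints the Hadoop classpath with wildcard entries expanded.

```shell
#!/bin/sh
# Hypothetical helpers; assumption: libhdfs needs the Hadoop jars on CLASSPATH.

# Print each Hadoop variable and warn about empty ones.
# Returns 1 if any variable is empty, 0 otherwise.
check_hadoop_env() {
  chk_status=0
  for chk_var in HADOOP_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME HADOOP_YARN_HOME HADOOP_CONF_DIR; do
    # POSIX-compatible indirect lookup of the variable named by $chk_var.
    eval "chk_val=\${$chk_var}"
    if [ -z "$chk_val" ]; then
      echo "WARN: $chk_var is empty"
      chk_status=1
    else
      echo "$chk_var=$chk_val"
    fi
  done
  return "$chk_status"
}

# Build and export CLASSPATH from the hadoop launcher under HADOOP_HDFS_HOME.
# --glob expands wildcard entries, since libhdfs does not expand them itself.
export_hadoop_classpath() {
  hadoop_bin="${HADOOP_HDFS_HOME}/bin/hadoop"
  if [ ! -x "$hadoop_bin" ]; then
    echo "ERROR: $hadoop_bin is missing or not executable" >&2
    return 1
  fi
  CLASSPATH=$("$hadoop_bin" classpath --glob) || return 1
  export CLASSPATH
}
```

If sourced in run-PRIMARY_WORKER.sh before the training command starts, `check_hadoop_env` would flag the empty HADOOP_YARN_HOME, and `export_hadoop_classpath` would fail fast if /hadoop-3.1.0 does not actually contain a Hadoop installation inside the container.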
Attachments
Issue Links
relates to
SUBMARINE-3 [Submarine] Initial implementation: Training job submission and job history retrieval (Resolved)