  1. Hadoop YARN
  2. YARN-8879

Kerberos principal is needed when submitting a submarine job

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0, 3.3.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When I submitted a Submarine job like this:

       ./yarn jar /home/hadoop/hadoop-current/share/hadoop/yarn/hadoop-yarn-submarine-3.2.0-SNAPSHOT.jar job run \
       --env DOCKER_JAVA_HOME=/opt/java \
       --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 --name distributed-tf-gpu \
       --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
       --worker_docker_image 10.120.196.232:5000/gpu-cuda9.0-tf1.8.0-with-models-7 \
       --input_path hdfs://mldev/tmp/cifar-10-data \
       --checkpoint_path hdfs://mldev/user/hadoop/tf-distributed-checkpoint \
       --num_ps 1 \
       --ps_resources memory=4G,vcores=2,gpu=0 \
       --ps_launch_cmd "python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://mldev/tmp/cifar-10-data --job-dir=hdfs://mldev/tmp/cifar-10-jobdir --num-gpus=0" \
       --ps_docker_image 10.120.196.232:5000/dockerfile-cpu-tf1.8.0-with-models \
       --worker_resources memory=4G,vcores=2,gpu=1 --verbose \
       --num_workers 2 \
       --worker_launch_cmd "python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://mldev/tmp/cifar-10-data --job-dir=hdfs://mldev/tmp/cifar-10-jobdir --train-steps=500 --eval-batch-size=16 --train-batch-size=16 --sync --num-gpus=1"  

       

      I got the following error:

      Exception in thread "main" java.lang.IllegalArgumentException: Kerberos principal or keytab is missing.
      at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.validateKerberosPrincipal(ServiceApiUtil.java:255)
      at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.validateAndResolveService(ServiceApiUtil.java:134)
      at org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:467)
      at org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceJobSubmitter.submitJob(YarnServiceJobSubmitter.java:542)
      at org.apache.hadoop.yarn.submarine.client.cli.RunJobCli.run(RunJobCli.java:231)
      at org.apache.hadoop.yarn.submarine.client.cli.Cli.main(Cli.java:94)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
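
The top frames of the trace suggest that the job is rejected during client-side validation of the service spec, before anything is submitted to the cluster. As a rough illustration only, a check of that kind might look like the sketch below. This is not the actual `ServiceApiUtil` code, which this report does not show: the class name, the `Credentials` holder, the `securityEnabled` flag, and the principal and keytab values are all hypothetical, and the assumption that the check only applies on a secured cluster is mine.

```java
// Hypothetical sketch of a principal/keytab presence check.
// NOT the real org.apache.hadoop.yarn.service.utils.ServiceApiUtil logic.
public class KerberosCheckSketch {

    /** Hypothetical stand-in for the principal/keytab pair carried by a service spec. */
    static final class Credentials {
        final String principal;
        final String keytab;

        Credentials(String principal, String keytab) {
            this.principal = principal;
            this.keytab = keytab;
        }
    }

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    /**
     * Rejects the spec when security is on and either value is absent.
     * Assumption: the check is skipped entirely on an unsecured cluster.
     */
    static void validate(boolean securityEnabled, Credentials c) {
        if (!securityEnabled) {
            return;
        }
        if (c == null || isBlank(c.principal) || isBlank(c.keytab)) {
            throw new IllegalArgumentException(
                "Kerberos principal or keytab is missing.");
        }
    }

    public static void main(String[] args) {
        // Unsecured cluster: nothing to validate.
        validate(false, null);

        // Secured cluster with both values set (placeholder values): accepted.
        validate(true, new Credentials(
            "hadoop/host.example.com@EXAMPLE.COM",
            "file:///etc/security/keytabs/hadoop.keytab"));

        // Secured cluster, nothing supplied: same message as in the trace above.
        try {
            validate(true, new Credentials(null, null));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions, a submitter that has no way to pass a principal and keytab into the service spec would always hit the last case on a secured cluster, which is consistent with the failure reported here.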

      Attachments

        1. YARN-8879.001.patch
          3 kB
          Zac Zhou
        2. YARN-8879.002.patch
          3 kB
          Zac Zhou

          People

            Assignee: Zac Zhou (yuan_zac)
            Reporter: Zac Zhou (yuan_zac)
            Votes: 0
            Watchers: 4