SPARK-9019: spark-submit fails on yarn with kerberos enabled


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: Spark Submit
    • Environment:

      Hadoop 2.6 with YARN and kerberos enabled

      Description

      It is not possible to submit jobs with spark-submit to YARN on a kerberized cluster.

      Command line:
      /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py

      Fails with:
      15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
      15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
      15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
      15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
      15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
      15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
      15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
      15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
      15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
      15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
      15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
      15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
      15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
      15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
      15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
      java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
      at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
      at org.apache.hadoop.ipc.Client.call(Client.java:1472)
      at org.apache.hadoop.ipc.Client.call(Client.java:1399)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
      at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
      at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
      at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
      at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
      at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:73)
      at scala.Option.foreach(Option.scala:236)
      at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.getDriverLogUrls(YarnClusterSchedulerBackend.scala:73)
      at org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:1993)
      at org.apache.spark.SparkContext.<init>(SparkContext.scala:544)
      at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
      at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
      at py4j.Gateway.invoke(Gateway.java:214)
      at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
      at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
      at py4j.GatewayConnection.run(GatewayConnection.java:207)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.net.ConnectException: Connection refused
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
      at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
      at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
      at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
      at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
      at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
      at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
      at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
      at org.apache.hadoop.ipc.Client.call(Client.java:1438)
      ... 30 more

      The same error occurs when --principal and --keytab are omitted.
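      For reference, the submission being attempted, with an explicit Kerberos ticket check beforehand, can be sketched as follows. `kinit` and `klist` are the standard MIT Kerberos client tools; verifying the ticket cache is a common first diagnostic when the client logs "Client cannot authenticate via:[TOKEN, KERBEROS]". The paths and principal are taken from the command line above; this is a diagnostic sketch against the reporter's cluster, not a confirmed fix.

```shell
# Acquire a ticket-granting ticket from the job keytab
# (principal and keytab names taken from the report).
kinit -kt sparkjob.keytab sparkjob

# Confirm the ticket cache now holds a valid TGT.
klist

# Re-run the submission from the report.
/usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit \
  --principal sparkjob \
  --keytab sparkjob.keytab \
  --num-executors 3 \
  --executor-cores 5 \
  --executor-memory 5G \
  --master yarn-cluster \
  /tmp/get_peers.py
```

      Note that the stack trace also shows a plain `Connection refused` to the ResourceManager at lxhnl013.ad.ing.net:8032 followed by a failover to rm2, so the failure may not be purely an authentication problem.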

        Attachments

        1. debug-log-spark-1.5-fail
          37 kB
          Bolke de Bruin
        2. spark-submit-log-1.5.0-fail
          7 kB
          Bolke de Bruin


              People

              • Assignee: Unassigned
              • Reporter: Bolke de Bruin (bolke)
              • Votes: 0
              • Watchers: 5
