Spark / SPARK-20266

ExecutorBackend blocked at "UserGroupInformation.doAs"


Details

    • Type: Question
    • Status: Closed
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 1.6.2
    • Fix Version/s: None
    • Component/s: Project Infra
    • Labels: None
      Hi friends, I recently set up a Spark cluster, and today I want to run a WordCount program on it.
      My environment: JDK 1.8.0_121 + Scala 2.10.4 + Hadoop 2.6.5 + Spark 1.6.2
      Cluster: master + slave01 + slave02
      Client: client

      Demo code as below:
        import org.apache.spark.{SparkConf, SparkContext}

        def main(args: Array[String]): Unit = {

          val inputPath = "hdfs://master/970655147/input/01WordCount/"

          // 1. local mode
          // val conf = new SparkConf().setMaster("local").setAppName("WordCount")
          // 2. standalone mode
          val conf = new SparkConf().setMaster("spark://master:7077").setAppName("WordCount")
            .set("spark.executor.memory", "64M")
            .set("spark.executor.cores", "1")

          val sc = new SparkContext(conf)
          val line = sc.textFile(inputPath)

          // note: in cluster mode this prints on the executors' stdout, not the driver's
          line.foreach(println)

          sc.stop()
        }
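      (Aside: the 64M requested above is exactly the -Xms64M -Xmx64M that shows up in the executor launch command later in this report, which leaves almost no headroom for Spark's own overhead. A purely illustrative variant with a larger heap; the 512m figure is an assumption of mine, not something from the actual run:)

        import org.apache.spark.SparkConf

        // Illustrative only: same wiring, larger executor heap.
        // "512m" is an assumed value, not taken from the original report.
        val conf = new SparkConf().setMaster("spark://master:7077").setAppName("WordCount")
          .set("spark.executor.memory", "512m")
          .set("spark.executor.cores", "1")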
        
      1. First, I ran it in local mode, and the program ran normally.
      2. Second, I ran it on the cluster (deployed from IDEA), but it failed. The startup seems to block: according to the logs, the Master allocates executors for the app and the executors start, but they appear stuck at "UserGroupInformation.doAs(UserGroupInformation.java:1643); SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)", so the executors never register with the driver, and the driver gets no resources.
      3. Next I tried spark-submit [spark-submit --master spark://master:7077 --class com.hx.test.Test01WordCount HelloSpark.jar] on the client, and on slave02, but the result was the same (see the registration check sketched below).
      Please give me some advice, thanks.
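      For reference, one way to confirm whether any executor ever registers is to poll the standalone Master's JSON endpoint (in Spark 1.x the Master web UI serves its state at /json on the UI port, 8080 by default; worth verifying on your build):

        // Sketch: dump the Master's state and inspect worker/executor usage.
        // The URL, port, and field names are assumptions based on standalone defaults.
        import scala.io.Source

        object MasterStateCheck {
          def main(args: Array[String]): Unit = {
            val state = Source.fromURL("http://master:8080/json").mkString
            println(state) // look for the cores/memory "used" figures and per-worker executors
          }
        }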

      Some log info follows:
      --------------------------------------------------------------------------------------------
      1. ExecutorBackend's bootstrap cmd
      root@slave02:~# jps
      7984 CoarseGrainedExecutorBackend
      6468 NodeManager
      8037 Jps
      955 Worker
      7981 CoarseGrainedExecutorBackend
      7982 CoarseGrainedExecutorBackend
      6366 DataNode
      7983 CoarseGrainedExecutorBackend
      root@slave02:~# ps -ef | grep 7983
      root 7983 955 14 06:21 ? 00:00:03 /usr/local/ProgramFiles/jdk1.8.0_121/bin/java -cp /usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/conf/:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/ProgramFiles/hadoop-2.6.5/etc/hadoop/ -Xms64M -Xmx64M -Dspark.driver.port=37230 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.0.191:37230 --executor-id 1 --hostname 192.168.0.182 --cores 1 --app-id app-20170408062155-0015 --worker-url spark://Worker@192.168.0.182:46466
      root 8050 4249 4 06:22 pts/1 00:00:00 grep --color=auto 7983
      root@slave02:~#
      --------------------------------------------------------------------------------------------
      2. executor's error log
      root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stderr
      17/04/08 06:22:20 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
      17/04/08 06:22:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      17/04/08 06:22:28 INFO spark.SecurityManager: Changing view acls to: root
      17/04/08 06:22:28 INFO spark.SecurityManager: Changing modify acls to: root
      17/04/08 06:22:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
      17/04/08 06:23:06 INFO spark.SecurityManager: Changing view acls to: root
      17/04/08 06:23:06 INFO spark.SecurityManager: Changing modify acls to: root
      17/04/08 06:23:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
      17/04/08 06:23:24 INFO slf4j.Slf4jLogger: Slf4jLogger started
      17/04/08 06:23:29 INFO Remoting: Starting remoting
      Exception in thread "main" 17/04/08 06:23:46 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
      17/04/08 06:23:47 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
      java.lang.reflect.UndeclaredThrowableException
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
      at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
      at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151)
      at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253)
      at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
      Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
      at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
      at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
      at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
      at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
      at scala.concurrent.Await$.result(package.scala:107)
      at akka.remote.Remoting.start(Remoting.scala:179)
      at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
      at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:620)
      at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:617)
      at akka.actor.ActorSystemImpl._start(ActorSystem.scala:617)
      at akka.actor.ActorSystemImpl.start(ActorSystem.scala:634)
      at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
      at akka.actor.ActorSystem$.apply(ActorSystem.scala:119)
      at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
      at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
      at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52)
      at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2024)
      at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
      at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2015)
      at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55)
      at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266)
      at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217)
      at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186)
      at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
      at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
      ... 4 more
      root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stdout
      root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6#
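      For readers tracing the stack above: the proximate failure is an Await.result on Akka remoting startup that never completes within its 10-second window. A minimal sketch of just that shape (plain Scala, not Spark code):

        // Reproduces only the *shape* of the failure above: awaiting a future
        // that never completes, with a 10 s timeout, throws
        // java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
        import scala.concurrent.{Await, Promise}
        import scala.concurrent.duration._

        object TimeoutShape {
          def main(args: Array[String]): Unit = {
            val neverCompletes = Promise[Unit]().future // stand-in for the remoting bind result
            Await.result(neverCompletes, 10000.milliseconds)
          }
        }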
      --------------------------------------------------------------------------------------------
      3. driver's log
      Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
      17/04/08 21:21:44 INFO SparkContext: Running Spark version 1.6.2
      17/04/08 21:21:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      17/04/08 21:21:45 INFO SecurityManager: Changing view acls to: root
      17/04/08 21:21:45 INFO SecurityManager: Changing modify acls to: root
      17/04/08 21:21:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
      17/04/08 21:21:46 INFO Utils: Successfully started service 'sparkDriver' on port 37230.
      17/04/08 21:21:47 INFO Slf4jLogger: Slf4jLogger started
      17/04/08 21:21:47 INFO Remoting: Starting remoting
      17/04/08 21:21:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.191:43974]
      17/04/08 21:21:48 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 43974.
      17/04/08 21:21:48 INFO SparkEnv: Registering MapOutputTracker
      17/04/08 21:21:48 INFO SparkEnv: Registering BlockManagerMaster
      17/04/08 21:21:48 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ef79b656-b7f4-4cb3-be3e-0f8bb61baa9d
      17/04/08 21:21:48 INFO MemoryStore: MemoryStore started with capacity 431.3 MB
      17/04/08 21:21:48 INFO SparkEnv: Registering OutputCommitCoordinator
      17/04/08 21:21:54 INFO Utils: Successfully started service 'SparkUI' on port 4040.
      17/04/08 21:21:54 INFO SparkUI: Started SparkUI at http://192.168.0.191:4040
      17/04/08 21:21:54 INFO AppClient$ClientEndpoint: Connecting to master spark://master:7077...
      17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20170408062155-0015
      17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/0 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores
      17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/0 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM
      17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/1 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores
      17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/1 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM
      17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/2 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores
      17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/2 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM
      17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/3 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores
      17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/3 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM
      17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/4 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores
      17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/4 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM
      17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/5 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores
      17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/5 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM
      17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/6 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores
      17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/6 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM
      17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/7 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores
      17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/7 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM
      17/04/08 21:21:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42255.
      17/04/08 21:21:55 INFO NettyBlockTransferService: Server created on 42255
      17/04/08 21:21:56 INFO BlockManagerMaster: Trying to register BlockManager
      17/04/08 21:21:57 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.191:42255 with 431.3 MB RAM, BlockManagerId(driver, 192.168.0.191, 42255)
      17/04/08 21:21:57 INFO BlockManagerMaster: Registered BlockManager
      17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/0 is now RUNNING
      17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now RUNNING
      17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/2 is now RUNNING
      17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/3 is now RUNNING
      17/04/08 21:22:00 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/4 is now RUNNING
      17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/5 is now RUNNING
      17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/6 is now RUNNING
      17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/7 is now RUNNING
      17/04/08 21:22:03 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
      17/04/08 21:22:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.7 KB, free 107.7 KB)
      17/04/08 21:22:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 9.8 KB, free 117.5 KB)
      17/04/08 21:22:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.191:42255 (size: 9.8 KB, free: 431.2 MB)
      17/04/08 21:22:06 INFO SparkContext: Created broadcast 0 from textFile at Test01WordCount.scala:30
      17/04/08 21:22:21 INFO FileInputFormat: Total input paths to process : 1
      17/04/08 21:22:21 INFO SparkContext: Starting job: foreach at Test01WordCount.scala:33
      17/04/08 21:22:21 INFO DAGScheduler: Got job 0 (foreach at Test01WordCount.scala:33) with 2 output partitions
      17/04/08 21:22:21 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at Test01WordCount.scala:33)
      17/04/08 21:22:21 INFO DAGScheduler: Parents of final stage: List()
      17/04/08 21:22:21 INFO DAGScheduler: Missing parents: List()
      17/04/08 21:22:21 INFO DAGScheduler: Submitting ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30), which has no missing parents
      17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.0 KB, free 120.5 KB)
      17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1842.0 B, free 122.3 KB)
      17/04/08 21:22:21 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.0.191:42255 (size: 1842.0 B, free: 431.2 MB)
      17/04/08 21:22:21 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
      17/04/08 21:22:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30)
      17/04/08 21:22:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
      17/04/08 21:22:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:23:04 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:23:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:23:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:23:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:23:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:24:02 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now EXITED (Command exited with code 1)
      17/04/08 21:24:02 INFO SparkDeploySchedulerBackend: Executor app-20170408062155-0015/1 removed: Command exited with code 1
      17/04/08 21:24:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:24:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:24:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:24:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:25:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:25:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:25:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:25:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
      17/04/08 21:26:02 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Command exited with code 1)] in 1 attempts
      org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
      at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
      at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
      at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
      at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
      at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
      at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
      at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
      at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:370)
      at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.executorRemoved(SparkDeploySchedulerBackend.scala:144)
      at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:184)
      at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
      at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
      at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
      at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
      at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
      at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
      at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
      at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
      at scala.concurrent.Await$.result(package.scala:107)
      at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
      ... 12 more
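      Incidentally, spark.rpc.askTimeout, the key named in the exception above, is an ordinary Spark configuration entry, so widening it is a one-liner on the driver; note that this would only delay, not fix, whatever keeps the executors from registering (the 300s below is an arbitrary value for illustration):

        import org.apache.spark.SparkConf

        // spark.rpc.askTimeout is the key cited in the RpcTimeoutException above.
        // "300s" is an assumed value for illustration, not a recommendation.
        val conf = new SparkConf().setMaster("spark://master:7077").setAppName("WordCount")
          .set("spark.rpc.askTimeout", "300s")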
      --------------------------------------------------------------------------------------------

    Attachments

      1. logsSubmitBySparkSubmitAtSlave02.zip
        6 kB
        Jerry.X.He
      2. logsSubmitByIdeaAtClient.zip
        5 kB
        Jerry.X.He


        People

          Assignee: Unassigned
          Reporter: Jerry.X.He
          Votes: 0
          Watchers: 1

          Dates

            Created:
            Updated:
            Resolved: