Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-13711

Apache Spark driver stopping JVM when master not available

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.1, 1.6.0
    • Fix Version/s: 1.6.2, 2.0.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      In my application Java spark context is created with an unavailable master URL (you may assume master is down for a maintenance). When creating Java spark context it leads to stopping JVM that runs spark driver with JVM exit code 50.

      When I checked the logs I found SparkUncaughtExceptionHandler calling the System.exit. My program should run forever.

      package test.mains;

      import org.apache.spark.SparkConf;
      import org.apache.spark.api.java.JavaSparkContext;

      public class CheckJavaSparkContext {

      public static void main(String[] args) {

      SparkConf conf = new SparkConf();
      conf.setAppName("test");
      conf.setMaster("spark://sunshinee:7077");

      try

      { new JavaSparkContext(conf); }

      catch (Throwable e)

      { System.out.println("Caught an exception : " + e.getMessage()); }

      System.out.println("Waiting to complete...");
      while (true) {
      }
      }

      }

      Output log

      Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
      16/03/04 18:01:15 INFO SparkContext: Running Spark version 1.6.0
      16/03/04 18:01:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      16/03/04 18:01:17 WARN Utils: Your hostname, pesamara-mobl-vm1 resolves to a loopback address: 127.0.0.1; using 10.30.9.107 instead (on interface eth0)
      16/03/04 18:01:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
      16/03/04 18:01:18 INFO SecurityManager: Changing view acls to: ps40233
      16/03/04 18:01:18 INFO SecurityManager: Changing modify acls to: ps40233
      16/03/04 18:01:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ps40233); users with modify permissions: Set(ps40233)
      16/03/04 18:01:19 INFO Utils: Successfully started service 'sparkDriver' on port 55309.
      16/03/04 18:01:21 INFO Slf4jLogger: Slf4jLogger started
      16/03/04 18:01:21 INFO Remoting: Starting remoting
      16/03/04 18:01:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.30.9.107:52128]
      16/03/04 18:01:22 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 52128.
      16/03/04 18:01:22 INFO SparkEnv: Registering MapOutputTracker
      16/03/04 18:01:22 INFO SparkEnv: Registering BlockManagerMaster
      16/03/04 18:01:22 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-87c20178-357d-4252-a46a-62a755568a98
      16/03/04 18:01:22 INFO MemoryStore: MemoryStore started with capacity 457.7 MB
      16/03/04 18:01:22 INFO SparkEnv: Registering OutputCommitCoordinator
      16/03/04 18:01:23 INFO Utils: Successfully started service 'SparkUI' on port 4040.
      16/03/04 18:01:23 INFO SparkUI: Started SparkUI at http://10.30.9.107:4040
      16/03/04 18:01:24 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
      16/03/04 18:01:24 WARN AppClient$ClientEndpoint: Failed to connect to master sunshinee:7077
      java.io.IOException: Failed to connect to sunshinee:7077
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
      at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.nio.channels.UnresolvedAddressException
      at sun.nio.ch.Net.checkAddress(Net.java:123)
      at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:621)
      at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
      at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
      at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1097)
      at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
      at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
      at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
      at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
      at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:438)
      at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:908)
      at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
      at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
      at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
      at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      ... 1 more
      16/03/04 18:01:44 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
      16/03/04 18:01:44 WARN AppClient$ClientEndpoint: Failed to connect to master sunshinee:7077
      java.io.IOException: Failed to connect to sunshinee:7077
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
      at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.nio.channels.UnresolvedAddressException
      at sun.nio.ch.Net.checkAddress(Net.java:123)
      at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:621)
      at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
      at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
      at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1097)
      at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
      at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
      at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
      at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
      at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:438)
      at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:908)
      at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
      at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
      at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
      at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      ... 1 more
      16/03/04 18:02:04 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
      16/03/04 18:02:04 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
      16/03/04 18:02:04 WARN AppClient$ClientEndpoint: Failed to connect to master sunshinee:7077
      java.io.IOException: Failed to connect to sunshinee:7077
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
      at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.nio.channels.UnresolvedAddressException
      at sun.nio.ch.Net.checkAddress(Net.java:123)
      at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:621)
      at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
      at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
      at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1097)
      at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
      at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
      at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
      at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
      at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:438)
      at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:908)
      at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
      at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
      at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
      at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      ... 1 more
      16/03/04 18:02:24 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
      16/03/04 18:02:24 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
      16/03/04 18:02:24 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
      16/03/04 18:02:24 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
      16/03/04 18:02:24 WARN AppClient$ClientEndpoint: Failed to connect to master sunshinee:7077
      java.io.IOException: Failed to connect to sunshinee:7077
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
      at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
      at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
      at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.nio.channels.UnresolvedAddressException
      at sun.nio.ch.Net.checkAddress(Net.java:123)
      at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:621)
      at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
      at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
      at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1097)
      at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
      at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
      at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
      at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
      at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
      at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:438)
      at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:908)
      at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
      at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
      at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
      at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      ... 1 more
      16/03/04 18:02:24 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44298.
      16/03/04 18:02:24 INFO NettyBlockTransferService: Server created on 44298
      16/03/04 18:02:24 INFO BlockManagerMaster: Trying to register BlockManager
      16/03/04 18:02:24 INFO BlockManagerMasterEndpoint: Registering block manager 10.30.9.107:44298 with 457.7 MB RAM, BlockManagerId(driver, 10.30.9.107, 44298)
      16/03/04 18:02:24 INFO BlockManagerMaster: Registered BlockManager
      16/03/04 18:02:24 INFO SparkUI: Stopped Spark web UI at http://10.30.9.107:4040
      16/03/04 18:02:24 INFO SparkDeploySchedulerBackend: Shutting down all executors
      16/03/04 18:02:24 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
      16/03/04 18:02:24 WARN AppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
      16/03/04 18:02:24 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
      java.lang.InterruptedException
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
      at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
      at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
      at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
      at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
      at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
      at scala.concurrent.Await$.result(package.scala:107)
      at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
      at org.apache.spark.deploy.client.AppClient.stop(AppClient.scala:290)
      at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.org$apache$spark$scheduler$cluster$SparkDeploySchedulerBackend$$stop(SparkDeploySchedulerBackend.scala:198)
      at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.stop(SparkDeploySchedulerBackend.scala:101)
      at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:446)
      at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1582)
      at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1731)
      at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1229)
      at org.apache.spark.SparkContext.stop(SparkContext.scala:1730)
      at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:127)
      at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
      at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:134)
      at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1163)
      at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:129)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      16/03/04 18:02:24 INFO DiskBlockManager: Shutdown hook called
      16/03/04 18:02:24 INFO ShutdownHookManager: Shutdown hook called
      16/03/04 18:02:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea68a0fa-4f0d-4dbb-8407-cce90ef78a52
      16/03/04 18:02:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea68a0fa-4f0d-4dbb-8407-cce90ef78a52/userFiles-db548748-a55c-4406-adcb-c09e63b118bd
      Java Result: 50

        Attachments

          Activity

            People

            • Assignee:
              zsxwing Shixiong Zhu
              Reporter:
              erangac Era
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: