Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
0.7.0
-
None
-
Spark Standalone 2.0.2
Scoped Interpreter
Description
Problem
Sometimes we experience an issue that when restarting the Spark interpreter, the driver is not shutdown correctly. Due to this, the applications stays submitted to the (standalone) cluster and still consumes resources. We are using the Spark interpreter in scoped mode.
Looking at the logs there is not much to see, at least no error message. How does Zeppelin terminate the Spark session when restarting the interpreter?
Example
Due to some query that is stuck we are restarting the interpreter at 17:27, which was started at 14:01. Instead of restarting it, it attempts to start a new driver which fails to bind to the existing port, binding a new one. The old application stays submitted.
grep "SparkUI" zeppelin-interpreter-spark-zeppelin-zeppelin-02.log INFO [2017-03-13 14:01:06,415] ({pool-2-thread-3} Logging.scala[logInfo]:54) - Successfully started service 'SparkUI' on port 42935. INFO [2017-03-13 14:01:06,418] ({pool-2-thread-3} Logging.scala[logInfo]:54) - Bound SparkUI to 0.0.0.0, and started at http://xxxxxxxxxxx:42935 WARN [2017-03-13 17:27:58,704] ({pool-2-thread-2} Logging.scala[logWarning]:66) - Service 'SparkUI' could not bind on port 42935. Attempting port 42936. INFO [2017-03-13 17:27:58,731] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Successfully started service 'SparkUI' on port 42936. INFO [2017-03-13 17:27:58,737] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Bound SparkUI to 0.0.0.0, and started at http://xxxxxxxxxxx:42936
grep -B 300 "could not bind on port 42935" zeppelin-interpreter-spark-zeppelin-zeppelin-02.log INFO [2017-03-13 17:26:19,998] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:26:20,000] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:26:22,192] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:26:22,193] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:26:22,397] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:26:22,397] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:26:24,604] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:26:24,604] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:27:44,299] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:27:44,299] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:27:44,344] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:27:44,344] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:27:44,347] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:27:44,347] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:27:44,348] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:27:44,348] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:27:52,196] ({Thread-0} RemoteInterpreterServer.java[run]:96) - Starting remote interpreter server on port 47609 INFO [2017-03-13 17:27:52,561] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.PySparkInterpreter INFO [2017-03-13 17:27:52,617] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.SparkInterpreter INFO [2017-03-13 17:27:52,623] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.SparkSqlInterpreter INFO [2017-03-13 17:27:52,626] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.DepInterpreter INFO [2017-03-13 17:27:52,673] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1489422472669 started by scheduler interpreter_1754241267 INFO [2017-03-13 17:27:52,682] ({pool-2-thread-2} PySparkInterpreter.java[createPythonScript]:106) - File /tmp/zeppelin_pyspark-142944822224657232.py created INFO [2017-03-13 17:27:56,198] ({pool-2-thread-75} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1489422348563 finished by scheduler interpreter_999576780 INFO [2017-03-13 17:27:56,583] ({pool-2-thread-2} SparkInterpreter.java[createSparkSession]:350) - ------ Create new SparkContext spark://yyyyyyyyyyyyy:7077 ------- INFO [2017-03-13 17:27:56,638] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Running Spark version 2.0.2 WARN [2017-03-13 17:27:57,427] ({pool-2-thread-2} NativeCodeLoader.java[<clinit>]:62) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable INFO [2017-03-13 17:27:57,664] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing view acls to: zeppelin INFO [2017-03-13 17:27:57,665] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing modify acls to: zeppelin INFO [2017-03-13 17:27:57,665] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing view acls groups to: INFO [2017-03-13 17:27:57,666] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing modify acls groups to: INFO [2017-03-13 17:27:57,666] ({pool-2-thread-2} Logging.scala[logInfo]:54) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zeppelin); groups with view permissions: Set(); users with modify permissions: Set(zeppelin); groups with modify permissions: Set() INFO [2017-03-13 17:27:58,148] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Successfully started service 'sparkDriver' on port 56861. INFO [2017-03-13 17:27:58,179] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering MapOutputTracker INFO [2017-03-13 17:27:58,212] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering BlockManagerMaster INFO [2017-03-13 17:27:58,234] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Created local directory at /tmp/blockmgr-5dd56028-ee7c-4996-9011-245dcc7f3d6b INFO [2017-03-13 17:27:58,254] ({pool-2-thread-2} Logging.scala[logInfo]:54) - MemoryStore started with capacity 4.1 GB INFO [2017-03-13 17:27:58,324] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering OutputCommitCoordinator INFO [2017-03-13 17:27:58,487] ({pool-2-thread-2} Log.java[initialized]:186) - Logging initialized @7223ms INFO [2017-03-13 17:27:58,630] ({pool-2-thread-2} Server.java[doStart]:327) - jetty-9.2.z-SNAPSHOT INFO [2017-03-13 17:27:58,656] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@4a482724{/jobs,null,AVAILABLE} INFO [2017-03-13 17:27:58,657] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3f531ec4{/jobs/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,657] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3f300fc5{/jobs/job,null,AVAILABLE} INFO [2017-03-13 17:27:58,658] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@b656e64{/jobs/job/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,659] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@9f092b{/stages,null,AVAILABLE} INFO [2017-03-13 17:27:58,659] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@279d92c2{/stages/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,660] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@186cefd3{/stages/stage,null,AVAILABLE} INFO [2017-03-13 17:27:58,661] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@38dab4f4{/stages/stage/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,661] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@40f42495{/stages/pool,null,AVAILABLE} INFO [2017-03-13 17:27:58,662] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@bb4a5cb{/stages/pool/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,663] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@74d9c119{/storage,null,AVAILABLE} INFO [2017-03-13 17:27:58,663] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@19c9a662{/storage/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,664] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@78f8f942{/storage/rdd,null,AVAILABLE} INFO [2017-03-13 17:27:58,664] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@56cc816b{/storage/rdd/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,665] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@6ca753a2{/environment,null,AVAILABLE} INFO [2017-03-13 17:27:58,666] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@380902e2{/environment/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,666] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@704ce48{/executors,null,AVAILABLE} INFO [2017-03-13 17:27:58,667] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@35f1c01a{/executors/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,668] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@631daf45{/executors/threadDump,null,AVAILABLE} INFO [2017-03-13 17:27:58,669] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@13f3afd6{/executors/threadDump/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,676] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3224456e{/static,null,AVAILABLE} INFO [2017-03-13 17:27:58,677] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@9eea220{/,null,AVAILABLE} INFO [2017-03-13 17:27:58,679] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@70743df{/api,null,AVAILABLE} INFO [2017-03-13 17:27:58,679] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@984c1f6{/stages/stage/kill,null,AVAILABLE} WARN [2017-03-13 17:27:58,682] ({pool-2-thread-2} AbstractLifeCycle.java[setFailed]:212) - FAILED ServerConnector@3a4224e0{HTTP/1.1}{0.0.0.0:42935}: java.net.BindException: Address already in use WARN [2017-03-13 17:27:58,686] ({pool-2-thread-2} AbstractLifeCycle.java[setFailed]:212) - FAILED org.spark_project.jetty.server.Server@26ba9b5f: java.net.BindException: Address already in use INFO [2017-03-13 17:27:58,689] ({pool-2-thread-2} AbstractConnector.java[doStop]:306) - Stopped ServerConnector@3a4224e0{HTTP/1.1}{0.0.0.0:42935} INFO [2017-03-13 17:27:58,691] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@984c1f6{/stages/stage/kill,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@70743df{/api,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@9eea220{/,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3224456e{/static,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@13f3afd6{/executors/threadDump/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@631daf45{/executors/threadDump,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@35f1c01a{/executors/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@704ce48{/executors,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@380902e2{/environment/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@6ca753a2{/environment,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@56cc816b{/storage/rdd/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@78f8f942{/storage/rdd,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@19c9a662{/storage/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@74d9c119{/storage,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@bb4a5cb{/stages/pool/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@40f42495{/stages/pool,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@38dab4f4{/stages/stage/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@186cefd3{/stages/stage,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@279d92c2{/stages/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@9f092b{/stages,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@b656e64{/jobs/job/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3f300fc5{/jobs/job,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3f531ec4{/jobs/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@4a482724{/jobs,null,UNAVAILABLE} WARN [2017-03-13 17:27:58,704] ({pool-2-thread-2} Logging.scala[logWarning]:66) - Service 'SparkUI' could not bind on port 42935. Attempting port 42936.
Attachments
Attachments
Issue Links
- duplicates
-
ZEPPELIN-1832 Zombie Interpreter processes
- Resolved