Uploaded image for project: 'Zeppelin'
  1. Zeppelin
  2. ZEPPELIN-2258

Spark driver not closed on interpreter restart

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.7.0
    • 0.7.1
    • Interpreters
    • None
    • Spark Standalone 2.0.2
      Scoped Interpreter

    Description

      Problem

      Sometimes we experience an issue that when restarting the Spark interpreter, the driver is not shutdown correctly. Due to this, the applications stays submitted to the (standalone) cluster and still consumes resources. We are using the Spark interpreter in scoped mode.

      Looking at the logs there is not much to see, at least no error message. How does Zeppelin terminate the Spark session when restarting the interpreter?

      Example

      Due to some query that is stuck we are restarting the interpreter at 17:27, which was started at 14:01. Instead of restarting it, it attempts to start a new driver which fails to bind to the existing port, binding a new one. The old application stays submitted.

      grep "SparkUI" zeppelin-interpreter-spark-zeppelin-zeppelin-02.log
      
       INFO [2017-03-13 14:01:06,415] ({pool-2-thread-3} Logging.scala[logInfo]:54) - Successfully started service 'SparkUI' on port 42935.
       INFO [2017-03-13 14:01:06,418] ({pool-2-thread-3} Logging.scala[logInfo]:54) - Bound SparkUI to 0.0.0.0, and started at http://xxxxxxxxxxx:42935
       WARN [2017-03-13 17:27:58,704] ({pool-2-thread-2} Logging.scala[logWarning]:66) - Service 'SparkUI' could not bind on port 42935. Attempting port 42936.
       INFO [2017-03-13 17:27:58,731] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Successfully started service 'SparkUI' on port 42936.
       INFO [2017-03-13 17:27:58,737] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Bound SparkUI to 0.0.0.0, and started at http://xxxxxxxxxxx:42936
      
      grep -B 300 "could not bind on port 42935" zeppelin-interpreter-spark-zeppelin-zeppelin-02.log
      
       INFO [2017-03-13 17:26:19,998] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
       INFO [2017-03-13 17:26:20,000] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
       INFO [2017-03-13 17:26:22,192] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
       INFO [2017-03-13 17:26:22,193] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
       INFO [2017-03-13 17:26:22,397] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
       INFO [2017-03-13 17:26:22,397] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
       INFO [2017-03-13 17:26:24,604] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
       INFO [2017-03-13 17:26:24,604] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
       INFO [2017-03-13 17:27:44,299] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
       INFO [2017-03-13 17:27:44,299] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
       INFO [2017-03-13 17:27:44,344] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
       INFO [2017-03-13 17:27:44,344] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
       INFO [2017-03-13 17:27:44,347] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
       INFO [2017-03-13 17:27:44,347] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
       INFO [2017-03-13 17:27:44,348] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184
       INFO [2017-03-13 17:27:44,348] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184
       INFO [2017-03-13 17:27:52,196] ({Thread-0} RemoteInterpreterServer.java[run]:96) - Starting remote interpreter server on port 47609
       INFO [2017-03-13 17:27:52,561] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.PySparkInterpreter
       INFO [2017-03-13 17:27:52,617] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.SparkInterpreter
       INFO [2017-03-13 17:27:52,623] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.SparkSqlInterpreter
       INFO [2017-03-13 17:27:52,626] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.DepInterpreter
       INFO [2017-03-13 17:27:52,673] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1489422472669 started by scheduler interpreter_1754241267
       INFO [2017-03-13 17:27:52,682] ({pool-2-thread-2} PySparkInterpreter.java[createPythonScript]:106) - File /tmp/zeppelin_pyspark-142944822224657232.py created
       INFO [2017-03-13 17:27:56,198] ({pool-2-thread-75} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1489422348563 finished by scheduler interpreter_999576780
       INFO [2017-03-13 17:27:56,583] ({pool-2-thread-2} SparkInterpreter.java[createSparkSession]:350) - ------ Create new SparkContext spark://yyyyyyyyyyyyy:7077 -------
       INFO [2017-03-13 17:27:56,638] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Running Spark version 2.0.2
       WARN [2017-03-13 17:27:57,427] ({pool-2-thread-2} NativeCodeLoader.java[<clinit>]:62) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
       INFO [2017-03-13 17:27:57,664] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing view acls to: zeppelin
       INFO [2017-03-13 17:27:57,665] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing modify acls to: zeppelin
       INFO [2017-03-13 17:27:57,665] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing view acls groups to: 
       INFO [2017-03-13 17:27:57,666] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing modify acls groups to: 
       INFO [2017-03-13 17:27:57,666] ({pool-2-thread-2} Logging.scala[logInfo]:54) - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(zeppelin); groups with view permissions: Set(); users  with modify permissions: Set(zeppelin); groups with modify permissions: Set()
       INFO [2017-03-13 17:27:58,148] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Successfully started service 'sparkDriver' on port 56861.
       INFO [2017-03-13 17:27:58,179] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering MapOutputTracker
       INFO [2017-03-13 17:27:58,212] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering BlockManagerMaster
       INFO [2017-03-13 17:27:58,234] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Created local directory at /tmp/blockmgr-5dd56028-ee7c-4996-9011-245dcc7f3d6b
       INFO [2017-03-13 17:27:58,254] ({pool-2-thread-2} Logging.scala[logInfo]:54) - MemoryStore started with capacity 4.1 GB
       INFO [2017-03-13 17:27:58,324] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering OutputCommitCoordinator
       INFO [2017-03-13 17:27:58,487] ({pool-2-thread-2} Log.java[initialized]:186) - Logging initialized @7223ms
       INFO [2017-03-13 17:27:58,630] ({pool-2-thread-2} Server.java[doStart]:327) - jetty-9.2.z-SNAPSHOT
       INFO [2017-03-13 17:27:58,656] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@4a482724{/jobs,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,657] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3f531ec4{/jobs/json,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,657] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3f300fc5{/jobs/job,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,658] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@b656e64{/jobs/job/json,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,659] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@9f092b{/stages,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,659] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@279d92c2{/stages/json,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,660] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@186cefd3{/stages/stage,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,661] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@38dab4f4{/stages/stage/json,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,661] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@40f42495{/stages/pool,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,662] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@bb4a5cb{/stages/pool/json,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,663] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@74d9c119{/storage,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,663] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@19c9a662{/storage/json,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,664] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@78f8f942{/storage/rdd,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,664] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@56cc816b{/storage/rdd/json,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,665] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@6ca753a2{/environment,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,666] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@380902e2{/environment/json,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,666] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@704ce48{/executors,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,667] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@35f1c01a{/executors/json,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,668] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@631daf45{/executors/threadDump,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,669] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@13f3afd6{/executors/threadDump/json,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,676] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3224456e{/static,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,677] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@9eea220{/,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,679] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@70743df{/api,null,AVAILABLE}
       INFO [2017-03-13 17:27:58,679] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@984c1f6{/stages/stage/kill,null,AVAILABLE}
       WARN [2017-03-13 17:27:58,682] ({pool-2-thread-2} AbstractLifeCycle.java[setFailed]:212) - FAILED ServerConnector@3a4224e0{HTTP/1.1}{0.0.0.0:42935}: java.net.BindException: Address already in use
       WARN [2017-03-13 17:27:58,686] ({pool-2-thread-2} AbstractLifeCycle.java[setFailed]:212) - FAILED org.spark_project.jetty.server.Server@26ba9b5f: java.net.BindException: Address already in use
       INFO [2017-03-13 17:27:58,689] ({pool-2-thread-2} AbstractConnector.java[doStop]:306) - Stopped ServerConnector@3a4224e0{HTTP/1.1}{0.0.0.0:42935}
       INFO [2017-03-13 17:27:58,691] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@984c1f6{/stages/stage/kill,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@70743df{/api,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@9eea220{/,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3224456e{/static,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@13f3afd6{/executors/threadDump/json,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@631daf45{/executors/threadDump,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@35f1c01a{/executors/json,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@704ce48{/executors,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@380902e2{/environment/json,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@6ca753a2{/environment,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@56cc816b{/storage/rdd/json,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@78f8f942{/storage/rdd,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@19c9a662{/storage/json,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@74d9c119{/storage,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@bb4a5cb{/stages/pool/json,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@40f42495{/stages/pool,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@38dab4f4{/stages/stage/json,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@186cefd3{/stages/stage,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@279d92c2{/stages/json,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@9f092b{/stages,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@b656e64{/jobs/job/json,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3f300fc5{/jobs/job,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3f531ec4{/jobs/json,null,UNAVAILABLE}
       INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@4a482724{/jobs,null,UNAVAILABLE}
       WARN [2017-03-13 17:27:58,704] ({pool-2-thread-2} Logging.scala[logWarning]:66) - Service 'SparkUI' could not bind on port 42935. Attempting port 42936.
      

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            frosner Frank Rosner
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment