SPARK-21159

Cluster mode, driver throws connection refused exception submitted by SparkLauncher


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.2, 2.2.0
    • Component/s: Spark Core, Spark Submit
    • Labels:
      None
    • Environment:

      Server A-Master
      Server B-Slave

      Description

      When a Spark application is submitted through the SparkLauncher#startApplication method, the caller gets back a SparkAppHandle. In the test environment the launcher runs on Server A. In client mode everything works. In cluster mode the launcher still runs on Server A, but the driver runs on Server B. In this scenario, when SparkContext is initialized, a LauncherBackend tries to connect back to the launcher application on a specified port and IP address. The problem is that the LauncherBackend implementation connects to the loopback address, 127.0.0.1. The connection is refused because Server B never ran the launcher.
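
      The failure mode described above can be reproduced without Spark. The following is a minimal sketch using only the JDK: a ServerSocket stands in for the launcher's listener, and canConnect stands in for the connection attempt made by LauncherBackend. The class and method names are illustrative and are not part of Spark. Binding the stand-in listener to the loopback address is an assumption made to keep the demo on one machine.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative only: none of these names exist in Spark.
class LoopbackConnectDemo {

    /** Returns true if a TCP connection to host:port can be opened. */
    static boolean canConnect(InetAddress host, int port) {
        try (Socket s = new Socket(host, port)) {
            return true;
        } catch (IOException e) {
            // java.net.ConnectException ("Connection refused") ends up here.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        int port;

        // Case 1: a listener exists on this host (launcher and driver share
        // a machine, as in client mode). The loopback connection is accepted.
        try (ServerSocket launcher = new ServerSocket(0, 50, loopback)) {
            port = launcher.getLocalPort();
            System.out.println("listener on this host:    "
                    + canConnect(loopback, port));
        }

        // Case 2: the listener is gone, so nothing on this host owns the port.
        // This mirrors the driver on Server B dialing its own 127.0.0.1.
        System.out.println("no listener on this host: "
                + canConnect(loopback, port));
    }
}
```

      The second case is the one in this report: 127.0.0.1 always refers to the machine making the call, so a driver on Server B can never reach a listener that exists only on Server A.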

      The expected behavior is that the LauncherBackend uses Server A's IP address to connect and report the running status.
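
      One way to picture the expected behavior is an address lookup that prefers an explicitly supplied launcher host and falls back to loopback only when none is given. This is a hedged sketch of the reporter's suggestion, not a description of how the issue was resolved. The variable names EXAMPLE_LAUNCHER_HOST and EXAMPLE_LAUNCHER_PORT are hypothetical and are not Spark configuration keys.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: these names are hypothetical, not Spark APIs.
class LauncherAddressSketch {
    static final String HOST_KEY = "EXAMPLE_LAUNCHER_HOST";
    static final String PORT_KEY = "EXAMPLE_LAUNCHER_PORT";

    /**
     * Builds the address to connect back to. Uses the supplied host when
     * present and falls back to loopback otherwise. Throws
     * NumberFormatException if the port entry is missing or malformed.
     */
    static InetSocketAddress resolve(Map<String, String> env) {
        int port = Integer.parseInt(env.get(PORT_KEY));
        String host = env.get(HOST_KEY);
        if (host == null || host.isEmpty()) {
            return new InetSocketAddress(InetAddress.getLoopbackAddress(), port);
        }
        return new InetSocketAddress(host, port);
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put(PORT_KEY, "7077");
        System.out.println("no host supplied: " + resolve(env));

        // 192.0.2.10 is a documentation-only address standing in for Server A.
        env.put(HOST_KEY, "192.0.2.10");
        System.out.println("host supplied:    " + resolve(env));
    }
}
```

      With such a lookup, a driver on Server B would dial Server A instead of itself, provided the launcher's listener is reachable from Server B on that address.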

      Below is the stacktrace:
      17/06/20 17:24:37 ERROR SparkContext: Error initializing SparkContext.
      java.net.ConnectException: Connection refused
      at java.net.PlainSocketImpl.socketConnect(Native Method)
      at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
      at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
      at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
      at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
      at java.net.Socket.connect(Socket.java:589)
      at java.net.Socket.connect(Socket.java:538)
      at java.net.Socket.<init>(Socket.java:434)
      at java.net.Socket.<init>(Socket.java:244)
      at org.apache.spark.launcher.LauncherBackend.connect(LauncherBackend.scala:43)
      at org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.start(StandaloneSchedulerBackend.scala:60)
      at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
      at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
      at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
      at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
      at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
      at com.asura.grinder.datatask.task.AbstractCommonSparkTask.executeSparkJob(AbstractCommonSparkTask.scala:91)
      at com.asura.grinder.datatask.task.AbstractCommonSparkTask.runSparkJob(AbstractCommonSparkTask.scala:25)
      at com.asura.grinder.datatask.main.TaskMain$.main(TaskMain.scala:61)
      at com.asura.grinder.datatask.main.TaskMain.main(TaskMain.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
      at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
      17/06/20 17:24:37 INFO SparkUI: Stopped Spark web UI at http://172.25.108.62:4040
      17/06/20 17:24:37 INFO StandaloneSchedulerBackend: Shutting down all executors
      17/06/20 17:24:37 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
      17/06/20 17:24:37 ERROR Utils: Uncaught exception in thread main
      java.lang.NullPointerException
      at org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.org$apache$spark$scheduler$cluster$StandaloneSchedulerBackend$$stop(StandaloneSchedulerBackend.scala:214)
      at org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.stop(StandaloneSchedulerBackend.scala:116)
      at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:467)
      at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1588)
      at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1826)
      at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
      at org.apache.spark.SparkContext.stop(SparkContext.scala:1825)
      at org.apache.spark.SparkContext.<init>(SparkContext.scala:587)
      at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
      at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
      at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
      at com.asura.grinder.datatask.task.AbstractCommonSparkTask.executeSparkJob(AbstractCommonSparkTask.scala:91)
      at com.asura.grinder.datatask.task.AbstractCommonSparkTask.runSparkJob(AbstractCommonSparkTask.scala:25)
      at com.asura.grinder.datatask.main.TaskMain$.main(TaskMain.scala:61)
      at com.asura.grinder.datatask.main.TaskMain.main(TaskMain.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
      at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
      17/06/20 17:24:37 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
      17/06/20 17:24:37 INFO MemoryStore: MemoryStore cleared
      17/06/20 17:24:37 INFO BlockManager: BlockManager stopped
      17/06/20 17:24:37 INFO BlockManagerMaster: BlockManagerMaster stopped
      17/06/20 17:24:37 WARN MetricsSystem: Stopping a MetricsSystem that is not running
      17/06/20 17:24:37 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
      17/06/20 17:24:37 INFO SparkContext: Successfully stopped SparkContext
      17/06/20 17:24:37 ERROR MongoPilotTask: error occurred group{2}:task(222)
      java.net.ConnectException: Connection refused
      at java.net.PlainSocketImpl.socketConnect(Native Method)
      at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
      at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
      at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
      at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
      at java.net.Socket.connect(Socket.java:589)
      at java.net.Socket.connect(Socket.java:538)
      at java.net.Socket.<init>(Socket.java:434)
      at java.net.Socket.<init>(Socket.java:244)
      at org.apache.spark.launcher.LauncherBackend.connect(LauncherBackend.scala:43)
      at org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.start(StandaloneSchedulerBackend.scala:60)
      at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
      at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
      at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
      at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
      at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
      at com.asura.grinder.datatask.task.AbstractCommonSparkTask.executeSparkJob(AbstractCommonSparkTask.scala:91)
      at com.asura.grinder.datatask.task.AbstractCommonSparkTask.runSparkJob(AbstractCommonSparkTask.scala:25)
      at com.asura.grinder.datatask.main.TaskMain$.main(TaskMain.scala:61)
      at com.asura.grinder.datatask.main.TaskMain.main(TaskMain.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
      at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)

            People

            • Assignee: Marcelo Masiero Vanzin (vanzin)
            • Reporter: niefei (teclusky@gmail.com)