SPARK-24668: PySpark crashes when getting the webui url if the webui is disabled


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0, 2.4.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Environment:
      • Spark 2.3.0
      • Spark-on-YARN
      • Java 8
      • Python 3.6.5
      • Jupyter 4.4.0

    Description

      Repro:

      Evaluate `sc` in a Jupyter notebook:

      ---------------------------------------------------------------------------
      Py4JJavaError                             Traceback (most recent call last)
      /opt/conda/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
          343             method = get_real_method(obj, self.print_method)
          344             if method is not None:
      --> 345                 return method()
          346             return None
          347         else:

      /usr/lib/spark/python/pyspark/context.py in _repr_html_(self)
          261         </div>
          262         """.format(
      --> 263             sc=self
          264         )
          265 

      /usr/lib/spark/python/pyspark/context.py in uiWebUrl(self)
          373     def uiWebUrl(self):
          374         """Return the URL of the SparkUI instance started by this SparkContext"""
      --> 375         return self._jsc.sc().uiWebUrl().get()
          376 
          377     @property

      /usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py in __call__(self, *args)
         1158         answer = self.gateway_client.send_command(command)
         1159         return_value = get_return_value(
      -> 1160             answer, self.gateway_client, self.target_id, self.name)
         1161 
         1162         for temp_arg in temp_args:

      /usr/lib/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
           61     def deco(*a, **kw):
           62         try:
      ---> 63             return f(*a, **kw)
           64         except py4j.protocol.Py4JJavaError as e:
           65             s = e.java_exception.toString()

      /usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
          318                 raise Py4JJavaError(
          319                     "An error occurred while calling {0}{1}{2}.\n".
      --> 320                     format(target_id, ".", name), value)
          321             else:
          322                 raise Py4JError(

      Py4JJavaError: An error occurred while calling o80.get.
      : java.util.NoSuchElementException: None.get
              at scala.None$.get(Option.scala:347)
              at scala.None$.get(Option.scala:345)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
              at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
              at py4j.Gateway.invoke(Gateway.java:282)
              at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
              at py4j.commands.CallCommand.execute(CallCommand.java:79)
              at py4j.GatewayConnection.run(GatewayConnection.java:214)
              at java.lang.Thread.run(Thread.java:748)

       

      PySpark only prints the web UI URL in `_repr_html_`, not `__repr__`, so this only happens in notebooks that render HTML, not in the pyspark shell. See https://github.com/apache/spark/commit/f654b39a63d4f9b118733733c7ed2a1b58649e3d.
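      To illustrate why only HTML-rendering frontends are affected: IPython/Jupyter prefer an object's `_repr_html_` method over `__repr__` when displaying a cell result, and SparkContext's `_repr_html_` is what interpolates `uiWebUrl`. A minimal sketch of the display protocol (the `Demo` class here is hypothetical, not part of PySpark):

```python
class Demo:
    def __repr__(self):
        # The plain REPL and the pyspark shell use __repr__.
        return "Demo()"

    def _repr_html_(self):
        # Jupyter calls _repr_html_ when it exists; SparkContext's version
        # formats uiWebUrl into an HTML snippet here, which is what crashes.
        return "<b>Demo</b>"
```

      Evaluating a `Demo()` at the end of a notebook cell would render the bold HTML, while the plain shell would print `Demo()`.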

       

      Disabling Spark's UI with `spark.ui.enabled` is valuable outside of tests. A couple of reasons come to mind:

      1) If you run multiple Spark applications from one machine, Spark starts by picking the same port (4040) as the first application, then increments (4041, 4042, etc.) until it finds an open port. If you are running 10 Spark apps, the 11th prints 10 warnings about ports being taken before it finally finds one.

      2) You can serve the Spark web UI from a dedicated Spark history server instead of per-driver. This is documented, at least for Spark-on-YARN: https://spark.apache.org/docs/latest/running-on-yarn.html#using-the-spark-history-server-to-replace-the-spark-web-ui.
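      For reference, disabling the UI is a one-line configuration change; `spark.ui.enabled` is a documented Spark property (the file path below is the standard default location and may differ in your deployment):

```
# conf/spark-defaults.conf
spark.ui.enabled    false
```

      or equivalently at submit time with `spark-submit --conf spark.ui.enabled=false`.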

       

      PySpark should not crash if the web ui is disabled. There are a couple of options:

      1) `SparkContext#uiWebUrl()` in Scala should return the driver web UI URL or the history server URL, depending on which one is being used.

      2) PySpark should call `getOrElse(None)` rather than `get()`.
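      The guard in option 2 amounts to checking the `Option` before unwrapping it. A pure-Python sketch of that pattern, using a toy stand-in for the py4j proxy of `scala.Option` (both names below are illustrative, not the real Spark/py4j objects):

```python
class ToyOption:
    """Toy stand-in for a scala.Option proxy returned over py4j."""

    def __init__(self, value=None):
        self._value = value

    def isDefined(self):
        return self._value is not None

    def get(self):
        if self._value is None:
            # Mirrors scala.None$.get raising NoSuchElementException.
            raise ValueError("None.get")
        return self._value


def ui_web_url(opt):
    # Defensive access: return None when the UI is disabled instead of
    # letting an empty Option's get() raise.
    return opt.get() if opt.isDefined() else None
```

      With the UI enabled the URL comes back unchanged; with it disabled the property yields `None` instead of crashing the notebook's HTML renderer.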

       

      I strongly prefer option 1), but I can't figure out how to do it in a non-hacky way. In SparkContext.scala, `uiWebUrl()` comes from `_ui.map(_.webUrl)`, where `_ui` contains the actual SparkUI if `spark.ui.enabled=true`.

      1) I could set `_ui` to SparkUI.createHistoryUI(), and then just avoid calling `bind()` on the UI server. I'm not sure what the implications would be for classes outside of SparkContext that use SparkContext#ui.

      2) I could make `_ui` and `uiWebUrl()` inconsistent. `_ui` only contains the in-driver UI and `uiWebUrl()` returns the in-driver or history URL.

       

      I would appreciate some help figuring out how to proceed.


          People

            Assignee: Unassigned
            Reporter: Karthik Palaniappan
            Holden Karau
            Votes: 0
            Watchers: 5
