Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.3.0, 2.4.0
Fix Version/s: None
Environment:
- Spark 2.3.0
- Spark-on-YARN
- Java 8
- Python 3.6.5
- Jupyter 4.4.0
Description
Repro:
With `spark.ui.enabled=false`, evaluate `sc` in a Jupyter notebook:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
343 method = get_real_method(obj, self.print_method)
344 if method is not None:
--> 345 return method()
346 return None
347 else:
/usr/lib/spark/python/pyspark/context.py in _repr_html_(self)
261 </div>
262 """.format(
--> 263 sc=self
264 )
265
/usr/lib/spark/python/pyspark/context.py in uiWebUrl(self)
373 def uiWebUrl(self):
374 """Return the URL of the SparkUI instance started by this SparkContext"""
--> 375 return self._jsc.sc().uiWebUrl().get()
376
377 @property
/usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py in __call__(self, *args)
1158 answer = self.gateway_client.send_command(command)
1159 return_value = get_return_value(
-> 1160 answer, self.gateway_client, self.target_id, self.name)
1161
1162 for temp_arg in temp_args:
/usr/lib/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
318 raise Py4JJavaError(
319 "An error occurred while calling {0}{1}{2}.\n".
--> 320 format(target_id, ".", name), value)
321 else:
322 raise Py4JError(
Py4JJavaError: An error occurred while calling o80.get.
: java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
PySpark only prints the web UI URL in `_repr_html_`, not `__repr__`, so the crash only happens in notebooks that render HTML, not in the pyspark shell. See https://github.com/apache/spark/commit/f654b39a63d4f9b118733733c7ed2a1b58649e3d
Disabling Spark's UI with `spark.ui.enabled=false` is valuable outside of tests. A couple of reasons come to mind:
1) If you run multiple Spark applications from one machine, every application first tries the same port (4040) and then increments (4041, 4042, etc.) until it finds an open one. If you are already running 10 Spark apps, the 11th prints 10 warnings about ports being taken before it finally finds one.
2) You can serve the Spark web UI from a dedicated Spark history server instead of from each driver. This is documented here, at least for Spark-on-YARN: https://spark.apache.org/docs/latest/running-on-yarn.html#using-the-spark-history-server-to-replace-the-spark-web-ui.
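The port search described in reason 1) can be modelled in a few lines. This is only an illustrative sketch, not Spark's code: `first_free_port` and `is_taken` are hypothetical names, and the retry limit of 16 is assumed to mirror the default of `spark.port.maxRetries`.

```python
def first_free_port(start_port, is_taken, max_retries=16):
    """Try start_port, start_port + 1, ... and return (port, skipped_ports).

    is_taken is a caller-supplied predicate (hypothetical); each skipped
    port corresponds to one "port already in use" warning.
    """
    skipped = []
    for offset in range(max_retries + 1):
        port = start_port + offset
        if is_taken(port):
            skipped.append(port)
        else:
            return port, skipped
    raise RuntimeError(
        "no free port found after %d retries from %d" % (max_retries, start_port)
    )


# Ten applications already hold ports 4040 through 4049.
taken = set(range(4040, 4050))
port, skipped = first_free_port(4040, taken.__contains__)
print(port, len(skipped))  # prints: 4050 10
```

With the UI disabled, no port is bound at all, so none of these retries or warnings occur.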
PySpark should not crash if the web UI is disabled. There are a couple of options:
1) SparkContext#uiWebUrl() in Scala should return the driver web UI URL or the history server URL, depending on which one is being used.
2) PySpark should call getOrElse(None) rather than get().
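A minimal sketch of what option 2) amounts to on the Python side. `FakeOption` is a hypothetical stand-in for the py4j proxy of a `scala.Option` so the example runs without Spark, and `ui_web_url` is an illustrative helper rather than the actual PySpark property. It checks `isDefined()` before calling `get()`, which has the same effect as a `getOrElse` with a null default.

```python
class FakeOption:
    """Hypothetical stand-in for a py4j proxy of scala.Option[String]."""

    def __init__(self, value=None):
        self._value = value

    def isDefined(self):
        return self._value is not None

    def get(self):
        # Mirrors scala.None.get, which throws NoSuchElementException.
        if self._value is None:
            raise LookupError("None.get")
        return self._value


def ui_web_url(joption):
    """Return the UI URL, or None when the UI is disabled."""
    return joption.get() if joption.isDefined() else None


print(ui_web_url(FakeOption("http://driver:4040")))  # prints: http://driver:4040
print(ui_web_url(FakeOption()))                      # prints: None
```

Any caller that formats the URL, such as the HTML representation in the traceback above, would then need to handle a `None` value.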
I strongly prefer option 1), but I can't figure out how to do it in a non-hacky way. In SparkContext.scala, uiWebUrl() comes from `_ui.map(_.webUrl)`, where `_ui` contains the actual SparkUI only if spark.ui.enabled=true. I see two ways to implement it:
1) I could set `_ui` to SparkUI.createHistoryUI(), and then just avoid calling `bind()` on the UI server. I'm not sure what the implications would be for classes outside of SparkContext that use SparkContext#ui.
2) I could make `_ui` and `uiWebUrl()` inconsistent: `_ui` would only contain the in-driver UI, while `uiWebUrl()` would return either the in-driver URL or the history server URL.
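The second approach can be modelled as a simple fallback. The sketch below is in Python rather than Scala and is purely illustrative: `resolve_ui_web_url` is a hypothetical helper, the configuration is a plain dict, and reading the history server address from `spark.yarn.historyServer.address` is an assumption based on the Spark-on-YARN documentation linked above.

```python
def resolve_ui_web_url(conf, driver_ui_url=None):
    """Return the in-driver UI URL if one exists, else the history server address.

    conf is a plain dict of Spark settings (a stand-in for SparkConf);
    driver_ui_url is None when no in-driver UI was started.
    """
    ui_enabled = conf.get("spark.ui.enabled", "true").lower() == "true"
    if ui_enabled and driver_ui_url is not None:
        return driver_ui_url
    # Assumption: the value is returned exactly as configured; no scheme or
    # per-application path is added here.
    return conf.get("spark.yarn.historyServer.address")


conf = {
    "spark.ui.enabled": "false",
    "spark.yarn.historyServer.address": "history.example.com:18080",
}
print(resolve_ui_web_url(conf))  # prints: history.example.com:18080
```

When neither a driver UI nor a history server address is available, the helper returns `None`, so callers still need to handle a missing URL.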
I would appreciate some help figuring out how to proceed.