Description
Currently we read the PySpark gateway port from the stdout of the spark-submit subprocess. However, if there is stdout interference, e.g. spark-submit echoes something unexpected to stdout, we throw the following:
Exception: Launching GatewayServer failed! (Warning: unexpected output detected.)
Handling that condition is fine. However, we throw the same exception when the subprocess produces no output at all. This is very confusing, because the message implies that the subprocess did output something (possibly whitespace, which is not visible) when in fact it output nothing.
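A minimal sketch of the distinction being asked for, assuming a Python driver that reads the port from the first line of spark-submit's stdout (the command line, function name, and messages below are illustrative, not the actual pyspark/java_gateway.py code):

```python
import subprocess

def launch_gateway():
    # Hypothetical sketch: start spark-submit and read the ephemeral gateway
    # port from the first line of its stdout.
    proc = subprocess.Popen(
        ["spark-submit", "pyspark-shell"],   # illustrative command line
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    first_line = proc.stdout.readline()
    if first_line == "":
        # EOF: the subprocess exited or closed stdout without printing
        # anything -- report that explicitly instead of "unexpected output".
        raise Exception("Launching GatewayServer failed! "
                        "(No output received from spark-submit.)")
    try:
        return int(first_line.strip())
    except ValueError:
        # Something other than the port number was echoed to stdout first.
        raise Exception("Launching GatewayServer failed! "
                        "(Unexpected output: %r)" % first_line)
```

The point is only that "no output" and "unexpected output" are distinguishable (EOF vs. a non-numeric line) and could produce different error messages.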
Issue Links
- relates to SPARK-2313 "PySpark should accept port via a command line argument rather than STDIN" (Resolved)
Is it the gateway server JVM -> PySpark driver communication that's getting messed up (the step where the Python driver's Java child process launches on some ephemeral port and communicates that port number back to the Python driver)? Wouldn't that imply that the GatewayServer has some extra logging to stdout that's being printed before it writes the port number?