Description
Currently we read the PySpark gateway port from the stdout of the spark-submit subprocess. However, if there is stdout interference, e.g. spark-submit echoes something unexpected to stdout, we throw the following:
Exception: Launching GatewayServer failed! (Warning: unexpected output detected.)
Handling that condition is fine. However, we throw the same exception when the subprocess produces no output at all. This is very confusing, because the message implies that the subprocess did output something (possibly whitespace, which is not visible) when in fact it output nothing.
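A minimal sketch of the distinction being asked for, assuming a Python driver that reads the port from the first line of spark-submit's stdout (the command line, function name, and messages below are illustrative, not the actual pyspark/java_gateway.py code):

```python
import subprocess

def launch_gateway():
    # Hypothetical sketch: start spark-submit and read the ephemeral gateway
    # port from the first line of its stdout.
    proc = subprocess.Popen(
        ["spark-submit", "pyspark-shell"],   # illustrative command line
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    first_line = proc.stdout.readline()
    if first_line == "":
        # EOF: the subprocess exited or closed stdout without printing
        # anything -- report that explicitly instead of "unexpected output".
        raise Exception("Launching GatewayServer failed! "
                        "(No output received from spark-submit.)")
    try:
        return int(first_line.strip())
    except ValueError:
        # Something other than the port number was echoed to stdout first.
        raise Exception("Launching GatewayServer failed! "
                        "(Unexpected output: %r)" % first_line)
```

The point is only that "no output" and "unexpected output" are distinguishable (EOF vs. a non-numeric line) and could produce different error messages.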
Issue Links
- relates to SPARK-2313 "PySpark should accept port via a command line argument rather than STDIN" (Resolved)
Is it the gateway server JVM -> PySpark driver communication that's getting messed up (the step where the Python driver's Java child process launches on some ephemeral port and communicates that port number back to the Python driver)? Wouldn't that imply that the GatewayServer has some extra logging to stdout that's being printed before it writes the port number?