Spark / SPARK-3140

PySpark start-up throws confusing exception

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 1.1.0
    • Component/s: PySpark
    • Labels: None

    Description

      Currently we read the PySpark port from the stdout of the spark-submit subprocess. However, if there is stdout interference, e.g. spark-submit echoes something unexpected to stdout, we print the following:

      Exception: Launching GatewayServer failed! (Warning: unexpected output detected.)
      

      That message is fine for the interference case. However, we actually throw the same exception when there is no output from the subprocess at all. This is very confusing, because it implies that the subprocess is outputting something (possibly invisible whitespace) when it is actually outputting nothing.
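      A minimal sketch of the launch logic being described (hypothetical, not the actual PySpark launcher code): read the first line of the subprocess's stdout and distinguish the two failure modes, instead of reporting "unexpected output" for both.

      ```python
      # Hypothetical sketch: read a gateway port from a subprocess's stdout,
      # distinguishing "no output" from "unexpected output" (SPARK-3140).
      import subprocess


      def read_gateway_port(command):
          proc = subprocess.Popen(command, stdout=subprocess.PIPE)
          first_line = proc.stdout.readline().decode("utf-8").strip()
          if not first_line:
              # The subprocess printed nothing at all -- e.g. it died before
              # it could write the port number.
              raise RuntimeError(
                  "Launching GatewayServer failed: no output from subprocess.")
          try:
              return int(first_line)
          except ValueError:
              # Something was printed, but it was not a port number.
              raise RuntimeError(
                  "Launching GatewayServer failed: unexpected output %r" % first_line)
      ```

      With separate messages, an empty stdout no longer masquerades as stray output.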


          Activity

            joshrosen Josh Rosen added a comment -

            Is it the gateway server JVM -> PySpark driver communication that's getting messed up (the step where the Python driver's Java child process launches with some ephemeral port and communicates that port number back to the Python driver)? Wouldn't that imply that the GatewayServer has some extra logging to stdout that's being printed before it writes the port number?

            andrewor14 Andrew Or added a comment -

            Yes, normally it implies exactly that. What I mean is that even when there is no stdout printed at all (i.e. even the port is not printed), it still throws this exception because it tries to read the empty string as an int. For the longest time I was looking for where we print out some whitespace in the code, but really there was simply no stdout.

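            The confusing case Andrew describes can be reproduced in isolation: when the subprocess prints nothing, the launcher ends up parsing an empty string as the port number, and `int()` rejects the empty string.

            ```python
            # Parsing an empty stdout line as a port number fails, which is
            # what triggered the misleading "unexpected output" exception.
            try:
                port = int("")  # int() rejects the empty string
                message = None
            except ValueError as exc:
                message = str(exc)
            print(message)  # "invalid literal for int() with base 10: ''"
            ```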
            apachespark Apache Spark added a comment -

            User 'andrewor14' has created a pull request for this issue:
            https://github.com/apache/spark/pull/2067


            People

              Assignee: andrewor14 Andrew Or
              Reporter: andrewor14 Andrew Or
              Votes: 0
              Watchers: 3
