Spark / SPARK-23240

PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.1
    • Fix Version/s: 2.4.0
    • Component/s: PySpark
    • Labels: None

    Description

      Environmental issues or site-local customizations (e.g., a sitecustomize.py present in the Python install directory) can interfere with daemon.py’s output to stdout. When this happens, PythonWorkerFactory produces unhelpful messages, and it takes some head scratching to determine the actual issue.

      Case #1: Extraneous data in pyspark.daemon’s stdout. In this case, PythonWorkerFactory interprets the extraneous output as the daemon’s port number and throws an exception when creating the socket:

      java.lang.IllegalArgumentException: port out of range:1819239265
      	at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
      	at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
      	at java.net.Socket.<init>(Socket.java:244)
      	at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:78)
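
The out-of-range value in the trace above is consistent with four bytes of stray text being read as a big-endian 32-bit integer. This assumes the port is exchanged as a packed 4-byte int that the JVM side reads with a readInt-style call; the sketch below only decodes the reported number and does not reproduce Spark's own code.

```python
import struct

# The bogus "port" reported in the IllegalArgumentException above.
bogus_port = 1819239265

# Re-encode it as a big-endian signed 32-bit int to recover the raw bytes
# that would have been sitting at the start of the daemon's stdout.
raw = struct.pack(">i", bogus_port)
print(raw)  # b'loca'

# Going the other way: four ASCII bytes read as an int give the same value.
assert struct.unpack(">i", b"loca")[0] == bogus_port
```

Under that assumption, the daemon's stdout began with the text "loca" rather than a port number. What printed that text is not stated in the report.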
      

      Case #2: No data in pyspark.daemon’s stdout. In this case, PythonWorkerFactory throws an EOFException while reading from the Process input stream.

      The second case is somewhat less mysterious than the first, because PythonWorkerFactory also displays the stderr from the python process.

      When there is unexpected or missing output in pyspark.daemon’s stdout, PythonWorkerFactory should say so.
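
The kind of check requested here can be sketched in a few lines. This is an illustration in Python, not the change that was made to PythonWorkerFactory: the names `read_daemon_port` and `DaemonStartupError` are hypothetical, and the 4-byte big-endian framing is an assumption about how the port is exchanged.

```python
import io
import struct


class DaemonStartupError(Exception):
    """Hypothetical error type used only in this sketch."""


def read_daemon_port(stdout, stderr_text=""):
    """Read a 4-byte big-endian port from `stdout`, failing with a clear message."""
    header = stdout.read(4)

    # Case #2: the daemon wrote nothing (or too little) before exiting.
    if len(header) < 4:
        raise DaemonStartupError(
            "No port number in pyspark.daemon's stdout "
            f"(got {len(header)} bytes). stderr: {stderr_text!r}"
        )

    (port,) = struct.unpack(">i", header)

    # Case #1: something other than the port was written first.
    if not 1 <= port <= 65535:
        raise DaemonStartupError(
            f"Bad data in pyspark.daemon's stdout: expected a port, got {port} "
            f"(raw bytes {header!r}). Check for a sitecustomize.py or other "
            "code that prints to stdout."
        )
    return port


# A well-formed stream yields the port unchanged.
print(read_daemon_port(io.BytesIO(struct.pack(">i", 40123))))  # 40123
```

Passing `io.BytesIO(b"local")` or `io.BytesIO(b"")` instead raises `DaemonStartupError` with a message that names the daemon's stdout as the source of the problem, which is the behavior the report asks for.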

          People

            Assignee: Bruce Robbins (bersprockets)
            Reporter: Bruce Robbins (bersprockets)
            Votes: 0
            Watchers: 3