SPARK-1579: PySpark should distinguish expected IOExceptions from unexpected ones in the worker

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: PySpark
    • Labels: None

Description

I chatted with adav a bit about this. Right now we drop IOExceptions because they are (in some cases) expected if a Python worker returns before consuming its entire input. The issue is that this also swallows legitimate IOExceptions when they occur.

One thought we had was to change daemon.py so that, instead of closing the socket when the function is over, it simply busy-waits on the socket being closed. We'd transfer the responsibility for closing the socket to the Java reader. The Java reader could, when it has finished consuming output from Python, set a volatile flag to indicate that Python has fully returned, and then close the socket. Then, if an IOException is thrown in the write thread, we swallow it only if we are expecting it.

This would also let us remove the warning message we currently log.
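
Below is a minimal sketch, in Scala, of the handshake described above. The names here (PythonWorkerConnection, pythonFinished, readOutput) are hypothetical and the protocol is simplified; this is an illustration of the volatile-flag idea, not Spark's actual PythonRDD code:

    import java.io.{DataInputStream, DataOutputStream, EOFException, IOException}
    import java.net.Socket

    // Hypothetical sketch: the reader sets a volatile flag once Python's
    // output is fully consumed, then closes the socket; the writer swallows
    // an IOException only when that flag says it is expected.
    class PythonWorkerConnection(socket: Socket, input: Iterator[Array[Byte]]) {

      // Set by the reader once Python's output has been fully consumed.
      // From then on, an IOException on the write side is expected (the
      // worker returned before reading all of its input) and can be dropped.
      @volatile private var pythonFinished = false

      private val writer = new Thread("python-writer") {
        override def run(): Unit = {
          try {
            val out = new DataOutputStream(socket.getOutputStream)
            for (bytes <- input) {
              out.writeInt(bytes.length)
              out.write(bytes)
            }
            out.flush()
          } catch {
            // Swallow only the expected case; an IOException thrown while
            // Python is still running is not matched here and propagates.
            case _: IOException if pythonFinished => ()
          }
        }
      }

      // The reader runs on the task thread: consume all output, set the
      // flag, and close the socket (responsibility moved here from daemon.py,
      // which now just busy-waits until it sees the socket close).
      def readOutput(handle: Array[Byte] => Unit): Unit = {
        writer.start()
        val in = new DataInputStream(socket.getInputStream)
        try {
          while (true) {
            val len = in.readInt() // EOF here means Python has fully returned
            val buf = new Array[Byte](len)
            in.readFully(buf)
            handle(buf)
          }
        } catch {
          case _: EOFException =>
            pythonFinished = true // writer may now ignore IOExceptions
            socket.close()        // Java, not daemon.py, closes the socket
        }
      }
    }

The flag has to be volatile so that the write performed by the reader thread is guaranteed to be visible to the writer thread when its catch block runs.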

People

    Assignee: Aaron Davidson (adav)
    Reporter: Patrick Wendell (pwendell)
    Votes: 0
    Watchers: 2
