Description
I chatted with adav a bit about this. Right now we drop IOExceptions in the write thread because they are (in some cases) expected: a Python worker may return before consuming its entire input. The problem is that this also swallows legitimate IOExceptions when they occur.
One thought we had was to change daemon.py so that, instead of closing the socket when the function is over, it simply busy-waits until the socket is closed. That transfers the responsibility for closing the socket to the Java reader. When the Java reader has finished consuming output from Python, it could set a volatile flag to indicate that Python has fully returned, and then close the socket. If an IOException is then thrown in the write thread, we swallow it only when the flag says we were expecting it.
This would also let us remove the warning message we currently log.
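To make the proposed rule concrete, here is a minimal sketch of the "swallow only if expected" logic. It is a language-neutral simulation written in Python, not the actual Spark code: the class `WorkerChannel`, its methods, and `broken_pipe_write` are hypothetical names, and a `threading.Event` stands in for the volatile flag on the Java side.

```python
import threading


class WorkerChannel:
    """Hypothetical stand-in for the Java-side reader/writer pair."""

    def __init__(self):
        # Plays the role of the volatile flag: set once the reader has
        # consumed all of the Python worker's output.
        self.worker_finished = threading.Event()
        self.swallowed = []  # write failures we were expecting

    def reader_done(self):
        # Reader side: mark the worker as finished *before* closing the
        # socket, so a write failure caused by the close is recognised.
        self.worker_finished.set()
        # (the real reader would close the socket here)

    def run_writer(self, write_fn):
        # Writer side: an I/O error is swallowed only if the flag is set.
        try:
            write_fn()
        except OSError as exc:
            if self.worker_finished.is_set():
                self.swallowed.append(exc)  # expected: worker returned early
            else:
                raise  # legitimate failure: propagate it


def broken_pipe_write():
    # Simulates a write to a socket whose peer has gone away.
    raise BrokenPipeError("peer closed the socket")
```

With this shape, a write failure after `reader_done()` is recorded and ignored, while the same failure before the flag is set propagates to the caller. The ordering (set the flag, then close) is the assumption that makes the check safe.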
Issue Links
- is related to SPARK-1019 pyspark RDD take() throws NPE (Resolved)