Description
I have run some experiments with a frame size around 10 MB.
1) spark.akka.frameSize = 10
If the size of one of the partitions is very close to 10 MB, say 9.97 MB, execution blocks without any exception or warning. The worker finishes the task and sends the serialized result, and then (with logging set to debug level) throws an exception saying the Hadoop IPC client connection has stopped. However, the master never receives the result and the program just hangs.
But if the sizes of all partitions are below some threshold between 9.96 MB and 9.97 MB, the program works fine.
2) spark.akka.frameSize = 9
When the partition size is just slightly smaller than 9 MB, it fails in the same way.
This behavior is not exactly what SPARK-1112 is about.
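For reference, a minimal sketch of the kind of job used in these experiments. The class name, the use of a single partition of byte arrays, and the call to collect() are my assumptions for illustration; only the spark.akka.frameSize = 10 setting and the ~9.97 MB result size come from the experiment above.
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object FrameSizeRepro {
  def main(args: Array[String]): Unit = {
    // spark.akka.frameSize is given in MB; task results whose serialized
    // size is just under this limit are the problematic case described above.
    val conf = new SparkConf()
      .setAppName("FrameSizeRepro")
      .set("spark.akka.frameSize", "10")
    val sc = new SparkContext(conf)

    // Build a single partition whose serialized result is close to 10 MB.
    // The exact array size here is illustrative, not the precise boundary.
    val nearLimit = sc.parallelize(Seq(0), numSlices = 1)
      .map(_ => new Array[Byte]((9.97 * 1024 * 1024).toInt))

    // collect() sends each partition's serialized result back to the driver;
    // with a result this close to the frame size, the job hangs instead of
    // failing with a clear error.
    val result = nearLimit.collect()
    println(s"Collected ${result.map(_.length).sum} bytes")

    sc.stop()
  }
}
{code}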
Issue Links
- relates to: SPARK-1712 ParallelCollectionRDD operations hanging forever without any error messages (Resolved)