Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
2.0.0
Description
MemoryStore.putIteratorAsBytes() may silently lose values when used with KryoSerializer. It does not properly close the serialization stream before attempting to deserialize the already-serialized values, so values still held in Kryo's internal buffers may not be read back.
This is the root cause of a user-reported "wrong answer" bug in PySpark caching, raised by Ben Leslie on the Spark user mailing list in a thread titled "pyspark persist MEMORY_ONLY vs MEMORY_AND_DISK".
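
The failure mode described above can be illustrated with a small, self-contained sketch. This is not Spark or Kryo code: BufferingSerializationStream, deserialize_all, and round_trip are hypothetical names, and Python's pickle stands in for Kryo. The sketch only assumes that the serializer keeps written values in an internal buffer until the buffer fills or the stream is closed.

```python
import io
import pickle


class BufferingSerializationStream:
    """Hypothetical stand-in for a serializer stream with an internal buffer."""

    def __init__(self, sink: bytearray, buffer_size: int = 4096):
        self._sink = sink              # where flushed bytes end up
        self._buffer = bytearray()     # bytes written but not yet flushed
        self._buffer_size = buffer_size

    def write_object(self, value):
        # Serialize into the internal buffer; flush only once it is full.
        self._buffer += pickle.dumps(value)
        if len(self._buffer) >= self._buffer_size:
            self._flush()
        return self

    def _flush(self):
        self._sink += self._buffer
        self._buffer.clear()

    def close(self):
        # Closing pushes any remaining buffered bytes to the sink.
        self._flush()


def deserialize_all(data: bytes):
    """Read back every pickled value found in data."""
    source = io.BytesIO(data)
    values = []
    while source.tell() < len(data):
        values.append(pickle.load(source))
    return values


def round_trip(values, close_before_read: bool):
    """Serialize values, optionally close the stream, then deserialize the sink."""
    sink = bytearray()
    stream = BufferingSerializationStream(sink)
    for value in values:
        stream.write_object(value)
    if close_before_read:
        stream.close()
    return deserialize_all(bytes(sink))


lost = round_trip([1, 2, 3], close_before_read=False)
kept = round_trip([1, 2, 3], close_before_read=True)
```

In this sketch, reading the sink without closing the stream raises no error. The values that never left the buffer are simply missing from the result, which mirrors the silent data loss reported here. Closing the stream before reading restores the full sequence.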