Spark / SPARK-17491

MemoryStore.putIteratorAsBytes() may silently lose values when KryoSerializer is used

Details

    Description

      MemoryStore.putIteratorAsBytes() may silently lose values when used with KryoSerializer. It does not properly close the serialization stream before attempting to deserialize the already-serialized values, so values still held in Kryo's internal buffers may not be read back.
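
The failure mode described above can be sketched outside Spark. The following is only an analogy, not the MemoryStore or Kryo code: it assumes Python's `io.BufferedWriter` as a stand-in for a serializer with an internal buffer, and the names `write_values`, `read_values`, and `finish_stream` are hypothetical. It shows how reading the sink before the stream is flushed or closed drops the buffered tail without raising any error.

```python
import io
import struct

RECORD = struct.Struct(">q")  # one fixed-size 8-byte record per value


def write_values(values, finish_stream):
    """Serialize values through a buffered stream; return the bytes that reached the sink."""
    sink = io.BytesIO()
    # Small buffer (an arbitrary choice for this sketch) so that part of the
    # data reaches the sink while the rest stays buffered.
    stream = io.BufferedWriter(sink, buffer_size=64)
    for value in values:
        stream.write(RECORD.pack(value))
    if finish_stream:
        # Push the buffered tail to the sink before anyone reads it.
        stream.flush()
    return sink.getvalue()


def read_values(data):
    """Decode every complete record found in data."""
    usable = len(data) - len(data) % RECORD.size
    return [RECORD.unpack_from(data, offset)[0]
            for offset in range(0, usable, RECORD.size)]


values = list(range(10))
# Reading before the stream is finished: the tail is silently missing.
truncated = read_values(write_values(values, finish_stream=False))
# Reading after the stream is finished: every value comes back.
complete = read_values(write_values(values, finish_stream=True))
```

In this sketch `truncated` is a strict prefix of `values`, while `complete` matches `values` exactly. No exception is raised in either case, which is why this kind of bug shows up as a wrong answer rather than a failure.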

      This is the root cause of a user-reported "wrong answer" bug in PySpark caching, reported by Ben Leslie on the Spark user mailing list in a thread titled "pyspark persist MEMORY_ONLY vs MEMORY_AND_DISK".

          People

            joshrosen Josh Rosen