Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Fixed
- 1.4.0
- None
Description
This is a somewhat obscure bug, but I think it will seriously impact KryoSerializer users whose custom registrators disable auto-reset. With auto-reset disabled, some of our shuffle paths break, because they end up creating multiple OutputStreams from the same shared SerializerInstance, which is unsafe. To illustrate this, the following test fails in 1.4:
class KryoSerializerAutoResetDisabledSuite extends FunSuite with SharedSparkContext {
  conf.set("spark.serializer", classOf[KryoSerializer].getName)
  conf.set("spark.kryo.registrator", classOf[RegistratorWithoutAutoReset].getName)

  test("sort-shuffle with bypassMergeSort") {
    val myObject = ("Hello", "World")
    assert(sc.parallelize(Seq.fill(100)(myObject)).repartition(2).collect().toSet === Set(myObject))
  }
}
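The test refers to RegistratorWithoutAutoReset, whose definition is not shown here. A minimal sketch of what such a registrator might look like, assuming it does nothing except turn off Kryo's auto-reset (the body is an assumption, not the code from the actual patch):

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Assumed shape of the registrator named in the test above.
// It registers no classes; it only disables auto-reset on the Kryo instance.
class RegistratorWithoutAutoReset extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // With auto-reset off, Kryo no longer resets its internal state after each
    // top-level object is written, so that state carries over between writes.
    kryo.setAutoReset(false)
  }
}
```

Because the Kryo state is no longer cleared between writes, two output streams that share one SerializerInstance can end up depending on each other's contents, which is presumably why the shared-instance shuffle paths break.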
This was introduced by a patch (SPARK-3386) which enables serializer re-use in some of the shuffle paths, since constructing new serializer instances is actually pretty costly for KryoSerializer. We had already fixed another corner-case bug related to this (SPARK-7766), but missed this one. From an engineering risk management perspective, we probably should have just reverted the original serializer reuse patch and added a big test suite covering the cross product of configurations and shuffle managers before attempting to fix the defects.
I think that I have a pretty simple fix for this, but we still might want to consider a revert for 1.4 just to be safe.
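The cross-product test suite mentioned above could start from something like the following sketch, which only enumerates configuration combinations. The config keys are real Spark settings, but the object name ShuffleConfigMatrix and the particular values chosen for each axis are illustrative assumptions:

```scala
object ShuffleConfigMatrix {
  // Each axis pairs a config key with the values to try.
  // The values listed here are assumptions picked for illustration.
  val axes: Seq[(String, Seq[String])] = Seq(
    "spark.shuffle.manager" -> Seq("hash", "sort"),
    "spark.serializer" -> Seq(
      "org.apache.spark.serializer.JavaSerializer",
      "org.apache.spark.serializer.KryoSerializer"),
    "spark.shuffle.sort.bypassMergeThreshold" -> Seq("0", "200")
  )

  // Cartesian product of all axes: one Map of settings per combination.
  def combinations: Seq[Map[String, String]] =
    axes.foldLeft(Seq(Map.empty[String, String])) { case (acc, (key, values)) =>
      for (settings <- acc; value <- values) yield settings + (key -> value)
    }
}
```

Each resulting map could then be applied to a fresh SparkConf (for example with setAll) before running the same shuffle job, so that every serializer and shuffle manager pairing is exercised by the same assertions. A registrator axis, with and without auto-reset, would fit the same pattern for the Kryo combinations.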
Issue Links
- is broken by
  - SPARK-3386 Reuse serializer and serializer buffer in shuffle block iterator (Resolved)