Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Not A Problem
- Affects Version/s: 3.3.0
- Fix Version/s: None
- Component/s: None
Description
To reproduce:
import org.apache.spark.SparkConf
import org.apache.spark.serializer.{JavaSerializer, KryoSerializer}

abstract class AA { val ts = System.nanoTime() }
case class BB(x: Int) extends AA

val input = BB(1)
println("original ts: " + input.ts)

val javaSerializer = new JavaSerializer(new SparkConf())
val javaInstance = javaSerializer.newInstance()
val bytes1 = javaInstance.serialize[BB](input)
val obj1 = javaInstance.deserialize[BB](bytes1)
println("deserialization result from java: " + obj1.ts)

val kryoSerializer = new KryoSerializer(new SparkConf())
val kryoInstance = kryoSerializer.newInstance()
val bytes2 = kryoInstance.serialize[BB](input)
val obj2 = kryoInstance.deserialize[BB](bytes2)
println("deserialization result from kryo: " + obj2.ts)
The output is
original ts: 115014173658666
deserialization result from java: 115014306794333
deserialization result from kryo: 115014173658666
We can see that the field ts from the superclass AA is not preserved by JavaSerializer, while KryoSerializer round-trips it unchanged. The likely cause is that AA does not extend Serializable: standard Java serialization skips the fields of a non-serializable superclass and re-runs its no-arg constructor during deserialization, so ts is reinitialized. Kryo does not require Serializable and copies all fields.
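This behavior can be reproduced with plain java.io serialization, which Spark's JavaSerializer delegates to, so Spark itself is not needed to see it. A minimal JVM-only sketch (the class names AA/BB mirror the report; the roundTrip helper is added here for illustration):

```java
import java.io.*;

public class SuperclassSerDemo {
    static class AA {                 // NOT Serializable, as in the report
        long ts = System.nanoTime();  // re-initialized when AA's ctor re-runs
    }

    static class BB extends AA implements Serializable {
        final int x;
        BB(int x) { this.x = x; }
    }

    // Serialize and deserialize with plain java.io, returning the copy.
    static BB roundTrip(BB input) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(input);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (BB) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        BB input = new BB(1);
        BB copy = roundTrip(input);
        // BB's own field is serialized; AA's field is reset, because Java
        // serialization invokes the no-arg constructor of the first
        // non-serializable superclass during deserialization.
        System.out.println("x preserved:  " + (copy.x == input.x));
        System.out.println("ts preserved: " + (copy.ts == input.ts));
    }
}
```

This matches the report's output: the subclass field x survives, the superclass field ts does not.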
This caused a bug tracked as SPARK-38615: TreeNode.origin with actual information is not serialized to executors when a plan can't be executed with whole-stage codegen.
It could also lead to bugs when serializing the lambda functions passed to RDD APIs such as mapPartitions/mapPartitionsWithIndex/etc.
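The report proposes no fix (the resolution is Not A Problem), but a common mitigation when superclass state must survive Java serialization is to make the superclass Serializable as well, so its fields are written instead of being reset by a constructor re-run. A hedged sketch of that variant, reusing the same round-trip helper pattern:

```java
import java.io.*;

public class SerializableSuperDemo {
    // Only change from the report's setup: the superclass is Serializable.
    static class AA implements Serializable {
        long ts = System.nanoTime();
    }

    static class BB extends AA implements Serializable {
        final int x;
        BB(int x) { this.x = x; }
    }

    static BB roundTrip(BB input) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(input);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (BB) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        BB input = new BB(1);
        BB copy = roundTrip(input);
        // With AA Serializable, ts is written to the stream and restored,
        // so the deserialized copy keeps the original timestamp.
        System.out.println("ts preserved: " + (copy.ts == input.ts));
    }
}
```

In the report's Scala terms this would mean declaring `abstract class AA extends Serializable`; whether that is desirable depends on whether the superclass state is meant to travel with the object.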